Audio signals change drastically and quickly. A 100 Hz sine tone goes from (absolute value) peak to peak in 5 milliseconds. An attempt to detect its peak amplitude in a lesser time interval might incorrectly evaluate a non-peak values as a peak. What we really care about is the shape of the peaks over time, so a certain amount of latency is necessary to represent that shape properly. And, if we plan to use the envelope to shape other sounds, some smoothing of the envelope is necessary to minimize the audio-rate modulation it causes when applied to other signals.