Computer Audio: Musical Applications of Digital Signal Processing

UCI

Digital Audio by Christopher Dobrian

Textbook (Dodge and Jerse), chapters 2, 3, 4, 5 (pp. 115-139), 6, 7 (section 7.2), 10, and 12 (pp. 402-416).

The MSP manual and tutorials contained in the document *MSP2.pdf*.

Possible supplementary readings include Roads's *Computer Music Tutorial* and Strawn's *Digital Audio Signal Processing*, both of which are on reserve in the Arts Media Center.

See also the glossary in the textbook, and the web sites below.

A MIDI Tutorial - MIDI Messages

Tutorial on MIDI and Music Synthesis by Jim Heckroth, sections 1-8

Why do we listen to sound in stereo?

Because we have two ears. The listener's sense of the location of recorded sound is greatly enhanced by providing a slightly different audio signal to each ear, either by headphones or speakers.

What is interaural intensity difference (IID)?

It refers to the fact that a sound is slightly louder in one ear than in the other. This is partly because the two ears are at different distances from the sound source, but probably even more because the head obstructs the sound when the source is off the listener's central axis.

What is interaural time difference (ITD)?

It refers to the fact that, if a sound source is off the central axis of the head, the sound reaches one ear slightly later than the other, because the two ears are at different distances from the source.

What is the speed of sound (in air, at sea level)?

Approximately 345 meters/second.

What is the range of time delays that are significant for simulating ITD?

0.02 ms to 0.6 ms.

How might you use delay between the left and right channels (in Csound) to evoke a sense of location?

Use the intended virtual location to calculate the distance from the source to each of the two ears. Divide the difference between those two distances by the speed of sound to get the ITD. Use the "delay" unit generator (or the "delayr/delayw" pair) to delay the signal; send the undelayed signal to the channel nearer the source, and the signal delayed by the ITD to the other channel.
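The same calculation, sketched in Python rather than Csound, under these assumptions: a 2-D layout with the two ears modeled as points 0.2 m apart on the x axis, no head shadowing, a 44.1 kHz sampling rate, and the ITD rounded to whole samples. The names `itd_seconds` and `pan_by_delay` are hypothetical helpers, not Csound opcodes.

```python
import math

SPEED_OF_SOUND = 345.0  # m/s, the approximate value given above
SAMPLE_RATE = 44100     # Hz (assumed)

def itd_seconds(source_xy, left_ear_xy, right_ear_xy, c=SPEED_OF_SOUND):
    """Return (left distance, right distance, ITD in seconds) for a point source."""
    d_left = math.dist(source_xy, left_ear_xy)
    d_right = math.dist(source_xy, right_ear_xy)
    return d_left, d_right, abs(d_left - d_right) / c

def pan_by_delay(signal, source_xy, ear_spacing=0.2, sr=SAMPLE_RATE):
    """Return (left, right): the ear farther from the source gets the delayed copy."""
    half = ear_spacing / 2
    d_left, d_right, itd = itd_seconds(source_xy, (-half, 0.0), (half, 0.0))
    lag = round(itd * sr)                   # ITD in whole samples
    delayed = [0.0] * lag + list(signal)    # shifted later in time
    undelayed = list(signal) + [0.0] * lag  # padded to the same length
    if d_left > d_right:
        return delayed, undelayed
    return undelayed, delayed
```

For a source 10 m to the right, the path difference is 0.2 m, so the left channel lags by about 0.58 ms (26 samples). That sits at the top of the 0.02 to 0.6 ms range given above.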

How does amplitude vary with the distance of a sound source?

Amplitude (A) is inversely proportional to distance (D): A is proportional to 1/D.
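A quick numerical check of the 1/D rule, assuming a point source in a free field and measuring gain relative to a reference distance of 1 m. The helper names are illustrative.

```python
import math

def distance_gain(distance, reference_distance=1.0):
    """Amplitude relative to the amplitude at reference_distance (A proportional to 1/D)."""
    return reference_distance / distance

def gain_to_db(gain):
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(gain)
```

Doubling the distance halves the amplitude, which is a drop of about 6 dB. For example, `gain_to_db(distance_gain(2.0))` is about -6.02.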

In addition to amplitude, what other factor is our primary cue for distance?

The ratio of direct sound to reflected (reverberated) sound. The closer the sound source, the higher the ratio of direct to reflected sound.

What is the main factor that permits us to locate sounds situated behind us?

The direction-dependent filtering effect of the pinnae.

What are head-related transfer functions (HRTFs)?

HRTFs are a set of impulse responses (filtering functions) that correspond to the filtering effects of the pinnae on sounds arriving from different locations. Applying the HRTF for a particular location to a sound helps make the sound seem to come from that location.

How are HRTFs measured?

Impulses are played from a variety of locations, all equidistant from a dummy head, and are recorded by microphones placed in the dummy head's "ears". The resulting recordings are used as the impulse responses for the HRTFs.

How would you implement panning in Max/MSP to simulate a sound moving from one side to the other?

The simplest way is to use a "line~" object to ramp the amplitude of one speaker from 0 to 1 over the course of the sound; use 1 minus that value to control the amplitude ramp of the other speaker.
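Here is the same linear ramp outside Max, sketched in Python. The ramp from 0 to 1 across the length of the sound stands in for the `line~` output, and the sound moves from left to right. The function name is hypothetical.

```python
def linear_pan(signal):
    """Pan a mono signal from left to right with a linear 0 -> 1 ramp (like line~)."""
    n = len(signal)
    left, right = [], []
    for i, x in enumerate(signal):
        pos = i / (n - 1) if n > 1 else 0.0  # ramp value for this sample
        right.append(x * pos)                # right speaker: ramp 0 -> 1
        left.append(x * (1.0 - pos))         # left speaker: 1 minus the ramp
    return left, right
```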

What is the difference between "linear" panning and "equal power" panning?

Linear panning uses the method described above, such that the amplitude control on one channel is 1 minus the amplitude control imposed on the other channel. Equal-power panning uses the same control value, but applies its square root to each channel rather than the value itself: the square root of x for one channel and the square root of 1 minus x for the other. This is comparable to panning along an arc of constant distance from the listener (it maintains equal intensity at every point), and it avoids the "hole-in-the-middle" effect sometimes caused by linear panning.
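The "hole in the middle" can be shown numerically. Intensity is taken here to be proportional to the sum of the squared channel gains, and `pos` runs from 0 (hard left) to 1 (hard right). The helper names are illustrative.

```python
import math

def linear_gains(pos):
    """pos in [0, 1]: 0 = hard left, 1 = hard right."""
    return 1.0 - pos, pos

def equal_power_gains(pos):
    """Square root of each linear gain, so left**2 + right**2 == 1."""
    return math.sqrt(1.0 - pos), math.sqrt(pos)

def total_power(gains):
    left, right = gains
    return left ** 2 + right ** 2
```

At the center (`pos = 0.5`), linear panning gives each channel 0.5, for a total power of 0.5. That is about a 3 dB dip. Equal-power panning gives each channel about 0.707, and the total power stays at 1.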

What is the "Doppler effect"?

As a sound source moves toward you, its perceived frequency is shifted upward by an amount determined by its velocity relative to you, because it is, to some extent, catching up with its own sound waves. As it moves away from you, its relative velocity is negative, so the perceived frequency is lowered. Thus, as a sound moves rapidly past you, the "Doppler shift" occurs: a rather sudden downward frequency shift as the relative velocity of the source goes from positive to negative. The perceived frequency can be calculated as fc/(c-v), where f is the frequency of the source sound, v is the source's velocity toward the listener, and c is the speed of sound. Another way to simulate the Doppler effect is to delay the sound by an amount that corresponds to its virtual distance from the listener (the delay equals the distance divided by the speed of sound). This requires continually calculating the source's virtual distance as it moves, and applying a continually varying delay to the sound.
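The formula fc/(c-v) as a small function. It assumes a moving source, a stationary listener, and the 345 m/s speed of sound given above. Here v is positive when the source approaches and negative when it recedes.

```python
SPEED_OF_SOUND = 345.0  # m/s

def doppler_frequency(f, v, c=SPEED_OF_SOUND):
    """Perceived frequency of a moving source heard by a stationary listener.

    v is the source's velocity toward the listener in m/s:
    positive while approaching, negative while receding.
    """
    if v >= c:
        raise ValueError("source must move slower than sound")
    return f * c / (c - v)
```

Consider a 440 Hz source passing at 20 m/s. It is heard at about 467 Hz as it approaches and about 416 Hz as it recedes, a drop of roughly two semitones as it goes by.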

As a sound is moving toward you, does its frequency vary upward or downward as its velocity relative to you increases?

Upward.

What is a low-pass filter?

A low-pass filter attenuates high frequencies while allowing lower frequencies to pass through more-or-less unaltered.

What is a high-pass filter?

A high-pass filter attenuates low frequencies while allowing higher frequencies to pass through more-or-less unaltered.

What is the technical definition of the "cutoff frequency"?

The cut-off frequency (of a high-pass or low-pass filter) is technically defined as the frequency at which the output amplitude is attenuated 3 decibels relative to the input signal. In practice, however, in the case of a resonant low-pass filter, the term "cut-off" frequency is often used in the same way as the term "center frequency" is used for band-pass filters (see below).

What is a band-pass filter?

A band-pass filter attenuates all frequencies except those that lie in a certain "passband" (range). Often a band-pass filter not only attenuates frequencies outside the specified range, but also boosts (resonates) the frequencies that lie within the passband. This is called a "resonant band-pass filter".

What is the technical definition of the "bandwidth"?

The bandwidth is the difference between the highest frequency of the passband and the lowest frequency of the passband. The upper and lower cut-off frequencies are defined as the frequencies at which the output amplitude is attenuated 3 decibels relative to the input signal. For example, if the passband is centered around 2000 Hz, and at 1800 and 2200 Hz the output amplitude is 3 dB less than it was in the input signal, the bandwidth is 2200 - 1800 = 400 Hz.

What is the "center frequency" of a band-pass filter?

The center frequency is the frequency that lies in the center of the passband of a band-pass filter (2000 Hz in the above example). In the case of a resonant band-pass filter, it is usually the frequency of greatest resonance.

What is Q?

For a band-pass filter, the Q is the center frequency divided by the bandwidth.
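The worked example from the bandwidth answer, carried through to Q. The cutoffs are at 1800 and 2200 Hz and the center is at 2000 Hz. The helper names are illustrative.

```python
def bandwidth(lower_cutoff, upper_cutoff):
    """Width of the passband between the two -3 dB points, in Hz."""
    return upper_cutoff - lower_cutoff

def q_factor(center, lower_cutoff, upper_cutoff):
    """Q = center frequency / bandwidth."""
    return center / bandwidth(lower_cutoff, upper_cutoff)
```

This gives 2000 / 400 = 5. Keeping the center at 2000 Hz and narrowing the band to 100 Hz would raise Q to 20.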

How is Q calculated for a low-pass filter?

An IIR low-pass filter (see definition below) can have a variable steepness of cutoff (usually combined with resonance near the cutoff frequency). For such filters, the Q is usually calculated as if it were a band-pass filter, in which the center frequency is the frequency of greatest resonance and the "bandwidth" is equal to twice the difference between the cutoff frequency and the frequency of greatest resonance.

What is the audible effect of a low-pass filter?

As the high frequencies are attenuated, the sound becomes less "bright" and more "muffled".

What is the audible effect of increasing the Q?

A resonant low-pass filter or band-pass filter will have more and more spectral emphasis around the resonant frequency as the Q is increased. The Q can be increased (i.e., the bandwidth can be sufficiently narrowed) to the point where the passband is heard as a single pitch (or narrow pitch region).

What is the primary means of digitally filtering sounds?

Most digital filters operate on the principle of frequency-dependent interference that occurs when a signal is combined (in varying proportions) with one or more delayed versions of itself.

What is the general equation for filtering using scaled, time-delayed versions of a digital signal?

The output signal is equal to the scaled input signal, plus independently scaled and delayed copies of the input, minus independently scaled and delayed copies of previous outputs:

y(n) = a0·x(n) + a1·x(n-1) + ... - b1·y(n-1) - b2·y(n-2) - ...
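A direct, unoptimized transcription of the general equation. It assumes that samples before the start of the input (and outputs before the first one) are zero. `a` holds a0, a1, ... and `b` holds b1, b2, ..., so `b[0]` is b1. The function name is hypothetical.

```python
def difference_equation(x, a, b):
    """y(n) = a0*x(n) + a1*x(n-1) + ... - b1*y(n-1) - b2*y(n-2) - ...

    a: feedforward coefficients [a0, a1, ...]
    b: feedback coefficients [b1, b2, ...]
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, ak in enumerate(a):            # scaled, delayed inputs
            if n - k >= 0:
                acc += ak * x[n - k]
        for k, bk in enumerate(b, start=1):   # scaled, delayed previous outputs
            if n - k >= 0:
                acc -= bk * y[n - k]
        y.append(acc)
    return y
```

With `a = [0.5, 0.5]` and no `b` terms, this is a two-point averaging (FIR) filter. With `a = [1.0]` and `b = [-0.5]`, each output adds half of the previous output (IIR).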

What is meant by a "finite impulse response" (FIR) filter (also known as a "feedforward" filter)?

A finite impulse response means that the filter uses a finite number of delayed copies of the input, and no copies of previous outputs. The "impulse response" is the output the filter produces when its input is a single impulse; for such a filter it is simply the sequence of coefficients of the input half of the filter equation. Since the nonzero coefficients eventually come to an end, the impulse response is finite: when the input ceases, the output dies away, extended only by the duration of the impulse response.
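Feeding a single impulse into a feedforward filter shows that the output is just the coefficient list, followed by silence. This is a minimal sketch; the three coefficients are arbitrary example values.

```python
def fir_filter(x, coeffs):
    """Feedforward filter: y(n) = sum over k of coeffs[k] * x(n - k)."""
    return [sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(x))]

impulse = [1.0] + [0.0] * 7  # a single unit impulse followed by zeros
```

`fir_filter(impulse, [0.25, 0.5, 0.25])` yields 0.25, 0.5, 0.25 and then zeros. The response lasts exactly as many samples as there are coefficients.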

By deduction, then, what is an "infinite impulse response" (IIR) filter (also known as a "feedback" filter)?

An IIR filter feeds some copies of previous outputs back into the filter (i.e., it has terms from the second half of the filter equation). As a result, the output produced by a single impulse can theoretically continue forever, and it can even grow without limit if the coefficients of the feedback copies are not small enough. (With a single feedback copy, its coefficient must have a magnitude less than 1 for the output to die away; with multiple feedback copies, the coefficients must be chosen together so that the output still dies away.)
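The simplest feedback filter, y(n) = x(n) - b1·y(n-1), uses the sign convention of the general equation. It illustrates both behaviors: a decaying response that never quite reaches zero, and a response that grows without bound. The b1 values below are arbitrary examples.

```python
def one_pole_feedback(x, b1):
    """y(n) = x(n) - b1*y(n-1): the simplest feedback (IIR) filter."""
    y, prev = [], 0.0
    for sample in x:
        prev = sample - b1 * prev  # feed the previous output back in
        y.append(prev)
    return y
```

With `b1 = -0.9` the impulse response decays as 1, 0.9, 0.81, ... and is still nonzero after 100 samples. With `b1 = -1.1` (magnitude greater than 1) it grows larger with every sample.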

What is a comb filter?

A comb filter has regularly-spaced (i.e. harmonic) frequencies of emphasis (separated by regularly-spaced frequencies of attenuation), such that the spectral plot of the amplitude response looks like a comb.

How is a comb-filtering effect achieved?

An FIR comb filter is achieved by combining the input signal with a scaled and delayed copy of itself. An IIR comb filter is achieved by combining the input with a scaled and delayed copy of a prior output.

How can one calculate the fundamental frequency of a comb filter?

The fundamental frequency is the inverse of the delay time. For example, if the comb filter uses a 1 ms delay, the filter will have amplitude peaks every 1000 Hz (at 1000, 2000, 3000 Hz, and so on).
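Evaluating the amplitude response of an FIR comb, y(n) = x(n) + g·x(n-D), shows the peaks directly. This sketch treats the delay in seconds (D divided by the sampling rate) and assumes a positive gain g of 0.5. The function name is hypothetical.

```python
import cmath
import math

def fir_comb_response(freq, delay_seconds, gain=0.5):
    """|H(f)| for y(n) = x(n) + gain * x(n - D), where H(f) = 1 + gain * e^(-j 2 pi f tau)."""
    return abs(1 + gain * cmath.exp(-2j * math.pi * freq * delay_seconds))
```

With a 1 ms delay, the response is 1.5 (peaks) at 1000, 2000, 3000 Hz, and so on. It is 0.5 (notches) halfway between them, at 500, 1500, 2500 Hz.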

What is "white noise"?

White noise is sound that has a random (and thus unpredictable but over-all uniform) distribution of amplitudes in the time domain, and of frequencies (between 0Hz and the Nyquist frequency) in the spectral domain.

How is white noise generated in a computer?

By choosing discrete sample values at random.
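A sketch of choosing sample values at random. It assumes a uniform distribution between -1.0 and 1.0; other distributions, such as Gaussian, also produce white noise as long as successive samples are independent. The optional seed only makes the output repeatable.

```python
import random

def white_noise(num_samples, seed=None):
    """Independent, uniformly distributed sample values in [-1.0, 1.0]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
```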

What is meant by the terms "time domain" and "frequency domain", when viewing discrete audio signals?

The word "domain" refers to the x axis of the graph. A signal viewed in the time domain views amplitude as a function of time; a signal viewed in the frequency domain views amplitude as a function of the different frequencies.

Given that sound occurs only in passing time, what is actually depicted in a two-dimensional plot (amplitude over frequency) of a sound spectrum for a single "instant"?

A spectral plot shows the spectrum of a theoretical sound, or a single cycle of a periodic wave.

What is the "Fourier theorem"?

The Fourier theorem states that any periodic signal can be mathematically represented as the sum of harmonically-related sinusoids of independent amplitude and phase.
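A classic illustration of the theorem is the square wave. It is the sum of its odd harmonics, each with amplitude 1/k and all in sine phase, scaled by 4/π. This sketch evaluates a partial sum at a given phase in radians. The function name is illustrative.

```python
import math

def square_wave_partial_sum(phase, num_odd_harmonics):
    """(4/pi) * sum over odd k of sin(k * phase) / k, for the first num_odd_harmonics terms."""
    total = 0.0
    for i in range(num_odd_harmonics):
        k = 2 * i + 1  # harmonic numbers 1, 3, 5, ...
        total += math.sin(k * phase) / k
    return 4.0 / math.pi * total
```

As more harmonics are added, the sum approaches +1 during the first half of each cycle and -1 during the second half.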

What is "Fourier analysis"?

Fourier analysis is using the Fourier theorem to discover the precise frequency content (spectrum) of a sound.

What is the "discrete Fourier transform"?

The discrete Fourier transform (DFT) is a mathematical process by which a discrete time-domain signal can be converted into a corresponding discrete frequency-domain signal.
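A direct transcription of the DFT definition. It takes O(N²) time, so it is only suitable for short examples; an FFT computes the same result much faster.

```python
import cmath
import math

def dft(x):
    """X[k] = sum over n of x[n] * e^(-j 2 pi k n / N), for k = 0 .. N-1."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]
```

Take 8 samples of a cosine that completes 2 cycles in the window. Its DFT has energy only in bin 2 and in its mirror image, bin 6, with a value of 4 (N/2) in each.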

How does the number of discrete samples used in a Fourier transform affect the number of frequency "bins" in the resulting spectrum?

The number of discrete samples in each DFT determines the number of frequency ranges ("bins") that will be represented (between 0Hz and the sampling frequency) in the frequency-domain representation of that signal.

What are the advantages and disadvantages of increasing the number of discrete samples in a Fourier transform?

As the number of samples in a DFT increases, the frequency resolution of the spectral plot becomes finer. However, since a greater number of samples covers a longer period of time, the DFT necessarily represents the average amplitude of each frequency over that period. A DFT of a longer period may therefore "smear" the amplitude, because a single value in the spectral plot must represent what may be a varying amplitude of that frequency range in the time-domain signal. For example, with a sampling rate of 32,768 Hz, an FFT of 128 samples covers a time period of 128/32,768 = 0.0039 seconds (about 4 ms) and yields frequency bins that each cover a range of 32,768/128 = 256 Hz. A DFT of 2048 samples, by contrast, would yield frequency bins of only 16 Hz (good frequency resolution) but would cover a time period of 0.0625 seconds, and a great deal can change in a sound in 1/16 of a second.
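The arithmetic of this example as a helper, so that other window sizes can be compared. The function name is illustrative.

```python
def dft_resolution(num_samples, sample_rate):
    """Return (window duration in seconds, bin width in Hz) for an N-sample DFT."""
    return num_samples / sample_rate, sample_rate / num_samples
```

The product of the two values is always 1 (N/sr times sr/N). Any gain in frequency resolution therefore costs exactly the same factor in time resolution.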

How can the Fourier transform of a sound be used for resynthesis of new sounds?

A spectral plot can be re-converted into a time-domain signal using the mathematical process of the inverse DFT (IDFT). Before it is re-converted, however, a spectral plot can be modified in many ways. For example, amplitudes of particular frequencies can be increased or decreased (spectral filtering), the entire plot can be multiplied by the spectrum of another signal (cross-synthesis), the entire plot can be shifted up or down in frequency (frequency shifting), the plot can be used for resynthesis of a longer or shorter time period than the original (time compression and expansion), one can interpolate between spectral plots over a certain period of time (spectral interpolation), etc.
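A minimal sketch of spectral filtering: analyze, zero some bins, and resynthesize with the IDFT. It assumes a real input signal and a `keep` rule that treats each bin k and its mirror N-k alike, so that the resynthesized signal stays real. The names `spectral_filter` and `keep` are hypothetical.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """x[n] = (1/N) * sum over k of X[k] * e^(+j 2 pi k n / N); the real part is kept."""
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]

def spectral_filter(x, keep):
    """Zero every bin k for which keep(k) is False, then resynthesize."""
    X = dft(x)
    return idft([Xk if keep(k) else 0 for k, Xk in enumerate(X)])
```

For example, start with a 16-sample signal containing harmonics 1 and 5. Keeping only bins within 2 of DC (`lambda k: min(k, 16 - k) <= 2`) removes harmonic 5 and returns the pure first harmonic.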

What is convolution?

Convolution is the process of taking one finite digital signal, scaling and delaying that signal by the amplitude and sample number of each discrete sample of another finite digital signal, and adding all those delayed, scaled copies together. It is a commutative process; i.e., convolving signal A with signal B gives the same result as convolving signal B with signal A.
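The definition translates almost word for word into code. Each sample of `b` contributes one copy of `a`, delayed by that sample's index and scaled by its value. The function name is illustrative.

```python
def convolve(a, b):
    """Direct linear convolution of two finite signals."""
    out = [0.0] * (len(a) + len(b) - 1)
    for j, bj in enumerate(b):        # j = delay in samples, bj = scale factor
        for i, ai in enumerate(a):
            out[i + j] += ai * bj     # add the delayed, scaled copy of a
    return out
```

`convolve([1, 2], [1, 1, 1])` gives 1, 3, 3, 2, and swapping the arguments gives the same result.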

What is the relationship between convolution in the time domain and multiplication in the frequency domain?

Convolution in one domain is exactly equivalent to multiplication in the other domain. Thus filtering is really convolution of an audio signal with an impulse response in the time domain, which is equivalent to multiplying the DFT of the signal by the DFT of the impulse response in the frequency domain.
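A numerical check of this equivalence. One practical detail: multiplying DFTs corresponds to circular convolution, so both signals must first be zero-padded to the full output length (len(x) + len(h) - 1) to match ordinary convolution. The example signal and two-point impulse response are arbitrary.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for j, bj in enumerate(b):
        for i, ai in enumerate(a):
            out[i + j] += ai * bj
    return out

def check_convolution_theorem(signal, impulse_response):
    """Largest difference between DFT(signal conv h) and DFT(signal) times DFT(h)."""
    n = len(signal) + len(impulse_response) - 1
    padded_x = list(signal) + [0.0] * (n - len(signal))
    padded_h = list(impulse_response) + [0.0] * (n - len(impulse_response))
    lhs = dft(convolve(signal, impulse_response))                 # time-domain filtering
    rhs = [X * H for X, H in zip(dft(padded_x), dft(padded_h))]  # spectral multiplication
    return max(abs(l - r) for l, r in zip(lhs, rhs))
```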

How are multi-tracking, mixing, and filtering used to enhance clarity in a complex recorded texture?

If individual instruments, voices, and sounds are each recorded on a separate track, then each track can be independently adjusted for amplitude (to get the desired balance, or to highlight certain sounds), independently filtered (to give each sound a distinctive region of the spectrum), and processed (to give each sound its own unique, distinctive character).

What are some of the uses of delay in audio processing for commercial music recordings?

Delay(s) can be used to add a comb filtering effect, a "slapback" blurring of the attack, discrete echo effects, reverberation, call-and-response effects, chorusing, flanging, etc.

What is "flanging"?

Time-varying delay. The delay time is modulated by some control source, such as a low-frequency oscillator, creating subtle (or not-so-subtle) Doppler effects. When a flanged sound is mixed with the unflanged original, a time-varying interference (modulated comb filtering) occurs.
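A minimal flanger sketch: a sine LFO sweeps the delay time, fractional delays are read by linear interpolation, and the delayed copy is mixed with the original. All parameter defaults are illustrative assumptions, not standard values. `base_delay_ms` must be at least `depth_ms` so that the delay never goes negative.

```python
import math

def flanger(x, sample_rate, base_delay_ms=2.0, depth_ms=1.5, lfo_hz=0.5, mix=0.5):
    """Mix x with a copy whose delay is swept by a sine LFO."""
    out = []
    for n, dry in enumerate(x):
        delay_ms = base_delay_ms + depth_ms * math.sin(2 * math.pi * lfo_hz * n / sample_rate)
        pos = n - delay_ms * sample_rate / 1000.0  # fractional read position
        i = math.floor(pos)
        frac = pos - i
        # linear interpolation between x[i] and x[i + 1]; silence outside the signal
        s0 = x[i] if 0 <= i < len(x) else 0.0
        s1 = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        wet = s0 + frac * (s1 - s0)
        out.append((1 - mix) * dry + mix * wet)
    return out
```

Setting `depth_ms` to 0 freezes the delay and leaves a fixed (static) comb filter. The LFO sweep is what turns it into the moving comb filtering heard as flanging.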

What is "chorusing"?

The "chorus effect" is achieved by adding multiple slightly detuned versions of a signal together, emulating the slightly-out-of-tune effect of multiple voices or instruments playing in near-unison. This is most simply achieved by adding a (very slightly) delayed copy of a signal to the original, with the delay being continually varied to change its tuning slightly from the original. Unlike flanging (which is usually controlled by a periodic low-frequency control source), chorusing is usually controlled by linearly interpolated random delay times (within a very small range), so that the original and delayed signals are always out of tune by an unpredictably varying amount.
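What sets chorusing apart is its control signal: delay times chosen at random and joined by straight lines rather than a periodic LFO. This sketch generates such a curve in milliseconds, which could drive a variable delay line. The 20 to 25 ms range and the new-target interval are assumed illustrative values.

```python
import random

def random_delay_curve(num_samples, sample_rate, min_ms=20.0, max_ms=25.0,
                       new_target_every_ms=100.0, seed=None):
    """Delay-time control signal (ms) that glides linearly between random targets."""
    rng = random.Random(seed)
    segment = max(1, int(sample_rate * new_target_every_ms / 1000.0))
    current = rng.uniform(min_ms, max_ms)
    target = rng.uniform(min_ms, max_ms)
    curve = []
    for n in range(num_samples):
        step = n % segment
        if step == 0 and n > 0:
            # reached the old target: it becomes the new start, pick a new random target
            current, target = target, rng.uniform(min_ms, max_ms)
        curve.append(current + (target - current) * step / segment)
    return curve
```

Because the curve only ever moves in small linear steps, the pitch deviation of the delayed copy wanders unpredictably without audible jumps.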

What is "gated reverb"?

Reverb is added to the sound, but instead of being allowed to decay naturally, the reverb is cut off quite suddenly. This allows reverb to be used to "enrich" or "fatten" a sound, without extending its duration much. It is particularly used for percussive sounds in commercial recording. In extreme cases, the effect is almost like a burst of filtered noise.
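A sketch of the gating step only. It assumes the reverberated (wet) signal has already been produced by some reverb process. The gate passes that signal for a fixed time, then cuts it off with a very short linear fade to avoid a click. The 150 ms open time and 5 ms release are illustrative assumptions.

```python
def gate_tail(wet, sample_rate, open_ms=150.0, release_ms=5.0):
    """Pass the reverb for open_ms, then cut it off with a very short linear fade."""
    open_n = int(sample_rate * open_ms / 1000.0)
    release_n = max(1, int(sample_rate * release_ms / 1000.0))
    out = []
    for n, s in enumerate(wet):
        if n < open_n:
            gain = 1.0                                 # gate fully open
        elif n < open_n + release_n:
            gain = 1.0 - (n - open_n) / release_n      # brief fade-out
        else:
            gain = 0.0                                 # reverb abruptly silenced
        out.append(s * gain)
    return out
```

The gated reverb would then be mixed with the dry sound, thickening the attack without lengthening the decay.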

Posted March 12, 2002

Christopher Dobrian

dobrian@uci.edu