Acoustic Processing of Speech
Commonly called signal analysis or feature extraction
The term features refers to the vector of numbers which represents one time slice of a speech signal
LPC Features
These are spectral features, which means that they represent the waveform in terms of the distribution of the different frequencies that make it up
Sound Waves
The input of a speech recognizer is a complex series of changes of air pressure
These changes in air pressure originate with the speaker and are caused by the specific way that air passes through something called the glottis and out the oral or nasal cavities.
We represent sound waves by plotting the change in air pressure over time
One way of visualizing this is to imagine a vertical plate that blocks the air pressure waves, with the graph plotting the pressure at the plate over time
The two important characteristics of a wave are its:
- Frequency
- Amplitude
The frequency is the number of times per second that a wave repeats itself (cycles), measured in Hz
- A high value on the vertical axis (a high amplitude) indicates that there is more air pressure at that point in time
- A zero value means that there is normal pressure
- A negative value means there is lower than normal air pressure
The pitch of a sound is the perceptual correlate of frequency
The loudness of a sound is the perceptual correlate of the power, which is related to the square of the amplitude
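The relationship between amplitude and power can be sketched in a few lines of Python. This is only an illustration: the helper names `sine_wave` and `average_power`, the 100 Hz tone and the 8000 Hz sampling rate are all assumptions chosen for the example, and power is taken here as the mean of the squared samples.

```python
import math

def sine_wave(frequency_hz, amplitude, sample_rate_hz, duration_s):
    """Return samples of a sine wave (air pressure relative to normal)."""
    n_samples = int(round(sample_rate_hz * duration_s))
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate_hz)
        for i in range(n_samples)
    ]

def average_power(samples):
    """Mean of the squared sample values (an assumed definition of power)."""
    return sum(s * s for s in samples) / len(samples)

# Two tones with the same frequency (same pitch) but different amplitudes.
quiet = sine_wave(frequency_hz=100, amplitude=1.0, sample_rate_hz=8000, duration_s=1.0)
loud = sine_wave(frequency_hz=100, amplitude=2.0, sample_rate_hz=8000, duration_s=1.0)

# Doubling the amplitude multiplies the power by about four.
power_ratio = average_power(loud) / average_power(quiet)
print(power_ratio)
```

Changing `frequency_hz` instead of `amplitude` leaves the power essentially unchanged but alters how many cycles fit into each second, which is what a listener perceives as pitch.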
Spectra
While some broad phonetic features can be interpreted from a waveform, more detailed classification requires a different representation of the input in terms of spectral features
Calculating Hz
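a = Number of big repeating patterns counted in the waveform
b = Time taken (in seconds)
c = Number of smaller patterns within each big pattern
d = Number of even smaller patterns within each smaller pattern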
- a/b = answer1
- answer1 * c = answer2
- answer2 * d = answerFinal
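The arithmetic above can be written as a small function. This is a sketch that reads c and d as counts of patterns nested inside each larger pattern, which is an interpretation of the notes rather than something they state outright. The function name `estimate_frequencies` and the numbers in the example are invented purely for illustration.

```python
def estimate_frequencies(big_patterns, seconds, smaller_per_big, smallest_per_smaller):
    """Estimate three frequencies (in Hz) by counting patterns in a waveform."""
    answer1 = big_patterns / seconds            # a / b
    answer2 = answer1 * smaller_per_big         # answer1 * c
    answer_final = answer2 * smallest_per_smaller  # answer2 * d
    return answer1, answer2, answer_final

# Illustrative numbers only: 50 big patterns in 0.25 s, 4 smaller patterns
# per big pattern, and 2 even smaller patterns per smaller pattern.
answer1, answer2, answer_final = estimate_frequencies(50, 0.25, 4, 2)
print(answer1, answer2, answer_final)  # 200.0 800.0 1600.0
```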
Spectrum
A spectrum is a representation of the different frequency components of a wave
Fourier transform – A mathematical procedure which separates out each of the frequency components of a wave
Many speech applications use an LPC (Linear Predictive Coding) spectrum which makes it easier to see where the peaks are
X-Axis – Shows Frequency
Y-Axis – Shows Magnitude
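To make the idea of separating out frequency components concrete, here is a minimal sketch of a discrete Fourier transform written directly from its definition. It is slow and only meant for illustration (real systems use a fast Fourier transform), and it computes a plain Fourier spectrum, not an LPC spectrum. The function name `dft_magnitudes`, the 64 Hz sampling rate and the two test tones are assumptions made for the example.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each frequency bin from 0 up to half the sampling rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid that cycles k times.
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes

SAMPLE_RATE_HZ = 64  # assumed; with 64 samples, bin k corresponds to k Hz

# A wave made of a 4 Hz component and a weaker 10 Hz component.
wave = [
    1.0 * math.sin(2 * math.pi * 4 * t / SAMPLE_RATE_HZ)
    + 0.5 * math.sin(2 * math.pi * 10 * t / SAMPLE_RATE_HZ)
    for t in range(64)
]

spectrum = dft_magnitudes(wave)  # index = frequency (x-axis), value = magnitude (y-axis)
peaks = sorted(sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)[:2])
print(peaks)  # [4, 10]
```

The two largest values in `spectrum` sit at the frequencies of the two components that were mixed together, with the stronger component giving the taller peak.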
Why is Spectrum Useful?
The use of spectral information is essential to both human and machine speech recognition
Spectrogram
A spectrogram is a way of envisioning how the different frequencies which make up a waveform change over time
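One rough way to build a spectrogram is to cut the signal into short frames and compute a spectrum for each frame. The sketch below assumes non-overlapping frames with no windowing, which is a simplification, and the names `dft_magnitudes` and `spectrogram`, the 64 Hz sampling rate and the test signal are all hypothetical choices for the example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude spectrum of one frame, computed from the DFT definition."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples, frame_length):
    """One magnitude spectrum per non-overlapping frame (rows = time)."""
    starts = range(0, len(samples) - frame_length + 1, frame_length)
    return [dft_magnitudes(samples[s:s + frame_length]) for s in starts]

SAMPLE_RATE_HZ = 64   # assumed
FRAME_LENGTH = 32     # assumed; each frequency bin is then 2 Hz wide

# A signal whose frequency changes over time: 8 Hz first, then 16 Hz.
signal = [
    math.sin(2 * math.pi * (8 if i < 64 else 16) * i / SAMPLE_RATE_HZ)
    for i in range(128)
]

spec = spectrogram(signal, FRAME_LENGTH)
hz_per_bin = SAMPLE_RATE_HZ / FRAME_LENGTH
dominant_hz = [
    max(range(len(column)), key=lambda k: column[k]) * hz_per_bin for column in spec
]
print(dominant_hz)  # [8.0, 8.0, 16.0, 16.0]
```

Each entry of `dominant_hz` is the strongest frequency in one time slice, so the jump from 8 Hz to 16 Hz halfway through the signal is visible in a way that a single spectrum of the whole signal would not show.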
Why Do Different Vowels have Different Spectra?
The formants are caused by the resonant cavities of the mouth
The oral cavity can be thought of as a filter which selectively passes through some of the harmonics of the vocal cord vibrations.
Moving the tongue creates spaces of different sizes in the mouth, which selectively amplify waves of the appropriate wavelength, hence amplifying different frequency bands.
Feature Extraction
The process begins with the sound waves and ends with a feature vector
- An input sound-wave is first digitized
- The process of analogue-to-digital conversion has two steps:
- Sampling
- Quantization
- A signal is sampled by measuring its amplitude at a particular time
- The sampling rate is the number of samples taken per second
- In order to measure a wave properly, there must be a minimum of two samples per cycle: one that measures the positive part and another that measures the negative part
- More than two samples per cycle increases the amplitude accuracy, but fewer than two samples will cause the frequency of the wave to be missed
- Therefore, the maximum frequency that can be measured is half the sampling rate
- The maximum frequency for a given sampling rate is called the Nyquist frequency
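The sampling and quantization steps above can be sketched as follows. The example assumes an 8 Hz sampling rate and 8-bit signed quantization purely for illustration, and the helper names `nyquist_frequency`, `sample_cosine` and `quantize` are hypothetical. It shows that a 7 Hz wave, which lies above the 4 Hz Nyquist frequency, produces the same samples as a 1 Hz wave, so its true frequency is missed.

```python
import math

def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be measured at a given sampling rate."""
    return sample_rate_hz / 2

def sample_cosine(frequency_hz, sample_rate_hz, n_samples):
    """Sampling: measure the wave's amplitude at evenly spaced times."""
    return [
        math.cos(2 * math.pi * frequency_hz * i / sample_rate_hz)
        for i in range(n_samples)
    ]

def quantize(sample, bits=8):
    """Quantization: store a sample in [-1, 1] as a signed integer level."""
    max_level = 2 ** (bits - 1) - 1  # 127 for the assumed 8 bits
    return round(sample * max_level)

SAMPLE_RATE_HZ = 8  # assumed, so the Nyquist frequency is 4 Hz

low = sample_cosine(1, SAMPLE_RATE_HZ, 16)    # below the Nyquist frequency
high = sample_cosine(7, SAMPLE_RATE_HZ, 16)   # above the Nyquist frequency

# The two sets of samples match, so the 7 Hz wave looks like a 1 Hz wave.
indistinguishable = all(abs(a - b) < 1e-9 for a, b in zip(low, high))
print(nyquist_frequency(SAMPLE_RATE_HZ), indistinguishable)  # 4.0 True

digital = [quantize(s) for s in low]  # integer values ready to be stored
```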