Biometrics 101 – Voice

Acoustic Processing of Speech

Commonly called signal analysis or feature extraction

The term features refers to the vector of numbers which represents one time slice of a speech signal

LPC Features

These are spectral features, which means that they represent the waveform in terms of the distribution of the different frequencies that make it up

Sound Waves

The input of a speech recognizer is a complex series of changes of air pressure

These changes in air pressure originate with the speaker and are caused by the specific way that air passes through something called the glottis and out the oral or nasal cavities.

We represent sound waves by plotting the change in air pressure over time

One way of visualizing this is to imagine a vertical plate which is blocking the air pressure waves; the graph plots how much the air at the plate is compressed or rarefied over time

The two important characteristics of a wave are its:

Frequency
Amplitude

The frequency is the number of times per second that a wave repeats itself (cycles), measured in hertz (Hz)

A high value on the vertical axis (a high amplitude) indicates that there is more air pressure at that point in time.
A zero value means that there is normal pressure
A negative value means there is lower than normal air pressure

The pitch of a sound is the perceptual correlate of frequency
The loudness of a sound is the perceptual correlate of the power, which is related to the square of the amplitude
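As a rough illustration of how power relates to the square of the amplitude, the sketch below builds two sine waves that differ only in amplitude and compares their average power. The helper names (sine_wave, mean_power), the 100 Hz tone, and the 8000 Hz sampling rate are assumptions chosen for the example, not part of the original notes.

```python
import math


def sine_wave(frequency_hz, amplitude, duration_s, sample_rate_hz):
    """Return samples of amplitude * sin(2*pi*f*t), taken sample_rate_hz times per second."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


def mean_power(samples):
    """Average of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)


# Two waves with the same frequency; the second has twice the amplitude.
quiet = sine_wave(100, 1.0, 1.0, 8000)
loud = sine_wave(100, 2.0, 1.0, 8000)

# Doubling the amplitude roughly quadruples the power.
print(mean_power(loud) / mean_power(quiet))
```

Perceived loudness is not a simple linear function of power, so this only shows the physical side of the relationship.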

Spectra

While some broad phonetic features can be interpreted from a waveform, more detailed classification requires a different representation of the input in terms of spectral features

Calculating Hz

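a = number of times the large pattern repeats in the waveform
b = time taken, in seconds
c = number of smaller patterns inside each large pattern
d = number of still smaller patterns inside each of those

a / b = answer1 (frequency of the large pattern, in Hz)
answer1 * c = answer2 (frequency of the smaller pattern, in Hz)
answer2 * d = answerFinal (frequency of the smallest pattern, in Hz)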
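The steps above can be sketched as a small function. The name nested_frequencies is hypothetical, and the counts used here (50 repetitions in 0.25 seconds, then 4 and 2 smaller patterns) are made-up illustrative values rather than measurements from a real recording.

```python
def nested_frequencies(big_cycles, duration_s, *subcycles):
    """Return the frequency in Hz of the large pattern, followed by the
    frequency of each successively smaller pattern nested inside it."""
    freqs = [big_cycles / duration_s]      # a / b
    for count in subcycles:                # then multiply by c, d, ...
        freqs.append(freqs[-1] * count)
    return freqs


# a = 50 repetitions, b = 0.25 s, c = 4, d = 2 (illustrative values only)
answer1, answer2, answer_final = nested_frequencies(50, 0.25, 4, 2)
print(answer1, answer2, answer_final)  # 200.0 800.0 1600.0
```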

Spectrum

A spectrum is a representation of the different frequency components of a wave

Fourier transform – A mathematical procedure which separates out each of the frequency components of a wave

Many speech applications use an LPC (Linear Predictive Coding) spectrum which makes it easier to see where the peaks are

X-Axis – Shows Frequency
Y-Axis – Shows Magnitude
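To make the idea of a spectrum concrete, here is a minimal sketch of a discrete Fourier transform written directly from its definition. It is a plain Fourier spectrum, not the LPC spectrum mentioned above, and it is far slower than the FFT routines real systems use. The function names and the test signal (50 Hz and 120 Hz tones sampled at 1000 Hz) are assumptions made for the example.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate_hz):
    """Naive discrete Fourier transform.

    Returns (frequency_hz, magnitude) pairs for the bins from 0 Hz up to,
    but not including, half the sampling rate.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        spectrum.append((k * sample_rate_hz / n, abs(coeff) / n))
    return spectrum


def strongest_frequencies(spectrum, count):
    """Return the frequencies of the `count` largest-magnitude bins, ascending."""
    ranked = sorted(spectrum, key=lambda pair: pair[1], reverse=True)
    return sorted(freq for freq, _ in ranked[:count])


# A wave made of a strong 50 Hz component plus a weaker 120 Hz component.
sample_rate_hz = 1000
wave = [
    1.0 * math.sin(2 * math.pi * 50 * i / sample_rate_hz)
    + 0.5 * math.sin(2 * math.pi * 120 * i / sample_rate_hz)
    for i in range(200)
]

spectrum = magnitude_spectrum(wave, sample_rate_hz)
print(strongest_frequencies(spectrum, 2))  # [50.0, 120.0]
```

Each pair is one point on the plot: the frequency goes on the x-axis and the magnitude on the y-axis.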

Why is Spectrum Useful?

The use of spectral information is essential to both human and machine speech recognition

Spectrogram

A spectrogram is a way of visualizing how the different frequencies which make up a waveform change over time
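A spectrogram can be approximated by cutting the signal into short frames and computing a spectrum for each one. The sketch below does this with a naive Fourier transform and non-overlapping frames with no window function, which is a simplification; the function names, frame size, and the test signal (a 100 Hz tone followed by a 300 Hz tone) are assumptions for illustration.

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitude of each frequency bin of one frame (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame))) / n
        for k in range(n // 2)
    ]


def spectrogram(samples, frame_size, hop_size):
    """Return one magnitude spectrum per frame, in time order."""
    return [
        frame_spectrum(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, hop_size)
    ]


def dominant_frequencies(spec, frame_size, sample_rate_hz):
    """Frequency in Hz of the strongest bin in each frame."""
    return [
        max(range(len(column)), key=column.__getitem__) * sample_rate_hz / frame_size
        for column in spec
    ]


def tone(frequency_hz, n_samples, sample_rate_hz):
    return [math.sin(2 * math.pi * frequency_hz * i / sample_rate_hz)
            for i in range(n_samples)]


sample_rate_hz = 1000
signal = tone(100, 200, sample_rate_hz) + tone(300, 200, sample_rate_hz)
spec = spectrogram(signal, frame_size=100, hop_size=100)
print(dominant_frequencies(spec, 100, sample_rate_hz))  # [100.0, 100.0, 300.0, 300.0]
```

Plotting each frame's spectrum as a column, with darkness standing for magnitude, gives the familiar spectrogram picture.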

Why Do Different Vowels have Different Spectra?

Formants (the peaks in a vowel's spectrum) are caused by the resonant cavities of the mouth

The oral cavity can be thought of as a filter which selectively passes through some of the harmonics of the vocal cord vibrations.

Moving the tongue creates spaces of different sizes inside the mouth which selectively amplify waves of the appropriate wavelength, hence amplifying different frequency bands.

Feature Extraction

The process begins with the sound wave and ends with a feature vector

An input sound-wave is first digitized
The process of analogue-to-digital conversion has two steps:
- Sampling
- Quantization
A signal is sampled by measuring its amplitude at a particular time
The sampling rate is the number of samples taken per second
In order to measure a wave properly, we must have a minimum of two samples per cycle: one that measures the positive part of the wave and another that measures the negative part
- More than two samples per cycle increases the amplitude accuracy, but fewer than two samples will cause the frequency of the wave to be missed.
- Therefore, the maximum frequency that can be measured is half the sampling rate
- The maximum frequency for a given sampling rate is called the Nyquist frequency
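The sampling limit described above can be checked with a small experiment. In this sketch, a 7 Hz cosine sampled at 10 Hz (above the 5 Hz Nyquist frequency) produces the same samples as a 3 Hz cosine, so the two cannot be told apart after sampling. The quantize helper shows one simple way to store amplitudes as integers, assuming they lie in the range -1.0 to 1.0. All function names and numbers here are assumptions chosen for illustration.

```python
import math


def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be measured at this sampling rate."""
    return sample_rate_hz / 2


def sample_cosine(frequency_hz, sample_rate_hz, n_samples):
    """Sampling: measure the amplitude of a cosine wave at regular times."""
    return [
        math.cos(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


def quantize(samples, bits):
    """Quantization: map amplitudes in [-1.0, 1.0] to signed integers."""
    max_level = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * max_level) for s in samples]


sample_rate_hz = 10
below = sample_cosine(3, sample_rate_hz, 20)   # under the Nyquist frequency
above = sample_cosine(7, sample_rate_hz, 20)   # over the Nyquist frequency

# The 7 Hz wave yields the same samples as the 3 Hz wave.
indistinguishable = all(abs(a - b) < 1e-9 for a, b in zip(below, above))
print(nyquist_frequency(sample_rate_hz), indistinguishable)  # 5.0 True

print(quantize(below, bits=8))
```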

0xDMR