Speed Quotient & other

10.3. The Speed Quotient

In acoustics the Speed Quotient reflects the asymmetry of the glottal pulse. In the EGG domain, the increase and decrease of the signal amplitude reflect the varying contact area of the vocal folds, but have almost nothing in common with the glottal airflow. However, for normal voices the contact increase during glottal closing is clearly steeper than the contact decrease during opening. This causes the EGG pulses to be strongly skewed to the left, in the opposite direction to the glottal airflow pulse. Comparing the changes in the contact area with those in the glottal area (obtained by PGG or high-speed filming) during the pitch period, it can be stated that the glottal area is more symmetrical during increase/decrease than its EGG counterpart (compare Childers & Krishnamurthy, 1985). The glottal airflow is also more skewed than the area function, which is an effect of the coupling between the voice source and the vocal tract (Childers & Wong, 1994), as well as of the vocal tract constriction (Bickley & Stevens, 1986).

The definition of the Speed Quotient involves the peak glottal flow, which has no counterpart in the EGG signal. However, the durations of the opening and closing of the vocal folds may very well be compared. Again, it should not be expected that these results are similar to the results of the airflow measurements, but they may be suitable for the characterization of phonation phenomena. The studies by Roach and Hardcastle (1979) and later by Esling (1984) demonstrate the usefulness of such a measure. In the first study the skewness is defined (in terms of Fig. 22) as:

In the second study only 80% of the signal peak amplitude is measured, from 10% above the baseline to 10% below the signal peak, in order to compensate for the slight changes in electrical capacitance of the speaker's throat and for attenuation at the peaks (as shown in section 8). This leads to the following definition of skewness (in terms of Fig. 22):

Esling (1984) claims that skewness is highly dependent on the phonation type. He investigates modal, creaky, breathy, whispery, ventricular (produced with the false folds) and falsetto voices. EGG signals were registered for all voice qualities for a sustained phonation of /a:/ without pitch control and for the utterance /bi:d/ on four different levels of pitch. For the calculation of the EGG skewness a minimum of 0.5 s of each recording was used. These computations were done manually. Averaged values for pitch and skewness were compared for the respective phonation types. Skewness exhibited the smallest values for creaky voices (11%) and increased continuously from modal to ventricular to harsh voices. Skewness doubles (to about 40%) for whispery and breathy voices. For falsetto it is 60%, showing an almost symmetrical pulse shape. Esling (ibid.) explains these patterns (by comparing the results to high-speed films of laryngeal structures) through changes in the anterior-posterior vocal fold stretching during pitch increase, and through the increase in the lateral stricture during changes in glottal openness. He also suggests that not only the skewness of the EGG pulses but also the rise and fall durations depend on phonation type.
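The 10%-to-90% measurement window and the rise and fall durations mentioned above can be illustrated with a short sketch. It is not the procedure of either study (Esling's computations were done manually); it assumes that a single EGG pulse has already been cut out, that increasing contact is plotted upward, and that the baseline is the pulse minimum. The names `rise_fall_durations`, `lower` and `upper` are hypothetical. The sketch returns only the two durations; how they are combined into a skewness value follows the definitions given in terms of Fig. 22.

```python
def _crossing(samples, indices, level, rising):
    """Linearly interpolated sample position where the signal crosses `level`."""
    for i in indices:
        a, b = samples[i], samples[i + 1]
        if (rising and a < level <= b) or (not rising and a > level >= b):
            return i + (level - a) / (b - a)
    raise ValueError("level is not crossed in the given range")


def rise_fall_durations(pulse, fs, lower=0.10, upper=0.90):
    """Rise and fall durations (in seconds) of one EGG pulse.

    Assumptions (not from the source): `pulse` holds exactly one period,
    contact increase is plotted upward, and the baseline is min(pulse).
    Only the part between `lower` and `upper` of the peak amplitude is used.
    """
    base = min(pulse)
    peak_idx = max(range(len(pulse)), key=pulse.__getitem__)
    amplitude = pulse[peak_idx] - base
    lo = base + lower * amplitude   # 10% above the baseline
    hi = base + upper * amplitude   # 10% below the peak

    before = range(0, peak_idx)               # rising flank
    after = range(peak_idx, len(pulse) - 1)   # falling flank

    rise = _crossing(pulse, before, hi, True) - _crossing(pulse, before, lo, True)
    fall = _crossing(pulse, after, lo, False) - _crossing(pulse, after, hi, False)
    return rise / fs, fall / fs


# Synthetic left-skewed pulse: fast contact increase, slow contact decrease.
pulse = [i / 10 for i in range(11)] + [1 - j / 40 for j in range(1, 41)]
rise_s, fall_s = rise_fall_durations(pulse, fs=1000)
```

For this synthetic triangle the rise between the two levels is much shorter than the fall, which is the left-skewed shape described for normal voices.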

10.4. Other parametrizations of the EGG signal

Various efforts have been made to find a way to represent voice variation in the EGG domain. An extensive set of parameters was proposed by Houben et al. (1992). They describe the shape of the EGG in terms of the features of an average waveform. The EGG waveform was preconditioned (FFT bandpass filtering with a 55 to 4000 Hz band), and recordings of ca. 0.3 s were used for further analysis. The maximum of the differentiated EGG signal marks the start of a period, and the pulses are superimposed (using this period start as a reference point) to obtain an averaged, typical pulse. During this process of averaging, the jitter, shimmer and shape factors are extracted, the shape factor being defined as the minimum of the squared amplitude differences within a period. The shape factor relates to the "smoothness" of the waveform: it is higher for a higher noise level in the signal. The description of the shape is also obtained from the averaged waveform. The crest factor (the ratio of the peak value to the root mean square value computed for the whole period) characterizes the peakedness of the Lx pulse. After the amplitude and duration normalization of the average pulse (i.e. the data are scaled to the 0..1 interval), the irregularities of the rising flanks are compared within the differentiated EGG signal by comparing the area of the dips with the entire area. Additional parameters include the Close Quotient as well as the Speed Quotient.

The measurements were conducted to establish statistical representations of repeatability and variability for a speaker's chest register. Seven normal and 12 pathological subjects (mostly women) produced a sustained /a:/ in different settings of pitch/intensity. This reproducibility experiment showed that at least five measurements per intensity/frequency pair are needed to get a reasonable estimate of the EGG parameters, whereas the measurements for differing pitch/intensity show great subject variability.
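The averaging step and the crest factor described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Houben et al. (1992): the rule for picking one derivative maximum per period (a relative threshold, `rel_threshold`), the linear resampling used for duration normalization, and all function names are hypothetical. The text does not state whether the crest factor is taken before or after amplitude normalization, so the sketch leaves that choice to the caller.

```python
import math


def period_starts(egg, rel_threshold=0.5):
    """Indices of the steepest rise in each period.

    Assumption: within each run of first differences above
    rel_threshold * max(diff), the largest difference marks a period start.
    """
    diff = [b - a for a, b in zip(egg, egg[1:])]
    threshold = rel_threshold * max(diff)
    starts, run = [], []
    for i, d in enumerate(diff):
        if d >= threshold:
            run.append(i)
        elif run:
            starts.append(max(run, key=diff.__getitem__))
            run = []
    if run:
        starts.append(max(run, key=diff.__getitem__))
    return starts


def resample(segment, n_points):
    """Duration normalization: linear interpolation onto n_points samples."""
    out = []
    for k in range(n_points):
        pos = k * (len(segment) - 1) / (n_points - 1)
        i = min(int(pos), len(segment) - 2)
        frac = pos - i
        out.append(segment[i] * (1 - frac) + segment[i + 1] * frac)
    return out


def average_pulse(egg, n_points=100):
    """Superimpose all complete periods and return their pointwise mean."""
    starts = period_starts(egg)
    periods = [resample(egg[a:b], n_points) for a, b in zip(starts, starts[1:])]
    if not periods:
        raise ValueError("fewer than two period starts found")
    return [sum(column) / len(periods) for column in zip(*periods)]


def normalize(pulse):
    """Amplitude normalization to the 0..1 interval."""
    lo, hi = min(pulse), max(pulse)
    return [(x - lo) / (hi - lo) for x in pulse]


def crest_factor(pulse):
    """Peak value divided by the root mean square over the whole period."""
    rms = math.sqrt(sum(x * x for x in pulse) / len(pulse))
    return max(abs(x) for x in pulse) / rms


# Synthetic periodic signal: steep rise, slow linear fall, five periods.
template = [0.0, 0.1, 0.6, 0.9, 1.0] + [1.0 - j / 20 for j in range(1, 20)]
signal = template * 5
typical = normalize(average_pulse(signal, n_points=24))
```

Because each period is resampled to the same number of points before averaging, periods of slightly different length can still be superimposed, which is what the duration normalization in the text is meant to achieve.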

The results of the EGG parametrizations lead to the conclusion that the proposed parameters can be used to describe voice quality. The relation between voice quality and the EGG waveform will be presented in the following section.