7. Objective measurements of voice quality

The behavior of phonation has been a subject of multidisciplinary scientific investigation. The exact details of the voicing mechanism are still unknown, and a general picture of voicing cannot yet be given. Current investigations deal with the following questions:

1. How is linguistic information (especially intonation and stress) coded in voice quality?

2. How can voice quality be measured objectively and quantitatively?

3. How do pathological voices differ?

4. How can glottal activity be measured without influencing articulation?

5. How adequate are the models of vocal fold movements for the description of real phonation?

The established relationships between a produced acoustical signal and the voice source are complex and, since we are only able to observe the behavior of voicing indirectly, prone to error.

In the following sections the term "voice quality" will be understood in its narrower meaning, i.e. limited to the phonation process.

Typically, experiments on voice quality involve many measurements made with different methods and techniques, which makes scoring multi-dimensional and complex. Another important issue in the investigation of the voice source is the selection of the stimuli used for linguistic experiments. They should be selected with particular care so that articulatory behavior stays controlled (unchanged) as far as possible. As has been shown, glottal activity varies not only with linguistic factors but may also depend on additional factors such as psychological stress and emotions, which makes the repeatability of measurements questionable. Nevertheless, careful experimental technique, careful examination of the results and concentration on only the main aspects have led to significant results. Thus, it is reasonable to assume that the methods developed in this study will enable us to find answers to the questions formulated above. It should be noted, however, that some voice features were not taken into consideration in the described experiments.

7.1. Parametrization of the glottal flow.

The principal features of the glottal flow signal can be described by a number of parameters. The most important parameters for perceived voice quality are given in Fig. 13.

The following parameters are used to describe the glottal volume velocity (Ug) waveform:

Figure 13. Schematic description of a glottal waveform Ug and its time derivative (after: Hanson, 1996:11; Sluijter, 1995:97). The following abbreviations are used: T0 - duration of the pitch period, t1 - onset of the airflow, t2 - instant of the maximum glottal flow (amplitude AV) through the glottis, t3 - moment of glottal closure and maximum change of glottal flow, t4 - instant of complete glottal closure.

                               

The glottal flow signal may have a DC offset, for example in the case of incomplete closure, when airflow bypasses the vocal folds (also called residual flow). The pitch period is denoted T0 (the reciprocal of the fundamental frequency F0). The glottal airflow increases between t1 and t2 and decreases again between t2 and t4 due to the opening and closing motions of the vocal folds. The amplitude of the modulated glottal airflow is given by the AC component (denoted AV in Fig. 13), while the sum of the DC and AC components determines the maximum overall airflow through the glottis. The instant of the steepest negative slope of the flow, i.e. the negative peak of the flow derivative (denoted t3), corresponds to the instant of the main vocal tract excitation. According to the source-filter theory of speech production, lip radiation is approximated by a differentiation of the flow signal. Thus, the intensity of the produced acoustic wave depends on the derivative of the glottal flow rather than on the amplitude of the flow itself, i.e. the derivative is the effective excitation of the vocal tract (EE) (Fant, 1982; Hanson, 1995:12).
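The quantities named above can be read off a sampled flow period, as the following minimal sketch suggests. The pulse shape, the function names (synth_flow, describe) and all numeric values are assumptions made for illustration only; they are not measured data and not a method taken from this study. The derivative is approximated by a simple forward difference.

```python
import math

def synth_flow(n=200, t1=0.0, t2=0.5, t4=0.7, dc=0.1, av=1.0):
    """Hypothetical flow pulse over one period (time normalised so that T0 = 1)."""
    flow = []
    for i in range(n):
        t = i / n
        if t1 <= t < t2:
            # Opening phase: smooth rise from 0 to 1 (half raised cosine).
            u = 0.5 * (1.0 - math.cos(math.pi * (t - t1) / (t2 - t1)))
        elif t2 <= t < t4:
            # Closing phase: fall from 1 to 0, steepest just before closure.
            u = math.cos(0.5 * math.pi * (t - t2) / (t4 - t2))
        else:
            u = 0.0  # closed phase: only the DC (residual) flow remains
        flow.append(dc + av * u)
    return flow

def describe(flow, period=1.0):
    """Estimate DC offset, AC amplitude (AV), peak flow, t3 and EE from one period."""
    n = len(flow)
    dt = period / n
    dc = min(flow)                    # residual flow during the closed phase
    av = max(flow) - dc               # amplitude of the modulated flow
    # Forward-difference derivative; the period is treated as cyclic.
    deriv = [(flow[(i + 1) % n] - flow[i]) / dt for i in range(n)]
    i_exc = min(range(n), key=lambda i: deriv[i])   # negative peak of derivative
    return {
        "dc": dc,
        "av": av,
        "peak_flow": dc + av,
        "t3": i_exc * dt,             # instant of main excitation
        "ee": -deriv[i_exc],          # magnitude of the negative derivative peak
    }

result = describe(synth_flow())
print(result)
```

In this toy pulse the negative derivative peak falls just before t4, which mimics an abrupt closure; a real waveform with a gradual return phase would place t3 earlier relative to t4.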

Two important descriptors of the glottal flow are the Open Quotient (OQ), which is the ratio of the time during which the vocal folds are open to the duration of the whole pitch period, (t4-t1)/T0, and the Speed Quotient (also called skewness or rk), which is the ratio of the rise time to the fall time of the glottal flow, (t2-t1)/(t4-t2).
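As a small worked example, both quotients follow directly from the time points of Fig. 13. The function names and the time values below are hypothetical and serve only to illustrate the two ratios.

```python
def open_quotient(t1, t4, period):
    """OQ = (t4 - t1) / T0: fraction of the pitch period with an open glottis."""
    if period <= 0 or not (t1 < t4 <= t1 + period):
        raise ValueError("expected t1 < t4 <= t1 + T0 and T0 > 0")
    return (t4 - t1) / period

def speed_quotient(t1, t2, t4):
    """SQ = (t2 - t1) / (t4 - t2): rise time divided by fall time of the flow."""
    if not (t1 < t2 < t4):
        raise ValueError("expected t1 < t2 < t4")
    return (t2 - t1) / (t4 - t2)

# Hypothetical time points in milliseconds (T0 = 8 ms, i.e. F0 = 125 Hz).
T0, t1, t2, t4 = 8.0, 0.0, 3.0, 5.0

oq = open_quotient(t1, t4, T0)    # 5.0 / 8.0 = 0.625
sq = speed_quotient(t1, t2, t4)   # 3.0 / 2.0 = 1.5
print(f"OQ = {oq}, SQ = {sq}")
```

With these assumed values the glottis is open for 62.5% of the period, and SQ > 1 indicates that the flow rises more slowly than it falls.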

The Open Quotient indicates the duty ratio of the glottal airflow. A change of the duty ratio substantially changes the spectrum of the excitation; the duty ratio is also closely tied to physiological constraints, as is the case in the different phonation types.
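The dependence of the spectrum on the duty ratio can be illustrated with a deliberately idealised case: a rectangular pulse train, whose harmonic amplitudes have a closed form. This is an assumption made for illustration; a real glottal pulse is smooth and asymmetric, so only the qualitative trend carries over. The function names are hypothetical.

```python
import math

def harmonic_amplitude(n, duty):
    """One-sided amplitude of harmonic n for a unit-height rectangular pulse train.

    Fourier series result: A_n = 2 * |sin(pi * n * duty)| / (pi * n).
    """
    if n < 1:
        raise ValueError("harmonic number must be >= 1")
    if not 0.0 < duty < 1.0:
        raise ValueError("duty ratio must lie strictly between 0 and 1")
    return 2.0 * abs(math.sin(math.pi * n * duty)) / (math.pi * n)

def spectrum_db(duty, n_harmonics=6):
    """Harmonic levels in dB relative to the first harmonic."""
    ref = harmonic_amplitude(1, duty)
    levels = []
    for n in range(1, n_harmonics + 1):
        a = harmonic_amplitude(n, duty)
        # Amplitudes that vanish up to rounding error are reported as -inf.
        levels.append(20.0 * math.log10(a / ref) if a > 1e-12 else float("-inf"))
    return levels

if __name__ == "__main__":
    for duty in (0.3, 0.5, 0.7):
        row = " ".join(f"{level:7.1f}" for level in spectrum_db(duty))
        print(f"duty {duty}: {row}")
```

For a duty ratio of 0.5 the even harmonics vanish, whereas other duty ratios redistribute energy among the harmonics; this is the sense in which the duty ratio shapes the excitation spectrum in this simplified picture.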

The Speed Quotient reflects the asymmetry of the glottal pulse. The glottal airflow is usually skewed to the right, which means that the decrease of the airflow is faster than its increase (Rothenberg, 1981; Ananthapadmanabha, 1984; Titze, 1988).

The values of the parameters can be directly related to the phonation types described in the previous sections. For example, due to incomplete glottal closure, a larger open quotient and DC flow are typical of breathy voice, while for pressed voice a reduced open quotient and reduced airflow through the glottis are to be expected. The abruptness of the closure as well as the skewness of the glottal pulse are of special importance to the spectral properties of a produced sound. The effects of varying glottal flow quotients on the glottal flow spectra will be described in detail in section 7.4.

A more detailed description of the glottal signal is obtained by matching the measured glottal waveform with a theoretical model of the voice source. Among the numerous models proposed in the literature (see Ní Chasaide & Gobl, 1997:435 for a survey), the LF (Liljencrants-Fant) model of differentiated glottal flow is widely used (Fant et al., 1985, 1995). In this model the waveform is described by a set of mathematical functions, each of which models a given segment of the waveform. In the LF model, the derivative of the volume velocity is modelled by the following set of functions:

                                                        (3)

The following parameters are used in eq.(3):

Finally, it is assumed that the area below the modelled curve for the t1..t2 segment is equal to the area of the t2..T0 part of the waveform.
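For orientation, the LF model is commonly stated in the literature in the form below. The notation is the conventional LF notation, not necessarily that of eq. (3), and its correspondence to Fig. 13 is an assumption: the time origin is placed at t1, t_p corresponds to t2 (flow maximum), t_e to t3 (main excitation) and t_c to t4 (complete closure). E_e denotes the magnitude of the negative derivative peak and T_a the effective duration of the return phase.

```latex
\begin{align*}
E(t) &= \frac{dU_g}{dt} = E_0\, e^{\alpha t} \sin(\omega_g t),
  & 0 \le t \le t_e, \\
E(t) &= -\frac{E_e}{\varepsilon T_a}
  \left[ e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)} \right],
  & t_e < t \le t_c,
\end{align*}
\text{with}\quad \omega_g = \frac{\pi}{t_p}, \qquad
\varepsilon T_a = 1 - e^{-\varepsilon (t_c - t_e)}, \qquad
\int_0^{T_0} E(t)\, dt = 0 .
```

The last condition expresses the area balance stated above: since E(t) is positive up to the flow maximum and negative afterwards, a zero integral over the period means that the two areas are equal, so the flow returns to its starting level at the end of each period.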

The Open and Speed Quotients are also represented in the LF model.

7.2. Methods for the measurement of glottal activity.

The following section contains a survey of the various methods for the measurement of glottal activity as well as the results of these methods when applied to the signals of various voice qualities. The relations between parameters defined above are also described.

Numerous methods of analysis and observation of the laryngeal functions during speech have been developed in recent years. Phonation comprises mechanical and aerodynamic phenomena. Direct observation of vocal fold movements is possible only visually, whereas in practice the aerodynamic excitation signal can be examined only indirectly. Visual inspection enables the researcher to measure the dynamic (and static) parameters of phonation; glottal flow must be measured either indirectly or with additional sensors.

These methods can be classified as:
