IV The EGG and pathological voice quality

The next experiment uses the electroglottographic (EGG) signal to distinguish voice qualities that occur in the paralinguistic context of communication. Its goal is to differentiate pathological voice qualities at the phonation level using only the EGG waveform. As shown in section 11, differentiating various phonation types for the same speaker is feasible. The more challenging task is to find correlates not only of voice quality but also of the laryngeal abnormalities underlying the changes in voice quality. Moreover, the labelling of pathological voices, even with inter-subjective voice scales (such as the GRBAS or GRB scales), is based on unverified perceptual judgements (see section 4) and is therefore not objective. Thus, the results of the present experiment are expected to establish a phonetically verifiable relation between the auditory classification of voices and the physiological constraints which can be derived from the EGG signal.

An additional effort is made to find the most suitable parameters for voice classification. The statistical evaluation of the data is complemented by the results of a rough sets machine learning classification.

Some important measures of pathological voice quality were presented in sections 4 and 7.6, while the electroglottographic parameters related to the paralinguistic layer of communication were first described in section 13. Based on the literature, the following conclusions about the use of the EGG in the study of pathological voice qualities can be drawn:

  1. The EGG waveform of pathological voice is often distorted compared to normal voice (Motta et al., 1990).
  2. The EGG waveform exhibits great variation between periods.
  3. The values of the basic quotients (Open or Speed Quotient) differ from the normal values (Esling, 1984; Dejonckere & Lebacq, 1985; Houben et al., 1992) (see Table 16).
  4. The detection of pitch periods may be complicated by the unstable shape of the waveform as well as by the weak maximum of the first derivative (required for the segmentation of the glottal waveform proposed in section 12). However, proper signal conditioning (i.e. application of various signal processing techniques) can help in the recognition of the closure instant (Vieira et al., 1996).
  5. The waveform often exhibits additional features that are characteristic of certain diseases, for example notches on the rising contact slope in the case of nodules of the vocal folds (Motta et al., 1990).
  6. The duration of certain phases of vocal fold contact differs from that of normal subjects (Hanson et al., 1988).
  7. A deficiency of closure is observed in the EGG waveform (Wechsler, 1976). (see p.1)
  8. The amplitude of the EGG is almost always weaker for pathological voices than for normal voices. Broadband noise is often superimposed on the recordings (Colton & Conture, 1990).
  9. The amplitude perturbation of the EGG correlates well with perceptual judgements of voice hoarseness (Haji et al., 1986).
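
Items 4 and 9 can be illustrated with a minimal sketch. It assumes that closure instants are taken as local maxima of the first difference of the EGG signal and that amplitude perturbation is the mean absolute difference between the peak-to-peak amplitudes of consecutive periods, divided by the mean amplitude. The function names, the relative threshold and the exact perturbation formula are illustrative assumptions, not the definitions used by Vieira et al. (1996) or Haji et al. (1986).

```python
def closure_instants(egg, rel_threshold=0.5):
    """Return sample indices i where the first difference egg[i+1] - egg[i]
    has a local maximum of at least rel_threshold times its global maximum.
    The threshold value is an assumption chosen for illustration."""
    if len(egg) < 3:
        raise ValueError("signal too short")
    diff = [egg[i + 1] - egg[i] for i in range(len(egg) - 1)]
    limit = rel_threshold * max(diff)
    return [
        i
        for i in range(1, len(diff) - 1)
        if diff[i] >= limit and diff[i] > diff[i - 1] and diff[i] >= diff[i + 1]
    ]


def amplitude_perturbation(egg, instants):
    """Mean absolute difference between the peak-to-peak amplitudes of
    consecutive periods, relative to the mean amplitude (assumed formula)."""
    amps = [max(egg[a:b]) - min(egg[a:b]) for a, b in zip(instants, instants[1:])]
    if len(amps) < 2:
        raise ValueError("at least two complete periods are required")
    diffs = [abs(x - y) for x, y in zip(amps, amps[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amps) / len(amps))
```

With a weak or noisy derivative maximum, as described in item 4, a fixed relative threshold of this kind is likely to miss or duplicate closure instants, which is why signal conditioning is usually applied first.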

Additional remarks concerning the processing of pathological voices in the EGG domain can be summarized as follows:

  10. A common method of EGG evaluation is to average the waveform shape over all recorded periods and to compare the resulting shapes (corresponding to different subjects) with each other (Motta et al., 1990; Houben et al., 1992).
  11. It is difficult to obtain high-quality recordings of patients, because they are often nervous, find it difficult to sit still, and are therefore hard to keep in the necessary stable position (Colton & Conture, 1990).
  12. The identification of the underlying pathology is rarely done automatically, as visual inspection of the waveform is still often necessary.
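
The averaging procedure mentioned in item 10 can be sketched as follows. The sketch assumes that the signal has already been cut into individual periods, that each period is resampled to a fixed number of points by linear interpolation, and that amplitudes are scaled to the range 0 to 1 before averaging. These choices and all names are illustrative assumptions; the cited studies may normalise differently.

```python
def resample(period, n_points):
    """Linearly interpolate one period onto n_points equally spaced samples."""
    if len(period) < 2 or n_points < 2:
        raise ValueError("need at least two samples and two output points")
    step = (len(period) - 1) / (n_points - 1)
    out = []
    for k in range(n_points):
        pos = k * step
        i = min(int(pos), len(period) - 2)  # keep i + 1 inside the period
        frac = pos - i
        out.append(period[i] * (1.0 - frac) + period[i + 1] * frac)
    return out


def normalise(period):
    """Scale one period so that its minimum is 0 and its maximum is 1."""
    lo, hi = min(period), max(period)
    if hi == lo:
        raise ValueError("flat period cannot be normalised")
    return [(x - lo) / (hi - lo) for x in period]


def average_shape(periods, n_points=50):
    """Average the time- and amplitude-normalised shapes of all periods."""
    shapes = [normalise(resample(p, n_points)) for p in periods]
    return [sum(column) / len(shapes) for column in zip(*shapes)]
```

Averaging in this way suppresses period-to-period variation, so it describes the typical shape of a subject's waveform but hides exactly the irregularity noted in item 2.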

In a pilot study (Marasek, 1995b) different voice qualities (modal, breathy, creaky and asthenic Parkinson patients' voices) were identified using only EGG waveform parametrization.

In that study, a set of 25 parameters was used to describe the waveform, including some of the parameters also used in the present study. The classification of 16 subjects into 4 classes was successful (all cases were properly classified). Several repetitions of two-syllable words containing stop-vowel transition sequences were used as stimuli (/pane/, /tane/). The recordings were very short (ca. 10 s) and were originally obtained for a purpose other than the classification of voice quality. Hence, the study was limited, and an auditory classification of the voice qualities was hardly possible. The most prominent factors in the voice classification were the distances between the spectra of the EGG waveforms, the kurtosis of the F0 distribution, the relative durations of the contact and open phases, and the stability of the signal during the rising contact phase (computed as the distance between the original waveform and a straight-line model).
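
Two of these factors can be illustrated with a minimal sketch. The pilot study's exact definitions are not given here, so the sketch assumes the population form of excess kurtosis for the F0 values and, for the stability measure, the root-mean-square distance between the rising contact segment and the straight line joining its first and last samples. Both definitions and all names are assumptions made for illustration.

```python
import math


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / m2 ** 2 - 3.0


def line_model_distance(segment):
    """RMS distance between a waveform segment and the straight line that
    joins its first and last samples (assumed form of the line model)."""
    n = len(segment)
    if n < 2:
        raise ValueError("need at least two samples")
    slope = (segment[-1] - segment[0]) / (n - 1)
    squared = [(segment[i] - (segment[0] + slope * i)) ** 2 for i in range(n)]
    return math.sqrt(sum(squared) / n)
```

A perfectly linear rising slope gives a distance of zero, while notches or other irregularities on the slope increase it.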

The experiment showed, however, that some modifications of the procedures are required in further research.