18.2. Procedure

Two basic types of pitch accents (H*L and L*H) and two boundary tones (L% and H%) were investigated for the vowels /a/ and /i/. Additionally, the effect of the vowel's position in the sentence was taken into account. Different renditions of the following sentences were used in the experiment:

(WAV file 38 kB) "weil die Latte auf die Latte klatscht" (because the slat smacks against the slat)
(WAV file 45 kB) "weil die Litze in der Litze klickt" (because the braid clicks in the braid).

Three intonational patterns were generated, each with two pitch accents (the first prenuclear, the second nuclear) and a final boundary tone. The three patterns are sketched in Fig. 33. For type A sentences, data were collected for positions I (H*L) and III (L%); for type B sentences, for positions II (H*L) and III (L%); and for type C sentences, for the vowels in positions I (L*H), II (L*H) and III (H%).

Four subjects (two females and two males, all native speakers of German with no history of speech, voice or hearing disorders; their ages ranged from 24 to 34 and none of them smoked) repeated the sentences in a given order with the appropriate intonational pattern and normal, comfortable voice effort. Every pattern was repeated at least five times. Speakers could repeat a pattern if they felt that they had produced it incorrectly. The target patterns were sketched on the sheets that the subjects read from during the experiment, and the expected intonation was also provided verbally by the supervisor before the start of the recording. The recordings were made in a sound-treated room using a Sony ECM 418 condenser microphone, and the EGG signal was collected simultaneously using a portable Laryngograph Processor. Both channels were recorded directly on a professional DAT recorder with a sampling frequency of 48 kHz and 16-bit resolution.

The recordings were transferred directly (i.e. in digital form) from DAT tape to a workstation (Silicon Graphics Indy), and the sampling frequency was reduced to 16 kHz. The EGG signal was not modified or filtered (apart from the anti-aliasing filtering required for the sampling frequency reduction). For every speaker, the two best renditions of each pitch pattern and each vowel were chosen, and the relevant vowels were extracted.
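The reduction from 48 kHz to 16 kHz amounts to low-pass filtering followed by keeping every third sample. The source does not say which filter was used, so the following is only a minimal sketch under assumptions: the Hamming-windowed sinc filter, its 61 taps, the 7 kHz cutoff, and the function names `lowpass_fir` and `decimate` are all illustrative choices, not the original procedure.

```python
import math

def lowpass_fir(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass filter, normalized to unity DC gain."""
    fc = cutoff_hz / fs_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2        # center tap (num_taps assumed odd)
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def decimate(signal, factor, taps):
    """Apply the filter, computing only every `factor`-th output sample."""
    half = len(taps) // 2
    out = []
    for center in range(0, len(signal), factor):
        acc = 0.0
        for k, h in enumerate(taps):
            i = center + k - half
            if 0 <= i < len(signal):   # samples beyond the edges count as zero
                acc += h * signal[i]
        out.append(acc)
    return out

FS_IN = 48000
FACTOR = 3                             # 48 kHz / 3 = 16 kHz
taps = lowpass_fir(num_taps=61, cutoff_hz=7000, fs_hz=FS_IN)  # assumed design

# 100 ms synthetic test tones (not data from the experiment):
# 200 Hz lies in the pass band; 12 kHz would alias to 4 kHz if left unfiltered.
n_in = FS_IN // 10
low = [math.sin(2 * math.pi * 200 * n / FS_IN) for n in range(n_in)]
high = [math.sin(2 * math.pi * 12000 * n / FS_IN) for n in range(n_in)]
out_low = decimate(low, FACTOR, taps)
out_high = decimate(high, FACTOR, taps)
```

The cutoff is placed below the new Nyquist frequency of 8 kHz so that the filter's transition band ends before components would fold back into the retained band.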

Figure 33. Patterns of different pitch accents used in the experiment with /a/ vowels ("weil die Latte auf die Latte klatscht"). Additionally, the positions of the /a/ vowels are marked. Pattern A: H*L (on a1), L% (on a3). Pattern B: H*L (on a2), L% (on a3). Pattern C: L*H (on a1 or on a2), H% (on a3).

In principle, the full duration of the vowel was analyzed, i.e. an extracted segment starts at the beginning of the stable second formant and ends at the silence of the following stop /t/. The selected EGG segments were described using the method mentioned above, i.e. every pitch period was extracted and characterized by the durations and slopes of six straight-line segments. Full opening and full closing were tagged at 10% and 90% of the peak-to-peak amplitude within every EGG period. Although the raw, unfiltered EGG signal was used, the baseline (the Gx component) of the EGG was rather stable for all vowels, and the segmentation algorithm recognized all pitch periods without any problems. For every straight-line segment, the duration and slope as well as the distance to the original waveform were measured. The peak-to-peak amplitude of the EGG waveform, the Open Quotient (both types, as defined in section 12), the Speed Quotient, the fundamental frequency and the "distribution of F0" (a kind of jitter, measured as the relative difference in pitch period duration between neighboring periods[13]) were also measured for all periods of the given vowel. The results were averaged over the whole vowel duration, and the resulting mean values and standard deviations were statistically processed. Due to the limited number of subjects no data normalization was necessary. As in the experiment on word stress, all processing of the EGG data was done automatically within the ESPS/waves+ environment.
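Two of the per-period measures described here are simple enough to sketch: the 10% and 90% tagging levels, and the relative period-to-period difference in duration. This is an illustration only, not the ESPS/waves+ processing used in the study. The function names and sample values are hypothetical, and normalizing each difference by the earlier of the two periods is an assumption, since the text does not state the normalization.

```python
import statistics

def tagging_levels(period, low_frac=0.10, high_frac=0.90):
    """Return the signal levels at 10% and 90% of the peak-to-peak amplitude."""
    lowest, highest = min(period), max(period)
    peak_to_peak = highest - lowest
    return lowest + low_frac * peak_to_peak, lowest + high_frac * peak_to_peak

def relative_period_differences(durations):
    """Relative change in duration between each pair of neighboring periods.

    Assumption: each difference is normalized by the earlier period.
    """
    return [abs(b - a) / a for a, b in zip(durations, durations[1:])]

# Hypothetical values, for illustration only
egg_period = [-2.0, 0.0, 6.0, 1.0]   # samples of one EGG period
period_ms = [5.0, 5.1, 4.9, 5.0]     # pitch period durations in ms

level_10, level_90 = tagging_levels(egg_period)
jitter = relative_period_differences(period_ms)

# Averaging over the vowel, as done for all measures in the text
mean_jitter = statistics.mean(jitter)
sd_jitter = statistics.stdev(jitter)
```

Which of the two levels marks full opening and which marks full closing depends on the polarity of the EGG signal, so the sketch returns the levels without labeling them.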

It should be noted that visual inspection reveals changes in the shape of the EGG signal over the course of vowel articulation. In particular, the last period before the stop closure has a more sinusoidal form and a smaller amplitude than the preceding ones.

13. The accuracy of pitch period determination also depends on the sampling frequency. Here the signal was sampled at 16 kHz, so for a female voice with F0 = 200 Hz the accuracy is 1.25% (one sample of 62.5 µs in a 5 ms period).
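The figure in this note follows from the number of samples per pitch period: one sample is the smallest measurable change in period duration. A short sketch of that arithmetic, with `period_resolution` as a hypothetical helper name:

```python
def period_resolution(fs_hz, f0_hz):
    """Relative timing resolution: one sample as a fraction of one pitch period."""
    samples_per_period = fs_hz / f0_hz
    return 1.0 / samples_per_period

# Values from the note: 16 kHz sampling, female voice at F0 = 200 Hz
resolution = period_resolution(16000, 200)   # 80 samples per period
print(f"{resolution:.2%}")                   # prints "1.25%"
```

Since the resolution equals F0 divided by the sampling frequency, it becomes coarser as F0 rises and finer as the sampling frequency rises.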