21.2. Results of the automated data classification
The data was classified using a degree of roughness of 0.3 (a fairly detailed
description of cases) and a rule precision threshold of 0.5 (rather general
rules). The rules generated by the rough sets system and the attribute
strength report are given in Tables 14 and 15.
Please note that the ranges of parameter values given in Table 14 are
interpretable only for this particular experiment and are presented here
only for completeness.
The correctness of the data classification was additionally tested using the
leave-one-out method. As mentioned above, the verification of the results is
possible only for non-rough models. The error rate of the leave-one-out
verification of the data classification for the precise model is about 8%,
which is a good result that demonstrates both the high predictive capability
of the parameters used and the strong dependency patterns (Reduct System, 1993).
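The leave-one-out procedure mentioned above can be sketched as follows. This is a minimal illustration only: the nearest-neighbour learner is a stand-in chosen for brevity, not the rough sets system used in the experiment, and the toy cases, labels and function names are all invented for the example.

```python
from typing import Callable, List, Sequence, Tuple

Case = Tuple[float, ...]
Predictor = Callable[[Case], str]


def leave_one_out_error(
    cases: List[Case],
    labels: List[str],
    train: Callable[[Sequence[Case], Sequence[str]], Predictor],
) -> float:
    """Return the fraction of cases misclassified when each is held out in turn."""
    errors = 0
    for i, (held_out, truth) in enumerate(zip(cases, labels)):
        # Train on every case except the i-th one, then classify the i-th.
        predict = train(cases[:i] + cases[i + 1:], labels[:i] + labels[i + 1:])
        if predict(held_out) != truth:
            errors += 1
    return errors / len(cases)


def train_nearest_neighbour(xs: Sequence[Case], ys: Sequence[str]) -> Predictor:
    """Stand-in learner (NOT the rough sets system): 1-nearest neighbour."""
    def predict(case: Case) -> str:
        nearest = min(
            range(len(xs)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xs[j], case)),
        )
        return ys[nearest]
    return predict


# Invented toy data: two features per case, two classes.
cases = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
         (5.0, 5.0), (5.1, 4.8), (4.9, 5.2), (3.2, 3.2)]
labels = ["a", "a", "a", "b", "b", "b", "a"]

error_rate = leave_one_out_error(cases, labels, train_nearest_neighbour)
print(f"leave-one-out error rate: {error_rate:.1%}")
```

In this toy set only the last case is misclassified when held out, so the estimate is one error in seven cases. The same loop applies to any learner that can be retrained on the reduced data set.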
Table 14: Results of the rule generation using the rough sets method.
System setup: Roughness = 0.30, Rule Precision Threshold = 0.50.
Abbreviations: f0std - standard deviation of F0 [Hz], ectm - relative
duration of the end of closing [%], sctm - relative duration of the start of
closing [%], otm - relative duration of open phase [%], eotm - relative
duration of the end of opening [%], soam - start of opening slope, ecam - end
of closing slope, peakm - peak-to-peak amplitude, opvm - variation of the
open phase.
| decision | rule |
|---|---|
| group==> control | [15.70 < sotm <= 24.10] & [13.24 < otm <= 21.52] & [604.80 < ecam <= 1717.40] |
| | OR [15.70 < sotm <= 24.10] & [13.24 < otm <= 25.65] & [882.95 < ecam <= 1717.40] |
| group==> breathy | [scam > 120.01] & [21.52 < otm <= 38.06] & [ecam > 1161.10] & [133.64 < f0m <= 223.94] |
| | OR [scam > 120.01] & [ecam > 1161.10] & [133.64 < f0m <= 205.88] |
| | OR [21.52 < otm <= 38.06] & [1252.50 < opvm <= 2058.50] & [205.88 < f0m <= 223.94] |
| | OR [scam > 120.01] & [otm <= 21.52 or otm > 38.06] & [ecam > 1161.10] & [1252.50 < opvm <= 2058.50] & [f0m <= 205.88 or f0m > 223.94] |
| | OR [scam <= 120.01] & [21.52 < otm <= 38.06] & [205.88 < f0m <= 223.94] |
| group==> rp-with-comp | [f0std <= 9.87 or f0std > 72.35] & [(3.87 < ectm <= 5.76) or (7.66 < ectm <= 9.55)] & [sctm <= 17.08 or sctm > 43.80] |
| | OR [f0std <= 9.87 or f0std > 72.35] & [3.87 < ectm <= 9.55] & [sctm <= 17.08 or sctm > 43.80] & [4.97 < otm <= 25.65] |
| | OR [eotm <= 16.82] & [25.65 < otm <= 33.93] |
| | OR [eotm <= 16.82] & [3.87 < ectm <= 5.76] |
| | OR [f0std <= 9.87 or f0std > 72.35] & [eotm > 16.82] & [17.08 < sctm <= 30.44] & [otm <= 4.97 or otm > 33.93] |
| | OR [3.87 < ectm <= 9.55] & [30.44 < sctm <= 43.80] & [otm <= 4.97 or otm > 25.65] |
| group==> rp-without-c | [soam > -185.44] & [79.46 < f0m <= 115.58] |
| | OR [ecam <= 326.65] & [soam > -185.44] & [8115.00 < peakm <= 19710.00] & [f0m > 115.58] |
| | OR [ecam <= 326.65] & [18.79 < f0std <= 54.50] & [peakm <= 8115.00 or peakm > 19710.00] |
| | OR [18.79 < f0std <= 54.50] & [ectm <= 7.66 or ectm > 17.12] |
| | OR [ecam > 326.65] & [7.66 < ectm <= 17.12] & [8115.00 < peakm <= 19710.00] |
| group==> chordectomy | [ctm <= 7.80 or ctm > 12.03] & [9.87 < f0std <= 72.35] & [sotm <= 11.51] |
| | OR [ecam > 326.65] & [ctm <= 7.80 or ctm > 12.03] & [9.87 < f0std <= 81.27] |
| | OR [otm <= 8.90] & [7.80 < ctm <= 12.03] |
| | OR [otm <= 8.90] & [f0std <= 9.87 or f0std > 81.27] |
| | OR [otm <= 8.90] & [sotm > 15.70] |
| | OR [ecam <= 326.65] & [7.80 < ctm <= 12.03] & [9.87 < f0std <= 81.27] & [sotm > 11.51] |
| | OR [ecam <= 326.65] & [ctm <= 7.80 or ctm > 12.03] & [f0std <= 9.87 or f0std > 72.35] & [sotm > 11.51] |
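To make the rule notation concrete, the sketch below shows one possible way to evaluate such rules: each rule is a conjunction of conditions, each condition is a union of half-open intervals (lo, hi], and a group matches when any of its rules fires. The data layout and function names are assumptions made for illustration and do not reflect how the rough sets system stores its rules; the measured cases are invented.

```python
import math

# The two control rules from Table 14. Each condition maps an attribute to a
# list of half-open intervals (lo, hi], mirroring the "lo < x <= hi" notation.
CONTROL_RULES = [
    {"sotm": [(15.70, 24.10)], "otm": [(13.24, 21.52)], "ecam": [(604.80, 1717.40)]},
    {"sotm": [(15.70, 24.10)], "otm": [(13.24, 25.65)], "ecam": [(882.95, 1717.40)]},
]

# A chordectomy rule with an "or" condition:
# [otm <= 8.90] & [f0std <= 9.87 or f0std > 81.27]
CHORDECTOMY_RULE = {
    "otm": [(-math.inf, 8.90)],
    "f0std": [(-math.inf, 9.87), (81.27, math.inf)],
}


def condition_holds(value, intervals):
    """True if the value lies in at least one (lo, hi] interval."""
    return any(lo < value <= hi for lo, hi in intervals)


def rule_fires(rule, case):
    """A rule fires only if every one of its conditions holds (logical AND)."""
    return all(condition_holds(case[attr], ivs) for attr, ivs in rule.items())


def group_matches(rules, case):
    """A group matches if any of its rules fires (logical OR)."""
    return any(rule_fires(rule, case) for rule in rules)


# Hypothetical measurement, invented for this example.
case = {"sotm": 20.0, "otm": 23.0, "ecam": 1000.0}
print(rule_fires(CONTROL_RULES[0], case))   # otm = 23.0 is outside (13.24, 21.52]
print(group_matches(CONTROL_RULES, case))   # the second control rule fires
```

Representing the "or" conditions as interval unions keeps a single code path for both simple and compound conditions.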
Table 15: Attribute strength
report of the rules in Table 14. System setup and
abbreviations are as used in Table 14.
| decision | coverage | attribute | relative parameter strength |
|---|---|---|---|
| group==> control | 100 % | otm | 0.71 |
| | | ecam | 0.71 |
| | | sotm | 0.56 |
| group==> breathy | 100 % | scam | 0.5 |
| | | otm | 0.48 |
| | | f0m | 0.48 |
| | | ecam | 0.46 |
| | | opvm | 0.42 |
| group==> rp-with-comp | 100 % | ectm | 0.54 |
| | | sctm | 0.44 |
| | | otm | 0.42 |
| | | f0std | 0.35 |
| | | eotm | 0.29 |
| group==> rp-without-c | 100 % | f0m | 0.46 |
| | | ecam | 0.44 |
| | | f0std | 0.42 |
| | | soam | 0.40 |
| | | ectm | 0.35 |
| | | peakm | 0.35 |
| group==> chordectomy | 100 % | f0std | 0.65 |
| | | sotm | 0.56 |
| | | ecam | 0.54 |
| | | otm | 0.54 |
| | | ctm | 0.46 |
Generally speaking, the results of the automated data classification resemble
the results of the statistical data analysis (see section 21.3). The variables
used in the rules are mostly those that are statistically significant. The
critical values of the parameters used in the rules agree to a large extent
with the results summarized in Table 16. Rule coverage is full, which means
that all the data is properly classified.
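As an illustration of what full coverage means, the sketch below computes, for each group, the share of its cases matched by at least one of that group's rules. This definition of coverage is an assumption made for the example and may differ from the one used by the rough sets system. The two predicates are taken from the rp-without-c rules in Table 14; the cases are invented.

```python
# Two of the rp-without-c rules from Table 14, written as predicates.
RULES = {
    "rp-without-c": [
        # [soam > -185.44] & [79.46 < f0m <= 115.58]
        lambda c: c["soam"] > -185.44 and 79.46 < c["f0m"] <= 115.58,
        # [18.79 < f0std <= 54.50] & [ectm <= 7.66 or ectm > 17.12]
        lambda c: 18.79 < c["f0std"] <= 54.50
        and (c["ectm"] <= 7.66 or c["ectm"] > 17.12),
    ],
}


def coverage(rules, labelled_cases):
    """Per group: share of its cases matched by at least one of its rules."""
    matched, total = {}, {}
    for group, case in labelled_cases:
        total[group] = total.get(group, 0) + 1
        if any(rule(case) for rule in rules.get(group, [])):
            matched[group] = matched.get(group, 0) + 1
    return {group: matched.get(group, 0) / n for group, n in total.items()}


# Invented cases: the first two are matched by a rule, the third by none.
labelled_cases = [
    ("rp-without-c", {"soam": -100.0, "f0m": 100.0, "f0std": 10.0, "ectm": 10.0}),
    ("rp-without-c", {"soam": -300.0, "f0m": 150.0, "f0std": 30.0, "ectm": 5.0}),
    ("rp-without-c", {"soam": -300.0, "f0m": 150.0, "f0std": 30.0, "ectm": 10.0}),
]

result = coverage(RULES, labelled_cases)
print(result)
```

With the invented cases above, two of three are covered. A coverage of 100 %, as reported in Table 15, would correspond to every case in a group being matched.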
The parameter strength report (the relative importance of each parameter for
the validation of a given rule) reveals interesting results (Table 15). For
example, in the hierarchy of the factors needed for the proper classification
of breathy voice, the most important one is the steepness of the start of the
contact rise phase (start of closing, 0.5), closely followed by the duration
of the no-contact phase and F0 (0.48 each) and the steepness of the second
part of the contact rise (end of closing, 0.46).
Comparing the factors that are important for classifying the cases, one can
see that the steepness of the closing phase occurs in the rules for four of
the five groups (all except rp-with-comp). The duration of the closing and
opening phases, the peak-to-peak amplitude, its variation and the duration of
the open phase also play important roles in this deterministic classification
of the data.
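The comparison across groups can be cross-checked directly against Table 15. The sketch below transcribes the table into a dictionary, ranks the attributes of one group by strength, and lists the groups in which each attribute appears. The variable and function names are chosen for this example only.

```python
# Relative parameter strengths transcribed from Table 15.
STRENGTH = {
    "control": {"otm": 0.71, "ecam": 0.71, "sotm": 0.56},
    "breathy": {"scam": 0.5, "otm": 0.48, "f0m": 0.48, "ecam": 0.46, "opvm": 0.42},
    "rp-with-comp": {"ectm": 0.54, "sctm": 0.44, "otm": 0.42, "f0std": 0.35,
                     "eotm": 0.29},
    "rp-without-c": {"f0m": 0.46, "ecam": 0.44, "f0std": 0.42, "soam": 0.40,
                     "ectm": 0.35, "peakm": 0.35},
    "chordectomy": {"f0std": 0.65, "sotm": 0.56, "ecam": 0.54, "otm": 0.54,
                    "ctm": 0.46},
}


def ranked(group):
    """Attributes of one group, strongest first (ties keep the table order)."""
    attrs = STRENGTH[group]
    return sorted(attrs, key=lambda a: -attrs[a])


# For every attribute, collect the groups whose rules use it.
groups_using = {}
for group, attrs in STRENGTH.items():
    for attr in attrs:
        groups_using.setdefault(attr, []).append(group)

# Attributes that appear in the rules of every group (if any).
in_every_group = [a for a, gs in groups_using.items() if len(gs) == len(STRENGTH)]

print(ranked("breathy"))
for attr, gs in sorted(groups_using.items(), key=lambda item: -len(item[1])):
    print(f"{attr}: {len(gs)} group(s): {', '.join(gs)}")
print("used in every group:", in_every_group)
```

Sorting by the number of groups puts the most widely shared attributes first, which makes it easy to see which parameters recur across the decision classes.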
The following is a screenshot from the Datalogic (Reduct System) program: