Florida Institute of Technology, my.fit.edu/~vkepuska/thesis/ (Chih-Ti Shih)
INTRODUCTION:
The objective of this thesis is to research and develop prosodic features for discriminating proper name use in an alerting context (e.g., John, can I have that book?) from a referential context (e.g., I saw John yesterday). Prosodic measurements based on pitch and energy are analyzed to introduce new prosody-based features to the Wake-Up-Word Speech Recognition system. In the process of finding the prosodic features, an innovative data collection method was designed and developed.
In a conventional automatic speech recognition system, users are required to physically activate the recognition system by clicking a button or by manually starting the application. The Wake-Up-Word Speech Recognition system invented by Këpuska changes the way people activate their systems by enabling users to use their voice only. The Wake-Up-Word Speech Recognition System will eventually further improve the way people use speech recognition systems by enabling speech-only interfaces.
In the Wake-Up-Word Speech Recognition system, a word or phrase is used as a Wake-Up-Word (WUW), indicating to the system that the user requires its attention (i.e., an alerting context). Any user can activate the system by uttering the WUW (e.g., Operator), which enables the application to accept the following command (e.g., Next slide please). Since the same word may also occur in a referential context, where no attention from the system is needed, it is important to discriminate accurately between the two. Such a use of the word is referred to as a non-Wake-Up-Word (nonWUW) context. The following examples further demonstrate the use of the word Operator in these two contexts:
Example sentence 1: Operator, please go to the next slide.
Example sentence 2: We are using the word operator as the WUW.
The cases depicted above indicate different user intentions. In the first example, the word operator is used to alert the system and get its attention. In the second example, the same word, operator, is used to refer to the word itself, hence the term referential context. The current Wake-Up-Word Speech Recognition system implements only the pre- and post-WUW silence as a prosodic feature to differentiate the alerting and referential contexts. In this thesis, pitch- and energy-based prosodic features are used. The problem of general prosodic analysis is introduced in Section 1.1.
In Chapter 2, the use of pitch as a prosodic feature is described. Pitch in general represents the intonation of speech, and intonation is used to convey linguistic and paralinguistic information (Lehiste, 1970). The definition and characteristics of pitch are covered in Section 2.1. In Section 2.2, a pitch estimation method named eSRFD (Enhanced Super Resolution Fundamental Frequency Determinator) (Bagshaw, 1994) is introduced. Finally, Section 2.3 presents the derivation of multiple pitch-based features from pitch measurements to find the best feature to discriminate the WUW used in an alerting context from a referential one.
In Chapter 3, an additional prosodic feature based on energy is described. The definition of prominence, an important prosodic feature based on energy and pitch, and its characteristics are covered in Section 3.1. In Section 3.2, the computation of energy is presented. Finally, in Section 3.3, the derivation of multiple energy features from the energy measurement is presented and analyzed.
In Chapter 4, an innovative approach to speech data collection is presented. After a number of prosodic analysis experiments conducted using the WUWII Corpus (Tudor, 2007), validation of the obtained results on a different data set was deemed necessary. Since, to our knowledge, no specialized speech database is available, Dr. Wallace's idea of collecting the data from movies was adopted. We designed a system which extracts speech from the audio channel and, if necessary, video information from recorded media (e.g., DVDs) of movies and/or TV series. This project is currently under development by Dr. Këpuska's VoiceKey Group.
The problem definition and system introduction will be explained in Section 4.1, followed by the system design in Section 4.2.
1.1 Prosodic Analysis
The word prosody refers to the intonational and rhythmic aspects of a language (Merriam-Webster Dictionary). Its etymology comes from ancient Greek, where it referred to a song sung with instrumental music. In later times, the word was used for the science of versification and the laws of meter governing the modulation of the human voice in reading poetry aloud. In modern phonetics, the word prosody most often refers to those properties of speech that cannot be derived from the segmental sequence of phonemes underlying human utterances (William J. Hardcastle, 1997).
From the phonological perspective, prosody may be classified into structure, tune, and prominence.
1. The prosodic structure refers to the noticeable breaks or disjunctures between words in sentences, which can also be interpreted as the duration of the silence between words as a person speaks. This factor has been considered in the current Wake-Up-Word Speech Recognition system, where a minimal silence period before and after the WUW must be present. The silence period before the WUW is usually longer than the average silence period of a nonWUW or other parts of the sentence.
2. The tune refers to the intonational melody of an utterance (Jurafsky & Martin), which can be quantified by the pitch measurement, also known as the fundamental frequency of the speech. The details of the pitch characteristics, the pitch estimation algorithm, and the usage of pitch features are presented and explained in Chapter 2.
3. Finally, prominence includes the measurement of stress and accent in speech. In our experiments, prominence is measured using the energy of the sound. The details of the energy computation, feature derivation based on energy, and experimental results are presented in Chapter 3.
PITCH FEATURES:
In this chapter the intonational melody of an utterance, computed using pitch measurements, is described. The pitch feature, also referred to as the fundamental frequency, and a comparison of various pitch estimation algorithms are covered in Section 2.1. Based on the results from multiple fundamental frequency determination algorithms (FDAs), the eSRFD (Enhanced Super Resolution Fundamental Frequency Determinator) is selected as the algorithm of choice to perform the pitch estimation. The details of the eSRFD algorithm are covered in Section 2.2. The derivation of multiple pitch-based features and their performance evaluations are covered in Section 2.3.
2.1 Pitch and pitch estimation methods
Intonation is one of the prosodic features that may contain the key information for discriminating the referential context from the alerting context. The intonation of speech is strictly interpreted as the ensemble of pitch variations in the course of an utterance (Hart, 1975). Tonal languages, such as Mandarin Chinese, have lexical forms that are distinguished by different levels or patterns of pitch on a particular phoneme. In contrast, in intonation languages, such as English, the Germanic languages, the Romance languages, and Japanese, pitch is used syntactically. In addition, in intonation languages words are grouped into intonation groups that share an intonation pattern. An intonation group of words is usually uttered in one single breath. The pitch measurement in intonation languages reveals the emotion of a person and the intention of his/her speech. For example:
Can you pass me the phone?
The pattern of continuously rising pitch over the last three words in the above sentence indicates a request.
In strict terms, pitch is defined as the fundamental frequency, or fundamental repetition rate, of a sound. The typical pitch range is between 60-200 Hz for an adult male and 200-400 Hz for adult females and children. The contraction of the vocal folds produces a relatively high pitch, and, vice versa, relaxed vocal folds produce a lower pitch. This explains why a person's pitch rises when he/she gets nervous or surprised. The reason why a male usually has a lower pitch than females and children can also be explained by the fact that males usually have longer and larger vocal folds.
After years of development, pitch estimation methods can be classified into the following three categories:
1. Frequency-based methods, such as CFD (Cepstrum-based F0 determinator) and HPS (Harmonic product spectrum), use the frequency domain representation of the speech signal to find the fundamental frequency.
2. Time-domain methods, such as FBFT (Feature-based F0 tracker) (Phillips, 1985), which uses perceptually motivated features, and PP (Parallel processing method), produce fundamental frequency estimates by analyzing the waveform in the time domain.
3. Cross-correlation methods, such as IFTA (Integrated F0 tracking algorithm) and SRFD (Super resolution F0 determinator), use a waveform similarity metric based on a normalized cross-correlation coefficient.
The eSRFD (Enhanced Super Resolution Fundamental Frequency Determinator) (Bagshaw, 1994) was chosen to extract the pitch measurements for the Wake-Up-Word because of its high overall accuracy. According to Bagshaw's experiments, the eSRFD algorithm achieves a combined voiced and unvoiced error rate below 17% and low-gross fundamental frequency error rates of 2.1% and 4.2% for male and female speech, respectively. Figure 2.1 and Figure 2.2 below show the error rate comparison charts between eSRFD and other FDAs for male and female voices, respectively.
Figure 2.1 FDA Evaluation Chart: Male Speech. Reproduced from (Bagshaw, 1994)
In Figure 2.1 and Figure 2.2, the purple bars indicate the low-gross F0 error, which refers to the halving error, where the pitch has been wrongly estimated with a value about half of the actual pitch. The green bars represent the high-gross F0 error, which refers to the doubling error, where the pitch has been wrongly estimated with a value about twice the actual pitch. The voiced error, represented by red bars, refers to unvoiced frames misidentified as voiced by the FDA. Finally, the unvoiced error, represented by blue bars, means that voiced data has been misidentified as unvoiced.
Figure 2.2 FDA Evaluation Chart: Female Speech. Reproduced from (Bagshaw, 1994)
Figure 2.1 and Figure 2.2, the male and female fundamental frequency evaluation charts, depict that the eSRFD algorithm achieves the lowest overall error rate. This result was confirmed in the more recent study of (Veprek & Scordilis, 2002). Consequently, eSRFD has been chosen as the FDA for our project.
2.2 eSRFD Frequency Determinator Algorithm
The eSRFD is an advanced version of SRFD (Medan, 1991). The program flow chart of the eSRFD FDA is illustrated in Figure 2.3.
The theory behind the SRFD algorithm is to use a normalized cross-correlation coefficient to quantify the degree of similarity between two adjacent, non-overlapping sections of speech. In eSRFD, a frame is divided into three consecutive sections instead of two as in the original SRFD algorithm.
At the beginning, the sampled waveform is passed through a low-pass filter to remove signal noise. The sampled utterance is then divided into non-overlapping frames of 6.5 ms length (t_interval = 6.5 ms). Each frame contains a set of samples which is divided into three consecutive segments, each containing an equal number of samples, n, where n varies over the search range. The segmentation is defined by Equation 2-1 below and further described in Figure 2.4.
Figure 2.3 eSRFD Flow Chart
Equation 2-1
Figure 2.4 Analysis segments of eSRFD FDA
In eSRFD, each frame is processed by a silence detector which labels the frame as unvoiced if the sum of the absolute values of xmin, xmax, ymin, ymax, zmin, and zmax is smaller than a preset value (e.g., a 50 dB signal-to-noise level); otherwise, the frame is considered for voicing. No fundamental frequency is searched for if the frame is marked as unvoiced. In cases where at least one of the segments xn, yn, or zn is not defined, which usually happens at the beginning and the end of the speech file, the frames are labeled as unvoiced and no FDA is applied to them.
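The silence check described above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function name, the list-based segments, and the concrete threshold value are assumptions, and the 50 dB figure would in practice be converted to an amplitude threshold.

```python
def is_silent(x, y, z, threshold):
    """Label a frame silent/unvoiced when the summed absolute values of
    the extrema of its three analysis segments fall below a preset
    threshold (sketch of the eSRFD silence detector described above)."""
    extrema = [min(x), max(x), min(y), max(y), min(z), max(z)]
    return sum(abs(v) for v in extrema) < threshold
```

Frames whose segments are undefined at the file boundaries would simply be labeled unvoiced before this check is ever reached.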
If the frame is not labeled as silent, then candidate values for the fundamental period are searched for values of n within the range Nmin to Nmax using the normalized cross-correlation coefficient Px,y(n) described by Equation 2-2.
Equation 2-2:

    Px,y(n) = ( Σ x(iL)·y(iL) ) / sqrt( Σ x(iL)² · Σ y(iL)² ),  with the sums taken over i = 0, ..., ⌊n/L⌋ − 1
In Equation 2-2, the decimation factor L is used to lower the computational load of the algorithm. Smaller L values allow a higher resolution but also increase the computational load of the FDA; larger L values produce a faster but lower-resolution search. L is set to 1, since the purpose of this research is to find as accurate a relationship as possible between pitch measurements of WUW words; computational speed is considered secondary and thus is not taken into account. However, the variable L will be reconsidered when this algorithm is integrated into the WUW Speech Recognition System.
Figure 2.5 Analysis segments for Px,y(n) in the eSRFD
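The decimated normalized cross-correlation between two adjacent segments can be sketched as follows. This is an illustrative reconstruction under the definitions above, not Bagshaw's code; how the two segments are cut from the frame is an assumption.

```python
import math

def norm_xcorr(frame, n, L=1):
    """Normalized cross-correlation P_xy(n) between two adjacent,
    non-overlapping segments of length n, decimated by step L."""
    x = frame[0:n:L]        # first analysis segment (decimated)
    y = frame[n:2 * n:L]    # adjacent segment of equal length
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den else 0.0
```

For a frame that repeats exactly with period n the coefficient is 1; candidate periods are those n in [Nmin, Nmax] whose correlation peaks exceed the threshold Tsrfd.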
The candidate values of the fundamental period of a frame are found by locating peaks in the normalized cross-correlation Px,y(n). If this value exceeds a specified threshold, Tsrfd, then the frame is further considered a voiced candidate. This threshold is adaptive: it depends on the voicing classification of the previous frame and three preset parameters, as described in Equation 2-3. If the previous frame is unvoiced or silent, Tsrfd is equal to 0.88. If the previous frame is voiced, Tsrfd is equal to the larger of 0.75 and 0.85 times the Px,y value of the previous frame. The threshold is adjusted because the present frame has a higher probability of being voiced when the previous frame is voiced as well.
Tsrfd = 0.88, if the previous frame is unvoiced or silent
Tsrfd = max(0.75, 0.85 · Px,y of the previous frame), if the previous frame is voiced

Equation 2-3
In case no candidates for the fundamental period are found in the frame, the frame is reclassified as unvoiced and no further processing is applied to it. Otherwise, the frame is classified as voiced and the optimal candidate is found as described next.
After obtaining the first normalized cross-correlation coefficient, Px,y, the second normalized cross-correlation coefficient, Py,z, is calculated for the voiced frame. The coefficient Py,z is described by Equation 2-4 below.
Equation 2-4:

    Py,z(n) = ( Σ y(iL)·z(iL) ) / sqrt( Σ y(iL)² · Σ z(iL)² ),  with the sums taken over i = 0, ..., ⌊n/L⌋ − 1
After the second normalized cross-correlation, a score is given to each candidate. If a candidate pitch value of a frame has both Px,y and Py,z larger than Tsrfd, a score of 2 is given to the candidate. If only Px,y is above Tsrfd, a score of 1 is assigned. A higher score indicates a higher probability that the candidate represents the fundamental period of the frame. After the scores are assigned, if there are one or more candidates with a score of 2, all candidates with a score of 1 in that frame are removed from the candidate list. If there is only one candidate with a score of 2, then that candidate is assumed to be the best estimate of the fundamental period of the particular frame. If there are multiple candidates with a score of 1 but no candidate with a score of 2, an optimal fundamental period is sought from the remaining candidates.
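The scoring and pruning rules above can be sketched as follows; the candidates are assumed to be given as (period, p_xy, p_yz) tuples, and the names are illustrative rather than taken from the thesis code.

```python
def score_candidates(candidates, t_srfd):
    """Score a candidate 2 when both coefficients exceed the threshold
    and 1 when only P_xy does; if any candidate scores 2, drop all
    score-1 candidates from the list."""
    scored = []
    for period, p_xy, p_yz in candidates:
        if p_xy > t_srfd and p_yz > t_srfd:
            scored.append((period, 2))
        elif p_xy > t_srfd:
            scored.append((period, 1))
    if any(score == 2 for _, score in scored):
        scored = [(p, s) for p, s in scored if s == 2]
    return scored
```

A single surviving score-2 candidate is taken directly as the frame's fundamental period; otherwise the tie is resolved as described next.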
In the case of multiple candidates with a score of 1 but none with a score of 2, the candidates are sorted in ascending order of fundamental period. The last candidate on the list has the largest fundamental period, denoted nM, while the fundamental period of the mth candidate is denoted nm.
Figure 2.6 Analysis segments for q(nm) in the eSRFD
Then the third normalized cross-correlation coefficient, q(nm), between two sections of length nM spaced nm apart, is calculated for each candidate. Equation 2-5 describes the normalized cross-correlation coefficient q(nm) used in this case.
Equation 2-5
After the third normalized cross-correlation coefficient is generated, the q(nm) of the first candidate on the list is assumed to be the optimal value. If a subsequent q(nm), multiplied by 0.77, is larger than the current optimal value, that candidate's q(nm) becomes the new optimal value. The same rule is applied through the whole list of candidates, resulting in the optimal candidate value.
In the case where only one candidate has a score of 1 and no candidate has a score of 2, the probability that the candidate is the true fundamental period of the frame is low. In such a case, if both the previous and the subsequent frames are silent, the current frame is an isolated frame and is reclassified as silent. If either the previous or the next frame is voiced, we assume the candidate of the current frame is optimal and it defines the fundamental period of the current frame.
The above algorithm has a high probability of misidentifying voiced frames as unvoiced or silent. In order to counteract this imbalance, a bias is applied when all three of the conditions below are satisfied:
The two previous frames were voiced frames.
The fundamental period of the previous frame is not temporarily on hold.
The fundamental frequency of the previous frame is less than 7/4 times the fundamental frequency of its next voiced frame and greater than 5/8 of the next frame.
After the fundamental frequency is obtained, the pitch contour is passed through a median filter in order to further minimize the occurrence of doubling or halving errors.
The median filter has a default length of 7, but the length decreases to 5 or 3 when there are fewer than 7 consecutive voiced frames. Figure 2.7 below shows an example of doubling points being corrected by the median filter. In Figure 2.7, the top row shows the pitch measurement generated by the eSRFD FDA, and the bottom row shows the measurement fixed by the median filter. As can be seen from the figure, the two points marked as doubling errors were fixed by the median filter.
Figure 2.7 Median filter example
We applied the above pitch estimation method to the WUWII (Wake-Up-Word II) corpus, which contains approximately 3410 utterances, each containing at least one WUW. Figure 2.8 displays a sample utterance containing the following sentence:
Hi. You know, I have this cool wildfire service and, you know, I'm gonna try to invoke it right now. Wildfire
Figure 2.8 Example, WUWII00073_009.ulaw
In Figure 2.8, the first row shows the waveform of the speech, the second row shows the pitch estimate from the eSRFD FDA, the third shows the pitch estimate after the median filter, and the last row shows the spectrogram of the speech. The WUW of this sentence is Wildfire, which is the section delineated between the two red lines.
2.3 Pitch features
The pattern of the fundamental frequency contour of utterance waveforms represents the intonation of the speech. Since, to our best knowledge, the problem of discriminating words used in an alerting context from a referential context has never been addressed before, a specialized corpus containing WUWs is necessary. In this project, the corpus named WUWII was chosen. The WUWII corpus contains 3410 sample utterances, and each utterance contains at least one of five different WUWs: Wildfire, Operator, ThinkEngine, Onword, and Voyager.
Our hypothesis is that the intonation rises on the WUW; thus, there should be an increase in the average pitch and maximum pitch of the WUW sections compared to the nonWUW sections.
Based on the above hypothesis, the average pitch and maximum pitch of the WUW are considered and the following twelve features are derived.
1. APW_AP1SBW: The relative change of the average pitch of the WUW with respect to the average pitch of the section just before the WUW.
2. AP1sSW_AP1SBW: The relative change of the average pitch of the first section of the WUW with respect to the average pitch of the section just before the WUW.
3. APW_APALL: The relative change of the average pitch of the WUW with respect to the average pitch of the entire speech sample excluding the WUW sections.
4. AP1sSW_APALL: The relative change of the average pitch of the first section of the WUW with respect to the average pitch of the entire speech sample excluding the WUW sections.
5. APW_APALLBW: The relative change of the average pitch of the WUW with respect to the average pitch of the entire speech sample before the WUW.
6. AP1sSW_APALLBW: The relative change of the average pitch of the first section of the WUW with respect to the average pitch of the entire speech sample before the WUW.
7. MaxPW_MaxP1SBW: The relative change of the maximum pitch in the WUW sections with respect to the maximum pitch in the section just before the WUW.
8. MaxP1sSW_MaxP1SBW: The relative change of the maximum pitch in the first section of the WUW with respect to the maximum pitch of the section just before the WUW.
9. MaxPW_MaxPAll: The relative change of the maximum pitch of the WUW with respect to the maximum pitch of the entire speech sample excluding the WUW sections.
10. MaxP1sSW_MaxPAll: The relative change of the maximum pitch of the first section of the WUW with respect to the maximum pitch of the entire speech sample excluding the WUW sections.
11. MaxP1sSW_MaxPAllBW: The relative change of the maximum pitch in the first section of the WUW with respect to the maximum pitch of the entire speech before the WUW.
12. MaxPW_MaxPAllBW: The relative change of the maximum pitch in the WUW sections with respect to the maximum pitch of the entire speech sample before the WUW.
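Given per-section pitch values, all twelve features are instances of one relative-change computation. A sketch of the first feature follows; the helper names, the list-based inputs, and the definition of relative change as (A − B)/B are assumptions made for illustration.

```python
def relative_change(a, b):
    """Relative change of a with respect to b; positive when a > b."""
    return (a - b) / b

def average(values):
    """Mean of a list of per-frame pitch values."""
    return sum(values) / len(values)

def apw_ap1sbw(wuw_pitch, before_pitch):
    """APW_AP1SBW: relative change of the WUW's average pitch versus
    the average pitch of the section just before the WUW (sketch)."""
    return relative_change(average(wuw_pitch), average(before_pitch))
```

The other eleven features swap in maxima, first-section values, or whole-utterance statistics in the same pattern, so only the section selection differs between them.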
In the presented experiment, no significant discriminating pattern was found in the results. The results of the WUW experiments using the pitch features defined above are shown in Table 2-1. The best feature across all WUWs is the relative change of the maximum pitch of the WUW with respect to the maximum pitch of the section just before the WUW. The results could be improved if clear syllabic boundaries were defined; however, syllable boundaries in the English language are not clearly defined. The details of the results are shown in Appendix A.
Besides the above features, other approaches, such as pitch measurement patterns, can also be used to discriminate WUWs from nonWUWs. This is one of the current research topics of Raymond Sastraputera, a graduate student working with Dr. Këpuska. Potential approaches to pitch-based features are covered in Chapter 5.
WUW: All

Feature                Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
APW_AP1SBW                   1415      726      51        0       0      689      49
AP1sSW_AP1SBW                1415      735      52        0       0      680      48
APW_APALL                    2282      947      41        0       0     1335      59
AP1sSW_APALL                 2282      996      44        2       0     1284      56
APW_APALLBW                  2188      962      44        0       0     1226      56
AP1sSW_APALLBW               2188     1003      46        2       0     1183      54
MaxPW_MaxP1SBW               1415      948      67       53       4      414      29
MaxP1sSW_MaxP1SBW            1415      719      51       54       4      642      45
MaxPW_MaxPAll                2282     1020      45      109       5     1153      51
MaxP1sSW_MaxPAll             2282      716      31      213       9     1353      59
MaxP1sSW_MaxPAllBW           2188     1069      49      111       5     1008      46
MaxPW_MaxPAllBW              2188     1003      35        2      10     1183      55

Table 2-1 Pitch Features Result, All WUWs
ENERGY FEATURES
As mentioned in Section 1.1, prominence can be measured using the energy of the utterance. If pitch represents the intonation of speech, then energy represents its stress. In this chapter, the same approach described for pitch in Chapter 2 is applied to energy to generate a similar feature set.
3.1 Energy Characteristic
In an English sentence, certain syllables are more prominent than others; these are called accented syllables. Accented syllables are usually either louder or longer than the other syllables in the same word. In English, a different position of the accented syllable in the same word can differentiate the meaning of the word. For example, the noun object ([ˈɑb.dʒɛkt]) and the verb object ([əbˈdʒɛkt]) (Cutler, 1986) have different accented syllables. The position of the accented syllable is indicated by ˈ in the phonetic transcription. If this idea of accented speech is applied to the entire sentence instead of a single word, it may provide additional clues about the use of a word of interest and its meaning within the sentence.
Classifying the factors that model a speaker's speech and how speakers choose to accentuate a particular syllable within a whole sentence is a very complex problem. However, the accented syllables can be measured simply by using the energy of the speech signal and its pitch change.
3.2 Energy Extraction
The energy of a speech signal can be expressed by Parseval's Theorem as in Equation 3-1 below.
Equation 3-1:

    E = Σn |x[n]|² = (1/2π) ∫ |X(ω)|² dω
In Equation 3-1, the energy of a signal is defined in both the time and frequency domains. |x[n]|² and |X(ω)|² represent the energy density, which can be thought of as energy per unit of time and energy per unit of frequency, respectively.
The energy is computed over the same fixed frame size (6.5 ms) as in the pitch computation. After the energy is calculated for all samples of each utterance in the WUWII corpus, the energy features are computed in a similar fashion as the pitch features of Section 2.3, as described in the next section.
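The per-frame energy in the time domain is then just the sum of squared samples over each 6.5 ms frame. A sketch follows; the function name and the sample-rate argument are assumptions, not the thesis code.

```python
def frame_energies(samples, sample_rate, frame_ms=6.5):
    """Split the signal into non-overlapping frames of frame_ms and
    return the time-domain energy (sum of squared samples) of each;
    by Parseval's theorem this equals the frequency-domain energy."""
    n = max(1, int(sample_rate * frame_ms / 1000.0))
    return [sum(s * s for s in samples[i:i + n])
            for i in range(0, len(samples), n)]
```

At an 8 kHz telephone sample rate, as used for u-law corpora such as WUWII, a 6.5 ms frame corresponds to 52 samples.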
3.3 Energy Features
As in the previous experiments with pitch features, 12 energy-based features were computed and tested. The features are expressed as relative changes, as defined in Equation 3-2.
Relative Change between A and B = (A − B) / B

Equation 3-2
The features are listed below:
1. AEW_AE1SBW: The relative change of the average energy of the WUW with respect to the average energy of the section just before the WUW.
2. AE1sSW_AE1SBW: The relative change of the average energy of the first section of the WUW with respect to the average energy of the section just before the WUW.
3. AEW_AEAll: The relative change of the average energy of the WUW with respect to the average energy of the entire speech sample excluding the WUW sections.
4. AE1sSW_AEAll: The relative change of the average energy of the first section of the WUW with respect to the average energy of the entire utterance excluding the WUW sections.
5. AEW_AEAllBW: The relative change of the average energy of the WUW with respect to the average energy of all speech before the WUW.
6. AE1sSW_AEAllBW: The relative change of the average energy of the first section of the WUW with respect to the average energy of the entire speech sample before the WUW.
7. MaxEW_MaxE1SBW: The relative change of the maximum energy in the WUW sections with respect to the maximum energy in the section just before the WUW.
8. MaxE1sSW_MaxE1SBW: The relative change of the maximum energy in the first section of the WUW with respect to the maximum energy in the section just before the WUW.
9. MaxEW_MaxEAll: The relative change of the maximum energy in the WUW with respect to the maximum energy of the entire speech sample excluding the WUW sections.
10. MaxE1sSW_MaxEAll: The relative change of the maximum energy in the first section of the WUW with respect to the maximum energy of the entire speech sample excluding the WUW sections.
11. MaxE1sSW_MaxEAllBW: The relative change of the maximum energy in the first section of the WUW with respect to the maximum energy of the entire speech before the WUW.
12. MaxEW_MaxEAllBW: The relative change of the maximum energy in the WUW sections with respect to the maximum energy of the entire speech sample before the WUW.
In this experiment, a few of the features may not be implementable in a real-time application, since they rely on measurements taken after the WUW of interest; they may nevertheless lead to interesting conclusions. For real-time speech recognition systems, the features that do not rely on measurements past the WUW of interest are the most useful. Table 3-1 below shows the results of the energy features over all WUWs of the WUWII corpus, namely the words Operator, ThinkEngine, Onword, Wildfire, and Voyager. The details broken down for each word are included in Appendix B.
WUW: All WUWs

Feature                Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
AEW_AE1SBW                   1479     1164      79        0       0      315      21
AE1sSW_AE1SBW                1479     1283      84        1       0      240      16
AEW_AEAll                    2175     1059      49        9       9     1116      51
AE1sSW_AEAll                 2175     1155      53        2       0     1018      47
AEW_AEAllBW                  1969     1427      72        0       0      542      28
AE1sSW_AEAllBW               1969     1562      79        3       0      404      21
MaxEW_MaxE1SBW               1479     1244      84       20       1      215      15
MaxE1sSW_MaxE1SBW            1479     1221      83       13       1      245      17
MaxEW_MaxEAll                2175     1373      63       13       1      245      17
MaxE1sSW_MaxEAll             2175     1336      61       25       1      814      37
MaxE1sSW_MaxEAllBW           1969     1209      61       16       1      744      38
MaxEW_MaxEAllBW              1969     1562      60        3       1      404      39

Table 3-1 Energy Feature Results, All WUWs
Based on the results shown in Table 3-1 above, the following three features performed best in discriminating the WUW from other word tokens:
AE1sSW_AE1SBW: The relative change of the average energy of the first section of the WUW compared to the average energy of the last section before the WUW. Using this feature, 84% of the samples show that the average energy of the first section of the WUW is higher than the average energy of the previous section. The result is illustrated in Figure 3.1 below, depicting the distribution of the feature as well as its cumulative distribution.
Figure 3.1 Distribution and cumulative plots of energy feature AE1sSW_AE1SBW.
MaxEW_MaxE1SBW: The relative change of the maximum energy in the WUW sections compared to the maximum energy in the last section before the WUW. Using this feature, 84% of the samples show that the maximum energy in the WUW sections is higher than the maximum energy of the previous section. The distribution of the feature as well as its cumulative distribution are shown in Figure 3.2 below.
Figure 3.2 Distribution and cumulative plot of energy feature, the max energy of the WUW.
MaxE1sSW_MaxE1SBW: The relative change of the maximum energy of the first section of the WUW compared to the maximum energy in the last section before the WUW. This feature correctly discriminated 83% of cases, which exhibited a higher maximum energy in the first section of the WUW than in the previous section. The distribution and cumulative plots of this feature are shown in Figure 3.3.
Figure 3.3 Distribution and cumulative plot of energy feature, the max energy of the 1st section of the WUW.
The above results are based on all the data, including all five different WUWs. Thus, investigating each word independently may be more appropriate. The detailed performance results for each individual WUW are covered in Appendix B.
Linguistically, one of the more appropriate WUWs is the word Operator. This word is also used in the current Wake-Up-Word Speech Recognition System. Based on the results in Table 3-2, two features show that in over 90% of the WUW cases the average or maximum energy is higher than in the other regions of the speech. These two features are:
AE1sSW_AE1SBW: The relative change of the average energy of the first section of the WUW compared to the average energy of the last section before the WUW. Using this feature, 94% of the samples have a first WUW section with higher average energy than the previous section.
AE1sSW_AEAllBW: The relative change of the average energy of the first section of the WUW compared to the average energy of the entire speech before the WUW sections. Using this feature, 91% of the samples show that the first section of the WUW has higher average energy.
WUW: Operator

Feature                Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
AEW_AE1SBW                    275      228      83        0       0       47      17
AE1sSW_AE1SBW                 275      258      94        0       0       17       6
AEW_AEAll                     418      248      59        0       0      170      41
AE1sSW_AEAll                  418      290      69        1       0      127      30
AEW_AEAllBW                   394      303      77        0       0       91      23
AE1sSW_AEAllBW                394      359      91        1       0       34       9
MaxEW_MaxE1SBW                275      240      87        1       0       34      12
MaxE1sSW_MaxE1SBW             275      243      88        0       0       32      12
MaxEW_MaxEAll                 418      290      69        4       1      124      30
MaxE1sSW_MaxEAll              418      285      68        6       1      127      30
MaxE1sSW_MaxEAllBW            394      272      69        4       1      118      30
MaxEW_MaxEAllBW               394      359      68        1       1       34      30

Table 3-2 Energy Feature Results, WUW Operator
Based on the performed experiment, the WUW Wildfire achieved the best overall result. Using this word, four features scored higher than 90%. The results are shown in Table 3.3. The four best features are:
AEW_AE1SBW: The relative change of the average energy of the entire WUW compared to the average energy of the last section just before the WUW. For 90% of the samples, the average energy of the WUW is higher than that of the preceding section.
AE1sSW_AE1SBW: The relative change of the average energy of the first section of the WUW compared to the average energy of the last section before the WUW. Using this feature, 93% of the samples show a higher average energy in the first section of the WUW.
MaxEW_MaxE1SBW: The relative change of the maximum energy of the WUW sections compared to the maximum energy in the last section before the WUW. Using this feature, 91% of the samples show a higher maximum energy in the WUW.
MaxE1sSW_MaxEAllBW: The relative change of the maximum energy of the first section of the WUW compared to the maximum energy of all sections before the WUW. Using this feature, 90% of the samples show a higher maximum energy in the first section of the WUW.
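The Valid Data / Pt > 0 / Pt = 0 / Pt < 0 columns in the tables below summarize the sign of the relative change Pt across utterances. A small sketch of such a tally (the function and key names are ours, not the thesis tooling):

```python
def tally(pt_values):
    """Summarize per-utterance feature values Pt into the table columns:
    valid count, and the number and percentage of cases with Pt > 0,
    Pt = 0, and Pt < 0. None marks an unmeasurable (invalid) case."""
    valid = [p for p in pt_values if p is not None]
    n = len(valid)
    pos = sum(1 for p in valid if p > 0)
    zero = sum(1 for p in valid if p == 0)
    neg = n - pos - zero
    pct = lambda k: round(100 * k / n) if n else 0
    return {"valid": n, "pt>0": pos, "%>0": pct(pos),
            "pt=0": zero, "%=0": pct(zero), "pt<0": neg, "%<0": pct(neg)}
```

Because the percentages are rounded independently, rows in the tables may not sum to exactly 100%.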
WUW: Wildfire

Feature               Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
AEW_AE1SBW                   282      253      90        0       0       29      10
AE1sSW_AE1SBW                282      261      93        0       0       21       7
AEW_AEAll                    340      173      51        0       0      167      49
AE1sSW_AEAll                 340      185      54        0       0      155      46
AEW_AEAllBW                  298      252      85        0       0       46      15
AE1sSW_AEAllBW               298      265      89        0       0       33      11
MaxEW_MaxE1SBW               282      258      91        8       3       16       6
MaxE1sSW_MaxEAllBW           282      253      90        2       1       27      10
MaxEW_MaxEAll                340      230      68        4       1      106      31
MaxE1sSW_MaxEAll             340      219      64        4       1      117      34
MaxE1sSW_MaxEAllBW           298      195      65        4       1       99      33
MaxEW_MaxEAllBW              298      265      62        0       1       33      36

Table 3.3 Energy Feature Results of WUW Wildfire
The complete results are shown in Appendix B.
From the results obtained above, it can be concluded that the WUW is frequently accentuated compared to the rest of the words in the utterance.
DATA COLLECTION
In this chapter, we introduce a novel way to collect speech samples, along with the preliminary design of the data collection system.
1.7 Introduction to the data collection
After we developed WUW discriminant features based on the two prosodic measurements of pitch and energy, described in Chapters 2 and 3, we realized that the data used to generate those features may not be the most suitable. The corpus used in the project was the WUWII corpus. It only provides data on the WUW in the alerting situation and does not contain data for the same word used in the referential situation. As a result, we can only analyze changes between the alerting type of WUW and the overall sentence, without information on the same word in the referential situation. Another drawback of the WUWII corpus is that its speech is not spontaneous: the subjects were told to make up a sentence using the WUW, and under such circumstances a subject may change the way he or she normally speaks.
In order to perform a more complete analysis, we need a corpus that includes both alerting and referential WUW contexts with naturally spoken utterances. Dr. Wallace came up with the idea of extracting audio samples from movies and TV series.
Extracting speech samples from movies and TV series has the following advantages over the previous data collection method:
1. The speech examples are more natural. The speech from professional actors is more natural since they tend to think and speak like a particular character and act out the situation of the character they are depicting.
2. The data collection process costs much less, since we are not compensating individuals to record their voices. We are not currently considering the problem of copyright, since we use the data for scientific research purposes only.
3. A large amount of data can be collected in a short period of time once the process is fully automated.
4. The voice channel data is of CD quality. In this project, we extract speech data from recorded videos, as opposed to the conventional phone line or cell phone recordings contained in the WUWII corpus.
5. No manual labeling is required. We plan to use the transcripts obtained from the video channel (see System Design). The transcripts provide time stamps for all spoken sentences; thus, manual labeling is not needed.
With the listed advantages, we are planning to design an automatic data collection system to collect speech data suitable for prosodic analysis of proper name use in the referential context vs. the alerting (or WUW) context.
1.8 System Design
The data collection project is part of the prosodic features analysis project, which is illustrated by the program flow chart in Figure 4.1. The prosodic features analysis project can be divided into three sub-projects. In Figure 4.1, the green boxes represent the prosodic features extraction project, which was described in Chapters 2 and 3 of this thesis.
Figure 4.1 Program Flow Chart
The green boxes in Figure 4.1 represent the functions of the prosodic feature extraction and analysis project. The blue boxes depict the WUW data collection project. Finally, the purple boxes represent the future project on video analysis.
In the prosodic feature analysis project, we use the prosodic features generated from acoustic measurements to differentiate the context of the words. As part of the WUW data collection project, language analysis tools will be used to automatically classify the words of interest, in this case as referential or alerting. At the moment, the capabilities of this tool, RelEx, must be augmented in order to achieve this goal. The outcome of the WUW Speech Data Collection project will not only build a specialized corpus for the prosodic analysis project, but also provide confirmation of the results of the prosodic analysis. The detailed program flow chart of the WUW Speech Data Collection System is shown in Figure 4.2 below.
Figure 4.2 WUW Audio Data Collection System Program Flow Diagram
The inputs of the system are (1) the video file of the movie or TV series, (2) the video transcription file, which will be used if provided and otherwise extracted from the video stream, and (3) an English first names dictionary. In the case where there is no video transcription file and the subtitles are encoded in the video stream, the subtitle extractor SubRip will extract the subtitles and the time stamps of the sentences from the video stream. An example of a transcription file is provided in Figure 4.3 below.
Figure 4.3 Example of Video Transcription File
The transcription files provide the following information: the date and time when the file was created, the subtitle index number, the start and end time of each subtitle, and the subtitle transcription.
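Since SubRip produces standard .srt output, the index, time stamps, and text of each subtitle can be recovered with a simple parser. The sketch below is an illustration only, not the project's actual tool:

```python
import re
from datetime import timedelta

# SRT time stamps look like 00:00:01,500 (comma or dot before milliseconds).
TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def parse_srt(text):
    """Parse SubRip (.srt) blocks into (index, start_s, end_s, text) tuples."""
    def secs(ts):
        h, m, s, ms = map(int, TIME.search(ts).groups())
        return timedelta(hours=h, minutes=m,
                         seconds=s, milliseconds=ms).total_seconds()
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start_ts, end_ts = lines[1].split("-->")
        entries.append((int(lines[0]), secs(start_ts), secs(end_ts),
                        " ".join(lines[2:])))
    return entries
```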
The audio extractor extracts the audio channel from the video file. Then, using the English first names dictionary and the sentence transcriptions with time markers, an application called the sentence parser, developed by VoiceKey team members, selects the sentences that include English first names. Figure 4.4 below shows an example of the output of the sentence parser.
Figure 4.4 Example of output of the sentence parser
In the next step, the audio parser uses the information from the sentence parser to extract the corresponding audio sections from the audio file produced by the media audio extractor.
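The audio parser's core operation, cutting a time span out of the extracted audio, can be sketched with Python's standard wave module. This is illustrative only; the project's audio parser is a separate program, and the file paths are hypothetical:

```python
import wave

def extract_segment(src_path, dst_path, start_s, end_s):
    """Copy the [start_s, end_s] second span of a WAV file to a new file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        src.setpos(int(start_s * rate))                 # seek to start frame
        frames = src.readframes(int((end_s - start_s) * rate))
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)   # same channels/width/rate as the source
        dst.writeframes(frames)
```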
After extraction of a sentence that contains a name, RelEx is used to analyze the selected sentence. RelEx is an English-language semantic relationship extractor based on the Carnegie Mellon link parser. RelEx is able to provide sentence information on subject, object, and indirect object, and various word tags such as verb, gender, and noun. The WUW data collection project is currently developing a rule-based or statistical pattern recognition process based on the relationship information produced by RelEx. Ultimately, the system will be able to accurately identify whether the name in a sentence is used in the WUW or nonWUW context.
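The rule-based classification is still under development. As a stand-in illustration only (this is not the RelEx-based process), a naive vocative heuristic using punctuation cues can separate the two example contexts from the Introduction:

```python
import re

def classify_name_use(sentence, name):
    """Naive heuristic: a vocative (alerting) name is typically set off by
    a comma, e.g. sentence-initial "Operator, ..." or sentence-final
    "..., John!". Everything else is treated as referential. This is only
    an illustration, not the project's RelEx-based classifier."""
    s = sentence.strip()
    # Name at the start, followed by a comma or exclamation mark.
    if re.match(rf"{re.escape(name)}\s*[,!]", s, re.IGNORECASE):
        return "alerting"
    # Name at the end, preceded by a comma.
    if re.search(rf",\s*{re.escape(name)}\s*[.!?]?$", s, re.IGNORECASE):
        return "alerting"
    return "referential"
```

A heuristic like this fails on unpunctuated transcripts, which is precisely why a syntactic analysis such as RelEx's relationship output is needed.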
A necessary step in the automation process is to obtain precise time markers indicating the words of interest. To achieve this, one could use HTK, the Hidden Markov Model Toolkit, to perform forced alignment on the audio input. HTK was initially developed by the Machine Intelligence Laboratory (formerly known as the Speech Vision and Robotics Group) of the Cambridge University Engineering Department (CUED). HTK uses hidden Markov models (HMMs) that compare the acoustic features of the incoming audio with the known acoustic features of the (typically 41) English phonemes to predict the most likely combination of phonemes corresponding to the audio, and maps them to the words in the lexicon. In our case, since the transcription of the sentences is known, HTK is used to map the phonemes of the known words to the corresponding time intervals. The phoneme time labels, or equivalently the word boundaries of the spoken sentence, are used to locate the WUWs or nonWUWs in time. Note that this step can also be performed by Microsoft's SDK speech recognition system, which is fully integrated in Microsoft's Vista OS. The advantage of Microsoft's system is that we do not need to train it, since the acoustic models are pre-built; however, an application incorporating the Microsoft SDK features would have to be developed. Alternatively, HTK does not require any significant integration coding, but it does require accurate models. Automation of the described data collection process will be made possible by integrating the outputs of RelEx with the forced alignment.
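Forced-alignment output can be turned into word boundaries by reading the aligner's label file. The sketch below assumes HTK-style labels, one "start end label" line per token with times in 100 ns units (the standard HTK label convention); the helper name is ours:

```python
HTK_TICKS_PER_SEC = 10_000_000  # HTK label times are in units of 100 ns

def word_boundaries(label_text, word):
    """Read an HTK-style word-level label file and return the
    (start_seconds, end_seconds) spans where `word` occurs."""
    spans = []
    for line in label_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2].upper() == word.upper():
            spans.append((int(parts[0]) / HTK_TICKS_PER_SEC,
                          int(parts[1]) / HTK_TICKS_PER_SEC))
    return spans
```

These spans are exactly what the prosodic feature extraction needs in order to separate the WUW sections from the speech before and after them.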
With time segmented sentence labels of the audio stream indicating the WUW or nonWUW context, a new corpus can be generated just like WUWII corpus. This data will be used to perform prosodic analysis and develop new or refine existing prosodic features. It is expected that further study with the new data will not only validate the current prosodic analysis result, but also provide directions on developing new prosodic features. The ultimate goal is to find out the prosodic patterns on WUW, nonWUW and other parts of the sentence.
Conclusion
This thesis investigated two types of prosodic features and designed an innovative data collection system.
The pitch based features in Section 2.3 did not provide significant discriminating patterns. The following are potential solutions to improve the performance:
1. Build a specialized corpus which contains both WUWs and nonWUWs. The speech sentences in the current corpus, WUWII, only contain WUWs and no nonWUWs. A new speech data collection system is designed in Chapter 4 in order to improve the performance of the features.
2. Use different approaches to defining pitch based features. Instead of using only the average and maximum pitch measurements of the WUW, the pitch contour pattern should also be considered. Since we are interested in the general pattern of WUWs rather than a specific WUW, patterns that exclude the word-specific pitch contour should be developed.
The energy based features in Section 3.3 provide significant discriminating patterns. A future improvement is to quantify the level of change when comparing WUWs to nonWUWs.
The new data collection system is an ongoing project which will eventually provide sufficient data on both WUWs and nonWUWs. The data will help us research new patterns for discriminating the alerting context from the referential context.
References
AOAMedia.com. (n.d.). AoA Audio Extractor. Retrieved from AOAMedia.com.
Bagshaw, P. C. (1994). Automatic prosodic analysis for computer aided pronunciation teaching.
Campbell, M. (n.d.). Behind the Name. Retrieved from http://www.behindthename.com/
Cutler, A. (1986). Forbear is a homophone: Lexical prosody does not constrain lexical access. Language and Speech.
't Hart, J. (1975). Integrating different levels of intonation analysis. Journal of Phonetics, pp. 309-327.
Jurafsky, D., & Martin, J. H. (n.d.). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
Këpuska, V. (2006). Leading and Trailing Silence in Wake-Up-Word Speech Recognition. Industry, Engineering & Management Systems 2006. Cocoa Beach.
Këpuska, V. WUWII Corpus.
Lehiste, I. (1970). Suprasegmentals. Cambridge, Massachusetts: The MIT Press.
Machine Intelligence Laboratory, Cambridge University Engineering Department. (n.d.). HTK, The Hidden Markov Model Toolkit.
Medan, Y. (1991). Super resolution pitch determination of speech signals. IEEE Trans. Signal Processing, ASSP-39(1), 40-48.
Merriam-Webster Dictionary. (n.d.). Merriam-Webster Dictionary.
Novamente LLC. (n.d.). RelEx Semantic Relationship Extractor. Retrieved from http://opencog.org/wiki/RelEx
Rojanasthien, P., Ramdhan, R., & Beharry, X. (2009). Sentence Parser Program.
Rojanasthien, P., Ramdhan, R., & Beharry, X. (2009). Audio Parser Program.
Phillips, M. (1985). A feature-based time domain pitch tracker. Journal of the Acoustical Society of America, 77, S9-S10(A).
Temperley, D., Lafferty, J., & Sleator, D. (n.d.). CMU Link Grammar Parser.
Tudor, K. B. (2007). Triple Scoring of Hidden Markov Models in Wake-Up-Word Speech Recognition.
Veprek, P., & Scordilis, M. (2002). Analysis, enhancement and evaluation of five pitch determination techniques. Speech Communication, 37, 249-270.
Hardcastle, W. J., & Laver, J. (1997). The Handbook of Phonetic Sciences. p. 640.
Zuggy. (n.d.). SubRip.
Pitch Feature Experimental Results
Wake-Up-Word: All

Feature               Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
APW_AP1SBW                  1415      726      51        0       0      689      49
AP1sSW_AP1SBW               1415      735      52        0       0      680      48
APW_APALL                   2282      947      41        0       0     1335      59
AP1sSW_APALL                2282      996      44        2       0     1284      56
APW_APALLBW                 2188      962      44        0       0     1226      56
AP1sSW_APALLBW              2188     1003      46        2       0     1183      54
MaxP_MaxP1SBW               1415      948      67       53       4      414      29
MaxP1sSW_MaxP1SBW           1415      719      51       54       4      642      45
MaxPW_MaxPAll               2282     1020      45      109       5     1153      51
MaxP1sSW_MaxPAll            2282      716      31      213       9     1353      59
MaxP1sSW_MaxPAllBW          2188     1069      49      111       5     1008      46
MaxPW_MaxPAllBW             2188     1003      35        2      10     1183      55

Table A1 Pitch Feature Results of All WUWs
Figure A1 Distribution and Cumulative plot of pitch feature, APW_AP1SBW
Figure A2 Distribution and Cumulative plot of pitch feature, AP1sSW_AP1SBW
Figure A3 Distribution and Cumulative plot of pitch feature, APW_APALL
Figure A4 Distribution and Cumulative plot of pitch feature, AP1sSW_APALL
Figure A5 Distribution and Cumulative plot of pitch feature, APW_APALLBW
Figure A6 Distribution and Cumulative plot of pitch feature, AP1sSW_APALLBW
Figure A7 Distribution and Cumulative plot of pitch feature, MaxP_MaxP1SBW
Figure A8 Distribution and Cumulative plot of pitch feature, MaxP1sSW_MaxP1SBW
Figure A9 Distribution and Cumulative plot of pitch feature, MaxPW_MaxPAll
Figure A10 Distribution and Cumulative plot of pitch feature, MaxP1sSW_MaxPAll
Figure A11 Distribution and Cumulative plot of pitch feature, MaxP1sSW_MaxPAllBW
Figure A12 Distribution and Cumulative plot of pitch feature, MaxPW_MaxPAllBW
WUW: Operator

Feature               Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
APW_AP1SBW                   268      122      46        0       0      146      54
AP1sSW_AP1SBW                268      113      42        0       0      155      58
APW_APALL                    461      184      40        0       0      277      60
AP1sSW_APALL                 461      182      39        0       0      279      61
APW_APALLBW                  455      187      41        0       0      268      59
AP1sSW_APALLBW               455      179      39        0       0      276      61
MaxP_MaxP1SBW                268      155      58       12       4      101      38
MaxP1sSW_MaxP1SBW            268       94      35        8       3      166      62
MaxPW_MaxPAll                461      192      42       27       6      240      52
MaxP1sSW_MaxPAll             461      144      31       48      10      269      58
MaxP1sSW_MaxPAllBW           455      209      46       27       6      219      48
MaxPW_MaxPAllBW              455      179      33        0      12      276      55

Table A2 Pitch Feature Results of WUW Operator
Figure A13 Distribution and Cumulative plot of pitch feature, APW_AP1SBW
Figure A14 Distribution and Cumulative plot of pitch feature, AP1sSW_AP1SBW
Figure A15 Distribution and Cumulative plot of pitch feature, APW_APALL
Figure A16 Distribution and Cumulative plot of pitch feature, AP1sSW_APALL
Figure A17 Distribution and Cumulative plot of pitch feature, APW_APALLBW
Figure A18 Distribution and Cumulative plot of pitch feature, AP1sSW_APALLBW
Figure A19 Distribution and Cumulative plot of pitch feature, MaxP_MaxP1SBW
Figure A20 Distribution and Cumulative plot of pitch feature, MaxP1sSW_MaxP1SBW
Figure A21 Distribution and Cumulative plot of pitch feature, MaxPW_MaxPAll
Figure A22 Distribution and Cumulative plot of pitch feature, MaxP1sSW_MaxPAll
Figure A23 Distribution and Cumulative plot of pitch feature, MaxP1sSW_MaxPAllBW
Figure A24 Distribution and Cumulative plot of pitch feature, MaxPW_MaxPAllBW
WUW: Wildfire

Feature               Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
APW_AP1SBW                   266      111      42        0       0      155      58
AP1sSW_AP1SBW                266      132      50        0       0      134      50
APW_APALL                    323       70      22        0       0      253      78
AP1sSW_APALL                 323       89      28        0       0      234      72
APW_APALLBW                  297       73      25        0       0      224      75
AP1sSW_APALLBW               297       97      33        0       0      200      67
MaxP_MaxP1SBW                266      175      66       12       5       79      30
MaxP1sSW_MaxP1SBW            266      141      53       12       5      113      42
MaxPW_MaxPAll                323       84      26        9       3      230      71
MaxP1sSW_MaxPAll             323       54      17       11       3      258      80
MaxP1sSW_MaxPAllBW           297       79      27        9       3      209      70
MaxPW_MaxPAllBW              297       97      18        0       0      200      79

Table A3 Pitch Feature Results of WUW Wildfire
Figure A25 Distribution and Cumulative plot of pitch feature, APW_AP1SBW, WUW: Wildfire
Figure A26 Distribution and Cumulative plot of pitch feature, AP1sSW_AP1SBW, WUW: Wildfire
Figure A27 Distribution and Cumulative plot of pitch feature, APW_APALL, WUW: Wildfire
Figure A28 Distribution and Cumulative plot of pitch feature, AP1sSW_APALL, WUW: Wildfire
Figure A29 Distribution and Cumulative plot of pitch feature, APW_APALLBW, WUW: Wildfire
Figure A30 Distribution and Cumulative plot of pitch feature, AP1sSW_APALLBW, WUW: Wildfire
Figure A31 Distribution and Cumulative plot of pitch feature, MaxP_MaxP1SBW, WUW: Wildfire
Figure A32 Distribution and Cumulative plot of pitch feature, MaxP1sSW_MaxP1SBW, WUW: Wildfire
Figure A33 Distribution and Cumulative plot of pitch feature, MaxPW_MaxPAll, WUW: Wildfire
Figure A34 Distribution and Cumulative plot of pitch feature, MaxP1sSW_MaxPAll, WUW: Wildfire
Figure A35 Distribution and Cumulative plot of pitch feature, MaxP1sSW_MaxPAllBW, WUW: Wildfire
Figure A36 Distribution and Cumulative plot of pitch feature, MaxPW_MaxPAllBW, WUW: Wildfire
Energy Feature Experimental Results
WUW: All WUWs

Feature               Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
AEW_AE1SBW                  1479     1164      79        0       0      315      21
AE1sSW_AE1SBW               1479     1283      84        1       0      240      16
AEW_AEAll                   2175     1059      49        9       9     1116      51
AE1sSW_AEAll                2175     1155      53        2       0     1018      47
AEW_AEAllBW                 1969     1427      72        0       0      542      28
AE1sSW_AEAllBW              1969     1562      79        3       0      404      21
MaxEW_MaxE1SBW              1479     1244      84       20       1      215      15
MaxE1sSW_MaxEAllBW          1479     1221      83       13       1      245      17
MaxEW_MaxEAll               2175     1373      63       13       1      245      17
MaxE1sSW_MaxEAll            2175     1336      61       25       1      814      37
MaxE1sSW_MaxEAllBW          1969     1209      61       16       1      744      38
MaxEW_MaxEAllBW             1969     1562      60        3       1      404      39

Table B1 Energy Feature Results of All WUWs
Figure B1 Distribution and Cumulative plot of energy feature, the average energy of WUW
Figure B2 Distribution and Cumulative plot of energy feature, the average energy of WUW
Figure B3 Distribution and Cumulative plot of energy feature, the average energy of WUW
Figure B4 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW
Figure B5 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW
Figure B6 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW
Figure B7 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW
Figure B8 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW
Figure B9 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW
Figure B10 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW
Figure B11 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW
Figure B12 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW
WUW: Operator

Feature               Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
AEW_AE1SBW                   275      228      83        0       0       47      17
AE1sSW_AE1SBW                275      258      94        0       0       17       6
AEW_AEAll                    418      248      59        0       0      170      41
AE1sSW_AEAll                 418      290      69        1       0      127      30
AEW_AEAllBW                  394      303      77        0       0       91      23
AE1sSW_AEAllBW               394      359      91        1       0       34       9
MaxEW_MaxE1SBW               275      240      87        1       0       34      12
MaxE1sSW_MaxEAllBW           275      243      88        0       0       32      12
MaxEW_MaxEAll                418      290      69        4       1      124      30
MaxE1sSW_MaxEAll             418      285      68        6       1      127      30
MaxE1sSW_MaxEAllBW           394      272      69        4       1      118      30
MaxEW_MaxEAllBW              394      359      68        1       1       34      30

Table B2 Energy Feature Results of WUW Operator
Figure B13 Distribution and Cumulative plot of energy feature, the average energy of WUW, Operator
Figure B14 Distribution and Cumulative plot of energy feature, the average energy of WUW, Operator
Figure B15 Distribution and Cumulative plot of energy feature, the average energy of WUW, Operator
Figure B16 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Operator
Figure B17 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Operator
Figure B18 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Operator
Figure B19 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Operator
Figure B20 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Operator
Figure B21 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Operator
Figure B22 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW, Operator
Figure B23 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Operator
Figure B24 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW, Operator
WUW: ThinkEngine

Feature               Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
AEW_AE1SBW                   293      182      62        0       0      111      38
AE1sSW_AE1SBW                293      194      66        1       0       98      33
AEW_AEAll                    414      159      38        0       0      255      62
AE1sSW_AEAll                 414      178      43        0       0      236      57
AEW_AEAllBW                  388      201      52        0       0      187      48
AE1sSW_AEAllBW               388      229      59        1       0      158      42
MaxEW_MaxE1SBW               293      209      71        3       1       81      28
MaxE1sSW_MaxEAllBW           293      195      67        5       2       93      32
MaxEW_MaxEAll                414      197      48        3       1      214      52
MaxE1sSW_MaxEAll             414      186      45        2       0      226      55
MaxE1sSW_MaxEAllBW           388      180      46        3       1      205      53
MaxEW_MaxEAllBW              388      229      45        1       1      158      54

Table B3 Energy Feature Results of WUW ThinkEngine
Figure B25 Distribution and Cumulative plot of energy feature, the average energy of WUW, ThinkEngine
Figure B26 Distribution and Cumulative plot of energy feature, the average energy of WUW, ThinkEngine
Figure B27 Distribution and Cumulative plot of energy feature, the average energy of WUW, ThinkEngine
Figure B28 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, ThinkEngine
Figure B29 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, ThinkEngine
Figure B30 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, ThinkEngine
Figure B31 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, ThinkEngine
Figure B32 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, ThinkEngine
Figure B33 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, ThinkEngine
Figure B34 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW, ThinkEngine
Figure B35 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, ThinkEngine
Figure B36 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW, ThinkEngine
WUW: Onword

Feature               Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
AEW_AE1SBW                   262      207      79        0       0       55      21
AE1sSW_AE1SBW                262      221      84        0       0       41      16
AEW_AEAll                    435      215      49        0       0      220      51
AE1sSW_AEAll                 435      226      52        0       0      209      48
AEW_AEAllBW                  389      306      79        0       0       83      21
AE1sSW_AEAllBW               389      327      84        0       0       62      16
MaxEW_MaxE1SBW               262      228      87        5       2       29      11
MaxE1sSW_MaxEAllBW           262      226      86        3       1       33      13
MaxEW_MaxEAll                435      229      69        2       0      134      31
MaxE1sSW_MaxEAll             435      295      68        3       1      137      31
MaxE1sSW_MaxEAllBW           389      261      67        2       1      126      32
MaxEW_MaxEAllBW              389      327      66        0       1       62      33

Table B4 Energy Feature Results of WUW Onword
Figure B37 Distribution and Cumulative plot of energy feature, the average energy of WUW, Onword
Figure B38 Distribution and Cumulative plot of energy feature, the average energy of WUW, Onword
Figure B39 Distribution and Cumulative plot of energy feature, the average energy of WUW, Onword
Figure B40 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Onword
Figure B41 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Onword
Figure B42 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Onword
Figure B43 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Onword
Figure B44 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Onword
Figure B45 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Onword
Figure B46 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW, Onword
Figure B47 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Onword
Figure B48 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW, Onword
WUW: Wildfire

Feature               Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
AEW_AE1SBW                   282      253      90        0       0       29      10
AE1sSW_AE1SBW                282      261      93        0       0       21       7
AEW_AEAll                    340      173      51        0       0      167      49
AE1sSW_AEAll                 340      185      54        0       0      155      46
AEW_AEAllBW                  298      252      85        0       0       46      15
AE1sSW_AEAllBW               298      265      89        0       0       33      11
MaxEW_MaxE1SBW               282      258      91        8       3       16       6
MaxE1sSW_MaxEAllBW           282      253      90        2       1       27      10
MaxEW_MaxEAll                340      230      68        4       1      106      31
MaxE1sSW_MaxEAll             340      219      64        4       1      117      34
MaxE1sSW_MaxEAllBW           298      195      65        4       1       99      33
MaxEW_MaxEAllBW              298      265      62        0       1       33      36

Table B5 Energy Feature Results of WUW Wildfire
Figure B49 Distribution and Cumulative plot of energy feature, the average energy of WUW, Wildfire
Figure B50 Distribution and Cumulative plot of energy feature, the average energy of WUW, Wildfire
Figure B51 Distribution and Cumulative plot of energy feature, the average energy of WUW, Wildfire
Figure B52 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Wildfire
Figure B53 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Wildfire
Figure B54 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Wildfire
Figure B55 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Wildfire
Figure B56 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Wildfire
Figure B57 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Wildfire
Figure B58 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW, Wildfire
Figure B59 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Wildfire
Figure B60 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW, Wildfire
WUW: Voyager

Feature               Valid Data   Pt > 0   % > 0   Pt = 0   % = 0   Pt < 0   % < 0
AEW_AE1SBW                   281      220      78        0       0       61      22
AE1sSW_AE1SBW                281      229      81        0       0       52      19
AEW_AEAll                    361      149      41        0       0      212      59
AE1sSW_AEAll                 361      161      45        1       0      199      55
AEW_AEAllBW                  325      207      64        0       0      118      36
AE1sSW_AEAllBW               325      222      68        1       0      102      31
MaxEW_MaxE1SBW               281      234      83        2       1       45      16
MaxE1sSW_MaxEAllBW           281      231      82        2       1       48      17
MaxEW_MaxEAll                361      172      48        5       1      184      51
MaxE1sSW_MaxEAll             361      167      46        7       2      187      52
MaxE1sSW_MaxEAllBW           325      148      46        3       1      174      54
MaxEW_MaxEAllBW              325      222      44        1       1      102      55

Table B6 Energy Feature Results of WUW Voyager
Figure B61 Distribution and Cumulative plot of energy feature, the average energy of WUW, Voyager
Figure B62 Distribution and Cumulative plot of energy feature, the average energy of WUW, Voyager
Figure B63 Distribution and Cumulative plot of energy feature, the average energy of WUW, Voyager
Figure B64 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Voyager
Figure B65 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Voyager
Figure B66 Distribution and Cumulative plot of energy feature, the average energy the first section in WUW, Voyager
Figure B67 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Voyager
Figure B68 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Voyager
Figure B69 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Voyager
Figure B70 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW, Voyager
Figure B71 Distribution and Cumulative plot of energy feature, the maximum energy of the WUW, Voyager
Figure B72 Distribution and Cumulative plot of energy feature, the maximum energy of the first section in WUW, Voyager
[Chart residue: voiced/unvoiced pitch-error and gross pitch-error (high/low) rates for the ECD, HPS, FBF, TPP, IFT, ASRFD, and eSRFD pitch determination techniques.]
[Figures: cumulative distribution plots (y-axis: percent of data, 0-100) of the relative energy features, each comparing WUW energy with leading-silence (LS) or whole-utterance (All) energy: (WUWAE-LSAE)/LSAE, (WUWAE-AllAE)/AllAE, (WUWAE-AllAE before WUW)/AllAE before WUW, (WUW1stAE-LSAE)/LSAE, (WUW1stAE-AllAE)/AllAE, (WUW1stAE-AllAE before WUW)/AllAE before WUW, (WUWMAXE-LSMAXE)/LSMAXE, (WUWMAXE-AllMAXE)/AllMAXE, (WUWMAXE-AllMAXE before WUW)/AllMAXE before WUW, (WUW1stMAXE-LSMAXE)/LSMAXE, and (WUW1stMAXE-AllMAXE before WUW)/AllMAXE before WUW.]
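The relative features named in these cumulative plots are normalized differences of energy measurements, and each curve reports what percentage of utterances fall at or below a given feature value. A minimal sketch of both computations (the function names and sample values are my own illustration, not the thesis code):

```python
import numpy as np

def average_energy(frames):
    """Mean of a sequence of short-time frame energies."""
    return float(np.asarray(frames, dtype=float).mean())

def relative_feature(wuw_energy, reference_energy):
    """Relative difference, e.g. (WUWAE - LSAE) / LSAE."""
    return (wuw_energy - reference_energy) / reference_energy

def cumulative_percent(values, grid):
    """Percentage of observations at or below each grid point
    (the y-axis of the cumulative plots, 0-100%)."""
    values = np.sort(np.asarray(values, dtype=float))
    return [100.0 * np.searchsorted(values, g, side="right") / len(values)
            for g in grid]

# Hypothetical per-utterance measurements (illustrative, not corpus data):
wuw_ae = [0.9, 1.4, 1.1]   # average energy inside the WUW
ls_ae = [0.2, 0.3, 0.5]    # average energy of the leading segment
feats = [relative_feature(w, r) for w, r in zip(wuw_ae, ls_ae)]
curve = cumulative_percent(feats, grid=[0, 1, 2, 3, 4])
```

Because each feature divides by the reference energy, the ratio is scale-invariant: any constant recording gain cancels out, which is what makes these features comparable across speakers and channels.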
[Equations: definitions of the analysis-segment sets x_n, y_n, and z_n used by the pitch tracker.]
[Equations: the normalized cross-correlation used by the SRFD/eSRFD pitch tracker to score candidate periods, with the voicing threshold T_srfd = 0.88 and the adaptive threshold max(0.75, 0.85 * p'(n)).]
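The SRFD/eSRFD formulation that these equations describe (Medan, 1991; Bagshaw, 1994) scores a candidate pitch period by the normalized cross-correlation of two adjacent analysis segments, accepting the period when the score exceeds a voicing threshold. A minimal sketch under that reading (function names are my own; the exact segment and threshold definitions are in the cited papers):

```python
import numpy as np

def normalized_cross_correlation(x, y):
    """rho = sum(x*y) / sqrt(sum(x^2) * sum(y^2)); 1.0 means identical shape."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.sqrt(np.sum(x * x) * np.sum(y * y))
    return float(np.sum(x * y) / denom)

def score_period(signal, start, n):
    """Correlate two adjacent length-n segments starting at `start`.
    A candidate period n is accepted when this score exceeds a voicing
    threshold (0.88 in the text above)."""
    x = signal[start : start + n]
    y = signal[start + n : start + 2 * n]
    return normalized_cross_correlation(x, y)
```

For a truly periodic signal, the two segments are identical at the correct period and the score is 1.0; at half the period they are in antiphase and the score approaches -1.0, which is why thresholding the correlation separates voiced periods from spurious candidates.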
[Figures: cumulative distribution plots (x-axis: -1 to 4; y-axis: %, No. of Data) of the relative pitch features, each comparing WUW pitch with leading-silence (LS) or whole-utterance (All) pitch: (WUWAP-LSAP)/LSAP, (WUW1stAP-LSAP)/LSAP, (WUWAP-AllAP)/AllAP, (WUW1stAP-AllAP)/AllAP, (WUWMAXP-LSMAXP)/LSMAXP, (WUW1stMAXP-LSMAXP)/LSMAXP, (WUWMAXP-AllMAXP)/AllMAXP, (WUW1stMAXP-AllMAXP before WUW)/AllMAXP before WUW, and (WUWMAXP-AllMAXP before WUW)/AllMAXP before WUW.]