![Page 1: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/1.jpg)
Information Extraction from Spoken Language
Dr Pierre Dumouchel
Scientific Vice-President, CRIM
Full Professor, ÉTS
![Page 2: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/2.jpg)
PUT RAW DATA NOW and then LINK DATA
• http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
![Page 3: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/3.jpg)
PUT RAW DATA NOW
• Text
• Data (numbers, statistics)
• Data (audio, video)
![Page 4: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/4.jpg)
LINKED DATA
• Information is in the relationships between data
• Find the relationships between them
![Page 5: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/5.jpg)
IBM’s Watson and Jeopardy
![Page 6: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/6.jpg)
Proposal
• Information Extraction in radio and television documents
  – Industrial Partners:
    • CEDROM-SNi
    • Irosoft
  – Universities and Research Centers:
    • CRIM
    • ÉTS
    • INRS-EMT
    • McGill
• NSERC Strategic Project Proposal
![Page 7: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/7.jpg)
Process Raw Audio Data
• Automatic Speech Recognition (ASR)
• Parsing
• Indexation
ASR → Parsing → Indexation
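The indexation step can be illustrated with a minimal sketch. It assumes the ASR stage has already produced time-stamped words; the `build_index` function, the document ids, and the sample words are hypothetical, and the parsing stage is skipped entirely.

```python
from collections import defaultdict

def build_index(transcripts):
    """Map each lowercased word to the (document id, start time) pairs where it occurs.

    `transcripts` maps a document id to a list of (word, start_seconds) pairs,
    standing in for the time-stamped output of an ASR system (an assumption).
    """
    index = defaultdict(list)
    for doc_id, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((doc_id, start))
    return dict(index)

# Hypothetical ASR output for two broadcasts.
transcripts = {
    "news_monday": [("Earthquake", 0.0), ("in", 0.6), ("Japan", 0.8)],
    "news_tuesday": [("Japan", 1.2), ("relief", 1.9), ("effort", 2.4)],
}
index = build_index(transcripts)
hits = index.get("japan", [])
print(hits)
```

Looking up a word returns every broadcast and time offset where it was recognized, which is what lets a user jump directly to the relevant audio.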
![Page 8: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/8.jpg)
Closed-captioning / Subtitling
VOICEWRITER
![Page 9: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/9.jpg)
Closed-captioning / Subtitling
• Done with the help of a VoiceWriter who:
  – Respeaks
  – Adds punctuation
  – Selects the proper dictionary
  – Does not speak during advertising
  – Condenses the information when more than one speaker speaks at the same time or when the speech rate is too fast
  – Translates
![Page 10: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/10.jpg)
How to process raw audio data?
ASR → Parsing → Indexation
Audio Diarization
Speaker Diarization
Speaker Recognition
Speaker Role
Punctuation
Structural Segmentation
Topic Recognition
![Page 11: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/11.jpg)
Audio Diarization
• Aims to segment an audio recording into acoustically homogeneous parts
  – Distinguish between speech and music
  – Distinguish between advertising and news
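As a rough illustration of the segmentation step, the sketch below assumes that some classifier (not shown) has already labelled each fixed-length frame, and only merges consecutive identical labels into homogeneous segments. The `merge_frames` name, the one-second frame length, and the labels are illustrative assumptions.

```python
from itertools import groupby

def merge_frames(labels, frame_seconds=1.0):
    """Collapse consecutive identical frame labels into (label, start, end) segments."""
    segments = []
    position = 0  # index of the first frame of the current run
    for label, run in groupby(labels):
        length = len(list(run))
        start = position * frame_seconds
        end = (position + length) * frame_seconds
        segments.append((label, start, end))
        position += length
    return segments

# Hypothetical per-frame decisions from an upstream speech/music classifier.
frame_labels = ["music", "music", "speech", "speech", "speech", "music"]
segments = merge_frames(frame_labels)
print(segments)
```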
![Page 12: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/12.jpg)
Speaker diarization
• Aims to segment a speech signal into its speech turns
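One simple way to picture this is greedy clustering of per-segment voice vectors. The sketch below is only an assumption-laden toy: the two-dimensional vectors, the `assign_speakers` function, and the 0.8 threshold are hypothetical, and real systems use much richer features and models.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign_speakers(embeddings, threshold=0.8):
    """Give each segment the id of the first known speaker that is similar enough,
    or open a new speaker id when none matches."""
    references = []  # one reference vector per discovered speaker
    labels = []
    for vector in embeddings:
        for speaker_id, reference in enumerate(references):
            if cosine(vector, reference) >= threshold:
                labels.append(speaker_id)
                break
        else:
            references.append(vector)
            labels.append(len(references) - 1)
    return labels

# Hypothetical voice vectors for four consecutive speech segments.
segment_vectors = [(1.0, 0.1), (0.1, 1.0), (0.9, 0.2), (0.0, 1.0)]
speaker_labels = assign_speakers(segment_vectors)
print(speaker_labels)
```

A change of label between two consecutive segments marks a speech turn.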
![Page 13: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/13.jpg)
Speaker Recognition
![Page 14: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/14.jpg)
Speaker Role
• In broadcast news speech, most speech is from anchors and reporters. The remainder comes from excerpts of quotations or interviews, which are referred to as sound bites.
• Detecting speaker role is important to improve:
  – the acoustic speech recognizer
  – information extraction
![Page 15: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/15.jpg)
Punctuation
• Some language analysis tasks, such as parsing and entity extraction, need punctuation (periods and commas) in order to work properly.
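A common baseline for restoring punctuation in ASR output is to look at the silence between words. The sketch below assumes time-stamped words are available; the `punctuate` function and both pause thresholds are illustrative assumptions, not values from the project.

```python
def punctuate(words, comma_pause=0.3, period_pause=0.7):
    """Insert commas and periods based on the silence between consecutive words.

    `words` is a list of (token, start_seconds, end_seconds) tuples.
    """
    pieces = []
    capitalize_next = True
    for i, (token, start, end) in enumerate(words):
        text = token[:1].upper() + token[1:] if capitalize_next else token
        capitalize_next = False
        if i + 1 < len(words):
            pause = words[i + 1][1] - end  # silence before the next word
            if pause >= period_pause:
                text += "."
                capitalize_next = True
            elif pause >= comma_pause:
                text += ","
        else:
            text += "."  # always close the final sentence
        pieces.append(text)
    return " ".join(pieces)

# Hypothetical time-stamped ASR output.
words = [
    ("the", 0.0, 0.2), ("minister", 0.2, 0.8), ("resigned", 0.8, 1.4),
    ("today", 1.5, 1.9),
    ("however", 2.8, 3.3), ("he", 3.7, 3.8), ("stays", 3.8, 4.2),
]
print(punctuate(words))
```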
![Page 16: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/16.jpg)
Structural Segmentation
• Sentence, paragraph, and story segmentation are important for speech understanding applications, starting with parsing and information extraction at the basic level.
• This problem is absent in text processing but has to be solved in speech processing.
![Page 17: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/17.jpg)
Topic Spotting
• Aims to identify the topic of a speech signal. This is useful for adapting the different components of the system and for adding metatags to the speech signal.
• Example: La belle ferme le voile
  – La: the, her
  – Belle: beautiful, beauty
  – Ferme: farm, closes
  – Le: the, his
  – Voile: veil, blocks the view
  – Two hypothetical translations:
    • The veil is closed by the beauty
    • The beautiful farm blocks his view
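The example above can be turned into a small sketch of how a spotted topic might pick between the two readings. The keyword sets, topic names, and the `spot_topic` function are all hypothetical; a real system would adapt its language model rather than look up a fixed translation.

```python
# Hypothetical keyword sets for two topics (assumptions, not from the talk).
TOPIC_KEYWORDS = {
    "agriculture": {"champ", "vache", "récolte", "tracteur"},
    "fashion": {"robe", "mariée", "dentelle", "défilé"},
}

# The two readings given on the slide, keyed by the topic that favours each.
READINGS = {
    "agriculture": "The beautiful farm blocks his view",
    "fashion": "The veil is closed by the beauty",
}

def spot_topic(context_words):
    """Return the topic whose keyword set overlaps most with the context words.

    Ties are resolved in favour of the topic listed first in TOPIC_KEYWORDS.
    """
    lowered = {w.lower() for w in context_words}
    scores = {topic: len(keywords & lowered)
              for topic, keywords in TOPIC_KEYWORDS.items()}
    return max(scores, key=scores.get)

# Words recognized around the ambiguous sentence (hypothetical).
context = ["la", "vache", "traverse", "le", "champ"]
topic = spot_topic(context)
reading = READINGS[topic]
print(topic, "->", reading)
```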
![Page 18: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/18.jpg)
How to improve Information Extraction from speech?
By improving ASR Components
![Page 19: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/19.jpg)
Automatic Speech Recognizer
• Performance drops when there are:
  – Out-of-vocabulary words (Lexical models)
  – Multiple users (Acoustic models)
  – Multiple microphones (Acoustic models)
  – Multiple topics (Language models)
  – Overlapping speech, or cross-talk (All models)
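The out-of-vocabulary problem is easy to quantify. The sketch below measures the fraction of words in a reference text that a recognizer's lexicon does not cover; the tiny lexicon, the sample sentence, and the `oov_rate` function are illustrative assumptions.

```python
def oov_rate(tokens, lexicon):
    """Return the fraction of tokens missing from the lexicon, plus the missing words."""
    missing = [t for t in tokens if t.lower() not in lexicon]
    rate = len(missing) / len(tokens) if tokens else 0.0
    return rate, sorted(set(missing))

# Hypothetical recognizer lexicon and reference sentence.
lexicon = {"the", "reactor", "at", "was", "damaged"}
tokens = "The reactor at Fukushima was damaged".split()
rate, missing = oov_rate(tokens, lexicon)
print(f"OOV rate: {rate:.2%}, missing: {missing}")
```

A word absent from the lexicon can never be recognized correctly, so every out-of-vocabulary word produces at least one recognition error.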
![Page 20: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/20.jpg)
How to improve Information Extraction from speech?
• More data are better data.
• More similar data are better data. Similar in terms of:
  – Topic
  – Coming from the same time period. Specifically, more recent.
    • Example: Japan
  – Prediction of what will happen and who will speak.
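One way to favour recent data, sketched here under stated assumptions, is to down-weight older documents when estimating word probabilities. The exponential half-life weighting, the `recency_weighted_unigrams` function, and the sample texts are hypothetical choices, not the method used in the project.

```python
from collections import Counter

def recency_weighted_unigrams(documents, half_life_days=7.0):
    """Estimate unigram probabilities where a document's weight halves
    every `half_life_days`.

    `documents` is a list of (age_in_days, text) pairs.
    """
    weighted = Counter()
    for age_days, text in documents:
        weight = 0.5 ** (age_days / half_life_days)
        for word in text.lower().split():
            weighted[word] += weight
    total = sum(weighted.values())
    return {word: count / total for word, count in weighted.items()}

# Hypothetical training texts: one from today, one from two weeks ago.
documents = [
    (0, "tsunami warning japan"),
    (14, "election results japan"),
]
probs = recency_weighted_unigrams(documents)
print(probs)
```

With these weights, words from today's text receive four times the mass of words seen only two weeks ago, so the language model leans toward current events.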
![Page 21: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/21.jpg)
More data are better data
• Use of the huge amount of web information
• Use supercomputer infrastructure in order to model it in a reasonable time:
  – Compute Canada infrastructure: CLUMEQ
  – Cluster of university computers
![Page 22: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/22.jpg)
More similar data are better data
• Exploiting redundancies in different media information:
  – Anchor speech is predominant.
  – Reporters often appear at specific times, day after day.
  – Advertisements appear (and repeat) near specific time slots, day after day.
  – The same news is often reused from one medium to another.
![Page 23: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/23.jpg)
Exploiting redundancies in different media information
![Page 24: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/24.jpg)
Exploiting redundancies in different media information
![Page 25: Information Extraction from Spoken Language Dr Pierre Dumouchel Scientific Vice-President, CRIM Full Professor, ÉTS](https://reader035.vdocuments.us/reader035/viewer/2022062619/55181a21550346a7318b4790/html5/thumbnails/25.jpg)
And then ….
ASR → Parsing → Indexation
Audio Diarization
Speaker Diarization
Speaker Recognition
Speaker Role
Punctuation
Structural Segmentation
Topic Recognition