![Page 1: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/1.jpg)
Speaker: Hung-yi Lee
Towards Spoken Knowledge Structuring and Organization
When Speech Processing Technology Meets MOOCs
![Page 2](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/2.jpg)
Introduction
2012 was the year of the massive open online course (MOOC).
Instructors post recorded video/audio of their lectures on online lecture platforms.
Learners worldwide can easily access the curricula.
More learning materials become available.
![Page 3](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/3.jpg)
Too much material ……
Learner
I want to learn “SVM”.
![Page 4](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/4.jpg)
A course is too much
Learner
I want to learn “SVM”.
Machine Learning (by Andrew Ng)
XII. Support Vector Machines (Week 7)
Learning From Data (by Yaser S. Abu-Mostafa)
Lecture 14: Support Vector Machines
Lecture 15: Kernel Methods
Machine Learning Foundations (機器學習基石, by Hsuan-Tien Lin 林軒田)
![Page 5](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/5.jpg)
A course is not enough
Interdisciplinary topics
Learner
I want to learn “amino acid”.
Introduction to Biology: Amino acid
Organic Chemistry: Amino acid
![Page 6](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/6.jpg)
Vision: Personalized Courses
Learner
I want to learn “XXX”.
I am a graduate student in computer science.
I can spend 10 hours.
I will open a course for you.
Online learning material
Spoken language processing techniques can be very helpful.
The spoken content in courses plays the most important role in conveying the knowledge.
![Page 7](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/7.jpg)
Outline
Part I: Overview of each block in spoken knowledge structuring and organization
Speech Recognition
Temporal Structure
Spoken content retrieval
Linking related lectures
Speech summarization
Knowledge graph construction
Inferring prerequisite and advanced concepts
Part II: Spoken Content Retrieval
Part III: Speech Summarization
Part IV: Demo
![Page 8](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/8.jpg)
Part I: Overview
![Page 9](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/9.jpg)
Speech Recognition
[System diagram: Multimedia (Audio/Video) → (Multilingual) Speech Recognition → Recognition Output]
![Page 10](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/10.jpg)
Speech Recognition
Lectures on Coursera and edX have manual transcriptions.
Most lectures on the Internet do not have transcriptions.
Speech recognition!
![Page 11](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/11.jpg)
Speech Recognition
Speech recognition is the foundation of the speech techniques that follow.
[Diagram: Spoken Lectures → Speech Recognition → “DNA is the material of ……”]
Models: Hidden Markov Model, N-gram Language Models
![Page 12](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/12.jpg)
Speech Recognition
Speech recognition is the foundation of the speech techniques that follow.
[Diagram: Spoken Lectures → Speech Recognition → “DNA is the material of ……”]
Models: N-gram Language Models, Deep Neural Network
![Page 13](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/13.jpg)
Speech Recognition
Speech recognition is the foundation of the speech techniques that follow.
[Diagram: Spoken Lectures → Speech Recognition → “DNA is the material of ……”]
Models: Deep Neural Network, Continuous Language Model
![Page 14](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/14.jpg)
Multi-layer Temporal Structure
[System diagram: Multimedia (Audio/Video) → (Multilingual) Speech Recognition → Recognition Output; new block: Temporal Structure]
![Page 15](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/15.jpg)
Multi-layer temporal structure
[Diagram: course → chapters (ch 1, ch 2, ……) → lectures]
Learner: Too long ……
![Page 16](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/16.jpg)
Multi-layer temporal structure
[Diagram: course → chapters (ch 1, ch 2, ……) → lectures → sections → paragraphs (Paragraph 1, Paragraph 2, Paragraph 3, ……) → audio; each paragraph consists of several utterances]
Sections and paragraphs are not directly available.
Find them automatically.
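The slide says only that sections and paragraphs must be found automatically. It does not say how. The sketch below shows one illustrative possibility: a TextTiling-style lexical-cohesion heuristic, substituted here and not necessarily the method used in the talk. It places a paragraph boundary wherever the words just before a point in the transcript overlap little with the words just after it. `segment_paragraphs`, `window`, and `threshold` are hypothetical names and values.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_paragraphs(utterances, window=2, threshold=0.1):
    """Group consecutive transcribed utterances into paragraphs (hypothetical helper).

    A boundary is placed before utterance i when the words in the `window`
    utterances before i overlap little with those in the `window` utterances
    starting at i.
    """
    if not utterances:
        return []
    bags = [Counter(u.lower().split()) for u in utterances]
    paragraphs, current = [], [utterances[0]]
    for i in range(1, len(utterances)):
        left = sum(bags[max(0, i - window):i], Counter())
        right = sum(bags[i:i + window], Counter())
        if cosine(left, right) < threshold:  # low lexical cohesion: new paragraph
            paragraphs.append(current)
            current = []
        current.append(utterances[i])
    paragraphs.append(current)
    return paragraphs
```

For example, three utterances about DNA replication followed by two about CPU instructions split into two paragraphs at the topic change.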
![Page 17](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/17.jpg)
Spoken Content Retrieval
[System diagram: Multimedia (Audio/Video) → (Multilingual) Speech Recognition → Recognition Output; blocks: Temporal Structure; new block: Spoken Content Retrieval]
![Page 18](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/18.jpg)
Spoken Content Retrieval – Goal
Basic goal: return the paragraphs or sections containing the keywords.
This is called “Spoken Term Detection” (口語詞彙偵測).
[Diagram: a user queries “DNA” against audio organized into sections and paragraphs (Paragraph 1, Paragraph 2, Paragraph 3, ……), each consisting of several utterances]
![Page 19](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/19.jpg)
Spoken Content Retrieval – Goal
Basic goal: return the paragraphs or sections containing the keywords.
This is called “Spoken Term Detection” (口語詞彙偵測).
Advanced goal: semantic retrieval of spoken content
User query: “Inheritance Material”
Retrieval system: I know that the user is looking for “DNA”.
![Page 20](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/20.jpg)
Visualizing Search Results
[System diagram: Multimedia (Audio/Video) → (Multilingual) Speech Recognition → Recognition Output; blocks: Temporal Structure, Spoken Content Retrieval; new blocks: Easy to browse, Linking Similar Lectures]
![Page 21](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/21.jpg)
Learner
Search: “DNA”
With spoken content retrieval, we can use keywords to search for related lectures.
[Diagram: results for “DNA” from Course A (Lecture 8-2, Lecture 8-3, Lecture 8-4), Course B (Lecture 5-4, Lecture 5-5, Lecture 5-6), and Electronic Textbooks (Inheritance)]
![Page 22](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/22.jpg)
Linking
Linking lectures with similar content
Compute the similarity between lectures in courses and sections in textbooks.
Merge the materials with high cosine similarity.
[Diagram: for the learner's query “DNA”, lectures from Course A (Lecture 8-2, 8-3, 8-4) and Course B (Lecture 5-4, 5-5, 5-6) are linked with the textbook sections Inheritance, DNA Replication, and DNA Structure]
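Here is a minimal sketch of the linking step. It assumes each lecture transcript or textbook section is represented as a raw bag of words. The slide specifies cosine similarity but not the term weighting, so a real system would more likely use TF-IDF weights and stop-word removal. Materials whose similarity reaches a hypothetical `threshold` are merged into one group with union-find. `merge_similar` is an illustrative name.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_similar(materials, threshold=0.3):
    """Group materials (name -> text) whose pairwise cosine similarity
    reaches `threshold`; hypothetical helper built on union-find."""
    names = list(materials)
    bags = [Counter(materials[n].lower().split()) for n in names]
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if cosine(bags[i], bags[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())
```

For example, two lectures that both talk about DNA replication and the double helix end up in one group. An unrelated textbook section about passing traits to offspring stays on its own.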
![Page 23](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/23.jpg)
Speech Summarization
[System diagram: Multimedia (Audio/Video) → (Multilingual) Speech Recognition → Recognition Output; blocks: Temporal Structure, Spoken Content Retrieval, Easy to browse, Linking Similar Lectures; new blocks: Syntactic and Semantic Analysis, Speech Summarization]
![Page 24](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/24.jpg)
Speech Summarization
Audio is hard to browse.
Select the most informative segments to form a compact version.
[Diagram: Lecture (40 minutes) → Summary (1 minute)]
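The slide states the goal (keep the most informative segments) but does not say how informativeness is measured. The sketch below assumes a deliberately crude proxy: the average lecture-wide frequency of a segment's words. It then fills a duration budget greedily. `summarize` and `budget_sec` are hypothetical names, and practical speech summarizers use much richer features.

```python
from collections import Counter


def summarize(segments, budget_sec):
    """Extractive summary of one lecture (hypothetical helper).

    segments: list of (start_sec, duration_sec, transcript) tuples.
    Each segment is scored by the average lecture-wide frequency of its
    words; the highest-scoring segments are kept while they fit into
    `budget_sec`, then returned in time order.
    """
    tf = Counter(w for _, _, text in segments for w in text.lower().split())

    def score(text):
        words = text.lower().split()
        return sum(tf[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(segments, key=lambda s: score(s[2]), reverse=True)
    chosen, used = [], 0.0
    for seg in ranked:
        if used + seg[1] <= budget_sec:  # keep the segment only if it still fits
            chosen.append(seg)
            used += seg[1]
    return sorted(chosen)  # restore temporal order
```

Consider a toy lecture with a greeting, two segments about DNA and RNA, and a break announcement. With a 60-second budget, only the two DNA/RNA segments are kept, because their words recur across the lecture.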
![Page 25](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/25.jpg)
Knowledge Graph Construction
[System diagram: Multimedia (Audio/Video) → (Multilingual) Speech Recognition → Recognition Output; blocks: Temporal Structure, Spoken Content Retrieval, Easy to browse, Linking Similar Lectures, Syntactic and Semantic Analysis, Speech Summarization; new blocks: Key Term Extraction, Knowledge Graph]
![Page 26](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/26.jpg)
Knowledge Graph Construction - Keyterm Extraction
Knowledge graph construction
Keyterm extraction
[Diagram: Audio transcriptions of video clips → Keyterm Extraction → DNA, Inheritance, RNA, Adenosine triphosphate, ……]
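Keyterm extraction is shown only as a black box. One common baseline, assumed here rather than taken from the talk, ranks words by total term frequency times inverse document frequency across the lecture transcripts. Under this scoring, words that recur within a few lectures outrank both rare words and words used everywhere. `keyterms` and `min_count` are hypothetical names.

```python
import math
from collections import Counter


def keyterms(transcripts, k=5, min_count=2):
    """Rank candidate key terms across lecture transcripts (hypothetical helper).

    score(w) = total count of w * log(N / number of transcripts containing w).
    Words seen fewer than `min_count` times, or in every transcript
    (score 0), are dropped.
    """
    docs = [Counter(t.lower().split()) for t in transcripts]
    tf = sum(docs, Counter())                     # total counts
    df = Counter(w for d in docs for w in d)      # document frequencies
    n = len(docs)
    scores = {w: c * math.log(n / df[w]) for w, c in tf.items() if c >= min_count}
    ranked = sorted((w for w, s in scores.items() if s > 0), key=scores.get, reverse=True)
    return ranked[:k]
```

Frequent function words such as "the" get an IDF of zero when they occur in every transcript, so they drop out without a stop-word list.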
![Page 27](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/27.jpg)
Knowledge Graph Construction - Relation Extraction
Knowledge graph construction
Keyterm extraction
Find relations between keyterms
Transcriptions: “As DNA encodes RNA, it is the material of inheritance.”
[Pipeline: Transcriptions → Co-reference Resolution]
![Page 28](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/28.jpg)
Knowledge Graph Construction - Relation Extraction
Knowledge graph construction
Keyterm extraction
Find relations between keyterms
Transcriptions: “As DNA encodes RNA, it is the material of inheritance.”
[Pipeline: Transcriptions → Co-reference Resolution (“it” → DNA) → Syntactic Parsing (syntactic parsing tree)]
![Page 29](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/29.jpg)
Knowledge Graph Construction - Relation Extraction
Knowledge graph construction
Keyterm extraction
Find relations between keyterms
[Pipeline: Transcriptions → Co-reference Resolution → Syntactic Parsing (syntactic parsing tree) → Relation Extraction → relation] [Mausam, EMNLP’12]
![Page 30](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/30.jpg)
Knowledge Graph Construction - Relation Extraction
Knowledge graph construction
Keyterm extraction
Find relations between keyterms
[Knowledge graph with nodes: DNA, RNA, Protein, Genome, Gene, Nucleotide, Inheritance]
![Page 31](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/31.jpg)
Inferring Learning Path
[System diagram: Multimedia (Audio/Video) → (Multilingual) Speech Recognition → Recognition Output; blocks: Temporal Structure, Spoken Content Retrieval, Easy to browse, Linking Similar Lectures, Syntactic and Semantic Analysis, Speech Summarization, Key Term Extraction, Knowledge Graph; new block: Inferring Learning Path]
![Page 32](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/32.jpg)
Inferring Learning Path
Inferring prerequisite and advanced concepts
Construct a knowledge graph.
[Knowledge graph with nodes DNA, RNA, Protein, Genome, Gene, Nucleotide, Inheritance; concepts annotated as first mentioned in Lecture 1, Lecture 3, and Lecture 10]
Analyze the positions where the concepts are first mentioned in a course.
![Page 33](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/33.jpg)
Inferring Learning Path
Inferring prerequisite and advanced concepts
Construct a knowledge graph.
[Knowledge graph with nodes DNA, RNA, Protein, Genome, Gene, Nucleotide, Inheritance; Inheritance first mentioned in Lecture 1, DNA in Lecture 3, RNA in Lecture 10]
“Inheritance” is a prerequisite concept of “DNA”.
“RNA” is an advanced concept of “DNA”.
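The rule on these slides can be written down directly. For two related concepts in the knowledge graph, the one first mentioned earlier in the course is treated as the prerequisite, and the other as the advanced concept. The minimal sketch below uses the lecture numbers from the slide's example. `infer_learning_path` is a hypothetical name, and leaving ties undecided is an assumption.

```python
def infer_learning_path(edges, first_mention):
    """Infer (prerequisite, advanced) pairs from first-mention positions.

    edges: related concept pairs taken from the knowledge graph.
    first_mention: concept -> index of the lecture where it first appears.
    Pairs first mentioned in the same lecture are left undecided.
    """
    pairs = []
    for a, b in edges:
        if first_mention[a] < first_mention[b]:
            pairs.append((a, b))
        elif first_mention[b] < first_mention[a]:
            pairs.append((b, a))
    return pairs
```

With first mentions Inheritance = 1, DNA = 3, and RNA = 10, the edges (DNA, Inheritance) and (DNA, RNA) yield (Inheritance, DNA) and (DNA, RNA). That is, Inheritance is a prerequisite of DNA, and RNA is an advanced concept of DNA.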
![Page 34](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/34.jpg)
Part II: Spoken Content Retrieval
![Page 35](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/35.jpg)
People think ……
Spoken Content Retrieval = Speech Recognition + Text Retrieval
![Page 36](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/36.jpg)
Speech Recognition + Text Retrieval
[Diagram: Spoken Lectures → Speech Recognition (with Models) → Text → Text Retrieval (with the learner's Query) → Retrieval Result]
Spoken Content Retrieval = Speech Recognition + Text Retrieval
![Page 37](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/37.jpg)
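Speech Recognition + Text Retrieval
Speech recognition has uncertainty.
[Diagram: for the user's query “DNA”, recognized segments containing “DNA” (x1, x2) receive confidence scores R(x1) = 0.9 and R(x2) = 0.3 and are ranked x1 (0.9), x2 (0.3), …]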
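The sketch below makes R(x) concrete. It assumes each segment comes with an N-best list of recognition hypotheses and their posterior probabilities, a simplification of the lattices used later in the talk. R(x) is taken as the total probability of the hypotheses that contain the query, and segments are ranked by it. `confidence` and `retrieve` are hypothetical names.

```python
def confidence(nbest, query):
    """R(x): total posterior probability of the hypotheses containing the query.

    nbest: list of (hypothesis_text, posterior_probability) for one segment.
    """
    q = query.lower()
    return sum(p for text, p in nbest if q in text.lower().split())


def retrieve(segments, query):
    """Return (segment_id, R(x)) for segments with R(x) > 0, best first.

    segments: segment_id -> N-best list.
    """
    scored = [(seg_id, confidence(nbest, query)) for seg_id, nbest in segments.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)
```

For instance, suppose a segment's hypotheses "dna is the material" (0.6) and "dna is a material" (0.3) both contain "dna". That segment gets R(x) = 0.9, while a segment where only a 0.3-probability hypothesis contains "dna" gets 0.3.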
![Page 38](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/38.jpg)
Is the problem solved?
[Diagram: Spoken Lectures → Speech Recognition (with Models) → Text → Text Retrieval (with the learner's Query) → Retrieval Result]
Retrieval performance degrades seriously because of inevitable recognition errors.
In real applications, speech recognition accuracy can be low.
![Page 39](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/39.jpg)
Is the problem solved?
[Diagram: Spoken Lectures → Speech Recognition (with Models) → Text → Text Retrieval (with the learner's Query) → Retrieval Result]
To make retrieval performance less limited by recognition errors, we need new ideas beyond cascading speech recognition and text retrieval.
![Page 40](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/40.jpg)
My point
Spoken Content Retrieval ≠ Speech Recognition + Text Retrieval
![Page 41](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/41.jpg)
Beyond Cascading Speech Recognition and Text Retrieval
Incorporating Information Lost in Standard Speech Recognition
Improving Recognition Models by User Relevance Feedback
Query Expansion with Speech Signals
Spoken Content Retrieval without Speech Recognition
Interactive Retrieval
![Page 42](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/42.jpg)
![Page 43](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/43.jpg)
Similarity
Is it truly “DNA”?
[Diagram: Speech Recognition with Recognition Models (Acoustic Models, Language Model)]
The recognition models rely on lots of inaccurate assumptions.
![Page 44](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/44.jpg)
Similarity
Is it truly “DNA”?
[Diagram: the segment is compared by similarity with other examples of “DNA”]
It is not realistic to find examples for all queries.
Use Pseudo-relevance Feedback (PRF).
![Page 45](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/45.jpg)
Pseudo Relevance Feedback (PRF)
[Diagram: Query Q → Retrieval System (on the Recognition Output) → First-pass Retrieval Result: x1, x2, x3]
![Page 46](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/46.jpg)
Pseudo Relevance Feedback (PRF)
[Diagram: Query Q → Retrieval System (on the Recognition Output) → First-pass Retrieval Result: x1, x2, x3 with confidence scores R(x1), R(x2), R(x3)]
The first-pass result is not shown to the user.
![Page 47](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/47.jpg)
Pseudo Relevance Feedback (PRF)
[Diagram: Query Q → Retrieval System (on the Recognition Output) → First-pass Retrieval Result: x1, x2, x3 with confidence scores R(x1), R(x2), R(x3)]
Assume the results with high confidence scores are correct.
These results are considered as examples of Q.
![Page 48](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/48.jpg)
Pseudo Relevance Feedback (PRF)
[Diagram: each first-pass result x1, x2, x3 (scores R(x1), R(x2), R(x3)) is compared with the examples of Q and judged similar or dissimilar]
![Page 49](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/49.jpg)
Pseudo Relevance Feedback (PRF)
[Diagram: first-pass results x1, x2, x3 (scores R(x1), R(x2), R(x3)) compared with the examples of Q; result timestamps shown: time 1:01, time 2:05, time 1:45, …, time 2:16, time 7:22, time 9:01]
Rank according to the new scores.
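The PRF loop above can be sketched as follows, under stated assumptions. The top `n_examples` first-pass results serve as pseudo examples of Q. Each result's new score is a linear interpolation, with weight `alpha`, between its first-pass score and its mean similarity to those examples. The slides do not give the exact scoring formula, and this sketch uses only the "similar" side (no dissimilar, negative examples). For audio, `similarity` would compare feature-vector sequences, for example with DTW, but any function works here. All names are hypothetical.

```python
def prf_rerank(results, similarity, n_examples=2, alpha=0.5):
    """Pseudo-relevance feedback re-ranking (hypothetical helper).

    results: list of (segment_id, first_pass_score, features).
    The n_examples results with the highest first-pass scores are taken as
    pseudo examples of the query; each result's new score interpolates its
    first-pass score with its mean similarity to those examples.
    """
    examples = sorted(results, key=lambda r: r[1], reverse=True)[:n_examples]
    rescored = []
    for seg_id, score, feats in results:
        sim = sum(similarity(feats, ex[2]) for ex in examples) / len(examples)
        rescored.append((seg_id, (1 - alpha) * score + alpha * sim))
    return sorted(rescored, key=lambda r: r[1], reverse=True)
```

As a toy case, use scalar stand-in features and the similarity 1 / (1 + |a - b|). A result with a low first-pass score but features close to the top results moves up past a result whose features are far away.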
![Page 50](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/50.jpg)
Similarity between Audio Segments
How to compute the similarity of two audio segments?
Use a feature vector to represent a short time span.
![Page 51](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/51.jpg)
Similarity between Audio Segments
How to compute the similarity of two audio segments?
An audio segment is a sequence of feature vectors.
![Page 52](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/52.jpg)
Similarity between Audio Segments
Dynamic Time Warping (DTW)
[Figure: alignment between the feature-vector sequences of two audio segments]
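Dynamic time warping finds the cheapest monotonic alignment between two sequences of feature vectors that may differ in length. Below is a minimal textbook sketch. It assumes Euclidean frame distance and the three standard step moves. It also assumes a hypothetical mapping from length-normalized distance to a similarity in (0, 1]. The talk does not specify these details. `math.dist` requires Python 3.8 or later.

```python
import math


def dtw_distance(a, b):
    """Minimum cumulative Euclidean distance over monotonic alignments of
    two sequences of feature vectors (classic dynamic time warping)."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b frame repeats
                                 cost[i][j - 1],      # b advances, a frame repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def dtw_similarity(a, b):
    """Map the length-normalized DTW distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + dtw_distance(a, b) / (len(a) + len(b)))
```

A slower rendition of the same feature trajectory aligns at zero cost. This property is what makes DTW suitable for comparing two utterances of "DNA" spoken at different speeds.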
![Page 53](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/53.jpg)
Pseudo Relevance Feedback (PRF) - Experiments
Corpus: Digital Speech Processing (DSP) course of NTU; first-pass retrieval based on lattices
[Bar charts: (A) and (B)]
Evaluation measure: MAP (Mean Average Precision)
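MAP averages the average precision (AP) of each query's ranked list over all queries. AP is the mean of precision@k taken at the rank k of every relevant item, and relevant items that are never retrieved contribute zero. Here is a minimal sketch with hypothetical function names:

```python
def average_precision(ranked, relevant):
    """AP of one ranked list: mean of precision@k over the ranks k of
    relevant items, divided by the total number of relevant items."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at this relevant item's rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For a ranked list x1, x2, x3 in which x1 and x3 are relevant, AP = (1/1 + 2/3) / 2 = 5/6.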
![Page 54](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/54.jpg)
Pseudo Relevance Feedback (PRF) - Experiments
[Bar charts: (A) and (B)]
(A) and (B) use different speech recognition systems:
(A): speaker dependent (84% recognition accuracy)
(B): speaker independent (50% recognition accuracy)
![Page 55](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/55.jpg)
Pseudo Relevance Feedback (PRF) - Experiments
[Bar charts: (A) and (B)]
PRF (red bars) improved on the first-pass retrieval results with lattices (blue bars).
![Page 56](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/56.jpg)
Graph-based Approach
In PRF, each result considers only its similarity to some examples.
Instead, consider the similarity between all results.
This is formulated as a problem on a graph.
![Page 57](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/57.jpg)
Graph Construction
The first-pass results are considered as a graph.
Each retrieval result is a node.
[Diagram: First-pass Retrieval Result from lattices (x1, x2, x3, ……) → graph nodes x1 to x5]
![Page 58](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/58.jpg)
Graph Construction
The first-pass results are considered as a graph.
Nodes are connected if their retrieval results are similar, as measured by Dynamic Time Warping (DTW) similarity.
[Diagram: nodes x1 to x5, with edges between similar nodes]
![Page 59](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/59.jpg)
Changing Confidence Scores by Graph
The score of each node depends on its neighbors.
[Diagram: nodes x1 to x5 with scores R(x1) to R(x5); neighboring scores marked “high”]
“Near vermilion, one turns red; near ink, one turns black.” (近朱者赤，近墨者黑)
![Page 60](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/60.jpg)
Changing Confidence Scores by Graph
The score of each node depends on its neighbors.
[Diagram: nodes x1 to x5 with scores R(x1) to R(x5); neighboring scores marked “low”]
“Near vermilion, one turns red; near ink, one turns black.” (近朱者赤，近墨者黑)
![Page 61](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/61.jpg)
Changing Confidence Scores by Graph
The score of each node depends on its connected nodes.
[Diagram: graph with nodes x1 to x5]
The score of x1 depends on the scores of x2 and x3.
![Page 62](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/62.jpg)
Changing Confidence Scores by Graph
The score of each node depends on its connected nodes.
[Diagram: graph with nodes x1 to x5]
The score of x1 depends on the scores of x2 and x3.
The score of x2 depends on the scores of x1 and x3.
![Page 63](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/63.jpg)
Changing Confidence Scores by Graph
The score of each node depends on its connected nodes.
[Diagram: graph with nodes x1 to x5]
The score of x1 depends on the scores of x2 and x3.
The score of x2 depends on the scores of x1 and x3.
……
The scores are found by a random walk algorithm.
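The slides say only that the scores are found by a random walk. The minimal sketch below assumes a PageRank-style formulation. Each node's score mixes two parts: its normalized first-pass score R(x), with weight 1 - `alpha`, and the scores flowing in from its neighbors along edges weighted by similarity (for example DTW similarity). The update is iterated a fixed number of times. `alpha`, `iters`, and `random_walk_rerank` are hypothetical.

```python
def random_walk_rerank(first_pass, edges, alpha=0.8, iters=100):
    """Graph-based re-ranking of first-pass results (hypothetical helper).

    first_pass: node -> first-pass confidence score R(x), assumed positive.
    edges: (u, v, weight) for pairs of similar results (undirected).
    Each iteration sets score(v) = (1 - alpha) * prior(v)
    + alpha * sum over neighbors u of score(u) * w(u, v) / out_weight(u).
    """
    nodes = list(first_pass)
    total = sum(first_pass.values())
    prior = {v: first_pass[v] / total for v in nodes}  # normalized R(x)
    nbrs = {v: {} for v in nodes}
    for u, v, w in edges:
        nbrs[u][v] = w
        nbrs[v][u] = w
    out = {v: sum(nbrs[v].values()) for v in nodes}
    score = dict(prior)
    for _ in range(iters):
        # The comprehension reads the previous iteration's scores.
        score = {
            v: (1 - alpha) * prior[v]
            + alpha * sum(score[u] * w / out[u] for u, w in nbrs[v].items())
            for v in nodes
        }
    return sorted(nodes, key=score.get, reverse=True), score
```

Consider a toy graph where x1, x2, and x3 are mutually connected and x4 is connected only to x5. With first-pass scores x1 = 0.9, x2 = 0.3, x3 = 0.8, x4 = 0.5, and x5 = 0.1, x2 ends up ranked above x4 because its neighbors score high.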
![Page 64](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/64.jpg)
Graph-based Approach - Experiments
Corpus: Digital Speech Processing (DSP) course of NTU; first-pass retrieval based on lattices
[Bar charts: (A) and (B)]
(A): speaker dependent (high recognition accuracy)
(B): speaker independent (low recognition accuracy)
![Page 65](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/65.jpg)
Graph-based Approach - Experiments
[Bar charts: (A) and (B)]
Graph-based re-ranking (green bars) outperformed PRF (red bars).
![Page 66](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/66.jpg)
Graph-based Approach – Experiments on Babel Program
Joined the Babel program (the “Tower of Babel” project) at MIT
An evaluation program for spoken term detection
More than 30 research groups divided into 4 teams
The spoken content to be retrieved is in low-resource languages
![Page 67: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/67.jpg)
Graph-based Approach – Experiments on Babel Program
3 out of 4 teams used this approach
The speech recognition systems are based on deep neural networks
![Page 68: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/68.jpg)
New Directions for Spoken Content Retrieval
Incorporating Information Lost in Standard Speech Recognition
Improving Recognition Models by User Relevance Feedback
Query Expansion with Speech Signals
Spoken Content Retrieval without Speech Recognition
Interactive Retrieval
![Page 69: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/69.jpg)
User Relevance Feedback
Online search engines optimize their performance using user relevance feedback,
e.g., click-through data [T. Joachims, SIGKDD 2002]
[Figure: retrieval results for queries Q1, Q2, ……, Qn; each result is a time stamp labeled T or F by users]
Query Q1: time 1:10 T, time 2:01 F, time 3:04 T, time 5:31 T
Query Q2: time 1:10 T, time 2:01 F, time 3:04 T, time 5:31 F
……
Query Qn: time 1:10 F, time 2:01 F, time 3:04 T, time 5:31 T
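Labels like these can be turned into training constraints. A minimal sketch in the spirit of pairwise preferences from click-through data, reading T as relevant and F as not relevant: for each query, every relevant result should be ranked above every irrelevant one. The dictionary layout and the function name are hypothetical choices for illustration.

```python
def preference_pairs(feedback):
    """Turn per-query relevance labels into (query, preferred, other) triples.

    feedback: dict mapping a query name to a list of (time_stamp, is_relevant) tuples
    """
    pairs = []
    for query, results in feedback.items():
        relevant = [t for t, rel in results if rel]
        irrelevant = [t for t, rel in results if not rel]
        # Each relevant result should be ranked above each irrelevant one.
        for r in relevant:
            for i in irrelevant:
                pairs.append((query, r, i))
    return pairs


feedback = {
    "Q1": [("1:10", True), ("2:01", False), ("3:04", True), ("5:31", True)],
    "Q2": [("1:10", True), ("2:01", False), ("3:04", True), ("5:31", False)],
}
pairs = preference_pairs(feedback)
```

Q1 yields 3 pairs (three relevant results above one irrelevant one) and Q2 yields 4, so a ranking model could be trained to satisfy 7 constraints.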
![Page 70: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/70.jpg)
Update Recognition Models
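[Figure: a learner issues a query; the retrieval system searches the spoken content through the text produced by the speech recognition models and returns the retrieval result; user feedback is used to update the recognition models and optimize retrieval]
Query Q1: time 1:10, time 2:01, time 3:04, time 5:31 T
Query Q2: time 1:10 T, time 2:01 F, time 3:04, time 5:31
……
Query Qn: time 1:10 F, time 2:01, time 3:04, time 5:31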
![Page 72: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/72.jpg)
Query Expansion for Text Retrieval
[Figure: a learner enters the query “Inheritance material”; the retrieval system searches for “Inheritance material” and “DNA”]
To handle the problem of semantic retrieval, the retrieval system expands the user query.
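A minimal sketch of text query expansion, assuming a precomputed, hypothetical term-association table (for example, one estimated from term co-occurrence in a text corpus). The table contents, `max_terms`, and `min_weight` are made up for illustration.

```python
def expand_query(query_terms, associations, max_terms=1, min_weight=0.5):
    """Append the most strongly associated terms to a query.

    associations: dict mapping a term to a dict {related_term: association_weight}
    """
    expanded = list(query_terms)
    candidates = {}
    for term in query_terms:
        for related, weight in associations.get(term, {}).items():
            if related not in expanded and weight >= min_weight:
                candidates[related] = max(weight, candidates.get(related, 0.0))
    # Keep the strongest candidates, highest weight first.
    best = sorted(candidates, key=lambda t: -candidates[t])[:max_terms]
    return expanded + best


associations = {"inheritance material": {"DNA": 0.9, "gene": 0.7, "money": 0.1}}
print(expand_query(["inheritance material"], associations))
# → ['inheritance material', 'DNA']
```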
![Page 73: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/73.jpg)
Query Expansion for Spoken Content Retrieval
[Figure: a learner enters the query “Inheritance material”; the retrieval system searches for “Inheritance material” and “DNA”, where “DNA” is represented by its speech signals]
Expand the queries with speech signals
![Page 74: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/74.jpg)
Query Expansion for Spoken Content Retrieval
[Figure: learner, query “Inheritance material”, retrieval system searching for “Inheritance material” and “DNA”, recognition output (text), and speech signals of “DNA”]
Match on the speech signal level
![Page 76: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/76.jpg)
Spoken Content Retrieval without Speech Recognition
[Figure: a user's spoken query “DNA” and spoken lectures]
Spoken queries are matched with spoken lectures on the speech signal level.
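Matching on the speech signal level is commonly done with dynamic time warping (DTW) between acoustic feature sequences. A minimal sketch of subsequence DTW, assuming each frame is already a feature vector (e.g., MFCCs, not computed here) and using Euclidean frame distance; the query may match anywhere inside a lecture. This is a standard technique shown for illustration, not necessarily the system described in the talk.

```python
import math


def frame_dist(a, b):
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, doc):
    """Return the lowest DTW cost of aligning `query` to any segment of `doc`."""
    n, m = len(query), len(doc)
    cost = [[float("inf")] * m for _ in range(n)]
    for j in range(m):
        cost[0][j] = frame_dist(query[0], doc[j])  # the match may start at any frame
    for i in range(1, n):
        for j in range(m):
            best_prev = cost[i - 1][j]  # doc frame j absorbs another query frame
            if j > 0:
                best_prev = min(best_prev, cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = frame_dist(query[i], doc[j]) + best_prev
    return min(cost[n - 1])  # the match may end at any frame
```

A lecture segment whose DTW cost against the spoken query “DNA” is low would be returned as a hit; because DTW tolerates local stretching, a slower or faster pronunciation can still match.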
![Page 78: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/78.jpg)
Interactive Retrieval
Model the interactive retrieval process as a Markov Decision Process (MDP)
[Figure: a user interacting with a deep neural network; example exchange: Find something related to “speech”?]
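To make "model the interactive retrieval process as an MDP" concrete, here is a toy sketch with hypothetical states, actions, and rewards: the system either shows results now or asks the user a clarifying question at a small cost. The slide pairs the MDP with a deep neural network; this sketch instead solves a tiny tabular MDP by value iteration, only to illustrate the formulation.

```python
def q_value(state, action, values, transitions, gamma):
    """Expected reward plus discounted value of the next state (None = episode ends)."""
    return sum(p * (r + (gamma * values[nxt] if nxt is not None else 0.0))
               for p, nxt, r in transitions[(state, action)])


def value_iteration(states, actions, transitions, gamma=0.9, iters=200):
    values = {s: 0.0 for s in states}
    for _ in range(iters):
        values = {s: max(q_value(s, a, values, transitions, gamma) for a in actions)
                  for s in states}
    policy = {s: max(actions, key=lambda a: q_value(s, a, values, transitions, gamma))
              for s in states}
    return values, policy


# Hypothetical MDP: "vague" = the system is unsure what the user wants.
states = ["vague", "clear"]
actions = ["show", "ask"]
transitions = {  # (state, action) -> list of (probability, next_state, reward)
    ("vague", "show"): [(1.0, None, 2.0)],
    ("clear", "show"): [(1.0, None, 10.0)],
    ("vague", "ask"): [(0.8, "clear", -1.0), (0.2, "vague", -1.0)],
    ("clear", "ask"): [(1.0, "clear", -1.0)],
}
values, policy = value_iteration(states, actions, transitions)
```

With these made-up rewards, the learned policy asks a question when the request is vague and shows results once it is clear.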
![Page 79: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/79.jpg)
Part III:
Speech Summarization
![Page 80: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/80.jpg)
MMR approach
Maximal marginal relevance (MMR) approach
Unsupervised approach: use heuristic rules to select utterances
Select utterances whose content is similar to the whole lecture
Minimize redundancy in the summary at the same time
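A minimal MMR sketch, assuming bag-of-words cosine similarity on recognized transcriptions and a trade-off weight `lam` (the default 0.5 is arbitrary): utterances are picked greedily, each time maximizing similarity to the whole lecture minus the maximum similarity to the utterances already selected.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_summary(utterances, k, lam=0.5):
    """Greedily select k utterance indices by maximal marginal relevance."""
    vecs = [Counter(u.lower().split()) for u in utterances]
    lecture = sum(vecs, Counter())  # bag of words of the whole lecture
    selected = []
    while len(selected) < min(k, len(utterances)):
        def mmr(i):
            relevance = cosine(vecs[i], lecture)
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max((i for i in range(len(utterances)) if i not in selected), key=mmr)
        selected.append(best)
    return sorted(selected)


utterances = [
    "lsa can be used to enhance summarization",
    "lsa can be used to enhance summarization",  # the speaker repeats it
    "lsa is latent semantic analysis",
]
print(mmr_summary(utterances, 2))
# → [0, 2]
```

The repeated utterance is selected only once, because its redundancy with the first pick outweighs its relevance.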
![Page 81: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/81.jpg)
Supervised Approach
Training data:
Lecture 1: the 2nd and 4th utterances form the summary
Lecture 2: the 3rd utterance forms the summary
Lecture 3: the 1st and 2nd utterances form the summary
Use the training data to learn a model for summarization
![Page 82: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/82.jpg)
Supervised Approach – Binary Classification
The summarization problem can be formulated as a binary classification problem:
is each utterance included in the summary or not?
[Figure: utterances 1 to 4 of a lecture are each fed to a binary classifier; the classification results are -1, +1, +1, -1, so utterances 2 and 3 form the summary]
![Page 83: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/83.jpg)
Supervised Approach – Binary Classification
Training data: Lecture 1, Lecture 2, Lecture 3
The utterances in the reference summary are positive examples; all other utterances are negative examples.
Train a binary classifier on these examples.
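A minimal sketch of this setup, assuming each utterance is already described by a numeric feature vector (the two features below, e.g. similarity to the lecture and a keyword count, are made up) and using a plain perceptron as the binary classifier; any binary classifier, such as an SVM, could take its place.

```python
def make_examples(lectures):
    """lectures: list of (features_per_utterance, summary_indices) pairs.

    Utterances in the reference summary become +1 examples; all others become -1.
    """
    examples = []
    for features, summary in lectures:
        for idx, f in enumerate(features):
            examples.append((f, 1 if idx in summary else -1))
    return examples


def train_perceptron(examples, epochs=200):
    dim = len(examples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        mistakes = 0
        for f, y in examples:
            score = sum(wi * fi for wi, fi in zip(w, f)) + b
            if y * score <= 0:  # misclassified or on the boundary: update
                w = [wi + y * fi for wi, fi in zip(w, f)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training utterance is classified correctly
            break
    return w, b


def summarize(features, w, b):
    """Return the indices of utterances the classifier labels +1."""
    return [i for i, f in enumerate(features)
            if sum(wi * fi for wi, fi in zip(w, f)) + b > 0]


# Hypothetical features; indices are 0-based (the "2nd and 4th" utterances are 1 and 3).
lectures = [
    ([[0.1, 0], [0.9, 3], [0.2, 1], [0.8, 2]], {1, 3}),
    ([[0.2, 0], [0.1, 1], [0.9, 2]], {2}),
    ([[0.7, 2], [0.8, 3], [0.3, 0]], {0, 1}),
]
w, b = train_perceptron(make_examples(lectures))
```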
![Page 84: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/84.jpg)
Supervised Approach – Binary Classification
A binary classifier considers each utterance individually.
To generate a good summary, “global information” should be considered.
Example: the summary should be concise.
[Figure: a spoken document and its summary]
Spoken document:
Hello everyone ……
LSA is Latent semantic analysis
LSA is used to enhance summarization
LSA can be used to enhance summarization
Let me say it again
……
……
Summary:
LSA can be used to enhance summarization
More advanced machine learning techniques are needed.
![Page 85: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/85.jpg)
Globally considering the whole spoken lecture
Learn a special model with structured learning techniques
Input: the whole lecture
Output: the summary
[Figure: the special model considers the whole lecture and selects 3 utterances for the summary]
![Page 86: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/86.jpg)
Evaluation Function
Evaluation function of an utterance set: F(s)
s: an utterance set in a lecture
F(s) scores the utterance set s by how suitable it is to take s as the summary
Properties:
• Concise?
• Includes keywords?
• Short enough?
……….
[Figure: F(s) gives a score (e.g., 10) to an utterance set s from a lecture: how good is it to take this utterance set as the summary?]
![Page 87: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/87.jpg)
Evaluation Function – How to Summarize
With F(s), we can now summarize new lectures:
Enumerate all possible utterance sets s of the lecture (e.g., s1, s2, ……, s7).
Compute F(s) for all utterance sets.
If s6 maximizes F(s), s6 is taken as the summary.
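The enumeration step can be sketched with `itertools.combinations`, assuming F is given as a Python function over a tuple of utterance indices. The toy F below (hypothetical importance scores minus a per-utterance penalty) stands in for a learned one. The number of subsets grows as 2^n, so this brute force only suits short inputs; longer lectures call for approximate search.

```python
from itertools import combinations


def best_summary(num_utterances, F):
    """Return the utterance set (tuple of indices) that maximizes F."""
    best_set, best_score = (), F(())
    for size in range(1, num_utterances + 1):
        for s in combinations(range(num_utterances), size):
            score = F(s)
            if score > best_score:
                best_set, best_score = s, score
    return best_set


importance = [0.2, 1.5, 0.4, 1.1, 0.3]  # hypothetical I(x_i) values


def toy_F(s):
    # Reward important utterances; subtract 0.5 per utterance to keep the summary short.
    return sum(importance[i] - 0.5 for i in s)


print(best_summary(len(importance), toy_F))
# → (1, 3)
```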
![Page 88: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/88.jpg)
Evaluation Function
What properties should F(s) check?
![Page 89: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/89.jpg)
Evaluation Function - Training
Learn F(s) from training data: lectures with reference summaries
Find F(s) such that the reference summary of each training lecture receives a high score
[Figure: F(s) scores utterance sets of training lectures, e.g., 9, 7, -4, …]
Structured SVM: I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support
Vector Learning for Interdependent and Structured Output Spaces, ICML, 2004.
![Page 90: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/90.jpg)
Speech Summarization - Structure
Temporal structure helps summarization
[Figure: temporal structure of lecture audio: lectures, sections (e.g., ch 1), paragraphs (Paragraph 1, Paragraph 2, Paragraph 3, ……), each paragraph consisting of several utterances]
![Page 91: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/91.jpg)
Speech Summarization - Structure
Temporal structure helps summarization
Long summary: consecutive utterances in a paragraph are more likely to be selected.
Short summary: one utterance is selected to represent each paragraph.
[Figure: utterance sequence x_{i-2}, …, x_{i+6} divided into paragraphs; the long summary takes consecutive utterances from an important paragraph, while the short summary takes a representative utterance of each paragraph]
![Page 92: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/92.jpg)
Evaluation Function - Structure
Add structure information to the evaluation function of an utterance set, F(s)
Properties:
• Concise?
• Include keyword?
• Short enough?
……….
[Figure: given the structure information (Paragraph 1, Paragraph 2), F(s) scores the utterances, e.g., 100]
![Page 93: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/93.jpg)
Speech Summarization - Structure
Structure in text is clear
Paragraph boundaries are directly known
For spoken content, there is no obvious structure
Here the structure is treated as “hidden variables”
Structured learning with hidden variables
![Page 94: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/94.jpg)
Speech Summarization - Experiments
Evaluation measures: ROUGE-1 and ROUGE-2
Larger scores mean the machine-generated summaries are more similar to human-generated summaries.
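At its core, ROUGE-N is n-gram recall against the reference: the fraction of reference n-grams (with counts clipped) that also appear in the machine summary. A minimal sketch with whitespace tokenization and a single reference; official ROUGE toolkits add options such as stemming, stop-word removal, multiple references, and precision/F-measure variants.

```python
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall of a candidate summary against one reference summary."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[g]) for g, count in ref.items())  # clipped matches
    return overlap / total


reference = "lsa can be used to enhance summarization"
candidate = "lsa is used to enhance summarization"
r1 = rouge_n(candidate, reference, 1)  # 5 of 7 reference unigrams matched
r2 = rouge_n(candidate, reference, 2)  # 3 of 6 reference bigrams matched
```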
![Page 95: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/95.jpg)
Part IV:
Demo
![Page 96: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/96.jpg)
On-line lecture platforms (MIT)
“Cang-Jie (倉頡)”:
Searching lecture recordings and textbooks
Linking video clips or textbook sections with similar content
Inferring prerequisite and advanced concepts
http://people.csail.mit.edu/tlkagk/Cangjie/
![Page 97: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/97.jpg)
Concluding Remarks
![Page 98: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/98.jpg)
Towards Spoken Knowledge Structuring and Organization
[Figure: overview diagram with the elements Multimedia (Audio/Video), (Multilingual) Speech Recognition, Recognition Output, Syntactic and Semantic Analysis, Key Term Extraction, Spoken Content Retrieval, Speech Summarization, Temporal Structure, Knowledge Graph, Visualizing Search Result, Easy to browse, and Inferring Learning Path]
![Page 99: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/99.jpg)
Ultimate Goal
Learner: I want to learn “XXX”. I am a graduate student in computer science. I can spend 10 hours.
System: I open a course for you.
[Figure: course built from on-line learning material]
Personalized course for each learner
![Page 100: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/100.jpg)
Thank You for Your Attention
![Page 101: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/101.jpg)
Appendix
![Page 102: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/102.jpg)
Video Demonstration
![Page 103: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/103.jpg)
Paragraph Boundaries
With speech recognition, we know the content of each utterance.
Compute the similarities between utterances.
Find the paragraph boundaries such that the utterances within a paragraph have similar content.
[Figure: audio segmented into paragraphs: Paragraph 1, Paragraph 2, Paragraph 3, Paragraph 4, ……]
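One way to realize this, sketched under assumptions: represent each recognized utterance as a bag of words and choose boundaries by dynamic programming so that every pair of utterances in the same paragraph contributes (similarity - theta). Similar utterances are then grouped, and dissimilar ones are kept apart. The cosine similarity and the threshold `theta` are hypothetical choices; the talk does not specify them.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def segment(utterances, theta=0.2):
    """Split utterances into paragraphs; returns a list of (start, end) index ranges."""
    vecs = [Counter(u.lower().split()) for u in utterances]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]

    def gain(start, end):  # paragraph covering utterances start .. end-1
        return sum(sim[i][j] - theta for i in range(start, end) for j in range(i + 1, end))

    best = [0.0] + [-math.inf] * n  # best[k]: best total gain for the first k utterances
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            score = best[start] + gain(start, end)
            if score > best[end]:
                best[end], back[end] = score, start
    paragraphs, end = [], n
    while end > 0:  # follow back-pointers from the last utterance
        paragraphs.append((back[end], end))
        end = back[end]
    return paragraphs[::-1]


utts = ["dna carries genes", "genes in dna", "svm finds margin", "margin of svm"]
print(segment(utts))
# → [(0, 2), (2, 4)]
```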
![Page 104: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/104.jpg)
Slide Boundaries
The slides are modeled as HMMs.
Align the slides with the paragraphs.
[Figure: audio segmented into paragraphs (Paragraph 1, Paragraph 2, Paragraph 3, Paragraph 4, ……) aligned with slides s1, s2]
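A minimal sketch of the alignment idea, assuming a left-to-right HMM in which each slide is one state, the lecture visits the slides in order, and the emission score of a paragraph under a slide is a hypothetical log-similarity between the paragraph's transcription and the slide text. Transition probabilities are treated as uniform, which is a simplification; Viterbi search then finds the best monotonic assignment of paragraphs to slides.

```python
def align(scores):
    """Viterbi alignment of paragraphs to slides with a left-to-right topology.

    scores[p][k]: log-domain emission score of paragraph p under slide k.
    Each paragraph either stays on the current slide or moves to the next one.
    Returns the slide index assigned to each paragraph.
    """
    num_p, num_s = len(scores), len(scores[0])
    neg_inf = float("-inf")
    best = [[neg_inf] * num_s for _ in range(num_p)]
    back = [[0] * num_s for _ in range(num_p)]
    best[0][0] = scores[0][0]  # the lecture starts on the first slide
    for p in range(1, num_p):
        for k in range(num_s):
            stay = best[p - 1][k]
            advance = best[p - 1][k - 1] if k > 0 else neg_inf
            if stay >= advance:
                best[p][k], back[p][k] = stay + scores[p][k], k
            else:
                best[p][k], back[p][k] = advance + scores[p][k], k - 1
    # Backtrace from the best final state.
    k = max(range(num_s), key=lambda j: best[num_p - 1][j])
    path = [k]
    for p in range(num_p - 1, 0, -1):
        k = back[p][k]
        path.append(k)
    return path[::-1]


# Hypothetical log-similarities of 4 paragraphs to slides s1 and s2.
scores = [[-0.1, -2.0], [-0.2, -1.5], [-2.0, -0.1], [-1.8, -0.3]]
print(align(scores))
# → [0, 0, 1, 1]
```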
![Page 105: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/105.jpg)
Evaluation function F(s)
A good summary should
1. include the most important utterances,
2. but minimize redundancy at the same time, and
3. not be too long.
An utterance set s that fulfills the above requirements should have a large F(s).
![Page 106: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/106.jpg)
Evaluation function F(s)
$F(s) = \sum_{x_i \in s} I(x_i)$
I(x_i): importance of utterance x_i
$I(x_i) = w \cdot f(x_i)$, where f(x_i) is the feature vector of utterance x_i
![Page 107: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/107.jpg)
Evaluation function F(s,D)
$F(s, D) = \sum_{x_i \in s} I(x_i)$, where s is an utterance set of lecture D
$I(x_i) = w \cdot f(x_i)$: importance of utterance x_i
w: weights for each feature
f(x_i): features of utterance x_i
Lexical features (speech recognition transcribes each utterance into text):
• similarity to the transcription of the whole lecture
• latent topic distribution
• number of keywords
• ……
Prosodic features:
• energy, pitch, syllable duration, pause duration
![Page 108: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/108.jpg)
Evaluation function F(s)
$F(s) = \sum_{x_i \in s} I(x_i) - \lambda \sum_{x_i, x_j \in s,\ i \neq j} \mathrm{Sim}(x_i, x_j)$
λ is a parameter to be determined.
Sim(x_i, x_j): similarity between utterances x_i and x_j
![Page 109: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/109.jpg)
Evaluation function F(s)
Length constraint: $\sum_{x_i \in s} L(x_i) \le K$
L(x_i): length of utterance x_i
K: length constraint of the summary
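The evaluation function F(s) = Σ I(x_i) - λ Σ_{i≠j} Sim(x_i, x_j), with I(x_i) = w · f(x_i) and the length constraint Σ L(x_i) ≤ K, can be written out directly. A minimal sketch, assuming the features, w, λ, the similarity matrix, the lengths, and K are all given (the numbers below are made up). Sets that violate the length constraint score negative infinity so they are never chosen.

```python
from itertools import combinations


def evaluate(s, features, w, lam, sim, lengths, K):
    """F(s) = sum_i w.f(x_i) - lam * sum_{i != j} Sim(x_i, x_j), if sum_i L(x_i) <= K."""
    if sum(lengths[i] for i in s) > K:
        return float("-inf")  # violates the length constraint
    importance = sum(sum(wk * fk for wk, fk in zip(w, features[i])) for i in s)
    redundancy = sum(sim[i][j] for i in s for j in s if i != j)
    return importance - lam * redundancy


# Hypothetical lecture with 3 utterances; utterances 0 and 1 are near-duplicates.
features = [[1.0, 0.0], [0.9, 0.1], [0.2, 1.0]]
w, lam = [1.0, 0.5], 0.5
sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
lengths, K = [5, 5, 4], 10

candidates = [s for r in range(1, 4) for s in combinations(range(3), r)]
best = max(candidates, key=lambda s: evaluate(s, features, w, lam, sim, lengths, K))
```

Here the redundancy term keeps the near-duplicates 0 and 1 from being selected together, and the length constraint rules out taking all three utterances, so the best set is (0, 2).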
![Page 110: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/110.jpg)
Evaluation function F(s)
$I(x_i) = w \cdot f(x_i)$; w and λ are learned jointly from training data
![Page 111: Towards Spoken Knowledge Structuring and Organizationspeech.ee.ntu.edu.tw/~tlkagk/slide/MyTalk_NTUCS_v8.pdf · Spoken Language Processing techniques can be very helpful. ... Easy](https://reader030.vdocuments.us/reader030/viewer/2022040714/5e1bcc96df9bc544b9109ca8/html5/thumbnails/111.jpg)
Idea of Training
Training data: lectures D1, D2, ……, with reference summaries R1, R2, ……
Find w and λ in F(s) such that
$F(R_1) > F(s_{D_1})$
$F(R_2) > F(s_{D_2})$
……
$s_{D_1}$: any utterance set in D1 other than R1
$s_{D_2}$: any utterance set in D2 other than R2
I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for
Interdependent and Structured Output Spaces, ICML, 2004.
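The cited work trains F with a structured SVM. As a simpler stand-in with the same goal (make F(R) beat every other utterance set of the same lecture), here is a structured perceptron sketch. It finds the highest-scoring set under the current w and λ by brute force and, if that set is not the reference, moves the parameters toward the reference's features and away from the predicted set's. The joint feature map (summed utterance features plus negated pairwise similarity, so λ is learned as one more weight), the restriction to sets of the reference's size, and the toy data are all assumptions.

```python
from itertools import combinations


def phi(s, features, sim):
    """Joint feature map: summed utterance features, then the negated redundancy."""
    dim = len(features[0])
    total = [sum(features[i][k] for i in s) for k in range(dim)]
    redundancy = sum(sim[i][j] for i in s for j in s if i != j)
    return total + [-redundancy]  # the last weight plays the role of lambda


def score(theta, s, features, sim):
    return sum(t * v for t, v in zip(theta, phi(s, features, sim)))


def summarize(theta, features, sim, size):
    """Highest-scoring utterance set among all sets with `size` utterances."""
    cands = combinations(range(len(features)), size)
    return max(cands, key=lambda s: score(theta, s, features, sim))


def train(data, epochs=100):
    """data: list of (features, sim, reference) with reference a sorted index tuple."""
    theta = [0.0] * (len(data[0][0][0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for features, sim, ref in data:
            pred = summarize(theta, features, sim, len(ref))
            if pred != ref:
                mistakes += 1
                gold, guess = phi(ref, features, sim), phi(pred, features, sim)
                theta = [t + g - p for t, g, p in zip(theta, gold, guess)]
        if mistakes == 0:
            break
    return theta


data = [
    # Lecture D1: utterances 0 and 1 are near-duplicates; reference R1 = (0, 2).
    ([[1.0], [0.9], [0.1]], [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]], (0, 2)),
    # Lecture D2: no redundancy; reference R2 = (1, 2).
    ([[0.2], [1.0], [0.8]], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], (1, 2)),
]
theta = train(data)
```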