Thematic Alignment of Static Documents with Meeting Dialogs Dalila Mekhaldi Diva Group Department of Computer Science University of Fribourg


Thematic Alignment of Static Documents with Meeting Dialogs

Dalila Mekhaldi

Diva Group, Department of Computer Science

University of Fribourg


Outline

• Introduction
• Thematic Alignment
  o One-best Alignment
  o Multiple Alignments: Meeting Thematic Segmentation
• Alignments Grouping
• Conclusion & Perspectives


Introduction

In document-centric meetings (lectures, teleconferencing, press reviews, etc.):

• Static documents are present
• Should be integrated in a common multimedia archive
• Need to build links between documents and other media


Document Alignments

Several ways to link static documents with other meeting data:

• Document/Image alignment
• Document/Speech alignment


Document/Speech Alignment

• Links static data (documents) to temporal data (audio).
• Enriches the documents with temporal indexes and thematic links.
• Helps:
  o Building document-based browsing interfaces.
  o Improving document search and retrieval.


Document/Speech Alignment

3 alignment categories:
• Thematic: lexical similarity of document/speech parts
• Quotation of a document part
• Reference to a document part

Text decomposition into segments:
• Document: logical, syntactic
• Speech transcript: turns, utterances


Thematic Alignment

Speech Transcript

<Turn id="1"> …
<utterance id="3" StartTime="47.429" EndTime="61.062" speaker="spk2">Alors euh.. mardi 15 juillet, hier, euh.. la commission d'enquête parlementaire en France a rendu un rapport euh..sur la gestion des entreprises.. entreprises publiques.</utterance>
<utterance id="4" StartTime="61.062" EndTime="71.806" speaker="spk1">Euh.. Très critique sur la gestion de France Telecom et d'EDF, leurs politiques d'acquisitions ont été menées sans que les moyens humains, ...</utterance>
…</Turn>

Document

<sentence id="77">Rendu public mardi 15 juillet, le rapport de la commission d'enquete parlementaire sur la gestion des entreprises publiques, presidee par Philippe Douste-Blazy, secretaire general de l'UMP.</sentence>
<sentence id="78">Tres critique sur la gestion de France Telecom et d'EDF - leurs politiques d'acquisitions ont ete menees sans que les moyens humains, techniques, financiers aient ete adaptes en consequence ..</sentence>…

Similarity-based matching


Thematic Alignment

Similarity-based matching

[Figure: document segments (S1, S2) linked by similarities to speech transcript segments (S1', S2', S3').]

Vectors of weighted terms: $S_1 \rightarrow V_1 = \{t_1, t_2, \ldots\}$; $S_1' \rightarrow V_1' = \{t_1', t_2', \ldots\}$

a. Stop-word removal, stemming
b. Similarity metrics between units:

$\mathrm{Jaccard} = \dfrac{|V_1 \cap V_1'|}{|V_1 \cup V_1'|}$

$\mathrm{Dice} = \dfrac{2 \times |V_1 \cap V_1'|}{|V_1| + |V_1'|}$

$\mathrm{Cosine} = \dfrac{|V_1 \cap V_1'|}{|V_1| \times |V_1'|}$

Two strategies: one-best and multiple alignments
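The matching step on this slide can be illustrated with a minimal Python sketch. It makes several assumptions that are not in the slides: the stop-word list is a tiny hypothetical one, stemming is skipped, terms are unweighted sets, and the cosine uses the common set form with a square root in the denominator (the formula as extracted from the slide shows none). The function names `terms`, `one_best` and `multiple`, and the 0.2 threshold, are illustrative only.

```python
import math
import re

# Hypothetical stop-word list; a real system would use a full list
# for the language of the meeting.
STOP_WORDS = {"the", "of", "a", "on", "and", "in", "is", "was"}

def terms(text):
    """Lowercase, tokenize and drop stop words (stemming omitted here)."""
    return {t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a, b):
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def cosine(a, b):
    # Assumption: set form of the cosine, |A & B| / sqrt(|A| * |B|).
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0

def one_best(sources, targets, metric=jaccard):
    """For each source segment, keep only the most similar target segment."""
    result = []
    for s in sources:
        scores = [metric(terms(s), terms(t)) for t in targets]
        best = max(range(len(targets)), key=scores.__getitem__)
        result.append((best, scores[best]))
    return result

def multiple(sources, targets, metric=jaccard, threshold=0.2):
    """Keep every (source, target) pair whose similarity reaches the threshold."""
    pairs = []
    for i, s in enumerate(sources):
        for j, t in enumerate(targets):
            score = metric(terms(s), terms(t))
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs

# Hypothetical English segments, for illustration only.
sentences = ["report on public companies", "weather forecast"]
utterances = ["the weather today", "a report about companies"]
print(one_best(sentences, utterances))
print(multiple(sentences, utterances))
```

Note that `one_best` always returns a target, even when the best score is 0; a minimum score could be added to reject such pairs.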


One-best Alignment Evaluation

Manual ground truth for 8 meetings:
• N1: alignments to be found (manual)
• N2: alignments found (automatic)
• N3: correct alignments found (automatic)

Precision: N3 / N2
Recall: N3 / N1

[Figure: two bar charts, Precision and Recall (scale 0 to 1), for the sent/utt, utt/sent and turn/logic alignments, comparing the Cosine, Dice and Jaccard metrics.]

Improve the similarity metrics with a semantic dictionary
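As a small worked example of the precision and recall used on this slide, the sketch below counts N1, N2 and N3 from two sets of alignment pairs. The function name and the pair values are hypothetical; only the definitions Precision = N3 / N2 and Recall = N3 / N1 come from the slide.

```python
def precision_recall(manual, automatic):
    """Compute (precision, recall) from manual and automatic alignment pairs."""
    manual, automatic = set(manual), set(automatic)
    n1 = len(manual)              # N1: alignments to be found (manual)
    n2 = len(automatic)           # N2: alignments found (automatic)
    n3 = len(manual & automatic)  # N3: correct alignments found (automatic)
    precision = n3 / n2 if n2 else 0.0
    recall = n3 / n1 if n1 else 0.0
    return precision, recall

# Hypothetical (sentence id, utterance id) pairs.
manual = [(77, 3), (78, 4), (79, 5), (80, 7)]
automatic = [(77, 3), (78, 4), (79, 8)]
precision, recall = precision_recall(manual, automatic)
print(precision, recall)  # 2 of 3 found pairs are correct; 2 of 4 expected pairs are found
```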


Multiple Alignments Evaluation

[Figure: Doc/Speech Thematic Alignment and Meeting Thematic Segmentation; alignments A1 to A5.]

<Thematic_Segment id="S1"> …
<utterance id="3" StartTime="47.429" EndTime="61.062" speaker="spk1">Alors euh.. mardi 15 juillet, hier, euh.. la commission d'enquête parlementaire en France a rendu un rapport euh..sur la gestion des entreprises.. entreprises publiques.</utterance>
<utterance id="4" StartTime="61.062" EndTime="71.806" speaker="spk1">Euh.. en gros, ça dit que le modèle français des entreprises publiques ne répond plus aux nouvelles exi.. exigences internationales et européenne.</utterance> …
</Thematic_Segment> …


Meeting Thematic Segmentation

Thematic alignment (e.g. sentences/utterances):
• Alignability arcs
• Similarity weights

[Figure: graph linking document sentences (77 to 85, 91) and speech utterances (3 to 14). Legend: nodes, nodes size, similarity value. Example arcs: (84, 9) 0.42 and (91, 8) 0.25. The most connected sub-graphs: thematic regions.]


Meeting Thematic Segmentation

[Figure: Doc/Speech Thematic Alignment and Meeting Thematic Segmentation.]

a. Bi-graph representation of the multiple alignment pairs (document sentences, speech utterances).
b. Densest regions extraction (using clustering): meeting themes A1 to A5.
c. Segments extraction (clusters projection): segments S1 to S5.
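Steps a to c can be sketched in Python as follows. The slides do not say which clustering method extracts the densest regions, so this sketch substitutes a much simpler technique: it drops arcs below a similarity threshold and takes the connected components of the remaining bi-graph as clusters. The function names, the threshold and most of the sample pairs are assumptions made for illustration.

```python
from collections import defaultdict

def thematic_clusters(pairs, threshold=0.2):
    """Group (sentence, utterance, similarity) pairs into connected components.

    Stand-in for the clustering step: arcs below `threshold` are ignored and
    each connected component of the bi-graph is treated as one theme.
    """
    graph = defaultdict(set)
    for sentence, utterance, weight in pairs:
        if weight >= threshold:
            graph[("s", sentence)].add(("u", utterance))
            graph[("u", utterance)].add(("s", sentence))

    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        sentences = sorted(i for kind, i in component if kind == "s")
        utterances = sorted(i for kind, i in component if kind == "u")
        clusters.append((sentences, utterances))
    return clusters

def project(clusters, axis):
    """Project clusters on one axis (0 = document, 1 = speech) as (first, last) ids."""
    return sorted((c[axis][0], c[axis][-1]) for c in clusters)

# Hypothetical alignment pairs: (sentence id, utterance id, similarity).
pairs = [(77, 3, 0.5), (78, 3, 0.3), (78, 4, 0.4),
         (84, 9, 0.42), (85, 9, 0.3), (85, 10, 0.35), (91, 8, 0.1)]
clusters = thematic_clusters(pairs)
print(project(clusters, axis=1))  # speech segments as utterance ranges
print(project(clusters, axis=0))  # document segments as sentence ranges
```

Projected ranges may overlap when clusters interleave; this sketch does not resolve such overlaps.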


Thematic Segmentation Evaluation

• Manual ground truth for 22 meetings
  1. Speech: 2 main sets
     o Stereotyped: 2.7 utterances/turn (ratio > 2)
     o Non-stereotyped: 1.3 utterances/turn (ratio <= 2)
  2. Documents: 2 main sets
     o Mono-document
     o Multi-documents

• Comparison with 2 mono-modal methods: TextTiling, Baseline
  o Speech baseline: turn-based segmentation
  o Documents baseline: reflexive alignment/clustering


Thematic Segmentation Evaluation

• Pk (Beeferman) metric: 0 for a perfect segmentation.

[Figure: Pk of the Bi-modal, TextTiling and Baseline methods. a. Speech: stereotyped and non-stereotyped meetings. b. Documents: stereotyped and non-stereotyped meetings, mono-document and multi-documents.]
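For readers unfamiliar with Pk, here is a minimal Python sketch of the metric as it is usually defined: a window of width k slides over the units, and Pk is the fraction of positions where the reference and the hypothesis disagree on whether the two ends of the window fall in the same segment. Representing a segmentation as a list of segment lengths, and setting k to half the average reference segment length, are conventions assumed here, not details taken from the slides.

```python
def pk(reference, hypothesis, k=None):
    """Pk of a hypothesis segmentation against a reference.

    Both segmentations are lists of segment lengths over the same units,
    e.g. [5, 5] for ten utterances split into two themes.
    """
    def labels(lengths):
        # Expand [2, 3] into [0, 0, 1, 1, 1]: one segment index per unit.
        return [seg for seg, n in enumerate(lengths) for _ in range(n)]

    ref, hyp = labels(reference), labels(hypothesis)
    if len(ref) != len(hyp):
        raise ValueError("segmentations must cover the same number of units")
    if k is None:
        # Usual convention: half the average reference segment length.
        k = max(1, len(ref) // (2 * len(reference)))
    windows = len(ref) - k
    if windows <= 0:
        raise ValueError("text too short for window size k")

    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(windows)
    )
    return errors / windows

print(pk([5, 5], [5, 5]))  # 0.0: perfect segmentation
print(pk([5, 5], [10]))    # 0.25: the single boundary is missed
```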


Analysis

Our bi-modal method outperforms standard mono-modal methods:
• bridges the gaps between documents and speech transcript
• detects the similar segments
• is more precise in computing the number of segments

[Figure: alignment clusters A1 to A5 and segments S1 to S5, shown for three cases.]

Documents greatly help structuring meetings


Alignments Grouping

1. Implementation of a framework that:

a. Combines the various levels to correct the false alignment pairs, e.g. (sentences x utterances) & (logical blocks x turns)

b. Combines the 3 alignment categories (Thematic, Quotations and References) to improve the document/speech alignment

[Figure: Speech hierarchy (turns T1, T2; utterances U1, U2, U3) and Document hierarchy (logical blocks L1, L2; sentences S1, S2).]

Speech: <Turn> <Them_Align with Logic> <utterances> <utt> <Them_Align with Sent> <Quotations with Sent> <References with Logic>

Document: <Logic> <Them_Align with Turns> <sentences> <Sent> <Them_Align with utt>
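One possible reading of point a can be sketched in Python: a fine-grained (sentence, utterance) pair is kept only if the logical block containing the sentence is also aligned with the turn containing the utterance. The slides do not specify the combination rule, so this filter, the function name and the containment maps below are all assumptions; only the labels S, U, L and T echo the figure.

```python
def filter_by_coarse_level(fine_pairs, coarse_pairs, sentence_block, utterance_turn):
    """Keep (sentence, utterance) pairs confirmed at the (logical block, turn) level."""
    coarse = set(coarse_pairs)
    return [
        (s, u) for s, u in fine_pairs
        if (sentence_block[s], utterance_turn[u]) in coarse
    ]

# Hypothetical containment: which logical block holds each sentence,
# and which turn holds each utterance.
sentence_block = {"S1": "L1", "S2": "L2"}
utterance_turn = {"U1": "T1", "U2": "T1", "U3": "T2"}

coarse_pairs = [("L1", "T1"), ("L2", "T2")]            # logical blocks x turns
fine_pairs = [("S1", "U1"), ("S1", "U3"), ("S2", "U3")]  # sentences x utterances

print(filter_by_coarse_level(fine_pairs, coarse_pairs, sentence_block, utterance_turn))
# ("S1", "U3") is dropped: block L1 is not aligned with turn T2
```

A softer variant could lower the weight of unconfirmed pairs instead of discarding them.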


Alignments Grouping

2. A visualization tool that:
• Highlights the alignment categories (Thematic, Quotations, References)
• Represents the various structures of the documents/speech as layers

[Figure: visualization tool showing the Speech and Document layers.]


Conclusion

Thematic alignment of documents with meeting dialogs is a solution for integrating static documents into multimedia archives:
• Conferences
• Lectures, etc.


Perspectives

• Automatic transcription of the speech
• Generalize the alignment to other:
  o document types with little text (e.g. slides, agendas)
  o meeting kinds where documents are discussed irregularly (e.g. conferences)
