Ing. Rigutini Leonardo – Automatic Text Segmentation: Text Relationship Map
Automatic Text Segmentation: Text Relationship Map (Salton 1996)
Ing. Leonardo Rigutini
Dipartimento di Ingegneria dell’Informazione
Università di Siena
Via Roma 53
53100 – SIENA – [email protected]
Text Relationship map
• Salton 1996
• Vector space model:
  • $D_i = (d_{i1}, d_{i2}, \dots, d_{it})$
• $d_{ik}$ = weight of term $T_k$ in document $D_i$
• $Sim(D_i, D_j) = \sum_{k=1}^{t} d_{ik} \, d_{jk}$
• Sim is normalized so that its values lie between 0 and 1
• Once the similarities have been computed, the map is built
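The similarity and map construction described on this slide can be sketched as follows. This is a minimal illustration, not the method from the slides: it assumes that the normalization is the cosine one (division by the vector lengths), that term weights are non-negative, and that weak links are dropped below a cutoff. The names `similarity`, `relationship_map`, and `threshold` are hypothetical.

```python
import math

def similarity(d_i, d_j):
    """Inner product of two term-weight vectors, normalized by their lengths.

    Assumption: cosine normalization; with non-negative weights the
    result lies between 0 and 1.
    """
    dot = sum(a * b for a, b in zip(d_i, d_j))
    norm = math.sqrt(sum(a * a for a in d_i)) * math.sqrt(sum(b * b for b in d_j))
    return dot / norm if norm else 0.0

def relationship_map(vectors, threshold=0.01):
    """Return {(i, j): similarity} for every pair i < j at or above the threshold.

    The default threshold is an illustrative value, not a prescribed one.
    """
    links = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            s = similarity(vectors[i], vectors[j])
            if s >= threshold:
                links[(i, j)] = s
    return links
```

Each key of the returned dictionary is an edge of the map and each value is the weight drawn on that edge.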
Text Relationship map
Figure 1: Text Relationship Map: encyclopedia articles about thermonuclear energy
[Figure not reproduced: article nodes joined by links labeled with their similarity, with values from 0.09 to 0.57.]
Links with similarity below 0.01 are ignored
Nodes and edges
• The importance of a node is related to the number of incident edges:
  • A central node is characterized by a large number of edges
• Highly connected graph:
  • Many important nodes
  • Homogeneous treatment of the topic
• Weakly connected graph:
  • Scattered important nodes
  • Several separate topics (low homogeneity)
  • Chronological, geographical, etc. treatment
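Counting incident edges is straightforward once the map is stored as a collection of node pairs. The sketch below assumes links are keyed by `(i, j)` index pairs; the `density` function is an added assumption, one simple way to quantify how "highly" or "weakly" connected a graph is, and is not taken from the slides.

```python
def node_degrees(links, n_nodes):
    """Number of incident edges for each node; a larger degree means a more central node."""
    degrees = [0] * n_nodes
    for i, j in links:
        degrees[i] += 1
        degrees[j] += 1
    return degrees

def density(links, n_nodes):
    """Fraction of all possible node pairs that are actually linked (assumed measure)."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(links) / possible if possible else 0.0
```

For a star-shaped map of four nodes, the hub has degree 3 and the density is 0.5, since three of the six possible pairs are linked.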
Highly connected graph
Sparsely connected graph
Automatic Text Decomposition
• Study of the relationships among the nodes of the graph
• Two types of analysis:
  • Segments: homogeneous, contiguous text units (nodes) that are highly connected to one another and weakly connected to the remaining nodes of the graph.
  • Themes: semantically homogeneous text units with no adjacency constraint.
Text Segments - 1
• Find gaps in the connections between adjacent paragraphs
• Links between nodes more than a given distance k apart are removed (Salton sets k=5)
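One simple way to read these two steps is sketched below. It is a simplification under stated assumptions, not the exact procedure from the slides: paragraphs are numbered 0 to n-1, links are `(i, j)` index pairs, and a segment boundary is placed wherever no retained link crosses the gap between two adjacent paragraphs. The function name `segment` is hypothetical.

```python
def segment(links, n_paragraphs, k=5):
    """Split paragraphs 0..n-1 into contiguous segments.

    Links spanning more than k paragraphs are discarded first; a boundary
    is then placed between paragraphs b and b+1 when no remaining link
    connects a paragraph at or before b with one after b.
    """
    local = [(i, j) for (i, j) in links if abs(i - j) <= k]
    segments, start = [], 0
    for b in range(n_paragraphs - 1):
        crossed = any(min(i, j) <= b < max(i, j) for i, j in local)
        if not crossed:
            segments.append(list(range(start, b + 1)))
            start = b + 1
    segments.append(list(range(start, n_paragraphs)))
    return segments
```

With a small k, a long-range link between the first and last paragraph is ignored and the text splits at the gap; with a larger k the same link keeps everything in one segment.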
Text Segments - 2
• Coherence of the topic covered is not guaranteed
• Many topics may be treated in a non-linear fashion
To obtain coherence, the adjacency constraint must be relaxed and all existing links considered
→ Text Theme
Text Themes
• The triangles in the graph are considered
  • triangle = a set of three mutually related nodes
• Each triangle is represented by
  • a centroid vector $C_i = (N_1, N_4, N_8)$, where $N_k$ is node k
  • a value $S_i$, which is the mean of the triangle's vectors
• Merging of centroids:
  • Triangles are merged when the similarity between pairs of centroids exceeds a given threshold
  • The process is repeated until no further merge is possible
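The triangle-merging loop can be sketched as follows. Several points are assumptions made for illustration: the centroid of a group is taken to be the average of its nodes' term-weight vectors, centroids are compared with cosine similarity, a merged theme's centroid is recomputed from all of its nodes, and the default `threshold` of 0.5 is arbitrary. All function names are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors, nodes):
    """Average of the term-weight vectors of the given nodes."""
    return [sum(vectors[n][t] for n in nodes) / len(nodes)
            for t in range(len(vectors[0]))]

def find_triangles(links, n_nodes):
    """All sets of three nodes in which every pair is linked."""
    linked = {frozenset(pair) for pair in links}
    return [set(tri) for tri in combinations(range(n_nodes), 3)
            if all(frozenset(p) in linked for p in combinations(tri, 2))]

def merge_themes(vectors, links, threshold=0.5):
    """Merge triangles whose centroids are similar until no merge is possible."""
    themes = find_triangles(links, len(vectors))
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(themes)), 2):
            c_a = centroid(vectors, sorted(themes[a]))
            c_b = centroid(vectors, sorted(themes[b]))
            if cosine(c_a, c_b) > threshold:
                themes[a] |= themes[b]   # merge b into a
                del themes[b]
                merged = True
                break                    # restart the scan after each merge
    return themes
```

Each merge reduces the number of themes by one, so the loop always terminates; the resulting node sets need not be contiguous in the text, which is what distinguishes themes from segments.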
Text Themes - example
Relationships between segments and themes
• Degrees of similarity can be computed:
  • segment-segment: information about the structure of the document (figure 7)
  • theme-theme: information about the centrality of some themes and the peculiarity of others (figure 8)
  • theme-segment: type of document:
    – a single theme treated from several points of view
    – several unrelated themes
    – one central theme and various secondary paragraphs
    – etc.
Segment-segment
Theme-theme
Theme-segment
1. Segments and themes fairly congruent:
  • Theme developed linearly
  • Parts of the text fairly adjacent
  • E.g.
    – articles on a single topic
    – articles on several fairly unrelated topics treated chronologically (1-to-1 relationship)
    – themes treated from several points of view (one T to several S)
2. Themes and segments not congruent:
  • Topic suspended and resumed later
  • E.g.
    – Introduction and subsequent explanations
Example: single theme
Example: multiple stories
Example: theme unrelated to the rest of the document
Example: one large central theme and two small in-depth digressions
Text retrieval
• Standard retrieval techniques may not be the best ones
• When a query concerns a theme that is discontinuous in the document, retrieving a single segment is not a good solution; it is better to return a set of segments
• Therefore:
  • For simple structures: text segment
  • For complex structures: text theme
Information retrieval: simple structure
Information retrieval: simple structure
Information retrieval: complex structure
Information retrieval: complex structure