FeedMe - a semantic RSS aggregator
Nikola Ljubešić, Damir Boras, Mislav Cimperšak, Marija Tkalec
Faculty of Humanities and Social Sciences, University of Zagreb
8 June 2010
Overview
1. The basic idea
2. Our system
3. Statistical analysis of collected data
4. Usage examples
1. The basic idea
Aggregating news
• collecting news from different information sources and publishing them as a single source
• manual or automated
• automated - problem of repeated information - need for analysis and organization
Existing aggregators
• Google News
• EMM NewsExplorer
• MondoPress
RSS
• RSS (Really Simple Syndication) - family of web feed formats used to publish frequently updated works
• XML file - readable by humans and machines
• RSS is structured, (X)HTML still is not - data harvesting is easier through RSS
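To illustrate why structured RSS is easy to harvest, here is a minimal sketch that reads the items of an RSS 2.0 document with Python's standard library. The sample feed, its URLs, and the function name `parse_items` are made up for illustration; this is not the parser FeedMe itself uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for this example.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example portal</title>
    <link>http://example.org/</link>
    <description>Sample feed</description>
    <item>
      <title>First story</title>
      <link>http://example.org/1</link>
      <description>Summary of the first story.</description>
      <pubDate>Mon, 10 May 2010 08:00:00 +0200</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.org/2</link>
      <description>Summary of the second story.</description>
      <pubDate>Mon, 10 May 2010 09:30:00 +0200</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return one dict per <item>; missing fields become empty strings."""
    root = ET.fromstring(rss_text)
    fields = ("title", "link", "description", "pubDate")
    return [
        {name: item.findtext(name, default="") for name in fields}
        for item in root.iter("item")
    ]


entries = parse_items(SAMPLE_RSS)
for entry in entries:
    print(entry["pubDate"], "-", entry["title"])
```

Every field sits in a named element, so no page-specific scraping rules are needed, which is the advantage over harvesting (X)HTML.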
Google Reader
• on-line RSS aggregator
• problems
  • loss of information
  • repeating information
  • unwanted information
Our idea
• collect RSS server-side - no loss of entries
• cluster RSS entries by their content - composite entries, no duplicates
• enable users to filter information - “affirm” or “negate” specific feeds
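The slides do not say which clustering method is used, so the following is only a sketch of one simple possibility: a greedy single pass that groups entries whose titles share enough words (Jaccard similarity). The threshold of 0.5, the field names `feed` and `title`, and the sample entries are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Lowercased set of the words in a text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_entries(entries, threshold=0.5):
    """Greedy single pass: an entry joins the first cluster whose seed
    entry is similar enough, otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"seed": word set, "members": [entries]}
    for entry in entries:
        words = tokens(entry["title"])
        for cluster in clusters:
            if jaccard(words, cluster["seed"]) >= threshold:
                cluster["members"].append(entry)
                break
        else:
            clusters.append({"seed": words, "members": [entry]})
    return [cluster["members"] for cluster in clusters]


# Hypothetical entries from three feeds.
entries = [
    {"feed": "portal-a", "title": "Government announces new tax reform"},
    {"feed": "portal-b", "title": "New tax reform announced by government"},
    {"feed": "portal-c", "title": "Local team wins the cup final"},
]
clusters = cluster_entries(entries)
print([len(c) for c in clusters])
```

The two tax-reform titles share four of seven distinct words, so they land in one composite entry, while the sports story stays alone. A production system would more likely compare full entry text with weighted terms.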
Filtering
• publish only feed entries containing n or more original feed entries
• “affirm” feeds - publish only feed entries containing at least one original entry from all the “affirmed” feeds
• “negate” feeds - do not publish feed entries containing an original entry from any negated feed
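The three filtering rules can be sketched as a single predicate over a cluster of original entries. This sketch assumes that "all" means every affirmed feed must contribute at least one entry to the cluster; the function name `passes_filter`, the `feed` field, and the sample data are hypothetical.

```python
def passes_filter(cluster, min_size=1, affirmed=(), negated=()):
    """Decide whether a cluster of original entries should be published.

    cluster:  list of dicts, each with a "feed" key naming its source feed
    min_size: minimum number of original entries in the cluster (the "n" rule)
    affirmed: feeds that must all be represented in the cluster (assumption)
    negated:  feeds of which none may be represented in the cluster
    """
    feeds = {entry["feed"] for entry in cluster}
    if len(cluster) < min_size:
        return False
    if not set(affirmed) <= feeds:
        return False
    if feeds & set(negated):
        return False
    return True


# Hypothetical cluster built from three feeds.
cluster = [
    {"feed": "portal-a", "title": "Story, version A"},
    {"feed": "portal-b", "title": "Story, version B"},
    {"feed": "portal-c", "title": "Story, version C"},
]

print(passes_filter(cluster, min_size=3))                         # True
print(passes_filter(cluster, affirmed={"portal-a", "portal-b"}))  # True
print(passes_filter(cluster, negated={"portal-c"}))               # False
```

If the intended reading is instead that one entry from any affirmed feed suffices, the subset test would become an intersection test.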
2. Our system
FeedMe
• back-end - collects RSS entries every half hour and organizes them into clusters
• front-end - web application for
  • creating groups of feeds (filtering - minimum elements, affirming, negating)
  • browsing the compiled groups
  • publishing groups as new RSS feeds
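Publishing a group as a new RSS feed could look roughly like the sketch below, which serializes a list of clusters into an RSS 2.0 document with the standard library. The choice to use the first entry's title and link for each item, the function name `build_rss`, and the sample URLs are assumptions; the slides do not describe how FeedMe composes its output items.

```python
import xml.etree.ElementTree as ET


def build_rss(group_title, group_link, clusters):
    """Serialize clusters as an RSS 2.0 string, one <item> per cluster."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = group_title
    ET.SubElement(channel, "link").text = group_link
    ET.SubElement(channel, "description").text = "Clustered entries for " + group_title
    for cluster in clusters:
        first = cluster[0]  # assumption: the first entry represents the cluster
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = first["title"]
        ET.SubElement(item, "link").text = first["link"]
        # List every original link so no source is lost in the composite entry.
        ET.SubElement(item, "description").text = "; ".join(e["link"] for e in cluster)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical clusters.
sample_clusters = [
    [
        {"title": "Tax reform announced", "link": "http://example.org/a/1"},
        {"title": "New tax reform", "link": "http://example.net/b/7"},
    ],
    [{"title": "Cup final result", "link": "http://example.org/a/2"}],
]
feed_xml = build_rss("Politics", "http://example.org/groups/politics", sample_clusters)
print(feed_xml)
```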
3. Statistical analysis of collected data
The collected data
• 388 RSS feeds
• 38 different portals
• collected since 2010-05-10
• more than 100,000 entries
• approx. 30,000 clusters
Distribution of documents by cluster size
[Bar chart: x-axis, cluster size (1 to 16); y-axis, 0 to 0.80.]
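A distribution of this kind can be computed from the list of cluster sizes. The sketch below assumes the chart shows, for each cluster size, the share of all documents that sit in clusters of that size; the slide does not define the axis, and the sizes used here are invented.

```python
from collections import Counter


def document_share_by_cluster_size(cluster_sizes):
    """Map each cluster size to the fraction of documents in clusters of that size."""
    docs_per_size = Counter()
    for size in cluster_sizes:
        docs_per_size[size] += size  # a cluster of size k holds k documents
    total = sum(docs_per_size.values())
    return {size: docs / total for size, docs in sorted(docs_per_size.items())}


# Invented sizes: four singletons, two pairs, one cluster of four.
sizes = [1, 1, 1, 1, 2, 2, 4]
shares = document_share_by_cluster_size(sizes)
for size, share in shares.items():
    print(size, round(share, 2))
```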
Portals publishing on “large” events (>2)
[Horizontal bar chart, axis 0 to 80. Portals: Net.hr, Monitor.hr, Tportal.hr, Index.hr, Dnevnik.hr, Nacional.hr, Jutarnji.hr, HRT.hr, 24sata.hr, Vecernji.hr, SlobodnaDalmacija.hr, RTL.hr. Bar values in ascending order: 16, 19, 24, 27, 30, 45, 49, 54, 64, 66, 68, 77.]
Portals publishing new stories first
[Horizontal bar chart, axis 0 to 200. Portals: Index.hr, Net.hr, Monitor.hr, Dnevnik.hr, Nacional.hr, Tportal.hr, Jutarnji.hr, Vecernji.hr, HRT.hr, SlobodnaDalmacija.hr, 24sata.hr, RTL.hr. Bar values in ascending order: 31, 50, 51, 59, 62, 121, 122, 131, 143, 151, 161, 195.]
Portals publishing new stories first (normalized by portal size)
[Horizontal bar chart, axis 0 to 0.39. Portals: Tportal.hr, Jutarnji.hr, Net.hr, HRT.hr, Vecernji.hr, Nacional.hr, Dnevnik.hr, Monitor.hr, RTL.hr, Index.hr, 24sata.hr, SlobodnaDalmacija.hr. Bar values in ascending order: 0.31, 0.31, 0.31, 0.32, 0.32, 0.32, 0.32, 0.34, 0.35, 0.38, 0.38, 0.39.]
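Counts of which portal publishes a story first, raw and normalized, could be derived from timestamped clusters as sketched below. The sketch assumes that "portal size" means the number of clusters a portal appears in; the slides do not define it, and the field names and sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime


def first_publisher_stats(clusters):
    """Return (raw first-publication counts, counts normalized by the
    number of clusters each portal appears in)."""
    first = Counter()
    appearances = Counter()
    for cluster in clusters:
        earliest = min(cluster, key=lambda entry: entry["published"])
        first[earliest["portal"]] += 1
        for portal in {entry["portal"] for entry in cluster}:
            appearances[portal] += 1
    normalized = {portal: first[portal] / appearances[portal] for portal in appearances}
    return first, normalized


def entry(portal, hour, minute):
    """Build a hypothetical entry published on 2010-05-10 at the given time."""
    return {"portal": portal, "published": datetime(2010, 5, 10, hour, minute)}


clusters = [
    [entry("A", 8, 0), entry("B", 8, 30)],
    [entry("B", 9, 0), entry("A", 9, 10), entry("C", 9, 20)],
    [entry("A", 10, 0), entry("C", 10, 5)],
]
first, normalized = first_publisher_stats(clusters)
print(dict(first))
print(normalized)
```

Normalizing matters because a portal that publishes a lot is first more often in absolute terms simply by appearing in more clusters.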
Plagiarism?
[Horizontal bar chart, axis 0 to 0.30. Portals: Tportal.hr, Dnevnik.hr, Nacional.hr, Net.hr, Jutarnji.hr, Index.hr, Monitor.hr, SlobodnaDalmacija.hr, HRT.hr. Bar values in ascending order: 0.01, 0.01, 0.01, 0.01, 0.02, 0.03, 0.06, 0.09, 0.24.]
4. Usage examples
Filtering by minimum number of elements
Filtering by affirming feeds
Filtering by negating feeds
Future steps
• user-defined RSS sources
• full-text news portals
• different sources - social networks
• topic tracking
• named entity identification
• sentiment analysis and mining
Thank you! Questions?