Multimedia search engine Michal Krsek, UISK Charles University at Prague & CESNET Ivan Doležal, CESNET Michal Illich, Jyxo


Multimedia search engine

Michal Krsek, UISK Charles University at Prague & CESNET

Ivan Doležal, CESNET

Michal Illich, Jyxo

Electronic Media

• TV & radio

• Organized in channels

• Zero democracy in programming (by channel management)

• Centralized production (big guys business)

Internet

• Not only web (audio/video and others)

– Remember archie.sura.net?

• IPTV / Live / Video on demand

• Navigation only via web

=> Not easy to find a specific program in A/V content

Search options I

• Voice recognition

– Language identification

– Accents

• Video recognition

– Text interpretation (bush vs. Bush)

– Low video quality

Search options II

• Indexing of web pages

– Yahoo! does this (Google bomb target)

• Metadata

– “Out-of-band metadata” (as in the library world)

– Metadata in files (added during editing or encoding)

Project description

• Started in 2003 (oh yes, one year before Truveo)

• “Google for audio and video on the Internet”

• No support from content owners

• Modular concept

• Start with .cz Internet

Technical description I

• Crawler

– Crawls the web and collects addresses (URLs)

– Exports URLs of multimedia files

– Software written by Jyxo (Linux console app)
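The crawler step (collect addresses, export only the multimedia ones) could look roughly like the sketch below. The slides do not say how the Jyxo crawler recognizes A/V files, so the extension list and all function names here are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical extension list; the slides do not say how A/V URLs are recognized.
AV_EXTENSIONS = {".mp3", ".wma", ".wmv", ".avi", ".mpg", ".mov", ".rm", ".ogg"}


class LinkCollector(HTMLParser):
    """Collects absolute URLs found in href/src attributes of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page address.
                self.urls.append(urljoin(self.base_url, value))


def is_av_url(url):
    """True if the URL path ends with one of the assumed A/V extensions."""
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in AV_EXTENSIONS)


def extract_av_urls(base_url, html):
    """Return the multimedia URLs linked from one HTML page, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [u for u in collector.urls if is_av_url(u)]
```

Filtering by file extension is the simplest possible rule; a real crawler would probably also need to look at content types and playlist files.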

Technical description II

• Distiller

– Imports addresses of multimedia files

– Distills metadata (and writes XML files)

– Makes screenshots (if the file contains video)

– C# software and mplayer (Windows apps)

– Runs in a distributed environment
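The distiller's output is described only as "XML files". A minimal sketch of writing one such record follows; the element and attribute names (`media`, `field`, `screenshot`) are hypothetical, since the slides do not show the real schema, and the original distiller is C# rather than Python.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(url, metadata, screenshots):
    """Serialize one distilled file as an XML string.

    metadata:    dict of field name -> value (e.g. title, duration)
    screenshots: list of (second, filename) pairs; empty for audio-only files
    The layout is an assumed example, not the project's actual format.
    """
    root = ET.Element("media", {"url": url})
    for key in sorted(metadata):
        field = ET.SubElement(root, "field", {"name": key})
        field.text = str(metadata[key])
    for second, filename in screenshots:
        ET.SubElement(root, "screenshot", {"second": str(second), "file": filename})
    return ET.tostring(root, encoding="unicode")
```

Keeping every metadata item as a generic `field` element means the database side does not need to know in advance which fields a given player reports.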

Technical description III

• Database

– Imports XML metadata files into the full-text DB

– Answers back-end queries for web searches

– And other full-text tasks (e.g. language)
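To illustrate what "import XML metadata, then answer queries" means, here is a toy in-memory inverted index. It is not the Jyxo full-text engine; the XML layout (a `media` element with a `url` attribute and `field` children) is an assumed example, and the class and method names are made up.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


class FullTextIndex:
    """Toy inverted index: token -> set of URLs whose metadata contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Lowercased word tokens; \w also matches accented letters.
        return re.findall(r"\w+", text.lower())

    def import_xml(self, xml_text):
        """Index all <field> texts of one metadata record (assumed layout)."""
        root = ET.fromstring(xml_text)
        url = root.get("url")
        for field in root.iter("field"):
            for token in self.tokenize(field.text or ""):
                self.postings[token].add(url)

    def search(self, query):
        """Return URLs containing every query token, sorted for stable output."""
        tokens = self.tokenize(query)
        if not tokens:
            return []
        sets = [self.postings.get(t, set()) for t in tokens]
        return sorted(set.intersection(*sets))
```

A production engine would add ranking, stemming and language handling; the sketch only shows the import and lookup roles named on the slide.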

[Diagram: processing pipeline]

crawling: crawls web pages, gets addresses, filters A/V addresses

distillation: gets metadata from multimedia files

indexing / search: holds full-text database, provides back end for queries

Distillation

• Process description

– Get URL from DB

– Get metadata from the file available at the URL

– Get screenshots at 1, 30 and 50 sec

– Save metadata & screenshots
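The screenshot step could be driven by command lines like the ones built below. The slides only say that mplayer takes the screenshots; the specific options (`-ss`, `-frames`, `-nosound`, `-vo jpeg:outdir=...`) are standard MPlayer options as far as I know, but whether the project used them is an assumption, as are the function name and the output directory layout.

```python
# Offsets in seconds, as listed on the slide.
SCREENSHOT_OFFSETS = (1, 30, 50)


def screenshot_commands(url, out_dir):
    """Build one mplayer argument list per screenshot offset.

    The commands are only constructed here, not executed; pass each list to
    subprocess.run() on a machine where mplayer is installed.
    """
    commands = []
    for offset in SCREENSHOT_OFFSETS:
        commands.append([
            "mplayer",
            "-ss", str(offset),      # seek to the offset
            "-frames", "1",          # decode a single frame, then exit
            "-nosound",              # audio is not needed for a still image
            "-vo", f"jpeg:outdir={out_dir}/{offset:02d}s",  # assumed layout
            url,
        ])
    return commands
```

Seeking on a stream that does not support fast forward means playing up to the offset, which is one reason a single file can take about a minute.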

Distillation

• Use of win32 applications

– Native players (WMP, RP, Qt) for metadata

– Mplayer for screenshots

• Takes one minute on average

– Slow servers/bandwidth

– Streaming without fast forward

DistillerGRID

• => About 16 years needed to distill 8.500.000 URLs (at one minute each)

• Ideal application for GRID computing

– No need for real-time response

– Huge amount of computing time needed

• Two ways to create a GRID

– Build a dedicated system

– Use current capacities
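The 16-year figure follows from the average of about one minute per URL quoted for distillation: 8.5 million minutes on a single station. The short calculation below reproduces it; the `stations` parameter is a hypothetical knob added to show how the time shrinks when the work is spread over many machines running continuously.

```python
URLS = 8_500_000                  # URLs to distill, from the slide
MINUTES_PER_URL = 1               # average distillation time, from the slides
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600


def years_to_distill(urls, minutes_per_url, stations=1):
    """Wall-clock years if `stations` machines work around the clock."""
    return urls * minutes_per_url / (MINUTES_PER_YEAR * stations)


single_station_years = years_to_distill(URLS, MINUTES_PER_URL)
print(f"one station: {single_station_years:.1f} years")

# Hypothetical: 100 stations running non-stop (real labs run only part-time).
grid_days = years_to_distill(URLS, MINUTES_PER_URL, stations=100) * 365
print(f"100 stations: {grid_days:.0f} days")
```

Since the job has no real-time requirement, it parallelizes trivially: each URL is independent of the others.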

Computing machines

• PC/Windows based

• HW independent

• Secure environment

– Security of hosting system

– Security of distillation process

• Well connected

• No need to run 24x7

• Easy to manage

Configuration

• ~100 PCs in student labs

• Running on demand during weekends

• Virtual machines (MS VPC 2004) in hosting system (Win XP)

• Three different HW configurations

• Peak rate about 5000 URLs per minute

• SQL as back end -> pull distribution of work
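"Pull distribution of work" means each station asks the SQL database for its next URL instead of a scheduler pushing jobs out, which suits machines that come and go. The sketch below shows the idea with SQLite; the slides do not name the database product or schema, so the `jobs` table, its columns and the function names are all assumptions.

```python
import sqlite3


def open_queue(path):
    """Open the job database in autocommit mode so that transactions
    can be controlled explicitly with BEGIN/COMMIT."""
    return sqlite3.connect(path, isolation_level=None)


def create_queue(conn, urls):
    """Create the (hypothetical) jobs table and load pending URLs."""
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, url TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending', worker TEXT)"
    )
    conn.executemany("INSERT INTO jobs (url) VALUES (?)", [(u,) for u in urls])


def claim_job(conn, worker):
    """Atomically claim the oldest pending job; return (id, url) or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, url FROM jobs WHERE status = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'taken', worker = ? WHERE id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row


def finish_job(conn, job_id):
    """Mark a claimed job as done once metadata and screenshots are saved."""
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

A station that is switched off mid-job simply leaves a row in the `taken` state; a periodic sweep that resets stale rows to `pending` would be one way to recover them.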

Current status I

• HW

– 20 crawlers

– 2 servers for full-text DB (<1.400 USD)

– Distillation stations (X office PCs)

– Connected by 1 Gb/s to CESNET2 -> GEANT2

Current status II

• Database

– EU + .com, .edu

– > 13.000.000 URLs

– > 8.000.000 valid

– > 2.800.000 with screenshots

Live show?

Want to test?

• URLs

– http://multimedia.jyxo.cz

– http://videoserver.cesnet.cz/videoarchiv_en.php

– For the XML interface, send me an e-mail

Questions? Comments?

Michal Krsek, [email protected] (academic service, cooperation)

Michal Illich, [email protected] (business service)