semantic video tool - project assignment presentation
Semantic Video Tool
IRTM - Assignment Presentation
28th May 2015 Daniele Di Mitri
SVT in a nutshell
• video directory
• RESTful web application
• NLP semantic video analyzer
• video search engine
• analytics
Using the video transcripts!
Corpus
• 1391 video transcripts from TED Talks
• from the 6 top categories
– technology, business, design, entertainment, science, global issues
• + related metadata
– no. comments, no. views, category, length, etc.
Why TED Talks?
• many documents
• comparable length
• high-quality English
NLP operations
• Common NLP operations (NLTK)
– Tokenization (punctuation)
– POS Tagging
– Stemming
– Chunking
– Frequent unigrams, bigrams and trigrams
• Automatic summaries (in 2 sentences)
• TF-IDF-based search (powered by scikit-learn)
• Classification of videos as popular or unpopular
• Anaphora resolution
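SVT relies on NLTK for these steps. As a dependency-free illustration only, the sketch below re-implements three of them with the Python standard library: tokenization, n-gram counting and a two-sentence summary. The slides do not say how the summaries are produced, so the scoring rule used here (average corpus frequency of a sentence's non-stop-word tokens) is an assumption, and `STOPWORDS`, `tokenize`, `ngram_counts` and `summarize` are hypothetical names. In NLTK the corresponding building blocks would be `nltk.word_tokenize`, `nltk.ngrams` and `nltk.FreqDist`.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not SVT's).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "we", "that", "this"}


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped (stand-in for nltk.word_tokenize)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(tokens, n):
    """Count contiguous n-grams: n=1 unigrams, n=2 bigrams, n=3 trigrams."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def summarize(text, n_sentences=2):
    """Frequency-based extractive summary (assumed method).

    Each sentence is scored by the average frequency, over the whole text,
    of its non-stop-word tokens; the n best sentences are returned in
    their original order.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freqs = Counter(t for t in tokenize(text) if t not in STOPWORDS)

    def score(sentence):
        words = [t for t in tokenize(sentence) if t not in STOPWORDS]
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(best[:n_sentences]))
```

For example, `summarize("Ideas matter. Ideas spread ideas. Thank you very much.")` keeps the first two sentences, because the repeated word "ideas" lifts their scores above the closing courtesy line.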
TF-IDF-based recommender system
[Figure: panels labelled TED and SVT]
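The slides name TF-IDF as the basis of both search and recommendations but give no further detail. A minimal sketch under stated assumptions: each transcript becomes a TF-IDF vector (raw term counts times idf = ln(N/df) + 1), related talks are ranked by cosine similarity, and a search query is scored the same way against the corpus. The real tool uses scikit-learn, where `TfidfVectorizer` and `cosine_similarity` would be the natural counterparts; this pure-Python version only makes the arithmetic visible, and every function name in it is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs):
    """One sparse {term: weight} dict per document, plus the idf table.

    Weight = raw term count * idf, with idf = ln(N / df) + 1 (an
    illustrative, unsmoothed choice; not necessarily what SVT used).
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(docs, index, k=2):
    """Indices of the k transcripts most similar to docs[index], itself excluded."""
    vectors, _ = tfidf_vectors(docs)
    scored = [(cosine(vectors[index], v), j) for j, v in enumerate(vectors) if j != index]
    return [j for _, j in sorted(scored, reverse=True)[:k]]


def search(docs, query, k=2):
    """Rank transcripts against a free-text query, weighted with the corpus idf."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q, v), j) for j, v in enumerate(vectors)]
    return [j for _, j in sorted(scored, reverse=True)[:k]]
```

The same vectors serve both features: a query is simply scored as a very short document, which is why `search` reuses the corpus idf weights and ignores query words the corpus has never seen.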
Why is Monica Lewinsky popular?
$$\text{Popularity Rate} = \log\left(\frac{\text{comments} \cdot \text{views}}{\text{datediff}(\text{now}, \text{dateup})}\right)$$
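A minimal sketch of the popularity rate together with the «popular»/«unpopular» cut-off of 15. The slide leaves several details open, so the following are assumptions: the logarithm is natural, comments and views are multiplied, and a talk's age is counted in whole days (clamped to at least one). `popularity_rate` and `label` are hypothetical names.

```python
import math
from datetime import date


def popularity_rate(comments: int, views: int, uploaded: date, today: date) -> float:
    """log(comments * views / days online).

    Assumptions not fixed by the slide: natural logarithm, comments and
    views multiplied, age in whole days (at least 1 to avoid dividing by 0).
    """
    days = max((today - uploaded).days, 1)
    activity = comments * views
    if activity <= 0:
        return float("-inf")  # no comments or no views: lowest possible rate
    return math.log(activity / days)


def label(rate: float, threshold: float = 15.0) -> str:
    """rate > threshold -> «popular», everything else -> «unpopular»."""
    return "popular" if rate > threshold else "unpopular"
```

Under these assumptions, a talk has to collect about e^15, roughly 3.3 million comment-view units per day online, before it counts as popular.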
• Idea: label documents with rate > 15 as «popular» and the rest as «unpopular»
• How: classify with a TF-IDF + SVM pipeline (work in progress)
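The classifier is marked as work in progress, so here is only one plausible shape, as a hedged, dependency-free sketch: transcripts are turned into L2-normalised TF-IDF vectors and a linear SVM (hinge loss with an L2 penalty) is trained on them. The training method, Pegasos stochastic sub-gradient descent without a bias term, is swapped in for illustration, and `TfidfLinearSVM` with its parameters `lam`, `epochs` and `seed` is hypothetical. With scikit-learn, which the tool already uses for search, the equivalent would be a `Pipeline` of `TfidfVectorizer` and `LinearSVC`.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TfidfLinearSVM:
    """Hypothetical TF-IDF + linear SVM classifier for «popular» vs «unpopular».

    Features: raw term counts * (ln(N/df) + 1), L2-normalised.
    Classifier: hinge loss + L2 penalty, trained with the Pegasos
    stochastic sub-gradient method; no bias term, for brevity.
    """

    def __init__(self, lam=0.01, epochs=50, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed
        self.idf, self.w = {}, {}

    def _vectorize(self, text):
        counts = Counter(t for t in tokenize(text) if t in self.idf)  # unseen words ignored
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm else {}

    def _score(self, x):
        return sum(self.w.get(t, 0.0) * v for t, v in x.items())

    def fit(self, texts, labels):
        n = len(texts)
        df = Counter(t for text in texts for t in set(tokenize(text)))
        self.idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}
        data = [(self._vectorize(x), 1 if y == "popular" else -1)
                for x, y in zip(texts, labels)]
        rng = random.Random(self.seed)
        self.w, step = {}, 0
        for _ in range(self.epochs):
            rng.shuffle(data)
            for x, y in data:
                step += 1
                eta = 1.0 / (self.lam * step)
                violated = y * self._score(x) < 1  # margin checked with current weights
                # L2 penalty: shrink every weight ...
                self.w = {t: (1 - eta * self.lam) * v for t, v in self.w.items()}
                # ... then step along the hinge-loss sub-gradient if the margin is violated.
                if violated:
                    for t, v in x.items():
                        self.w[t] = self.w.get(t, 0.0) + eta * y * v
        return self

    def predict(self, text):
        return "popular" if self._score(self._vectorize(text)) > 0 else "unpopular"
```

Training labels would come from the popularity threshold above, with each transcript's text as the input document.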
Anaphora handling
• RegExp (\b(?![a-zA-Z]{2}\s)\w+|')+
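The slide gives this expression without saying how it feeds into anaphora resolution, so the snippet below only demonstrates what it matches, using Python's `re` module. It keeps word characters and apostrophes together, so a contraction such as "it's" stays one token, and its negative lookahead skips any word of exactly two ASCII letters that is directly followed by whitespace: pronouns such as "he" and "it", but also words like "is" or "of". `SLIDE_PATTERN` and `regex_tokens` are hypothetical names.

```python
import re

# The expression as printed on the slide, compiled unchanged.
SLIDE_PATTERN = re.compile(r"(\b(?![a-zA-Z]{2}\s)\w+|')+")


def regex_tokens(text):
    """Return every full match of the slide's pattern.

    finditer + group(0) is used because re.findall would return only the
    last repetition of the capturing group instead of the whole match.
    """
    return [m.group(0) for m in SLIDE_PATTERN.finditer(text)]
```

For instance, `regex_tokens("He said it's fine")` yields `['said', "it's", 'fine']`: "He" is dropped, while "it" survives inside "it's" because it is followed by an apostrophe, not whitespace.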
Demo!