using search engines for classification: does it still work?

USING SEARCH ENGINES FOR CLASSIFICATION: DOES IT STILL WORK? Sten Govaerts, Nik Corthaut, Erik Duval

Upload: sten-govaerts

Post on 06-May-2015

DESCRIPTION

My presentation at the adMIRe workshop on ISM 2009 in San Diego. The presentation is about our study on the use of search engines to classify genres.

TRANSCRIPT

Page 1: Using search engines for classification: does it still work?

USING SEARCH ENGINES FOR CLASSIFICATION: DOES IT STILL WORK?

Sten Govaerts, Nik Corthaut, Erik Duval

Page 2: Using search engines for classification: does it still work?

•Our problem

•Classification using search engines

•The setup

•The evaluation

•Conclusion

Page 3: Using search engines for classification: does it still work?

TUNIFY

Page 6: Using search engines for classification: does it still work?

HOW DOES IT WORK?

• manually annotated metadata

• 5 music experts at Aristo Music and different consultants

• almost 80,000 songs

• but, not enough...

Page 7: Using search engines for classification: does it still work?

PROBLEMS

• satisfying the music choice of all customers

• retail and catering differ from you and me!

• new markets

• react fast on emerging music trends

• adding the full Belgian library catalog

Page 8: Using search engines for classification: does it still work?

GENERATE THE METADATA

• from different sources:

• the audio signal

• web sources

• the Aristo database

• attention metadata

• using our metadata generation framework: SamgI

Page 9: Using search engines for classification: does it still work?

GENRE...

• our master's thesis student looked at different ways to generate genre metadata...

Page 10: Using search engines for classification: does it still work?

ONE APPROACH...

• M. Schedl, T. Pohle, P. Knees, G. Widmer, "Assigning and Visualizing Music Genres by Web-based Co-occurrence Analysis", Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 260-265.

• G. Geleijnse, J. Korst, "Web-based Artist Categorization", Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 266-271.

Page 11: Using search engines for classification: does it still work?

CLASSIFICATION WITH SEARCH ENGINES

using co-occurrence

Page 13: Using search engines for classification: does it still work?

CLASSIFICATION WITH SEARCH ENGINES

Artist + Genre + Schema

using co-occurrence
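
The co-occurrence idea on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the score for a genre is the page count for artist + genre + schema divided by the page count for artist + schema. The `page_count` callable, the quoted query format, and the schema term "music" are hypothetical stand-ins for a real search engine call.

```python
from typing import Callable, Dict, List


def build_query(artist: str, schema: str, genre: str = "") -> str:
    """Quote each term so it is matched as a phrase (assumed query format)."""
    terms = [artist, genre, schema] if genre else [artist, schema]
    return " ".join(f'"{term}"' for term in terms)


def genre_scores(
    artist: str,
    genres: List[str],
    page_count: Callable[[str], int],  # hypothetical: returns the hit count for a query
    schema: str = "music",             # assumed schema term
) -> Dict[str, float]:
    """Relative co-occurrence of the artist with each genre."""
    base = page_count(build_query(artist, schema))
    if base == 0:
        # No pages mention the artist at all: every genre scores zero.
        return {genre: 0.0 for genre in genres}
    return {
        genre: page_count(build_query(artist, schema, genre)) / base
        for genre in genres
    }


def classify(
    artist: str,
    genres: List[str],
    page_count: Callable[[str], int],
    schema: str = "music",
) -> str:
    """Pick the genre with the highest co-occurrence score."""
    scores = genre_scores(artist, genres, page_count, schema)
    return max(scores, key=scores.get)
```

Passing the page-count lookup in as a parameter keeps the sketch independent of any particular search engine, so the same code can be pointed at different engines or at cached counts.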

Page 17: Using search engines for classification: does it still work?
Page 19: Using search engines for classification: does it still work?

Rock:

Blues:

Country:

Jazz:

Pop:

Metal:

0,013

0,009

0,013

0,005

0,015

0,009

Page 20: Using search engines for classification: does it still work?

RESULTS

• master thesis student’s results were much worse

• what happened?

• did Google's search result counts change?

• does the Google Search API return different results?

• is the student’s implementation correct?

Page 21: Using search engines for classification: does it still work?

HOW TO EVALUATE THIS?

• re-run the original experiment

• evaluate on the same data set: 1995 artists and 9 genres.

• different search engines: Google, Yahoo! and Live! Search.

• over time: 8 times over a period of 36 days.
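
The evaluation described above amounts to measuring classification accuracy per search engine for each run. A minimal sketch, assuming predictions and ground truth are stored as artist-to-genre dictionaries (the function names and data layout are illustrative, not taken from the study):

```python
from typing import Dict, List


def accuracy(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Fraction of artists in `truth` whose predicted genre matches."""
    if not truth:
        raise ValueError("ground truth is empty")
    correct = sum(
        1 for artist, genre in truth.items() if predicted.get(artist) == genre
    )
    return correct / len(truth)


def accuracy_over_time(
    runs: Dict[str, List[Dict[str, str]]],  # engine -> one prediction dict per run
    truth: Dict[str, str],
) -> Dict[str, List[float]]:
    """Accuracy of every engine for every run, in run order."""
    return {
        engine: [accuracy(predictions, truth) for predictions in engine_runs]
        for engine, engine_runs in runs.items()
    }
```

An artist missing from a run's predictions counts as misclassified, so a failed query lowers the accuracy instead of silently shrinking the data set.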

Page 22: Using search engines for classification: does it still work?

THE DATA SET

Blues, Country, Electronic, Folk, Jazz, Metal, Rap, Reggae, RnB

Page 23: Using search engines for classification: does it still work?

THE DATA SET

[Chart: genre distribution of the data set]

Genres: Blues, Country, Electronic, Folk, Jazz, Metal, Rap, Reggae, RnB

Shares shown: 9%, 12%, 5%, 4%, 41%, 13%, 2%, 3%, 10%

Page 25: Using search engines for classification: does it still work?
Page 26: Using search engines for classification: does it still work?

MOTION CHART

• http://hmdb.cs.kuleuven.be/muzik/gapminder.html

Page 27: Using search engines for classification: does it still work?
Page 28: Using search engines for classification: does it still work?
Page 29: Using search engines for classification: does it still work?

MORE FINE-GRAINED...

• 18 artists

• more search engines: Google.co.uk/.fr/.be, uk/fr.search.yahoo.com

• twice a day for 53 days

• 250,000 queries!

Page 30: Using search engines for classification: does it still work?

2 Pac: Rap
Alan Lomax: Folk
Art Pepper: Jazz
Cradle of Filth: Metal
David Parsons: Electronic
Desmond Dekker: Reggae
Downpour: Metal
IceT: Rap
Jerry Butler: RnB
Joy Lynn White: Country
Louisiana Red: Blues
Lou Rawls: RnB
LTJ Bukem: Electronic
Peter Tosh: Reggae
Pinetop Smith: Jazz
Robert Johnson: Blues
Roy Rogers: Country
Steeleye Span: Folk

Page 31: Using search engines for classification: does it still work?
Page 32: Using search engines for classification: does it still work?

MAIN SEARCH ENGINE RESULTS

Page 33: Using search engines for classification: does it still work?

REGIONAL GOOGLES

Page 34: Using search engines for classification: does it still work?
Page 35: Using search engines for classification: does it still work?

WHAT TO USE?

• use Google when it's stable, else rely on Yahoo!

• when is it stable? test with a small set

• some artists get classified incorrectly on bad days

• compare the accuracy achieved with the test set to the average.
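
The stability check suggested on this slide could look roughly like the sketch below. It assumes "stable" means that the accuracy on the small test set is not far below the historical average. The `tolerance` value is a hypothetical parameter; the slides do not give a threshold.

```python
from statistics import mean
from typing import Sequence


def is_stable(
    todays_accuracy: float,
    past_accuracies: Sequence[float],
    tolerance: float = 0.05,  # hypothetical threshold, not from the slides
) -> bool:
    """True if today's test-set accuracy is within `tolerance` of the average."""
    if not past_accuracies:
        raise ValueError("need at least one past accuracy to compare against")
    return todays_accuracy >= mean(past_accuracies) - tolerance


def choose_engine(
    google_today: float,
    google_history: Sequence[float],
    tolerance: float = 0.05,
) -> str:
    """Use Google when it looks stable on the test set, else fall back to Yahoo!."""
    if is_stable(google_today, google_history, tolerance):
        return "Google"
    return "Yahoo!"
```

Running the small test set first keeps the cost of the check low compared with classifying the full catalog on a bad day.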

Page 36: Using search engines for classification: does it still work?

CONCLUSION

• still works after 3 years

• Google -> Yahoo! -> Live! Search

• why does Google fluctuate?

• a generic, all-purpose version of the classifier is implemented in our metadata generation framework

Page 37: Using search engines for classification: does it still work?

FUTURE WORK

• understand the performance differences of regional search engines

• use alternative search engines

• tweak the genre taxonomy depending on the search engine

Page 38: Using search engines for classification: does it still work?

Q & A.

Page 39: Using search engines for classification: does it still work?

DEMO METADATA GENERATION

• http://ariadne.cs.kuleuven.be/samgi-service/