
Page 1: 2015 US Music Year-End Report (Nielsen) · home.cse.ust.hk/~leichen/courses/mscit6000d/notes/group9.pdf · EE4282 FYP Presentation Author: Kenneth LAM Created Date: 5/10/2016 4:44:57

Group 9

CAO Hengrui

CHEN Jiahang

LAM Wai Kit

Page 2:

Pop Music Portal

2015 US Music Year-End Report (Nielsen)

Page 3:

Data Sources

Page 4:

Web Crawler

Language Used: JavaScript, node.js

Advantages

AJAX – Crawling multiple pages at the same time

node.js, jQuery, JSON
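
The advantage named above, crawling several pages at the same time, can be sketched as follows. The slides name JavaScript and node.js for the actual crawler; this sketch uses Python's asyncio instead, purely for illustration. The page URLs and the `fetch_page` stub are hypothetical, and no real network request is made.

```python
import asyncio

# Hypothetical page URLs; the slides do not list the crawled endpoints.
PAGES = [f"https://example.com/tracks?page={n}" for n in range(1, 6)]


async def fetch_page(url: str) -> dict:
    """Stand-in for an HTTP request; returns a fake JSON-like payload."""
    await asyncio.sleep(0.1)  # simulate network latency
    return {"url": url, "tracks": []}


async def crawl(urls: list[str], max_in_flight: int = 3) -> list[dict]:
    """Fetch all URLs concurrently, with at most max_in_flight requests open."""
    limit = asyncio.Semaphore(max_in_flight)

    async def bounded_fetch(url: str) -> dict:
        async with limit:
            return await fetch_page(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


def run_crawl() -> list[dict]:
    """Synchronous entry point that runs the crawl to completion."""
    return asyncio.run(crawl(PAGES))
```

Capping the number of open requests is a design choice of this sketch, not something stated on the slide; it keeps a crawler from overloading the data source.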

Page 5:

Database

PostgreSQL

Advantages

Relational Database

NoSQL-style document storage (JSON columns)

Functions/Operators for JSON

Page 6:

Database Schema

ER diagram

Page 7:

Limitations

Quality:

Inconsistency

Incompleteness

Precision

Quantity:

Large volume

Complex JSON structure

Page 8:

Web Interface

PHP, HTML based

http://164.132.194.29:3098/page/index.php

Key features:

Full tables of information

Statistics shown in charts

Search function

Page 9:

Database admin

PostgreSQL admin page

http://164.132.194.29:3069/

Key features:

Interface for engineers

SQL queries

Page 10:

RSwoosh – Intro

Java-based library

User-determined threshold

Key procedures:

Divide data into subsets (R, R’)

Retrieve and compare records from subsets

Assign to closest entity set

Repeat until R is empty

Ref: http://ilpubs.stanford.edu:8090/859/1/2008-7.pdf
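
A minimal sketch of the loop outlined above, following the R-Swoosh description in the referenced paper. It is written in Python rather than the Java library the slides mention. The `similar` and `merge` functions, the record layout (a set of name variants), and the 0.85 threshold are illustrative assumptions, not the group's actual configuration.

```python
from difflib import SequenceMatcher

Record = frozenset  # assumption: a record is a set of name variants


def similar(a: Record, b: Record, threshold: float) -> bool:
    """Match if any pair of name variants is at least `threshold` similar."""
    return any(
        SequenceMatcher(None, x.lower(), y.lower()).ratio() >= threshold
        for x in a
        for y in b
    )


def merge(a: Record, b: Record) -> Record:
    """The merged record keeps every name variant seen so far."""
    return a | b


def r_swoosh(records: list[Record], threshold: float = 0.85) -> list[Record]:
    pending = list(records)       # R: records still to be processed
    resolved: list[Record] = []   # R': records that do not match each other
    while pending:
        current = pending.pop()
        match = next((r for r in resolved if similar(current, r, threshold)), None)
        if match is None:
            resolved.append(current)
        else:
            # Take the match out of R' and put the merged record back into R,
            # so it is compared against the remaining resolved records again.
            resolved.remove(match)
            pending.append(merge(current, match))
    return resolved
```

For example, `r_swoosh([frozenset({"Hey Jude"}), frozenset({"hey jude "}), frozenset({"Yesterday"})])` leaves two records, with the two "Hey Jude" variants merged into one.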

Page 11:

RSwoosh – Result

Before – 61878 records

After – 46772 records

Reduced by 24.4%
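
The reduction figure can be checked from the two record counts on this slide:

```python
before = 61878  # records before RSwoosh (from the slide)
after = 46772   # records after RSwoosh (from the slide)

removed = before - after                 # 15106 fewer records
reduction_pct = 100 * removed / before   # share of the original record count

print(f"{removed} fewer records, {reduction_pct:.1f}% reduction")
```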

Page 12:

Dedupe – Intro

Python-based library

Key procedures:

Manual grouping

Training (Supervised learning)

Blocking map

Pairwise matching within block

Ref: http://dedupe.readthedocs.io/

Page 13:

Dedupe – Blocking

Blocking map
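
The idea behind a blocking map can be sketched without the dedupe library itself: records are grouped under a cheap key, and only records that share a key are compared pairwise. This is not the dedupe API. The blocking key (first three alphanumeric characters of the name), the helper names, the 0.9 threshold, and the sample titles are assumptions made for illustration; dedupe learns its blocking rules from the labelled training data rather than using a fixed key.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(name: str) -> str:
    """Cheap blocking key: first three alphanumeric characters, lowercased."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
    return cleaned[:3]


def build_blocking_map(names: list[str]) -> dict[str, list[int]]:
    """Map each blocking key to the indices of the records that share it."""
    blocks: dict[str, list[int]] = defaultdict(list)
    for idx, name in enumerate(names):
        blocks[block_key(name)].append(idx)
    return dict(blocks)


def candidate_pairs(names: list[str], threshold: float = 0.9) -> list[tuple[int, int]]:
    """Compare records pairwise inside each block only."""
    pairs = []
    for members in build_blocking_map(names).values():
        for i, j in combinations(members, 2):
            score = SequenceMatcher(None, names[i].lower(), names[j].lower()).ratio()
            if score >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical sample titles, not taken from the project's data.
sample = ["Hello", "hello ", "Help!", "Yesterday"]
```

With the sample above, "Yesterday" lands in its own block and is never compared with the other three titles, which is where the saving over all-pairs comparison comes from.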

Page 14:

Problems

List of Arabic song names

sourate al munafiqun

sourat al dhuha

sourate athariyat - english translation

sourate al hujurat - hafs muratal

sourat al jumua

sourat al fajr

sourate al hashr

sourate al qalam - warch

Page 15:

Problems

List of classical music names

24 Préludes, Op.28: 11. in B major

24 Préludes, Op.28: 13. in F sharp major

24 Préludes, Op.28: 13. in F sharp major

24 Préludes, Op.28: 15. in D flat major ("Raindrop")

24 Préludes, Op.28: 15. in D flat major ("Raindrop")

24 Préludes, Op.28: 17. in A flat major
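
One plausible reading of this slide is that distinct pieces from the same opus differ in only a few characters, so a string-similarity threshold tends to treat them as duplicates. The sketch below measures this for two titles from the list using Python's `difflib`; the choice of similarity measure is an assumption, since the slides do not say which one was used.

```python
from difflib import SequenceMatcher

a = "24 Préludes, Op.28: 11. in B major"
b = "24 Préludes, Op.28: 17. in A flat major"

# Two different preludes, yet most of their characters are shared.
score = SequenceMatcher(None, a, b).ratio()
print(f"similarity = {score:.2f}")
```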

Page 16:

Methodology

Simple voting

Term frequency (merged track_name)

Aggregation

Spotify > last.fm > Discogs

Spotify Echonest – over 30m known songs, accessible by Spotify player

Last.fm – Many duplicate entries

Discogs – Inaccurate information but most complete
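
The voting and source-priority steps might be combined as in the sketch below. The slides do not spell out the procedure, so the record layout, the `pick_track_name` helper, and the tie-breaking rule (fall back to the Spotify > last.fm > Discogs order) are assumptions.

```python
from collections import Counter

# Lower number = more trusted, following the order given on the slide.
SOURCE_PRIORITY = {"spotify": 0, "lastfm": 1, "discogs": 2}


def pick_track_name(candidates: list[tuple[str, str]]) -> str:
    """Choose one track_name from (source, track_name) pairs of a merged entity.

    The most frequent name wins (simple voting); ties go to the name
    reported by the most trusted source.
    """
    if not candidates:
        raise ValueError("no candidates to vote on")
    votes = Counter(name for _, name in candidates)
    best_rank: dict[str, int] = {}
    for source, name in candidates:
        # Unknown sources rank below all known ones.
        rank = SOURCE_PRIORITY.get(source, len(SOURCE_PRIORITY))
        best_rank[name] = min(rank, best_rank.get(name, rank))
    # Sort key: more votes first, then better (lower) source rank.
    return min(votes, key=lambda n: (-votes[n], best_rank[n]))
```

For instance, if last.fm and Spotify both report "Hey Jude" and Discogs reports "Hey Jude (Remastered)", the majority name is kept; with one vote each from Discogs and Spotify, the Spotify name wins.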

Page 17:

Page 18:

Group 9

CAO Hengrui

CHEN Jiahang

LAM Wai Kit