Music Information Retrieval
Deema Aloum, Noor Orfahly

Uploaded by noor-orfahly on 07-Jul-2015
Category: Technology

TRANSCRIPT

Page 1: MIR

Music Information Retrieval

Deema Aloum Noor Orfahly

Page 2: MIR

Overview

Introduction

Music Document Retrieval

Emotion Detection


Page 4: MIR

What is MIR?

• Music Information Retrieval (MIR): the interdisciplinary science of retrieving information from music. MIR is a small but growing field of research with many real-world applications.

• Objective: make the world’s vast store of music accessible to all.

• The contributing disciplines: computer science, information retrieval, audio engineering, digital sound processing, musicology, library science, cognitive science, psychology, philosophy and law.

Page 5: MIR

MIR Applications

• Music Document Retrieval

• Recommender Systems

• Track Separation

• Automatic Music Transcription

• Rights Management

• Emotion Detection

Page 6: MIR

Music Terms - Pitch & Melody

• Pitch is a particular frequency of sound

– E.g., 440 Hz

• A note is a pitch named by humans

– E.g., Western music generally refers to the 440 Hz pitch as A, specifically A4

• A melody is a pattern of pitches

• Only an electronically produced (pure) tone contains a single frequency; all other sounds consist of multiple frequencies.

• The mix of frequencies in a sound results in its timbre

Page 7: MIR

Music Terms - Timbre

• In music

– The characteristic quality of sound produced by a particular instrument or voice; tone color.

• In acoustics and phonetics

– The characteristic quality of a sound, independent of pitch and loudness

– Depends on the relative strengths of its component frequencies;

– E.g., A4 on a guitar is a sound composed of the following frequencies: 440 Hz, 880 Hz, 1320 Hz, 1760 Hz, etc.

Page 8: MIR

Overview

Introduction

Music Document Retrieval

Emotion Detection

Page 9: MIR

Music Document Retrieval

Music Identification

Music Similarity

Page 10: MIR

MDR - Music Identification

• Metadata-based Approach:

– Music identification relies on information about the content rather than the content itself.

– Ex. TOC

• Content-based Approach:

– Ex. Shazam Service

Page 11: MIR

MDR - Music Identification - TOC

• TOC (Table Of Contents): a representation of the start positions and lengths of the tracks on the disc.

• This feature is highly specific, because it is extremely rare for different albums to share the same lengths of tracks in the same order.

• But, slight differences in the generation of CDs, even from the same source audio material, can produce different TOCs, which will then fail to match each other.

• Ex. freedb

Page 12: MIR

MDR - Music Identification - Shazam

• Shazam:

a mobile app that recognizes music and TV around you. It lets you record up to 15 seconds of the song you are hearing and then tells you everything you want to know about that song: the artist, the name of the song, and the album, and it offers links to YouTube or to buy the song on iTunes.

Page 13: MIR

MDR - Music Identification - Shazam

The Initial Spectrogram

Page 14: MIR

MDR - Music Identification - Shazam

• Shazam stores only the most intense sounds in the song, together with the time at which they appear and their frequency.

The Simplified Spectrogram

Page 15: MIR

MDR - Music Identification - Shazam

• To store this in the database in a way that is efficient to search for a match (easy to index), they choose some of the points within the simplified spectrogram (called “anchor points”) and zones in their vicinity (called “target zones”).

Pairing the anchor point with points in a target zone

Page 16: MIR

MDR - Music Identification - Shazam

• For each point in the target zone, they create a hash that aggregates the following:

– f1: the frequency at which the anchor point is located

– f2: the frequency at which the point in the target zone is located

– t2 - t1: the difference between the time at which the point in the target zone occurs in the song (t2) and the time at which the anchor point occurs (t1)

• Each database entry is a 64-bit struct: 32 bits for the hash and 32 bits for the time offset and track ID.
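The hash construction above can be sketched in a few lines. Note this is a minimal illustration, not Shazam's actual layout: the bit widths (10 bits per quantized frequency bin, 12 bits for the time delta) and the function names are assumptions.

```python
def fingerprint_hash(f1, f2, t1, t2):
    """Pack an (anchor, target-zone point) pair into a 32-bit hash.

    Illustrative bit layout (an assumption): 10 bits for each
    quantized frequency bin and 12 bits for the time delta.
    Inputs are assumed to be pre-quantized non-negative integers.
    """
    dt = t2 - t1
    assert 0 <= f1 < 1024 and 0 <= f2 < 1024 and 0 <= dt < 4096
    return (f1 << 22) | (f2 << 12) | dt


def db_entry(f1, f2, t1, t2, track_id):
    """Illustrative 64-bit struct: the 32-bit hash in the high half,
    the anchor's time offset (16 bits) and track ID (16 bits) in the
    low half. Field widths are assumptions for the sketch."""
    low = ((t1 & 0xFFFF) << 16) | (track_id & 0xFFFF)
    return (fingerprint_hash(f1, f2, t1, t2) << 32) | low
```

Because the hash combines two frequencies and their time spacing, it is far more specific than a single spectral peak, which is what makes the database lookup fast.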

Page 17: MIR

MDR - Music Identification - Shazam

How do they find the song based on the recorded sample ?

• Repeat the same fingerprinting to the recorded sample.

• Each hash generated from the sample sound, will be searched for a match in the database.

• If a match is found, you will have:

– The time of the hash in the sample (th1)

– The time of the hash in the song in the database (th2)

• Draw a new graph, called a scatter graph:

– The horizontal axis (X): th2

– The vertical axis (Y): th1

– Each pair of occurrence times (th1, th2) is marked with a small circle.

Page 18: MIR

MDR - Music Identification - Shazam

• If the graph contains many pairs of th1‘s and th2‘s from the same song, a diagonal line will form.

Scatter graph of a matching run

Page 19: MIR

MDR - Music Identification - Shazam

• They calculate the difference between th2 and th1 (dth) and plot it in a histogram.

• If there is a match, many dths will share the same value, producing a clear peak in the histogram.
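The matching step on the last two slides can be sketched as follows. The index layout (a dict from hash to `(track_id, th2)` lists) and the function name are illustrative assumptions for the sketch:

```python
from collections import Counter, defaultdict

def match_scores(sample_hashes, db_index):
    """Offset-histogram matching sketch.

    sample_hashes: list of (hash, th1) fingerprints from the recording.
    db_index: dict mapping hash -> list of (track_id, th2) database hits.

    For every hash collision we record dth = th2 - th1 per track.
    A true match yields many identical dths for one track, i.e. a
    tall histogram peak; noise spreads over many different dths.
    Returns, per candidate track, its tallest (dth, count) peak.
    """
    offsets = defaultdict(Counter)  # track_id -> histogram of dth values
    for h, th1 in sample_hashes:
        for track_id, th2 in db_index.get(h, []):
            offsets[track_id][th2 - th1] += 1
    return {track: hist.most_common(1)[0] for track, hist in offsets.items()}
```

For example, a sample whose three hashes all hit track 7 at a constant offset of 10 time units produces the peak `{7: (10, 3)}`, which identifies the song and where in it the recording starts.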

Histogram of a matching run

Page 20: MIR

MDR – Similarity Search

• The concept of similarity is less specific than identity.

• There are many different types of musical similarity.

– Two different performances played from the same notation

– Same composer

– Same function, for example dances

– Same genre

– Same culture

Page 21: MIR

Query by Humming

Page 22: MIR

QBH – Query Formatting

Page 23: MIR

QBH – Query Comparison

• The elements in the database must have the same representation as the query.

• EX: Dynamic Time Warping
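As an illustration of the comparison step, here is a minimal textbook implementation of Dynamic Time Warping between two pitch sequences. The cost function (absolute pitch difference) is an assumption for the sketch; real QBH systems tune this.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Classic O(len(a) * len(b)) dynamic program: a hummed query may be
    locally stretched or compressed in time, so each element of `a`
    is allowed to align with a run of elements in `b` (and vice
    versa), and we take the cheapest total alignment cost.
    """
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] stretched
                                 D[i][j - 1],      # b[j-1] stretched
                                 D[i - 1][j - 1])  # one-to-one match
    return D[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0: warping absorbs the repeated note, which is exactly why DTW suits hummed queries whose tempo drifts.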

Page 24: MIR

Dynamic Time Warping

Page 25: MIR

QBH – Ranking evaluation measures

A. Mean Reciprocal Rank (MRR): the average, over all queries, of the reciprocal of the rank at which the correct result appears.

• Ex. For three queries whose correct results appear at ranks 3, 2, and 1:

MRR = (1/3 + 1/2 + 1)/3 = 11/18, or about 0.61
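The computation is straightforward to reproduce (the function name is ours; the ranks 3, 2, 1 come from the slide's example):

```python
from fractions import Fraction

def mean_reciprocal_rank(ranks):
    """MRR over |Q| queries: the mean of 1/rank of the correct
    result for each query. Exact arithmetic via Fraction."""
    return sum(Fraction(1, r) for r in ranks) / len(ranks)

print(mean_reciprocal_rank([3, 2, 1]))  # 11/18, about 0.61
```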

Page 26: MIR

QBH – Ranking evaluation measures

B. Top-X Hit Rate

• Whether the position r of the correct result of the search is within the first X positions.

• Mathematically: r(Qi) ≤ X.
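Over a set of queries this yields a hit rate, i.e. the fraction of queries satisfying r(Qi) ≤ X. A minimal sketch (the function name is ours):

```python
def top_x_hit_rate(ranks, x):
    """Fraction of queries whose correct result appears within the
    first x positions, i.e. rank <= x."""
    return sum(1 for r in ranks if r <= x) / len(ranks)
```

With the ranks 3, 2, 1 from the MRR example, `top_x_hit_rate([3, 2, 1], 2)` is 2/3: two of the three queries place the correct song in the top two.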

Page 27: MIR

Overview

Introduction

Music Document Retrieval

Emotion Detection

Page 28: MIR

Emotions

Page 29: MIR

Emotions?

• Music is the language of emotion.

• Users often want to listen to music in a certain emotional category, or music that puts them in a certain mood.

• What affects the mood of a song?

– Harmony

– Timbre

– Interpretation

– Lyrics

Page 30: MIR

Challenging Problem !!

• Ambiguous

– Due to the ambiguities of human emotions.

– Different mood interpretation & perception between individuals

• Cross-disciplinary endeavor

– Signal processing

– Machine learning

– Understanding of auditory perception, psychology, and music theory.

• Mood may change over the duration of a song

Page 31: MIR

Different Methods

Contextual text information

• websites

• tags

• lyrics

Content-based approaches

• audio

• images

• videos

Combining multiple feature domains

• Audio & Lyrics

• Audio & Tags

• Audio & Images (album covers, artist photos, etc.)

Page 32: MIR

Contextual text information

• Web-Documents

– Artist biographies, album reviews, and song reviews are rich sources of information about music.

– Collect from the Internet by

• querying search engines

• monitoring MP3 blogs

• crawling a music website

– Can be noisy

Page 33: MIR

Mood Representation

Categorical psychometrics

• A set of emotional descriptors (tags)

Scalar/dimensional psychometrics

• Mood can be scaled and measured by a continuum of descriptors or simple multidimensional metrics.

• Most noted: two dimensional Valence-Arousal (V-A) space

Page 34: MIR

Hevner adjective circle

Page 35: MIR

Valence-Arousal (V-A) space

[Figure: the two-dimensional Valence-Arousal (V-A) space. Horizontal axis: Valence (pleasant vs. unpleasant); vertical axis: Arousal (activation vs. deactivation). Example emotions by quadrant: excited, elated, happy (activated/pleasant); tense, stressed, upset (activated/unpleasant); sad, depressed, fatigued (deactivated/unpleasant); serene, relaxed, calm (deactivated/pleasant).]

Page 36: MIR

Valence-Arousal (V-A) space

• Simple, powerful way of thinking about the spectrum of human emotions.

• Both valence and arousal can be defined as subjective experiences (Russell, 1989).

– Valence describes whether the emotion is positive or negative

– Arousal describes the level of alertness or energy involved in the emotion.

Page 37: MIR

Emotion Recognition Problem

• A multiclass, multi-label classification or regression problem

• The unit of analysis can be:

– an entire song

– a section of a song (e.g., chorus, verse)

– a fixed-length clip (e.g., a 30-second snippet)

– a short-term segment (e.g., 1 second)

Page 38: MIR

Emotion Classification System

Page 39: MIR

Mood representation - vectors

a single multi-dimensional vector

• Each dimension represents

– a single emotion (e.g., angry)

– or a bipolar pair of emotions (e.g., positive/negative)

a time-series of vectors over a semantic space of emotions

• Track changes in emotional content over the duration of a piece

Page 40: MIR

Mood Representation- Vector Values

• a binary label

– The presence or absence of the emotion

• a real-valued score

– e.g., Likert scale value

– Probability estimate

• A Likert scale is a psychometric scale commonly involved in research that employs questionnaires. It is the most widely used approach to scaling responses in survey research

Page 41: MIR

Emotion Classification System

Page 42: MIR

Annotation

• Labeling tasks are time-consuming, tedious, and expensive

• Online games (“Games With a Purpose”) can crowdsource the labeling

Page 43: MIR

Features

Page 44: MIR

Features

Page 45: MIR

Timbre Features

• Musical instruments usually produce sound waves containing multiple frequencies

• The lowest frequency is

– the fundamental frequency f0

– closely related to the perceived pitch

• The second and higher frequencies are

– called overtones
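For an idealized instrument the overtones fall at integer multiples of the fundamental, which matches the A4 example given earlier in the deck. A minimal sketch (the function name is ours, and real instruments deviate from exact integer multiples):

```python
def harmonic_series(f0, n=4):
    """First n frequencies of an idealized harmonic series:
    the fundamental f0 plus overtones at integer multiples of f0."""
    return [f0 * k for k in range(1, n + 1)]

print(harmonic_series(440))  # [440, 880, 1320, 1760]
```

The relative strengths of these components, rather than their frequencies, are what timbre features try to capture.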

Page 46: MIR