Post on 07-Jul-2015

Music Information Retrieval

Deema Aloum Noor Orfahly

Overview

Introduction

Music Document Retrieval

Emotion Detection

What is MIR?

• Music Information Retrieval (MIR): the interdisciplinary science of retrieving information from music. MIR is a small but growing field of research with many real-world applications.

• Objective: make the world’s vast store of music accessible to all.

• The contributing disciplines: computer science, information retrieval, audio engineering, digital sound processing, musicology, library science, cognitive science, psychology, philosophy and law.

MIR Applications

Music Document Retrieval

Recommender System

Track Separation

Automatic Music Transcription

Rights Management

Emotion Detection

Music Terms - Pitch & Melody

• Pitch is a particular frequency of sound

• E.g., 440 Hz

• A note is a pitch to which humans have given a name.

• E.g., Western music generally refers to the 440 Hz pitch as A, specifically A4

• A melody is a pattern of pitches

• Only an electronically produced sound (a pure tone) contains a single frequency; all other sounds consist of multiple frequencies.

• The mix of frequencies in a sound results in the timbre

Music Terms - Timbre

• In music

– The characteristic quality of sound produced by a particular instrument or voice; tone color.

• In acoustics and phonetics

– The characteristic quality of a sound, independent of pitch and loudness

– Depends on the relative strengths of its component frequencies

– E.g., A4 on a guitar is a sound composed of the following frequencies: 440 Hz, 880 Hz, 1320 Hz, 1760 Hz, etc.
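The component frequencies listed above form a harmonic series: integer multiples of the fundamental. Treating the partials as exact integer multiples is an idealization (real strings are slightly inharmonic), but it can be sketched directly:

```python
# Sketch: the component frequencies of A4 on a string instrument,
# idealized as integer multiples of the fundamental f0 = 440 Hz.
f0 = 440  # fundamental frequency of A4, in Hz

harmonics = [f0 * n for n in range(1, 5)]
print(harmonics)  # [440, 880, 1320, 1760]
```

The relative strengths of these partials, not their frequencies, are what distinguish a guitar's A4 from a piano's.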

Overview

Introduction

Music Document Retrieval

Emotion Detection

Music Document Retrieval

Music Identification

Music Similarity

MDR - Music Identification

• Metadata-based Approach:

– Music identification relies on information about the content rather than the content itself.

– Ex. TOC

• Content-based Approach:

– Ex. Shazam Service

MDR - Music Identification - TOC

• TOC (Table Of Contents): a representation of the start positions and lengths of the tracks on the disc.

• This feature is highly specific, because it is extremely rare for different albums to share the same lengths of tracks in the same order.

• But, slight differences in the generation of CDs, even from the same source audio material, can produce different TOCs, which will then fail to match each other.

• Ex. freedb

MDR - Music Identification - Shazam

• Shazam:

a mobile app that recognizes music and TV around you. It lets you record up to 15 seconds of the song you are hearing and then tells you everything you want to know about that song: the artist, the name of the song, and the album, and offers you links to YouTube or to buy the song on iTunes.

MDR - Music Identification - Shazam

The Initial Spectrogram

MDR - Music Identification - Shazam

• Shazam stores only the most intense sounds (spectral peaks) in the song, together with the time at which each appears and its frequency.

The Simplified Spectrogram

MDR - Music Identification - Shazam

• To store this in the database in a way that is efficient to search for a match (easy to index), they choose some of the points within the simplified spectrogram (called “anchor points”) and zones in their vicinity (called “target zones”).

Pairing the anchor point with points in a target zone

MDR - Music Identification - Shazam

• For each point in the target zone, they will create a hash that will be the aggregation of the following:

– F1: the frequency at which the anchor point is located

– F2: the frequency at which the point in the target zone is located

– T2 - T1: the difference between the time at which the point in the target zone occurs in the song (t2) and the time at which the anchor point occurs (t1)

• Each fingerprint is stored as a 64-bit struct: 32 bits for the hash and 32 bits for the time offset and track ID.
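The hash construction above can be sketched as follows. The specific bit widths and field layout here are illustrative assumptions, not Shazam's published format:

```python
def fingerprint_hash(f1, f2, t1, t2):
    """Pack an anchor/target point pair into a single 32-bit hash.

    Fields: anchor frequency bin (f1), target frequency bin (f2),
    and their time difference (t2 - t1). The bit widths (10 bits
    per frequency bin, 12 bits for the time delta) are illustrative
    assumptions, not Shazam's actual layout.
    """
    dt = t2 - t1
    return ((f1 & 0x3FF) << 22) | ((f2 & 0x3FF) << 12) | (dt & 0xFFF)

# Example: anchor at frequency bin 300, target at bin 512, 42 frames apart
h = fingerprint_hash(300, 512, 100, 142)
print(hex(h))
```

Because the hash encodes a relative time difference rather than absolute times, the same pair of peaks produces the same hash no matter where in the song the recording started.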

MDR - Music Identification - Shazam

How do they find the song based on the recorded sample?

• Apply the same fingerprinting to the recorded sample.

• Each hash generated from the sample sound is searched for a match in the database.

• If a match is found you will have:

– The time of the hash from the sample (th1)

– The time of the hash from the song in the database (th2)

• Draw a new graph called a scatter graph:

– The horizontal axis (X): th2

– The vertical axis (Y): th1

– The point of intersection of the two occurrence times (th1 and th2) will be marked with a small circle.

MDR - Music Identification - Shazam

• If the graph contains a lot of (th1, th2) pairs from the same song, a diagonal line will form.

Scatter graph of a matching run

MDR - Music Identification - Shazam

• Calculate the difference between th2 and th1 (dth) for each pair and plot these differences in a histogram.

• If the sample matches the song, many dths will share the same value, producing a clear peak in the histogram.

Histogram of a matching run
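The offset-histogram test described above can be sketched as follows; the match data here is made up for illustration:

```python
from collections import Counter

def best_offset(matches):
    """Given (th1, th2) pairs for hash matches -- sample time vs.
    database time -- histogram the differences dth = th2 - th1.
    A strong peak means many matches agree on a single alignment,
    i.e. the sample really does come from that track."""
    hist = Counter(th2 - th1 for th1, th2 in matches)
    offset, count = hist.most_common(1)[0]
    return offset, count

# For a matching track, most pairs share one offset (here 50);
# the (3, 40) pair plays the role of a spurious hash collision.
matches = [(1, 51), (2, 52), (5, 55), (9, 59), (3, 40)]
print(best_offset(matches))  # (50, 4)
```

Comparing the peak count against the number of sample hashes gives a match score that is robust to the noise and collisions that scatter the remaining pairs.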

MDR – Similarity Search

• The concept of similarity is less specific than identity.

• There are many different types of musical similarity.

– Two different performances played from the same notation

– Same composer

– Same function, for example dances

– Same genre

– Same culture

Query by Humming

QBH – Query Formatting

QBH – Query Comparison

• The elements in the database must have the same representation as the query.

• Ex. Dynamic Time Warping

Dynamic Time Warping
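The DTW comparison can be sketched with the standard dynamic-programming recurrence. The note sequences (MIDI pitch numbers) and the absolute-difference cost are illustrative assumptions:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two pitch sequences.

    DTW lets one sequence be locally stretched or compressed when
    aligning it to the other, so a melody hummed at the wrong tempo
    can still match its reference."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = cost of best alignment of a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # skip a note in a
                                 D[i][j - 1],      # skip a note in b
                                 D[i - 1][j - 1])  # align the two notes
    return D[n][m]

# A melody hummed more slowly (notes held longer) still matches exactly:
print(dtw_distance([60, 62, 64], [60, 60, 62, 62, 64]))  # 0.0
```

To rank the database for a query, the same distance is computed against every candidate and the candidates are sorted by ascending cost.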

QBH – Ranking evaluation measures

A. Mean Reciprocal Rank (MRR):

MRR = (1/3 + 1/2 + 1)/3 = 11/18 or about 0.61
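The MRR calculation above (correct results at ranks 3, 2 and 1 for three queries) can be expressed directly:

```python
def mean_reciprocal_rank(ranks):
    """Mean Reciprocal Rank: the average of 1/rank of the correct
    result over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Three queries whose correct songs appeared at ranks 3, 2 and 1:
print(mean_reciprocal_rank([3, 2, 1]))  # 11/18, about 0.611
```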

QBH – Ranking evaluation measures

B. Top-X Hit Rate

• Whether the position r of the correct result is within the first X positions.

• Mathematically: r(Qi) ≤ X.
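The Top-X hit rate can be sketched in the same style, reusing the ranks 3, 2 and 1 from the MRR example:

```python
def top_x_hit_rate(ranks, x):
    """Fraction of queries whose correct result has rank r <= x."""
    return sum(1 for r in ranks if r <= x) / len(ranks)

# With X = 2, two of the three queries (ranks 2 and 1) count as hits:
print(top_x_hit_rate([3, 2, 1], 2))  # 2/3, about 0.667
```

Unlike MRR, the hit rate ignores how high within the top X the correct result lands, so it is the coarser of the two measures.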

Overview

Introduction

Music Document Retrieval

Emotion Detection

Emotions

Emotions?

• Music is the language of emotion.

• Users often want to listen to music in a certain emotional category, or to music that puts them in a certain mood.

• What affects the mood of a song?

– Harmony

– Timbre

– Interpretation

– Lyrics

Challenging Problem !!

• Ambiguous

– Due to the ambiguities of human emotions.

– Different mood interpretation & perception between individuals

• Cross-disciplinary endeavor

– Signal processing

– Machine learning

– Understanding of auditory perception, psychology, and music theory.

• Mood may change over the duration of a piece

Different Methods

Contextual text information

• websites

• tags

• lyrics

Content-based approaches

• audios

• images

• videos

Combining multiple feature domains

• Audio & Lyrics

• Audio & Tags

• Audio & Images (album covers, artist photos, etc.)

Contextual text information

• Web-Documents

– Artist biographies, album reviews, and song reviews are rich sources of information about music.

– Collect from the Internet by

• querying search engines

• monitoring MP3 blogs

• crawling a music website

– Can be noisy

Mood Representation

Categorical psychometrics

• A set of emotional descriptors (tags)

Scalar/dimensional psychometrics

• Mood can be scaled and measured by a continuum of descriptors or simple multidimensional metrics.

• Most noted: two dimensional Valence-Arousal (V-A) space

Hevner adjective circle

Valence-Arousal (V-A) space

[Figure: the two-dimensional Valence-Arousal space. Horizontal axis: valence (unpleasant to pleasant); vertical axis: arousal (deactivation to activation). Example emotions by quadrant: excited, elated, happy (high arousal, positive valence); tense, stressed, upset (high arousal, negative valence); sad, depressed, fatigued (low arousal, negative valence); serene, relaxed, calm (low arousal, positive valence).]

Valence-Arousal (V-A) space

• Simple, powerful way of thinking about the spectrum of human emotions.

• Both valence and arousal can be defined as subjective experiences (Russell, 1989).

– Valence describes whether the emotion is positive or negative

– Arousal describes the level of alertness or energy involved in the emotion.

Emotion Recognition Problem

• A multi-class, multi-label classification or regression problem

• A music piece can be:

– an entire song

– a section of a song (e.g., chorus, verse)

– a fixed-length clip (e.g., 30-second song snippet)

– a short-term segment (e.g., 1 second)

Emotion Classification System

Mood representation - vectors

a single multi-dimensional vector

• Each dimension represents

– a single emotion (e.g., angry)

– or a bipolar pair of emotions (e.g., positive/negative)

a time-series of vectors over a semantic space of emotions

• Track changes in emotional content over the duration of a piece

Mood Representation- Vector Values

• a binary label

– The presence or absence of the emotion

• a real-valued score

– e.g., Likert scale value

– Probability estimate

• A Likert scale is a psychometric scale commonly involved in research that employs questionnaires. It is the most widely used approach to scaling responses in survey research

Emotion Classification System

Annotation

• Labeling tasks are time-consuming, tedious, and expensive

• A cheaper alternative: online games (“Games With a Purpose”)

Features

Timbre Features

• Musical instruments usually produce sound waves containing multiple frequencies.

• The lowest frequency is

– the fundamental frequency f0

– closely related to pitch

• The second and higher frequencies are

– called overtones
