![Page 1: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/1.jpg)
Music Information Retrieval
Deema Aloum, Noor Orfahly
![Page 2: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/2.jpg)
Overview
Introduction
Music Document Retrieval
Emotion Detection
![Page 3: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/3.jpg)
![Page 4: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/4.jpg)
What is MIR?
• Music Information Retrieval (MIR): the interdisciplinary science of retrieving information from music. MIR is a small but growing field of research with many real-world applications.
• Objective: make the world’s vast store of music accessible to all.
• The contributing disciplines: computer science, information retrieval, audio engineering, digital signal processing, musicology, library science, cognitive science, psychology, philosophy, and law.
![Page 5: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/5.jpg)
MIR Applications
Music Document Retrieval
Recommender System
Track Separation
Automatic Music Transcription
Rights Management
Emotion Detection
![Page 6: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/6.jpg)
Music Terms - Pitch & Melody
• Pitch corresponds to a particular frequency of sound
– E.g., 440 Hz
• A note is a pitch that humans have given a name.
– E.g., Western music generally refers to the 440 Hz pitch as A, specifically A4
• A melody is a pattern of pitches
• Only an electronically produced sound can consist of a single frequency; all other sounds are mixtures of several frequencies.
• The mix of frequencies in a sound determines its timbre
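The note-naming convention above (440 Hz is A4) can be made concrete with a small conversion from frequency to the nearest note name. This is a minimal sketch that assumes 12-tone equal temperament and the MIDI numbering in which A4 is note 69 and C4 is note 60; the function name `frequency_to_note` is hypothetical.

```python
import math

# Note names within one octave, starting from C (12-tone equal temperament, assumed).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_note(freq_hz: float, a4_hz: float = 440.0) -> str:
    """Return the nearest equal-tempered note name for a frequency, e.g. 440 Hz -> 'A4'."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI note number: A4 = 69; each semitone multiplies the frequency by 2 ** (1 / 12).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    name = NOTE_NAMES[midi % 12]
    octave = midi // 12 - 1  # MIDI 60 is C4
    return f"{name}{octave}"

print(frequency_to_note(440.0))   # prints A4
print(frequency_to_note(261.63))  # prints C4
```

Doubling a frequency raises it by 12 semitones, so 880 Hz maps to A5: the note name stays the same and only the octave number changes.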
![Page 7: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/7.jpg)
Music Terms - Timbre
• In music
– The characteristic quality of sound produced by a particular instrument or voice; tone color.
• In acoustics and phonetics
– The characteristic quality of a sound, independent of pitch and loudness
– Depends on the relative strengths of its component frequencies
– E.g., A4 played on a guitar is a sound composed of the frequencies 440 Hz, 880 Hz, 1320 Hz, 1760 Hz, etc.
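To illustrate that timbre depends on the relative strengths of the component frequencies, the sketch below synthesizes two tones with the same pitch (A4, with harmonics at 880, 1320, and 1760 Hz) but different harmonic weights, then measures those weights back from the spectrum. The sample rate and the two amplitude profiles are hypothetical values chosen for the example, not measurements of real instruments.

```python
import numpy as np

SAMPLE_RATE = 8000  # Hz (assumed); a 1-second signal then has exactly 1 Hz FFT bins

def harmonic_tone(f0, amplitudes, sr=SAMPLE_RATE, duration=1.0):
    """Sum of sinusoids at f0, 2*f0, 3*f0, ..., weighted by `amplitudes`."""
    t = np.arange(int(sr * duration)) / sr
    return sum(a * np.sin(2 * np.pi * (k + 1) * f0 * t)
               for k, a in enumerate(amplitudes))

def harmonic_profile(signal, f0, n_harmonics, sr=SAMPLE_RATE):
    """Spectral magnitudes at f0, 2*f0, ..., relative to the magnitude at f0."""
    spectrum = np.abs(np.fft.rfft(signal))
    bin_hz = sr / len(signal)
    mags = np.array([spectrum[int(round(k * f0 / bin_hz))]
                     for k in range(1, n_harmonics + 1)])
    return mags / mags[0]

# Two hypothetical "instruments" playing the same note A4 (440 Hz).
bright = harmonic_tone(440, [1.0, 0.8, 0.6, 0.5])
mellow = harmonic_tone(440, [1.0, 0.3, 0.1, 0.05])
print(harmonic_profile(bright, 440, 4))
print(harmonic_profile(mellow, 440, 4))
```

Both tones share the same frequencies, so they have the same pitch; only the relative strengths differ, which is what the listener hears as a difference in tone color.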
![Page 8: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/8.jpg)
![Page 9: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/9.jpg)
Music Document Retrieval
Music Identification
Music Similarity
![Page 10: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/10.jpg)
MDR - Music Identification
• Metadata-based Approach:
– Music identification relies on information about the content rather than the content itself.
– E.g., TOC
• Content-based Approach:
– Music identification relies on the audio content itself.
– E.g., the Shazam service
![Page 11: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/11.jpg)
MDR - Music Identification - TOC
• TOC (Table Of Contents): a representation of the start positions and lengths of the tracks on the disc.
• This feature is highly specific, because it is extremely rare for different albums to share the same track lengths in the same order.
• However, slight differences in how CDs are produced, even from the same source audio material, can produce different TOCs, which then fail to match each other.
• E.g., freedb
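A TOC lookup service such as freedb reduces the TOC to a compact disc ID. The sketch below follows the classic CDDB/freedb disc-ID scheme as it is commonly documented (track offsets in frames at 75 frames per second, including the 150-frame lead-in); treat it as an illustrative reconstruction and check it against the official freedb documentation before relying on it. The example TOC is hypothetical.

```python
FRAMES_PER_SECOND = 75  # CD audio: 75 frames (sectors) per second

def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total

def cddb_disc_id(track_offsets: list[int], leadout_offset: int) -> int:
    """32-bit disc ID from a TOC.

    track_offsets: start frame of each track (including the 150-frame lead-in).
    leadout_offset: start frame of the lead-out, i.e. the end of the last track.
    """
    checksum = sum(digit_sum(off // FRAMES_PER_SECOND) for off in track_offsets)
    total_seconds = leadout_offset // FRAMES_PER_SECOND - track_offsets[0] // FRAMES_PER_SECOND
    # 8 bits of checksum, 16 bits of total playing time, 8 bits of track count.
    return ((checksum % 0xFF) << 24) | (total_seconds << 8) | len(track_offsets)

# Hypothetical 3-track disc.
print(f"{cddb_disc_id([150, 18000, 36000], 54000):08x}")  # prints 1402ce03
```

Because the ID is derived only from track positions, moving one track boundary by a single second (for example, a second track starting at frame 18075 instead of 18000) yields a different ID, which is exactly the fragility described above.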
![Page 12: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/12.jpg)
MDR - Music Identification - Shazam
• Shazam: a mobile app that recognizes music and TV around you. It lets you record up to 15 seconds of the song you are hearing, then tells you about that song (the artist, the song title, and the album) and offers links to YouTube or to buy the song on iTunes.
![Page 13: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/13.jpg)
MDR - Music Identification - Shazam
The Initial Spectrogram
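A spectrogram shows how the energy of a sound is distributed over time and frequency. The sketch below computes a magnitude spectrogram with a short-time Fourier transform in NumPy; the frame size, hop size, and Hann window are assumptions for illustration and are not Shazam's actual parameters.

```python
import numpy as np

def spectrogram(signal, frame_size=1024, hop=512):
    """Magnitude spectrogram: rows are time frames, columns are frequency bins.

    Assumes len(signal) >= frame_size.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_size] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 8000                                # hypothetical sample rate (Hz)
t = np.arange(sr) / sr                   # 1 second of time stamps
spec = spectrogram(np.sin(2 * np.pi * 1000 * t))  # a pure 1 kHz tone
bin_hz = sr / 1024                       # width of one frequency bin (Hz)
print(spec.shape, spec.argmax(axis=1)[0] * bin_hz)
```

For a pure tone every time frame has its maximum in the same frequency bin; real music produces a much busier picture like the one on the slide.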
![Page 14: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/14.jpg)
MDR - Music Identification - Shazam
• Shazam stores only the most intense points of the spectrogram (its peaks): for each one, the time at which it appears in the song and its frequency.
The Simplified Spectrogram
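One way to obtain such a simplified spectrogram is to keep only points that are the maximum of their local time-frequency neighborhood and exceed a threshold. This is a minimal sketch of that idea; the neighborhood size and threshold are hypothetical tuning parameters, and Shazam's exact peak-selection rule is not given in the slides.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_peaks(spec, neighborhood=3, threshold=0.0):
    """Return (time_index, freq_index) of local maxima above `threshold`.

    A point is a peak if it equals the maximum of the
    (2*neighborhood + 1) x (2*neighborhood + 1) square centred on it.
    """
    spec = np.asarray(spec, dtype=float)
    r = neighborhood
    padded = np.pad(spec, r, mode="constant", constant_values=-np.inf)
    windows = sliding_window_view(padded, (2 * r + 1, 2 * r + 1))
    local_max = windows.max(axis=(2, 3))
    mask = (spec == local_max) & (spec > threshold)
    return [(int(t), int(f)) for t, f in zip(*np.nonzero(mask))]

# Tiny hypothetical spectrogram with two strong points and one weaker neighbour.
spec = np.zeros((10, 10))
spec[2, 3], spec[2, 4], spec[7, 8] = 5.0, 1.0, 4.0
print(find_peaks(spec, neighborhood=2, threshold=0.5))  # prints [(2, 3), (7, 8)]
```

The weaker point at (2, 4) is discarded because a stronger point lies in its neighborhood; this keeps the stored data sparse while preserving the most robust features.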
![Page 15: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/15.jpg)
MDR - Music Identification - Shazam
• To store these points in the database in a form that is efficient to search for a match (easy to index), they choose some of the points in the simplified spectrogram (called “anchor points”) and a zone in the vicinity of each one (called its “target zone”).
Pairing the anchor point with points in a target zone
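The pairing step can be sketched as follows. Here every peak is treated as an anchor, and the target zone is assumed to be a window of `min_dt` to `max_dt` frames after the anchor and within `max_df` frequency bins of it, with at most `fan_out` pairs per anchor; all of these parameter names and values are hypothetical.

```python
def pair_peaks(peaks, min_dt=1, max_dt=10, max_df=50, fan_out=5):
    """Pair each anchor peak with peaks in its target zone.

    peaks: list of (time, freq) tuples.
    Returns a list of (f1, f2, t2 - t1, t1) tuples.
    """
    peaks = sorted(peaks)  # sort by time, then frequency
    pairs = []
    for i, (t1, f1) in enumerate(peaks):
        count = 0
        for t2, f2 in peaks[i + 1:]:
            if t2 - t1 > max_dt:
                break  # sorted by time, so all later peaks are outside the zone
            if t2 - t1 >= min_dt and abs(f2 - f1) <= max_df:
                pairs.append((f1, f2, t2 - t1, t1))
                count += 1
                if count == fan_out:
                    break
    return pairs

# Hypothetical peaks as (time frame, frequency bin).
print(pair_peaks([(0, 100), (2, 120), (5, 400), (20, 110)]))  # prints [(100, 120, 2, 0)]
```

Pairing points instead of storing single peaks makes each database key far more specific, while the time difference t2 - t1 does not depend on where in the song the recording started.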
![Page 16: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/16.jpg)
MDR - Music Identification - Shazam
• For each point in the target zone, they create a hash that combines the following:
– f1: the frequency of the anchor point
– f2: the frequency of the point in the target zone
– t2 - t1: the time difference between the point in the target zone (at time t2 in the song) and the anchor point (at time t1)
• Each entry is stored as a 64-bit struct: 32 bits for the hash and 32 bits for the time offset and track ID.
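The hash and the 64-bit entry can be sketched with bit packing. The slides only state the 32 + 32 split, so the layout below is an assumption: 10 bits each for the quantized f1 and f2, 12 bits for t2 - t1, and the lower 32 bits divided into a hypothetical 16-bit track ID and 16-bit time offset.

```python
F_BITS, DT_BITS = 10, 12  # hypothetical split: 10 + 10 + 12 = 32 bits

def make_hash(f1, f2, dt):
    """Pack quantized (f1, f2, t2 - t1) into one 32-bit integer."""
    if not (0 <= f1 < 1 << F_BITS and 0 <= f2 < 1 << F_BITS and 0 <= dt < 1 << DT_BITS):
        raise ValueError("value does not fit in its bit field")
    return (f1 << (F_BITS + DT_BITS)) | (f2 << DT_BITS) | dt

def make_entry(hash32, track_id, time_offset):
    """64-bit entry: hash in the high 32 bits, then 16-bit track ID and 16-bit offset (assumed)."""
    if not (0 <= track_id < 1 << 16 and 0 <= time_offset < 1 << 16):
        raise ValueError("track ID or offset does not fit in 16 bits")
    return (hash32 << 32) | (track_id << 16) | time_offset

def unpack_entry(entry):
    """Inverse of make_entry: returns (hash32, track_id, time_offset)."""
    return entry >> 32, (entry >> 16) & 0xFFFF, entry & 0xFFFF

h = make_hash(100, 120, 2)
entry = make_entry(h, track_id=7, time_offset=345)
print(unpack_entry(entry) == (h, 7, 345))  # prints True
```

Putting the hash in the high bits means that sorting the entries as plain integers groups all entries with the same hash together, which makes lookups cheap.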
![Page 17: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/17.jpg)
MDR - Music Identification - Shazam
How do they find the song based on the recorded sample?
• Apply the same fingerprinting to the recorded sample.
• Look up each hash generated from the sample in the database.
• For each match found, you have:
– The time of the hash in the sample (th1)
– The time of the hash in the song in the database (th2)
• Draw a scatter graph:
– The horizontal axis (X): th2
– The vertical axis (Y): th1
– Each matching pair (th2, th1) is marked with a small circle.
![Page 18: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/18.jpg)
MDR - Music Identification - Shazam
• If the graph contains many (th1, th2) pairs from the same song, they line up along a diagonal line.
Scatter graph of a matching run
![Page 19: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/19.jpg)
MDR - Music Identification - Shazam
• For each pair, calculate the difference dth = th2 - th1 and plot these values in a histogram.
• If the sample matches the song, many pairs share the same dth, so the histogram shows a sharp peak at that value.
Histogram of a matching run
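The lookup and histogram steps can be sketched with an inverted index from hash to (track ID, th2) and a counter over (track ID, dth) pairs. The in-memory dictionaries and the tiny fingerprint lists below are hypothetical stand-ins for a real fingerprint database.

```python
from collections import Counter, defaultdict

def build_index(tracks):
    """tracks: {track_id: [(hash, time), ...]} -> {hash: [(track_id, time), ...]}"""
    index = defaultdict(list)
    for track_id, fingerprints in tracks.items():
        for h, t in fingerprints:
            index[h].append((track_id, t))
    return index

def best_match(index, sample_fingerprints):
    """Return (track_id, dth, votes) for the tallest histogram bin of dth = th2 - th1."""
    histogram = Counter()
    for h, th1 in sample_fingerprints:
        for track_id, th2 in index.get(h, []):
            histogram[(track_id, th2 - th1)] += 1
    if not histogram:
        return None  # no hash matched anything in the database
    (track_id, dth), votes = histogram.most_common(1)[0]
    return track_id, dth, votes

# Hypothetical database: (hash, time) pairs per track.
index = build_index({
    "song_a": [(11, 0), (22, 3), (33, 7), (44, 9)],
    "song_b": [(22, 1), (55, 4), (66, 8)],
})
# A sample recorded starting 3 time units into song_a.
print(best_match(index, [(22, 0), (33, 4), (44, 6)]))  # prints ('song_a', 3, 3)
```

Hash 22 also occurs in song_b, but those matches scatter over a different dth, so only the true song accumulates many votes in a single bin, just as in the histogram of a matching run.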
![Page 20: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/20.jpg)
MDR – Similarity Search
• The concept of similarity is less specific than identity.
• There are many different types of musical similarity.
– Two different performances played from the same notation
– Same composer
– Same function, for example dances
– Same genre
– Same culture
![Page 21: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/21.jpg)
Query by Humming
![Page 22: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/22.jpg)
QBH – Query Formatting
![Page 23: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/23.jpg)
QBH – Query Comparison
• The elements in the database must have the same representation as the query.
• E.g., Dynamic Time Warping (DTW)
![Page 24: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/24.jpg)
Dynamic Time Warping
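Dynamic Time Warping compares two sequences that may be stretched or compressed in time, such as a hummed melody sung slower or faster than the original. Below is a minimal sketch of the classic O(n·m) algorithm, assuming both the query and the database melodies are represented as sequences of MIDI pitch numbers and using the absolute pitch difference as the local cost; the melodies are hypothetical.

```python
import math

def dtw_distance(query, reference):
    """Classic DTW distance between two numeric sequences."""
    n, m = len(query), len(reference)
    # cost[i][j] = minimal accumulated cost of aligning query[:i] with reference[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - reference[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

hummed = [60, 60, 62, 64, 64, 64, 65]  # query: notes held longer than in the original
database = {"tune_a": [60, 62, 64, 65], "tune_b": [60, 59, 57, 55]}
ranking = sorted(database, key=lambda name: dtw_distance(hummed, database[name]))
print(ranking)  # prints ['tune_a', 'tune_b']
```

Sorting the database by DTW distance produces the ranked result list that the evaluation measures on the following slides are applied to.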
![Page 25: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/25.jpg)
QBH – Ranking evaluation measures
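A. Mean Reciprocal Rank (MRR):
• The average, over all queries Qi, of the reciprocal of the rank r(Qi) of the correct result: MRR = (1/|Q|) · Σi 1/r(Qi)
• E.g., for three queries whose correct results are ranked 3rd, 2nd, and 1st:
MRR = (1/3 + 1/2 + 1)/3 = 11/18 ≈ 0.61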
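The MRR calculation can be checked exactly with Python's `fractions` module. In this sketch, a query whose correct result was not retrieved at all is passed as `None` and contributes 0, which is a common convention but an assumption here.

```python
from fractions import Fraction

def mean_reciprocal_rank(ranks):
    """ranks[i] is the 1-based rank r(Qi) of the correct result for query Qi,
    or None if it was not retrieved (counted as 0)."""
    if not ranks:
        raise ValueError("need at least one query")
    total = sum(Fraction(1, r) if r is not None else Fraction(0) for r in ranks)
    return total / len(ranks)

print(mean_reciprocal_rank([3, 2, 1]))  # prints 11/18
```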
![Page 26: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/26.jpg)
QBH – Ranking evaluation measures
B. Top-X Hit Rate
• Checks whether the rank r of the correct result for query Qi is within the first X positions.
• Mathematically: r(Qi) ≤ X. The hit rate is the fraction of queries that satisfy this condition.
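A minimal sketch of the Top-X hit rate, using the same rank convention as for MRR (1-based ranks, `None` meaning the correct result was not retrieved, which is an assumption):

```python
def top_x_hit_rate(ranks, x):
    """Fraction of queries Qi with r(Qi) <= x."""
    if not ranks:
        raise ValueError("need at least one query")
    hits = sum(1 for r in ranks if r is not None and r <= x)
    return hits / len(ranks)

ranks = [3, 2, 1]  # the same three queries as in the MRR example
print([top_x_hit_rate(ranks, x) for x in (1, 2, 3)])
```

Unlike MRR, this measure ignores where exactly the correct result appears inside the top X, which suits interfaces that show the user a fixed-length result list.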
![Page 27: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/27.jpg)
![Page 28: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/28.jpg)
Emotions
![Page 29: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/29.jpg)
Emotions?
• Music is a language of emotion.
• Users often want to listen to music in a certain emotional category, or music that puts them in a certain mood.
• What affects the mood of a song?
– Harmony
– Timbre
– Interpretation
– Lyrics
![Page 30: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/30.jpg)
Challenging Problem!
• Ambiguous
– Due to the ambiguities of human emotions.
– Mood interpretation and perception differ between individuals.
• Cross-disciplinary endeavor
– Signal processing
– Machine learning
– Understanding of auditory perception, psychology, and music theory.
• The mood of a piece may change over its duration
![Page 31: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/31.jpg)
Different Methods
Contextual text information
• websites
• tags
• lyrics
Content-based approaches
• audio
• images
• videos
Combining multiple feature domains
• Audio & Lyrics
• Audio & Tags
• Audio & Images (album covers, artist photos, etc.)
![Page 32: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/32.jpg)
Contextual text information
• Web-Documents
– Artist biographies, album reviews, and song reviews are rich sources of information about music.
– Collected from the Internet by
• querying search engines
• monitoring MP3 blogs
• crawling music websites
– Can be noisy
![Page 33: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/33.jpg)
Mood Representation
Categorical psychometrics
• A set of emotional descriptors (tags)
Scalar/dimensional psychometrics
• Mood can be scaled and measured by a continuum of descriptors or simple multidimensional metrics.
• Best known: the two-dimensional Valence-Arousal (V-A) space
![Page 34: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/34.jpg)
Hevner adjective circle
![Page 35: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/35.jpg)
Valence-Arousal (V-A) space
[Figure: the V-A space. Horizontal axis: Valence (Unpleasant to Pleasant); vertical axis: Arousal (Deactivation to Activation). Labeled emotions: Excited, Elated, Happy; Tense, Stressed, Upset; Sad, Depressed, Fatigued; Serene, Relaxed, Calm.]
![Page 36: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/36.jpg)
Valence-Arousal (V-A) space
• A simple, powerful way of thinking about the spectrum of human emotions.
• Both valence and arousal can be defined as subjective experiences (Russell, 1989).
– Valence describes whether the emotion is positive or negative.
– Arousal describes the level of alertness or energy involved in the emotion.
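In the dimensional representation a mood is a point (valence, arousal) rather than a tag, and coarse categories can be read off from the quadrant it falls in. This sketch assumes both values are scaled to [-1, 1] and assumes the usual circumplex layout in which the emotion groups from the figure sit in the four quadrants as labeled in the comments; the quadrant names are illustrative, not part of the original slides.

```python
def va_quadrant(valence: float, arousal: float) -> str:
    """Map a point in V-A space (both in [-1, 1], assumed) to a coarse quadrant label."""
    if valence >= 0 and arousal >= 0:
        return "excited/happy"   # pleasant, activated
    if valence < 0 and arousal >= 0:
        return "tense/upset"     # unpleasant, activated
    if valence < 0:
        return "sad/depressed"   # unpleasant, deactivated
    return "calm/relaxed"        # pleasant, deactivated

print(va_quadrant(0.8, 0.7), va_quadrant(0.6, -0.5))
```

Keeping the continuous coordinates instead of only the label preserves distinctions such as "slightly calm" versus "very calm" that a categorical tag set would lose.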
![Page 37: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/37.jpg)
Emotion Recognition Problem
• A multiclass, multi-label classification problem or a regression problem
• A music piece can be:
– an entire song
– a section of a song (e.g., chorus, verse)
– a fixed-length clip (e.g., a 30-second song snippet)
– a short-term segment (e.g., 1 second)
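Working at the granularity of fixed-length clips or short-term segments means cutting the audio signal into equal pieces before extracting features. This is a minimal sketch using NumPy; the function name `segment` and the choice to drop an incomplete final piece are assumptions.

```python
import numpy as np

def segment(signal, sr, seconds):
    """Split a 1-D signal into consecutive, non-overlapping segments of `seconds` length.

    Any incomplete tail shorter than one segment is dropped.
    """
    signal = np.asarray(signal)
    size = int(round(sr * seconds))
    n = len(signal) // size
    return signal[: n * size].reshape(n, size)

sr = 100                                  # hypothetical sample rate (Hz)
song = np.arange(10 * sr, dtype=float)    # stand-in for 10 seconds of audio
print(segment(song, sr, 1.0).shape)       # prints (10, 100)
print(segment(song, sr, 3.0).shape)       # prints (3, 300)
```

Each row can then be labeled or regressed separately, which is what allows emotion to be tracked as it changes over the duration of a piece.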
![Page 38: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/38.jpg)
Emotion Classification System
![Page 39: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/39.jpg)
Mood representation - vectors
A single multi-dimensional vector
• Each dimension represents either
– a single emotion (e.g., angry), or
– a bipolar pair of emotions (e.g., positive/negative).
A time-series of vectors over a semantic space of emotions
• Tracks changes in emotional content over the duration of a piece
![Page 40: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/40.jpg)
Mood Representation - Vector Values
• A binary label
– The presence or absence of the emotion
• A real-valued score
– E.g., a Likert scale value
– A probability estimate
• A Likert scale is a psychometric scale commonly used in research that employs questionnaires. It is the most widely used approach to scaling responses in survey research.
![Page 41: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/41.jpg)
Emotion Classification System
![Page 42: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/42.jpg)
Annotation
• Labeling tasks are time-consuming, tedious, and expensive.
• One way to collect labels: online games (“Games With a Purpose”).
![Page 43: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/43.jpg)
Features
![Page 44: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/44.jpg)
Features
![Page 45: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/45.jpg)
Timbre Features
• Musical instruments usually produce sound waves with multiple frequencies.
• The lowest frequency is
– The fundamental frequency f0
– Closely related to pitch
• The second and higher frequencies are
– Called overtones
![Page 46: MIR](https://reader031.vdocuments.us/reader031/viewer/2022020218/559bfe191a28ab59668b4856/html5/thumbnails/46.jpg)