GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval
Pitch Detection and Tracking
Juhan Nam

Introduction
- What is music described with?
- The majority of musical symbols are notes, which mainly contain pitch information.
- We (our brains) usually memorize music as a melody, that is, a sequence of pitches.

Outline
- Introduction
  - Definition of Pitch
  - Information in Pitch
  - Pitch and Harmonicity
- Pitch Detection Algorithms
  - Time-Domain Approaches
  - Frequency-Domain Approaches
  - Psychoacoustic Model Approaches
  - Learning-based Approaches
- Pitch Tracking
- Applications

Definition of Pitch
- Pitch
  - Defined as the auditory attribute of sound according to which sounds can be ordered on a scale from low to high (ANSI, 1994).
  - One way of measuring pitch is to find the frequency of a sine wave that a listener matches to the target sound in a psychophysical experiment.
  - Pitch is thus subjective and varies between individuals (e.g. tone-deaf listeners).
- Fundamental Frequency
  - A physical attribute of a sound, measured from its periodicity.
  - Often called F0.
- Pitch should therefore be distinguished from F0.
  - However, the two are very close for the sounds of interest here (i.e. musical sounds), so "pitch" is often used interchangeably with "F0".

Information in Pitch
- Music
  - Melody or notes
  - Harmony (when there are multiple notes with different pitches)
  - Size (or register) of musical instruments: bass, cello, violin
- Speech
  - Person: gender, age, identity
  - Context: question, mood, attitude
  - Meaning: Chinese (Mandarin)
- Others
  - Vocalization of animals (e.g. bird chirps, whale calls): size and type, communication

Pitch and Harmonicity
- Not all sounds have pitch.
- Harmonic sounds
  - Regularly spaced harmonic partials
  - Speech or singing voice: vowels
  - Musical instruments: piano*, guitar, strings, woodwind, brass, organ
- Non-harmonic sounds
  - No harmonic pattern, or irregularly spaced partials
  - Speech or singing voice: consonants
  - Musical instruments: drums, mallet instruments (pitched but not harmonic)
* Inharmonicity in piano
[Figures: piano inharmonicity; vibraphone. From Klapuri's slides]

Pitch Detection Algorithms
- Taxonomy of algorithms
  - Time-Domain Approaches
  - Frequency-Domain Approaches
  - Psychoacoustic Model Approaches
  - Learning-based Approaches

Time-Domain Approach
- Basic ideas
  - Periodicity: x(t) = x(t+T)
  - Measure the similarity (or distance) between two segments of the signal, one shifted by a lag.
  - Find the period T that gives the highest similarity (smallest distance).
- Two main approaches
  - Auto-correlation function (ACF): similarity by inner product
  - Average magnitude difference function (AMDF): distance by difference (e.g. L1 or L2 norm)

Auto-Correlation Function (ACF)
- Measures self-similarity by

  $$r(\tau) = \sum_{n} x(n)\, x(n+\tau)$$

[Figure: ACF of a singing voice]
(Sondhi, 1968)

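Auto-Correlation Function (ACF)
- Biased auto-correlation (N is the frame length)

  $$r_{\text{biased}}(\tau) = \frac{1}{N} \sum_{n=0}^{N-1-\tau} x(n)\, x(n+\tau)$$

- Unbiased auto-correlation

  $$r_{\text{unbiased}}(\tau) = \frac{1}{N-\tau} \sum_{n=0}^{N-1-\tau} x(n)\, x(n+\tau)$$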
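A minimal sketch of ACF-based pitch detection, assuming a single frame, a hypothetical search range of 50 to 1000 Hz, and a synthetic 200 Hz test tone (none of which come from the slides). It picks the lag with the largest biased ACF value; the `unbiased` flag only changes the normalization.

```python
import math

def acf(x, max_lag, unbiased=False):
    """Auto-correlation r(tau) for tau = 0..max_lag over one frame x."""
    n = len(x)
    r = []
    for tau in range(max_lag + 1):
        s = sum(x[i] * x[i + tau] for i in range(n - tau))
        # Biased: divide by N. Unbiased: divide by the number of summed terms, N - tau.
        r.append(s / (n - tau) if unbiased else s / n)
    return r

def acf_pitch(x, sr, fmin=50.0, fmax=1000.0):
    """Pick the lag with the largest biased ACF inside [sr/fmax, sr/fmin]."""
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    r = acf(x, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda tau: r[tau])
    return sr / best_lag

# Synthetic test tone (an assumption for illustration): 200 Hz with three harmonics.
sr = 8000
x = [sum(math.sin(2 * math.pi * 200.0 * h * n / sr) / h for h in (1, 2, 3))
     for n in range(1024)]

f0_est = acf_pitch(x, sr)   # period of 40 samples at 8 kHz
r_biased = acf(x, 80)       # the biased ACF tapers toward longer lags
```

The taper of the biased estimate favors the shortest period among equally good candidates, which is one reason it is often preferred for pitch detection despite the name.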

Comparison of Spectrogram and ACF
[Figure: two panels, "Spectrogram (tracking max values)" and "ACF (tracking max values)"]

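Interpretation of ACF in Frequency Domain
- By the convolution theorem, the auto-correlation can be computed in the frequency domain, and efficiently so using the FFT.
- Thus, the ACF can be computed as

  $$r(\tau) = \mathrm{IFFT}\left( |X(k)|^2 \right), \qquad X(k) = \mathrm{FFT}\left( x(n) \right)$$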
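The frequency-domain route can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a naive O(N^2) DFT only to stay dependency-free (a real implementation would use an FFT), and it zero-pads the frame to twice its length so that the circular correlation produced by the DFT matches the linear, unnormalized ACF.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def acf_direct(x):
    """Time-domain ACF (unnormalized) for lags 0..N-1."""
    n = len(x)
    return [sum(x[i] * x[i + tau] for i in range(n - tau)) for tau in range(n)]

def acf_via_dft(x):
    """ACF as the inverse DFT of the power spectrum."""
    n = len(x)
    padded = list(x) + [0.0] * n          # zero-pad to avoid circular wrap-around
    power = [abs(v) ** 2 for v in dft(padded)]
    r = idft(power)
    return [r[tau].real for tau in range(n)]

# Small two-component test signal (an assumption for illustration).
x = [math.sin(2 * math.pi * n / 8) + 0.5 * math.sin(2 * math.pi * n / 4)
     for n in range(32)]
max_err = max(abs(a - b) for a, b in zip(acf_direct(x), acf_via_dft(x)))
```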

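Interpretation of ACF in Frequency Domain
- This is equivalent to

  $$r(\tau) = \frac{1}{N} \sum_{k=0}^{N-1} |X(k)|^2 \cos\left( \frac{2\pi k \tau}{N} \right)$$

- The ACF is thus a simple template-based approach in the frequency domain: the cosine gives positive weights to the (harmonic) peaks and negative weights to the valleys between them.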

Problems in ACF
- Biased toward the large peak around zero lag
- Prone to octave errors, particularly toward lower octaves
  - The ACF is sensitive to amplitude changes.
- Equal weights for all harmonic partials
  - In general, low-numbered harmonic partials are more important in determining pitch.

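Average Magnitude Difference Function (AMDF)
- Measures self-similarity by

  $$d(\tau) = \sum_{n} \left| x(n) - x(n+\tau) \right|^p$$

- In YIN, p is set to 2, which amounts to minimizing the negative ACF plus a lag-dependent term:

  $$d(\tau) = \sum_{n} x(n)^2 + \sum_{n} x(n+\tau)^2 - 2 \sum_{n} x(n)\, x(n+\tau)$$

- The AMDF is then normalized as

  $$d'(\tau) = \begin{cases} 1, & \tau = 0 \\ d(\tau) \Big/ \left[ \dfrac{1}{\tau} \sum_{j=1}^{\tau} d(j) \right], & \text{otherwise} \end{cases}$$

(de Cheveigné & Kawahara, 2002)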

Average Magnitude Difference Function (AMDF)
[Figure: two panels, "AMDF" and "Normalized AMDF"]

Why YIN (AMDF) Works Better
- Robust to changes in amplitude
  - The difference takes care of amplitude changes.
  - This reduces octave errors.
- Zero-lag bias is avoided by the normalized AMDF.
- The normalized AMDF allows a fixed threshold to be used.
- Multiple candidates can be chosen and then refined.
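A simplified sketch of the YIN idea, assuming p = 2, a hypothetical fixed threshold of 0.1, and a lowest pitch of 50 Hz. It computes the squared difference function, applies the cumulative-mean normalization, and takes the first dip below the threshold. The parabolic interpolation and best-local-estimate steps of the published algorithm are left out.

```python
import math

def difference_function(x, max_lag, window):
    """d(tau) = sum over the window of (x(n) - x(n + tau))^2."""
    return [sum((x[i] - x[i + tau]) ** 2 for i in range(window))
            for tau in range(max_lag + 1)]

def normalize(d):
    """Cumulative-mean normalization: d'(0) = 1, d'(tau) = d(tau) / mean(d(1..tau))."""
    out = [1.0]
    running = 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out

def yin_pitch(x, sr, fmin=50.0, threshold=0.1):
    max_lag = int(sr / fmin)
    window = len(x) - max_lag              # keeps x[i + tau] inside the frame
    d_norm = normalize(difference_function(x, max_lag, window))
    for tau in range(2, max_lag):
        if d_norm[tau] < threshold:
            # Walk down to the bottom of this dip before reporting the lag.
            while tau + 1 <= max_lag and d_norm[tau + 1] < d_norm[tau]:
                tau += 1
            return sr / tau
    return None                            # no dip below threshold: treat as unvoiced

# Synthetic test tone (an assumption for illustration): 200 Hz with three harmonics.
sr = 8000
x = [sum(math.sin(2 * math.pi * 200.0 * h * n / sr) / h for h in (1, 2, 3))
     for n in range(1024)]
f0_est = yin_pitch(x, sr)
silence_est = yin_pitch([0.0] * 1024, sr)
```

Because the normalized function starts at 1 and only drops at true periodicities, the same threshold works regardless of signal level, and a frame with no dip can be flagged as unvoiced.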

Example of AMDF (YIN)

Frequency-Domain Approach
- Basic ideas
  - Periodic in the time domain means harmonic in the frequency domain.
  - Measure how harmonic the spectrum is.
  - Find the F0 that best explains the harmonic pattern (harmonic partials).
- Template matching: Harmonic Sieve or Spectral Template
- Cepstrum
- Harmonic Product Spectrum (HPS)

Harmonic Sieves (or Comb-filtering)
- Use sharp harmonic sieves to keep only the regions around harmonic peaks.
- The ACF is similar to this, but its template is not sharp enough.
- sigmund~ (Pd) and fiddle~ (Max/MSP) are based on weighted harmonic sieves (Puckette et al., 1998).
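A toy weighted harmonic sieve, to make the idea concrete. Everything here is an assumption for illustration: the input is a hypothetical list of already-detected spectral peaks as (frequency, amplitude) pairs, each sieve slot is a triangular window of 3% relative width, and harmonic h is weighted by 1/h. This is not the scoring used by fiddle~ or sigmund~.

```python
def sieve_score(peaks, f0, n_harmonics=10, tol=0.03):
    """Sum the peak energy that falls through the sieve slots at h * f0."""
    score = 0.0
    for h in range(1, n_harmonics + 1):
        target = h * f0
        width = tol * target
        # Triangular slot: full weight at the exact harmonic, zero at the slot edge.
        hits = [amp * (1.0 - abs(freq - target) / width)
                for freq, amp in peaks if abs(freq - target) < width]
        if hits:
            score += max(hits) / h         # low harmonics count more
    return score

def sieve_pitch(peaks, candidates):
    return max(candidates, key=lambda f0: sieve_score(peaks, f0))

# Hypothetical peak list: slightly mistuned harmonics of 200 Hz plus one spurious peak.
peaks = [(200.0, 1.0), (401.0, 0.5), (598.0, 0.33), (330.0, 0.2)]
candidates = [float(f) for f in range(80, 501)]
f0_est = sieve_pitch(peaks, candidates)
```

The 1/h weighting is what keeps the subharmonic at 100 Hz, whose sieve also catches every peak, from outscoring the true fundamental.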

Spectral Template
- Cross-correlation with an ideal template on a log-scale spectrogram
[Figure from Ellis's E4896 course slides]

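Cepstrum
- The real cepstrum is defined as

  $$c(n) = \mathrm{IFFT}\left( \log \left| \mathrm{FFT}\left( x(n) \right) \right| \right)$$

- Basic ideas
  - Harmonic partials are periodic in the frequency domain.
  - The (inverse) FFT finds that periodicity.
- Liftering: keeping only a selected range of the cepstrum
(Noll, 1967)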
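A small sketch of cepstral pitch detection under idealized assumptions: the test frame holds exactly 8 periods of a 200 Hz harmonic tone, so no window is applied, and the search is restricted to a hypothetical 150 to 400 Hz range so that the repeated cepstral peaks at multiples of the period are excluded. The naive DFT and the `eps` floor inside the logarithm are also choices made here for self-containment, not part of the slides.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, used only to keep the sketch dependency-free."""
    n = len(x)
    tw = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [sum(x[t] * tw[(k * t) % n] for t in range(n)) for k in range(n)]

def real_cepstrum(x, eps=1e-3):
    """c = IDFT(log|DFT(x)|); eps keeps the log finite at empty bins."""
    n = len(x)
    log_mag = [math.log(abs(v) + eps) for v in dft(x)]
    # log_mag is real and even, so its inverse DFT is the real part of its DFT over n.
    return [v.real / n for v in dft(log_mag)]

def cepstrum_pitch(x, sr, fmin=150.0, fmax=400.0):
    c = real_cepstrum(x)
    q_min = int(sr / fmax)
    q_max = int(sr / fmin)
    # Liftering: look only at quefrencies that correspond to plausible pitch periods.
    q_best = max(range(q_min, q_max + 1), key=lambda q: c[q])
    return sr / q_best

sr = 8000
n_samples = 320                            # exactly 8 periods of 200 Hz
x = [sum(math.cos(2 * math.pi * 200.0 * h * n / sr) / h for h in range(1, 20))
     for n in range(n_samples)]
f0_est = cepstrum_pitch(x, sr)
```

On real recordings the frame would normally be windowed and the peak is less sharp, so the search range and `eps` would need tuning.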

Harmonic Product Spectrum (HPS)
- The harmonic product spectrum is obtained by multiplying the original magnitude spectrum by copies of itself decimated by integer factors (Noll, 1969):

  $$Y(k) = \prod_{r=1}^{R} |X(rk)|$$
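A minimal HPS sketch, assuming three products (the original spectrum and its decimations by 2 and 3) and a synthetic tone whose second harmonic is stronger than its fundamental. The naive DFT and the test signal are assumptions for illustration. The strongest raw spectral peak points at the wrong octave, while the product recovers the fundamental.

```python
import cmath
import math

def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0..N/2 (naive DFT to stay dependency-free)."""
    n = len(x)
    tw = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [abs(sum(x[t] * tw[(k * t) % n] for t in range(n)))
            for k in range(n // 2 + 1)]

def hps_pitch(mag, sr, n_fft, n_products=3):
    """Return the frequency of the bin k maximizing prod_r |X(r * k)|."""
    k_max = (len(mag) - 1) // n_products   # keep r * k inside the spectrum
    best_k, best_val = 1, -1.0
    for k in range(1, k_max + 1):
        val = 1.0
        for r in range(1, n_products + 1):
            val *= mag[r * k]              # decimation by r reads every r-th bin
        if val > best_val:
            best_k, best_val = k, val
    return best_k * sr / n_fft

sr = 8000
n_fft = 320                                # exactly 8 periods of 200 Hz
amps = [0.3, 1.0, 0.8, 0.6, 0.5, 0.2]      # weak fundamental, strong 2nd harmonic
x = [sum(a * math.cos(2 * math.pi * 200.0 * (h + 1) * n / sr)
         for h, a in enumerate(amps)) for n in range(n_fft)]

mag = magnitude_spectrum(x)
raw_peak_hz = max(range(1, len(mag)), key=lambda k: mag[k]) * sr / n_fft
f0_est = hps_pitch(mag, sr, n_fft)
```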

Auditory Model
[Diagram: input, HC (hair cell) channels, per-channel ACF (correlogram), stabilize & combine, summary ACF]
- The correlogram is formed by concatenating the ACFs of the individual HC outputs.
- The summary ACF is computed by summing the ACFs across all channels.
- The peaks in the summary ACF represent periodicity features.
- This is known to be robust to band-limited noise.
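A toy version of the correlogram and summary ACF. It assumes the auditory filterbank has already been applied: the two "channels" below are hypothetical stand-ins, each a half-wave rectified sinusoid (a crude substitute for the HC stage) at one harmonic of a 200 Hz tone. Only the per-channel ACF and the summation across channels are shown.

```python
import math

def acf(x, max_lag):
    """Biased ACF for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + tau] for i in range(n - tau)) / n
            for tau in range(max_lag + 1)]

def summary_acf(channels, max_lag):
    # Correlogram: one ACF row per channel.
    correlogram = [acf(ch, max_lag) for ch in channels]
    # Summary ACF: sum the rows lag by lag.
    summary = [sum(row[tau] for row in correlogram) for tau in range(max_lag + 1)]
    return correlogram, summary

sr = 8000
# Hypothetical channel outputs: 1st and 3rd harmonic of 200 Hz, half-wave rectified.
channels = [[max(0.0, math.sin(2 * math.pi * f * n / sr)) for n in range(1024)]
            for f in (200.0, 600.0)]

correlogram, summary = summary_acf(channels, 60)
best_lag = max(range(10, 61), key=lambda tau: summary[tau])
f0_est = sr / best_lag
```

The 600 Hz channel alone peaks at several lags, but only the common period of 40 samples is reinforced by both channels in the sum.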

Example of Auditory Model
[Figure: summary ACF]

Pitch Tracking
- Pitch is usually continuous over time.
  - Once a pitch with strong harmonicity is detected in a frame, the following frames form a smooth pitch contour.
- Pitch tracking methods
  - Post-processing: first detect pitch frame by frame, then find a continuous path by smoothing.
    - Median filtering
    - Dynamic programming (Talkin, 1995)
  - Probabilistic approach: detect multiple pitch candidates in every frame and find the best path.
    - Viterbi decoding: probabilistic YIN (Mauch & Dixon, 2014)
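Two of the tracking ideas above can be sketched in a few lines. Both functions are illustrations under assumptions: the median filter uses a hypothetical width of 3 frames, and the dynamic-programming tracker uses a made-up cost (a local cost per candidate plus a penalty proportional to the jump in octaves). It is not the cost function of RAPT or pYIN.

```python
import math
import statistics

def median_smooth(track, width=3):
    """Median filter over a frame-wise F0 track (shorter windows at the edges)."""
    half = width // 2
    return [statistics.median(track[max(0, i - half):i + half + 1])
            for i in range(len(track))]

def dp_track(candidates, jump_weight=2.0):
    """Viterbi-style search. candidates[t] is a list of (f0_hz, local_cost)."""
    prev_cost = [cost for _, cost in candidates[0]]
    back = []
    for t in range(1, len(candidates)):
        cost, ptr = [], []
        for f, local in candidates[t]:
            # Transition cost grows with the size of the pitch jump in octaves.
            options = [prev_cost[j] + jump_weight * abs(math.log2(f / fp))
                       for j, (fp, _) in enumerate(candidates[t - 1])]
            j_best = min(range(len(options)), key=lambda j: options[j])
            cost.append(options[j_best] + local)
            ptr.append(j_best)
        back.append(ptr)
        prev_cost = cost
    # Backtrack from the cheapest final state.
    j = min(range(len(prev_cost)), key=lambda i: prev_cost[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j][0] for t, j in enumerate(path)]

# Frame-wise track with two octave errors (values are made up).
raw = [200.0, 201.0, 400.0, 202.0, 203.0, 101.0, 204.0, 205.0]
smoothed = median_smooth(raw)

# In the middle frame the octave-too-high candidate has the lower local cost.
frames = [[(200.0, 0.1), (400.0, 0.3)],
          [(201.0, 0.2), (402.0, 0.15)],
          [(202.0, 0.1), (101.0, 0.4)]]
tracked = dp_track(frames)
```

Picking the cheapest candidate in each frame independently would give 200, 402, 202 Hz; the transition penalty makes the continuous path cheaper overall.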

Issues and Challenges
- Voice activity detection (VAD) / singing voice detection (SVAD)
  - Discriminate voiced / unvoiced / silent frames
- Latency: real-time implementation
  - The use of long windows results in a slight delay.
  - Post-processing and probabilistic approaches need a larger delay.
- Noisy environments
  - Learning-based approaches: NMF or classifiers
  - Active research topic
- Melody transcription
  - Predominant pitch detection
  - Singing voice separation + pitch tracking
  - Active research topic
- Polyphonic pitch

Musical Applications
- Sound modification
  - Time-stretching using PSOLA
  - Auto-Tune: pitch correction or the T-Pain effect
- Music performance
  - Tuning musical instruments
  - Pitch-based sound control: e.g. fiddle~
  - Score following and auto-accompaniment
- Query by humming
  - Relative pitch change might be more important.
- Singing evaluation (e.g. karaoke) and visualization
[Examples: original and (variable) time-stretched (N. Bryan, 2012)]
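The remark that relative pitch change matters for query by humming can be illustrated with a short sketch. It assumes 12-tone equal temperament with A4 = 440 Hz (MIDI note 69) and rounds each detected F0 to the nearest semitone; the note frequencies are example values, not data from the slides. A hummed query in a different key yields the same interval sequence.

```python
import math

def hz_to_midi(f_hz):
    """MIDI note number for a frequency, assuming A4 = 440 Hz = note 69."""
    return 69 + 12 * math.log2(f_hz / 440.0)

def interval_contour(f0_track):
    """Semitone steps between consecutive notes; invariant to transposition."""
    notes = [round(hz_to_midi(f)) for f in f0_track]
    return [b - a for a, b in zip(notes, notes[1:])]

melody = [261.63, 293.66, 329.63]          # C4, D4, E4
hummed = [392.00, 440.00, 493.88]          # same tune sung a fifth higher: G4, A4, B4

melody_contour = interval_contour(melody)
hummed_contour = interval_contour(hummed)
```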

References
- A. de Cheveigné and H. Kawahara, "YIN, a Fundamental Frequency Estimator for Speech and Music", 2002.
- A. Noll, "Cepstrum Pitch Determination", 1967.
- A. Noll, "Pitch Determination of Human Speech by the Harmonic Product Spectrum, the Harmonic Sum Spectrum and a Maximum Likelihood Estimate", 1969.
- M. Puckette, T. Apel and D. Zicarelli, "Real-time Audio Analysis Tools for Pd and MSP", 1998.
- M. Sondhi, "New Methods of Pitch Extraction", 1968.
- D. Talkin, "A Robust Algorithm for Pitch Tracking (RAPT)", 1995.
- M. Mauch and S. Dixon, "pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions", 2014.
