GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval: Pitch Detection and Tracking (Juhan Nam)

Upload: brendan-barrett

Post on 16-Dec-2015

TRANSCRIPT

  • Slide 1
  • GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval: Pitch Detection and Tracking (Juhan Nam)
  • Slide 2
  • Introduction
      • What is music described with?
      • The majority of musical symbols are notes, which mainly contain pitch information.
      • We (our brains) usually memorize music as a melody, that is, a sequence of pitches.
  • Slide 3
  • Outline
      • Introduction
          • Definition of Pitch
          • Information in Pitch
          • Pitch and Harmonicity
      • Pitch Detection Algorithms
          • Time-Domain Approaches
          • Frequency-Domain Approaches
          • Psychoacoustic Model Approaches
          • Learning-based Approaches
      • Pitch Tracking
      • Applications
  • Slide 4
  • Definition of Pitch
      • Pitch
          • Defined as the auditory attribute of sound according to which sounds can be ordered on a scale from low to high (ANSI, 1994).
          • One way of measuring pitch is to find the frequency of a sine wave that is matched to the target sound in a psychophysical experiment.
          • It is therefore subjective and varies between listeners (e.g. tone-deaf listeners).
      • Fundamental frequency
          • A physical attribute of a sound, measured from its periodicity.
          • Often called F0.
      • Pitch should therefore be distinguished from F0. However, the two are very close for the sounds of interest here (i.e. musical sounds), so "pitch" is often used interchangeably with F0.
  • Slide 5
  • Information in Pitch
      • Music
          • Melody or notes
          • Harmony (when there are multiple notes with different pitches)
          • Size (or register) of musical instruments: bass, cello, violin
      • Speech
          • Person: gender, age, identity
          • Context: question, mood, attitude
          • Meaning: tonal languages such as Chinese (Mandarin)
      • Others
          • Vocalization of animals (e.g. bird chirps, whale calls): size and type, communication
  • Slide 6
  • Pitch and Harmonicity
      • Not all sounds have pitch.
      • Harmonic sounds
          • Regularly spaced harmonic partials
          • Speech or singing voice: vowels
          • Musical instruments: piano*, guitar, strings, woodwind, brass, organ
      • Non-harmonic sounds
          • No harmonic pattern, or irregularly spaced partials
          • Speech or singing voice: consonants
          • Musical instruments: drums, mallet instruments (have pitch but are not harmonic)
      • *Inharmonicity in piano
      • [Figure: vibraphone example, from Klapuri's slides]
  • Slide 7
  • Pitch Detection Algorithms
      • Taxonomy of algorithms
          • Time-Domain Approaches
          • Frequency-Domain Approaches
          • Psychoacoustic Model Approaches
          • Learning-based Approaches
  • Slide 8
  • Time-Domain Approach
      • Basic ideas
          • Periodicity: x(t) = x(t+T)
          • Measure the similarity (or distance) between two segments of the signal.
          • Find the period T that gives the highest similarity (smallest distance).
      • Two main approaches
          • Auto-correlation function (ACF): similarity by inner product
          • Average magnitude difference function (AMDF): distance by difference (e.g. L1 or L2 norm)
  • Slide 9
  • Auto-Correlation Function (ACF)
      • Measures self-similarity by the inner product of the signal with a lagged copy of itself (Sondhi, 1968).
      • [Figure: ACF example on a singing voice]
  • Slide 10
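  • Auto-Correlation Function (ACF)
      • Biased auto-correlation: the sum of lagged products is divided by the frame length N, so the estimate tapers toward zero at large lags.
      • Unbiased auto-correlation: the sum is divided by the number of overlapping samples, N - τ, which removes the taper.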
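The biased and unbiased estimates, and the "pick the lag with the largest ACF value" idea from the time-domain slides, can be sketched as below. This is a minimal illustration using only the Python standard library, not the implementation used in the lecture; the function names, the search range `f_min`/`f_max`, and the synthetic test tone are assumptions made for the example.

```python
import math

def autocorrelation(x, max_lag, unbiased=False):
    """Return r[0..max_lag] for one frame x.

    Biased:   r[tau] = (1/N)       * sum_{n=0}^{N-1-tau} x[n] * x[n+tau]
    Unbiased: r[tau] = (1/(N-tau)) * sum_{n=0}^{N-1-tau} x[n] * x[n+tau]
    """
    n = len(x)
    r = []
    for tau in range(max_lag + 1):
        s = sum(x[i] * x[i + tau] for i in range(n - tau))
        r.append(s / (n - tau) if unbiased else s / n)
    return r

def acf_pitch(x, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate F0 in Hz as the lag with the largest biased ACF value.

    The search is restricted to lags between 1/f_max and 1/f_min so that
    the large peak around zero lag is skipped (assumed range, not from
    the slides).
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    r = autocorrelation(x, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda tau: r[tau])
    return sample_rate / best_lag

# Synthetic frame: 200 Hz fundamental plus two weaker harmonics at 8 kHz.
SR = 8000
frame = [math.sin(2 * math.pi * 200 * n / SR)
         + 0.5 * math.sin(2 * math.pi * 400 * n / SR)
         + 0.25 * math.sin(2 * math.pi * 600 * n / SR)
         for n in range(1024)]
f0_acf = acf_pitch(frame, SR)  # period of 40 samples, i.e. about 200 Hz
```

The biased estimate is used for peak picking here because its taper slightly favours the shortest period over its multiples; with the unbiased estimate, lags 40 and 80 would score almost equally on this tone.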
  • Slide 11
  • Comparison of Spectrogram and ACF
      • [Figure: two panels, "Spectrogram (tracking max values)" and "ACF (tracking max values)"]
  • Slide 12
  • Interpretation of ACF in Frequency Domain
      • By the convolution theorem, the auto-correlation can be computed in the frequency domain, and efficiently so using the FFT.
      • Thus the ACF can be computed as the inverse FFT of the power spectrum of the frame.
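A small sketch of the FFT route follows. It assumes a hand-written radix-2 FFT so that the example needs no third-party library (in practice a library FFT would be used), and it zero-pads the frame to at least twice its length so that the circular correlation produced by the FFT equals the ordinary linear one. The names `fft`, `ifft` and `acf_fft` are illustrative.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT via the conjugation identity."""
    n = len(spectrum)
    y = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]

def acf_fft(x):
    """Unnormalized ACF, r[tau] = sum_n x[n] * x[n+tau], for tau = 0..len(x)-1."""
    n = len(x)
    size = 1
    while size < 2 * n:          # pad so circular correlation == linear correlation
        size *= 2
    spectrum = fft(list(x) + [0.0] * (size - n))
    power = [abs(v) ** 2 for v in spectrum]
    r = ifft(power)              # inverse transform of the power spectrum
    return [r[tau].real for tau in range(n)]

r_fft = acf_fft([1.0, 2.0, 3.0, 4.0])  # close to [30, 20, 11, 4]
```

The direct sum costs on the order of N times the number of lags, while the FFT route costs on the order of N log N, which is the efficiency gain the slide refers to.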
  • Slide 13
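  • Interpretation of ACF in Frequency Domain
      • This is equivalent to taking the inner product of the power spectrum with a cosine whose period along the frequency axis is set by the lag.
      • The ACF is therefore a simple template-based approach in the frequency domain.
      • The template has positive weights at the (harmonic) peaks and negative weights at the valleys.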
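The template view can be written out with the standard DFT identities. The notation below is assumed for this sketch rather than taken from the slide: x(n) is a real frame, X(k) its length-N DFT, indices n + τ are taken modulo N (or the frame is zero-padded), and f_s is the sampling rate.

```latex
\begin{aligned}
r(\tau) &= \sum_{n=0}^{N-1} x(n)\, x(n+\tau)
         = \frac{1}{N} \sum_{k=0}^{N-1} |X(k)|^{2}\, e^{\, j 2\pi k \tau / N} \\
        &= \frac{1}{N} \sum_{k=0}^{N-1} |X(k)|^{2}
           \cos\!\left( \frac{2\pi k \tau}{N} \right)
\end{aligned}
```

The last step holds because |X(k)|^2 is symmetric in k for a real signal, so the sine terms cancel. For a given lag τ the cosine equals +1 at k = mN/τ, that is at integer multiples of the candidate F0 = f_s/τ, and -1 halfway between them, which is the "positive weights for peaks, negative weights for valleys" template.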
  • Slide 14
  • Problems in ACF
      • Biased toward the large peak around zero lag.
      • Not robust to octave errors, particularly toward lower octaves.
      • Sensitive to amplitude changes.
      • Equal weights for all harmonic partials, whereas in general the low-numbered harmonic partials are more important in determining pitch.
  • Slide 15
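  • Average Magnitude Difference Function (AMDF)
      • Measures self-similarity by summing the p-th power of the absolute difference between the signal and its lagged copy.
      • In YIN, p is set to 2 (de Cheveigné & Kawahara, 2002).
      • Minimizing this difference amounts to minimizing the negative ACF plus a lag-dependent term.
      • The AMDF is then normalized by its running mean over lags (YIN's cumulative mean normalized difference).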
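The difference function with p = 2, its cumulative-mean normalization, and the fixed-threshold search can be sketched as follows. This is a simplified illustration in the spirit of YIN (de Cheveigné & Kawahara, 2002), not a full implementation: it omits parabolic interpolation and the best-local-estimate step, and the function names, the threshold of 0.1 and the search range are assumptions made for the example.

```python
import math

def difference_function(x, max_lag, window):
    """d[tau] = sum_{n=0}^{window-1} (x[n] - x[n+tau])**2, the p = 2 case."""
    return [sum((x[n] - x[n + tau]) ** 2 for n in range(window))
            for tau in range(max_lag + 1)]

def cumulative_mean_normalized(d):
    """d'[0] = 1 and d'[tau] = d[tau] / ((1/tau) * sum_{j=1}^{tau} d[j])."""
    out = [1.0]
    running = 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out

def yin_pitch(x, sample_rate, f_min=80.0, f_max=1000.0, threshold=0.1):
    """Return an F0 estimate in Hz from the normalized difference function."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    window = len(x) - lag_max            # keeps x[n + tau] inside the frame
    d_norm = cumulative_mean_normalized(
        difference_function(x, lag_max, window))
    tau = lag_min
    while tau <= lag_max:
        if d_norm[tau] < threshold:
            # Walk down to the local minimum that follows the crossing.
            while tau + 1 <= lag_max and d_norm[tau + 1] < d_norm[tau]:
                tau += 1
            return sample_rate / tau
        tau += 1
    # Nothing fell below the threshold: fall back to the global minimum.
    best = min(range(lag_min, lag_max + 1), key=lambda t: d_norm[t])
    return sample_rate / best

# Synthetic frame: 200 Hz fundamental plus two weaker harmonics at 8 kHz.
SR = 8000
frame = [math.sin(2 * math.pi * 200 * n / SR)
         + 0.5 * math.sin(2 * math.pi * 400 * n / SR)
         + 0.25 * math.sin(2 * math.pi * 600 * n / SR)
         for n in range(1024)]
f0_yin = yin_pitch(frame, SR)  # about 200 Hz
```

Because the normalized function starts at 1 and stays near or above 1 for small lags, the first dip below the threshold is the shortest period rather than the zero-lag region, which is why a fixed threshold can be used.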
  • Slide 16
  • Average Magnitude Difference Function (AMDF)
      • [Figure: two panels, "AMDF" and "Normalized AMDF"]
  • Slide 17
  • Why YIN (AMDF) Works Better
      • Robust to changes in amplitude
          • The difference takes care of amplitude changes, which reduces octave errors.
      • Zero-lag bias is avoided by the normalized AMDF.
      • The normalized AMDF allows a fixed threshold to be used.
      • Multiple candidates can be chosen and the peaks refined.
  • Slide 18
  • Example of AMDF (YIN)
  • Slide 19
  • Frequency-Domain Approach
      • Basic ideas
          • A signal that is periodic in the time domain is harmonic in the frequency domain.
          • Measure how harmonic the spectrum is.
          • Find the F0 that best explains the harmonic pattern (harmonic partials).
      • Template matching
          • Harmonic sieve or spectral template
      • Cepstrum
      • Harmonic Product Spectrum (HPS)
  • Slide 20
  • Harmonic Sieves (or Comb-Filtering)
      • Use sharp harmonic sieves to keep only the peak regions of the spectrum.
      • The ACF is similar to this, but its template is not sharp enough.
      • sigmund~ (Pd) and fiddle~ (Max/MSP) are based on weighted harmonic sieves (Puckette et al., 1998).
  • Slide 21
  • Spectral Template
      • Cross-correlation with an ideal template on a log-scale spectrogram
      • [Figure from Ellis's E4896 course slides]
  • Slide 22
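  • Cepstrum
      • The real cepstrum is defined as the inverse Fourier transform of the log magnitude spectrum (Noll, 1967).
      • Basic ideas
          • Harmonic partials are periodic in the frequency domain.
          • The (inverse) FFT finds that periodicity.
      • Liftering (filtering in the quefrency domain)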
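A minimal sketch of cepstral pitch estimation follows. It uses a hand-written radix-2 FFT to stay within the standard library, and it replaces an explicit lifter with a simple restriction of the peak search to a plausible quefrency range; both choices, the function names, and the damped pulse train used as a test signal are assumptions made for the example, not details from the lecture.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def real_cepstrum(x):
    """c = IFFT(log|FFT(x)|) for a real frame whose length is a power of two."""
    n = len(x)
    log_mag = [math.log(abs(v) + 1e-12) for v in fft(x)]
    # log_mag is real and symmetric, so its inverse FFT equals its FFT / n.
    return [v.real / n for v in fft(log_mag)]

def cepstrum_pitch(x, sample_rate, f_min=80.0, f_max=1000.0):
    """Pick the largest cepstral peak between quefrencies 1/f_max and 1/f_min."""
    c = real_cepstrum(x)
    q_min = int(sample_rate / f_max)
    q_max = int(sample_rate / f_min)
    best = max(range(q_min, q_max + 1), key=lambda q: c[q])
    return sample_rate / best

# Damped pulse train with a 40-sample period (200 Hz at 8 kHz): a crude
# stand-in for a voiced excitation, rich in harmonics.
SR = 8000
PERIOD = 40
frame = [0.8 ** (n // PERIOD) if n % PERIOD == 0 else 0.0 for n in range(1024)]
f0_cep = cepstrum_pitch(frame, SR)  # about 200 Hz
```

The cepstrum works best on signals with many harmonics, since it looks for the regular ripple they create in the log spectrum; a tone with only a few partials gives a much weaker cepstral peak.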
  • Slide 23
  • Harmonic Product Spectrum (HPS)
      • The HPS is obtained by multiplying the original magnitude spectrum by copies of itself decimated by integer factors (Noll, 1969).
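The decimate-and-multiply step can be sketched as below. The example uses a hand-written radix-2 FFT to stay within the standard library; the function names, the choice of three spectra in the product, and the test tone (placed exactly on FFT bins to avoid leakage) are assumptions for illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def hps_pitch(x, sample_rate, num_spectra=3):
    """Return the F0 (Hz) maximizing P[k] = |X[k]| * |X[2k]| * ... * |X[R k]|."""
    n = len(x)
    mag = [abs(v) for v in fft(x)][: n // 2]   # keep non-negative frequencies
    k_max = len(mag) // num_spectra            # so that R * k stays in range
    best_k, best_val = 1, -1.0
    for k in range(1, k_max):
        product = 1.0
        for r in range(1, num_spectra + 1):
            product *= mag[r * k]              # spectrum decimated by factor r
        if product > best_val:
            best_k, best_val = k, product
    return best_k * sample_rate / n

# Five harmonics of 250 Hz at 8 kHz; with 1024 samples they fall on bins 32, 64, ...
SR = 8000
frame = [sum(math.sin(2 * math.pi * h * 250 * n / SR) / h for h in range(1, 6))
         for n in range(1024)]
f0_hps = hps_pitch(frame, SR)  # about 250 Hz
```

The product is large only where the fundamental and its multiples all carry energy, so a bin at half the true F0 is rejected as long as the odd multiples of that bin are empty.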
  • Slide 24
  • Auditory Model
      • [Figure: block diagram with stages labelled input, HC, ACF, Summary ACF and Correlogram, under the heading "Stabilize & Combine"]
      • The correlogram is formed by concatenating the ACFs of the individual HC outputs.
      • The summary ACF is computed by summing the ACFs across all channels.
      • The peaks in the ACF represent periodicity features.
      • This is known to be robust to band-limited noise.
  • Slide 25
  • Example of Auditory Model
      • [Figure: Summary ACF]
  • Slide 26
  • Pitch Tracking
      • Pitch is usually continuous over time.
          • Once a pitch with strong harmonicity is detected in a frame, the following frames form a smooth pitch contour.
      • Pitch tracking methods
          • Post-processing: first detect pitch frame by frame, then find a continuous path by smoothing.
              • Median filtering
              • Dynamic programming (Talkin, 1995)
          • Probabilistic approach: detect multiple pitch candidates in every frame and find the best path.
              • Viterbi decoding: probabilistic YIN (Mauch & Dixon, 2014)
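Both tracking styles can be illustrated on toy data. The sketch below shows a median filter over a frame-wise track and a Viterbi-style dynamic-programming search over per-frame candidates. It is not RAPT or pYIN: the cost model (a local cost per candidate plus a penalty proportional to the pitch jump in octaves), the function names and the example numbers are assumptions chosen to keep the example short.

```python
import math
from statistics import median

def median_smooth(track, width=3):
    """Median-filter a frame-wise F0 track; the window shrinks at the edges."""
    half = width // 2
    return [median(track[max(0, i - half): i + half + 1])
            for i in range(len(track))]

def best_path(candidates, jump_weight=1.0):
    """Dynamic-programming path search over per-frame pitch candidates.

    candidates[t] is a list of (f0_hz, local_cost) pairs for frame t.
    The path minimizes the sum of local costs plus
    jump_weight * |log2(f_t / f_{t-1})| between consecutive frames.
    """
    cost = [c for _, c in candidates[0]]
    back = []
    for t in range(1, len(candidates)):
        new_cost, pointers = [], []
        for f, c in candidates[t]:
            options = [cost[j] + jump_weight * abs(math.log2(f / f_prev))
                       for j, (f_prev, _) in enumerate(candidates[t - 1])]
            j_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[j_best] + c)
            pointers.append(j_best)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final state.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j][0] for t, j in enumerate(path)]

# Post-processing: a single octave error at frame 2 is removed by the median.
smoothed = median_smooth([200.0, 201.0, 400.0, 202.0, 203.0])

# Candidate-based tracking: frame 1 prefers 400 Hz locally, but the
# continuity penalty makes the 200 Hz candidate part of the best path.
frames = [
    [(200.0, 0.1), (400.0, 0.3)],
    [(400.0, 0.1), (200.0, 0.2)],
    [(201.0, 0.1), (402.0, 0.4)],
]
tracked = best_path(frames)
```

The median filter only repairs isolated errors shorter than half its window, whereas the path search can keep a locally weaker candidate for many frames if that yields a smoother contour overall.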
  • Slide 27
  • Issues and Challenges
      • Voice activity detection (VAD) / singing voice detection (SVAD)
          • Discriminate voiced, unvoiced and silent frames.
      • Latency: real-time implementation
          • The use of long windows results in a slight delay.
          • Post-processing and probabilistic approaches need a larger delay.
      • Noisy environments
          • Learning-based approaches: NMF or classifiers
          • Active research topic
      • Melody transcription
          • Predominant pitch detection
          • Singing voice separation + pitch tracking
          • Active research topic
      • Polyphonic pitch
  • Slide 28
  • Musical Applications
      • Sound modification
          • Time-stretching using PSOLA
          • Auto-tune: pitch correction or the T-Pain effect
      • Music performance
          • Tuning musical instruments
          • Pitch-based sound control: e.g. fiddle~
          • Score following and auto-accompaniment
      • Query-by-humming
          • Relative pitch change might be more important.
      • Singing evaluation (e.g. karaoke) and visualization
      • [Figure: "Original (Variable)" and "Time-Stretched" (N. Bryan, 2012)]
  • Slide 29
  • References
      • A. de Cheveigné and H. Kawahara, "YIN, a Fundamental Frequency Estimator for Speech and Music," 2002.
      • A. Noll, "Cepstrum Pitch Determination," 1967.
      • A. Noll, "Pitch Determination of Human Speech by the Harmonic Product Spectrum, the Harmonic Sum Spectrum and a Maximum Likelihood Estimate," 1969.
      • M. Puckette, T. Apel and D. Zicarelli, "Real-time Audio Analysis Tools for Pd and MSP," 1998.
      • M. Sondhi, "New Methods of Pitch Extraction," 1968.
      • D. Talkin, "A Robust Algorithm for Pitch Tracking (RAPT)," 1995.
      • M. Mauch and S. Dixon, "PYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions," 2014.