similarity algorithms for music plagiarismmas03dm/...musicplagiarism.pdf · for music plagiarism....
TRANSCRIPT
Daniel Müllensiefen
Department of Psychology
GoldsmithsUniversity of London
Similarity Algorithmsfor Music Plagiarism
StructureStructure1 Introduction2 Method and Examples3 Algorithms4 Empirical Study5 Summary – Future Research
IntroductionIntroduction: : The The BackgroundBackgroundHuge Public Interest:
Importance for the pop industry, interestingemphasis on tunes (melodic plagiarism)
Ambigity surrounding UK law in defining asubstantail part
Lack of Research, Exceptions: Timothy English: Sounds Like Teen Spirit, (2008) Stan Soocher: They Fought The Law, (1999) Charles Cronin: Concepts of Melodic Similarity in
Music-Copyright, (1998)
The 2-second rule The 1-bar rule The 5-notes rule
IntroductionIntroduction::No No hard hard and fast and fast rulesrules!!
It is more complicated than that!
What has been taken needs to be at least asubstantial part of the copyright work(Lord Millet, Designers Guild v. Williams [2000] 1 WLR 2426)
The first quantitative study on music plagiarism
IntroductionIntroduction: : MMüllensiefen üllensiefen &&Pendzich Pendzich (2009)(2009)
Questions: Can computer algorithms predict plagiarism
incidents from the melodic similarity of tunes? What is the predictive power of these algorithms
when superimposed on documented case law?
What is the frame of reference(directionality of comparisons)?
How can prior musical knowledge be takeninto account?
MethodMethod: : OutlineOutline
Selection of 20 cases* from 1970 to 2005 – focuson melodic aspects of copyright infringement
Transcription to monophonic MIDI files Analysis of judges' written opinions Reduction of court decisions to two categories
“pro plaintiff” = melodic plagiarism “contra plaintiff” = no infringement
Use of context: Corpus of 14,063 pop songsrepresenting pop music history from 1950-2006
*from UCLA/Columbia copyright infringement database: http://cip.law.ucla.edu/
The Chiffons He‘s So Fine, 1963 No. 1 in US, UK highest position 11
Example 1: Bright Tunes v.Example 1: Bright Tunes v.Harrisongs Harrisongs (1976, 420 F. Supp)(1976, 420 F. Supp)
George Harrison, My Sweet LordPublished in 1971 No.-1-Hit in US, UK & (West-)Germany
Ronald Selle, “Let It End” Unreleased (1975)
Example 2: Selle v. Gibb Example 2: Selle v. Gibb (1983, 567(1983, 567F.Supp 1173)F.Supp 1173)
Bee Gees, “How Deep Is Your Love” (1977)
AlgorithmsAlgorithms
Statistically Informed Similarity Algorithms:Importance of frequency of melodicelements for similarity assessment
Inspired from computational linguistics(Baayen, 2001) and text processing (Manning &Schütze, 1999; Jurafsky & Martin, 2000)
Conceptual Components: m-types (aka n-grams) as melodic elements Frequency counts: Type frequency (TF) and
Inverted Document Frequency (IDF)
Melodic elementsMelodic elements: : m-typesm-types
Word Type t Frequency f(t), Melodic Type τ (pitchinterval, length 2)
Frequency f(τ),
Twinkle 2 0, +7 1
little 1 +7, 0 1
star 1 0, +2 1
How 1 +2, 0 1
I 1 0, -2 3
wonder 1 -2, -2 1
what 1 -2, 0 2
you 1 0, -1 1
are 1 -1, 0 1
!
IDFC"( ) = log
C
m:" #m( )C Corpus of melodies
m melody
τ Melodic type
T # different melodic types
|m:τ ∈ m| # melodies containing τ
!
TF(m," ) =fm "( )
fm " i( )i=1
#
$
Melodic Type τ(pitch interval,length 2)
Frequency f(τ)
0, +7 1
+7, 0 1
0, +2 1
+2, 0 1
0, -2 3
-2, -2 1
-2, 0 2
0, -1 1
-1, 0 1
Type- and Type- and InvertedInvertedDocument FrequenciesDocument Frequencies
!
IDFC"( ) = log
C
m:" #m( )C Corpus of melodies
m melody
τ Melodic type
τ T # different melodic types
|m:τ ∈ m| # melodies containing τ
!
TF(m," ) =fm "( )
fm " i( )i=1
#
$
Melodic Type τ(pitch interval,length 2)
Frequency f(τ) TF(m, τ)
0, +7 1 0.083
+7, 0 1 0.083
0, +2 1 0.083
+2, 0 1 0.083
0, -2 3 0.25
-2, -2 1 0.083
-2, 0 2 0.167
0, -1 1 0.083
-1, 0 1 0.083
Type- and Type- and InvertedInvertedDocument FrequenciesDocument Frequencies
!
IDFC"( ) = log
C
m:" #m( )C Corpus of melodies
m melody
τ Melodic type
τ T # different melodic types
|m:τ ∈ m| # melodies containing τ
!
TF(m," ) =fm "( )
fm " i( )i=1
#
$
Melodic Type τ(pitch interval,length 2)
Frequency f(τ) TF(m, τ) IDFC(τ)
0, +7 1 0.083 1.57
+7, 0 1 0.083 1.36
0, +2 1 0.083 0.23
+2, 0 1 0.083 0.28
0, -2 3 0.25 0.16
-2, -2 1 0.083 0.19
-2, 0 2 0.167 0.22
0, -1 1 0.083 0.51
-1, 0 1 0.083 0.74
Type- and Type- and InvertedInvertedDocument FrequenciesDocument Frequencies
!
IDFC"( ) = log
C
m:" #m( )C Corpus of melodies
m melody
τ Melodic type
τ T # different melodic types
|m:τ ∈ m| # melodies containing τ
!
TF(m," ) =fm "( )
fm " i( )i=1
#
$
Melodic Type τ(pitch interval,length 2)
Frequency f(τ) TF(m, τ) IDFC(τ) TFIDFm,C(τ)
0, +7 1 0.083 1.57 0.13031
+7, 0 1 0.083 1.36 0.11288
0, +2 1 0.083 0.23 0.01909
+2, 0 1 0.083 0.28 0.02324
0, -2 3 0.25 0.16 0.04
-2, -2 1 0.083 0.19 0.01577
-2, 0 2 0.167 0.22 0.03674
0, -1 1 0.083 0.51 0.04233
-1, 0 1 0.083 0.74 0.06142
Type- and Type- and InvertedInvertedDocument FrequenciesDocument Frequencies
!
"C(s,t) =
TFIDFs,C(# ) $TFIDF
t,C(# )
# % sn&tn
'
TFIDFs,C(# )( )
2
$ TFIDFt,C(# )( )
2
# % sn&tn
'#% s
n&tn
'
TF-IDF CorrelationTF-IDF Correlation
Ratio Model (Tversky, 1977): Similarity σ(s,t) related to # features in s and t have common salience of features f()
!
"(s,t) =f (sn# tn )
f (sn# tn )+$f (sn \ tn )+ %f ( tn \ sn ),$,% & 0
features => m-types salience => IDF and TF different values of α, β to change frame of reference
Variable m-type lengths (n=1,…,4), entropy-weighted average
Feature-based SimilarityFeature-based Similarity
Tversky.equal measure (with α = β = 1)
!
"(s,t) =IDF
C(# )
# $ sn% t
n
&IDF
C(# )
# $ sn% t
n
& + IDFC(# )
# $ sn\ tn
& + IDFC(# )
# $ tn\ sn
&
Tversky.plaintiff.only measure (with α = 1, β = 0)
!
"plaintiff.only(s,t) =IDF
C(# )
# $ sn% t
n
&IDF
C(# )
# $ sn% t
n
& + IDFC(# )
# $ sn\ tn
&
Tversky.defendant.only measure (with α = 0, β = 1)
!
"defendant .only( t,s) =IDFC (# )# $ sn% tn
&IDFC (# )# $ sn% tn
& + IDFC (# )# $ tn \ sn&
Tversky.weighted measure with and
!
" =TF
s(# )
# $ sn% t
n
&TF
s(# )
# $ sn
&
!
" =TF
t(# )
# $ sn% t
n
&TF
t(# )
# $ tn
&
Feature-based SimilarityFeature-based Similarity
Ground Truth:20 cases with pro/contra decision (7/13)
Evaluation metrics Accuracy (% correct at optimal cut-off on
similarity scale) AUC (Area Under receiver operating
characteristic Curve)
EvaluationEvaluation
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Edit Dis
t.
Sum C
om.
Ukkonen
TF-IDF c
orrel.
Tv.equal
Tv.pla
intif
f
Tv.defe
ndant
Tv.weig
hted
Accuracy
AUC
EvaluationEvaluation
Evaluation: ROC CurvesEvaluation: ROC Curves
Observations: Decision sometimes based on ‘characteristic motives’ (incl. rhythm)
High-level form can be important (e.g. call-and-response structure)
Frame of reference can be different
Ronald Selle, “Let It End”
Bee Gees, “How Deep Is Your Love”
A Qualitative LookA Qualitative Look
SummarySummary Court decisions can be related closely tomelodic similarity Plaintiff’s song is often the frame ofreference when determining the question ofsubstantial part (“It depends upon its importance to thecopyright work. It does not depend upon its importance to thedefendants” per Lord Millet)
Statistical information about commonnessof melodic elements is important
Future ResearchFuture Research
Conduct analogous studies with cases fromthe UK and Germany and utilize additionalUS cases
Include rhythm in m-types Use higher level features of melodies in
addition to m-types Conduct listening experiment to determine
cognitive validity
Daniel Müllensiefen
Department of Psychology
Goldsmiths,University of London
Similarity algorithmsfor music plagiarism