CHAPTER 5
RAGA IDENTIFICATION
Chapter 4 described how the feature extraction algorithms were
designed by exploiting the characteristics of Carnatic music, so that these
features can be used to identify content specific to Carnatic music.
One important kind of Carnatic-specific content is the Raga, which is
typically used to convey the emotion and character of a particular song.
In this chapter, we discuss the algorithms proposed for Carnatic
music Raga identification.
5.1 MELODY AND RAGA
Melody is central to all types of music, and it is what distinguishes
music from speech. Melody is fundamental in Western music, and several
attempts have been made to identify it (Deutsch 1978, Schulkind et al
2003, Madsen and Widmer 2007). In the context of Indian music, melody has
an organized representation called the Raga. As discussed earlier, the
concept of Raga is specific to Indian music; it is defined as the
arrangement of notes in a pre-defined order, and each Raga is given a name
based on its characteristics (Sambamurthy 1983). Moreover, the role of the
Raga and its interpretation vary between Carnatic music and Hindustani
music.
A parent Raga, called a Melakarta Raga, is one in which all seven
swaras S, R, G, M, P, D, and N are present. A child Raga, called a Janya
Raga, is derived from a parent Raga and contains only 5 or 6 of the seven
swaras. Because a child Raga is obtained by omitting one or two of the
seven swaras, the same child Raga can be thought of as derived from
multiple parent Ragas.
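The point that one child Raga can be traced to several parents can be illustrated with a simple set comparison. The following is only a sketch: the swara sets are the commonly cited scales of three Melakarta Ragas and of Mohanam, and the names `PARENTS` and `candidate_parents` are hypothetical helpers, not part of the system described in this thesis.

```python
# Swara sets of three parent (Melakarta) Ragas, written with variant numbers.
PARENTS = {
    "Sankarabharanam": {"S", "R2", "G3", "M1", "P", "D2", "N3"},
    "Kalyani": {"S", "R2", "G3", "M2", "P", "D2", "N3"},
    "Karaharapriya": {"S", "R2", "G2", "M1", "P", "D2", "N2"},
}


def candidate_parents(child_swaras):
    """Return every parent whose swara set contains all swaras of the child."""
    child = set(child_swaras)
    return sorted(name for name, swaras in PARENTS.items() if child <= swaras)


# Mohanam omits M and N, so more than one parent scale contains it.
mohanam = ["S", "R2", "G3", "P", "D2"]
print(candidate_parents(mohanam))
```

Because the omitted swaras are exactly the ones in which the parents differ, the swara set alone cannot single out one parent.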
Carnatic music classifies Ragas hierarchically, in terms of a
Parent-Child relationship. In our work, we propose three different
approaches for identifying both parent and child Ragas of Carnatic music,
based on identifying the swaras in a given musical piece. Mapping
frequency to swara is challenging because of the narrow frequency range
of each note, the presence of Gamakas, and the use of microtones. Hence,
Raga identification should also consider other characteristics of the
Raga, in addition to the swaras that comprise its Arohana and Avarohana.
5.2 CHARACTERISTICS OF RAGA
A Raga is characterized by its Arohana and Avarohana. In addition,
a Raga is characterized by the Raga lakshana, which conveys semantic
information about the Raga. As described in the literature, the Raga
lakshana has 13 essential features (Sambamurthy 1983), comprising the
following components:
Graha: Note at which a Raga commences
Amsa: The note that reveals the melodic entity (svarupa) of the Raga;
also called the jiva swara
Nyasa: The note on which the Raga can be concluded
Mandra: The lowest note that can be played in the Raga
Tara: The highest note that can be played in the Raga
Alpatva: The note used sparingly in the Raga
Bahutva: The note used frequently in the Raga
Apanyasa: The same sangati is sung in Tara and Madhya sthayi
Vinayasa: Raga Sancharas are stopped at a swara and then
elaborated in Mandra and Tara sthayi, a characteristic that
defines the pattern of singing a particular Raga
Sanyasa: The Raga is sung and elaborated and finally closed at
the Adhara Shadja Swara
Shadava: 6 note sancharas
Audava: 5 note sancharas
Antara Marga: Introduction of the note or chayya of another
Raga
The identification of this Raga lakshana requires the identification
of the swaras that occur in the input music signal. In this thesis, the swaras are
identified to determine the Arohana, Avarohana and the Raga lakshana, which
are later used to identify the Raga of the input music.
5.3 AROHANA AVAROHANA APPROACH
In this thesis, we propose three approaches to Raga identification:
the Arohana Avarohana approach, LDA approach and the Raga model based
approach. In the first approach, the use of the Arohana and Avarohana is
explored, as the basis for Raga identification.
5.3.1 Algorithm
The primary components of a Raga are the swaras comprising its
Arohana and Avarohana. If these swaras are identified, identifying the
Raga becomes straightforward. The Arohana Avarohana algorithm was
evaluated under two scenarios: one with a known, fixed frequency for the
shadja ‘S’, and the other with the tonic estimated by our algorithm to
give the frequency of the shadja ‘S’.
Songs were chosen from singers whose tonic was assumed to be known.
The input signal is segmented using the segmentation algorithm discussed
in Chapter 4. As already mentioned, the assumption at the end of the
segmentation phase is that each segment probably corresponds to one
swara. Therefore, the dominant frequency of every segment is identified.
The dominant frequency is chosen using the spectral energy and the
spectral centroid features: the spectral centroid is computed, and the
frequency at which 95% of the energy is present is taken as the dominant
frequency. The frequency identified for each segment is then converted to
swara notation by computing the ratio between that frequency and the
known tonic, using Table 1.4 of Chapter 1 (Sambamurthy 1983).
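The ratio-based conversion from frequency to swara can be sketched as follows. Table 1.4 is not reproduced in this chapter, so the ratios below are illustrative just-intonation values and are an assumption, as are the names `SWARA_RATIOS` and `frequency_to_swara`. The sketch picks the nearest ratio on a logarithmic scale after folding the frequency into one octave.

```python
import math

# Illustrative just-intonation ratios relative to the tonic 'S'.
# ASSUMPTION: these stand in for Table 1.4, which is not reproduced here.
SWARA_RATIOS = [
    ("S", 1.0), ("R1", 16 / 15), ("R2", 9 / 8), ("G2", 6 / 5),
    ("G3", 5 / 4), ("M1", 4 / 3), ("M2", 45 / 32), ("P", 3 / 2),
    ("D1", 8 / 5), ("D2", 5 / 3), ("N2", 9 / 5), ("N3", 15 / 8),
]


def frequency_to_swara(frequency_hz, tonic_hz):
    """Map a segment's dominant frequency to the nearest swara label."""
    # Position within the octave, in the range [0, 1), on a log2 scale.
    position = math.log2(frequency_hz / tonic_hz) % 1.0

    def distance(entry):
        d = abs(position - math.log2(entry[1]))
        # Wrap around so that values just below the upper 'S' map to 'S'.
        return min(d, 1.0 - d)

    return min(SWARA_RATIOS, key=distance)[0]


print(frequency_to_swara(360.0, 240.0))
```

Nearest-ratio matching makes the narrow spacing between adjacent swaras visible: a small error in either the dominant frequency or the tonic is enough to move a segment to a neighbouring label.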
From the identified swaras, the Arohana and Avarohana are
determined by choosing one swara for each of the 7 swaras. The Raga
look-up table is then used to determine the Raga of the input song: a
simple string matching approach compares the identified Arohana
Avarohana pattern with the Raga database. The Raga look-up table
consists of the name of the Raga and its Arohana and Avarohana in the
form of swara components, as shown in Figure 5.1.
Raga Name | Arohana | Avarohana
Figure 5.1 Arohana Avarohana model
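A minimal sketch of the look-up step is given below. The table layout follows Figure 5.1, but the dictionary `RAGA_TABLE`, the function `lookup_raga`, and the space-separated string encoding are assumptions made for illustration; the entries are the standard scales of the three Ragas used in the experiments.

```python
# Hypothetical look-up table: Raga name -> (Arohana, Avarohana) strings.
RAGA_TABLE = {
    "Kanakangi": ("S R1 G1 M1 P D1 N1 S", "S N1 D1 P M1 G1 R1 S"),
    "Karaharapriya": ("S R2 G2 M1 P D2 N2 S", "S N2 D2 P M1 G2 R2 S"),
    "Sankarabharanam": ("S R2 G3 M1 P D2 N3 S", "S N3 D2 P M1 G3 R2 S"),
}


def lookup_raga(arohana, avarohana):
    """Return the Raga whose stored strings equal the identified ones, else None."""
    for name, (stored_aro, stored_ava) in RAGA_TABLE.items():
        if arohana == stored_aro and avarohana == stored_ava:
            return name
    return None


print(lookup_raga("S R2 G3 M1 P D2 N3 S", "S N3 D2 P M1 G3 R2 S"))
```

Exact string equality means that a single misidentified swara yields no match at all, which is one reason this approach is sensitive to swara errors.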
This algorithm had limited accuracy because it used a known, fixed
frequency for the singer's ‘S’, whereas, as discussed in Chapter 1, this
frequency varies with the song. Hence, the same Raga identification
algorithm was also run with the variable tonic estimated by our
algorithm, as explained in Chapter 4.
5.3.2 Analysis of the Arohana Avarohana Approach
For Raga identification, we considered only songs in parent Ragas,
sung by musicians such as Nithyasree, M.S.Subalakshmi and
Balamuralikrishna. To test this algorithm, we used songs in the Ragas
Sankarabharanam, Kanakangi, and Karaharapriya, set to Adi Tala or Rupaka
Tala. The Ragas identified for each singer, and the percentage of
correct identification, are shown in Figure 5.2.
Figure 5.2 Raga Identification - Arohana Avarohana Approach
The three Ragas were compared across three singers, with the fixed
and the estimated tonic, giving average identification rates of 61% and
66% respectively. Figure 5.2 also shows that the Raga identification
algorithm that incorporated the tonic identification algorithm performed
better in many situations than the algorithm that assumed a fixed
frequency for ‘S’.
This algorithm was tested only on parent Ragas, in which all seven
swaras are present. However, for parent Ragas with microtones, the
so-called “vivadhi” swaras were identified incorrectly, leading to a
wrong identification of the Raga. In addition, because of the narrow
frequency range of the swaras and the Gamakas (pitch inflexions given to
swaras), identifying the swaras from the dominant frequency alone was
difficult, which limited the accuracy. Furthermore, the tonic was
estimated using known-frequency mutating signals corresponding to the
frequencies of ‘S’, ‘R1’, ..., ‘N’. This may not always hold, since the
singer can choose a tonic between these intervals, such as a value
between ‘S’ and ‘R1’ or between ‘M2’ and ‘P’. In such situations, the
algorithm was not able to determine the Raga because of the difficulty
in tonic estimation.
Another limitation of this algorithm is that it cannot handle
missing swaras (Child Raga), differing swaras (Child/vakra Raga) and
jumbled swara patterns (vakra Ragas) in the Arohana and Avarohana. In the
case of missing swaras, the narrow range of frequencies between notes
leads to one swara being misinterpreted as another. In the case of
differing swaras, the large number of possible combinations made the
interpretation of the Arohana and Avarohana difficult, while for a
jumbled swara pattern, determining the sequence of swaras was difficult.
Considering these limitations of the Arohana Avarohana approach,
we propose another approach to Raga identification, which is based not
only on the Arohana and Avarohana but also on the Raga lakshana
characteristics. The Raga lakshana characteristics, which convey
semantic information about the Raga, can be mapped to the contextual
information used in text processing.
Therefore, we analyzed the musical piece to identify the Raga in a
manner similar to a text-based approach, using the ‘Amsa’ characteristic
of the Raga lakshana to convey semantic information.
5.4 LDA APPROACH
To overcome the limitations of the Arohana Avarohana approach to
Raga identification, we propose the use of a probabilistic Latent
Dirichlet Allocation (LDA) model (Hu 2009), which incorporates additional
Raga lakshana parameters to determine the Raga. LDA is an unsupervised
statistical model used in document classification to determine the
underlying topics of a document, under the assumption that a document
contains a random mixture of topics. In this work, we constructed an LDA
model to identify the Raga(s) present in a given input music signal, on
the assumption that a musical piece is a random mixture of notes; hence,
notes map to the words in a topic, and the topics in a document map to
Ragas.
5.4.1 LDA for Music
In the work proposed by Hu (2009), the author analysed the
performance of LDA for text, images and music. As explained in Chapter
2, the Dirichlet parameters α and β indicate the distribution of the
topics in a given document. In the work on images, the Dirichlet
parameters are identified for each image segment to determine the topics
present in a given image. In the work on music, the author used LDA to
determine the harmonic structure of a given Western musical piece using
the Dirichlet parameters, and established the note patterns of the major
and minor scales. This inspired us to try this approach for Carnatic
music analysis, which has a pre-defined arrangement of notes, called the
Raga. To explore this possibility, we examined the semantic information
conveyed by the Raga by means of the ‘Amsa’ Raga lakshana.
5.4.2 LDA based Algorithm
For the purpose of Raga identification, we explore the
characteristic phrase of notes, which is unique to every Raga. The
sequence of notes that occurs contiguously the maximum number of times is
the characteristic phrase, which defines the ‘Amsa’ of a given Raga. This
characteristic phrase helps to distinguish a Child Raga from its Parent
even when the swaras in the Arohana and Avarohana are the same for both.
The characteristic swara phrase is used to increase the weight that the
LDA gives to the swaras when identifying the Raga. The LDA parameters α
and β are derived using this characteristic phrase of the Raga. The
initial value of α needs to be computed from a random mixture of songs.
This value is initially assumed to be uniform for all Ragas; it is
computed over all possible combinations of swaras, with equal weight
given to all phrases. The parameter β estimates the weight associated
with a sequence of notes for a given Raga and is computed from the
initial value of α using Bayes' theorem.
The Dirichlet parameters α and β of the LDA are computed for all
Ragas during the training phase. These parameters require the swaras of
the song. The swaras are determined by first segmenting the song with our
segmentation algorithm to identify its frequency components. The tonic is
then found with our tonic estimation algorithm, and the ratio between the
tonic and each of the other identified frequency components is computed.
Each ratio is then mapped to a swara. After the swaras are determined,
sequences of four swaras are considered to compute the Dirichlet
parameters. In our work, we take α to be the generic distribution of
swara patterns across all Carnatic songs, since this parameter is
estimated from a random mixture. The distribution of swara patterns of an
individual Raga is given by the β vector. During the training phase, we
set the initial probability value of α by assuming all combinations of
swaras of length 4. This value is refined by studying the frequency of
occurrence of the swara patterns in the songs of the training set.
Similarly, we initialize β and recompute its value based on songs
belonging to a particular Raga.
Using the computed β value, the value of α is modified using
Bayes' theorem. This computation is performed for all Ragas, including
Parent, Child and vakra Ragas, to determine the corresponding probability
vectors.
In our algorithm, one estimated value of α is maintained per
pattern length. If the pattern length is 4, all songs with an identifying
pattern length of 4 share one common α, and every Raga has a β vector of
its own. Using a permutation algorithm, we generate all possible 4-length
patterns over the 7 swaras. We assign equal probabilities to these
patterns, each with a value of 1/(7*7*7*7). To compute α and β, we
considered the 7 swaras S, R, G, M, P, D, N without representing them
as microtones.
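The uniform initialisation can be sketched as below. The count 7*7*7*7 corresponds to sequences in which a swara may repeat, so the sketch uses a Cartesian product rather than strict permutations; that reading, and the names `SWARAS`, `PATTERN_LENGTH` and `uniform_pattern_weights`, are assumptions.

```python
from itertools import product

SWARAS = ["S", "R", "G", "M", "P", "D", "N"]
PATTERN_LENGTH = 4


def uniform_pattern_weights():
    """Assign 1 / 7**4 to every 4-length swara pattern (repetition allowed)."""
    patterns = ["".join(p) for p in product(SWARAS, repeat=PATTERN_LENGTH)]
    weight = 1.0 / len(patterns)
    return {pattern: weight for pattern in patterns}


alpha = uniform_pattern_weights()
print(len(alpha))
```

Storing the weights in a dictionary keyed by the pattern string makes the later weight increments a direct look-up.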
After initialising the probability value of α, we need to train α
with songs belonging to all Ragas, so that it represents generic
information about Ragas. In the training process, every 4-length pattern
encountered in a song increases the weight of the corresponding pattern
in α. After α has been re-computed in this way over a training corpus,
the value of β is determined from α for each Raga.
To construct β, the Raga-specific parameter, the system needs to be
trained with the songs of a particular Raga. Given a song, for every
four-length pattern encountered, the weight of that pattern is retrieved
from the α vector and increased, and the resulting probability of the
pattern is stored in the β vector. The initial value of α is 1/7^4, which
is incremented in steps of 0.004 to compute β using Bayes' theorem. Then,
during the next computation of α, the weight is incremented in steps of
0.04 to compute the value of β. The initial value of β is computed and is
increased by 0.05 when a new characteristic phrase pattern is
encountered. Using this procedure, the top 20 patterns for a given Raga
are found from the training set and stored as the β vector. This vector
is then used to refine the α vector, by adding a small weight to the
patterns of the α vector that are encountered in the β vector. After α is
re-computed, the same procedure is repeated to determine the β vector for
all the Ragas. The pseudocode of the LDA construction algorithm is given
below.
LDAConstruct ()
{
    Determine all 4-length pattern combinations and assign each an equal
    probability α = 1 / (7*7*7*7)
    For every Raga
    {
        Compute α by choosing songs belonging to all Ragas, adding a little
        weight whenever a 4-length pattern is encountered in a song
        Compute β by choosing the songs belonging to one Raga; if a 4-length
        pattern occurs, add a little weight, starting from its value in α
        Re-compute α using the computed β vector
    }
}
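The pseudocode above can be turned into a small runnable sketch. This is not the implementation used in this work: the single step size, the renormalisation after each update, the representation of a song as a string over the letters SRGMPDN, and the names `four_grams`, `add_weight` and `lda_construct` are all assumptions made to keep the example short.

```python
from itertools import product

SWARAS = "SRGMPDN"


def four_grams(song):
    """Yield every contiguous 4-swara pattern in a song given as a string."""
    for i in range(len(song) - 3):
        yield song[i:i + 4]


def add_weight(weights, songs, step):
    """Add `step` per pattern occurrence, then renormalise (an assumption)."""
    updated = dict(weights)
    for song in songs:
        for pattern in four_grams(song):
            updated[pattern] += step
    total = sum(updated.values())
    return {p: w / total for p, w in updated.items()}


def lda_construct(songs_by_raga, step=0.004, top_k=20):
    """Return (alpha, betas): generic weights and top patterns per Raga."""
    patterns = ["".join(p) for p in product(SWARAS, repeat=4)]
    alpha = {p: 1.0 / len(patterns) for p in patterns}
    # Generic distribution: train on the songs of all Ragas.
    all_songs = [s for songs in songs_by_raga.values() for s in songs]
    alpha = add_weight(alpha, all_songs, step)
    betas = {}
    for raga, songs in songs_by_raga.items():
        # Raga-specific distribution: start from alpha, train on one Raga.
        beta = add_weight(alpha, songs, step)
        betas[raga] = sorted(beta, key=beta.get, reverse=True)[:top_k]
        # Refine alpha with the patterns that are prominent for this Raga.
        for pattern in betas[raga]:
            alpha[pattern] += step
        total = sum(alpha.values())
        alpha = {p: w / total for p, w in alpha.items()}
    return alpha, betas


# Toy, made-up swara strings used only to exercise the code.
toy_songs = {"A": ["SRGMSRGMSRGM"], "B": ["PDNSPDNSPDNS"]}
toy_alpha, toy_betas = lda_construct(toy_songs)
print(toy_betas["A"][0], toy_betas["B"][0])
```

Each β is kept here as a ranked list of its top patterns, which is the form needed when rankings are compared in the testing phase.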
In the testing phase, the input song is given and, using the α vector,
the β of the song is determined. This is compared with the β vectors of
all the Ragas to find the Raga whose stored β is closest to the computed
β. The closeness is determined by the relative positions of the top 10
patterns in the input β and the stored β values.
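One possible reading of "relative positions of the top 10 patterns" is a rank-difference distance, sketched below. The exact measure is not specified in the text, so the penalty scheme and the names `rank_distance` and `closest_raga` are assumptions.

```python
def rank_distance(input_top, raga_top, limit=10):
    """Sum of position differences of the top patterns; lower means closer."""
    input_top = input_top[:limit]
    raga_top = raga_top[:limit]
    distance = 0
    for position, pattern in enumerate(input_top):
        if pattern in raga_top:
            distance += abs(position - raga_top.index(pattern))
        else:
            # A pattern missing from the Raga's list gets the maximum penalty.
            distance += limit
    return distance


def closest_raga(input_top, betas):
    """Pick the Raga whose stored pattern ranking is nearest to the input's."""
    return min(betas, key=lambda raga: rank_distance(input_top, betas[raga]))


# Hypothetical rankings for two unnamed Ragas.
stored = {"X": ["SRGM", "RGMP", "GMPD"], "Y": ["PDNS", "DNSR", "SRGM"]}
print(closest_raga(["SRGM", "GMPD", "RGMP"], stored))
```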
5.4.3 Analysis of the LDA Approach
In our approach, we used characteristic swara phrases of length
four to determine the initial values of the Dirichlet parameters. The
challenge was in determining the unique phrase of notes for each Raga. In
general, a Raga is characterized by more than one frequently occurring
characteristic note phrase, and this is represented using the Dirichlet
parameters α and β. A Raga can have multiple characteristic phrases of
length 3, 4, or 5. We started with characteristic phrases of length 3,
but found that this did not yield correct Raga identification. Hence, we
increased the length to 4 to compute the Dirichlet parameters α and β. We
then increased the length to 5, but found that the LDA process did not
yield better Raga identification accuracy. For instance, if a swara in
the characteristic phrase lasts for more than one duration, as in the
phrase P,DP in Sankarabharanam, the phrase was identified incorrectly. We
have not been able to handle such phrases, in which a note is held for a
longer duration, because our segmentation algorithm merges the held note
and treats it as one swara. Therefore, the length of the characteristic
phrase was restricted to four. In this work, we identified the
characteristic phrases from the literature on Carnatic music
(Sambamurthy 1983), and also manually, by observing the swara
representation of Carnatic songs for each Raga. The results of the
algorithm are given in Figure 5.3.
Figure 5.3 Raga identification using LDA
From the figure, it is observed that the performance for Melakarta
Ragas such as Sankarabharanam, Thodi and Kalyani is higher than that for
the Child Ragas Madhyamavathi, Mohanam, Sindhubairavi and Bilahari. In
addition, Ragas whose characteristic swara phrase is of length four
perform better than Ragas whose characteristic phrase is of a different
length. Moreover, since we have used the swara representation without
the microtones, the computation of the Dirichlet parameters is also
affected, leading to lower accuracy. This is because the swaras R3 and
G1, and similarly D3 and N1, share the same frequency values, which leads
to misinterpretation if they are not considered separately. Furthermore,
wrongly determined swaras, caused by an error in the tonic or by wrong
frequency identification due to the presence of Gamakas, also lead to an
incorrect identification of the characteristic phrase, and thereby to an
error in Raga identification. Therefore, to tackle cases where a
characteristic phrase length of 4 for computing the Dirichlet parameters
did not yield correct results, we designed the supervised Raga model
based approach to Raga identification.
5.5 RAGA MODEL APPROACH
The existing algorithms for Raga identification in Hindustani or
Carnatic music are based on extracting raw signal level features
(temporal, spectral and Cepstral), constructing a classifier using these
features, and determining the Raga either with or without the help of the
swara components. Chordia and Rae (2007) used the Pitch Class
Distribution (PCD) and Pitch Class Dyad Distribution (PCDD) for
Hindustani Raga identification, which was later tried for Carnatic music.
The drawback of this system is that the input must be converted to a MIDI
representation (Chordia et al 2009), and this conversion loses the
Gamakas present in the signal.
Among the approaches that we have proposed, the Arohana
Avarohana approach has a drawback in determining the swara from the
frequency components, and is also not successful for Child Raga
determination. The LDA based approach, which was adopted from text
classification, has difficulty in computing the probabilistic Dirichlet
parameters. In addition, since the characteristic swara phrase is
predefined, the use of the LDA is not fully justified.
Therefore, we combine our approaches to tackle the drawbacks of
the individual approaches, and conclude that Raga identification as
performed by these algorithms cannot be a one-step process but should
rather be a multi-step process. Motivated by the idea of constructing a
swara model (Chordia and Rae 2008), and using our Arohana Avarohana
approach with the need for a multi-step algorithm as the basis, we have
constructed a Raga model. This model for determining the Raga is based on
three major aspects: raw signal level features, the Arohana-Avarohana
pattern, and other Raga lakshana characteristics.
5.5.1 Components of the Raga Model
The Raga model comprises musical features and signal level
features, as shown in Figure 5.4. The musical features are represented in
terms of the Arohana, Avarohana, and the Raga lakshana characteristics.
[Figure 5.4: one row of the Raga model. Columns: Name; Arohana (S, R1, R2, R3, …, N1, N2, N3); Avarohana (N3, N2, N1, D3, …, R2, R1, S); Musical Parameters - Raga Lakshana (Raga Phrase, Start Swara, End Swara, Swara Frequently Used, Swaras That Can Take Gamaka); Signal Level Parameters (CICC, MFCC, Flux, Centroid)]
Figure 5.4 Raga Model
The signal level features are represented using spectral and
Cepstral features, namely the Spectral centroid, Spectral flux, MFCC,
and CICC. The construction of the Raga model is described below.
5.5.2 Construction of the Raga Model
For implementing the Raga model, we need to specify how this
model is to be represented, so as to help in the later stages of Raga
identification.
5.5.2.1 Arohana, Avarohana and Raga Lakshana
The Arohana and Avarohana are the primary components of the
Raga; they are the first constituents of the Raga model and are important
in distinguishing Ragas. This information is available in the literature.
In our work, each swara of the Arohana and Avarohana is indicated by a
Boolean value, 1 or 0, denoting the presence or absence of that swara.
For this purpose, the swaras are represented at the microtone level, with
3 variations each of R, G, D and N and two variations of M. The Raga
model is arranged so that the first 72 rows correspond to the Parent
Ragas, and the rows from 73 onwards correspond to the Child Ragas. In
some Child Ragas the Arohana and Avarohana differ, and hence the Raga
model represents them as two separate components. For the Vakra Ragas,
however, the model indicates the presence or absence of swaras in the
Arohana and Avarohana, but this cannot convey the sequence of swaras that
make up the Arohana and Avarohana.
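The Boolean representation can be sketched as a fixed-length 0/1 vector over the 16 swara positions (S, three variations each of R, G, D and N, two of M, and P). The column order and the helper names `encode_scale` and `is_parent_candidate` are assumptions made for illustration.

```python
# The 16 swara positions used at the microtone level.
SWARA_POSITIONS = [
    "S", "R1", "R2", "R3", "G1", "G2", "G3", "M1", "M2",
    "P", "D1", "D2", "D3", "N1", "N2", "N3",
]


def encode_scale(swaras):
    """Return a 0/1 list marking which of the 16 swara positions are present."""
    present = set(swaras)
    return [1 if name in present else 0 for name in SWARA_POSITIONS]


def is_parent_candidate(encoded):
    """True when exactly one variant of each of the seven swaras is present."""
    counts = {}
    for name, flag in zip(SWARA_POSITIONS, encoded):
        counts[name[0]] = counts.get(name[0], 0) + flag
    return all(count == 1 for count in counts.values())


sankarabharanam = encode_scale(["S", "R2", "G3", "M1", "P", "D2", "N3"])
print(sankarabharanam, is_parent_candidate(sankarabharanam))
```

The encoding is order-free: `encode_scale(["S", "G3", "R2"])` and `encode_scale(["S", "R2", "G3"])` give the same vector, which is exactly why it cannot express the swara sequence of a Vakra Raga.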
The next component of the Raga model is the set of musical
parameters conveying other Raga lakshana characteristics. In this work,
we have considered the Graha, Amsa, Nyasa, and Bahutva, since these
characteristics can be determined directly from the raw signal level
features and are more pertinent to the identification of the Raga. Based
on the Raga lakshana, the musical parameters that we have considered are
the starting swara, the ending swara, the sequence of swaras forming the
characteristic phrase of a Raga, the most frequently used swara, and the
swaras that can take the Gamaka. The starting and ending swaras are
represented as strings such as “S”, “R1” and “R2”. The Raga phrase, a
string consisting of three to four swaras, is represented as a sequence;
a Raga may have more than one such characteristic pattern, and these
patterns are unique to that Raga. The pattern as a string, along with its
representation as a prefix function suitable for use by string-matching
algorithms, is part of the Raga model. As explained earlier, since the
frequency span of a particular swara is very narrow, one swara can be
mistaken for an adjacent swara. The characteristic phrase component
therefore helps to identify the Raga correctly even if there is an error
in identifying an individual swara.
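The prefix function stored with each phrase is the standard table used by the Knuth-Morris-Pratt (KMP) algorithm. The sketch below computes it for a phrase given as a list of swara labels and uses it to locate the phrase in a swara sequence; the function names are hypothetical, and a non-empty phrase is assumed.

```python
def prefix_function(phrase):
    """KMP prefix function: for each position i, the length of the longest
    proper prefix of phrase[:i + 1] that is also a suffix of it."""
    table = [0] * len(phrase)
    k = 0
    for i in range(1, len(phrase)):
        while k > 0 and phrase[i] != phrase[k]:
            k = table[k - 1]
        if phrase[i] == phrase[k]:
            k += 1
        table[i] = k
    return table


def find_phrase(sequence, phrase):
    """Return the start indices of every occurrence of phrase in sequence."""
    table = prefix_function(phrase)
    matches, k = [], 0
    for i, swara in enumerate(sequence):
        while k > 0 and swara != phrase[k]:
            k = table[k - 1]
        if swara == phrase[k]:
            k += 1
        if k == len(phrase):
            matches.append(i - k + 1)
            k = table[k - 1]
    return matches


print(prefix_function(["S", "G", "S", "G", "M"]))
```

Precomputing the table once per phrase means a stored phrase can be searched in any input sequence without re-examining swaras that have already been matched.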
In addition to the above, another important component of the Raga,
the Gamaka, is also used. Gamakas can be defined as pitch fluctuations.
In Carnatic music, not all swaras take discrete frequencies; a swara can
take a range of frequencies. For example, any frequency between 240 Hz
and 256.4 Hz is thought of as R1, rather than only the exact value of
256.4 Hz. As discussed in Chapter 1, this small range of frequency can be
covered in multiple ways: as a valley, as a continuously changing value,
as a mountain, or as a simple jump. Which swaras take which type of
Gamaka varies from Raga to Raga, and hence this component is also
represented in the Raga model as a Raga lakshana.
The last component of the musical parameters list is again
represented as a string, in which each character is a swara that is a
probable candidate for taking the Gamaka in a particular Raga. This
component is very useful for identifying pitch fluctuations, and is used
to disambiguate a swara. If a particular swara can take the Gamaka in a
Raga, and it has been identified as an adjacent swara, it is corrected to
the actual swara, because the fluctuation in frequency caused by the
Gamaka is what produced the wrong swara. The musical parameters are
populated in the Raga model by referring to the literature on music
(Sambamurthy 1983) and by interviewing musicologists. In addition,
patterns are extracted from the music literature to determine the Raga
lakshana characteristics, and are validated against those obtained from
musicologists before populating the Raga model.
5.5.2.2 Signal level parameters
The last components of the Raga model are the raw signal level
features. As discussed earlier, in cases where the swara identification
is not accurate, the signal level features can help in Raga
identification. The construction of a Gaussian mixture model based on
signal level features, and its use for Raga determination, has been
discussed by Sudha (2009). We have considered both spectral and Cepstral
features: the Spectral flux, Spectral centroid, MFCC, and CICC are
extracted from songs belonging to the same Raga, and the range of values
that each Raga can take is represented in the Raga model.
The Spectral flux and Spectral centroid give information about the
dominant frequency in a typical segment, which is needed for conveying
the swara. This is represented as a vector of spectral centroid values
for the Arohana or Avarohana pattern. These values are determined by
training, extracting values for each Raga from different singers. The
MFCC and CICC coefficients are determined for the Raga, and a feature
vector consisting of the first 9 coefficients is obtained by training and
stored in the model. Once the Raga model has been created with all these
parameters, it is available for use in identifying the Raga.
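The two spectral features, together with the 95% energy point used earlier to pick the dominant frequency, can be sketched in plain Python. The inputs are assumed to be a list of bin frequencies and the matching magnitude spectrum of one segment; the exact definitions used in Chapter 4 may differ, and the function names are hypothetical.

```python
def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency of one spectrum."""
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0


def spectral_flux(prev_mags, mags):
    """Sum of squared differences between successive magnitude spectra."""
    return sum((m - p) ** 2 for p, m in zip(prev_mags, mags))


def rolloff_frequency(freqs, mags, fraction=0.95):
    """Lowest frequency below which `fraction` of the spectral energy lies."""
    energies = [m * m for m in mags]
    target = fraction * sum(energies)
    running = 0.0
    for f, e in zip(freqs, energies):
        running += e
        if running >= target:
            return f
    return freqs[-1] if freqs else 0.0


# Made-up four-bin spectrum used only to exercise the functions.
bins = [100.0, 200.0, 300.0, 400.0]
print(spectral_centroid(bins, [0.0, 1.0, 1.0, 0.0]))
```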
5.5.3 Raga Identification Algorithm
Most of the algorithms available for Hindustani and Carnatic Raga
identification, including the ones that we have proposed, use a one-step
procedure (Chordia et al 2009, Pandey et al 2003). In our work, Raga
identification is a three-pronged process built on a multi-faceted model,
as shown in Figure 5.5.
Figure 5.5 Raga identification using Raga model
During the Raga identification phase, the frequency components are
extracted from the input music signal, followed by extracting features and the
tonic, which refers to the frequency of ‘S’ as explained in Chapter 4. Using
the extracted frequency components and the tonic, the ratio between the
frequency components and this frequency is determined to identify the swaras
constituting the input. From these swara components the Raga lakshana is
identified by observing the swaras available in the consecutive segments.
These swara sequences along with the signal level features are used for the
process of Raga identification.
In the first step of the three-pronged approach, the swara pattern
is determined from the input musical piece to identify the Arohana and
Avarohana, which are compared with the Raga model to determine the Raga.
This is the first level of identification. Owing to the design of the
Raga model, in cases where all seven swaras are present only the first 72
rows need to be compared.
The second step in the three-pronged model of Raga identification
is the use of other Raga lakshana components of the Raga model. From the
swara pattern that has already been identified, the following components are
determined by considering the swaras available in the consecutive segments:
Starting swara – corresponding to the swara of the first segment,
which is the swara at the onset
Ending swara – corresponding to the swara of the last segment,
which is the swara at the offset
Most commonly occurring swara – count of which swara is used
the maximum
Most commonly occurring phrase of swaras – the sequential
occurrence of a phrase of swaras that has a maximum count
From the input song, the most frequently repeated phrase of swaras
is determined using a string-matching algorithm. The algorithm used here
is the reverse of KMP string matching: rather than detecting the presence
of a given pattern, it determines the most frequently occurring pattern.
This pattern is the ‘Amsa’ characteristic of the Raga lakshana. The
characteristic phrase consists of swaras, and is typically of length six.
Therefore, we observe the swara sequence using a window of length seven
(maximum seven swaras), to determine the pattern that occurs the maximum
number of times. When more than one pattern occurs the maximum number of
times, all such patterns are identified.
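Finding the most frequently occurring phrase can be sketched with plain sliding-window counting, which is a simpler stand-in for the reverse-KMP procedure described above rather than a reproduction of it. The minimum phrase length of 3 and the name `most_frequent_phrases` are assumptions; the maximum of seven follows the window length given in the text.

```python
from collections import Counter


def most_frequent_phrases(swaras, min_length=3, max_length=7):
    """Return (count, phrases): every contiguous phrase that ties for the
    highest number of occurrences, over all lengths in the given range."""
    counts = Counter()
    for length in range(min_length, max_length + 1):
        for start in range(len(swaras) - length + 1):
            counts[tuple(swaras[start:start + length])] += 1
    if not counts:
        return 0, []
    best = max(counts.values())
    phrases = sorted(p for p, c in counts.items() if c == best)
    return best, phrases


# Made-up swara sequence in which the phrase S G G M recurs three times.
sequence = list("SGGMPSGGMDSGGM")
print(most_frequent_phrases(sequence))
```

A shorter phrase always occurs at least as often as any longer phrase containing it, so ties across lengths are expected; all tied phrases are returned, as in the text.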
After determining this pattern, all the Raga lakshana
characteristics that have been identified are compared with the Raga
model. The first comparison uses the characteristic phrase to identify
the Raga; following this, the other components of the Raga lakshana, such
as the starting swara, ending swara, and frequently occurring swaras, are
compared.
During the second step, the Raga lakshana component also
considers notes that can take the Gamaka. In this case, a swara could
correspond to the frequency of the next swara. Hence, during Raga
identification, the Raga model is checked to identify the swaras that can
take the Gamaka. From the identified set of swaras the commonly occurring
phrase is determined; if it mismatches in one swara, that swara is
identified, replaced with the adjacent swara, and validated again. The
error that this Raga model cannot correct is the one caused by a wrong
computation of the tonic, which leads to incorrect swara mapping.
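The single-swara correction can be sketched as follows. The rule encoded here, that a mismatch is repaired only when the expected swara takes a Gamaka and the observed swara is its direct neighbour, is one interpretation of the description, and the names `adjacent` and `correct_phrase` are hypothetical.

```python
SWARA_ORDER = ["S", "R", "G", "M", "P", "D", "N"]


def adjacent(swara):
    """Return the swaras directly below and above, wrapping at the octave."""
    i = SWARA_ORDER.index(swara)
    return {SWARA_ORDER[i - 1], SWARA_ORDER[(i + 1) % len(SWARA_ORDER)]}


def correct_phrase(observed, expected, gamaka_swaras):
    """Return the corrected phrase if exactly one swara differs, the expected
    swara takes a Gamaka, and the observed swara neighbours it; else None."""
    if len(observed) != len(expected):
        return None
    mismatches = [i for i, (o, e) in enumerate(zip(observed, expected)) if o != e]
    if not mismatches:
        return list(observed)
    if len(mismatches) > 1:
        return None
    i = mismatches[0]
    if expected[i] in gamaka_swaras and observed[i] in adjacent(expected[i]):
        corrected = list(observed)
        corrected[i] = expected[i]
        return corrected
    return None


# The third swara was heard as M although the phrase expects G with Gamaka.
print(correct_phrase(list("SGMM"), list("SGGM"), {"G"}))
```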
A score is assigned to determine the number of Raga lakshana
components that match. If a majority of these Raga lakshana components
match with the pre-determined Raga lakshana components of a particular
Raga, this is the identified Raga, according to the second step of the three-
pronged process of Raga identification.
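The majority-based scoring can be sketched as below. The four components compared, the dictionary layout of a model entry, and the placeholder entries "Raga A" and "Raga B" are all assumptions made for illustration; they do not describe real Ragas.

```python
def lakshana_score(observed, model_entry):
    """Count how many Raga lakshana components of the input match the model."""
    score = 0
    score += observed["start"] == model_entry["start"]
    score += observed["end"] == model_entry["end"]
    score += observed["frequent"] == model_entry["frequent"]
    score += observed["phrase"] in model_entry["phrases"]
    return score


def identify_by_lakshana(observed, raga_model, components=4):
    """Return the best-scoring Raga if a majority of components match."""
    best = max(raga_model, key=lambda name: lakshana_score(observed, raga_model[name]))
    if lakshana_score(observed, raga_model[best]) > components / 2:
        return best
    return None


# Placeholder model entries, not musicological data.
toy_model = {
    "Raga A": {"start": "S", "end": "S", "frequent": "G", "phrases": ["SGGM"]},
    "Raga B": {"start": "P", "end": "S", "frequent": "D", "phrases": ["PDNS"]},
}
heard = {"start": "S", "end": "S", "frequent": "G", "phrase": "PDNS"}
print(identify_by_lakshana(heard, toy_model))
```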
After identifying the Raga in the first and second steps, signal level
parameters like the MFCC, CICC, Spectral flux and Spectral centroid, which
were extracted from the input song are compared with the features in the Raga
model, by computing the Euclidean distance between the pre-defined set and
the newly computed set to find the similarity. This step is performed to isolate
errors that are possible due to the presence of the Gamakas, while identifying
the tonic and mapping it to the swaras. This step also isolates errors that may
occur while identifying the Arohana and/or Avarohana either due to incorrect
swaras or due to incorrect extraction of the Arohana Avarohana string from
the correct swaras.
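The signal level comparison can be sketched as a nearest-neighbour search under the Euclidean distance. The sketch assumes that the MFCC, CICC, Spectral flux and Spectral centroid values of a song are concatenated into one numeric vector and that the Raga model stores one such vector per Raga; both are assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_raga(features, model_features):
    """Return the Raga whose stored feature vector is closest to the input.

    features: feature vector computed from the input song.
    model_features: dict mapping a Raga name to its stored feature vector.
    """
    return min(
        model_features,
        key=lambda raga: euclidean_distance(features, model_features[raga]),
    )
```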
As discussed earlier, Ragas can be classified into Parent, Child and
Vakra. An ambiguous Arohana Avarohana is typical of Vakra Ragas. In the
Vakra Raga, the Arohana or Avarohana can contain more than 7 swaras or
can contain swaras in a non-sequential order in the Arohana, the
Avarohana, or both. Hence, some swaras, in terms of their variations, are
repeated in the Arohana, the Avarohana, or both. We have
represented the Arohana and Avarohana using binary values. From the
extracted swaras, determining a unique Arohana/Avarohana is simple for
Parent and Child Ragas, while for Vakra Ragas, the jumbled pattern
constituting the Arohana and Avarohana is difficult to identify. For example,
the Raga Anandha Bairavi has its Arohana as SGRGMPDPS and Avarohana
as SNDPMGRS. In the Raga model, the swaras S, R, G, M, P and D are checked
for the Arohana of this Raga; extracting the jumbled Arohana pattern from
these swaras is therefore difficult, leading to ambiguity in the Raga.
Hence, we use the characteristic phrases of this Raga, namely, SGGM, SP,
and SGMP, and its signal level features for Raga identification.
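The ambiguity can be illustrated with a small sketch. It assumes a seven-bit presence vector over S, R, G, M, P, D, N as the binary representation; the actual encoding used in the Raga model is not given in this section and may distinguish swara variations.

```python
SWARAS = "SRGMPDN"  # assumed bit order for the presence vector


def to_binary(sequence):
    """Encode a swara sequence as a presence bit string over S R G M P D N."""
    present = set(sequence)
    return "".join("1" if swara in present else "0" for swara in SWARAS)


# Arohana of Anandha Bairavi as given in the text (non-sequential, with repeats).
vakra_arohana = "SGRGMPDPS"
# A hypothetical sequential Arohana over the same six swaras.
sequential_arohana = "SRGMPDS"

# Both sequences collapse to the same bit string, so the order and the
# repetitions that make the Raga Vakra are not recoverable from it.
same_encoding = to_binary(vakra_arohana) == to_binary(sequential_arohana)
```

Under this assumed encoding, both sequences map to 1111110, which is why the characteristic phrases and the signal level features are needed to separate such Ragas.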
After the third step of the identification process, the final Raga is
chosen as the one that is determined by more than one step of the
three-pronged process of Raga identification. In the event that all the steps
give different Ragas, the one given by the Signal Parameters is chosen, since
this value is the result of the average of all songs belonging to one Raga,
under the assumption that the error is in the swara determination.
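The final decision rule can be sketched as a simple vote with a fallback. The function and argument names are illustrative; a step that fails to identify any Raga is represented by None, which is an assumption about how missing results are handled.

```python
from collections import Counter


def final_raga(step1_result, step2_result, signal_result):
    """Combine the results of the three steps into a final Raga.

    A Raga returned by more than one step is chosen. If all steps disagree,
    the result of the signal parameter step is used as the fallback.
    """
    results = (step1_result, step2_result, signal_result)
    votes = Counter(r for r in results if r is not None)
    if votes:
        raga, count = votes.most_common(1)[0]
        if count >= 2:
            return raga
    return signal_result
```

For example, if the first two steps both return one Raga and the signal step returns another, the Raga from the first two steps is chosen; if all three differ, the signal step decides.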
5.5.4 Results and Analysis
The following sections describe the data used, analyze the Raga
model’s approach to Raga identification, and compare it with the other
algorithms.
5.5.4.1 Data Used
In this work, Carnatic music songs sung by different Singers were
sampled at the rate of 44.1 kHz, and used for processing. For
training and testing, we have used songs sung by Singers Dr. M.S.
Subbulakshmi, Dr. M. Balamuralikrishna, Ms. Sowmya, Mr. Sikkil
Gurucharan, Ms. Sudha Raghunathan, Ms. Nithyasree Mahadevan, and Mr.
Ilayaraja. In addition, for the purpose of training we have also used the
data set called ‘AAlapana’, available from “SaReGaMa”, which consists
of songs sung by different Singers pertaining to a single Raga. In both the
training and testing phases, the input song is polyphonic music consisting of
the Instrument and the voice signals.
During the model construction phase, nearly 20 songs for each Raga
are used, and the signal features are determined for use as parameters in
the Raga model. In every song, 5 arbitrary segments are chosen to determine
the features already indicated, and the Raga model is populated with these
feature values, which are used for comparison during the Raga identification
phase.
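One way to populate the signal part of the Raga model is sketched below. It assumes that the segment features of each song are averaged first and that the song averages are then averaged per Raga; the text states only that the stored value is an average over the songs of a Raga, so the two-level averaging and the data layout are assumptions.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_signal_model(songs_by_raga):
    """Build one averaged feature vector per Raga.

    songs_by_raga maps a Raga name to a list of songs, where each song is a
    list of feature vectors, one per chosen segment (5 segments per song in
    the text).
    """
    model = {}
    for raga, songs in songs_by_raga.items():
        song_means = [mean_vector(segments) for segments in songs]
        model[raga] = mean_vector(song_means)
    return model
```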
A total of 27 Ragas, viz., 14 Parent Ragas and 13 Ragas belonging to
the Child or Vakra types, were considered, and a total of 1200 songs
belonging to all Ragas, sung by both male and female Singers, are used for
testing.
5.5.4.2 Analysis of the Raga Model for all types of Ragas
Figure 5.6 shows the performance of the three-pronged Raga model
for all three types of Ragas, namely, Parent, Child, and Vakra
Ragas.
Figure 5.6 Performance of Raga model for Parent, Child and Vakra Ragas
In general, the identification of Parent Ragas is simpler. However,
the use of a three-pronged approach shows (Figure 5.6) that the accuracy of
the identification of Parent (81.5%), Child (77.7%) and Vakra (75.7%) Ragas
is comparable. This comparable performance was possible due to the use of
three types of evidence in determining the Raga.
5.5.4.3 Comparison and Analysis of Raga Identification Algorithms
We performed a comparative analysis of the Raga model based
identification with three of our other algorithms: the Arohana Avarohana
algorithm, the LDA model based algorithm, and the signal level based
comparison algorithm, in which our specially designed CICC coefficient was an
important parameter. In addition, the PCD algorithm suggested for Carnatic
music (Chordia et al 2009) was also used for comparison. The average
performance of the five algorithms for each of the three types of Ragas is
given in Table 5.1.
Table 5.1 Comparison of average performance of Raga identification
algorithms

Raga type                                Signal       Arohana     PCD     LDA     Raga
(Ragas; average songs per Raga)          Parameters   Avarohana                   Model
Parent Ragas (14; 45)                    64.2%        76.5%       74.7%   72%     81.5%
Child Ragas (9; 42)                      50.4%        45.9%       53.5%   72%     77.7%
Vakra Ragas (4; 38)                      39.7%        20%         28.2%   54%     75.7%
Average for all Ragas                    56%          57.9%       60.7%   69.3%   79.4%
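The "Average for all Ragas" row of Table 5.1 can be reproduced with the short sketch below. It assumes that the row is an average weighted by the number of Ragas of each type (14, 9 and 4); the text does not state the weighting, but this assumption reproduces the reported values to one decimal place.

```python
# Number of Ragas of each type, as given in Table 5.1.
RAGA_COUNTS = {"Parent": 14, "Child": 9, "Vakra": 4}

# Average performance (%) per Raga type, copied from Table 5.1.
ACCURACY = {
    "Signal Parameters": {"Parent": 64.2, "Child": 50.4, "Vakra": 39.7},
    "Arohana Avarohana": {"Parent": 76.5, "Child": 45.9, "Vakra": 20.0},
    "PCD": {"Parent": 74.7, "Child": 53.5, "Vakra": 28.2},
    "LDA": {"Parent": 72.0, "Child": 72.0, "Vakra": 54.0},
    "Raga Model": {"Parent": 81.5, "Child": 77.7, "Vakra": 75.7},
}


def overall_average(per_type):
    """Average over Raga types, weighted by the number of Ragas per type."""
    total_ragas = sum(RAGA_COUNTS.values())
    weighted = sum(RAGA_COUNTS[t] * per_type[t] for t in RAGA_COUNTS)
    return weighted / total_ragas


OVERALL = {name: round(overall_average(v), 1) for name, v in ACCURACY.items()}
```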
Signal level parameters conveyed mostly the timbral features, and
hence, were not able to perform well independently. The Arohana Avarohana
algorithm was not able to determine the exact swara pattern, and the
determination of the swaras was especially difficult for Child Ragas. The
PCD algorithm lost information when converting the input to the MIDI
representation, and therefore had an average
identification rate of 60.7%. The Raga model based three-pronged process
had an average identification rate of 79.4%. This was due to the multiple
levels of checks in the algorithm, which uses the Raga model.
The table shows that the performance of all the algorithms is
comparable, and relatively good for the well structured Parent Ragas. The
Arohana-Avarohana algorithm gives a good performance (76.5%), because
there is regularity in the swara pattern. However, this algorithm is based on
the correct identification of swaras. The table shows that the three-pronged
Raga model gives an even better performance (81.5%) for Parent Ragas,
because three types of characteristics – the signal level, swara level and Raga
lakshana level are used.
For the Child Ragas, the regularity is based on characteristic
phrases, which uniquely determine the Ragas, rather than on the complete
Arohana-Avarohana pattern. The LDA algorithm, which classifies the Ragas of
songs based on the 4-length swara pattern, gave good results (72%). Again,
the multi-pronged approach of the Raga model gave even better results (77.7%).
For the Vakra Ragas, the Arohana Avarohana is jumbled, and
hence the algorithm based on this Arohana Avarohana pattern performed
poorly (20%). Moreover, although the LDA performed moderately (54%), the
jumbled nature of these swaras meant that the 4-length pattern used by the
LDA was not very effective. Again, the multi-pronged approach, especially
the use of the Raga lakshanas, improved the performance (75.7%).
The Raga identification error rates of the various algorithms are
given in Table 5.2, where the error rate is defined as the ratio of the
number of songs whose Raga has not been correctly identified to the number
of songs tested for identification. Here, we only consider songs in the
specified 27 Ragas. From the table it is noticeable that the Raga model
based algorithm has the lowest error rate of the five algorithms.
Table 5.2 Raga Identification Error Rate

Algorithm                              Error Rate (%)
Using only Arohana and Avarohana       42.1
Using only Signal level parameters     44
Using PCD                              39.2
Using LDA                              30.6
Using Raga model (three-pronged)       20.5
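The error rate defined above can be computed as in the following sketch. The song counts used in the example are hypothetical and serve only to show the arithmetic; they are not taken from the experiments.

```python
def error_rate(tested, correctly_identified):
    """Error rate (%) = songs not correctly identified / songs tested."""
    if tested <= 0:
        raise ValueError("at least one song must be tested")
    if not 0 <= correctly_identified <= tested:
        raise ValueError("correctly_identified must lie between 0 and tested")
    return 100.0 * (tested - correctly_identified) / tested


# Hypothetical example: 200 songs tested, 159 identified correctly.
example_rate = error_rate(200, 159)
```

With these hypothetical counts, 41 of 200 songs are not identified correctly, which gives an error rate of 20.5%.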
5.5.4.4 Analysis of Parent Raga Identification
In this section, we discuss in detail the performance of the
algorithms for the different Parent Ragas that have been identified. The
results are shown in Figure 5.7, which shows that the performance of both the
Arohana Avarohana algorithm and the Raga Model algorithm was much
better than that of the other algorithms, including the PCD, for Karaharapriya
and Sarasangi, because the swaras are comparatively distinct.
Figure 5.7 also shows that the performance for the Ragas Mecha Kalyani
(65.8%) and Dheera Sankarabharanam (69.6%), which differ only in the swara
‘M’ (Mecha Kalyani uses M2 and Sankarabharanam uses M1), was lower than the
average of all the Parent Ragas for all the algorithms (73.8%). This is
because, if even one swara was identified incorrectly, an input song in
Mecha Kalyani was identified as Sankarabharanam. The Arohana Avarohana
algorithm identified the swaras correctly when there was a clear separation
between the adjacent swaras and little influence of the Gamakas. The Raga
model algorithm tackled this issue to a limited extent by utilizing a
three-pronged identification process: its performance for Mecha Kalyani
(77%) and for Sankarabharanam (73%) was better, but still lower than its
average for Parent Ragas (81.5%). The performance for Thodi was better than
the average for all the algorithms, and in fact PCD performed the best,
because the distinctness
between the adjacent swaras is even more marked and amenable to the histogram
tracking used by PCD.
Figure 5.7 Comparison of performance for Parent Raga identification
5.5.4.5 Analysis of Child and Vakra Raga Identification
From Figure 5.8 it can be seen that the performance of the Raga
model even for a Child Raga like Malahari is good (87%), since this Raga is
uniquely specified by the phrase “DPMGRS”, which the Raga model lists
as one of the lakshanas. It outperforms the LDA (77%), which uses only a
4-length phrase for identification. In addition to the common phrase, the
starting and ending swaras are the other components used by the Raga Model.
The starting swara of many Ragas is ‘S’, but there are Ragas whose starting
swara is different. The performance for the Ragas Mohanam (85% for Raga
Model) and Vasantha (76% for Raga Model) has improved because of the
inclusion of their starting swara ‘G’ by the Raga Model (Figure 5.8). The
performance of
the PCD algorithm was good for Hamsadhwani, a child Raga which has
distinct swara components. This is because this Raga has very little or no
Gamaka for any of its swaras in the sample data considered.
As already discussed, there is ambiguity between the Ragas
Anandha Bairavi and Reetigowlai, which is resolved using the frequency of
usage of ‘N’. The Raga Anandha Bairavi has a limited usage of
‘N’ when compared to Reetigowlai; hence, distinguishing the two Ragas
requires the frequently used swara specified by the Raga Model. This is
evident from the fact that the performance of the Raga Model is better
(Anandha Bairavi 78%, Reetigowlai 78%) when compared to that of all the
other algorithms (Figure 5.8).
Figure 5.8 Comparison of performance for Child and Vakra Raga
identification
5.5.4.6 Statistical analysis of results
The analysis carried out so far, and shown in Figures 5.7 and 5.8,
is based on the True Positive (TP) rate, which is defined as the ratio of the
number of songs of a Raga that are correctly identified to the total number
of songs tested for that Raga.
In addition, we have carried out a statistical analysis of the results
of the Raga identification procedure. This is performed by determining the
False Positive and the True Negative values for all the identified Ragas, and
by plotting the results for the various algorithms. The True Negative (TN)
rate is defined as the ratio of the number of songs of a Raga that remain
unidentified to the total number of songs tested for that Raga. A False
Positive (FP) occurs when a Raga is identified as another Raga. The FP rate
is computed as the ratio of the number of songs of a Raga (X) that are
identified as another Raga (Y) to the number of songs tested for Raga (X).
The False Positive and True Negative values are given in Table 5.3.
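The three rates, as defined in this section (which differ from the usual confusion-matrix usage of these terms), can be computed per Raga as sketched below. The representation of an unidentified song by None and the function name are assumptions made for illustration.

```python
def outcome_rates(true_raga, predictions):
    """Compute the TP, TN and FP rates (%) for the songs of one Raga.

    predictions holds one entry per tested song of true_raga: the Raga
    name returned by the algorithm, or None when no Raga was identified.
    Following the definitions in this section:
      TP = songs identified as true_raga,
      TN = songs left unidentified,
      FP = songs identified as some other Raga.
    """
    total = len(predictions)
    if total == 0:
        raise ValueError("no songs were tested for this Raga")
    tp = sum(1 for p in predictions if p == true_raga)
    tn = sum(1 for p in predictions if p is None)
    fp = total - tp - tn
    return {
        "TP": 100.0 * tp / total,
        "TN": 100.0 * tn / total,
        "FP": 100.0 * fp / total,
    }
```

Under these definitions the three rates for a Raga sum to 100%, so the error rate of Table 5.2 corresponds to the sum of the TN and FP rates.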
Table 5.3 True Negative (TN) rate and False Positive (FP) rate
comparison of the Algorithms for Raga identification

                            Signal         Arohana        PCD            LDA             Raga Model
                            Parameters     Avarohana                                     (RM)
                            (SP)           (ARAV)
Algorithm Performance       TN%    FP%     TN%    FP%     TN%    FP%     TN%     FP%     TN%    FP%
For Parent Ragas            18.6   17.3    12.6   11.1    16     9.35    18.14   10.1    13.5   4.85
For Child and Vakra Ragas   21     32      18.7   43.3    30.5   23.8    21.8    11.8    18     5.1
Overall Ragas               19.8   24.6    15.7   27.2    23.2   16.6    19.9    10.9    15.8   4.9
When the signal level parameters alone are extracted for testing, the
signal values could correspond to the Singer, song, Instrument, Genre or any
other musical component; consequently, the TN for the signal parameters
(TN-SP) is high for the Parent Ragas and is much higher for the Child and
Vakra Ragas, which is above the average, as shown in Table 5.3.
A high value of the True Negative (TN) is better than a high value
of the False Positive (FP). This is the situation for all the algorithms, as
shown in Table 5.3. However, the FP and TN of the Raga Model are much lower
than those of the other two algorithms. For the Parent Ragas, the TN rate of
the Arohana Avarohana algorithm (12.6%) is comparable with the TN rate of
the Raga model (13.5%). On the other hand, for the other algorithms, the True
Negative is very high and much above the average. Thus, these two
algorithms had good precision compared to the other algorithms for the
Parent Ragas.
On the other hand, for the Child and Vakra Ragas, the TN rate of
the Arohana Avarohana algorithm is nearly the same as that of the Raga Model
(18.7% and 18%). However, the False Positive rate of the Arohana Avarohana
algorithm is very high (43.3%), because the Child Ragas were
identified as their Parent Ragas owing to an incorrect Arohana Avarohana. The
PCD algorithm had a TN rate comparable with that of the LDA based Raga
identification algorithm (16% and 18.14%) for Parent Ragas; however, the
incorporation of the characteristic phrase in the LDA algorithm improved the
TN rate for the Child Ragas (21.8%), which is better than that of the PCD
algorithm (30.5%).
The reason for the high values of the TN and FP for the signal
parameters based algorithm is that the signal values could be used for
multiple analyses, in terms of the Genre, Emotion, Singer, etc., by proper
training, since they convey the timbral feature rather than the melodic
feature. Hence, the determination of the Raga using the signal parameters
alone tends to yield very poor results. For example, the Ragas Mechakalyani,
Karaharapriya and Keeravani had the same TN rate (17%) but showed a
difference in their FP rates (7%, 4% and 3% respectively). This difference in
the FP is due to the presence of the Gamakas in many swaras, in the input
song being considered.
The algorithms are also analyzed for their statistical significance by
performing a linear analytical comparison between the TP and FP values of
each algorithm. This linear analytical procedure uses ordinary least
squares linear regression and so does not account for the error in the
reference/comparative method. The scatter plot between the True Positive and
False Positive rates of each algorithm is generated using a standard tool
available in a spreadsheet package, and the Standard Error and the
regression value R² are estimated from it. The
values of these components are given in Table 5.4.
Table 5.4 Estimate of Regression coefficient for Raga identification

Type of Algorithm              R²
Arohana Avarohana              0.01
Signal Parameters              0.02
PCD                            0.12
LDA                            0.16
Raga Model (three-pronged)     0.66
The goodness of the linear fit is estimated using the R² value, which
is the coefficient of determination of the regression. The value of R²
varies from 0 to 1, with a higher value indicating a closer fit, and a value
of 1 indicating the best fit. As shown
in Table 5.4, the Raga model (three-pronged) algorithm had a much higher
value, when compared to the Arohana Avarohana algorithm, Pitch class
distribution, the LDA or the signal parameters based algorithm.
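The R² value of an ordinary least squares line can be computed as in the sketch below. It is a plain-Python stand-in for the spreadsheet tool mentioned in the text; the choice of which rate is the independent variable and the sample values in the usage note are assumptions, since the per-Raga TP and FP values are not listed in this section.

```python
def r_squared(x, y):
    """R² of the ordinary least squares line fitted to the points (x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / syy
```

For the illustrative points (1, 1), (2, 3), (3, 2), the fitted line has slope 0.5 and the function returns an R² of 0.25; points that lie exactly on a line give an R² of 1.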
5.5.4.7 Complexity Analysis of Raga model algorithm
As far as the algorithm analysis is concerned, there are O(n)
comparisons with respect to the Arohana and Avarohana, where ‘n’ is the
number of Ragas in the Raga model. The next level of comparison, which is
based on the signal parameters, is Θ(n), where for each Raga, four signal
parameters have to be compared. The third level of comparison is Θ(n), where
each of the total ‘n’ Ragas will take another Θ(k) character comparisons, thus
accounting for Θ(nk) comparisons, where ‘k’ is a constant which refers to the
number of character comparisons that happen during the comparison with the
Raga lakshana. Thus, the overall running time for the comparison with the
Raga model is O(n) + Θ(n) + Θ(nk), resulting in Θ(nk) time,
which is higher than the O(n) of the Arohana and Avarohana
algorithm and the Θ(n) of the algorithm that uses the signal
parameters only. The LDA algorithm is time consuming to compute the
probability value, and hence, has a higher time complexity when compared
with the other algorithms for Raga identification.
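The comparison counts behind this analysis can be tallied with a small sketch. It assumes one Arohana Avarohana comparison per Raga, four signal parameter comparisons per Raga, and k character comparisons per Raga for the Raga lakshana; the value of k in the usage note is hypothetical.

```python
def comparison_counts(n_ragas, k_chars, n_signal_params=4):
    """Count the comparisons made at each level of the Raga model lookup."""
    counts = {
        "arohana_avarohana": n_ragas,             # one comparison per Raga
        "signal": n_signal_params * n_ragas,      # four parameters per Raga
        "lakshana": n_ragas * k_chars,            # k characters per Raga
    }
    counts["total"] = sum(counts.values())
    return counts
```

For the 27 Ragas of this work and a hypothetical k of 10, the three levels contribute 27, 108 and 270 comparisons respectively, so the Raga lakshana term dominates the total of 405.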
After identifying the Raga from the input signal, the other
non-music components, namely the Singer, Instrument, Emotion and Genre
need to be identified. For this purpose, the features that were designed for
Carnatic music, namely, the tonic, CICC and other extracted Signal level
features are used. In addition to these features, the Raga that has been
identified is also used to construct an appropriate model, leading to non-music
component identification, which is discussed in the next chapter of this thesis.