Musicological and Technological Exploration of Truths and Myths in Carnatic Music, the Raagam in Particular
Thesis submitted in partial fulfillment of the requirements for the degree of
Masters in
Computer Science
by
Koduri Gopala Krishna
200502005
Cognitive Science Lab
International Institute of Information Technology
Hyderabad - 500 032, INDIA
December 2010
Copyright © Koduri Gopala Krishna, 2011
All Rights Reserved
To
Amma, Nanna & Chinni.
Acknowledgements
An individual is moulded by the experiences in life. Few of us are lucky to have valuable experiences in a conducive environment to grow in. I’m deeply indebted to almost every person I have encountered in the last few years of my stay at IIIT-H and outside. I’ll go in chronological order, for it will help me recollect most of them.
My journey at IIIT-H commenced with meeting two similar-natured guys of coastal Andhra descent, Bharat ram (Ambati) from Vijayawada and Vijay bharat (Yaram) from Guntur. Without the endless fun episodes of Yaram and mission-critical tutorials of Ambati, I would not have found the place that interesting and hospitable. In the course of my stay, I discovered a wide variety of creatures in the jungle; I can fill pages with their names.
Mesmerized by the lectures of Prof. Jawahar in the first year, we three stayed back on campus in the summer holidays of 2006, with the sole aim of securing our future at the prestigious Center for Visual Information and Technology, under his guidance. Obviously, in the beginning I did not have any objective of my own; I was after what was lucrative in popular opinion. Prof. Jawahar was, is and will be one of the most adored professors of IIIT-H. I spent two years in that lab. There, I interacted with several people. I never saw Rasagna frowning or complaining, however the day went. He shared his experiences with a very open heart. He is also the person to whom I can relate in most cases. Pramod Nair had always been there whenever I needed some guidance. I would call him and say, “Anna, chinna salaha kavali..” (Bro, I need some advice), and then it would go on. I partnered with Ravindra, who thinks very analytically, in several of my course projects. He was a very good partner. He resembled Buddha, as he never got emotional, however much I freaked out.
Above all, it is Prof. Jawahar to whom I’m indebted. Though I did not take up any serious computer vision work while I was in the lab, I was involved in two projects - one related to font encodings, and the other to document image retrieval. The work on font encodings later helped me serve my first love - Open Source and Ethnocomputing. Today, I still devote a significant part of my time to it. There are several valuable experiences I have gained along the way - leading a small team for the development of an Indic Firefox plugin (Padma), working as an intern at a company, identifying the core problems with Ethnocomputing in India and so on. A very, very special thanks to him.
In late 2008 and early 2009, I got interested in cognitive science. I conveyed the same to Prof. Jawahar, who said that I should pursue whatever my interest is. It was then that I met Prof. Bipin, who warmly welcomed me into his lab. At first, I did not have a special interest in any particular topic in cognitive science. Without the freedom Prof. Bipin usually leaves his students with, it would have been a really difficult situation for me. I kept jumping from topic to topic; I was intrigued by the cognition of language, the role of images in comprehension, narrative structures etc. But finally, I zeroed in on music cognition and music information retrieval. By late 2009, I started working seriously on it. The topic I chose was to explore the musicological literature of India, see what has been done technologically, and address an interesting issue. The interdisciplinary nature of the topic made it difficult to move ahead at the normal pace of a typical masters student. I’m immensely indebted to Prof. Bipin for his patience and especially for the opportunities he provided me to learn from.
Prof. Bipin was very kind in leaving me the freedom to take decisions that would help me. The internship opportunity he provided with Prof. Christophe of IRIT - ENSEEIHT, France during late 2009 has deeply impacted my thoughts. It helped me discover Prof. Christophe’s radically different views of human perception, of music in particular. I’m fascinated by the non-statistical approach to addressing problems in information retrieval. Prof. Christophe is a very kind and friendly person. Without him, my stay in France, which was my first stay outside India, would have been a nightmare.
I’m very grateful to Prof. Preeti Rao, DAP lab, IIT-B for guiding me in building the raaga recognition system. The three months I spent in her lab in summer 2010 were very fruitful in gaining several insights into audio processing. A special thanks to Sankalp Gulati and other DAP lab members for making my stay at IIT-B a peaceful and interesting one. I’m also grateful to Dr. Suvarnalatha Rao for replying patiently to my queries on Indian classical music. I thank Prof. Navjyoti, Pranav Kumar Vasishta, Kavita Vemuri, Sai Gollapudi and Violin Vasudevan for providing me valuable contacts and resources on Indian classical music. I thank Anupama, Abhilash, Ambati, Divya and Siva for reviewing the drafts written for conferences. I’m also greatly thankful to Prof. Xavier and Joan, of the Music Technology Group at UPF in Barcelona, for reviewing parts of this thesis and providing critical and wonderful feedback.
I feel very lucky to be in the company of my friends at IIIT-H and outside. The experiences we shared are very influential. There are commendable outcomes from the discussions over societal issues - http://team-samvedana.org and http://techsetu.com. There is also a drastic change in my nature and the way I socialize. My gratitude is inexpressible.
With a weak economic status and a rural background, the firm determination of my parents in providing me a decent schooling is the sole reason why I’m doing what I’m doing today - that which I like and enjoy. No words can possibly express what I owe them.
Abstract
The classical music traditions of the Indian subcontinent, Hindustani and Carnatic, offer an excellent ground on which to test the limitations of current music information research approaches. At the same time, their study can shed light on how to solve new and complex music modeling problems. Both traditions have very distinct characteristics, especially compared with western ones: they have developed their own instruments, musical forms, performance practices, social uses and context. In this thesis, we focus on the Carnatic music tradition of south India, especially on its melodic characteristics.
Raaga is the spine of Indian classical music. It is the single most crucial element of the melodic framework on which the music of the subcontinent thrives. Naturally, automatic raaga recognition is an important step in computational musicology as far as Indian music is concerned. It has several applications, such as indexing Indian music, automatic note transcription, comparing, classifying and recommending tunes, and teaching, to mention a few. Simply put, it is the first logical step in the process of creating computational methods for Indian classical music. In this thesis, we investigate the properties of a raaga and the natural process by which people identify the raaga. We survey the past raaga recognition techniques, correlating them with human techniques, in both Hindustani and Carnatic music systems. We identify the main drawbacks and propose minor but multiple improvements to the state-of-the-art raaga recognition technique.
Music is said to evoke emotions. Since the advent of advanced signal processing techniques and easily accessible computational resources, scientists and engineers have been trying to understand the nature of music in this very context. One of the several aspects of Indian music that interests us here is the traditional association of emotions with raagas. Besides ancient scriptures like the Natyasastra, recent articles by several scholars also associate raagas with emotions. A part of our work is dedicated to investigating the origin of this association. We discuss the term rasa, often mistaken for emotion. We also report the results of a survey conducted to study the aforementioned raaga-emotion association.
We also give an overview of the other theoretical aspects that are relevant for music information research and discuss the scarce computational approaches developed so far. We emphasize the limitations of the current methodologies and present some open issues that have not yet been addressed and that we believe are important to work on.
Contents
1 Introduction (Why did you end up doing this work?)
  1.1 Motivation
  1.2 Why Carnatic music?
  1.3 Goals of our work
  1.4 Contributions from the thesis
  1.5 Organization of the content
2 Computational approaches to Indian classical music
  2.1 Computational approaches to melody
    2.1.1 How do people identify raaga
      2.1.1.1 Non-trained person or the rasika’s way
      2.1.1.2 The trained musician’s way
    2.1.2 Swaras and Srutis
    2.1.3 Arohana and Avarohana
    2.1.4 Unexploited properties of raaga
      2.1.4.1 Gamakas
      2.1.4.2 Various Roles Played by the Notes
  2.2 Computational approaches to rhythm
  2.3 Musical forms
    2.3.1 Improvisatory Forms
      2.3.1.1 Raaga alapana
      2.3.1.2 Taanam
      2.3.1.3 Pallavi exposition
      2.3.1.4 Swara kalpana
      2.3.1.5 Niraval
    2.3.2 Composed Forms
    2.3.3 Associated work
3 Raaga and Rasa (History, context, truths and myths)
  3.1 A brief history of rasa
    3.1.1 Origin
    3.1.2 Rasa nishpatti (The process of experiencing rasa)
    3.1.3 Categories of rasa
    3.1.4 Rasa in drama and music
    3.1.5 Raaga paintings
  3.2 A behavioural study to analyse the raaga-emotion relationship
    3.2.1 Introduction
    3.2.2 Related work
    3.2.3 Differences between Carnatic and Hindustani traditions
    3.2.4 The hypothesis
    3.2.5 Conceptualization of emotions
    3.2.6 Details of the study
      3.2.6.1 Design
      3.2.6.2 Participants
      3.2.6.3 Results and observations
      3.2.6.4 Implications to music recommendation systems
    3.2.7 Conclusions
4 Raaga Recognition
  4.1 Introduction
  4.2 Problems that need to be addressed
    4.2.1 Gamakas and pitch extraction for Carnatic music
    4.2.2 Skipping tonic detection
    4.2.3 Resolution of pitch-classes
    4.2.4 A comprehensive dataset
  4.3 Our method
    4.3.1 Pitch extraction
    4.3.2 Finding the tuning offset
    4.3.3 Note segmentation
    4.3.4 Pitch-class profiles
    4.3.5 Distance measure
  4.4 Experiment and results
    4.4.1 Dataset
    4.4.2 Classification experiment
  4.5 Conclusions
5 Conclusions
  5.1 Impact of this work and the future directions
  5.2 Few guidelines for future students/researchers
Appendix A: Basic Acoustics
  A.1 Demonstration of various physical properties
  A.2 Sine waves
  A.3 Harmonics
  A.4 Timbre
  A.5 Frequency measures
  A.6 Tuning systems
    A.6.1 Equal-temperament
    A.6.2 Just-intonation
Bibliography
List of Figures
2.1 Results of Rajeswari & Geeta’s raaga identification method
3.1 Raaga paintings of Vasanta Ragini (left) and Hindola (right) raagas
3.2 Average of the ratings collected per rasa, across all users and tunes, for each raaga. X-axis denotes rasa index and Y-axis denotes the average value of the ratings.
3.3 (a) Average rating obtained per rasa for a tune in Nadanamakriya. (b) Standard deviation of the ratings for the same tune. X-axis denotes the rasa indices. Y-axis denotes the normalized values.
3.4 Ratings given by six users for a sample track in each raaga. X-axis denotes rasa indices. Y-axis denotes the ratings quantifiers.
3.5 Histograms of ratings obtained per rasa, for the six raagas. X-axis has the four rating quantifiers - None at all, A Little, Somewhat and Very. Y-axis denotes the number of ratings obtained for each quantifier.
4.1 Screenshot from the melodic pitch extraction system of [37] showing the detected pitch superimposed on the signal spectrogram. The axis on the right indicates pitch value (Hz).
4.2 Note segmentation and labeling. Thin line: continuous pitch contour; Thick line: detected stable note regions.
A.1 Wave pattern generated by a plucked string
A.2 Condensation and rarefaction represented as a sine wave
A.3 Examples of sine waves with high and low frequencies
A.4 Possible harmonics in a given string
List of Tables
2.1 The scales used in Indian classical music
2.2 The values of 22 Srutis derived by [41]
2.3 Accuracy of raaga identification reported in [31]
2.4 Comparison of various classifiers used in Chordia and Rae’s system
3.1 Categories of rasas as given in Natyasastra
3.2 An Emotion Classification Based on Navarasa
3.3 Raagas and their intended rasas
4.1 Description of the dataset across 10 raagas.
4.2 Performance of weighted-k-NN classification with various pitch-class profiles
A.1 Fundamental and its harmonics
Chapter 1
Introduction
(Why did you end up doing this work?)
1.1 Motivation
In the last decade, music information research has played a vital role in commercial music recommendation services in the western music industry. Examples include last.fm¹, pandora² etc. However, such services for Indian music have not been encountered to date. The few web music services that exist, like raaga.com³, work merely with textual metadata. Even the currently available web and desktop-based services for syncing or getting metadata from the web are very western-centric. The metadata of a typical Indian film song is much more than what is allowed in Musicbrainz [52]. For instance, the involvement of various artists in creating an Indian film song cannot be completely accounted for using its schema. This is because the kinds of artists involved differ between, say, western pop and Indian films, and it is difficult to reflect these differences in Musicbrainz without compromise. Moreover, the majority of the content-based music recommendation algorithms found in open source media players [57] are not at all suitable for Indian music - classical, film or otherwise.
In an attempt to build a web-based service with metadata syncing and content-based music recommendation for Indian film music, we surveyed the related music information research and musicological literature. Methods proposed in several publications were based on western concepts which were not sufficient or relevant in the context of Indian film music. For instance, the concept of genre has no meaning as far as most film music is concerned. Classification of film songs requires a radically different formulation, which, to our knowledge, has not been attempted. However, we found a few publications interesting [24, 8]. They attempt to classify Indian classical music based on raagam and taalam [49]. We also encountered an interesting theoretical mood-based music classification in Indian classical music, which we found to be a scarcely researched topic.
¹ http://last.fm
² http://pandora.com
³ http://raaga.com
In this thesis, we focus on Carnatic music and survey a few relevant computational models researched in the past. We propose a few enhancements to the state of the art in raaga recognition. Further, we also investigate the raaga-emotion association with a behavioral study.
1.2 Why Carnatic music?
Among the myriad world music traditions, Indian classical music has a few unique properties, as we’ll see in detail in the following chapter. There are two classical music traditions in India. Owing to the popularity of Pandit Ravi Shankar⁴ and The Beatles⁵ in the west, Hindustani, which is the north Indian classical music, is often mistaken for the classical music tradition throughout India. The four south Indian states which form a large part of India - Andhra Pradesh, Karnataka, Kerala and Tamilnadu - and parts of Maharashtra and Orissa have a distinct classical music tradition called Carnatic music.
Ever since India saw invaders, the evolution of the classical music tradition in India has taken two different paths. In the north, where it was greatly influenced by Sufi, it is called Hindustani. In south India, where it was less influenced, it is called Carnatic music. The Carnatic tradition has adopted something from outside only if it proved to uphold the innate characteristics of the tradition. The violin stands as a living testimonial to this fact. The ability of the instrument to imitate the human voice is crucial for its use in Carnatic music, which is full of gamakas, the curvy movements between notes. Though there are several commonalities between Carnatic and Hindustani, the differences are notable and very significant. In chapter 3, we outline a few such differences between the two traditions.
Therefore, to be specific about what we are working with, we have chosen Carnatic music⁶. Moreover, the two classical music traditions of India have extensive musicological literature, and a few existing computational attempts that can help us in our investigation. Film music, however, has neither the extensive literature nor any existing computational models. Since none of us are musicians by profession, we chose classical music, for we can seek guidance from the available literature during our investigation.
⁴ http://ravishankar.org
⁵ http://en.wikipedia.org/wiki/The_Beatles
⁶ It is also a natural choice since we are natives of the state of Andhra Pradesh.
1.3 Goals of our work
To our knowledge, this is the first thesis on computational and theoretical aspects of Carnatic music, presenting a thorough overview of the current state of the art and discussing several open issues that are computationally relevant in the realm. Though there have been several musicological works in the past, there has not been much discussion with an emphasis on building computational models, barring a few exceptions [51] [53]⁷. As is the case with any area of research which is mostly untouched, it has been tough to choose a narrow topic to work on. We chose these two broad aspects of Carnatic music to investigate in depth.
• Carnatic raaga recognition.
• The raaga-rasa relationship.
1.4 Contributions from the thesis
This thesis is intended to open doors to a new type of music for the scientific community to work on, rather than to propose solutions with a significant leap over the state of the art. But we do report our investigations into the two broad aspects which we have just listed. The following are the outcomes of the thesis:
1. Critical analysis of the raaga-rasa relationship and a survey to test the hypothetical association between them. Almost every scholarly article available on Indian music treats the term rasa as though it is identical to the term emotion. In the course of our work, we used the term in the same sense. But our investigation has yielded an insight which is very different from the current understanding of the term. In this work, we report the results of a survey conducted to analyse the relationship between raaga and emotion, and discuss the term rasa.
2. A survey of the state-of-the-art raaga recognition techniques, identifying the drawbacks and the future directions. We present a general overview of the plausible approaches to raaga recognition, and discuss various systems with respect to their contributions and drawbacks.
3. A raaga recognition system. To know a raaga, it is often said that there is no other way except to listen and feel it. This well-defined yet abstract entity drew our attention to building a model to identify the raaga of a given musical piece. In this work, we discuss the previous work, propose a few enhancements to a Hindustani raaga recognition model to suit the requirements of Carnatic music, and discuss the results.
Apart from these major contributions, the thesis also includes the following minor contributions.
1. Discussed various concepts of Carnatic music like rasa, 22 srutis and microtonal intervals in the light of knowledge shared by recent investigations of the scientific community. We also present several other open problems that are computationally relevant.
2. Built a ground-truth dataset of 10 raagas with 170 tunes. This dataset is drawn from real stage concerts and audio CDs. As far as we know, it is also by far the most diverse Carnatic raaga dataset reported.
⁷ This thesis work, which I discovered only recently, was carried out almost in parallel at IIT-Madras on other aspects of Carnatic music.
1.5 Organization of the content
Chapter 2: This chapter introduces melodic, rhythmic and structural aspects of Carnatic music, coupled with critical reviews of past computational work. We focus on the drawbacks and propose a few enhancements. Further, we discuss the advantages of computational modeling of various musical aspects of Carnatic music.
Chapter 3: We discuss the term rasa from a historical perspective, and present our criticism of its usage in today’s Indian classical music context. A thorough analysis of a survey on raaga and emotion is also presented, and its implications for mood-based music recommendation systems are discussed.
Chapter 4: In this chapter we identify a few drawbacks of the current raaga recognition systems and present our method, which attempts to overcome them. We also present a new Carnatic raaga ground-truth dataset which can help researchers in their future efforts.
Chapter 5: We conclude the thesis by presenting a few open problems to the community, and also the possible future direction of this work.
Chapter 2
Computational approaches to Indian classical music
Though all music traditions share a few characteristics, each one can be recognized by some very particular features that need to be identified and preserved. The information technologies used for music processing have typically targeted the western music traditions, and current research is emphasizing this bias even more. However, to develop technologies that can deal with the richness of our world’s music, we need to study and exploit the unique aspects of other musical cultures. By looking at the problems emerging from various musical cultures, we will not only help those specific cultures but also open up our computational methodologies, making them much more versatile. In turn, we will help preserve the diversity of our world’s culture.
The classical music traditions of the Indian subcontinent, Hindustani and Carnatic, offer an excellent ground on which to test the limitations of current music information research approaches. At the same time, their study can shed light on how to solve new and complex music modeling problems. Both traditions have very distinct characteristics, especially compared with western ones: they have developed their own instruments, musical forms, performance practices, social uses and context. As we said, in this thesis we focus on the Carnatic music tradition of south India, especially on its melodic characteristics.
The computational study of Carnatic music offers a number of problems that require new research approaches. Its instruments emphasize sonic characteristics that are quite distinct and not well understood. The concepts of raaga and taala are completely different from the western concepts used to describe melody and rhythm. Its music scores serve a different purpose than those of western music. The tight musical and sonic coupling between the singing voice, the other melodic instruments and the percussion accompaniment within a piece requires going beyond the modular approaches commonly used in music information research (MIR). The tight communication established in concerts between performers and audience offers great opportunities to study issues of social cognition. Its devotional aim is fundamental to understanding the music. The study of the lyrics of the songs is also essential to understanding the rhythmic, melodic and timbral aspects of Carnatic music.
This chapter focuses on the melodic (Sec 2.1) and rhythmic (Sec 2.2) aspects of Carnatic music, giving an overview of the theoretical aspects that are relevant for MIR and discussing the scarce computational approaches that have been presented. We emphasize the limitations of the current methodologies and present some open issues that have not yet been addressed and that we believe are important to work on.
2.1 Computational approaches to melody
In Carnatic music, the melody is carried mainly by the vocalist. The voice always plays the central role; however, instruments like the violin or veena sometimes take its place, usually imitating its manner of articulation. The most fundamental melodic concept in Indian classical music is raaga. Matanga is the first known person to define what a raaga is [45]: “In the opinion of the wise, that particularity of notes and melodic movements, or that distinction of melodic sound by which one is delighted, is raaga”. Therefore, the raaga is neither a tune nor a scale [32]. It is a set of rules which can together be called a melodic framework. The notion that a raaga is not just a sequence of notes is important in understanding it, and for developing a computational representation. A raaga evolves over time, i.e. no raaga was always understood the way it is today. A given raaga can nonetheless be described by a set of properties: a set of notes (swaras), their progressions (arohana/avarohana), the way they are intoned using various movements (gamakas), characteristic phrases, and the relative position, strength and duration of notes (types of swaras). In order to identify raagas computationally, swara intonation, scale, note progressions and characteristic phrases are used (Sec 2.1.2 and 2.1.3). Other, as yet unexploited, properties of a raaga include gamakas and the various roles the swaras play (Sec 2.1.4).
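The properties listed above can be gathered into a simple record. The following is a minimal sketch in Python, not the representation used in this thesis: the class name `Raaga`, its fields and the helper `uses_only_raaga_swaras` are hypothetical, and the Mohanam entry (a five-note raaga) is a simplified illustration that ignores octaves and leaves the phrases and gamakas empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Raaga:
    """Hypothetical container for the properties that describe a raaga."""
    name: str
    arohana: Tuple[str, ...]    # ascending progression of swaras
    avarohana: Tuple[str, ...]  # descending progression of swaras
    characteristic_phrases: List[Tuple[str, ...]] = field(default_factory=list)
    gamakas: Dict[str, List[str]] = field(default_factory=dict)  # swara -> movements

    @property
    def swaras(self) -> frozenset:
        """The set of notes, derived here from the two progressions."""
        return frozenset(self.arohana) | frozenset(self.avarohana)


def uses_only_raaga_swaras(raaga: Raaga, notes) -> bool:
    """True if every note of the sequence belongs to the raaga's swara set."""
    return all(note in raaga.swaras for note in notes)


# Illustrative entry; the upper-octave S is written as plain "S" for simplicity.
mohanam = Raaga(
    name="Mohanam",
    arohana=("S", "R2", "G3", "P", "D2", "S"),
    avarohana=("S", "D2", "P", "G3", "R2", "S"),
)

print(sorted(mohanam.swaras))
print(uses_only_raaga_swaras(mohanam, ["S", "R2", "G3", "P"]))
print(uses_only_raaga_swaras(mohanam, ["S", "M1", "P"]))
```

Since a raaga is not just a scale, passing this membership check is at best a necessary condition; the phrases, gamakas and roles of the swaras would have to be filled in before such a record could distinguish raagas that share the same notes.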
2.1.1 How do people identify raaga
Though there are no rules of thumb for identifying a raaga, there are usually two procedures by which people get to know the raaga of a composition. Which one is used normally depends on whether the person is a trained musician or a rasika, the non-trained but knowledgeable person. People who do not have much knowledge of raagas cannot identify them unless they memorize the compositions and their raagas.
2.1.1.1 Non-trained person or the rasika’s way
In a nutshell, the procedure followed by a rasika typically involves correlating two tunes based on how similar they sound. Years of listening to tunes composed in various raagas give a listener enough exposure. A new tune is juxtaposed with the known ones and is classified depending on how similar it sounds to a previously heard tune. This similarity can arise from a number of factors: the rules of transition between notes imposed by arohana and avarohana, characteristic phrases, the usage pattern of a few notes, and gamakas.
This method depends a lot on the cognitive abilities of a person. Without enough previous exposure, it is not feasible for a person to attempt identifying a raaga. There is a noteworthy observation about this method. Though people cannot express in a concrete manner what a raaga is, they are still able to
identify it. This very fact hints at a possible classifier that can be trained with enough data for each raaga.
2.1.1.2 The trained musician’s way
A musician tries to find the characteristic phrases of the raaga. These are called pakads in Hindustani music and swara sancharas in Carnatic music. If the musician finds these phrases in the tune being played, the raaga is immediately identified. But at times these phrases might not be found, or are too vague. In this case, musicians play the tune on an instrument (imaginary or otherwise) and identify the swaras being used. They observe the gamakas used on these swaras, the locations of various notes within the musical phrases, and the transitions between swaras. They use these clues to arrive at a raaga.
This method seems to use almost all the characteristics a raaga has. It looks more programmatic in its structure and implementation. If current music technology can derive the low-level features needed to identify such clues, the same procedure could be implemented computationally, potentially with very good results. Both methods, that of the trained musician and that of the non-trained listener, need to be understood in order to implement a raaga recognition system, or to model the raaga in a broad sense.
2.1.2 Swaras and Srutis
In Indian music, swaras are the seven notes in the scale, denoted by Sa, Ri, Ga, Ma, Pa, Da and Ni¹ [43]. Except for the tonic and the fifth, all the other swaras have two variations each, which accounts for 12 notes in an octave, called swarasthanas. There are three kinds of scales that one generally encounters in Carnatic and Hindustani music theory: a 12-note scale, a 16-note scale and a scale which claims 22 srutis². The 16-note scale is the same as the 12-note scale except that 4 of the 12 notes have two names each, in order to be backward compatible with an older nomenclature (see Table 2.1). The tuning itself, whether it is just-intonation or equi-tempered, is an issue of debate³ [22]. Since Indian classical music is an orally transmitted tradition, perception plays a vital role. For instance, tuning seldom involves an external tool, and even the tambura, which is used as a drone, has a very unstable frequency. Hence the analysis of empirical data coupled with perceptual studies is important.
A few musicians and scholars claim that there are more srutis in practice than those explained above. Though many of them argue the total number to be 22, that itself is debated [18]. A more important question is whether they are used in current practice at all. Some musicologists say that they are no longer used [35]. It is also said that they are wrongly attributed to Bharata, who used sruti to mean “the interval between two notes such that the difference between them is perceptible”. Krishnaswamy [23] argues that the microtonal intervals observed in Carnatic music are perceptual phenomena
¹ This notation is analogous to e.g. Do, Re, Mi, Fa, So, La and Ti.
² Sruti is the least perceptible interval as defined in Natyasastra [36].
³ http://cnx.org/content/m12459/1.11
Table 2.1: The scales used in Indian classical music
Swaram                           Notation  Western  Sthanam  Ratio
Sadjamam                         Sa        C        1        1
Suddha Rishabam (Komal)          Ri1       C#       2        16/15
Chathusruthi Rishabam (Tivra)    Ri2       D        3        9/8
Shatsruthi Rishabam              Ri3       D#/Eb    4        6/5
Suddha Gandharam                 Ga1       D        3        9/8
Sadharana Gandharam (Komal)      Ga2       D#/Eb    4        6/5
Anthara Gandharam (Tivra)        Ga3       E        5        5/4
Suddha Madhyamam (Komal)         Ma1       F        6        4/3
Prati Madhyamam (Tivra)          Ma2       F#/Gb    7        64/45
Panchamam                        Pa        G        8        3/2
Suddha Dhaivatham (Komal)        Da1       G#/Ab    9        8/5
Chathusruthi Dhaivatham (Tivra)  Da2       A        10       5/3
Shatsruthi Dhaivatham            Da3       A#/Bb    11       16/9
Suddha Nishadam                  Ni1       A        10       5/3
Kaisiki Nishadam (Komal)         Ni2       A#/Bb    11       16/9
Kakali Nishadam (Tivra)          Ni3       B        12       15/8
caused by the gamakas, i.e. that these microtonal intervals are what a few scholars and musicians claim to be the 22 srutis. However, we believe that these claims need to be verified with perceptual and behavioral studies. In general, more empirical, quantitative and large-scale evidence on the tuning of Carnatic music needs to be gathered. From our encounters with musicians, we can only conclude that most of them are unaware of any usage of 22 srutis in practice. The few musicians who claim that they are used are not ready to demonstrate them in a raaga. Table 2.2 shows the 22 sruti values derived by Sambamurthy [41].
It is a well accepted notion that a note (swarasthana) is a region rather than a point [13, 43]. Thus, a
fixed tuning for each note is not as important as it is in, say, western classical music. In addition, Sa can
be any frequency. It depends on the comfort of the singer or the choice of the instrument player. A given
note is intoned in different ways for each raaga. Even if two raagas have the same scale, the intonation
of their notes varies significantly. Belle et al [4] have used this clue to differentiate raagas that share the same
scale. They evaluated their system on 10 audio excerpts accounting for 2 distinct scale groups (two
raagas each). They showed that the use of swara intonation features improved the accuracies achieved
with pitch-class distributions [8]. This clearly indicates that intonation differences are significant to
understanding and modeling raagas computationally. Levy [29] analyses the intonation in Hindustani
raaga performances and notes that it is highly variable, and that it does not seem to agree with any
standard tuning system. Subramanian [51] reports much the same for Carnatic music. These studies call
Table 2.2: The values of 22 Srutis derived by [41]
Name of Sruti                                                           Notation  Ratio    Interval  Freq (Hz)  Interval (cents)  Equi-temp ratio
Shadja                                                                  sa        1        -         240        0                 -
Ekasruti Rishabha                                                       ra, r1    256/243  1.0534    252.8      90                1.05946
Dvisruti Rishabha                                                       ri, r2    16/15    1.0125    256        112               -
Trisruti Rishabha                                                       ru, r3    10/9     1.0416    266.6      182               -
Chatussruti Rishabha                                                    re, r4    9/8      1.0125    270        204               1.1224
Suddha Gandhara or Komal Sadharana Gandhara                             ga, g1    32/27    1.0534    284.4      294               1.1891
Sadharana Gandhara                                                      gi, g2    6/5      1.0125    288        316               -
Antara Gandhara                                                         gu, g3    5/4      1.0416    300        386               1.2599
Chyuta Madhyama Gandhara or Pythagorean Major 3rd                       ge, g4    81/64    1.0125    303.75     408               -
Suddha Madhyama                                                         ma, m1    4/3      1.0534    320        498               1.3348
Tivra Suddha Madhyama                                                   mi, m2    27/20    1.0125    324        520               -
Prati Madhyama                                                          mu, m3    45/32    1.0416    337.5      590               1.4147
Chyuta Panchama Madhyama                                                me, m4    729/512  1.0125    341.7      610               -
-                                                                       -         64/45    1.0113    341.3      -                 -
Panchama                                                                pa        3/2      1.0534    360        702               1.4982
Ekasruti Dhaivata                                                       dha, d1   128/81   1.0534    379        792               1.5873
Dvisruti Dhaivata                                                       dhi, d2   8/5      1.0125    384        814               -
Trisruti Dhaivata                                                       dhu, d3   5/3      1.0416    400        884               -
Chatussruti Dhaivata or Pythagorean Major 6th                           dhe, d4   27/16    1.0125    405        906               1.6817
Suddha Nishada or Komala Kaisiki Nishada                                na, n1    16/9     1.0534    426.6      996               1.7817
Kaisiki Nishada                                                         ni, n2    9/5      1.0125    432        1018              -
Kakali Nishada                                                          nu, n3    15/8     1.0416    450        1088              1.8876
Chyuta Shadja Nishada or Tivra Kakali Nishada or Pythagorean Major 7th  ne, n4    243/128  1.0125    455.6      1110              -
Tara Shadja                                                             sa        2        1.0534    480        1200              2
(- = no value given)
for the need to understand the extent to which a given note can be intoned. In particular, this could be
of interest to differentiate artists and styles.
All these works indicate that a complete characterization of swarasthanas must go beyond static
frequency measurements and that their dynamics need to be considered. The problem implies much
more than trying to discriminate whether swarasthanas are tuned to just-intonation, are equi-tempered or
follow the 22 srutis. Much more empirical data of the kind reported in [51] and [29] needs to be gathered to
investigate the intervals, the range of intonations and the temporal evolution of each swarasthana.
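
To make the kind of measurement discussed above concrete, the following minimal Python sketch converts pitch samples to cents above the tonic and summarises, for each swarasthana, the mean position and the spread of the observed intonation. It is an illustration under our own assumptions and not the procedure of [4], [29] or [51]: samples are simply grouped by the nearest equi-tempered semitone, and the function names hz_to_cents and intonation_profile are hypothetical. The example frequencies are the Shadja (240 Hz), Chatussruti Rishabha (270 Hz) and Panchama (360 Hz) values from Table 2.2.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def hz_to_cents(freq_hz, tonic_hz):
    """Interval of freq_hz above the tonic in cents, folded into [0, 1200)."""
    return (1200.0 * math.log2(freq_hz / tonic_hz)) % 1200.0


def intonation_profile(freqs_hz, tonic_hz):
    """Summarise intonation per swarasthana region.

    Assumption: each sample belongs to the nearest equi-tempered semitone.
    Returns {semitone_index: (mean_cents, std_cents, n_samples)}.
    """
    deviations = defaultdict(list)
    for f in freqs_hz:
        cents = hz_to_cents(f, tonic_hz)
        idx = int(round(cents / 100.0)) % 12
        # Signed deviation from the semitone centre, wrapped around the octave.
        dev = ((cents - 100.0 * idx + 600.0) % 1200.0) - 600.0
        deviations[idx].append(dev)
    return {
        idx: (100.0 * idx + mean(devs), pstdev(devs), len(devs))
        for idx, devs in deviations.items()
    }


if __name__ == "__main__":
    profile = intonation_profile([360.0, 360.0, 270.0], tonic_hz=240.0)
    for idx in sorted(profile):
        print(idx, profile[idx])
```

A wide standard deviation for a region would suggest that the note is rendered with movement rather than held at a fixed pitch, which is the kind of evidence the studies above call for.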
2.1.3 Arohana and Avarohana
Typically, a raaga is represented using ascending (arohana) and descending (avarohana) progressions
of notes. Certain note transition rules must be followed when performing a
raaga. The set of unique notes in these progressions forms a scale. For raaga identification, Rajeswari
et al [49] estimate the scale from the given tune, and compare it with the template scales in a given
database. Then, the raaga corresponding to the best match is output. Their test data consists of 30
tunes in 3 raagas sung by 4 artists. They use the harmonic product spectrum algorithm [28] to extract
the pitch, and the tonic is manually fed. The other frequencies in the scale are marked down based on
the respective ratio with the tonic. The results obtained are shown in Figure 2.1, which shows a 67%
accuracy. The authors claim that such a low accuracy could be due to discrepancies in the manually fed
tonic. But considering that their system identifies only the swaras that are used in a raaga and no other
relevant data, the result shows that the swaras alone can be very useful. However, there are raagas which
have the same swaras. Since the scales of the raagas they considered are different, this is not an issue.
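
The scale-matching idea can be sketched in a few lines of Python. This is only an illustration of the general technique and not the implementation of [49]: the similarity measure (Jaccard overlap), the min_share threshold, the function names and the small template database are all our assumptions, and the pitch classes are taken to be already expressed as semitone indices (0 to 11) relative to a known tonic.

```python
def estimate_scale(pitch_classes, min_share=0.05):
    """Pitch classes (0-11, relative to the tonic) that occupy at least
    min_share of the analysed frames. The threshold is an assumption."""
    total = len(pitch_classes)
    counts = {}
    for pc in pitch_classes:
        counts[pc] = counts.get(pc, 0) + 1
    return frozenset(pc for pc, n in counts.items() if n / total >= min_share)


def best_matching_raaga(scale, templates):
    """Name of the template scale with the highest Jaccard similarity."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    return max(templates, key=lambda name: jaccard(scale, templates[name]))


# Illustrative template database (semitone indices of each scale).
TEMPLATES = {
    "Mohanam": frozenset({0, 2, 4, 7, 9}),
    "Shankarabharanam": frozenset({0, 2, 4, 5, 7, 9, 11}),
    "Mayamalavagowla": frozenset({0, 1, 4, 5, 7, 8, 11}),
}

if __name__ == "__main__":
    # One stray frame on pitch class 5 falls below the threshold and is ignored.
    frames = [0] * 10 + [2] * 8 + [4] * 8 + [7] * 10 + [9] * 6 + [5]
    scale = estimate_scale(frames)
    print(sorted(scale), best_matching_raaga(scale, TEMPLATES))
```

As the text notes, such a matcher cannot separate raagas that share the same swaras, since their templates would be identical.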
Figure 2.1: Results of Rajeswari & Geeta’s raaga identification method
Shetty et al [46] use a similar approach to recognize raagas. The features extracted are
the individual swaras and their relation in arohana-avarohana (swara pairs). The sequence of features
is used for training a neural network. They report an accuracy of 95% over 90 tunes from 50 raagas,
using 60 tunes as training data and the remaining 30 tunes as test data. However, such a high accuracy
is questionable given how few tunes per class were used (fewer than two per raaga on average).
Sahasrabudde et al [40] model the raaga as a finite automaton. A finite automaton has a set of states
between which transitions take place. In the case of a raaga, the swarasthanas are the states and the
note transitions are observed. This idea is used to generate a number of music compositions in the
form of symbolic notation, which they claim are technically correct and indistinguishable from human
compositions. Inspired by this, Pandey et al [31] use HMM models to recognize the raagas. The rules
to form a melodic sequence for a given raaga are well defined [41] and the number of notes is finite.
Therefore, intuitively, HMM models should be good at capturing those rules in note transitions imposed
by arohana and avarohana patterns.
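
As a rough illustration of the state-and-transition view described above, the sketch below builds the set of allowed note transitions from an arohana and an avarohana and checks whether a melody stays within them. It is a deliberately strict simplification and neither the automaton of [40] nor the HMM of [31]: only adjacent steps of the two progressions (and repeated notes) are accepted, whereas real performances use many more movements. The function names and the pentatonic example are our own assumptions.

```python
def allowed_transitions(arohana, avarohana):
    """Note-to-note steps that appear in either progression, plus repeats."""
    allowed = set()
    for progression in (arohana, avarohana):
        allowed.update(zip(progression, progression[1:]))
        allowed.update((note, note) for note in progression)
    return allowed


def first_violation(melody, allowed):
    """Index of the first transition not in `allowed`, or None if accepted."""
    for i, step in enumerate(zip(melody, melody[1:])):
        if step not in allowed:
            return i
    return None


if __name__ == "__main__":
    # Illustrative pentatonic progressions; "S'" is the upper-octave Sa.
    arohana = ["S", "R2", "G3", "P", "D2", "S'"]
    avarohana = ["S'", "D2", "P", "G3", "R2", "S"]
    rules = allowed_transitions(arohana, avarohana)
    print(first_violation(["S", "R2", "G3", "P", "G3", "R2", "S"], rules))
    print(first_violation(["S", "G3", "P"], rules))
```

A probabilistic model such as an HMM replaces this hard accept/reject decision with transition probabilities learned from data, which tolerates the occasional unusual movement.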
Raaga          Samples  HMM   HMM + Phrase matching
Yaman Kalyan   15       80%   80%
Bhupali        16       75%   94%
Total          31       77%   87%
Table 2.3: Accuracy of raaga identification reported in [31]
Each raaga also has a few characteristic phrases. They are called swara sancharas in Carnatic and
pakads in Hindustani. These phrases are said to be crucial for conveying the feeling of the raaga.
Typically, in a concert, the artist starts by singing these phrases. They are the main clues for the listeners
to identify which raaga it is. Pandey et al have complemented their approach with values obtained
from two modules that match characteristic phrases, taking advantage of this information. In one such
module, characteristic phrases are identified with a substring matching algorithm. In the other one, they
are identified by counting the occurrences of frequency n-grams in the phrase.
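
The substring-matching idea can be illustrated as follows. The sketch counts how often each candidate phrase occurs as a contiguous run in a transcribed note sequence. It is not the module of [31]: the function names are hypothetical, the phrase book is a made-up placeholder rather than a list of real pakads, and exact matching is assumed, which a noisy transcription would rarely satisfy.

```python
def count_phrase(notes, phrase):
    """Number of (possibly overlapping) contiguous occurrences of phrase."""
    n, m = len(notes), len(phrase)
    if m == 0 or m > n:
        return 0
    return sum(1 for i in range(n - m + 1) if notes[i:i + m] == phrase)


def phrase_scores(notes, phrase_book):
    """For each raaga, the total number of hits over its listed phrases."""
    return {
        raaga: sum(count_phrase(notes, phrase) for phrase in phrases)
        for raaga, phrases in phrase_book.items()
    }


if __name__ == "__main__":
    # Placeholder phrases for two hypothetical raagas.
    phrase_book = {
        "raaga_A": [["N", "R", "G"]],
        "raaga_B": [["S", "R", "G", "P"]],
    }
    transcription = ["N", "R", "G", "R", "S", "N", "R", "G"]
    print(phrase_scores(transcription, phrase_book))
```

Scores of this kind can then be combined with the output of a sequence model, which is the role phrase matching plays in the results of Table 2.3.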
The other important contributions by Pandey et al include two heuristics to improve the transcription
of Indian classical music: the hill peak heuristic and the note duration heuristic. As mentioned, Indian
music has a lot of microtonal variations, which make even monophonic note transcription a challenging
problem [31]. The two heuristics proposed in their approach try to see past these microtonal
fluctuations to attain a better transcription. The hill peak heuristic states that a significant change in
the slope of a pitch contour (or the sign reversal of such slope) is closely associated with the presence
of a note. The note duration heuristic considers only the notes that are played for at least a certain span
of time. The approach was tested on two raagas. Table 2.3 shows the results obtained by using HMM
models alone, and by complementing the models with characteristic phrase matching. Not much can be
said about the reliability of the features they have used, since only two classes are
considered. But the advantage of characteristic phrase matching is evident. HMM models of raaga are also
used by Sinith et al [47] to search for musical patterns in a catalog of monophonic Carnatic music. They
build HMM models for 6 typical music patterns corresponding to 6 raagas (they report a 100% accuracy
in identifying an unknown number of tunes into 6 raagas). HMMs are also used by Das and Choudary
[12] to automatically generate Hindustani classical music.
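
The two transcription heuristics described above can be approximated with the short sketch below. It is our own reading of the verbal description and not the code of [31]: slope sign reversals of a sampled pitch contour stand in for the hill peak heuristic, a minimum-duration filter stands in for the note duration heuristic, and the threshold value and function names are assumptions.

```python
def slope_reversals(contour):
    """Indices at which the slope of the contour changes sign
    (local peaks and valleys). Flat stretches are skipped."""
    turning_points = []
    previous_sign = 0
    for i in range(1, len(contour)):
        diff = contour[i] - contour[i - 1]
        sign = (diff > 0) - (diff < 0)
        if sign == 0:
            continue
        if previous_sign != 0 and sign != previous_sign:
            turning_points.append(i - 1)
        previous_sign = sign
    return turning_points


def drop_short_notes(notes, min_duration):
    """Keep only (label, duration) pairs that last at least min_duration."""
    return [(label, dur) for label, dur in notes if dur >= min_duration]


if __name__ == "__main__":
    print(slope_reversals([0, 1, 2, 1, 0, 1]))
    print(drop_short_notes([("S", 0.30), ("R", 0.02), ("G", 0.25)], 0.05))
```

In practice the contour would be smoothed first and only reversals above some magnitude would count as significant; those choices are left open here.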
Chordia and Rae [8] use pitch class profiles and bi-grams of pitches to classify raagas. The dataset
used in their system consists of 72 minutes of monophonic instrumental (sarod) data in 17 raagas played
Classifier             Accuracy
Multi Variate Normal   94%
FFNN                   75%
K-NN Classifier        67%
Tree-based Classifier  50%
Table 2.4: Comparison of various classifiers used in Chordia and Rae’s system
by a single artist. Again, the harmonic product spectrum algorithm [28] is used to extract the pitch.
Note onsets are detected by observing the sudden changes in the phase and the amplitude of the signal.
Then, the pitch-class profiles and the bi-grams are calculated. It is shown that bi-grams are useful in
discriminating the raagas with the same scale. They use several classifiers combined with dimensionality
reduction techniques. The feature vector size is reduced from 144 (bi-grams) + 12 (pitch profile) to 50
with PCA. Using just the pitch class profiles, the system achieves an accuracy of 75%. Using only
bi-grams of pitches, the accuracy is 82%. The best accuracy of 94% is achieved using the maximum a posteriori
rule with a multivariate normal likelihood model. A comparison with other classifiers is shown in Table 2.4.
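
The 144 + 12 dimensional feature vector mentioned above can be assembled as in the following sketch. It only illustrates the shape of the features and is not the code of [8]: notes are assumed to be already transcribed to pitch classes (0 to 11), every note is weighted equally rather than by duration, and the PCA and classification stages are omitted. All function names are hypothetical.

```python
def pitch_class_profile(notes):
    """12-bin histogram of pitch classes, normalised to sum to 1."""
    hist = [0.0] * 12
    for pc in notes:
        hist[pc % 12] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def bigram_profile(notes):
    """Flattened 12 x 12 matrix of consecutive pitch-class pair frequencies,
    normalised to sum to 1 (144 values)."""
    matrix = [0.0] * 144
    for a, b in zip(notes, notes[1:]):
        matrix[(a % 12) * 12 + (b % 12)] += 1.0
    total = sum(matrix)
    return [m / total for m in matrix] if total else matrix


def feature_vector(notes):
    """Concatenation of bi-gram (144) and pitch-class (12) features."""
    return bigram_profile(notes) + pitch_class_profile(notes)


if __name__ == "__main__":
    fv = feature_vector([0, 2, 4, 2, 0])
    print(len(fv), fv[144], fv[2])
```

Two raagas with the same scale yield similar pitch-class profiles but can differ in which pairs of notes follow each other, which is why the bi-gram part of the vector helps to separate them.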
2.1.4 Unexploited properties of raaga
2.1.4.1 Gamakas
The various forms of pitch movement are collectively called gamakas. A sliding movement from
one note to another or a vibrato are examples of gamakas. There are various ways to group these
movements, but the most accepted classification speaks of 15 types of gamakas. Gamakas are not just
decorative items or embellishments, but very essential constituents of a raaga [18]. Each raaga has
gamakas characteristic to its nature. Thus the detection of gamakas is a crucial step to model and
identify raagas.
A gamaka is often represented using discrete notes, but it does not necessarily mean that one plays
them using discrete notes. The representation is only a handy expression of a more continuous sounding
pattern, which is difficult to represent on paper. The gamaka is almost always a smooth change in
the dynamics of a pitch contour. Though they are used in both Carnatic and Hindustani [33], the pattern
of usage is very distinct. Owing to their tremendous influence on how a tune sounds, they are often
considered the soul of Indian classical music.
There are two major issues that make identifying a gamaka a challenging problem. First, it requires
accurate pitch transcription, without octave errors. Second, the variations found for different artists
in performing a gamaka complicate it further. Krishnaswamy [25] and Subramanian [51] report such
variations across different artists performing the same gamaka. They also propose some theoretical
guidelines to resolve the second problem to some extent. These variations should be exploited in the
computational modeling of performers, a field that has seen little research in the case of Indian classical music.
2.1.4.2 Various Roles Played by the Notes
In a given raaga, not all the notes play the same role. Even when two raagas have the same set of
constituent notes, the functions of those notes can be very different, leading to a different feeling altogether [55].
For example, some swaras occur frequently, some are prolonged, some occur either at the beginning or
the end of the phrases, etc.
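
The roles listed above suggest simple statistics that could be computed from a phrase-segmented transcription. The sketch below is one hypothetical way of doing so and is not taken from any of the cited works: for each swara it counts occurrences, accumulates duration, and counts how often the swara opens or closes a phrase. The input format, a list of phrases made of (swara, duration) pairs, is an assumption.

```python
from collections import Counter


def note_roles(phrases):
    """Per-swara usage statistics from a list of phrases.

    Each phrase is a list of (swara, duration) pairs.
    """
    count, duration = Counter(), Counter()
    starts, ends = Counter(), Counter()
    for phrase in phrases:
        if not phrase:
            continue
        for swara, dur in phrase:
            count[swara] += 1
            duration[swara] += dur
        starts[phrase[0][0]] += 1
        ends[phrase[-1][0]] += 1
    return {"count": count, "duration": duration, "starts": starts, "ends": ends}


if __name__ == "__main__":
    phrases = [
        [("S", 1.0), ("R", 0.5), ("G", 2.0)],
        [("G", 0.5), ("R", 0.5), ("S", 1.5)],
    ]
    roles = note_roles(phrases)
    print(roles["duration"].most_common())
    print(roles["starts"], roles["ends"])
```

In this toy input all three swaras occur equally often, yet R is never prolonged and never begins or ends a phrase, which is the kind of distinction a scale alone cannot express.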
In addition, there are alankaras, patterns of note sequences which are supposed to beautify and instill
feelings when listened to.
Though emotion is a subjective issue, it gets into almost every discussion involving raagas. That
is because each raaga is said to evoke characteristic emotions. To test this hypothesis, Chordia and
Rae [9] have conducted a survey to check whether Hindustani raagas elicit emotions consistently across
listeners. They report positive results, together with musical properties, such as the relative weight of the
notes, which partially explain the phenomenon. Koduri et al⁴ [21] have conducted a similar survey with
Carnatic raagas. Though not as significant as the pattern reported by Chordia et al, the results indicate
that Carnatic raagas elicit emotions which are consistent across listeners. Wieczorkowska et al [56] test
whether raagas elicit emotions, and also arrive at a mapping between melodic sequences of 3 or 4 notes and
the elicited emotions. Their work suggests that different compositions in the same raaga might elicit
different emotions, which is consistent with the observations made by Koduri et al [21]. Wieczorkowska
et al note that these melodic sequences are related vaguely to the subjects’ emotional responses. Another
interesting observation is the significance in the similarity between the responses of people from various
cultures, which is consistent with the observations made in a previous study conducted by Balkwill et al
[2].
Previous work has examined whether raagas elicit emotions, and has tried to identify the musical features
which are responsible for this phenomenon. Besides the note sequences, another important aspect
⁴ This work is discussed in detail in Chapter 3.
which is responsible for the emotional aspect of Indian classical music is the gamaka. So far, there are no
studies which report its effect. The kind of instruments used and the rhythmic aspects also need to be
accounted for.
2.2 Computational approaches to rhythm
In Indian classical music, the rhythm is carried by the mrudangam, a cylindrical drum with two faces,
in Carnatic music, and by the tabla, a pair of kettle drums, in Hindustani music. While raaga is the most
fundamental concept related to melody, taala is the most fundamental concept related to rhythm [55]. A taala
is a rhythmic cycle, which is divided into specific uneven sections, each of them subdivided into even
measures. The first beat of each taala section is accented, with notable melodic and percussive events.
Further rhythmic variations are developed along the cycle, giving the taala a unique pattern of bols. A
bol is the main unit in learning and playing taalas. It is a mnemonic associated to the sound produced
by either a single stroke or a combination of multiple strokes.
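
The cycle-and-section structure described above maps naturally onto a small data structure. In the sketch below a taala is represented simply as a list of section lengths in beats, from which the accented beats (the first beat of each section) and the position of any beat within the cycle can be derived. The representation and the function names are our own assumptions, and the 4 + 2 + 2 grouping is used as an example of an eight-beat cycle as Adi taala is commonly described.

```python
def accented_beats(sections):
    """0-based beat indices, within one cycle, at which each section starts."""
    starts, position = [], 0
    for length in sections:
        starts.append(position)
        position += length
    return starts


def beat_position(beat_index, sections):
    """Map an absolute beat count to (cycle, section, beat within section)."""
    cycle_length = sum(sections)
    if cycle_length <= 0:
        raise ValueError("a taala needs at least one beat")
    cycle, offset = divmod(beat_index, cycle_length)
    for section, length in enumerate(sections):
        if offset < length:
            return cycle, section, offset
        offset -= length


if __name__ == "__main__":
    sections = [4, 2, 2]  # eight-beat cycle grouped as 4 + 2 + 2
    print(accented_beats(sections))
    print(beat_position(13, sections))
```

Such a representation captures only the metrical skeleton; the bols that give each taala its characteristic pattern would have to be attached to the beats separately.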
Though the concept of taala is used in both Carnatic and Hindustani music, there are slight differences.
For example, based on the composition and the order of sections, there are seven classes of taalas
in Carnatic music. Sambamurthy [41] lists all such taalas and provides a description of each. On the
other hand, in Hindustani music, hundreds of taalas have been used and are mentioned, but
nowadays only about ten of them, with their known variations, are in common use [11].
The first research on the acoustics of Indian drums was conducted by Raman [34] who studied the
relationship between the vibrational modes of the drum’s membrane and the harmonic overtones of
the sound, which allow the drum to be finely tuned. In general, the current MIR research on rhythm
deals with detecting drum strokes in monophonic context. Gillet and Richard [16] segmented tabla
solo recordings in order to build a database of bols. Then they used a probabilistic approach based
on HMMs to label the specific bols. The resultant transcription system was embedded in a real time
environment called Tablascope [16]. Chordia and Rae [10] built a system to recognize bols, as part of
an automatic tabla-solo accompaniment software, Tabla Gyan. Their research extends the studies of
Gillet and Richard on categorizing bols using different classifiers: Multivariate Gaussian, probabilistic
neural network and feed-forward neural network. Classification accuracies of 92%, 94% and 84% over
10-15 classes were obtained, respectively [6]. Chordia and collaborators’ research focuses more on the
tabla solos and the logic of building the improvisation sequences and ornamentation. Chordia et al [7]
have described a system which predicts the continuation of tabla compositions, using a variable length
n-gram model, to attain an entropy rate of 0.780 in a cross-validation experiment.
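
To clarify what an entropy figure of this kind measures, the sketch below trains a fixed-order bi-gram model on symbol sequences and computes the average number of bits needed to predict each next symbol of a test sequence. It is a much simpler stand-in for the variable length n-gram model of [7]: the add-one smoothing, the function names and the toy sequence of bols are our assumptions, and the numbers it produces are not comparable to the reported 0.780.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count symbol pairs and their contexts over the training sequences."""
    pair_counts, context_counts, vocabulary = Counter(), Counter(), set()
    for seq in sequences:
        vocabulary.update(seq)
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return pair_counts, context_counts, vocabulary


def cross_entropy_bits(seq, model):
    """Average bits per predicted symbol under add-one smoothing.
    Symbols unseen in training are not handled specially."""
    pair_counts, context_counts, vocabulary = model
    v = len(vocabulary)
    total_bits, n = 0.0, 0
    for a, b in zip(seq, seq[1:]):
        p = (pair_counts[(a, b)] + 1) / (context_counts[a] + v)
        total_bits -= math.log2(p)
        n += 1
    return total_bits / n if n else 0.0


if __name__ == "__main__":
    model = train_bigram([["dha", "ge", "na", "dha", "ge", "na"]])
    print(cross_entropy_bits(["dha", "ge", "na"], model))
```

A lower value means the continuation of a composition is more predictable under the model; in a cross-validation setting the test sequences are held out from training.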
Bol transcription in a polyphonic context is an open topic. The current MIR research on drum
transcription uses a small number of drum stroke classes. Each class is associated with a specific (single)
drum, usually based on the typical drum set. In contrast, with tabla and mrudangam, multiple classes
(bols) are associated with each drum. Since these drums are tuned, pitch information can be used,
besides timbre, to separate the classes. A more ambitious topic is the classification of taalas, which
currently can only be performed by musicians and trained listeners. As the musician always tries
to embellish the taala, there is a strong variation from performance to performance. Therefore, the
main goal would be to gain insensitivity to these variations in order to classify taalas or, otherwise, to
model these variations for understanding performance and improvisation. Indeed, there is a well-defined
structure to improvisation which should be exploited [18].
2.3 Musical forms
Apart from melodic and rhythmic aspects of Carnatic music, another important concept is the musical
form. It is the way a given song is organized or perceived. Over the evolution of Carnatic music, one can
identify several historical stages, each with a number of such musical forms. This section
briefly discusses the need for identifying them computationally, and the advantages of doing so. Our discussion on
musical forms is limited in its scope to a few of the current forms in Carnatic music. Sambamurthy [41]
and Janakiraman [18] thoroughly discuss them in a historical perspective.
There are a number of ways to classify a given musical form. The classification scheme that is
computationally relevant, and therefore discussed here, distinguishes Manodharma⁵ (improvisatory)
and Kalpita⁶ (composed) forms of music.
2.3.1 Improvisatory Forms
There are five kinds of improvisatory forms: Raaga alapana, Taanam, Pallavi exposition, Swara
kalpana and Sahitya prastara or Niraval. We briefly discuss them and describe the need to distinguish
them computationally.
⁵ Manodharma indicates acting according to one’s heart.
⁶ Kalpita literally means that which was already created.
2.3.1.1 Raaga alapana
The artist improvises in a raaga without any rhythmic constraints. The improvisation is done using
syllables which do not have any meaning, such as vowels. It does not involve any accompaniment. It
normally has three or four stages. During the first stage the artist gives direct hints about the raaga, such
as the swara sancharas⁷. The second stage is the most elaborate one, and consists of singing in three
octaves and at various speeds in the given raaga. The third and fourth stages conclude the alapana.
2.3.1.2 Taanam
Taanam involves permuting and combining swaras. It starts with as few as 2 or 3 swaras at a time, and
once the artist runs out of combinations, a swara is added. Taanam is sung using a few auspicious
words such as anandam, anantam and taanam. Unlike alapana, it has a perceptible rhythmic component.
Indeed, when the pallavi of a kruti is preceded by an alapana, it is customary to sing a brief
taanam as a conclusion to the alapana.
2.3.1.3 Pallavi exposition
The word pallavi comes from three words, pada laya vinyasam, which literally translates to the grand
showcase of magic with the words and the rhythm. The artist’s virtuosity in rhythmic aspects is put to
a tough test in this section. It consists of playing the pallavi, which is the first stanza of a kruti, in three
speeds keeping the taala constant. So, if the pallavi is sung once in t seconds the first time, it is sung
twice in the same t seconds the next time, and four times the time after that. This pattern is called anuloma.
There is another pattern called pratiloma, which is the converse of it. The pallavi exposition also includes
sangatis, the subtle melodic variations introduced each time the artist sings the stanza.
2.3.1.4 Swara kalpana
The phrase itself conveys that it is the creative imagination of swara patterns. This musical form is
sung with solfege syllables. This form is rhythmically constrained too. Each melodic phrase constructed
with swaras is sung to fit in a cycle of taala. The phrase presents a complete melodic picture, that is
each taala cycle accommodates a complete musical phrase. The length of such patterns grows as the
artist progresses. There are several decorative structures like alankaras that are used in constructing
⁷ Swara sancharas are the characteristic phrases of a raaga.
interesting patterns. In older literature, a few such alankaras are described as gamakas. Today, however,
the two are clearly distinguished from each other.
2.3.1.5 Niraval
This is a very interesting form which brings the two most important elements together, namely the
sahitya/lyrics and sangeeta/music. Lyrics are an almost indispensable part of Carnatic music, so much
so that even training which involves only instruments such as the violin is bound to include the lyrical
aspect, since it inherently carries with it the context and, more importantly, the musical structure. For
instance, the pauses and the flexibility to stretch or break a few phrases are important for the artist to improvise
[18].
In Niraval, the artist chooses a phrase which is deep in meaning and reflective of the poem/kruti and
presents it with varying melodic formations. The goal is to sketch the hidden bhavas/feelings in the
kruti. This is done by elaborating the central idea musically. The variations in such melodic formations
include changing gamakas, pauses, stressed points etc.
2.3.2 Composed Forms
The composed forms are primarily of two kinds - abhyasa gana and sabha gana. The forms which
are classified as abhyasa gana are helpful to a student in gaining insights into the raaga and the taala,
incrementally. Examples include swarajatis, varnams etc. Each of them is an exercise to practise an
aspect of the raaga or the taala. For instance, a form called jatiswara helps the student in learning
the intricacies of the taala. The forms which are classified as sabha gana are sung on stage. Krutis and
keertanas are examples of sabha gana. These are more elaborate and also accommodate
artistic improvisation. For instance, when a kruti is sung as the main piece in a concert, it features the
alapana, pallavi exposition, niraval and swara kalpana.
2.3.3 Associated work
We have found no research on understanding the musical forms computationally. Improvisation is
an indispensable element of Indian classical music. A significant part of the teaching effort goes into training
students to be creative, using the basic building blocks taught as part of the improvisatory forms. These forms
are musically very structured and provide good scope for programmatic exploration. Understanding
how a particular form evolves helps us to see the role played by properties of the raaga and the structure
of the taala in shaping it. Further, each artist has his/her own style of improvisation. Thus, studying the
evolution of musical forms facilitates modeling the performances and artists.
Chapter 3
Raaga and Rasa
(History, context, truths and myths)
3.1 A brief history of rasa
Music is certainly known to evoke feelings. An example which everyone is familiar with is a lullaby
the mother sings to put a baby to sleep. Those babies barely understand the words used in the lullaby.
What soothes them is the melody. Another familiar example is war music. Traditionally, raagas are
often associated with the feelings they are said to evoke, called rasas. In this section we’ll learn more
about them.
3.1.1 Origin
The term Rasa in the context of the arts was, to our knowledge, first used in Natyasastra, a magnum
opus on theatrical arts, especially drama. The text opines that there is no limit to the continuum
of emotions a human can express. However, it propounds eight distinct rasas which are experienced
through an amalgamation of emotions. Please note that emotions are well recognised, and are distinct and different
from rasas. Rasa is not an atomic entity. It is a process which the audience goes through while
watching a drama. This process is outlined in the following subsection.
3.1.2 Rasa nishpatti
(The process of experiencing rasa)
Nishpatti literally means production. Rasa is best understood with an analogy. Think of any dish that
you like the most. There are several condiments which are included in making it. Those condiments
correspond to several flavours. The final taste obtained in the dish using all those spices is analogous to
rasa. Here, the spices are bhavas/emotions. Now, we will lay out the actual process of experiencing the
rasa. We’ll run a simultaneous example to help the reader comprehend the process easily.
In all, there are around 49 bhavas listed in Natyasastra. These bhavas are subclassified into three
groups.
1. Sthayi (8)
2. Satvika (8)
3. Sanchari (33)
The sanchari bhavas are the temporary feelings or emotions which a person goes through before
attaining a rasa. The satvika bhavas are the physiological responses, and the sthayi bhavas are the final
emotional states a person attains.
1. Vibhaava (The stimulus): Let us consider an example - a small incident with a father and his kid.
Let us say the father is not very happy with the kid’s mischievous nature. Now the cause of this
worry can be from two kinds of sources.
(a) Alambana vibhaava (From the actual source): The father might have seen the kid doing
some mischief. This is the direct source of information.
(b) Uddeepana vibhaava (That which reinforces the information from the actual source): A
neighbour might have complained to the father about the kid’s mischievous behaviour. This
is not a direct source of information.
2. Anubhaava (Involuntary responses): The immediate reaction of the father without much thought
is called Anubhaava. ‘Worry’ and ‘Disappointment’ are most probably his Anubhaavas.
3. Satvika Anubhaava (Physical responses): If the father is too worried, he might be sweating, or
having tears in his eyes. These are bodily responses to the incident.
4. Sanchari/Vyabhichaari bhaava (Temporary sentiments): The father gently talks to the kid and tries
to teach and persuade him/her not to repeat the mistakes. He teaches the kid to behave properly
with neighbours. These are his intermediary reactions to the incident, which will finally give way
to a final sentiment.
Table 3.1: Categories of rasas as given in Natyasastra
Index  Rasa       Sthayi Bhaava
1      Srungara   Love
2      Hasya      Humour
3      Karuna     Grief
4      Raudra     Anger
5      Vira       Bravery
6      Adbhuta    Surprise
7      Beebhatsa  Disgust
8      Bhayanaka  Fear
5. Sthayi bhaava (The perceived final sentiment): The kid has committed some mistakes. The
father’s parental love towards the kid made him worried. This love and worry drove him to teach the kid
about good behaviour. Now, the final state of the father, or even of the audience observing the father,
is the sthayi bhaava.
After witnessing this process, the state of the audience corresponds to a rasa. Another example given
in Natyasastra speaks about sadness, happiness, disappointment etc. in Srungara (Love) rasa. With the
process outlined here, it should be clear that a rasa may contain an interplay of several emotions.
3.1.3 Categories of rasa
Table 3.1 lists the rasas identified in Natyasastra, and the equivalent English term of the Sthaayi
bhaava corresponding to the rasa.
Later, Saantha rasa (Peace) was added to these 8 rasas. Together, they are called Navarasas¹. The
number of rasas is a subject of debate. For instance, a few scholars identify devotion as a separate rasa.
A few others go further and identify a mother’s love towards a child as a separate rasa. However,
Bharata’s classification, together with Saantha rasa, is the most popular one today.
3.1.4 Rasa in drama and music
It is an understatement to say that the rasa theory of Natyasastra is very influential on Indian art
forms. The magnum-opus has discussed the rasa in the context of drama. It considers music to be a part
of the drama. For the audience to experience a rasa, music alone is insufficient. It must be accompanied
by the other components of drama like gestures, body movements, dialogues, context etc. A typical
drama depicts an incident from a story which involves human actions and responses. These often consist
of the regular activities of day-to-day life. The stronger element which moved the audience came from
these depictions.
¹ Navarasas is a Sanskrit term for nine rasas.
Today, in contrast, music is an art form by itself. It has evolved much beyond its usage in dramas. It
may be true that different raagas evoke different emotions in the audience. But calling such emotions
rasas is something we found very inconsistent with the definition of rasa. The process involved
in experiencing a rasa is completely irrelevant in the case of music. Music needs a different emotional
model and corresponding nomenclature to describe the feelings evoked by it. However, to our knowledge,
there are no such attempts. We discuss the nearest, though unintended, attempt known to us so far
in the following subsection.
3.1.5 Raaga paintings
The feelings triggered by music are not confined to the aural sense. When we listen to raaga music,
it is not unusual that we visualize (ourselves in) some place or scenery appropriate to the kind of music
we are listening to. For example, serene slow-tempo Indian flute music drives our thoughts around
mountains, rivers and fresh air. Fast-paced high-frequency patterns on the violin can lead to a different
visualization or interpretation. In the medieval period, a rather interesting development took place in
this direction. Each raaga was painted by various schools of art in India. The paintings usually depicted
the mood set by the raaga, the season and the time of the day when it is sung, etc. They are an interesting
projection of the aesthetics of a raaga. They hint at the feasibility of expressing the emotions evoked
by music in a non-traditional way. These paintings are popular in the Hindustani tradition and not so
much in the Carnatic one. Two such paintings are shown in Figure 3.1.
In our effort to validate the hypothesis that raagas evoke peculiar rasas, we have conducted a behavioural
study. Like many scholars, we were earlier misled to believe that rasa is the same as
emotion. As we said earlier, we did not have the knowledge of the previous section when we conducted
this study. Nevertheless, we would like to describe the set-up and analysis of the study as it was
done. Towards the end, we will conclude the analysis in the light of the knowledge from the previous
section.
Figure 3.1: Raaga paintings of Vasanta Ragini (left) and Hindola (right) raagas
3.2 A behavioural study to analyse the raaga-emotion relationship
The motivation for this behavioural study is our desire to build a culture-specific content-based music
recommendation system. A total of 750 subjective emotional responses to tunes composed in popular
raagas of Carnatic music were collected to investigate the long speculated relation between raagas and
rasas (considered as emotion clusters). We discuss the results from the analysis of this survey, which show
that raagas are indeed useful as a first step, in a different direction, towards building a content-based
music recommendation system. To collect the responses, we used an emotion classification based on
navarasa, the classification given by Bharata, adapted through a novel conceptualization of emotions
to suit behavioural studies with Indian arts.
3.2.1 Introduction
For a long time, there has been an ongoing debate on how music induces emotions in listeners
[17, 27]. Past and recent developments [48], with behavioural [26] and neurological [58] evidence,
show that music indeed induces emotions, and the corresponding activation patterns in brain circuits are
found [42] in the same locations that are active when emotions are induced by a different stimulus,
say language. However, emotion induced by music does not provoke us to act out that emotion, as is
the case when we are emotionally excited by a different stimulus like language. But there are instances
when music does seem to evoke physiological responses like respiration and perspiration [27] while it
lasts, as in war music. It is unclear whether we have more cognitive control on emotions induced by
music [54].
There has been tremendous progress in carving out theories which can explain what patterns of
music elicit which emotions. Juslin and Sloboda [19] have published a thorough survey of the state of
the art in this field. Most of the work has been around western music. The music cultures of the eastern
hemisphere have been left fairly untouched, barring a few attempts. In this study, we examine the emotional
responses of Indian listeners to Carnatic music.
As discussed in chapter 2, raaga is the most fundamental melodic aspect in the Carnatic and Hindustani
music traditions. Many definitions of a raaga given in chapter 2 clearly say that it is much more
than a scale. In the cognition of Indian music, the gamakas, which typically come as note transitions, are
held as important as an independent note [38]. In each raaga, a few notes are emphasized, based on which
the emotion of the tune changes. Such notes are called jeeva-swaras. Essentially, the emotion carried
by the compositions in a raaga depends on the role played by various swaras in the respective raaga. If a
raaga has many possible jeeva-swaras, it is capable of evoking more emotions, based on the emphasized
note. So, each one of these raagas is associated with a particular rasa or a set of them accordingly.
Sambamurthy [41] discusses the theory behind such attribution to some extent.
3.2.2 Related work
Very little empirical work on behavioural studies in Indian classical music has been reported to date.
We are aware of two similar studies [2, 9] in the context of Indian classical music. Both studies were
conducted with raagas from Hindustani music. Their findings cannot be assumed, without empirical
backing, to hold for Carnatic music, due to the following important differences between raagas in
these two music traditions.
3.2.3 Differences between Carnatic and Hindustani traditions
Most raagas of these two traditions differ. A few raagas share the same scale but have different names,
such as Hindolam in Carnatic and Malkauns in Hindustani. Some others have the same name and
the same scale as well, but they are rendered in distinct styles, such as Hamsadwani, owing to other
properties of the raaga. Carnatic tradition does not always employ pakad, which is a characteristic phrase
of a raaga in Hindustani tradition [44]. Pakad is typically used in Hindustani concerts to establish or set
the mood of the raaga. Though Carnatic music has characteristic phrases called swara-sancharas, they are
not used the same way as pakad.
Carnatic tradition, unlike Hindustani, no longer adheres to the time concept of raaga, where
each raaga is used only during a specific part of the day. Hindustani organizes its raagas with the Thaat
system [5] whereas the raagas in Carnatic are organized into the Melakarta system, which affects properties
such as scale and composition of raagas [41]. Hindustani tradition is highly influenced by other music
traditions like Sufi, whereas Carnatic tradition has remained relatively unaffected by the influences of
other traditions. These factors clearly indicate that these two music traditions of India are quite distinct
and the results of studies on Hindustani music cannot be extended to Carnatic music, hence demanding
a separate study of the association between raagas and emotions in Carnatic tradition. Later, a study of
both traditions analysed together might help us better understand the similarities and differences
between Hindustani and Carnatic music, in the context of raagas and emotions.
3.2.4 The hypothesis
Our goal is to test a hypothesis held in Indian classical music tradition for many centuries. It states
that each raaga is peculiar in evoking a particular rasa. Rasa is understood as a conglomerate of a few
emotional states [1]. But, in the context of modern India, it has become ambiguous. Rasa, by many
modern musicians, is taken as a simple emotional state. It would become clear in the following section
that the traditional definition of rasa is not a proper choice in the context of music.
We selected a set of popular Carnatic raagas and conducted a behavioural experiment with tunes
composed in those raagas by expert artists. Before going into details of the experiment, we will discuss
the subjective measures which we chose to collect from the listeners.
3.2.5 Conceptualization of emotions
Western researchers have outlined several psychological models of emotions which were used in
behavioural studies [14, 38, 39]. There are several other approaches in conceptualizing emotions [19].
But for our task, we need a classification that suits the culture and perception of native Indians. Indian
arts use navarasa as a way to understand and represent human emotions. Therefore, we use the emotion
clusters derived from a classification given by ancient Indian musicologist, Bharata. Our motive is to
present few choices through which one can express his/her emotions after listening to a tune. To this
end, we have made few changes to the emotion clusters of navarasa to incorporate three major aspects
which we think are important. We present an overview of the procedure followed.
Table 3.2: An Emotion Classification Based on Navarasa
Index  Rasa         Cluster
0      Srungaram    Arousal, Longing, Desire, Naughty, Romance, Love
1      Hasyam       Satire, Imitation, Tickle, Wit, Comedy
2      Karunam      Hesitation, Poverty, Hardships, Ignorance, Repentance
3      Raudram      Anger, Arrogance, Frenzy, Demolition
4      Viram        Pride, Bravery, Fury, Perseverance, Awestruck
5      Bhayanakam   Startle, Tremor, Stiffen, Paleness, Dreadful
6      Adbhutam     Surprise, Wonder, Inexplicable, Overwhelming
7      Santam       Steady, Rest, Peace
8      Bhakti       Devotion, Eternal, Ritual, Preaching
9      Santosham    Joy, Pleasure, Excitement, Contentment
Natyasastra lists 49 emotional and non-emotional states which are shared between eight clusters. We
modify and prune these clusters by making the following changes. As we do not want to merge two
seemingly different emotions and, at the same time, not to make it too difficult for the listeners to record
their responses, we have limited the clusters to 10.
The first change concerns the difficulty in choosing between some emotions like longing, sad and so
on. So, we have only considered keeping those emotional states in each cluster that are considerably
different from each other in their valence and/or strength. The second major change is to regroup the
emotions, i.e., add a new cluster, modify or remove the existing cluster from among the navarasas. We
will give an example as to why this is important. Navarasa does not explicitly accommodate some
emotions. For example, devotion is not a very obvious choice in it. The third change is to tune the
clusters to gather responses to music. Since Natyasastra is intended for theatrical arts, the rasas in it
are defined to fit in a greater context. It is justified to say that resolution and recognition of emotions
in music is a hard task when compared to, say, a drama. Table 3.2 gives the final classification we have
arrived at.
Table 3.3: Raagas and their intended rasas
Raaga             Intended rasa
Ananda Bhairavi   Srungaram, Karunam, Santosham
Atana             Veera
Hamsadwani        Veera, Hasya
Kalyani           Multiple positive valence rasas
Kedaragowla       Veera
Nadanamakriya     Pathos
3.2.6 Details of the study
3.2.6.1 Design
We have chosen six raagas based on their popularity, with the help of a music trainer. They are
Ananda Bhairavi, Atana, Hamsadwani, Kedaragowla, Kalyani and Nadanamakriya. Each of these raagas,
by its properties, is believed to evoke a peculiar emotion as shown in Table 3.3. In each of
these raagas, five tunes played on violin, of approximately 1 minute duration, were selected. These are
excerpts from kruti renditions. In a similar survey with Hindustani raagas, Chordia [9] used excerpts
from the alapana section, which, in our view, has a few drawbacks. In alapana, the artist improvises within
the constraints of the raaga. In our unreported pilot survey, conducted with six subjects using alapana
excerpts from two tracks per raaga, most listeners only reported whether they enjoyed the excerpt or
not. They said they did not feel any emotion. This observation is congruent with the remarks of Sambamurthy
[41], who says that so-called art music leaves listeners in an ecstasy called sangitananda, which
is bliss and does not necessarily evoke any particular emotion. Listeners appreciate this very process,
which is highly dependent on the artist’s skill, but one might not necessarily feel any emotion. However,
choosing the stimuli from the alapana section bars effects from other variables such as accompaniment and
tempo. Tempo is another important aspect which can affect perception so much that a raaga which is
typically used for melancholic tunes can be used with a faster tempo to bring about ferociousness. For
this survey, for each raaga, we have selected tunes that have more or less a common tempo.
Please note that it is tempo, and not taalam. The tempo varied only slightly in the tunes across the
raagas. The problems that might arise due to accompaniment differences are taken care of, since the
only accompaniment in the tunes selected, mrudangam², if at all present, is mild.
² Mrudangam is a barrel-shaped percussion instrument with the two ends of the barrel covered with skin. It is a tuned instrument.
Figure 3.2: Average of the ratings collected per rasa, across all users and tunes, for each raaga. X-axis
denotes rasa index and Y-axis denotes the average value of the ratings.
To reduce participants’ fatigue, we divided the selected 30 tunes into two sets of 15 tunes each. Each
set had at least 2 tunes from each raaga. We set up a web portal where each participant gets one set and
marks the subjective responses. Each cluster of words from Table 3.2 can be rated as None at
all, A Little, Somewhat or Very, based on how well that cluster expresses the participant’s emotions. For
example, if the listener feels the tune is very romantic, he/she selects Very for the cluster Srungaram. They
can mark multiple clusters for the same tune. They were also asked to describe in words how they
feel after listening to the tune.
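The text does not say how the 30 tunes were assigned to the two sets. As an illustration only, the following Python sketch shows one simple assignment that satisfies the stated constraints (15 tunes per set, at least 2 tunes per raaga in each set). The function name `split_tunes`, the tune labels and the alternating 3/2 rule are assumptions made for this example.

```python
from collections import Counter

RAAGAS = ["Ananda Bhairavi", "Atana", "Hamsadwani",
          "Kalyani", "Kedaragowla", "Nadanamakriya"]

def split_tunes(tunes_by_raaga):
    """Split tunes into two sets (hypothetical rule, not from the thesis).

    Raagas at even positions give 3 tunes to set A and 2 to set B;
    raagas at odd positions give 2 to set A and 3 to set B.
    """
    set_a, set_b = [], []
    for idx, (raaga, tunes) in enumerate(tunes_by_raaga.items()):
        cut = 3 if idx % 2 == 0 else 2
        set_a.extend((raaga, t) for t in tunes[:cut])
        set_b.extend((raaga, t) for t in tunes[cut:])
    return set_a, set_b

# Placeholder tune labels: five tunes per raaga, 30 in total.
tunes_by_raaga = {r: [f"tune{k}" for k in range(1, 6)] for r in RAAGAS}
set_a, set_b = split_tunes(tunes_by_raaga)

# Number of tunes per raaga in each set.
per_raaga_a = Counter(raaga for raaga, _ in set_a)
per_raaga_b = Counter(raaga for raaga, _ in set_b)
```

With six raagas of five tunes each, three raagas contribute 3 tunes and three contribute 2 tunes to each set, which gives 15 tunes per set.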
3.2.6.2 Participants
A total of 750 responses were recorded by 48 people with a median age of 22. The majority of them
are undergraduate or graduate students; 88% of them are male and 12% are female. They
described their familiarity with Indian classical music tradition, either Hindustani or Carnatic, as None
(35%) or Moderate (65%).
3.2.6.3 Results and observations
Figure 3.2 shows the normalized averages of subjective responses recorded by participants for all
the tracks in each raaga. We have quantized the verbal responses (None at all, A Little, Somewhat
and Very as 0, 1, 2 and 3) to arrive at this plot. One can immediately see the similarity
between the plots of Ananda Bhairavi and Kalyani, and between those of Atana and Hamsadwani.
Kedaragowla and Nadanamakriya are unique in that their plots do not show much similarity with those
of other raagas. However, Ananda Bhairavi differs from Kalyani when the relative height of peaks within
each of their plots is considered. To cross-check whether the obtained peaks for rasas in each raaga
correspond to consistent user ratings, the standard deviation of ratings given for each rasa across all
tracks for each raaga has been calculated.
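The quantization and averaging described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the data layout (one dictionary per response, mapping cluster index to a verbal rating), the treatment of unrated clusters as None at all, and the name `summarize` are assumptions. The normalization applied in Figure 3.2 is not specified in the text, so none is applied here.

```python
from statistics import mean, pstdev

# Verbal rating -> number, as stated in the text.
SCALE = {"None at all": 0, "A Little": 1, "Somewhat": 2, "Very": 3}

def summarize(responses, n_clusters=10):
    """Per-cluster mean and (population) standard deviation for one raaga.

    responses: list of dicts {cluster_index: verbal_rating}.
    Clusters a participant did not mark are counted as "None at all"
    (an assumption of this sketch).
    """
    columns = [
        [SCALE[r.get(c, "None at all")] for r in responses]
        for c in range(n_clusters)
    ]
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means, stds

# Three made-up responses for one tune; cluster 2 is Karunam in Table 3.2.
example = [{2: "Very"}, {2: "Somewhat", 7: "A Little"}, {2: "Very"}]
means, stds = summarize(example)
```

For these made-up responses the mean for cluster 2 is 8/3, while clusters nobody marked have mean and deviation 0.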
Figure 3.3: (a) Average rating obtained per rasa for a tune in Nadanamakriya. (b) Standard deviation of
the ratings for the same tune. X-axis denotes the rasa indices. Y-axis denotes the normalized values.
Figure 3.4: Ratings given by six users for a sample track in each raaga. X-axis denotes rasa indices.
Y-axis denotes the rating quantifiers.
However, we realized that popular measures like standard deviation cannot be relied upon for
this analysis, since the actual subjective responses were obtained through verbal terms which were later
quantified numerically for analytical purposes. The standard deviation of these values will not reflect the
truth. Let us look at an example. After listening to a particular tune, if a few users have rated Somewhat
for a rasa and a few have rated Very for the same rasa, the deviation in the values for that particular
rasa grows high compared to the case where all the users rated it either Very or Somewhat.
Figure 3.3 shows one such example of the average rating and standard deviation for all rasas for a tune
in raaga Nadanamakriya. Hence, a statistic that rests on such non-standard numerical quantification of
verbal terms is ruled out as a measure to check the consistency in ratings.
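The example in this paragraph can be made concrete with a few made-up ratings. In the sketch below, `CODING_A` is the 0 to 3 coding used in the study; `CODING_B` is a hypothetical alternative coding introduced only to show that the computed spread depends on the numbers chosen for the verbal terms.

```python
from statistics import pstdev

CODING_A = {"None at all": 0, "A Little": 1, "Somewhat": 2, "Very": 3}
# Hypothetical alternative: the same ordering, but "Very" placed further away.
CODING_B = {"None at all": 0, "A Little": 1, "Somewhat": 2, "Very": 4}

def spread(verbal_ratings, coding):
    """Population standard deviation of ratings under a numeric coding."""
    return pstdev([coding[r] for r in verbal_ratings])

# Six made-up users: half say Somewhat and half say Very, versus all Very.
split = ["Somewhat"] * 3 + ["Very"] * 3
unanimous = ["Very"] * 6

sd_split_a = spread(split, CODING_A)          # 0.5
sd_unanimous_a = spread(unanimous, CODING_A)  # 0.0
sd_split_b = spread(split, CODING_B)          # 1.0
```

Both groups of users agree that the rasa is clearly present, yet the split group shows a nonzero deviation, and its size doubles when only the numeric label of Very changes.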
With that in mind, we have resorted to a naive but straightforward method to check whether the rasa-peaks
for each raaga are valid in attributing a rasa to the corresponding raaga. Figure 3.5 shows the
histograms of values obtained for each rasa for all raagas. For instance, let us consider the raaga
Nadanamakriya. One can observe that consistently a large number of responses have been recorded
against rasa clusters other than the third cluster, which is sympathy. So, observing those histograms and
correlating with the mean plot of Nadanamakriya shown in Figure 3.2, we arrive at the conclusion that
Nadanamakriya primarily induces sympathy. Figure 3.4 shows responses of nine users for a track in
each raaga. For instance, let us consider Nadanamakriya again. A consistent trend has been observed in
such plots for this raaga, which reaffirms that this raaga primarily induces Karuna rasa.
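Histograms of the kind shown in Figure 3.5 only count how often each verbal rating was chosen, so they do not depend on any numeric coding. A minimal sketch, assuming the ratings for one raaga are available as (rasa index, verbal rating) pairs; the function name and the sample data are made up for illustration.

```python
from collections import Counter

LEVELS = ["None at all", "A Little", "Somewhat", "Very"]

def rating_histograms(ratings):
    """Count ratings per rasa cluster.

    ratings: iterable of (rasa_index, verbal_rating) pairs for one raaga.
    Returns {rasa_index: [count per level, in the order of LEVELS]}.
    """
    counts = {}
    for rasa, level in ratings:
        counts.setdefault(rasa, Counter())[level] += 1
    return {rasa: [c[level] for level in LEVELS] for rasa, c in counts.items()}

# Made-up ratings for two clusters.
sample = [
    (2, "Very"), (2, "Very"), (2, "Somewhat"),
    (3, "None at all"), (3, "None at all"), (3, "A Little"),
]
histograms = rating_histograms(sample)
```

Here `histograms[2]` is `[0, 0, 1, 2]` and `histograms[3]` is `[2, 1, 0, 0]`: a cluster whose counts pile up at Somewhat and Very is a candidate rasa for the raaga, while one that piles up at None at all is not.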
But this is not the case with all the raagas. For instance, ratings for raaga Kedaragowla have
not been very consistent across listeners. This can be observed from the plots for other raagas in Figure
3.4. Kalyani is an interesting raaga with many jeeva-swaras, which are the most stressed notes. The
results reaffirm this, showing that it arouses an array of emotions based on the frequencies stressed
in the composition. Though not as evidently as Nadanamakriya, the other raagas more or less show a
convergence in ratings, and the ratings show a consistency across users. However, from Figure 3.5 and
Figure 3.4, it can be observed that the emotional responses for any given raaga are not in favour of
any single rasa. Almost every participant has rated multiple clusters and expressed
in words that he/she can feel emotions in multiple rasas listening to the tune, as is evident in the plot.
What interests us is the converging patterns in those ratings for a given raaga. The observed rasas
of a few raagas are not consistent with the intended rasas, but there is certainly an overlap. Though this
observation keeps us from making a final statement on the raaga-rasa association, from the converging
pattern between ratings for a raaga and their variance across raagas, we can say that the raaga certainly
encapsulates the melodic patterns responsible for eliciting specific emotions.
3.2.6.4 Implications to music recommendation systems
We have investigated the possibility of building a novel recommendation system based on emotion,
specific to Indian culture. The results from the analysis of this survey have several direct and indirect
implications which can be used to increase the effectiveness of content-based music recommendation
systems in general. The fact that raaga holds the properties of a tune responsible for the perception of
melody and for evoking emotions can be a potential key to building recommendation systems that
complement and contrast with the western approaches.
3.2.7 Conclusions
We have reported a behavioural study that has empirically tested the hypothesis that each raaga in
Carnatic music evokes peculiar emotions characteristic of that raaga. But a few questions still remain
to be answered. The participants in the study were Indians. So, the influence of raaga-based music
on other cultures is yet to be seen. For now, we can only speculate based on an analysis reported by
Balkwill et al. [3]. In an attempt to avoid bias in the raagas and the tunes chosen, we have refrained from
deliberately hand-picking raagas and/or tunes. The tunes selected were not very distinct from each other.
This has resulted in a dataset with less polarity in the emotional content of the tunes, as can be seen from
Figure 3.2.
With added data of new raagas and tunes that bring more polarity to the dataset, a study must
be conducted with participants from other ethnic groups around the world to observe whether cultural
factors play a dominant role in responding emotionally to Carnatic music. As we have observed a great
similarity between the plots of user responses for a few raagas, it will be interesting to see if the note
transition patterns are common in the melodies constructed using those raagas. Further, these patterns
can be analysed to see if they make a dominant contribution in evoking peculiar emotions. It will also be
interesting to see the extent to which a mood-based recommendation system improves when the results of
this study and our future work are incorporated into an existing statistical music recommendation program,
and to identify the features responsible for the perception of various emotions. Later, it can also be
verified whether the same features hold true in perceiving other genres of music.
The most important aspect in taking this work any further is to take care in incorporating the
knowledge of the previous section on rasa. Without it, this study can only be considered as one which
investigates the relationship between raaga and emotion. Even if the term rasa is properly interpreted,
it is almost impossible to use the user responses to measure it empirically. So, as we have said in the
beginning, the definition of rasa does not make it a good emotion model in the case of music.
Figure 3.5: Histograms of ratings obtained per rasa, for the six raagas. X-axis has the four rating
quantifiers: None at all, A Little, Somewhat and Very. Y-axis denotes the number of ratings obtained
for each quantifier.
Chapter 4
Raaga Recognition
Raaga is the spine of Indian classical music. It is the single most crucial element of the melodic
framework on which the music of the subcontinent thrives. Naturally, automatic raaga recognition is
an important step in computational musicology as far as Indian music is concerned. It has several
applications, such as indexing Indian music, automatic note transcription, comparing, classifying and
recommending tunes, and teaching, to mention a few. Simply put, it is the first logical step in the process of
creating computational methods for Indian classical music. In this chapter, we identify the main drawbacks
of the previous raaga recognition techniques and propose minor but multiple improvements to the
state-of-the-art raaga recognition technique. We discuss the results obtained with our raaga recognition
system with those improvements.
4.1 Introduction
Geekie [15] very briefly summarizes the importance of raaga recognition for Indian music and its
applications in music information retrieval in general. Raaga recognition is primarily approached as
determining the scale used in composing a tune. However, the raaga contains more information, which
is lost if it is dealt with using western methods such as this. This information plays a very central role
in the perception of Indian classical music.
Though our work is primarily concerned with Carnatic music, most of the discussion applies to Hindustani
music as well, unless mentioned otherwise. In chapter 2, we have surveyed the computational
approaches to melody. In the following section, we identify a few problems that we address using our
raaga recognition system.
4.2 Problems that need to be addressed
4.2.1 Gamakas and pitch extraction for Carnatic music
An appropriate pitch extraction module is one which can accurately represent the gamakas. This has not
been a severe problem for classification systems that did not depend on the gamakas of a note for
classification. If such a pitch extraction system is in place, gamakas can be used as an additional
feature to improve the accuracies of existing systems. Gamakas assume a major role when the number
of raaga classes in the dataset is high.
4.2.2 Skipping tonic detection
The manually implemented tonic (the base frequency of the instrument/singer) identification stage
needs to be eliminated if possible. Since the tonic identification itself involves some amount of error,
this could adversely impact the performance of a raaga recognition system. Neither the Carnatic nor
Hindustani systems adhere to any absolute tonic frequency, therefore it makes sense to build a system
that can ignore the absolute location of the tonic.
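One common way to ignore the absolute location of the tonic is to compare two octave-folded pitch-class distributions under every circular rotation and keep the best match. The sketch below illustrates that general idea only; it is not necessarily the measure used in this work, and the L1 distance, the function name and the example profile are assumptions.

```python
def shift_invariant_distance(p, q):
    """Smallest L1 distance between p and any circular rotation of q.

    p, q: pitch-class distributions with the same number of bins.
    Returns (distance, shift), where shift is the rotation of q that
    gives the best match.
    """
    n = len(p)
    if len(q) != n:
        raise ValueError("profiles must have the same number of bins")
    best, best_shift = float("inf"), 0
    for s in range(n):
        d = sum(abs(p[i] - q[(i + s) % n]) for i in range(n))
        if d < best:
            best, best_shift = d, s
    return best, best_shift

# A made-up 12-bin profile and the same profile transposed by 3 bins,
# as if the same tune were performed with a different tonic.
p = [0.4, 0.0, 0.2, 0.0, 0.1, 0.1, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0]
q = [p[(j - 3) % 12] for j in range(12)]
distance, shift = shift_invariant_distance(p, q)
```

For this pair the distance is 0 at a shift of 3 bins, so the two profiles are judged identical even though their tonics differ.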
4.2.3 Resolution of pitch-classes
Though 12 bins for pitch-class profiles look ideal to the Western eye, we hypothesize that a more
continuous model can capture more relevant information related to Indian classical music. Dividing an
octave into n bins where n > 12 can help us model the distribution with better resolution. Gamakas (the
micro tonal variations) play a vital role in the perception of Indian music, and this has been confirmed
by several accomplished artists. The transitions involved in a gamaka and the notes through which its
trajectory passes are two factors that need to be captured. We hypothesize that this information can be
obtained, at least partially, using a higher number of bins for the first-order pitch distribution.
4.2.4 A comprehensive dataset
The previous datasets which are used for testing have several problems. In Tansen, and the work by
Sridhar and Geeta, the datasets had as few as 2 or 3 raagas. The dataset used by Chordia has all the data
played on a single instrument by a single artist. The test datasets were constrained to some extent by
the requirement of monophonic audio (unaccompanied melodic instrument) for reliable pitch detection.
In the present work, we investigate raaga recognition performances on a more comprehensive dataset
with more raaga classes with significant number of tunes in each across different artists and different
compositions. This should enable us to obtain better insight into the raaga identification problem.
With these issues about raaga recognition in mind, we have implemented a system which addresses
some of the challenges described. The following sections introduce our method, and present
a detailed analysis and discussion of the results.
4.3 Our method
As mentioned earlier, we propose to address some of the issues described in the previous section.
We have taken a diverse set of tunes to include in the dataset. The use of amply available recorded
music necessitates a pitch detection method that can robustly track the melody line in the presence of
polyphony. The obtained sequence of pitch values converted to cents scale (100 cents = 1 semitone)
constitutes the pitch contour. The pitch contour may be used as such to obtain a pitch-class distribution.
On the other hand, given the heavy presence of ornamentation in Indian music, it may help to use identified
stable note segments before computing the pitch-class distribution. We investigate both approaches.
Finally, a similarity measure, that is insensitive to the location of the tonic note, is used to determine the
best matched raaga to a given tune based on available labeled data. Each of the aforementioned steps is
detailed next.
4.3.1 Pitch extraction
Pitch detection is carried out at 10 ms intervals throughout the sampled audio file using a predominant
pitch detection algorithm designed to be robust to pitched accompaniment [37]. The pitch detector
tracks the predominant melodic voice in polyphonic audio accurately enough to preserve fast
pitch modulations. This is achieved by the combination of harmonic pattern matching with dynamic
programming based smoothing. Analysis parameter settings suitable to the pitch range and type of
polyphony are available via a graphical user interface thus facilitating highly accurate pitch tracking
with minimal manual intervention across a wide variety of audio material. Figure 4.1 shows the output
pitch track superimposed on the signal spectrogram for a short segment of Carnatic vocal music where
the instrumental accompaniment comprised violin and mridangam (percussion instrument with tonal
characteristics). While the violin usually follows the melodic line, it plays held notes in this particular
segment. Low amounts of reverberation were audible as well. We observe that the detected pitch track
faithfully captures the vocal melody unperturbed by interference from the accompanying instruments.
Figure 4.1: Screenshot from the melodic pitch extraction system of [37] showing the detected pitch
superimposed on the signal spectrogram. The axis on the right indicates pitch value (Hz).
4.3.2 Finding the tuning offset
The pitch values obtained at 10 ms intervals are converted to the cents scale by assuming an equi-
tempered tuning scale at 220 Hz. All the pitch values are folded into a single octave. The finely-binned
histogram maximum of the deviation of the cents value from the notes of the equi-tempered 12-note
grid provides us the underlying tuning offset of the audio with respect to 220 Hz. The tuning offset is
applied to the pitch values to normalize the continuous pitch contour to standard 220 Hz tuning by a
simple vertical shift but without any quantization to the note grid at this point.
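The tuning-offset step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the 5-cent bin width, the treatment of non-positive pitch values as unvoiced frames, and the function names are all choices made for the example.

```python
import math
from collections import Counter

REF_HZ = 220.0  # reference frequency of the equi-tempered grid

def hz_to_cents(f_hz, ref_hz=REF_HZ):
    """Pitch in cents relative to the reference (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / ref_hz)

def tuning_offset(pitches_hz, bin_width=5):
    """Most frequent deviation (in cents) from the nearest 100-cent grid line.

    bin_width is a hypothetical resolution for the fine histogram.
    Non-positive values are treated as unvoiced frames and skipped.
    """
    counts = Counter()
    for f in pitches_hz:
        if f <= 0:
            continue
        cents = hz_to_cents(f) % 1200.0              # fold into one octave
        deviation = (cents + 50.0) % 100.0 - 50.0    # range [-50, 50)
        counts[round(deviation / bin_width)] += 1
    if not counts:
        return 0.0
    return counts.most_common(1)[0][0] * bin_width

def normalize_tuning(pitches_hz):
    """Shift the cents contour by the tuning offset, without quantizing it."""
    offset = tuning_offset(pitches_hz)
    return [hz_to_cents(f) - offset for f in pitches_hz if f > 0]

# Made-up contour: scale degrees played 20 cents sharp of the 220 Hz grid.
pitches = [220.0 * 2 ** ((k * 100 + 20) / 1200) for k in (0, 2, 4, 5, 7, 9, 11, 12)]
offset = tuning_offset(pitches)
contour = normalize_tuning(pitches)
```

For this made-up contour the detected offset is 20 cents, and after the vertical shift every value lies on a multiple of 100 cents while the contour itself remains continuous.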
4.3.3 Note segmentation
As we observe in Figure 4.1, the pitch contour is continuous and marked by glides and oscillations
connecting more stable pitch regions. The stable note regions too are marked by low pitch modulations.
As described in chapter 2, melodic ornamentation in Indian classical music is very diverse and elaborate.
For our investigation of pitch class profiles confined to stable notes, we need to detect relatively stable
note regions within the continuously varying pitch contour. The local slope of the pitch contour can be
used to differentiate stable note regions from connecting glides and ornamentation.
At each time instant, the pitch value is compared with its two neighbors (i.e. 10 ms removed from
it) to find the local slope in each direction. If either local slope lies below a threshold value of 15
semitones per second, the current instant is considered to belong to a stable note region. This condition
is summarized by the Eq. 4.1.
$$\bigl(|F(i-1) - F(i)| < \theta\bigr) \;\lor\; \bigl(|F(i+1) - F(i)| < \theta\bigr) \qquad (4.1)$$
where F(i) is the pitch value at time index i and θ is the slope threshold. To put the selected
threshold value in perspective, a large vibrato (spanning a 1 semitone pitch range) at 6 Hz pitch modulation
frequency has a maximum slope of about 15 semitones per second. All instants where the slope
does not meet this constraint are considered to belong to the ornamentation.
Finally, the pitch values in the segmented stable note regions are quantized to the nearest available
note value in the 220 Hz equi-tempered scale. This step smoothes out the minor fluctuations within
intended steady notes. Figure 4.2 shows a continuous pitch contour with the corresponding segmented
and labeled note sequence superimposed. We note several passing notes are detected which on closer
examination are found to last for durations of 30 ms or more.
Figure 4.2: Note segmentation and labeling. Thin line: continuous pitch contour; Thick line: detected stable note regions.
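The stability test of Eq. 4.1 and the subsequent quantization can be sketched as below. This is a minimal sketch, assuming the pitch contour is already given in cents relative to the 220 Hz reference at a 10 ms hop; the function names and the use of a semitone index as the note label are assumptions made for the example.

```python
HOP_S = 0.010           # 10 ms between successive pitch values, as in the text
THETA_ST_PER_S = 15.0   # slope threshold in semitones per second


def stable_mask(cents, hop_s=HOP_S, theta=THETA_ST_PER_S):
    """Flag instants where the slope towards either neighbour is below theta."""
    n = len(cents)
    mask = [False] * n
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                # slope in semitones per second between instants i and j
                slope = abs(cents[j] - cents[i]) / 100.0 / hop_s
                if slope < theta:
                    mask[i] = True
    return mask


def quantize(cents, mask):
    """Snap stable instants to the nearest semitone index; others become None."""
    return [int(round(c / 100.0)) if m else None for c, m in zip(cents, mask)]
```

For example, a contour that rests on one note, glides quickly, and then settles with a small wobble on another note keeps only the two resting regions; the glide is labelled `None`.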
4.3.4 Pitch-class profiles
We investigate various approaches to deriving the pitch class profile. The first of two broad ap-
proaches corresponds to considering only the stable notes, segmented and labeled in the previous step.
The pitch class profile is then a 12-bin histogram corresponding to the octave-folded note label values.
There are two choices for weighting the note values for histogram computation. We call these P1 and
P2, where P1 refers to weighting a note bin by the number of instances of the note, and P2 refers to
weighting by total duration over all instances of the note in the music piece.
A second broad approach is to ignore the note segmentation step and to consider all pitches in the
pitch contour, irrespective of whether they correspond to stable notes or ornamentation regions. We
call this P3. Further, the number of divisions of the octave is varied, representing different levels
of fineness in pitch resolution. The investigation of varying quantization intervals is motivated by
the widely recognized microtonal character of Indian music.
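The three profiles can be sketched as follows. This is an illustrative sketch rather than the code used in our experiments: it assumes per-instant note labels (semitone indices, with `None` outside stable regions) for P1 and P2, and a contour in cents for P3. Treating a run of identical consecutive labels as one note instance, and centring each P3 bin on a grid note, are assumptions of the example.

```python
def note_runs(labels):
    """Group consecutive identical labels into (note, n_frames) runs."""
    runs, prev, count = [], None, 0
    for lab in labels:
        if lab is not None and lab == prev:
            count += 1
        else:
            if prev is not None:
                runs.append((prev, count))
            prev, count = lab, (1 if lab is not None else 0)
    if prev is not None:
        runs.append((prev, count))
    return runs


def normalize_hist(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = float(sum(hist))
    return [h / total for h in hist] if total else list(hist)


def profile_p1(labels):
    """12 bins, each weighted by the number of instances of the note."""
    hist = [0] * 12
    for note, _ in note_runs(labels):
        hist[note % 12] += 1
    return normalize_hist(hist)


def profile_p2(labels):
    """12 bins, each weighted by the total duration (frame count) of the note."""
    hist = [0] * 12
    for note, n_frames in note_runs(labels):
        hist[note % 12] += n_frames
    return normalize_hist(hist)


def profile_p3(cents, n_bins=12):
    """All pitched instants, octave-folded into n_bins equal divisions."""
    width = 1200.0 / n_bins
    hist = [0] * n_bins
    for c in cents:
        if c is None:
            continue
        # shift by half a bin so that bin 0 is centred on the reference note
        idx = int(((c + width / 2.0) % 1200.0) // width) % n_bins
        hist[idx] += 1
    return normalize_hist(hist)
```

Calling `profile_p3(cents, n_bins=24)` or higher gives the finer resolutions compared in the experiments.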
4.3.5 Distance measure
In order to compare pitch-class profiles computed from two different tunes, it is necessary that the
distribution intervals are aligned in terms of the locations of corresponding scale degrees. This can be
ensured by the cyclic rotation of one of the distributions to achieve alignment of its tonic note interval
with that of the other distribution. Since information about the tonic note of each tune is not available
a priori, we consider all possible alignments between two pitch class profiles and choose the one that
matches best in terms of minimizing the distance measure. This is achieved by cyclic rotation of one of
the distributions in 12 steps with computation of the distance measure at each step.
As for choosing the distance measure itself, we would like it to reflect the extent of similarity between
two tunes in terms of shared raaga characteristics. We choose the Kullback-Leibler (KL) divergence
measure as a distance measure suitable for comparing distributions. Symmetry is incorporated into this
measure by summing the two values as given below [4].
\[ D_{KL}(P, Q) = d_{KL}(P \mid Q) + d_{KL}(Q \mid P) \tag{4.2} \]
\[ d_{KL}(P \mid Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)} \tag{4.3} \]
where i refers to the bin index in the pitch class profile, and P and Q refer to pitch class distributions
of two tunes.
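The rotation-and-compare procedure can be sketched as below. The sketch is hedged in two ways: the small constant added to every bin to avoid taking the logarithm of zero is our assumption (the text does not specify how empty bins are handled), and the rotation is tried for every bin shift, which amounts to the 12 steps described above for a 12-bin profile.

```python
import math

EPS = 1e-6  # assumed smoothing constant to avoid log(0); not specified in the text


def smooth(p, eps=EPS):
    """Add eps to every bin and renormalize so the bins sum to 1."""
    total = sum(p) + eps * len(p)
    return [(v + eps) / total for v in p]


def kl(p, q):
    """One-sided KL divergence d_KL(P | Q) of Eq. 4.3."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetric distance D_KL(P, Q) of Eq. 4.2."""
    p, q = smooth(p), smooth(q)
    return kl(p, q) + kl(q, p)


def rotation_invariant_distance(p, q):
    """Minimum symmetric KL over all cyclic rotations of q.

    Returns (distance, shift), where shift is the left rotation applied to q.
    """
    best = None
    for s in range(len(q)):
        rotated = q[s:] + q[:s]
        d = symmetric_kl(p, rotated)
        if best is None or d < best[0]:
            best = (d, s)
    return best
```

The returned shift indicates the relative tonic alignment that the distance measure found to match best; it is not a tonic estimate on its own.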
4.4 Experiment and results
We describe a raaga classification experiment and present results on the comparative performances of
the various types of pitch-class profiles for different classifier settings. A suitable dataset is constructed
from commercially available CD audio recordings. To make the best use of available data, we use leave-
one-out cross validation with a k-NN (k Nearest Neighbors) classifier to evaluate the performance of our
system. The details of the experiment are provided next.
4.4.1 Dataset
There are a few observations worth mentioning in connection with the design of a test dataset for
our raaga recognition system. During preliminary trials of our system, we observed a performance bias
in available datasets arising from the fact that several popular compositions in Carnatic music originate
in the 17th, 18th and 19th centuries. Because many of these compositions are sung by several artists,
the dataset ends up with sets of near-identical tunes, and hence very similar pitch profiles for
supposedly different pieces of music. This prompted us to exercise due care in selecting music pieces
for our test dataset. We have been careful not to include different versions of the same composition in
the dataset. For instance, a tune which renders the kruti ‘nanu brOvamani cheppavE’ is not included
if another tune based on that kruti already exists in the dataset. However, since alapanas are not
pre-composed, and are purely based on the artist’s virtuosity, we have included them. To get a bigger dataset
we considered the complete Raagam-Taanam-Pallavis of various artists besides shorter krutis. This
expanded the list of options from which it is possible to extract a clip to be included in the dataset. The
clips were extracted from the live performances and CD recordings of 31 artists, covering both vocal
(male and female) and instrumental (veena, violin, mandolin and saxophone) music. The dataset consists
of 170 tunes across 10 raagas, with at least 10 tunes in each raaga (except Ananda Bhairavi, with
9 tunes), as summarized in Table 4.1. The duration of each tune averages about 1 minute. The tunes are
converted to mono-channel, 22.05 kHz sampling rate, 16-bit PCM. The dataset can be considered very
representative of Carnatic classical music, since it includes artists spanning several decades, male
and female voices, and all the popular instruments.
Raaga             Total tunes   Avg. duration (s)   Composition of tunes
Abheri            11            61.3                6 vocal, 5 instrumental
Abhogi            10            62                  5 vocal, 5 instrumental
Ananda Bhairavi   9             64.7                4 vocal, 5 instrumental
Arabhi            10            64.9                8 vocal, 2 instrumental
Atana             21            56.75               12 vocal, 9 instrumental
Begada            17            61.17               9 vocal, 8 instrumental
Behag             14            59.71               12 vocal, 2 instrumental
Bilahari          13            61.38               10 vocal, 3 instrumental
Hamsadwani        41            57.07               14 vocal, 27 instrumental
Hindolam          24            60                  15 vocal, 9 instrumental
Table 4.1: Description of the dataset across 10 raagas.
Pitch-class profile                             k=1    k=3    k=5    k=7
P1 (12 bins, weighted by number of instances)   55.9   56.5   57.1   59.4
P2 (12 bins, weighted by duration)              71.2   73.5   76.5   76.5
P3 (12 bins)                                    73.5   70     74.7   75.3
P3 (24 bins)                                    72.4   72.9   75.3   74.1
P3 (36 bins)                                    68.2   72.4   72.9   74.1
P3 (72 bins)                                    67.7   68.2   69.4   68.2
P3 (240 bins)                                   65.3   68.2   66.5   65.9
Table 4.2: Performance (% accuracy) of weighted k-NN classification with various pitch-class profiles
4.4.2 Classification experiment
A k-NN classification framework is adopted where several values of k are tried. In a leave-one-out
cross-validation experiment, each individual tune is considered a test tune in turn while all the remaining
constitute the training data. The k nearest neighbors of the test tune in terms of the selected distance
measure are considered to estimate the raaga label of the test tune. The distance measure used is the
symmetric KL distance presented in the previous section. Since there are in all a minimum of 9 tunes
per raaga, we consider values of k=1, 3, 5 and 7. Since the number of classes is high (10 raagas), it is
more appropriate to consider a weighted-distance k-NN classification rather than simple voting to find
the majority class. Weighted k-NN classification is described by the equations below. The chosen class
is C*,
\[ C^{*} = \arg\max_{c} \sum_{i=1}^{k} w_i \, \delta(c, f_i(x)) \tag{4.4} \]
where c is the class label (raaga identity in our case), f_i(x) is the class label of the ith neighbor
of x, and δ(c, f_i(x)) is the indicator function that is 1 if f_i(x) = c, and 0 otherwise. The weights
are given by,
\[ w_i = \frac{1}{d(x, y)} \tag{4.5} \]
where d(x, y) is the symmetric KL distance between two pitch-class profiles x and y; here y is the ith
neighbor of x.
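Eqs. 4.4 and 4.5, together with the leave-one-out protocol, can be sketched as follows. This is a minimal sketch under stated assumptions: the distance is passed in as a function (in our setting it would be the rotation-minimized symmetric KL distance), the guard against a zero distance is our own addition, and the function names are hypothetical.

```python
from collections import defaultdict


def weighted_knn_predict(dist_label_pairs, k):
    """Pick the class with the largest sum of inverse distances among the k nearest.

    dist_label_pairs: list of (distance, label) for every training tune.
    """
    nearest = sorted(dist_label_pairs, key=lambda t: t[0])[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 / max(d, 1e-12)  # guard against d == 0 (assumption)
    return max(votes, key=votes.get)


def leave_one_out_accuracy(profiles, labels, distance, k):
    """Percentage of tunes whose raaga is predicted correctly from all other tunes."""
    correct = 0
    for i, x in enumerate(profiles):
        others = [(distance(x, y), labels[j])
                  for j, y in enumerate(profiles) if j != i]
        if weighted_knn_predict(others, k) == labels[i]:
            correct += 1
    return 100.0 * correct / len(profiles)
```

The weighting matters when classes are unevenly represented among the neighbors: one very close neighbor can outvote two distant neighbors of another class, which simple majority voting would not allow.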
The results in terms of percentage accuracy in raaga identification, obtained on the test dataset,
appear in Table 4.2. Two important points emerge from the comparison of accuracies across the different
types of pitch-class profiles. For all values of k except k=1, we see that P2 (the note-segmented,
duration-weighted pitch-class profile) yields the highest accuracies. This implies that note durations
play an important role in determining the relative prominence of notes in a particular raaga
realization. This is consistent with the fact that long sustained notes like dirgha swaras play a
greater role in characterizing a raaga than other functional notes which occur briefly in the
beginning, the end or in the transitions. The benefit of note segmentation is seen in the slightly
superior performance of P2 over P3 (12 bins) for k=3, 5 and 7; P2 does not consider those instants that
lie outside detected stable note regions. The second important point emerging from Table 4.2 is the
general decrease in classification accuracy with increasing bin resolution. The reverse might be
expected, given the widely held view that the specific intonation of notes within micro-intervals is a
feature peculiar to a raaga; a more carefully designed, possibly unequal, division of the octave may be
needed to observe this.
The overall best accuracy of 76.5%, which is much higher than chance for the 10-way classification
task, indicates the effectiveness of the pitch-class profile as a feature vector for raaga
identification. It is encouraging to find that a simple first-order pitch distribution provides
considerable information about the underlying raaga, although the complete validation of this aspect
can be achieved only by testing with a much larger number of raaga classes on a larger dataset.
Including the ornamentation regions in the pitch-class distribution did not help. As mentioned before,
the gamakas play an important role in characterizing the raaga, as evidenced by the performance as well
as listening practices followed. However, for gamakas to be effectively exploited in automatic
identification, it is necessary to represent their temporal characteristics, such as the actual pitch
variation with time. A first-order distribution, which discards all time sequence information, is
quite inadequate for the task.
4.5 Conclusions
A brief but comprehensive introduction to the raaga and its properties is presented. Previous raaga
recognition techniques are surveyed with a focus on their approach and contributions. Key aspects that
need to be addressed are outlined and a method which deals with a few of them is discussed. Apart from
these contributions of our work, we have also highlighted details such as the composition of the testing
dataset, and provided insights into the post-processing steps involved with pitch extraction procedure
for Carnatic music. This is the first work, to the best of our knowledge, that uses polyphonic audio
recordings in the raaga recognition task.
The transitions in gamakas are discarded, or at least not fully utilized, in the method explained. A
higher number of bins in the pitch distribution did not necessarily prove useful. Future raaga
recognition techniques can take into account the other properties of a raaga. The most important of
these are the characteristic phrases and gamakas, which suggests that temporal properties may be
usefully exploited in future work. An automatic pitch-transcription system that is as accurate as the
semi-automatic polyphonic pitch-extraction system used in our work is also necessary to scale the work
to a large number of raagas.
Chapter 5
Conclusions
Very little computational research has been carried out on Indian music, and even less on the specific
characteristics that make it so special. The few existing computational approaches to melody, discussed
in chapter 2, have focused mainly on raaga recognition. Given the number of raagas which are commonly
performed and their unique properties, the data used in the literature is not representative. Indeed,
the high accuracies reported might be due to the limited number of raagas used and the small overall
size of the datasets. Moreover, important properties of the raagas, like their specific use of gamakas,
have not been exploited, and issues beyond recognition have not been approached. As more representative
datasets are gathered, the features used will not be sufficient to discriminate the raaga classes.
Features such as pitch-class profiles and pitch-class dyad distributions capture only partial
information about the raagas; the other roles that notes play are not evident from them and need to be
exploited. Symbolic scores can also be used for building more complex models, especially to model the
characteristic melodic movements of particular raagas. It should be noted that raaga recognition is
only a starting point in modeling a raaga, and thus a lot remains to be done.
At the level of musical instruments, practically nothing has been done. Physical modeling of their
many non-linear behaviors is quite complex, and the lack of instrument standardization does not help.
Some research has been done on modeling tabla and sitar [20], and there have been a few attempts at
developing sound synthesis systems [50]. In order to obtain credible synthesized sounds, as well as to
describe performance practice, the modeling of gamakas is a bottleneck.
The variability in performances of the same song is quite large, especially due to the importance
of improvisation. The same composition sung by two artists can differ in many musical and
expressive facets. These differences may challenge the version identification methods developed for
western commercial music. In addition to the compositional forms, there are many improvisatory forms
that are performed with well-defined structural criteria [18]. Nothing has been done on these topics.
Throughout this thesis, we have mentioned a number of characteristics of Carnatic music that deserve
to be studied. Given that this music tradition is so different from the ones used to develop the current
methodologies, there is a need to also deal with some more fundamental issues. We need to study
how the musical concepts and terms in Indian music are understood, specifying proper ontologies with
which to frame our work. Also, the cultural and community aspects of the music are so important that
without studying them we will not be able to develop proper musical models. In summary, to approach
the computational modeling of Carnatic music in a way that does justice to its richness, it is
fundamental to take a cultural approach and thus take into account musicological and contextual
information.
To conclude, in this thesis, we have made the following contributions.
1. Strong theoretical arguments are presented to show that the term rasa cannot be used in the context
of music, and with the help of a behavioural study, raagas in Carnatic music are shown to evoke
feelings to a certain extent.
2. A survey of the state of the art in raaga recognition is presented, identifying the problems to be
addressed.
3. Based on an existing raaga recognition system for Hindustani music, a system with several
improvements is built for Carnatic music and tested on comprehensive real-world data.
4. Ground-truth data drawn from real stage concerts and CD recordings is contributed, making it
the most diverse and extensive dataset for Carnatic raagas to date.
5. A brief discussion of a few standing debates, such as that on the 22 srutis, is presented,
concluding with the future steps necessary to resolve such debates.
6. Several open problems in Carnatic music that can be explored computationally are discussed.
5.1 Impact of this work and the future directions
Possible applications of our work include music recommendation systems based on mood and raaga,
learning aids that help students visualize the feedback from their practice sessions, automatic
digitization and archiving of the huge amount of music data with correct metadata, and analysis of
various artistic styles.
The data used to test raaga recognition systems so far is very small compared to the hundreds
of raagas in Carnatic music. The first future step for this raaga recognition technique is to consider
a much larger and more diverse dataset of, say, 100 raagas. This step is not as obvious as it sounds.
In Carnatic music, the same kruti is often sung by different artists on several occasions, and new
krutis are not common. The dataset gets biased if we include two versions of the same kruti. To handle
this, we need to pick those sections of a rendition which are not pre-composed. Alapana and
Swarakalpana are two such sections. One possible step would be to extract such sections
programmatically from a huge pool of renditions. As the data grows, there will arise a need to exploit
the so far unexploited properties of the raaga.
Cognitive and behavioural aspects of Carnatic music need to be studied in a systematic manner.
These studies will shed light on matters such as the perception of gamakas, the extent to which
variations of a swara are normally allowed, and the association between raaga and emotion.
5.2 Few guidelines for future students/researchers
The thesis has been a huge learning activity for us. It spanned a number of domains: musicology,
cognition, music performance, signal processing, machine learning and pattern recognition. During the
course of this thesis work, we have had several experiences which helped us to get an overview of the
scientific research in Indian classical music. We would like to draw the attention of readers,
especially those who are working in this field, to a few points.
The first point concerns the flexibility of an art form. Music, being a highly celebrated art
form, is practically an endless creative domain. It is very natural to observe deviations from the
written rules. Indian classical music in particular, being an oral tradition, depends heavily on what
people perceive and pass on to the next generation. Students learn by listening to the guru sing and
perform. The only feedback is the agreement with the guru. But this does not mean that rules can be
broken. Good examples of this are the gamakas and the swaras: artists take the liberty of changing them
a little to sound good depending on the context. This is very different from the western scenario,
where music is played by reading the notation!
The second point we would like to stress is the scale used. Some artists say that it is the same as the
western equi-tempered scale. Others disagree, saying it is just-intonated. Though we have spent some
time trying to obtain the correct information, it appears that this is not that important, since a
swarasthanam is not a fixed point anyway; it is a region. Moreover, the tuning is often based on
perceptual measures rather than on objective tuning instruments. For mathematical simplicity, we have
used the equi-tempered scale in our raaga recognition system. However, we do observe that there is a
slight perceptual difference between the two tuning systems.
The third point, something which bothered us throughout, is this: does one need to be a musician to do
research related to Indian music? There is no definitive answer to this. It is always better to have
hands-on experience with something one works with. But a musician can be as good or as bad as a
non-musician at explaining the science behind it. Not being a musician should not constrain one from
approaching the domain as a researcher. In this case, a lot of listening and reading of the
musicological literature helps, as has been the case with us.
Appendix A
Basic Acoustics
Consider any object. Everything we observe about this object has an explanation. Studying its
physical aspects allows us to know how it behaves in response to various actions. Sound is one such
response of objects to actions, whether natural or our own. Of course, not all sounds qualify as music.
But the physical properties we study about an object are the same whether the sound it produces is
music or noise.
Pluck a string and observe the wave pattern generated. It looks like the complex wave pattern in
Figure A.1. Our vocal cords are no different, except that they are made of muscle.
Figure A.1: Wave pattern generated by a plucked string
A.1 Demonstration of various physical properties
Like any other object, sound also has certain physical properties. Though we are intuitively very
sensitive to them, we do not always realize them. To be able to appreciate this fact better, do the
following experiments, and see how the sound changes compared to the default case.
1. Stretch the string keeping the length of the string between the two points the same.
2. Vary the length keeping the tension in the string the same. i.e., hold it at different points to vary
the length without stretching or letting it go slack.
3. Pluck it with force.
4. Change the string. If it is rubber, now consider a brass/other-material string.
See if your observations concur with the following; they probably should.
1. The sound is sharper, like your younger sibling’s shrill cry.
2. The sound is flat compared to the original one.
3. It is louder.
4. It is a different sound altogether, though sharpness/flatness may be the same/different compared
to the original one.
Now, we’ll learn the physical properties which are involved in these observations. Any sound is caused
because the source vibrates and the vibrations are carried to our eardrum, which vibrates in sympathy.
As a result of the vibrations of the source, alternating high and low pressure regions are created
next to it. The particles of the medium, typically air, are set in motion and disturb their adjacent
particles in turn. In this way the whole region around the source is set to vibrate. Observe the waves
shown in Figure A.2.
Figure A.2: Condensation and rarefaction represented as a sine wave
A.2 Sine waves
The wave in Figure A.2 can be represented as a series of crests and troughs, as shown. In mathematical
terminology, it is called a sine wave. It has certain physical properties. It is periodic, i.e., it is
nothing but a pattern that is repeated over and over. Each such repetition is called a cycle. The
number of cycles passing through a point in a second is called the frequency (f). The time it takes
for a cycle to pass through a point is called the time period (T). The distance between corresponding
points on two adjacent cycles, for example the crests of two adjacent cycles, is called the
wavelength (λ). The height of the wave is called the amplitude (A). See Figure A.2.
The frequency is the factor which we perceive when we observe a tone to be sharper or flatter.
The higher the frequency, the sharper the sound. The time period is inversely proportional to the
frequency (T = 1/f), and so is the wavelength for a given wave velocity. The amplitude determines the
perceived volume or loudness: the larger the amplitude, the louder the sound. See Figure A.3, for
example.
Figure A.3: Examples of sine waves with high and low frequencies
Normally when we speak or sing, or when an instrument is being played, these factors keep changing
and we perceive them collectively as noise, speech or music.
A.3 Harmonics
But we have one more question left to be answered: why do two sounds produced by different
materials sound different when all these factors are made equal? Let us experiment again. Set a string
to vibrate and observe it. It might be difficult to see with the naked eye, but the vibration does not
look like a perfect single sine wave. As said earlier, it is a complex wave pattern. When a string is
plucked, there are various modes in which it can vibrate. See Figure A.4. These are called harmonics;
the frequency of the wave whose wavelength is λ1 is called the 1st harmonic or the fundamental.
Figure A.4: Possible harmonics in a given string
Now let’s see Figure A.1 again. When we pluck a string, we can observe it vibrating not exactly
like a simple sine wave, but something more complex. This complexity arises from the mixture of other
harmonics with the fundamental. Given that the two ends of the vibrating string are fixed, the length
and other physical properties of the string will have a crucial effect on the properties of the wave
generated with it. A string of length L can only have waves of wavelength 2L, 2L/2, 2L/3, 2L/4, etc.
Frequencies corresponding to the fundamental and the second, third and fourth harmonics are given in
Table A.1, where v is the velocity with which the wave travels and f1 = v/2L is the fundamental
frequency.
\[ \text{Frequency} = \frac{\text{velocity}}{\text{wavelength}} \tag{A.1} \]
Table A.1: Fundamental and its harmonics
Harmonic                               Frequency   In terms of f1
Fundamental or first harmonic          v/2L        f1
Second harmonic or first overtone      2v/2L       2f1
Third harmonic or second overtone      3v/2L       3f1
Fourth harmonic or third overtone      4v/2L       4f1
nth harmonic or (n-1)th overtone       nv/2L       nf1
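The entries of Table A.1 follow directly from Eq. A.1 and the allowed wavelengths 2L/n. As a small sketch, the function below tabulates them for a given string; the function name is hypothetical, and the string length and wave velocity used in the usage note are purely illustrative values.

```python
def string_harmonics(length_m, velocity_m_s, n_harmonics=4):
    """Return (n, wavelength, frequency) for a string fixed at both ends.

    The nth harmonic has wavelength 2L/n and frequency n*v/(2L).
    """
    rows = []
    for n in range(1, n_harmonics + 1):
        wavelength = 2.0 * length_m / n
        rows.append((n, wavelength, velocity_m_s / wavelength))
    return rows
```

For instance, `string_harmonics(0.65, 286.0)` describes a hypothetical string whose fundamental is close to 220 Hz, with the higher harmonics at integer multiples of it.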
A.4 Timbre
Because the force between molecules varies according to the material, the nature of wave
propagation is affected. In turn, the harmonic pattern is affected. Thus, the nature of these harmonics
varies from material to material. It is this property that distinguishes sound waves of varied origins
(e.g., rubber and steel). It is called the timbre. This is also one of the important reasons why we are
able to discriminate between two persons speaking.
Let us now look at the frequency measures in use.
A.5 Frequency measures
Frequency is measured in Hertz, and intervals between frequencies are measured in cents. Hertz is a
linear scale measure, whereas cents is used to measure the interval between two frequency values on a
logarithmic scale. The interval in cents is calculated using the following formula.
\[ C = k \, \log_{10}\!\left(\frac{a}{b}\right) \tag{A.2} \]
Here C is the value in cents, a and b are the frequencies, and k is a constant whose value is
1200 / \log_{10}(2).
For calculation purposes, if one has to deal with more than one octave, which is usually the case,
it is often advisable to use cents. Let us look at a quick example. Consider two octaves, one from
10 Hz to 20 Hz and another from 1000 Hz to 2000 Hz. If we take the absolute differences between the
fifth note and the tonic, they would be (10 × 3/2 − 10 = 5 Hz) and (1000 × 3/2 − 1000 = 500 Hz)
respectively. But if we take the interval in cents, they both will be the same.
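\[ D_1 = \frac{1200}{\log_{10}(2)} \, \log_{10}\!\left(\frac{15}{10}\right) \approx 701.96 \text{ cents} \]
\[ D_2 = \frac{1200}{\log_{10}(2)} \, \log_{10}\!\left(\frac{1500}{1000}\right) \approx 701.96 \text{ cents} \]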
The cents measure will also enable us to know the number of octaves between two frequency values.
Each octave has an interval of 1200 cents. So, for example, the number of octaves between 12 Hz and
16400 Hz would be,
\[ \text{Number of octaves} = \frac{\frac{1200}{\log_{10}(2)} \, \log_{10}(16400/12)}{1200} = \frac{\log_{10}(16400/12)}{\log_{10}(2)} \approx 10.42 \text{ octaves.} \]
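The calculations above can be checked with a few lines of Python. This is a small sketch with function names of our own choosing; it uses the base-2 logarithm directly, which is equivalent to Eq. A.2 because 1200 log10(a/b) / log10(2) = 1200 log2(a/b).

```python
import math


def interval_cents(a_hz, b_hz):
    """Interval from b_hz up to a_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(a_hz / b_hz)


def octaves_between(low_hz, high_hz):
    """Number of octaves separating two frequencies."""
    return interval_cents(high_hz, low_hz) / 1200.0
```

Here `interval_cents(15, 10)` and `interval_cents(1500, 1000)` give the same value, which is the point of the example: equal frequency ratios map to equal intervals in cents regardless of the octave.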
This knowledge should be sufficient to understand the chapters in this thesis. We recommend that the
reader refer to relevant books [30] for other concepts if and when they are required. We will now
introduce the scales used in Carnatic music and then understand raaga and its properties.
A.6 Tuning systems
A.6.1 Equal-temperament
This is a western standard. It is preferred mostly due to its mathematical simplicity. In this scale, all
the 12 notes in the scale are equally spaced. That is, the ratio between 2nd and 1st note is the same as
the ratio between 3rd and 2nd note. We can derive the value corresponding to this ratio. Say x_i is the
ith frequency value.
\[ k = \frac{x_2}{x_1} = \frac{x_3}{x_2} = \frac{x_4}{x_3} = \dots = \frac{x_{12}}{x_{11}} = \frac{2 x_1}{x_{12}} \]
\[ x_2 = k x_1,\quad x_3 = k x_2,\quad \dots,\quad x_{12} = k x_{11},\quad \text{and}\quad 2 x_1 = k x_{12} \]
\[ \text{So, } x_{12} = k^{11} x_1 \]
\[ \text{Hence } k x_{12} = k^{12} x_1 \]
\[ 2 x_1 = k^{12} x_1 \]
\[ k = \sqrt[12]{2} \]
So, in the equi-tempered scale, each subsequent note is obtained by multiplying the current note with
k, where k = \sqrt[12]{2}.
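The derivation can be illustrated numerically with the sketch below, which builds the note frequencies of one octave by repeated multiplication with k. The function name is hypothetical, and the 220 Hz tonic in the usage note is chosen only because it is the reference used elsewhere in this thesis.

```python
def equal_tempered_scale(tonic_hz, n_notes=12):
    """Frequencies of the n_notes notes of one octave plus the octave itself."""
    k = 2.0 ** (1.0 / n_notes)  # twelfth root of 2 for the usual 12-note scale
    return [tonic_hz * k ** i for i in range(n_notes + 1)]
```

For example, `equal_tempered_scale(220.0)` returns 13 values, the last of which is twice the tonic, and every pair of adjacent values has the same ratio k.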
A.6.2 Just-intonation
On the other hand, the just-intonation tuning system uses ratios of small integers as intervals
between two notes of the scale. There are a number of ways to tune using small integer ratios. But the
essence of this tuning method is to use these ratios, instead of a geometric progression, to find the
intervals.
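To give a feel for how far the two systems differ, the sketch below compares a few intervals in cents. The ratios 5/4, 4/3 and 3/2 are one commonly quoted choice of small-integer ratios and are used here purely as an illustrative assumption; as noted above, there are several ways to tune with such ratios. The names in the code are our own.

```python
import math
from fractions import Fraction

# Illustrative small-integer ratios (an assumption, not a prescribed tuning)
JUST_RATIOS = {
    "major third": Fraction(5, 4),
    "fourth": Fraction(4, 3),
    "fifth": Fraction(3, 2),
}
# Number of equi-tempered semitones for the same intervals
ET_SEMITONES = {"major third": 4, "fourth": 5, "fifth": 7}


def ratio_to_cents(ratio):
    """Size of a frequency ratio in cents."""
    return 1200.0 * math.log2(float(ratio))


def compare_tunings():
    """Map interval name to (just-intonation cents, equi-tempered cents)."""
    return {name: (ratio_to_cents(r), 100.0 * ET_SEMITONES[name])
            for name, r in JUST_RATIOS.items()}
```

With these ratios the fifth differs from its equi-tempered counterpart by about 2 cents and the major third by roughly 14 cents.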
Bibliography
[1] P. S. R. Apparao. Natyasastramu. Natyamala Publications, 1959.
[2] L. L. Balkwill and W. F. Thompson. A cross-cultural investigation of the perception of emotion in music:
Psychophysical and cultural cues. Music Perception, 17(1):43–64, 1999.
[3] L. L. Balkwill, W. F. Thompson, and R. Matsunaga. Recognition of emotion in Japanese, Western, and
Hindustani music by Japanese listeners. Japanese Psychological Research, 46(4):337–349, 2004.
[4] S. Belle, R. Joshi, and P. Rao. Raga Identification by using Swara Intonation. Journal of ITC Sangeet
Research Academy, 23, 2009.
[5] V. N. Bhatkande. Hindusthani Sangeet Paddhati. Sangeet Karyalaya, 1934.
[6] P. Chordia. Segmentation and recognition of tabla strokes. In Proc. of ISMIR, pages 107–114, 2005.
[7] P. Chordia, A. Albin, A. Sastry, and T. Mallikarjuna. Multiple viewpoints modeling of tabla sequences. In
International Conference on Music Information Retrieval, pages 381–386, 2010.
[8] P. Chordia and A. Rae. Raag recognition using pitch-class and pitch-class dyad distributions. In Proc. of
ISMIR, pages 431–436, 2007.
[9] P. Chordia and A. Rae. Understanding emotion in raag: An empirical study of listener responses. Computer
Music Modeling and Retrieval, pages 110–124, 2008.
[10] P. Chordia and A. Rae. Tabla Gyan : An Artificial Tabla Improviser. In International Conference on
Computational Creativity, 2010.
[11] M. Clayton. Time in Indian Music : Rhythm , Metre and Form in North Indian Rag Performance. Oxford
University Press, 2000.
[12] D. Das and M. Choudhury. Finite State Models for Generation of Hindustani Classical Music. In Proceed-
ings of International Symposium on Frontiers of Research in Speech and Music, 2005.
[13] A. Datta, R. Sengupta, N. Dey, and D. Nag. Experimental Analysis of Shrutis from Performances in Hin-
dustani Music. Scientific Research Department, ITC Sangeet Research Academy, 2006.
[14] P. Ekman. An argument for basic emotions. Cognition & Emotion, 6(3):169–200, 1992.
[15] G. Geekie. Carnatic ragas as music information retrieval entities. In Proc. of ISMIR, pages 257–258, 2002.
[16] O. Gillet and G. Richard. Automatic labelling of tabla signals. In Proc. of ISMIR, 2003.
[17] E. Hanslick, G. Cohen, and M. Weitz. The beautiful in music. Liberal Arts Press, New York, 1957.
[18] S. R. Janakiraman. Essentials of Musicology in South Indian Music. The Indian Music Publishing House,
2008.
[19] P. Juslin. Music and Emotion: Theory and Research. Oxford University Press, November 2001.
[20] A. Kapur, P. Davidson, P. Cook, W. Schloss, and P. Driessen. Preservation and extension of traditional
techniques:digitizing north indian performance. Journal of New Music Research, 34(3):227–236, 2005.
[21] G. K. Koduri and B. Indurkhya. A Behavioral Study of Emotions in South Indian Classical Music and its
Implications in Music Recommendation Systems. In SAPMIA, ACM Multimedia, pages 55–60, 2010.
[22] A. Krishnaswamy. On the twelve basic intervals in South Indian classical music. October 2003.
[23] A. Krishnaswamy. Inflexions and Microtonality in South Indian Classical Music. In Frontiers of Research
on Speech and Music, 2004.
[24] A. Krishnaswamy. Melodic atoms for transcribing carnatic music. In Proc. of ISMIR, pages 345–348, 2004.
[25] A. Krishnaswamy. Multi-Dimensional Musical Atoms in South Indian Classical Music. In Proc. of the
International Conference of Music Perception & Cognition, 2004.
[26] C. L. Krumhansl. Reasoning about naming systems. Canadian Journal of Experimental Psychology,
51(4):336–353, 1997.
[27] S. K. Langer. Philosophy in a new key: a study in the symbolism of reason, rite, and art. Mentor Books,
New York, 1959.
[28] K. Lee. Automatic chord recognition from audio using enhanced pitch class profile. In Proc. of the Inter-
national Computer Music Conference, 2006.
[29] M. Levy and N. A. Jairazbhoy. Intonation in North Indian Music: A Select Comparison of Theories with
Contemporary Practice. Aditya Prakashan, New Delhi, 1982.
[30] G. Loy. Musimathics: the mathematical foundations of music. Vol. II. With a foreword by John Chowning.
Cambridge, MA: MIT Press, 2007.
[31] G. Pandey, C. Mishra, and P. Ipe. Tansen: A system for automatic raga identification. In Proc. of Indian
International Conference on Artificial Intelligence, pages 1350–1363, 2003.
[32] H. S. Powers. The Background of the South Indian Raaga-System. PhD thesis, Princeton University, 1959.
[33] Pratyush. Analysis and Classification of Ornaments in North Indian (Hindustani) Classical Music. Master’s
thesis, University of Pompeu Fabra, 2010.
[34] C. V. Raman. The Indian musical drums. Proceedings Mathematical Sciences, 1(3):179–188, 1934.
[35] N. Ramanathan. Shrutis according to ancient texts. Journal of the Indian Musicological Society, 12(3):31–
37, 1981.
[36] A. Rangacharya. The Natyasastra. Munshiram Manoharlal Publishers, 2010.
[37] V. Rao and P. Rao. Vocal melody extraction in the presence of pitched accompaniment in polyphonic music.
Audio, Speech, and Language Processing, IEEE Transactions on, 18(8):2145–2154, 2010.
[38] E. Rosch. Principles of Categorization. John Wiley & Sons Inc, 1978.
[39] J. A. Russell. A circumplex model of affect. Journal of Personality and Social Psychology, 39:1161–1178,
1980.
[40] H. Sahasrabuddhe and R. Upadhye. On the computational model of raag music of India. In Workshop on AI
and Music: European Conference on AI, 1992.
[41] P. Sambamoorthy. South Indian Music. The Indian Music Publishing House, 1998.
[42] L. A. Schmidt and L. J. Trainor. Frontal brain electrical activity (EEG) distinguishes valence and intensity of
musical emotions. Cognition & Emotion, 15(4):487–500, 2001.
[43] V. Shankar. The art and science of Carnatic music. Music Academy Madras, Chennai, 1983.
[44] M. Sharma. Tradition of Hindustani Music. A.P.H Publishing Corporation, 2006.
[45] P. Sharma and K. Vatsyayan. Brihaddeshi of Sri Matanga Muni. South Asian Books, 1992.
[46] S. Shetty and K. Achary. Raga Mining of Indian Music by Extracting Arohana-Avarohana Pattern. In
International Journal of Recent Trends in Engineering, volume 1, pages 362–366. Academy Publisher, 2009.
[47] M. Sinith and K. Rajeev. Hidden Markov Model based Recognition of Musical Pattern in South Indian
Classical Music. In IEEE International Conference on Signal and Image Processing, Hubli, India, 2006.
[48] J. A. Sloboda. Music Structure and Emotional Response: Some Empirical Findings. Psychology of Music,
19(2):110–120, 1991.
[49] R. Sridhar and T. Geetha. Raga identification of Carnatic music for music information retrieval. International
Journal of Recent Trends in Engineering, 1(1):571–574, 2009.
[50] M. Subramanian. Synthesizing Carnatic Music with a Computer. Sangeet Natak (Journal of Sangeet Natak
Akademi), New Delhi, 133–134(June):16–24, 1999.
[51] M. Subramanian. Carnatic Ragam Thodi Pitch Analysis of Notes and Gamakams. Journal of the Sangeet
Natak Akademi, XLI(1):3–28, 2007.
[52] A. Swartz. MusicBrainz: A semantic web service. IEEE Intelligent Systems, 17:76–77, January 2002.
[53] D. Swathi. Analysis of Carnatic Music : A Signal Processing Perspective. Master’s thesis, IIT Madras,
2009.
[54] L. J. Trainor and L. A. Schmidt. Processing emotions induced by music. In The cognitive neuroscience of
music, pages 311–324, 2003.
[55] T. Viswanathan and M. H. Allen. Music in South India. Oxford University Press, 2004.
[56] A. Wieczorkowska, A. Datta, R. Sengupta, N. Dey, and B. Mukherjee. On Search for Emotion in
Hindusthani Vocal Music. Advances in Music Information Retrieval, pages 285–304, 2010.
[57] G. Wood and S. O’Keefe. On techniques for content-based visual annotation to aid intra-track music
navigation, 2005.
[58] R. J. Zatorre, A. C. Evans, and E. Meyer. Neural mechanisms underlying melodic perception and memory
for pitch. Journal of Neuroscience, 14(4):1908, 1994.