Analytics of the Mahabharata

Upload: pratap-vardhan

Post on 12-Feb-2017

TRANSCRIPT

Page 1: Analytics of the Mahabharata

MAHABHARATA – TEXTUAL ANALYSIS PRATAP VARDHAN [email protected]

1.8+ million words, 18 Parvas (books), 1800+ chapters, 80+ actors, 1.3+ lakh sentences, 14300+ common nouns, 13200+ proper nouns, 5000+ verbs, 1600+ adjectives and 1000+ adverbs from The Mahabharata of Krishna-Dwaipayana Vyasa were analysed and visualized using Python, R and d3.js.

A word cloud is constructed from the entire Mahabharata text. The size of each word is proportional to the number of times it occurs in the text. This gives us an overview of the vocabulary used.
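The counting behind a word cloud can be sketched in a few lines of Python. This is only an illustration: the tokenizing rule, the function names `word_frequencies` and `cloud_sizes`, and the sample sentence are assumptions, not the code used for the slides.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each lowercase word occurs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def cloud_sizes(freqs, max_size=100.0):
    """Scale counts so the most frequent word gets max_size (hypothetical unit)."""
    if not freqs:
        return {}
    top = max(freqs.values())
    return {word: max_size * count / top for word, count in freqs.items()}

# Made-up sample sentence, not a quotation from the text.
sample = "the king said to the king of kings"
freqs = word_frequencies(sample)
sizes = cloud_sizes(freqs)
```

A real word cloud would usually drop very common stop words before sizing; that step is left out here.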

A word cloud of the primary characters and their occurrences reveals the main characters involved in the story. As you can see, Yudhishthira, Bhima, Arjuna, Krishna, Bhishma, Karna, Vaisampayana, Duryodhana, Pandu, Drona, Dhritarashtra, Sanjaya and Kunti are the main ‘actors’.

© 2013, Pratap Vardhan

Page 2: Analytics of the Mahabharata

SENTIMENT ANALYSIS

Sentiment Analysis of The Mahabharata: Positive Sentiment | Negative Sentiment (1.3+ Lakh Sentences)

Shanti Parva: Longest Parva, 12th Book

Sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. So, we algorithmically determine if a sentence is positive (“I like going to movies”) or negative (“I hate watching horror movies”). Neutral sentences (“I am sitting here in the hall”) are given a zero score.
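One simple way to score a sentence is to count words from positive and negative word lists. The slides do not say which method was used, so the sketch below is an assumption: the tiny lexicon and the name `sentence_score` are hypothetical.

```python
import re

# Hypothetical toy lexicon; a real analysis would need a far larger word list.
POSITIVE = {"like", "love", "good", "happy"}
NEGATIVE = {"hate", "bad", "cruel", "sad"}

def sentence_score(sentence):
    """Return (#positive words - #negative words); 0 means neutral."""
    words = re.findall(r"[a-z]+", sentence.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return positive - negative

scores = [
    sentence_score("I like going to movies"),
    sentence_score("I hate watching horror movies"),
    sentence_score("I am sitting here in the hall"),
]
```

With this toy lexicon the three example sentences score positive, negative and zero respectively. A word-list approach like this ignores negation ("not good"), which is one reason real systems are more involved.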

These visualizations show a moving average of the data to provide a coherent story. Raw sentence-by-sentence sentiment is too noisy to give a clear picture by eye.

To show the ups and downs of sentiment more clearly, Shanti Parva alone is plotted in this case.
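The smoothing step can be sketched as a simple moving average over per-sentence scores. The window length is not stated in the slides, so `window` below is an assumed parameter, and the scores are made-up values.

```python
def moving_average(values, window):
    """Simple moving average; returns one value per full window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < window:
        return []
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages

# Made-up sentence scores: three positive sentences, then three negative.
smoothed = moving_average([1, 1, 1, -1, -1, -1], window=3)
```

A larger window gives a smoother curve at the cost of hiding short swings.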

Page 3: Analytics of the Mahabharata

ACTORS - NETWORKS

Below is an adjacency matrix heatmap, which measures the closeness between two actors by counting the number of times the two appear within a range of 50 words of each other. The darker the colour, the higher the closeness between the two actors. This map helps us see visually whether any local communities or groups appear. The top 10 actors appear in the top rows of the heatmap.
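The co-occurrence count can be sketched as follows. The exact windowing rule is not given in the slides; this version assumes that two mentions count as close when their word positions differ by at most `window`, and the function name `cooccurrence` and the sample text are hypothetical.

```python
import re
from collections import Counter

def cooccurrence(text, actors, window=50):
    """Count pairs of different actors mentioned within `window` words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    names = {actor.lower() for actor in actors}
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in names]
    counts = Counter()
    for idx, (pos_a, a) in enumerate(mentions):
        for pos_b, b in mentions[idx + 1:]:
            if pos_b - pos_a > window:
                break  # later mentions are even further away
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return counts

# Made-up text: two close pairs separated by 60 filler words.
text = "Arjuna spoke to Krishna. " + "word " * 60 + "Bhima fought Karna."
pairs = cooccurrence(text, ["Arjuna", "Krishna", "Bhima", "Karna"])
```

The resulting pair counts are the cell values of a symmetric adjacency matrix.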

We plot the adjacency matrix as a weighted network of nodes, where the size of each node is proportional to the sum of the weights of its connections. The links between the nodes are weighted too. We then try to identify hidden clusters in the network by measuring its modularity. Three different colours are used to identify the groups.
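Both quantities can be sketched in plain Python: node size as the sum of link weights, and modularity as a score for a given grouping. The code only scores a partition supplied by hand; it does not search for the best one, and the slides do not say which community-detection tool was used. The graph and all names are hypothetical.

```python
from collections import defaultdict

def node_strengths(edges):
    """Sum of link weights at each node; edges maps (a, b) -> weight."""
    strength = defaultdict(float)
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return dict(strength)

def modularity(edges, community):
    """Newman modularity Q of a partition of a weighted undirected graph."""
    total = sum(edges.values())  # m: total link weight
    if total == 0:
        return 0.0
    inside = defaultdict(float)    # link weight inside each community
    strength = defaultdict(float)  # summed node strength per community
    for (a, b), weight in edges.items():
        if community[a] == community[b]:
            inside[community[a]] += weight
    for node, s in node_strengths(edges).items():
        strength[community[node]] += s
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)

# Toy graph: two triangles joined by one link, all weights 1.
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 1}
groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q = modularity(edges, groups)
sizes = node_strengths(edges)
```

A higher Q means more link weight falls inside groups than a random wiring with the same node strengths would give.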

The Top 10 well connected actors of Mahabharata

Yudhishthira is at the centre of the network. Pandu is closer to Arjuna than to any of his other sons. Dhritarashtra is central to Kunti and Pandu. Drona and Kripa, Nakula and Sahadeva, and Dhritarashtra and Vidura are close to each other and in one group. Bhima is comfortably surrounded by at least one member of each group.

Page 4: Analytics of the Mahabharata

CORRELATIONS - GROUPS

The previous data is used to create a hierarchical tree, which places actors who are closer to each other on one branch. Each actor starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. The results of the hierarchical clustering are presented in a dendrogram along with a heatmap.
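The bottom-up merging can be sketched as below. The slides do not name the linkage rule or how closeness counts were turned into distances, so average linkage and the toy distances are assumptions, and `agglomerate` is a hypothetical name.

```python
def agglomerate(distance, labels):
    """Average-linkage agglomerative clustering.

    distance maps frozenset({a, b}) -> distance between two labels.
    Returns the merges in order as (cluster_1, cluster_2, distance).
    """
    clusters = [(label,) for label in labels]  # each item starts alone
    merges = []

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(distance[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        d, i, j = min((linkage(c1, c2), i, j)
                      for i, c1 in enumerate(clusters)
                      for j, c2 in enumerate(clusters) if i < j)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Toy distances: A-B and C-D are close, every other pair is far apart.
labels = ["A", "B", "C", "D"]
distance = {frozenset(p): 10.0 for p in
            [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]}
distance[frozenset(("A", "B"))] = 1.0
distance[frozenset(("C", "D"))] = 2.0
merges = agglomerate(distance, labels)
```

The merge list is exactly what a dendrogram draws: each merge becomes a join, at a height equal to the merge distance.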

The dendrogram is partitioned into two main clusters. The first cluster has the top 10 actors.

From cluster one, we see that Bhishma is closer to Yudhishthira. Krishna is closer to Bhima and Arjuna, and the three belong to one group.

From cluster two, we see that Nakula is closer to Sahadeva. Dhrishtadyumna and Satyaki are also close to the same group.

Cluster two provides interesting results: secondary actors interact a lot with a few specific secondary actors.

Page 5: Analytics of the Mahabharata

CLUSTERS - SENTIMENTS

This cluster dendrogram represents a complete drill-down of the hierarchy of 80 actors from the text. These actors are primarily clustered into four groups. We tried to analyse the positive and negative sentiments of the actors from the first and second groups across the 18 parvas.

Secondary actors are hardly present in Shanti Parva. Sanjaya has a strong polarity of sentiments in Drona Parva and Karna Parva. Satyaki and Dhrishtadyumna are vocal in Drona Parva. Kunti is active throughout, whereas Vaisampayana is active at the beginning and at the end. Sahadeva and Nakula are relatively more positive in the given context.

Most actors in this group are highly vocal in Drona Parva, Karna Parva and Shalya Parva. Interestingly, in Shanti Parva (the longest parva) only Bhishma and Yudhishthira are highly vocal with polar sentiments. Bhima is highly vocal with both positive and negative sentiments throughout. Of all the actors, Bhishma seems to be neutral in Drona Parva.

Page 6: Analytics of the Mahabharata

BUBBLE PACKING - NORMALIZED DATAMAPS

The packed bubbles here show the top 5 words from each book, one word per bubble. The size of each bubble is proportional to the word's count.

The figure above sizes the bubbles by raw count. The one below uses the normalized word count.

Here we see the occurrence of the overall top 10 words across the 18 books. Each row is normalized by its maximum value.

In this normalized map, we calculate the number of times an actor appears relative to the others in a given book. This map helps us identify ‘secondary’ characters that are important in a given book.
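The two normalizations can be sketched as below. Reading "relative to others in a given book" as each actor's share of all actor mentions in that book is an assumption, and the table values and function names are hypothetical.

```python
def normalize_rows_by_max(table):
    """Divide each row (one word across books) by that row's maximum."""
    result = {}
    for key, counts in table.items():
        top = max(counts) if counts else 0
        result[key] = [c / top if top else 0.0 for c in counts]
    return result

def share_within_book(table):
    """Divide each count by the column total (all rows in that book)."""
    totals = [sum(column) for column in zip(*table.values())]
    return {key: [c / t if t else 0.0 for c, t in zip(counts, totals)]
            for key, counts in table.items()}

# Made-up counts for two rows across three books.
table = {"x": [2, 4, 0], "y": [6, 4, 0]}
by_row = normalize_rows_by_max(table)
by_book = share_within_book(table)
```

Row normalization shows where each word peaks regardless of how common it is overall, while the per-book share highlights who dominates a particular book.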

The word ‘battle’ occurs at a high frequency in Shalya, Karna, Drona and Bhishma Parvas and doesn’t have much presence in other books.

The words ‘thee’ and ‘like’ appear regularly throughout.

The high-frequency word ‘son’ peaks in the middle.

We find that some actors have a prominent presence in a given parva and hardly appear anywhere else. For example, Virata, Uttara and Kichaka in Virata Parva; Nala, Ravana, Markandeya and Damayanti in Vana Parva; and Bhrigu, Surya, Vasishtha, Kasyapa etc. in Shanti and Anusasana Parvas.

Page 7: Analytics of the Mahabharata

SENTENCE TREE ANALYSER

The Sentence Tree Analyser lets you pick a word or phrase and shows you all the different contexts in which it appears in a particular parva or in the entire Mahabharata. The contexts are arranged in a tree-like branching structure to reveal recurrent themes and phrases.
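The branching structure can be sketched as a nested dictionary of the words that follow each occurrence of the search term. This is a simplified illustration, not the analyser itself: it ignores sentence boundaries, handles single-word terms only, and `build_word_tree`, `depth` and the sample text are hypothetical.

```python
import re

def build_word_tree(text, term, depth=3):
    """Nested dict of the words following `term`, with occurrence counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tree = {}
    for i, token in enumerate(tokens):
        if token != term:
            continue
        node = tree
        # Walk down the tree along the next `depth` words of this context.
        for following in tokens[i + 1:i + 1 + depth]:
            child = node.setdefault(following, {"count": 0, "next": {}})
            child["count"] += 1
            node = child["next"]
    return tree

# Made-up sample text, not a quotation from the Mahabharata.
sample = "lord of the earth. lord of men. lord krishna"
tree = build_word_tree(sample, "lord", depth=2)
```

Contexts that share a prefix share a branch, so a thicker branch (higher count) marks a recurrent phrase.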

Here we show three trees from the 2nd Parva, using the search terms ‘lord’, ‘fortune’ and ‘cruel’.

If you would like to know more, drop a mail at [email protected]