data mining in social media
By: Anthony Smith & Joey Fazzani

Outline: Welcome, Objectives, Data Mining, Social Media, Scale, Mining Social Media, Data Collection, Apriori Algorithm, K-Means Algorithm, Uses and benefits, Summary
Welcome to our presentation on data mining in social media.

Objectives
Things to learn:
Sentiment analysis
Use of algorithms
Uses and benefits of mining social media

Here are just a few things that we hope you learn within the next 5 minutes: sentiment analysis, the use of algorithms, and the uses and benefits of mining social media.

http://findicons.com/search/data-mining
What is Data Mining?
Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both.
http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
Here is a definition of data mining.
(Pause for 5 seconds to allow users to read)
To summarise: data mining is used to process and analyse data so that meaningful use can be made of it.
What is Social Media?
Social media is digital content and interaction that is created by and between people.
http://heidicohen.com/social-media-definition/
http://www.417marketing.com/wp-content/uploads/2013/08/Social-Media.jpg
Scale
Interaction:
Users with users
Users with content
http://www.reachsolutions.co.nz/services/social-media-marketing
Social media ranges from YouTube to Facebook to virtual worlds and games such as World of Warcraft.
As well as showing how users interact with one another, it also shows how users interact with the content of the web.
Social media has billions of users on hundreds of sites, leaving huge traces of human activity.
By using data mining and machine learning to reduce the noise, we can make this data meaningful.
Mining Social Media
What data to extract?
Enough to be meaningful
Needs to be scoped
Within system capability
http://www.dundas.com/blog-post/the-perils-of-big-data/
http://www.hardcorehockey.co.uk/article/on-the-pitch/warm-up/warm-up-arm-swings
Continuously changing
As data changes, ACT
When mining social media we must consider how big and dynamic the data is. The data set must be large enough to be meaningful, scoped to meet requirements, and within the system's capabilities.
Data is constantly changing and being updated, so this must also be considered.
Data Collection
Twitter
Unstructured data
Tweets: long string of text
Author
Text of tweet
Hashtag
Etc.
Break down into columns.

As a practical example we will use Twitter.
As with many items on the web, the data is unstructured.
We need to extract each tweet as a long string of text. This will include information such as the author, the text of the tweet and the hashtag. The string must be broken down into these sections.
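The breakdown into columns could look like the following minimal sketch. It assumes each raw tweet arrives as one string made of the author, a tab character, and then the tweet text; that layout and the names `parse_tweet` and `raw` are assumptions made for illustration, not something the slides specify.

```python
import re


def parse_tweet(raw):
    """Split one raw tweet string into author, text and hashtag columns.

    Assumed (hypothetical) input layout: "<author>\t<tweet text>".
    """
    author, _, text = raw.partition("\t")
    # Hashtags are taken to be '#' followed by word characters.
    hashtags = re.findall(r"#(\w+)", text)
    return {"author": author.strip(), "text": text.strip(), "hashtags": hashtags}


row = parse_tweet("anthony\tLove my new iPhone #happy")
print(row["author"], "|", row["text"], "|", row["hashtags"])
```

Each returned dictionary corresponds to one row, with one key per column.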
Data Collection (2)
Enter key words to search for
Enter duration
iPhone, iOS
Android
BlackBerry

You can scrape websites looking for key words, and specify how long the search should run.
We can set key words, for example iPhone, Android, BlackBerry.

Data Collection (3)
Sentiment analysis
Look in text field for:
Good: love, great, etc.
Bad: hate, doubt, etc.
Love my new iPhone #happy
Hate my new iPhone #broken

We can complete a sentiment analysis, which looks for predefined terms that are associated with good or bad sentiments.
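A term-lookup sentiment check of this kind might be sketched as follows. The word lists contain only the four terms named on the slide; the `neutral` label and the tie-breaking rule are assumptions added so the function always returns something.

```python
import re

# Predefined terms from the slide; a real list would be much longer.
GOOD_TERMS = {"love", "great"}
BAD_TERMS = {"hate", "doubt"}


def sentiment(text):
    """Label a tweet 'good', 'bad' or 'neutral' by counting predefined terms."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    good = len(words & GOOD_TERMS)
    bad = len(words & BAD_TERMS)
    if good > bad:
        return "good"
    if bad > good:
        return "bad"
    return "neutral"  # assumption: no terms, or an equal number of each


print(sentiment("Love my new iPhone #happy"))
print(sentiment("Hate my new iPhone #broken"))
```

A lookup like this ignores negation and sarcasm ("not great" would still count as good), so it is only a rough first pass.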
"I love my new iPhone" would be associated with good because it contains the good-sentiment term "love". "I hate my new iPhone" would be associated with bad because it contains the bad-sentiment term "hate".

Data Collection (4)
Wealth of information
In 3 min = 398 tweets
398 * 20 = 7,960 per hour
7,960 * 24 = 191,040 per day
Could be GB, TB or PB of information
http://www.youtube.com/watch?v=Jqq66INlQ0U

In 3 minutes, 398 tweets were submitted containing the keywords iPhone, Android or BlackBerry.
At the same rate of user input, this is equivalent to 191,040 tweets a day.
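The extrapolation above can be checked with a few lines of code. This is only a sketch: the function name `extrapolate` is made up here, and it assumes the sampling window divides an hour evenly and that the tweet rate stays constant all day.

```python
def extrapolate(tweet_count, window_minutes):
    """Scale a count seen in a short window up to hourly and daily totals."""
    windows_per_hour = 60 // window_minutes  # 20 three-minute windows per hour
    per_hour = tweet_count * windows_per_hour
    per_day = per_hour * 24
    return per_hour, per_day


per_hour, per_day = extrapolate(398, 3)
print(per_hour, per_day)  # 7960 191040
```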
The Apriori Algorithm
Used to find association rules
Especially significant in customer transactions

Customer   Items bought
1          A, D, F, G
2          B, C, D, G
3          D, H
4          A, B, H, D, F, G

The Apriori algorithm is often used to find association patterns in customer spending. For example, a customer who buys Product A is also likely to buy Product B.

The Apriori Algorithm
With the data from the database
Association rules
Identifies which operating systems are mentioned with positive or negative comments most frequently.

Using the data collected previously, we apply a set of association rules that enable us to identify the information we require. In this case we wish to discover which operating systems are mentioned in tweets and what types of comments are made about them.

The Apriori Algorithm
http://en.wikipedia.org/wiki/Apriori_algorithm
Here is the pseudo code for the algorithm. Although this may look very confusing, everything will become a lot clearer in the next few slides.

The Apriori Algorithm

Database
Tweet   Key words in the tweet
1       IOS, Iphone, Apple
2       Android, IceCream, JellyBean
3       IOS, IceCream, Android, Iphone, JellyBean
4       Android, JellyBean

Scan database

C1
Keyword     Frequency
IOS         2
IceCream    3
Iphone      3
Apple       1
JellyBean   3

Drop anything under 0.5

L1
Keyword     Frequency
IOS         2
IceCream    3
Iphone      3
JellyBean   3

Here we have a database with tweets and the keywords that were extracted.
We scan the database and output how many times each key word appeared.
We set a threshold, 0.5 in this case, so that anything with a relative frequency under it is disregarded as being insignificant.
C1 shows the frequencies.
L1 shows the keywords whose relative frequency is at least 0.5. Apple has been dropped as it has a probability of 0.25. This is worked out as the frequency divided by the number of tweets.

The Apriori Algorithm

L1
Keyword     Frequency
IOS         2
IceCream    3
Iphone      3
JellyBean   3

C2
Instances               Frequency
[IOS, IceCream]         1
[IOS, Iphone]           2
[IOS, JellyBean]        1
[IceCream, Iphone]      2
[IceCream, JellyBean]   3
[Iphone, JellyBean]     2

C3
Instances               Frequency
IOS, Iphone             2
IceCream, Iphone        2
IceCream, JellyBean     3
Iphone, JellyBean       2
C2 shows all the possible pairs of keywords from L1, and a second scan of the database is completed to find how many times those pairs appeared, for example the number of times Iphone and IOS appeared in the same tweet.
We drop the highlighted rows as they are less than 0.5, worked out as the number of times both keywords appear divided by the number of tweets.
The result of the algorithm is that the combinations of items that appear frequently within the database are highlighted.
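The scan, count and drop steps walked through above can be sketched in a few lines of Python. This is a simplified, level-wise illustration rather than the pseudo code from the slide, and the keyword sets in `tweets` are made up for the sketch; they are not the database shown in the slides.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.5):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # First candidates (C1): every single keyword seen in the database.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Scan the database and count how often each candidate appears.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Drop anything under the threshold; what remains is level k.
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Build size k+1 candidates whose size-k subsets are all frequent.
        k += 1
        items = sorted({i for c in level for i in c})
        candidates = {
            frozenset(combo)
            for combo in combinations(items, k)
            if all(frozenset(sub) in level for sub in combinations(combo, k - 1))
        }
    return frequent


# Hypothetical keyword sets, one per tweet.
tweets = [
    {"iphone", "ios", "good"},
    {"android", "bad"},
    {"iphone", "good"},
    {"android", "iphone", "good"},
]

for itemset, support in frequent_itemsets(tweets).items():
    print(sorted(itemset), support)
```

With this made-up data the only pair that survives the 0.5 threshold is iphone together with good, which is the kind of combination the next slide turns into a rule.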
This can be extended to ask: do iPhone and bad, or iPhone and good, appear together frequently?

The Apriori Algorithm
iPhone is good
Android is bad
Antecedent (X): iPhone
Consequent (Y): is good
Relationship: X -> Y

If people who mention X (iPhone or Android) also say Y (good or bad), then we can state that a relationship exists between them. An association rule is an implication of the form X -> Y, where X is the antecedent (the thing that comes before) and Y is the consequent (the thing that follows), for example "iPhone is good" or "Android is bad".
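One common way to judge how strong a rule X -> Y is uses two numbers: support (the share of tweets containing both X and Y) and confidence (the share of tweets containing X that also contain Y). Confidence is not mentioned in the slides; it is brought in here as a standard measure. The sketch below uses made-up keyword sets and a hypothetical helper name, `rule_metrics`.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= set(t))    # tweets containing X
    n_xy = sum(1 for t in transactions if xy <= set(t))  # tweets with X and Y
    support = n_xy / n if n else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical keyword sets, one per tweet.
tweets = [
    {"iphone", "ios", "good"},
    {"android", "bad"},
    {"iphone", "good"},
    {"android", "iphone", "good"},
]

print(rule_metrics(tweets, {"iphone"}, {"good"}))  # (0.75, 1.0)
print(rule_metrics(tweets, {"android"}, {"bad"}))  # (0.25, 0.5)
```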
The K-means Algorithm
With K-means consumer clustering we can decipher:
The most popular OS within clusters
Which users are more likely to continue using a given OS and which are likely to change
Collect data that can inform marketing decisions

To further our findings we could incorporate clustering through the K-means algorithm, which would allow us to discover information on things such as brand allegiance and which user groups are more likely to stay with their current systems.

The K-means Algorithm
Article: Nergis Yılmaz and Gülfem Işıklar Alptekin
http://www.iaeng.org/publication/WCE2013/WCE2013_pp1611-1616.pdf
http://www.imore.com/sites/imore.com/files/styles/large/public/field/image/2013/09/pink_iphone5c.png?itok=WuLq66WY

In this article they combine the K-means algorithm and the Apriori algorithm. This helped them identify that female consumers generally use iPhones and specify an iPhone as their next choice. Therefore it would be a good idea to create social media campaigns for female consumers.
New products such as a pink iPhone, and other ideas, are often brought to market on the back of information gained through these practices.
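To make the clustering step concrete, here is a minimal pure-Python sketch of K-means. It is not the method used in the article: the two features per user (counts of positive iPhone and positive Android mentions) are invented for illustration, and starting from the first `k` points is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means: returns (centroids, cluster index for each point)."""
    # Assumption: start from the first k points instead of a random choice.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


# Hypothetical users: (positive iPhone mentions, positive Android mentions).
users = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(users, k=2)
print(clusters)  # [0, 1, 0, 0, 1, 1]
```

Each cluster can then be inspected separately, for example by running the keyword counts per cluster to see which OS is most popular within it.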
Uses and benefits within Social Media
Useful for finding customer groups that share interests.
Marketing campaigns can be altered to target specific areas to generate revenue and to meet consumer needs.
The above algorithms and other data mining and text analytics techniques can
And without social media this information could not be obtained.

What we've learnt today
Sentiment analysis
Apriori algorithm
Uses and benefits of mining social media
Thank you very much for listening. We hope you have learned a few things today. Bye now.