Text Mining Workshop
Center for Process Innovation Colloquium Series
Workshop on Text Mining
Presented in Collaboration with The Institute for Insight
Zhitao Yin
Ph.D. Candidate
Workshop Developed with The Guidance of Dr. Arun Rai
Center for Process Innovation
J. Mack Robinson College of Business
Georgia State University
December 4, 2015
Experience on Yelp
Alan (GSU alumnus) starts a Chinese restaurant in Las Vegas
How to improve word-of-mouth on Yelp?
You can’t manage if you do not measure.
● What is the average customer’s attitude toward Chinese restaurants?
● What are the most commonly used words in negative Chinese restaurant reviews?
● Which aspects do customers bring up when they talk about Chinese restaurants?
● How are the most commonly used words and these aspects associated with a restaurant’s rating?
Text mining will give you insight on how to MEASURE.
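As a minimal sketch of what "measuring" could look like for the second question (the most commonly used words in negative reviews), assuming the reviews are available as (stars, text) pairs. The sample reviews, the stop-word list, the `top_words` helper, and the "2 stars or fewer is negative" cutoff are all assumptions made for illustration, not workshop material.

```python
import re
from collections import Counter

# Hypothetical sample data: (star rating, review text) pairs, invented for illustration.
reviews = [
    (1, "The noodles were cold and the service was slow."),
    (2, "Slow service, cold rice, and a rude waiter."),
    (5, "Great dumplings and friendly service!"),
]

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "and", "a", "was", "were"}


def top_words(texts, n=3):
    """Return the n most frequent non-stop-words across the given texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


# Assumption: a review with 2 stars or fewer counts as negative.
negative = [text for stars, text in reviews if stars <= 2]
print(top_words(negative))
```

On real Yelp data the same idea applies, but the tokenization, the stop-word list, and the definition of "negative" would each need more care.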
Do you want to learn text mining to help Alan?
Three modules:
1. Lexicon-based word counting
2. Algorithm-based word counting
3. Topic modeling
Outline
Application
Concept
Experience
Each module includes:
● 15 mins demo & key takeaways
● 20 mins exercise & break
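To give a feel for the first module before the demo, here is a minimal sketch of lexicon-based word counting: count how many words in a review appear in a positive word list and how many appear in a negative one. The two mini-lexicons and the `lexicon_score` helper are hypothetical and invented for illustration; they are not the lexicon or code used in the workshop.

```python
import re

# Hypothetical mini-lexicons, invented for illustration; real lexicons are far larger.
POSITIVE = {"great", "friendly", "delicious", "fresh"}
NEGATIVE = {"cold", "slow", "rude", "bland"}


def lexicon_score(text):
    """Return (positive count, negative count, net score) for one review."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos, neg, pos - neg


print(lexicon_score("Great dumplings, but the service was slow and rude."))
```

A simple count like this ignores negation ("not great") and context, which is one reason to be clear about when a lexicon-based approach is appropriate.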
Everything I Assume
● Minimum requirements
○ You are willing to help Alan.
○ Beginner knowledge of Python
● Needed to understand 90%+ of the material
○ You are willing to help Alan.
○ Intermediate knowledge of Python
○ Beginner knowledge of regular expressions
○ Intermediate knowledge of statistics
Don’t panic! Buddy up! Group learning is good for you!
Open your IPython Notebook, or go to http://bit.do/onlinecode
What are your takeaways?
Workshop Takeaways
Question
Defining a million-dollar and tractable question is the priority!!!
Text Mining
● Text mining provides a way to measure.
● Be very clear about the context in which each technique is appropriate.
You can’t manage if you do not measure.
Data
You can’t measure if you do not have the right data.
Cleaning data takes a lot of time!!!
Feedback Questionnaire
Go to http://bit.do/textfeedback