Text Mining Workshop

Center for Process Innovation Colloquium Series Workshop on Text Mining Presented in Collaboration with The Institute for Insight Zhitao Yin Ph.D. Candidate Workshop Developed with The Guidance of Dr. Arun Rai Center for Process Innovation J. Mack Robinson College of Business Georgia State University December 4, 2015

Upload: zhitao-yin

Post on 14-Apr-2017


Page 1: Text Mining Workshop

Center for Process Innovation Colloquium Series

Workshop on Text Mining

Presented in Collaboration with The Institute for Insight

Zhitao Yin

Ph.D. Candidate

Workshop Developed with The Guidance of Dr. Arun Rai

Center for Process Innovation

J. Mack Robinson College of Business

Georgia State University

December 4, 2015

Page 2: Text Mining Workshop

Experience on Yelp

Page 3: Text Mining Workshop

Alan (a GSU alumnus) starts a Chinese restaurant in Las Vegas

Page 7: Text Mining Workshop

How to improve word-of-mouth on Yelp?

You can’t manage if you do not measure.

● What is the average customer’s attitude toward Chinese restaurants?

● What are the most commonly used words in negative Chinese restaurant reviews?

● What aspects do customers discuss when they talk about Chinese restaurants?

● How are the most commonly used words and the aspects associated with the restaurant’s rating?

Text mining will give you insight on how to MEASURE.
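To make the first two questions concrete, here is a minimal sketch of how they could be measured with plain Python. The review data, the stopword list, and the rule that two stars or fewer counts as "negative" are all assumptions made up for illustration; they are not taken from the workshop materials or from real Yelp data.

```python
import re
from collections import Counter

# Hypothetical sample data: (star rating, review text) pairs invented for illustration.
reviews = [
    (1, "The noodles were cold and the service was slow."),
    (2, "Slow service, cold soup, and rude staff."),
    (5, "Great dumplings and friendly service!"),
    (4, "Tasty noodles, fair prices."),
]

# A tiny hand-picked stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "was", "were", "a", "an", "of", "to"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(texts, n=3):
    """Count non-stopword tokens across texts and return the n most common."""
    counts = Counter(
        token
        for text in texts
        for token in tokenize(text)
        if token not in STOPWORDS
    )
    return counts.most_common(n)


# Question 1: average attitude, approximated here by the mean star rating.
average_rating = sum(stars for stars, _ in reviews) / len(reviews)

# Question 2: most common words in negative reviews (assumed: 2 stars or fewer).
negative_texts = [text for stars, text in reviews if stars <= 2]
negative_top = top_words(negative_texts)

print(average_rating)
print(negative_top)
```

On this toy data the words that repeat across the negative reviews are about temperature and service, which is the kind of signal Alan could act on.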

Page 8: Text Mining Workshop

Do you want to learn text mining to help Alan?

Page 11: Text Mining Workshop

Outline

Application

Concept

Experience

Three modules:

1. Lexicon-based word counting

2. Algorithm-based word counting

3. Topic modeling

Each module includes:

● 15 mins demo & key takeaways

● 20 mins exercise & break
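As a rough illustration of what the first module, lexicon-based word counting, can look like, the sketch below scores a review by counting hits against a positive and a negative word list. The two word lists and the scoring formula are assumptions chosen for this example; the slides do not say which lexicon or formula the workshop notebook uses.

```python
import re

# Hypothetical mini-lexicon; the workshop's actual lexicon is not given in the slides.
POSITIVE = {"great", "friendly", "tasty", "fresh", "delicious"}
NEGATIVE = {"cold", "slow", "rude", "bland", "dirty"}


def lexicon_score(text):
    """Return (#positive - #negative) / #tokens, or 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)


print(lexicon_score("Great dumplings and friendly service"))
print(lexicon_score("Cold soup and rude staff"))
```

A score above zero suggests a mostly positive review and a score below zero a mostly negative one. Dividing by the token count keeps long and short reviews comparable, which is one design choice among several.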

Page 13: Text Mining Workshop

Everything I Assume

● Minimum requirements

○ You are willing to help Alan.

○ Beginner knowledge of Python

● Necessary to understand 90%+

○ You are willing to help Alan.

○ Intermediate knowledge of Python

○ Beginner knowledge of Regular Expressions

○ Intermediate knowledge of Statistics

Page 14: Text Mining Workshop

Don’t panic! Buddy up! Group learning is good for you!

Page 15: Text Mining Workshop

Open your IPython Notebook

or

Go to http://bit.do/onlinecode

Page 16: Text Mining Workshop

What are your takeaways?

Page 19: Text Mining Workshop

Workshop Takeaways

Question

Defining a million-dollar and tractable question is the priority!!!

Text Mining

You can’t manage if you do not measure.

● Text mining provides a way to measure

● Be very clear about the context under which each technique is appropriate

Data

You can’t measure if you do not have the right data.

Cleaning data takes a lot of time!!!
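To give a feel for why cleaning takes time, here is a small sketch of the kind of normalization a raw review might need before any word counting. The specific steps, the function name `clean_review`, and the sample string are assumptions for illustration only; real review data will usually need more cases than these.

```python
import html
import re


def clean_review(raw):
    """Normalize one raw review string for word counting."""
    text = html.unescape(raw)                  # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = text.lower()
    text = re.sub(r"[^a-z\s']", " ", text)     # keep letters, whitespace, apostrophes
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


# Hypothetical messy input, invented for this example.
raw = "Best   dumplings &amp; noodles!<br/>See http://example.com NOW"
print(clean_review(raw))
```

Each line handles one kind of mess, and every new data source tends to add a few more lines like these.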

Page 20: Text Mining Workshop

Feedback Questionnaire

Go to http://bit.do/textfeedback