Data & Text Mining Abhay Ahluwalia, Chris Bruck, Christopher Stanton, Stefanie Felitto, Mike Paulus BUAD 466: Introduction to Business Intelligence November 30, 2011

Data & Text Mining

Abhay Ahluwalia, Chris Bruck, Christopher Stanton, Stefanie Felitto, Mike Paulus

BUAD 466: Introduction to Business Intelligence

November 30, 2011

Data Mining Background

Definition – the process of analyzing data from different perspectives and summarizing it into useful information

Data mining software (e.g., XL Miner) allows users to analyze data from many different dimensions, categorize it, and summarize the relationships identified

The Basics of Data Mining

Analyzes relationships and patterns in stored transaction data based on open-ended user queries

Classes: Stored data is used to locate data in predetermined groups

Clusters: Data items are grouped according to logical relationships or consumer preferences

Associations: Data can be mined to identify associations

Sequential patterns: Data is mined to anticipate behavior patterns and trends
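
To make the "Associations" category concrete, here is a minimal sketch in Python of two common association measures: support (how often two items appear together) and confidence (how often one item appears given another). The transactions, item names, and function names are hypothetical and are not taken from any of the tools discussed in this presentation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` appears in baskets that hold `antecedent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

support = pair_support(transactions)
bread_to_milk = confidence(transactions, "bread", "milk")
```

A pair with high support and high confidence is a candidate association; deciding whether it is useful for the business still takes judgment.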

Text Mining Background

Definition: the discovery by computer of previously unknown knowledge in text, by automatically extracting information from different written resources

Goal: to extract new, never-before encountered information

Text mining can expand the ability of data mining to deal with textual materials
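
One of the simplest ways to start extracting information from written resources is to count which terms appear most often. The sketch below assumes a handful of invented guest comments and a tiny hypothetical stop-word list; real text mining tools go much further (for example, handling phrases and word forms).

```python
import re
from collections import Counter

# Hypothetical stop-word list; real tools ship much larger ones.
STOP_WORDS = {"the", "was", "and", "a", "but", "at", "is", "to", "of"}

def term_counts(documents):
    """Count how often each non-stop-word term appears across the documents."""
    counts = Counter()
    for text in documents:
        # Lowercase the text and keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Invented comments, used only for illustration.
comments = [
    "The room was clean and the front desk was friendly.",
    "Long wait at the front desk, but a great room.",
    "Room service was slow.",
]
counts = term_counts(comments)
top_terms = counts.most_common(3)
```

Terms that dominate the counts point to the topics people write about most, which can then be examined with ordinary data mining techniques.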

Data are Key to Business Value

DATA: Measures of variables in categories

Support Decision Making

Provide Basis for Forecasting

Important to:

Obtain data from new sources (text mining)

Integrate (mash) information from multiple sources

Software Example #1: VAIM (Value-Added Information Mash)

MINING: finding patterns in data (pattern-oriented, record-oriented searches)

MASHING: Integrating information mined from multiple resources

Useful in hospitals and for government campaigns
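
As a rough illustration of mashing, the sketch below combines records from two sources that share a key. It is not how VAIM works internally; the `mash` function, the field names, and the sample records are all hypothetical.

```python
def mash(primary, secondary, key):
    """Combine records from two sources that share the same key value."""
    by_key = {record[key]: record for record in secondary}
    merged = []
    for record in primary:
        extra = by_key.get(record[key], {})
        # Fields from the primary source win when both sources define them.
        merged.append({**extra, **record})
    return merged

# Hypothetical records from two separate systems.
admissions = [
    {"patient_id": 1, "ward": "A"},
    {"patient_id": 2, "ward": "B"},
]
lab_results = [
    {"patient_id": 1, "test": "CBC", "ward": "unknown"},
]
combined = mash(admissions, lab_results, "patient_id")
```

Letting the primary source win on conflicts is only one possible design choice; a real integration would need an explicit rule for each conflicting field.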

Software Example #2: IBM SPSS

Assists in statistical analysis for predicting trends

Categorizes data, performs statistical analysis

Multiple regressions to suggest causality
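
To show the idea behind regression-based trend prediction, here is a minimal sketch of an ordinary least squares fit with a single predictor. It is a simplified stand-in for the multiple regressions a package such as SPSS runs, and the data and names are invented. A fitted slope describes an association in the data; on its own it does not prove causality.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend vs. sales.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(ad_spend, sales)

# Extrapolate the fitted line to a spend level not yet observed.
forecast = intercept + slope * 5.0
```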

Software Example #3: XL Miner

Add-in for Microsoft Excel products

Builds off of software that companies already possess

Assists in predictive forecasting based on observed data trends

Demonstration

Business Value Example #1: Grocery Store

Data mining using Oracle

Analyzed buying patterns

Findings led to changes in marketing

Increased revenues

Value Example #2 - University of Rochester Cancer Center

Using KnowledgeSEEKER software

Studied the effect of anxiety about chemotherapy on nausea

Analysis helped improve the treatment of patients and their quality of life.

Value Example #3: MGM Grand Hotel

Analyzed customer satisfaction and probability of return stay

Found that the front desk and room were most important

Focused the next 6 months on improving these areas

10% improvement in attrition

Increased guest returns and profitability

Complications & Concerns

Invasion of Privacy

According to Lita van Wel and Lamber Royakkers in “Ethical issues in web data mining”, privacy is considered lost when information about an individual is obtained, used, or spread without that individual’s permission

More Complications

• Even when data is made anonymous before it is gathered into profiles, so that there are no personal profiles, these applications de-individualize users by judging them only by their mouse clicks

• De-individualization: the tendency to judge and treat people on the basis of group characteristics instead of their own individual characteristics

More Concerns

Companies can claim to collect the data for one purpose and use it for another

The growing movement of selling personal data as a service encourages website owners to trade personal data obtained from their site

The companies that buy the data make it anonymous and assume ownership of the data that they release

http://www.youtube.com/watch?v=zdM6vzRHrG0

Even More Complications

Some web mining algorithms might use controversial characteristics to categorize individuals, such as sex, race, religion, or sexual orientation

This process could result in the refusal of a service or a privilege to an individual based on their race, religion, or sexual orientation.

Application Recommendations & Conclusion

Sync data repositories (VAIM Software)

Training

Use Data Mining and Text Mining together

Group Jeopardy:

Data and Text Mining Background

Business Applications

Complications with Mining

From the Examples

100 100 100 100

200 200 200 200

300 300 300 300

Data and Text Mining Background For 100:

True or False: Clusters refer to Data Items that are grouped according to logical relationships or consumer preferences?

True.

Data and Text Mining Background For 200:

What is the name of the data mining software that allows users to analyze data from different dimensions, categorize it, and summarize the relationships it identifies, all within a familiar Microsoft Office program?

XL Miner

Data and Text Mining Background For 300:

Name either 2 pros or 2 cons of the business applications of data mining.

Pros: extracts new info, can answer the why, creates a competitive advantage

Cons: expensive, requires training, dependent on structure of warehouses and repositories

Business Applications for 100:

What does VAIM stand for?

Value-Added Information Mashing

Business Applications for 200:

What is the difference between Text Mining and Text Mashing?

MINING: finding patterns in data (pattern-oriented, record-oriented searches)

MASHING: Integrating information mined from multiple resources

Business Applications for 300:

What is the greatest benefit of Text Mining for Businesses?

Extracts new information and combines human linguistic capabilities with the speed and accuracy of a computer

Complications for 100:

True or False: Companies that buy the data and make it anonymous are not responsible for potential legal actions against them for using the data.

False. They are responsible and can face serious legal action.

Complications for 200:

What is the term used when the personal data of individuals is treated on the basis of group characteristics rather than individual characteristics?

De-individualization

Complications for 300:

Which two US Senators introduced the Commercial Privacy Bill of Rights?

John McCain (R-AZ)

John Kerry (D-MA)

From the Examples for 100:

When the grocery store analyzed men's buying trends, what other item did it find that men bought when they purchased diapers?

Beer

From the Examples for 200:

What software did the University of Rochester Cancer Center use to analyze the effects of chemotherapy treatments on nausea?

KnowledgeSEEKER

From the Examples for 300:

What did Text Mining identify as the two most important areas of the MGM Grand Hotel?

The Front Desk and the Room
