text analytics on google app

Post on 22-Jan-2017

112 Views

Category:

Data & Analytics

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Text Analytics on Google App

Nan (Miya) Wang, Chang Yin, Ellin Iqra, Anshu Vyas

INTRODUCTION

2

ANALYSIS OVERVIEW

TEXT CLUSTER

ING(Python)

TEXT CLASSIFICAT

ION(Python)

SUCCESSANALYSIS(Tableau)

3

● March 1st - April 21● 2227 Google apps● 12 variables

● Two Text Variables

DATASET

Source: https://play.google.com/store/apps/category/GAME/collection/topselling_new_free4

EXPLANATORY ANALYSIS

5

EXPLANATORY ANALYSIS

Word Count for Description Text

Average: 1400

6

• Remove Numbers/Stopwords

• Lowercase

• Tokenize

• Stemming

• Feature Engineering

TEXT PREPARATION

7

TEXT CLUSTERINGFeature

Engineering-----TFIDF “min_df= 4”“ngram range” =(1,5)135 Features

Top 20 TFIDF Terms

8

TEXT CLUSTERING 'DT' 'JJ''JJS', 'JJR''NN''NNP' 'RB''VB''VBP''VBZ' 'RBR''VBD''VBN'

Feature Engineering-----POS

9

TEXT CLUSTERING

90 Principals can explain all the variance

Feature Engineering-----PCA

10

TEXT CLUSTERING

KMeans

Only 197 unique text about update for 2227 APPs.

Model Building

11

TEXT CLUSTERINGEvaluation----Cluster

0

12

TEXT CLUSTERINGEvaluation----Cluster

1

13

TEXT CLUSTERINGEvaluation----Cluster

2

14

TEXT CLUSTERINGEvaluation----Cluster

3

15

TEXT CLUSTERINGEvaluation----Cluster

4

16

TEXT CLASSIFICATIONPreprocessing

Choose Just 10 out of 23 categories

17

TEXT CLASSIFICATIONFeature Engineering ----

Parameter Setting

18

TEXT CLASSIFICATIONFeature EngineeringFeature Engineering ----- POS

Feature Counts by POS

19

TEXT CLASSIFICATIONFeature EngineeringFeature Engineering -----

TFIDFDrop Features with TFIDF

larger than 0.074 and smaller than 0.3

About 51000 features reduced to 20040

20

TEXT CLASSIFICATIONModel Building

21

SUCCESS ANALYSIS

● Average of Num of star and review num for each operating sys.

● The data is filtered on Num Install over 50,000

Varies with

Device 22

● Average of Num of star and review num for each subcategory.

● Data is filtered on Num Install over 50,000

Apps with largest installations are all in Strategy Group.

SUCCESS ANALYSIS

23

● Average of Num of star and review num for Top Developer or not.

● Data is filtered on Num Install over 50,000

SUCCESS ANALYSIS

24

CONCLUSION

● Five labels available for application updation:➢ Add New Features➢ GamePlay Modification➢ Improve Levels➢ Fix Bugs & New Versions➢ Fix Minor Bugs & Optimization

● 71% of Accuracy for classify apps into 10 subcategories.

● App Subcategory, Developer, Operating System have correlation relationship with APP success.

25

top related