
Uploaded by venkata-vineel on 15-Aug-2015



Generic Framework For Knowledge Classification

By Venkata Vineel


Agenda

•  Introduction
•  Problem at Hand
•  How is it solved?
•  Challenges
•  Skills and Career alignment
•  Q&A


Introduction

•  Master's in Computer Science, University of Utah, Salt Lake City, UT
•  Systems Engineering Intern, Internal Tools team (Knowledge Management)
•  Interests: scalability challenges, machine learning, and visualization


Problem at Hand

•  A generic framework for classifying knowledge
•  Classifying questions in Answer Hub


How Did I Solve It?

•  Developed a generic classification algorithm.

•  Answer Hub Knowledge Base that learns.
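The slides do not name the algorithm behind the learning knowledge base. Purely as an illustration, the sketch below assumes a multinomial Naive Bayes model over bag-of-words tokens with Laplace smoothing; that choice, and the names `tokenize`, `IncrementalNaiveBayes`, `learn`, and `rank`, are hypothetical. It shows how a classifier can keep learning by updating counts as each newly labeled question arrives, and how it can return categories in ranked order.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical sketch: multinomial Naive Bayes is an assumption,
# not the algorithm named in the slides.


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class IncrementalNaiveBayes:
    """Naive Bayes whose counts can be updated one question at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.doc_counts = Counter()              # category -> questions seen
        self.term_counts = defaultdict(Counter)  # category -> term -> count
        self.total_terms = Counter()             # category -> total term count
        self.vocabulary = set()

    def learn(self, text, category):
        """Fold one labeled question into the model by updating counts."""
        tokens = tokenize(text)
        self.doc_counts[category] += 1
        self.term_counts[category].update(tokens)
        self.total_terms[category] += len(tokens)
        self.vocabulary.update(tokens)

    def rank(self, text):
        """Return categories sorted from most to least likely."""
        tokens = tokenize(text)
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocabulary)
        scores = {}
        for category, n in self.doc_counts.items():
            score = math.log(n / n_docs)  # log prior of the category
            denom = self.total_terms[category] + self.alpha * v
            for t in tokens:
                # Smoothed log likelihood of each token given the category
                score += math.log((self.term_counts[category][t] + self.alpha) / denom)
            scores[category] = score
        return sorted(scores, key=scores.get, reverse=True)


model = IncrementalNaiveBayes()
model.learn("hadoop job fails on the cluster", "Hadoop")
model.learn("mapreduce job stuck in hadoop queue", "Hadoop")
model.learn("teradata sql query timeout", "Teradata")
print(model.rank("my hadoop job is stuck")[0])  # Hadoop
```

Because `learn` only increments counters, new questions can be added without retraining from scratch.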


Project High Points

•  Achieved 72% accuracy.

[Chart: Rank Statistics, showing the number of questions (scale 0 to 18,000) at each rank (1 to 23) across categories]
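One plausible reading of the Rank Statistics chart, which is an assumption here, is a count of how many questions had their correct category at each position of the classifier's ranked output. The sketch below computes such counts and the corresponding top-k accuracy; the function names and the toy predictions are hypothetical and are not the project's data.

```python
from collections import Counter

# Hypothetical sketch: assumes each prediction is a ranked list of categories.


def rank_of_true_category(ranked_categories, true_category):
    """1-based position of the correct category in a ranked prediction."""
    return ranked_categories.index(true_category) + 1


def rank_statistics(predictions):
    """Count questions by the rank at which their correct category appears.

    predictions: list of (ranked_categories, true_category) pairs.
    """
    return Counter(rank_of_true_category(r, t) for r, t in predictions)


def top_k_accuracy(rank_counts, k):
    """Fraction of questions whose correct category is within the top k."""
    total = sum(rank_counts.values())
    hits = sum(n for rank, n in rank_counts.items() if rank <= k)
    return hits / total


# Toy predictions (illustrative only).
predictions = [
    (["Raptor", "DAL", "V3"], "Raptor"),
    (["DAL", "Raptor", "V3"], "Raptor"),
    (["V3", "General", "DAL"], "V3"),
    (["General", "V3", "DAL"], "DAL"),
]
counts = rank_statistics(predictions)
print(sorted(counts.items()))      # [(1, 2), (2, 1), (3, 1)]
print(top_k_accuracy(counts, 1))   # 0.5
print(top_k_accuracy(counts, 2))   # 0.75
```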


Confusion matrix

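| Categories | V3 | GBX | C3 | Hadoop | BES | DAL | Raptor | Stratus | Security Platform | General | User Tracking | Experimentation | Service Frameworks | Search Services | Sherlock | Batch Framework | Trinity | Commerce OS | Teradata | Analytics Platform | Total |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| V3 | 1552 | 2 | 1 | 2 | 6 | 263 | 217 | 3 | 23 | 455 | 2 | 41 | 290 | 9 | 3 | 6 | 0 | 0 | 0 | 0 | 2875 |
| GBX | 1 | 68 | 0 | 0 | 0 | 6 | 37 | 0 | 1 | 9 | 1 | 26 | 4 | 8 | 0 | 0 | 0 | 1 | 0 | 0 | 162 |
| C3 | 0 | 0 | 318 | 1 | 1 | 25 | 27 | 54 | 5 | 32 | 1 | 6 | 1 | 4 | 0 | 1 | 0 | 1 | 1 | 0 | 478 |
| Hadoop | 0 | 0 | 2 | 173 | 1 | 10 | 8 | 0 | 0 | 20 | 1 | 3 | 4 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 225 |
| BES | 11 | 0 | 0 | 0 | 300 | 59 | 39 | 1 | 0 | 5 | 0 | 1 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 438 |
| DAL | 67 | 0 | 1 | 0 | 3 | 2307 | 89 | 0 | 2 | 16 | 0 | 13 | 99 | 5 | 0 | 1 | 0 | 0 | 0 | 0 | 2603 |
| Raptor | 11 | 10 | 5 | 2 | 25 | 396 | 5352 | 3 | 62 | 212 | 26 | 184 | 337 | 25 | 6 | 17 | 0 | 0 | 1 | 0 | 6674 |
| Stratus | 1 | 0 | 82 | 2 | 1 | 40 | 188 | 435 | 4 | 40 | 0 | 13 | 6 | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 816 |
| Security Platform | 4 | 0 | 0 | 0 | 0 | 32 | 38 | 0 | 174 | 11 | 0 | 6 | 129 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 396 |
| General | 100 | 2 | 12 | 15 | 6 | 129 | 258 | 16 | 13 | 1200 | 3 | 88 | 64 | 29 | 4 | 3 | 0 | 0 | 5 | 0 | 1947 |
| User Tracking | 3 | 0 | 0 | 1 | 0 | 16 | 43 | 0 | 3 | 8 | 126 | 41 | 10 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 252 |
| Experimentation | 1 | 1 | 0 | 0 | 0 | 27 | 40 | 0 | 1 | 8 | 0 | 868 | 29 | 1 | 0 | 0 | 0 | 0 | 3 | 0 | 979 |
| Service Frameworks | 124 | 3 | 0 | 0 | 6 | 90 | 299 | 2 | 67 | 83 | 0 | 56 | 1977 | 38 | 5 | 3 | 0 | 11 | 0 | 0 | 2764 |
| Search Services | 0 | 1 | 1 | 0 | 1 | 5 | 9 | 1 | 2 | 8 | 0 | 4 | 32 | 163 | 0 | 0 | 0 | 0 | 0 | 0 | 227 |
| Sherlock | 2 | 0 | 0 | 4 | 0 | 67 | 31 | 2 | 0 | 17 | 0 | 29 | 19 | 0 | 85 | 0 | 0 | 0 | 0 | 0 | 256 |
| Batch Framework | 11 | 0 | 0 | 2 | 2 | 100 | 92 | 2 | 2 | 10 | 0 | 2 | 22 | 0 | 0 | 67 | 0 | 0 | 1 | 0 | 313 |
| Trinity | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |
| Commerce OS | 0 | 0 | 0 | 0 | 0 | 10 | 48 | 0 | 4 | 15 | 0 | 14 | 15 | 8 | 0 | 0 | 0 | 103 | 0 | 0 | 217 |
| Teradata | 0 | 0 | 1 | 1 | 0 | 10 | 0 | 0 | 0 | 0 | 1 | 16 | 2 | 1 | 0 | 1 | 0 | 0 | 49 | 0 | 82 |
| Analytics Platform | 0 | 0 | 1 | 1 | 0 | 5 | 1 | 0 | 1 | 23 | 1 | 14 | 0 | 3 | 1 | 0 | 0 | 0 | 1 | 11 | 63 |
| Total | 1888 | 87 | 424 | 204 | 352 | 3597 | 6816 | 519 | 364 | 2172 | 162 | 1429 | 3063 | 297 | 109 | 101 | 0 | 117 | 61 | 11 | 21773 |
| Percentage correct (diagonal / column total) | 82.20 | 78.16 | 75.00 | 84.80 | 85.23 | 64.14 | 78.52 | 83.82 | 47.80 | 55.25 | 77.78 | 60.74 | 64.54 | 54.88 | 77.98 | 66.34 | n/a | 88.03 | 80.33 | 100.00 | |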
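The Percentage correct row divides each diagonal cell by its column total (for V3, 1552 / 1888 ≈ 82.20%), so a column with no entries, such as Trinity, has no defined value. A minimal sketch of that calculation, plus overall accuracy (diagonal sum divided by grand total), on a hypothetical toy matrix; the function names are illustrative:

```python
# Hypothetical helper names; the toy matrix below is illustrative only.


def percentage_correct_by_column(matrix):
    """Diagonal cell / column total as a percentage; None for an empty column."""
    n = len(matrix)
    result = []
    for j in range(n):
        column_total = sum(matrix[i][j] for i in range(n))
        if column_total == 0:
            result.append(None)  # undefined, like the #DIV/0! for Trinity
        else:
            result.append(100.0 * matrix[j][j] / column_total)
    return result


def overall_accuracy(matrix):
    """Sum of the diagonal divided by the sum of all cells."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# The third column is empty, mirroring the Trinity column above.
toy = [
    [8, 0, 0],
    [1, 9, 0],
    [1, 1, 0],
]
print(percentage_correct_by_column(toy))  # [80.0, 90.0, None]
print(overall_accuracy(toy))              # 0.85
```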


Challenges and How We Overcame Them

•  Sparse data.

•  Large number of features.

•  A chi-square test for feature selection came to the rescue.
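The slides do not say exactly how the chi-square test was applied. A common approach, assumed here, scores each term against each category with a 2x2 contingency table of document counts and keeps the k terms with the highest score, which shrinks a large, sparse feature space. The names `chi_square` and `select_features` and the toy documents are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch of chi-square feature selection for text classification.


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: docs in the category containing the term
    b: docs outside the category containing the term
    c: docs in the category without the term
    d: docs outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_features(docs, k):
    """Keep the k terms whose best chi-square score over categories is highest.

    docs: list of (set_of_terms, category) pairs; each term counted once per doc.
    """
    n = len(docs)
    categories = {cat for _, cat in docs}
    docs_per_cat = defaultdict(int)
    term_cat = defaultdict(int)    # (term, category) -> docs containing the term
    term_total = defaultdict(int)  # term -> docs containing the term
    for terms, cat in docs:
        docs_per_cat[cat] += 1
        for t in terms:
            term_cat[(t, cat)] += 1
            term_total[t] += 1
    scores = {}
    for t in term_total:
        best = 0.0
        for cat in categories:
            a = term_cat[(t, cat)]
            b = term_total[t] - a
            c = docs_per_cat[cat] - a
            d = n - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[t] = best
    # Highest score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [
    ({"hadoop", "job", "error"}, "Hadoop"),
    ({"hadoop", "cluster", "error"}, "Hadoop"),
    ({"teradata", "query", "error"}, "Teradata"),
    ({"teradata", "sql", "error"}, "Teradata"),
]
print(select_features(docs, 2))  # ['hadoop', 'teradata']
```

A term such as "error" that appears in every question scores zero, because it carries no information about the category.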


Skills Obtained

•  Lucene

•  Literature survey of existing techniques

•  Machine Learning and NLP

•  Exposure to productizing research


Alignment With My Career Path

•  Interested in text analysis and machine learning.
•  eBay has tons of data.


Future Scope for Improvement

•  User profiles
•  Support vector machines, TF-IDF weighting, and k-NN algorithms
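Of the future directions listed, TF-IDF weighting and k-NN combine naturally: represent questions as TF-IDF vectors and label a new question by majority vote among its k most cosine-similar training questions. The sketch below is a minimal pure-Python illustration under that assumption (SVM is not shown); all function names and the toy data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch: TF-IDF vectors + k-NN with cosine similarity.


def tf_idf_vectors(docs):
    """Turn tokenized docs into sparse TF-IDF dicts; also return the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df_t) for t, df_t in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def vectorize(doc, idf):
    """TF-IDF vector for a new doc; terms unseen in training are ignored."""
    return {t: tf * idf[t] for t, tf in Counter(doc).items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def knn_classify(query, train_docs, labels, k=3):
    """Majority label among the k training docs most similar to the query."""
    vectors, idf = tf_idf_vectors(train_docs)
    q = vectorize(query, idf)
    neighbors = sorted(range(len(train_docs)),
                       key=lambda i: cosine(q, vectors[i]), reverse=True)[:k]
    votes = Counter(labels[i] for i in neighbors)
    return votes.most_common(1)[0][0]


train = [
    ["hadoop", "job", "failed"],
    ["hadoop", "cluster", "slow"],
    ["teradata", "query", "failed"],
    ["teradata", "sql", "timeout"],
]
labels = ["Hadoop", "Hadoop", "Teradata", "Teradata"]
print(knn_classify(["hadoop", "job", "slow"], train, labels))  # Hadoop
```

An odd k avoids most voting ties; for large collections a library implementation would normally replace these loops.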


Q&A