Generic Framework for Knowledge Classification
TRANSCRIPT
3
Agenda
• Introduction
• Problem at Hand
• How Is It Solved?
• Challenges
• Skills and Career Alignment
• Q & A
4
Introduction
• Master's in Computer Science, University of Utah, Salt Lake City, UT
• Systems Engineering Intern, Internal Tools team (Knowledge Management)
• Interests: scalability challenges, machine learning, and visualization
5
Problem at Hand
• Generic framework for classifying knowledge
• Classifying questions in Answer Hub
7
Project High Points
• 72% accuracy has been achieved.
[Chart: Rank Statistics. Axes: No. of Questions (0 to 18,000) and Rank (1 to 23). Labels: Rank, Categories.]
8
Confusion matrix
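Categories | V3 | GBX | C3 | Hadoop | BES | DAL | Raptor | Stratus | Security Platform | General | User Tracking | Experimentation | Service Frameworks | Search Services | Sherlock | Batch Framework | Trinity | Commerce OS | Teradata | Analytics Platform | Total
V3 | 1552 | 2 | 1 | 2 | 6 | 263 | 217 | 3 | 23 | 455 | 2 | 41 | 290 | 9 | 3 | 6 | 0 | 0 | 0 | 0 | 2875
GBX | 1 | 68 | 0 | 0 | 0 | 6 | 37 | 0 | 1 | 9 | 1 | 26 | 4 | 8 | 0 | 0 | 0 | 1 | 0 | 0 | 162
C3 | 0 | 0 | 318 | 1 | 1 | 25 | 27 | 54 | 5 | 32 | 1 | 6 | 1 | 4 | 0 | 1 | 0 | 1 | 1 | 0 | 478
Hadoop | 0 | 0 | 2 | 173 | 1 | 10 | 8 | 0 | 0 | 20 | 1 | 3 | 4 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 225
BES | 11 | 0 | 0 | 0 | 300 | 59 | 39 | 1 | 0 | 5 | 0 | 1 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 438
DAL | 67 | 0 | 1 | 0 | 3 | 2307 | 89 | 0 | 2 | 16 | 0 | 13 | 99 | 5 | 0 | 1 | 0 | 0 | 0 | 0 | 2603
Raptor | 11 | 10 | 5 | 2 | 25 | 396 | 5352 | 3 | 62 | 212 | 26 | 184 | 337 | 25 | 6 | 17 | 0 | 0 | 1 | 0 | 6674
Stratus | 1 | 0 | 82 | 2 | 1 | 40 | 188 | 435 | 4 | 40 | 0 | 13 | 6 | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 816
Security Platform | 4 | 0 | 0 | 0 | 0 | 32 | 38 | 0 | 174 | 11 | 0 | 6 | 129 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 396
General | 100 | 2 | 12 | 15 | 6 | 129 | 258 | 16 | 13 | 1200 | 3 | 88 | 64 | 29 | 4 | 3 | 0 | 0 | 5 | 0 | 1947
User Tracking | 3 | 0 | 0 | 1 | 0 | 16 | 43 | 0 | 3 | 8 | 126 | 41 | 10 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 252
Experimentation | 1 | 1 | 0 | 0 | 0 | 27 | 40 | 0 | 1 | 8 | 0 | 868 | 29 | 1 | 0 | 0 | 0 | 0 | 3 | 0 | 979
Service Frameworks | 124 | 3 | 0 | 0 | 6 | 90 | 299 | 2 | 67 | 83 | 0 | 56 | 1977 | 38 | 5 | 3 | 0 | 11 | 0 | 0 | 2764
Search Services | 0 | 1 | 1 | 0 | 1 | 5 | 9 | 1 | 2 | 8 | 0 | 4 | 32 | 163 | 0 | 0 | 0 | 0 | 0 | 0 | 227
Sherlock | 2 | 0 | 0 | 4 | 0 | 67 | 31 | 2 | 0 | 17 | 0 | 29 | 19 | 0 | 85 | 0 | 0 | 0 | 0 | 0 | 256
Batch Framework | 11 | 0 | 0 | 2 | 2 | 100 | 92 | 2 | 2 | 10 | 0 | 2 | 22 | 0 | 0 | 67 | 0 | 0 | 1 | 0 | 313
Trinity | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 6
Commerce OS | 0 | 0 | 0 | 0 | 0 | 10 | 48 | 0 | 4 | 15 | 0 | 14 | 15 | 8 | 0 | 0 | 0 | 103 | 0 | 0 | 217
Teradata | 0 | 0 | 1 | 1 | 0 | 10 | 0 | 0 | 0 | 0 | 1 | 16 | 2 | 1 | 0 | 1 | 0 | 0 | 49 | 0 | 82
Analytics Platform | 0 | 0 | 1 | 1 | 0 | 5 | 1 | 0 | 1 | 23 | 1 | 14 | 0 | 3 | 1 | 0 | 0 | 0 | 1 | 11 | 63
Total | 1888 | 87 | 424 | 204 | 352 | 3597 | 6816 | 519 | 364 | 2172 | 162 | 1429 | 3063 | 297 | 109 | 101 | 0 | 117 | 61 | 11 | 21773
Percentage correct | 82.20 | 78.16 | 75.00 | 84.80 | 85.23 | 64.14 | 78.52 | 83.82 | 47.80 | 55.25 | 77.78 | 60.74 | 64.54 | 54.88 | 77.98 | 66.34 | n/a | 88.03 | 80.33 | 100.00 |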
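The "Percentage correct" row appears to be each diagonal count divided by its column total: for V3, 1552 out of 1888 is about 82.2%, and Trinity has a column total of 0, so no percentage can be computed. A minimal Python sketch of that calculation follows. The function names and the 3-category matrix are hypothetical and are not taken from the project.

```python
def column_percent_correct(matrix):
    """Diagonal count / column total, as a percentage, for each column.

    Returns None for a column whose total is 0 (the spreadsheet's
    divide-by-zero case).
    """
    n = len(matrix)
    result = []
    for j in range(n):
        col_total = sum(matrix[i][j] for i in range(n))
        result.append(100.0 * matrix[j][j] / col_total if col_total else None)
    return result


def overall_accuracy(matrix):
    """Sum of the diagonal / sum of all cells, as a percentage."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total if total else None


# Hypothetical 3-category confusion matrix (illustrative numbers only).
toy = [
    [8, 1, 0],
    [2, 6, 0],
    [0, 1, 0],
]

print(column_percent_correct(toy))
print(round(overall_accuracy(toy), 2))
```

Returning None for an empty column keeps the undefined case explicit instead of reporting a misleading 0%.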
9
Challenges and How We Overcame Them
• Sparse data.
• Large number of features.
• Chi-square test came to the rescue.
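The slides do not say how the chi-square test was applied. One common use in text classification is to score each term against a category with a 2x2 chi-square statistic and keep only the top-scoring terms. The sketch below assumes that approach. The function names and the tiny labelled documents are hypothetical; only the category names are borrowed from the confusion matrix.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category without the term
    d: docs outside the category without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_terms(docs, category, k):
    """Return the k terms with the highest chi-square score for a category.

    docs is a list of (set_of_terms, label) pairs.
    """
    n_docs = len(docs)
    n_cat = sum(1 for _, label in docs if label == category)
    with_term = Counter()
    with_term_in_cat = Counter()
    for terms, label in docs:
        for t in terms:
            with_term[t] += 1
            if label == category:
                with_term_in_cat[t] += 1
    scores = {}
    for t, df in with_term.items():
        a = with_term_in_cat[t]
        b = df - a
        c = n_cat - a
        d = n_docs - n_cat - b
        scores[t] = chi_square(a, b, c, d)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical labelled questions (illustrative only).
docs = [
    ({"hadoop", "job", "fails"}, "Hadoop"),
    ({"hadoop", "cluster", "error"}, "Hadoop"),
    ({"dal", "query", "error"}, "DAL"),
    ({"dal", "connection", "fails"}, "DAL"),
]

print(top_terms(docs, "Hadoop", 2))
```

The statistic is symmetric, so a term that never occurs in a category ("dal" for Hadoop here) scores as high as one that always does. Both are informative to a classifier, while a term spread evenly across categories ("error") scores 0 and can be dropped.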
10
Skills Obtained
• Lucene
• Literature survey of existing techniques
• Machine Learning and NLP
• Exposure to productizing research
11
Alignment With My Career Path
• Interested in text and machine learning.
• eBay has tonnes of data.