Deep Learning Automated Helpdesk

AUTOMATED HELPDESK: FINAL YEAR PROJECT (7TH SEM). SUBMITTED BY NIKHIL PATHANIA, PARTHA PRATIM KURMI, PRANAV SHARMA, RISHABH KUMAR, SOURAV KUMAR PAUL

Upload: pranav-sharma

Post on 07-Jan-2017


Page 1: Deep Learning Automated Helpdesk

AUTOMATED HELPDESK
FINAL YEAR PROJECT (7TH SEM)

SUBMITTED BY
NIKHIL PATHANIA
PARTHA PRATIM KURMI
PRANAV SHARMA
RISHABH KUMAR
SOURAV KUMAR PAUL

Page 2: Deep Learning Automated Helpdesk

PRESENTATION TIMELINE

Theoretical NLP, Knowledge Base Design

By – Pranav Sharma

Practical NLP Application, Forming of Tokens

By – Rishabh Kumar

Clustering

By – Sourav Kr Paul 

TensorFlow

By – Nikhil Pathania

Query Model

By – Partha Pratim Kurmi

Page 3: Deep Learning Automated Helpdesk

PROJECT TIMELINE

• Problem Formulation: Sep 2016
• Literature Survey: Sep-Oct 2016
• Design Methodology: Nov 2016
• Synchronizing Modules: Nov 2016
• Basic Implementation: Jan-Feb 2017
• Working Model: Mar 2017
• Accuracy Improvements: Mar-Apr 2017

Page 4: Deep Learning Automated Helpdesk

PROBLEM STATEMENT

Automate the work of customer service centers.

AIM - Build a system to answer questions like:

• "How to recharge my mobile?" - PayTM
• "How to pay my bills?" - PayTM
• "Why is my refund not credited?" - Book My Show

Page 5: Deep Learning Automated Helpdesk

Training Model

1.1 Raw Data -> 1.2 NLP -> 1.3 Preprocessing -> 1.4 Knowledge Base -> 1.5 Clustering -> O/P

Page 6: Deep Learning Automated Helpdesk

INFORMATION RETRIEVAL

• Data sources
  • FAQs
  • Past forum data
• Proper data extraction model
• Knowledge base

Page 7: Deep Learning Automated Helpdesk

DATA EXTRACTION MODELS: WHY NLP?

NLP
• 3-step process.
• Extends with clustering.
• Fast, accurate.

PATTERN MATCHING
• 4-step process.
• No extension with clustering.
• Smaller domain.

Example
Knowledge Base - “The CEO of IBM is Samuel Palmisano.”
Query - “Who is the CEO of IBM?”
Format - \Q is \A
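The "\Q is \A" format above can be illustrated with a small regular-expression sketch. This is only an assumed reading of the slide: the function name `answer_who_is` and the single supported template ("Who is X?") are hypothetical, not part of the project.

```python
import re


def answer_who_is(query, knowledge_base):
    """Answer a 'Who is X?' query by finding a sentence of the form 'X is A'."""
    asked = re.fullmatch(r"who is (.+?)\?", query.strip(), flags=re.IGNORECASE)
    if asked is None:
        return None  # the query does not fit the one supported template
    subject = re.escape(asked.group(1))
    # \Q is \A: the subject from the query, the word "is", then the answer.
    fact = re.compile(rf"{subject} is (.+?)\.?", flags=re.IGNORECASE)
    for sentence in knowledge_base:
        found = fact.fullmatch(sentence.strip())
        if found:
            return found.group(1)
    return None


kb = ["The CEO of IBM is Samuel Palmisano."]
print(answer_who_is("Who is the CEO of IBM?", kb))
```

A rephrased question such as "Who leads IBM?" returns nothing here, which is the "smaller domain" limitation listed for pattern matching.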

Page 8: Deep Learning Automated Helpdesk

Training Model

1.1 Raw Data -> 1.2 NLP -> 1.3 Preprocessing -> 1.4 Knowledge Base -> 1.5 Clustering -> O/P

Page 9: Deep Learning Automated Helpdesk

NATURAL LANGUAGE PROCESSING

• Problem Domain – English.
• Aim.
• Origin - Turing Test.
• Annotating the sentence.
• Clouds exist on Mars. => <cloud, exist, mars>
• Kernel sentences, T-Expressions.

Page 10: Deep Learning Automated Helpdesk

KERNEL SENTENCE, T-EXP

• Kernel Sentences.
• Ternary Expressions.
• <Subject, Relation, Object>
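A ternary expression can be held as a plain three-field record. The sketch below is one possible representation, not the project's actual data structure; the names `TExpr` and `matches` are made up for illustration, and `None` is assumed to act as a wildcard when querying.

```python
from typing import NamedTuple


class TExpr(NamedTuple):
    """A <Subject, Relation, Object> ternary expression."""
    subject: str
    relation: str
    obj: str


def matches(pattern, fact):
    """True if every non-None field of the pattern equals the fact's field."""
    return all(want is None or want == have for want, have in zip(pattern, fact))


# "Clouds exist on Mars." => <cloud, exist, mars>
facts = [TExpr("cloud", "exist", "mars")]

# "What exists on Mars?" leaves the subject open.
pattern = (None, "exist", "mars")
answers = [fact.subject for fact in facts if matches(pattern, fact)]
print(answers)
```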

Page 11: Deep Learning Automated Helpdesk

AN EXAMPLE

Page 12: Deep Learning Automated Helpdesk

KNOWLEDGE BASE

• What is it?
• What to store? Proper data structure.
• Mapping to original set.
• NLP annotations, parameterized variants.

Page 13: Deep Learning Automated Helpdesk

Training Model

1.1 Raw Data -> 1.2 NLP -> 1.3 Preprocessing -> 1.4 Knowledge Base -> 1.5 Clustering -> O/P

Page 14: Deep Learning Automated Helpdesk

PREPROCESSING:-

• Tokenization.
• Stop words removal.
• Stemming.
• POS Tagging.

Page 15: Deep Learning Automated Helpdesk

NLTK ( NATURAL LANGUAGE TOOLKIT )

• Suite of libraries.
• Python support.
• A few of the libraries we will be using:
  • Lexical analysis.
  • Part-of-speech tagger.

Page 16: Deep Learning Automated Helpdesk

TOKENIZATION:-

Tokenization (Word Tokenize)
• Breaking a stream into meaningful elements.
• The stream may or may not be a meaningful sentence.

Page 17: Deep Learning Automated Helpdesk

EXAMPLE:-

"Recharge your mobile by visiting this link"

After tokenization:-
['Recharge', 'your', 'mobile', 'by', 'visiting', 'this', 'link']
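The tokenization step can be sketched as follows. The deck uses NLTK's word tokenizer; this sketch swaps in a simple regular expression so that it runs without NLTK or its data files, and the helper name `tokenize` is an assumption.

```python
import re


def tokenize(text):
    """Split text into word tokens (letters/digits, with an optional apostrophe part)."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)


tokens = tokenize("Recharge your mobile by visiting this link")
print(tokens)
```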

Page 18: Deep Learning Automated Helpdesk

STOP WORDS :-

E.g. "is", "for", "the", "in", etc.

Target :- REMOVE THE STOP WORDS

Page 19: Deep Learning Automated Helpdesk

STOP WORDS REMOVED BY NLTK:-

Page 20: Deep Learning Automated Helpdesk

EXAMPLE :-

From Tokenization:-
['Recharge', 'your', 'mobile', 'by', 'visiting', 'this', 'link']

After Stop Words removal:-
['Recharge', 'mobile', 'visiting', 'link']
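Stop-word removal is a filter against a fixed word list. The set below is a small hand-picked stand-in chosen for this example; the project itself relies on the stop-word list shipped with NLTK, which is much longer.

```python
# Hypothetical mini stop-word list, for illustration only.
STOP_WORDS = {"is", "for", "the", "in", "your", "by", "this", "my", "to", "how"}


def remove_stop_words(tokens):
    """Keep only the tokens whose lowercase form is not a stop word."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


tokens = ['Recharge', 'your', 'mobile', 'by', 'visiting', 'this', 'link']
filtered = remove_stop_words(tokens)
print(filtered)
```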

Page 21: Deep Learning Automated Helpdesk

STEMMING:-

Word = Stem + Affixes
Example:- playing = play (stem) + ing (affix)

TARGET:- Removing affixes from a word (called stemming).
E.g. plays, playing, playful are all reduced to 'play'.

Library in NLTK :- PorterStemmer

Page 22: Deep Learning Automated Helpdesk

EXAMPLE :-

From Stop words removal :-
['Recharge', 'mobile', 'visiting', 'link']

After Stemming :-
['Recharge', 'mobile', 'visit', 'link'] (this is the input for clustering)
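The idea of "word = stem + affixes" can be shown with a deliberately naive suffix stripper. This is not the Porter algorithm that the deck names: a real Porter stemmer applies many more rules and can return non-word stems. The suffix list and the `min_stem` guard are assumptions made for this example.

```python
# Toy suffix list covering only the slide's examples.
SUFFIXES = ("ing", "ful", "s")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ['Recharge', 'mobile', 'visiting', 'link']]
print(stems)
```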

Page 23: Deep Learning Automated Helpdesk

POS TAGGING:-

POS (part of speech) = linguistic category of a token, such as verb, noun, etc.

Target :- Tag each token with its POS in a universal format.

Page 24: Deep Learning Automated Helpdesk

EXAMPLE :-

From Stemming:-
['Recharge', 'mobile', 'visit', 'link']

After POS Tagging:-
[('Recharge', 'NN'), ('mobile', 'NN'), ('visit', 'VBG'), ('link', 'NN')]

Page 25: Deep Learning Automated Helpdesk

Training Model

1.1 Raw Data -> 1.2 NLP -> 1.3 Preprocessing -> 1.4 Knowledge Base -> 1.5 Clustering -> O/P

Page 27: Deep Learning Automated Helpdesk

DOCUMENT CLUSTERING – WHAT AND WHY?

• Unsupervised document organization
• Automatic topic organization
• Topic extraction
• Fast information retrieval and filtering

Page 28: Deep Learning Automated Helpdesk

EXAMPLES

• Web document clustering for search users.

• QA document clustering to solve common problems and questions.

Page 29: Deep Learning Automated Helpdesk

WHY K-MEANS? WHY NOT ANY HIERARCHICAL ALGO?

• Time Complexity 

Page 30: Deep Learning Automated Helpdesk

CLUSTERING

• Algorithm
  • Find k (most dissimilar) documents.
  • Assign them as the k centroids.
  • Until no change:
    • For each document:
      • Find the most similar cluster.
      • Use the cosine similarity function.
    • Recalculate the centroid of each cluster.
    • Stop if no document was reassigned.
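The loop above can be sketched in plain Python. The vectors are the four TF-IDF rows shown on the deck's vector-space slide, and the seed documents 0, 2 and 3 are the ones the deck selects. The function names and the `max_iter` safety limit are assumptions of this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def k_means(vectors, seed_ids, max_iter=100):
    centroids = [list(vectors[i]) for i in seed_ids]
    assignment = [None] * len(vectors)
    for _ in range(max_iter):
        changed = False
        for doc_id, vec in enumerate(vectors):
            # Find the most similar cluster using cosine similarity.
            best = max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
            if best != assignment[doc_id]:
                assignment[doc_id] = best
                changed = True
        if not changed:
            break  # stop: no document was reassigned
        for c in range(len(centroids)):
            members = [vectors[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignment, centroids


# Columns: Add Cancel Recharge landline link mobile money process ticket visit wallet
vectors = [
    [0.00, 0.00, 0.17, 0.00, 0.17, 0.35, 0.00, 0.00, 0.00, 0.17, 0.00],
    [0.00, 0.00, 0.17, 0.35, 0.17, 0.00, 0.00, 0.00, 0.00, 0.17, 0.00],
    [0.00, 0.46, 0.00, 0.00, 0.00, 0.00, 0.00, 0.46, 0.46, 0.00, 0.00],
    [0.46, 0.00, 0.00, 0.00, 0.00, 0.00, 0.46, 0.00, 0.00, 0.00, 0.46],
]
assignment, centroids = k_means(vectors, seed_ids=[0, 2, 3])
print(assignment)
```

Documents 0 and 1 end up in the same cluster, which corresponds to the grouping {{0,1},{2},{3}} reported later in the deck.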

Page 31: Deep Learning Automated Helpdesk

K-MEANS USING JACCARD DISTANCE MEASURE

• Problems in the simple K-Means procedure:
  • Greedy algorithm.
  • Doesn't guarantee the best solution.
• Jaccard distance measure:
  • Find the k most dissimilar documents.

Page 32: Deep Learning Automated Helpdesk

OUTPUT OF PREPROCESSING

• Possible text documents are:
  • Recharge mobile visit link
  • Recharge landline visit link
  • Cancel ticket process
  • Add money wallet

Page 33: Deep Learning Automated Helpdesk

CALCULATING TF-IDF VECTORS

• Term Frequency – Inverse Document Frequency.
• A weight that ranks the importance of a term.
• High for terms that are frequent in a document and rare in the set.
• Ex: the college name "NITS" is frequent in a document but not rare across the set, so it gets a low weight.

Page 34: Deep Learning Automated Helpdesk

TF-IDF VECTOR SPACE

Doc  Add   Cancel  Recharge  landline  link  mobile  money  process  ticket  visit  wallet
0    0.00  0.00    0.17      0.00      0.17  0.35    0.00   0.00     0.00    0.17   0.00
1    0.00  0.00    0.17      0.35      0.17  0.00    0.00   0.00     0.00    0.17   0.00
2    0.00  0.46    0.00      0.00      0.00  0.00    0.00   0.46     0.46    0.00   0.00
3    0.46  0.00    0.00      0.00      0.00  0.00    0.46   0.00     0.00    0.00   0.46
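The weights in this table are consistent with one common TF-IDF variant: term frequency as count divided by document length, multiplied by the natural log of (number of documents / number of documents containing the term). The deck does not state its formula, so treat that choice as an assumption; the sketch below uses it and prints rows in the same column order.

```python
import math
from collections import Counter

docs = [
    ["Recharge", "mobile", "visit", "link"],
    ["Recharge", "landline", "visit", "link"],
    ["Cancel", "ticket", "process"],
    ["Add", "money", "wallet"],
]


def tf_idf(docs):
    """Return (vocabulary, one weight vector per document)."""
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([
            (counts[term] / len(doc)) * math.log(len(docs) / doc_freq[term])
            for term in vocab
        ])
    return vocab, vectors


vocab, vectors = tf_idf(docs)
print(" ".join(vocab))
for row in vectors:
    print(" ".join(f"{weight:.2f}" for weight in row))
```

"mobile" appears in only one of the four documents, so it outweighs "Recharge", which appears in two.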

Page 35: Deep Learning Automated Helpdesk

SELECT K CLUSTERS (K = 3)

• Use Jaccard Distance Measure to pick the initial clusters: {{0},{2},{3}}

Document No (I)  Document No (J)  Similarity
0                1                0.60
0                2                0.00
0                3                0.00
1                2                0.00
1                3                0.00
2                3                0.00
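The similarity values above follow from the Jaccard coefficient on the documents' term sets: size of the intersection divided by size of the union. The seed-picking rule in `pick_seeds` (start from document 0, then repeatedly take the document farthest from the seeds chosen so far) is an assumption made for this sketch; it happens to give the same seeds as the slide, but the procedure in the cited paper may differ.

```python
from itertools import combinations

docs = [
    {"recharge", "mobile", "visit", "link"},
    {"recharge", "landline", "visit", "link"},
    {"cancel", "ticket", "process"},
    {"add", "money", "wallet"},
]


def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two term sets."""
    return len(a & b) / len(a | b)


def pick_seeds(docs, k):
    """Greedy farthest-first choice of k mutually dissimilar documents."""
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(docs)) if i not in seeds]
        # Jaccard distance = 1 - similarity; maximise the distance to the nearest seed.
        farthest = max(
            candidates,
            key=lambda i: min(1 - jaccard_similarity(docs[i], docs[s]) for s in seeds),
        )
        seeds.append(farthest)
    return seeds


for i, j in combinations(range(len(docs)), 2):
    print(i, j, f"{jaccard_similarity(docs[i], docs[j]):.2f}")
print(pick_seeds(docs, 3))
```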

Page 36: Deep Learning Automated Helpdesk

AFTER FIRST ITERATION

• Assign each document to its most similar cluster: {{0,1},{2},{3}}
• Clusters after 1st iteration (vector space of centroid centers):

Cluster  Add   Cancel  Recharge  landline  link  mobile  money  process  ticket  visit  wallet
{0,1}    0.00  0.00    0.17      0.17      0.17  0.17    0.00   0.00     0.00    0.17   0.00
{2}      0.00  0.46    0.00      0.00      0.00  0.00    0.00   0.46     0.46    0.00   0.00
{3}      0.46  0.00    0.00      0.00      0.00  0.00    0.46   0.00     0.00    0.00   0.46

Page 37: Deep Learning Automated Helpdesk

CLUSTERING OUTPUT

• {
    { Recharge mobile visit link, Recharge landline visit link },
    { Cancel ticket process },
    { Add money wallet }
  }

Page 38: Deep Learning Automated Helpdesk

Training Model

1.1 Raw Data -> 1.2 NLP -> 1.3 Preprocessing -> 1.4 Knowledge Base -> 1.5 Clustering -> O/P

Page 39: Deep Learning Automated Helpdesk

TENSORFLOW

• What
• Why
• Where

Page 40: Deep Learning Automated Helpdesk

PROGRAMMING MODEL AND BASIC CONCEPTS

• Computation Graph
• Nodes
• Tensors
• Session
  • Extend
  • Run

Page 41: Deep Learning Automated Helpdesk

COMPUTATION GRAPH

Page 42: Deep Learning Automated Helpdesk

IMPLEMENTATION

• Single Device Execution
• Multi Device Execution
• Cross Device Communication

Page 43: Deep Learning Automated Helpdesk

SINGLE DEVICE EXECUTION

Page 44: Deep Learning Automated Helpdesk

CROSS DEVICE COMMUNICATION

Page 45: Deep Learning Automated Helpdesk

PERFORMANCE

• Data Parallel Training
• Model Parallel Training
• Concurrent Steps for Model Computation Pipelining

Page 46: Deep Learning Automated Helpdesk

DATA PARALLEL TRAINING

Page 47: Deep Learning Automated Helpdesk

MODEL PARALLEL AND CONCURRENT STEPS

Page 48: Deep Learning Automated Helpdesk

CLUSTERING USING TENSORFLOW

• Training Sets
• Nodes
• Data flow
• Feed as Input
• Output

Page 49: Deep Learning Automated Helpdesk

Query Model

2.1 Query -> 2.2 NLP -> 2.3 Preprocessing -> 2.4 Recommendation Engine -> O/P

Page 53: Deep Learning Automated Helpdesk

RECOMMENDATION ENGINE

• The Recommendation Engine analyzes available data to answer the questions.
• The various steps are:
  1. Data collection
  2. Preprocessing and Transformations
  3. Classifier Ensemble

Page 54: Deep Learning Automated Helpdesk

PREPROCESSING AND TRANSFORMATIONS

• The training set consists of FAQs, past forums, etc.
• Given a question, we want to deduce its genre from the text.
• Only the text of the question is extracted.
• Feature selection evaluates the importance of a word using TF-IDF.

Page 55: Deep Learning Automated Helpdesk

PREPROCESSING AND TRANSFORMATIONS

• Training set derived from the key parts of speech in each sentence.

Example          How to recharge   my mobile
Part of Speech   Verb              Noun Object
Decision label   Task              Electronics

Page 56: Deep Learning Automated Helpdesk

PREPROCESSING AND TRANSFORMATIONS

• recharge mobile
• Find the TF-IDF vector.
• Compare it with the distinct clusters using cosine similarity.
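The two steps above can be sketched end to end: weight the query terms with the training set's document frequencies, then pick the cluster whose centroid has the highest cosine similarity. The clusters are the ones from the deck's clustering output, lowercased here for matching. The TF-IDF variant (count / length times natural log of N / df) and all function names are assumptions of this sketch.

```python
import math
from collections import Counter, defaultdict

clusters = [
    [["recharge", "mobile", "visit", "link"], ["recharge", "landline", "visit", "link"]],
    [["cancel", "ticket", "process"]],
    [["add", "money", "wallet"]],
]
docs = [doc for cluster in clusters for doc in cluster]
doc_freq = Counter(term for doc in docs for term in set(doc))


def tf_idf(tokens):
    """Sparse TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(t for t in tokens if t in doc_freq)
    return {t: (n / len(tokens)) * math.log(len(docs) / doc_freq[t])
            for t, n in counts.items()}


def centroid(cluster):
    """Average of the member documents' sparse vectors."""
    total = defaultdict(float)
    for doc in cluster:
        for term, weight in tf_idf(doc).items():
            total[term] += weight
    return {term: weight / len(cluster) for term, weight in total.items()}


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


query = tf_idf(["recharge", "mobile"])
scores = [cosine(query, centroid(cluster)) for cluster in clusters]
best = max(range(len(clusters)), key=scores.__getitem__)
print(best, " ".join(clusters[best][0]))
```

The query shares no terms with the "cancel" and "add" clusters, so their scores are zero and the recharge cluster is selected.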

Page 57: Deep Learning Automated Helpdesk

CLASSIFIER ENSEMBLE

• Ensemble modelling is used for classification, with three classifiers:
  • Naïve Bayesian using FAQ training set
  • POS Naïve Bayesian
  • Threshold Biasing classifier

Page 58: Deep Learning Automated Helpdesk

ENSEMBLE STRUCTURE

• Learning algorithm that uses multiple classifiers.
• Classify using a weighted vote of their decisions.
• The classifier with better precision is preferred.
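A weighted vote can be sketched as a tally of weights per predicted genre. The classifier names, genres and weights below are invented for illustration; in practice the weights might come from each classifier's measured precision.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the genre with the largest total weight across classifiers."""
    tally = defaultdict(float)
    for classifier, genre in predictions.items():
        tally[genre] += weights[classifier]
    return max(tally, key=tally.get)


# Hypothetical outputs of the three classifiers for one question.
predictions = {
    "faq_naive_bayes": "recharge",
    "pos_naive_bayes": "recharge",
    "threshold_bias": "wallet",
}
# Hypothetical weights, e.g. derived from precision on a validation set.
weights = {"faq_naive_bayes": 0.5, "pos_naive_bayes": 0.3, "threshold_bias": 0.4}

print(weighted_vote(predictions, weights))
```

With these weights the two agreeing classifiers win (0.8 against 0.4); a single classifier can still override the others if its weight is large enough.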

Page 59: Deep Learning Automated Helpdesk

RESULTS

• Documents are hand-tagged with the genres.
• In the Ensemble approach, we use a bag approach.
• The count of genres is taken into account.
• The top tallied genre is used to generate the result.
• Answer is "recharge mobile visit link".

Page 60: Deep Learning Automated Helpdesk

Query Model

2.1 Query -> 2.2 NLP -> 2.3 Preprocessing -> 2.4 Recommendation Engine -> O/P

Page 61: Deep Learning Automated Helpdesk

INNOVATION

• Sections Removed
• User friendly
• Reduced manpower
• Future plans to collaborate with the college website.

Page 62: Deep Learning Automated Helpdesk

CONCLUSION AND OUTCOMES

The outcomes of this project include (but are not limited to) the following points:
1. Complete designed architecture.
2. Proper modules and uses defined.
3. Model solution to the problem.

We conclude that the theoretical and survey aspects of the problem are complete. We selected what we judged to be the best technical solutions after surveying the existing alternatives. A working model is expected from the team soon.

Page 63: Deep Learning Automated Helpdesk

LITERATURE SURVEY

Serial No | Paper Title | Authors
1 | Natural Language Annotations for Question Answering | Boris Katz, Gary Borchardt and Sue Felshin
2 | Using English for Indexing and Retrieving | Boris Katz
3 | Recommendation engine: Matching individual/group profiles for better shopping experience | Sanjeev Kulkarni, Ashok M. Sanpal, Ravindra R. Mudholkar, Kiran Kumari
4 | Recommendation engine for Reddit | Hoang Nguyen, Rachel Richards, C.C. Chan, Kathy J. Liszka
5 | TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems | Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo
6 | Executing a program on the MIT tagged-token dataflow architecture. IEEE Trans. Comput., 1990. | Arvind and Rishiyur S. Nikhil

Page 64: Deep Learning Automated Helpdesk

LITERATURE SURVEY

Serial No | Paper Title | Authors
7 | An efficient K-Means Algorithm integrated with Jaccard Distance Measure for Document Clustering | Mushfeq-Us-Saleheen Shameem, Raihana Ferdous
8 | An Intelligent Similarity Measure for Effective Text Document Clustering | M. L. Aishwarya, K. Selvi
9 | K Means Clustering with Tf-idf Weights | Jonathan Zong
10 | Comparison Between K-Mean and Hierarchical Algorithm Using Query Redirection | Manpreet Kaur, Usvir Kaur
11 | Question Answering System on Education Acts Using NLP Techniques | Dr. M. M. Raghuwanshi

Page 65: Deep Learning Automated Helpdesk

LITERATURE SURVEY

Serial No | Paper Title | Authors
12 | Affective – Hierarchical Classification of Text – An Approach Using NLP Toolkit | Dr. R. Venkatesan
13 | Building high-level features using large scale unsupervised learning. In ICML'2012, 2012. | Quoc Le, Marc'Aurelio Ranzato, Rajat Monga, and Andrew Ng
14 | Preprocessing Techniques for Text Mining - An Overview | Dr. S. Vijayarani, Ms. J. Ilamathi, Ms. Nithya
15 | Annotating the World Wide Web using Natural Language | Boris Katz

Page 66: Deep Learning Automated Helpdesk

THANK YOU!