practical natural language processing from theory to industrial applications

Practical Natural Language Processing From Theory to Industrial Applications Jaganadh G http://jaganadhg.in [email protected] Karpagam University Coimbatore 19 th March 2012 Jaganadh G Practical Natural Language Processing

Upload: jaganadg-gopinadhan

Post on 26-Jan-2015


Page 1: Practical Natural Language Processing From Theory to Industrial Applications

Practical Natural Language Processing
From Theory to Industrial Applications

Jaganadh G
http://jaganadhg.in

[email protected]

Karpagam University, Coimbatore

19th March 2012

Jaganadh G Practical Natural Language Processing

Page 2: Practical Natural Language Processing From Theory to Industrial Applications

About me !!

I work in Natural Language Processing, Machine Learning, Data Mining, etc.

Passionate about Free and Open Source software :-)

In my free time I teach Python, speak about FOSS, and blog at http://jaganadhg.in

I am a computational linguist, linguist, Indologist, and book reviewer

Software engineer by profession

Page 3: Practical Natural Language Processing From Theory to Industrial Applications

Question ??

Have you ever used any tools or services based on Natural Language Processing?

Page 6: Practical Natural Language Processing From Theory to Industrial Applications

What is Natural Language Processing (NLP)?

Aim: to build intelligent systems that can interact with human beings the way human beings do

A sub-field of Artificial Intelligence (AI)

An interdisciplinary subject (Language + Linguistics + Statistics + Computer Science + ...)

Natural Language

Refers to a language spoken by people, e.g. English, Japanese, Tamil, Malayalam, as opposed to artificial languages such as C++, Java, etc.

Page 10: Practical Natural Language Processing From Theory to Industrial Applications

Definition

Natural Language Processing

Natural Language Processing is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts/speech at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications.

Until some 10 to 20 years ago, NLP was regarded mainly as an academic discipline.

Now concepts from NLP are applied in a variety of computing platforms and services.

Page 11: Practical Natural Language Processing From Theory to Industrial Applications

Practical NLP ?

Problem

Before going into the theory, can we try some fun practical problems?

Picture Courtesy: http://twitpic.com/1y21qm/full

Page 14: Practical Natural Language Processing From Theory to Industrial Applications

Practical NLP

Problem

Tweet-a-Toddy receives thousands of tweets per day

Tweets requesting home delivery

Tweets about quality of products

Tweets related to enquiries

They require the following things to be automated

Identify tweet category

Process home-delivery requests

Evaluate quality-related tweets

How?

How do we find a solution for Tweet-a-Toddy?

Page 24: Practical Natural Language Processing From Theory to Industrial Applications

Solution

Any solutions?

Some thoughts

Text Classification

Entity Identification

Information Extraction

Sentiment Analysis

Parsing, grammar ...

Regex (Regular Expressions)
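
The simplest of the ideas above, regular expressions, can be sketched for the first Tweet-a-Toddy task (identifying the tweet category). This is only a minimal illustration: the category names, the keyword patterns, and the sample tweets are all assumptions made up for this example, not part of the original talk, and a real system would more likely use a trained text classifier.

```python
import re

# Hypothetical keyword patterns, one per tweet category from the problem slide.
# The keywords are illustrative guesses, not a vetted vocabulary.
CATEGORY_PATTERNS = {
    "home_delivery": re.compile(r"\b(deliver(y|ed)?|send|order)\b", re.IGNORECASE),
    "quality": re.compile(r"\b(taste[sd]?|quality|fresh|stale)\b", re.IGNORECASE),
    "enquiry": re.compile(r"\b(price|cost|timings?|available)\b", re.IGNORECASE),
}


def categorize(tweet):
    """Return the first category whose pattern matches the tweet, else 'other'."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(tweet):
            return category
    return "other"


if __name__ == "__main__":
    for tweet in [
        "Please deliver two bottles to MG Road",
        "The toddy tasted stale today",
        "What is the price of a bottle?",
    ]:
        print(categorize(tweet), "<-", tweet)
```

Because the first matching pattern wins, a tweet that mentions both delivery and quality is filed under delivery. That kind of limitation is one reason to move from hand-written rules to text classification and sentiment analysis.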

Page 32: Practical Natural Language Processing From Theory to Industrial Applications

Another Practical Question

Almost everyone has used the spell checker in a word processing system such as OpenOffice.org or Microsoft Word.

Any guesses on how to develop a spell checker?

Solutions

Word List

Structure of words

Dynamic Programming (Edit Distance)
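
The word list and edit distance ideas can be combined into a very small spell checker sketch. The code below computes the Levenshtein edit distance with dynamic programming and suggests words from a list that lie within a given distance. The function names, the `max_distance` threshold, and the tiny word list are assumptions chosen for illustration only.

```python
def edit_distance(source, target):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn source into target."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def suggest(word, word_list, max_distance=2):
    """Return the word itself if it is known, else nearby words, closest first."""
    if word in word_list:
        return [word]
    scored = sorted((edit_distance(word, candidate), candidate) for candidate in word_list)
    return [candidate for distance, candidate in scored if distance <= max_distance]


if __name__ == "__main__":
    words = ["spelling", "spewing", "sapling", "language"]
    print(suggest("speling", words))
```

Comparing the input against every word in the list is fine for a demonstration but slow for a full dictionary, so practical spell checkers usually generate or index candidates first.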

Page 36: Practical Natural Language Processing From Theory to Industrial Applications

Another Practical Question ...

Context Sensitive Spell-checking

Identifying misspelled words and suggesting corrections based on context

How?

Solutions

Statistical Models

Word category based suggestions
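
As a rough illustration of the statistical approach, the sketch below counts word bigrams in a toy corpus and picks, from a set of easily confused words, the one seen most often next to its neighbours. The corpus, the confusion set, and the function names are invented for this example. Raw counts stand in for a properly smoothed language model.

```python
from collections import Counter

# Toy training text, made up for this example.
CORPUS = (
    "i want a piece of cake . we want peace in the world . "
    "a piece of land . peace and quiet"
)


def train_bigrams(text):
    """Count adjacent word pairs in whitespace-tokenized, lower-cased text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def best_candidate(previous_word, next_word, candidates, bigrams):
    """Pick the candidate seen most often between previous_word and next_word."""
    def score(candidate):
        return bigrams[(previous_word, candidate)] + bigrams[(candidate, next_word)]
    return max(candidates, key=score)


if __name__ == "__main__":
    bigrams = train_bigrams(CORPUS)
    print(best_candidate("a", "of", ["peace", "piece"], bigrams))
    print(best_candidate("want", "in", ["peace", "piece"], bigrams))
```

If no candidate has been seen in the given context, every score is zero and `max` simply returns the first candidate, so a real system would need smoothing and far more training data.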

Page 40: Practical Natural Language Processing From Theory to Industrial Applications

Can Machines Translate ??

Answer !!!

Page 41: Practical Natural Language Processing From Theory to Industrial Applications

Why NLP ?

Because "Information is Power!"

Every day a vast amount of text and speech data is being produced

Internet == at least 40 Million pages

Picture Courtesy: http://soundsgood.in/wikipediafat print book/

Page 46: Practical Natural Language Processing From Theory to Industrial Applications

History

Second World War !!!

Machine Translation

Now :

Most promising imperfect technology

Moves from Lab to Industry to Layman

Page 52: Practical Natural Language Processing From Theory to Industrial Applications

NLP Really Hard to Achieve?

NLP deals with human languages

Human language is dynamic and mysterious!

Communication in Human Language

Page 54: Practical Natural Language Processing From Theory to Industrial Applications

NLP Really Hard to Achieve?

Levels of Knowledge encoding in Language Data

Page 55: Practical Natural Language Processing From Theory to Industrial Applications

Tasks in NLP

Broad Areas

Text Processing

Speech Processing

Page 58: Practical Natural Language Processing From Theory to Industrial Applications

Major tasks in Text Processing

Word Level Analysis

Morphological Synthesis

Part of Speech Tagging

Stemming

Lemmatization

Sentence Level Analysis - Syntactical Parsing

Discourse Analysis - Semantic Processing

Page 66: Practical Natural Language Processing From Theory to Industrial Applications

Morphology

The branch of linguistics that studies word structures.

To a computer program, a word is: ???

Morphological analysis can be explained as the process of analyzing words to identify their constituents

Computational Analysis of Morphology

Morphological Analysis

Morphological Generation

Stemming

Lemmatization
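
The difference between stemming and lemmatization can be sketched in a few lines. The stemmer below strips a handful of English suffixes and may return non-words, while the lemmatizer looks words up in a small table of dictionary forms. Both the suffix list and the lemma table are toy assumptions for illustration. This is not the Porter stemmer or any other published algorithm.

```python
# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical lemma table for a few irregular forms.
LEMMAS = {"went": "go", "better": "good", "mice": "mouse"}


def stem(word, min_stem_length=3):
    """Strip the first matching suffix, keeping at least min_stem_length characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[:-len(suffix)]
    return word


def lemmatize(word):
    """Return the dictionary form if the word is in the table, else the word unchanged."""
    word = word.lower()
    return LEMMAS.get(word, word)


if __name__ == "__main__":
    for word in ["walking", "walked", "boxes", "studies", "went"]:
        print(word, "stem:", stem(word), "lemma:", lemmatize(word))
```

Here "studies" is stemmed to "studi", which is not a word, and "went" is left untouched by the stemmer but mapped to "go" by the lemma table. This is the usual trade-off: stemming is cheap and crude, while lemmatization needs lexical knowledge.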

Page 74: Practical Natural Language Processing From Theory to Industrial Applications

Practical Question from Morphology

Approximate number of word forms that can be derived from the word "maram"

Page 75: Practical Natural Language Processing From Theory to Industrial Applications

Parts of Speech Tagging

POS tagging is the process of marking up the words in a text (corpus) as corresponding to a particular part of speech, based on both their definition and their context.

Ram goes to school.

Ram/NNP goes/VBZ to/TO school/NN ./.

Words are ambiguous! E.g. book, cricket, bank
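
The word/TAG notation in the example can be read into (word, tag) pairs and used to train a very simple baseline tagger that assigns each word its most frequent tag. The sketch below assumes a two-sentence training set invented for this example (the second sentence is not from the talk) and uses `NN` as an arbitrary default tag for unknown words.

```python
from collections import Counter, defaultdict

# Tiny hypothetical training set in word/TAG format.
TRAINING_SENTENCES = [
    "Ram/NNP goes/VBZ to/TO school/NN ./.",
    "Sita/NNP goes/VBZ to/TO market/NN ./.",
]


def parse_tagged(text):
    """Split 'word/TAG' tokens into (word, tag) pairs, splitting on the last slash."""
    pairs = []
    for token in text.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def train_unigram_tagger(tagged_sentences):
    """Map each lower-cased word to the tag it carries most often in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in parse_tagged(sentence):
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(tokens, model, default_tag="NN"):
    """Tag each token by lookup, using default_tag for unseen words."""
    return [(token, model.get(token.lower(), default_tag)) for token in tokens]


if __name__ == "__main__":
    model = train_unigram_tagger(TRAINING_SENTENCES)
    print(tag(["Sita", "goes", "to", "school", "."], model))
```

A lookup tagger like this ignores context, so it cannot tell the noun "book" from the verb "book". Resolving that ambiguity is what context-aware taggers are for.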

Page 77: Practical Natural Language Processing From Theory to Industrial Applications

Syntactical Parsing

Parsing

In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar.

Sentences are ambiguous !!!!
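
Sentence ambiguity can be made concrete by counting how many parse trees a grammar allows for a sentence. The sketch below uses the CYK dynamic-programming algorithm over a tiny grammar in Chomsky normal form. The grammar, the lexicon, and the example sentence are assumptions written for this illustration and do not come from the talk.

```python
# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> possible categories.
LEXICON = {
    "i": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "a": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(tokens, start_symbol="S"):
    """Count distinct parse trees for tokens with CYK over the toy grammar."""
    n = len(tokens)
    chart = {}  # (start, end, symbol) -> number of trees covering tokens[start:end]
    for i, token in enumerate(tokens):
        for symbol in LEXICON.get(token.lower(), []):
            chart[(i, i + 1, symbol)] = chart.get((i, i + 1, symbol), 0) + 1
    for span in range(2, n + 1):
        for start in range(0, n - span + 1):
            end = start + span
            for split in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    ways = chart.get((start, split, left), 0) * chart.get((split, end, right), 0)
                    if ways:
                        key = (start, end, parent)
                        chart[key] = chart.get(key, 0) + ways
    return chart.get((0, n, start_symbol), 0)


if __name__ == "__main__":
    print(count_parses("I saw the man".split()))
    print(count_parses("I saw the man with a telescope".split()))
```

Under this grammar the second sentence has two analyses: "with a telescope" can attach to the verb phrase (the seeing was done with a telescope) or to the noun phrase (the man has a telescope).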

Page 79: Practical Natural Language Processing From Theory to Industrial Applications

Semantics

Study of meaning and its structure

Word meaning is ambiguous! E.g. marriage

Page 81: Practical Natural Language Processing From Theory to Industrial Applications

Where can I apply these techniques?

Machine Translation Systems

Search Engine

Spell-checker

Grammar Checker

..........

Page 86: Practical Natural Language Processing From Theory to Industrial Applications

Other Interesting Tasks

Named Entity Identification

Information Extraction

Information Retrieval

Text Classification and Clustering

Page 90: Practical Natural Language Processing From Theory to Industrial Applications

Speech Processing

Two Major Areas

Text to Speech

Speech Recognition

Practical Applications

IVR (Interactive Voice Response)

Technology for Visually Challenged People

Mobile Phones

Speech Enabled Web

Vehicle Mounted GPS Navigator

Page 92: Practical Natural Language Processing From Theory to Industrial Applications

Commercial NLP Applications

What Industry Looks For

Components of Word Processors

Machine Translation Systems

Custom Search Systems

Information Extraction

Entity Identification

Text Summarization

Speech Systems

Question Answering Systems

Page 101: Practical Natural Language Processing From Theory to Industrial Applications

Future of NLP

Future!!!

Semantics oriented technologies

Page 102: Practical Natural Language Processing From Theory to Industrial Applications

NLP in other domains

Bio-Medical

Legal

Forensic Science

Advertisement

Education

Politics

E-governance

Business Development

Marketing

and wherever we use language!!!

Page 103: Practical Natural Language Processing From Theory to Industrial Applications

Natural Language Processing in India

Academic Institutions

IIT Kanpur, Kharagpur, Bombay

IIIT Hyderabad

IISc Bangalore

AU-KBC Chennai

Amrita University, Ettimadai, Coimbatore

IIITMK, Trivandrum

Central University, Hyderabad

JNU, Delhi

Tamil University, Thanjavur

Page 104: Practical Natural Language Processing From Theory to Industrial Applications

Natural Language Processing in India

Industry

Microsoft

Yahoo!

AOL

365Media Pvt. Ltd.

InsideView

Thaazza

AIAIO Labs

Page 105: Practical Natural Language Processing From Theory to Industrial Applications

Questions ??

Page 106: Practical Natural Language Processing From Theory to Industrial Applications

References

Daniel Jurafsky, James H. Martin, Speech and Language Processing, 2nd Edition.

U.S. Tiwary, Tanveer Siddiqui, Natural Language Processing and Information Retrieval.

Page 107: Practical Natural Language Processing From Theory to Industrial Applications

Finally
