#TOSMAC Toronto SMAC Meetup – Welcome! An Intro to Text Analytics on Big Data with a use case

Uploaded by raul-chong on 15-Jan-2015

DESCRIPTION

An introduction to performing text analytics on input from Twitter, using the Emmys as the example use case.

TRANSCRIPT

Page 1: An Intro to Text Analytics on Big Data with a use case

#TOSMAC

Toronto SMAC Meetup – Welcome! An Intro to Text Analytics on Big Data with a use case

Page 2

Toronto SMAC Team

© 2014 IBM Corporation

Lucas Silva, Felipe Mosquetta, Marcos de Mello

Page 3

Twitter's numbers

As you know:

- 500 million Tweets are sent per day.

- Twitter supports 35+ languages.

- 255 million monthly active users.

Huge amount of data!

Page 4

Overview

Section 1 | Section 2 | Section 3 | Section 4 | Section 5

Page 5

Overview

Page 6

Overview

Page 7

Let’s get started!

Page 8

Input data

Page 9

Section 2

Page 10

Demo

Page 11

Next section

Page 12

Next section

Extractor: used to extract structured information from unstructured and semi-structured data.

AQL: Annotation Query Language, a rule language with a familiar SQL-like syntax.
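To make the extractor idea concrete, here is a minimal Python sketch (not AQL, and not the extractor used in the talk) that turns raw tweet text into structured rows: one record per match, carrying the source tweet and the character offsets of the match. The hashtag rule and the `Span` and `extract_hashtags` names are hypothetical. In AQL a comparable rule would be declared as a view over the document text; the Python loop below only imitates that idea.

```python
import re
from dataclasses import dataclass


# One structured record per extracted mention: the tweet it came from,
# the matched text, and its character offsets (end is exclusive).
@dataclass(frozen=True)
class Span:
    doc_id: int
    text: str
    begin: int
    end: int


# Hypothetical rule: a hashtag is '#' followed by letters, digits or underscores.
HASHTAG = re.compile(r"#\w+")


def extract_hashtags(tweets):
    """Turn a list of raw tweet strings into a table of Span rows."""
    rows = []
    for doc_id, tweet in enumerate(tweets):
        for m in HASHTAG.finditer(tweet):
            rows.append(Span(doc_id, m.group(), m.start(), m.end()))
    return rows


tweets = ["Watching the #Emmys tonight!", "No hashtags here", "#Emmys #TOSMAC"]
for row in extract_hashtags(tweets):
    print(row)
```

The second tweet produces no rows, so the output table only contains tweets where the rule fired, much like a result set from a query.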

Page 13

Next section

Profiler: used to troubleshoot performance problems.
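The Profiler is a tool in its own right; the snippet below is only a stand-in for the idea behind it, sketched in Python under the assumption that each extraction rule can be timed independently. It records the best of several runs of each hypothetical regex rule over a corpus, so the slowest rule stands out. The rule names and the `profile_rules` helper are illustrative and not part of any IBM product.

```python
import re
import time

# Hypothetical extraction rules to compare.
RULES = {
    "hashtag": re.compile(r"#\w+"),
    "mention": re.compile(r"@\w+"),
}


def profile_rules(rules, docs, repeat=3):
    """Return the best-of-`repeat` wall-clock seconds each rule needs over all docs."""
    timings = {}
    for name, pattern in rules.items():
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            for doc in docs:
                pattern.findall(doc)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


docs = ["Watching the #Emmys with @friends"] * 10_000
# Slowest rule first.
for name, secs in sorted(profile_rules(RULES, docs).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {secs:.4f} s")
```

Taking the minimum over several runs reduces noise from other processes; the absolute numbers depend entirely on the machine.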

Page 14

Main concepts

Types of extraction specifications:

- Dictionaries

- Regular expressions

- Part of speech
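The first two kinds of specification can be sketched in plain Python. The dictionary entries, the handle pattern and the `extract` helper below are hypothetical examples, not the rules from the talk; part-of-speech matching needs a trained tagger and is left out.

```python
import re

# Dictionary specification: hypothetical entries, matched as whole words, ignoring case.
AWARD_TERMS = ["best actor", "best actress", "best drama series"]

# Longer entries are tried first so that, if one entry is a prefix of another,
# the longer phrase wins (Python's regex alternation takes the first branch that matches).
DICT_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in sorted(AWARD_TERMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

# Regular-expression specification: hypothetical rule for Twitter handles.
HANDLE_PATTERN = re.compile(r"@\w{1,15}")


def extract(text):
    """Return (kind, matched_text) pairs in order of appearance."""
    hits = [("dictionary", m.start(), m.group()) for m in DICT_PATTERN.finditer(text)]
    hits += [("regex", m.start(), m.group()) for m in HANDLE_PATTERN.finditer(text)]
    return [(kind, s) for kind, _, s in sorted(hits, key=lambda h: h[1])]


print(extract("Who wins Best Actor tonight? Ask @TOSMAC"))
# prints [('dictionary', 'Best Actor'), ('regex', '@TOSMAC')]
```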

Page 15

Main concepts

Page 16

Main concepts

Page 17

Page 18

Main concepts

Page 19

Main concepts

Page 20

Main concepts

Types of extraction specifications:

- Dictionaries

- Regular expressions

- Part of speech

Page 21

Demo

Page 22

Main concepts

Types of extraction specifications:

- Dictionaries

- Regular expressions

- Part of speech

Page 23

Main concepts

Page 24

Page 25

AQL Guidelines

Basic feature AQL statements: develop the core building blocks of the extractor.
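A minimal Python sketch of basic features, assuming the money-amount example used on the following slides: each rule (currency symbol, number, unit word) runs on its own and only reports where it matched. The patterns and the `basic_features` name are illustrative assumptions, not the talk's AQL.

```python
import re

# Basic features: small, independent building blocks, each producing spans.
CURRENCY = re.compile(r"\$")
NUMBER = re.compile(r"\d+(?:\.\d+)?")
UNIT = re.compile(r"\b(?:thousand|million|billion)\b", re.IGNORECASE)


def basic_features(text):
    """Run each basic-feature rule independently and return its (begin, end, text) spans."""
    return {
        name: [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]
        for name, pattern in [("Currency", CURRENCY), ("Number", NUMBER), ("Unit", UNIT)]
    }


for name, found in basic_features("Budget: $7.5 million, tips: $4 thousand").items():
    print(name, found)
```

None of these rules knows about the others yet; combining them is the job of the candidate generation step.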

Page 26

AQL Guidelines

Candidate generation AQL statements: combine basic feature AQL statements.

Page 27

Candidate generation AQL statements

$7.5 million

$4 thousand

$ 7.5 million

Page 28

Candidate generation AQL statements

$7.5 million

$4 thousand

$ 7.5 million

$7.5 million
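Candidate generation can be sketched by joining basic-feature spans: a currency symbol, then a number, then a unit word, with at most one whitespace character between neighbours. The Python sketch below is self-contained, so it repeats hypothetical basic-feature patterns; the one-character gap limit is an assumption chosen so that both "$7.5 million" and "$ 7.5 million" are accepted.

```python
import re

# Basic-feature rules (repeated here so the sketch runs on its own).
CURRENCY = re.compile(r"\$")
NUMBER = re.compile(r"\d+(?:\.\d+)?")
UNIT = re.compile(r"\b(?:thousand|million|billion)\b", re.IGNORECASE)


def spans(pattern, text):
    """Return the (begin, end) offsets of every match of `pattern` in `text`."""
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


def follows(left, right, text, max_gap=1):
    """True if `right` starts after `left` with at most `max_gap` whitespace characters between."""
    if left[1] > right[0]:
        return False
    gap = text[left[1]:right[0]]
    return len(gap) <= max_gap and gap.strip() == ""


def money_candidates(text):
    """Candidate generation: join Currency, Number and Unit spans that appear in sequence."""
    out = []
    for c in spans(CURRENCY, text):
        for n in spans(NUMBER, text):
            if not follows(c, n, text):
                continue
            for u in spans(UNIT, text):
                if follows(n, u, text):
                    out.append(text[c[0]:u[1]])
    return out


print(money_candidates("$7.5 million, $4 thousand and $ 7.5 million"))
# prints ['$7.5 million', '$4 thousand', '$ 7.5 million']
```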

Page 29

AQL Guidelines

Filter and consolidate AQL statements:

- Refine results

- Remove invalid annotations

- Resolve overlap between annotations
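The filter and consolidate steps can be sketched on plain `(begin, end, text)` tuples. The validity rule here (keep only annotations that start with `$`) is a hypothetical example. The consolidation keeps the longer span when one annotation lies inside another, which mirrors the idea behind AQL's `consolidate` clause with its 'ContainedWithin' policy, but this Python code is not that implementation.

```python
# Each annotation is (begin, end, text); end is exclusive.


def filter_invalid(spans, is_valid):
    """Filter step: drop annotations that fail a validity check."""
    return [s for s in spans if is_valid(s)]


def consolidate_contained(spans):
    """Consolidate step: drop any span whose offsets lie inside a different span.

    Exact duplicates collapse to one entry via set().
    """
    unique = sorted(set(spans))
    keep = []
    for s in unique:
        contained = any(
            o[:2] != s[:2] and o[0] <= s[0] and s[1] <= o[1]
            for o in unique
        )
        if not contained:
            keep.append(s)
    return keep


annotations = [
    (0, 12, "$7.5 million"),
    (1, 4, "7.5"),           # invalid under the hypothetical rule: no '$'
    (0, 4, "$7.5"),          # overlaps, and lies inside, "$7.5 million"
    (14, 25, "$4 thousand"),
    (14, 25, "$4 thousand"), # exact duplicate
]
result = consolidate_contained(filter_invalid(annotations, lambda s: s[2].startswith("$")))
print(result)
# prints [(0, 12, '$7.5 million'), (14, 25, '$4 thousand')]
```

The pairwise containment check is quadratic, which is fine for a sketch but not for large annotation sets.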

Page 30

Demo

Page 31

Conclusion

Page 32

Checkpoint

Page 33

What we have done

Section 1 | Section 2 | Section 3

Page 34

What are we going to do?

Section 4 | Section 5

Page 35

Demo

Page 36

Also using R

Page 37

What are we going to do?

Page 38

Demo

Page 39

So what?

Page 40

Companies

Page 41

Exporting to you

Page 42

Thank you! Let's network!
