machine learning, predictive text, and topic modelswallach/talks/ml_intro.pdfhanna wallach machine...

Post on 01-Aug-2020

7 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Machine Learning, Predictive Text, and Topic Models

Hanna WallachUniversity of Massachusetts Amherst

wallach@cs.umass.edu

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 2

Outline

● What is machine learning?

● Examples of machine learning in practice:

$30: Dinner, Cambridge MA $50: Bus ticket, Cambridge MA$5000: Hotel suite, Hong Kong $20: Beer, Amherst MA $10: Lunch, Amherst MA

BALLGAMETEAM

FOOTBALLBASEBALLPLAYERS

PLAYFIELDPLAYER

BASKETBALLCOACHPLAYEDPLAYING

HITTENNIS

JOBWORKJOBS

CAREEREXPERIENCE

EMPLOYMENTOPPORTUNITIES

WORKINGTRAINING

SKILLSCAREERS

POSITIONSFIND

POSITIONFIELD

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 3

Machine Learning (ML)

● There are increasingly large amounts of digital data available:

● Machine learning uses computers to find the most salient features in data to further knowledge and make life easier

... with as little human input as possible

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 4

Uncertainty

● There is uncertainty in almost all real world situations:

● ML explicitly represents uncertainty using probability:

– Pr (lemon) = how certain I am that this is a lemon

● Probability provides a framework for reasoning under uncertainty

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 5

USPS Digit Recognition

● Problem:

– USPS needs to sort letters by zip code

● Solution:

– Teach a computer to recognize hand-written digits

– Only ask human when computer is uncertain:

1 or 7?

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 6

Credit Card Fraud

● Problem:

– Want to detect credit card fraud

● Solution:

– Train a computer to recognise normal and abnormal usages

– Alert card-holder if abnormal pattern is detected

$30: Dinner, Cambridge MA$50: Bus ticket, Cambridge MA$10: Lunch, Amherst MA$20: Beer, Amherst MA$10: Lunch, Amherst MA

$30: Dinner, Cambridge MA $50: Bus ticket, Cambridge MA$5000: Hotel suite, Hong Kong $20: Beer, Amherst MA $10: Lunch, Amherst MA

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 7

Sorting News Stories by Genre

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 8

Predictive Text Entry

● e.g., T9 or iTAP

● Used on cell phones

● Enables use of reduced keyboard

● Enter as much text as possible with as few gestures as possible

TextGestures

(as few as possible)

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 9

Predictive Text Entry

● This is like the reverse of text compression

● Text compression: want to go from as much text as possible to as small a representation as possible

TextBit string

(preferably short)

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 10

Writing and Text Compression

● Optimal text compression

TextBit string

(preferably short)

probabilisticmodel

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 11

Writing and Text Compression

● Optimal text compression and writing with predictive text entry

TextBit string

(preferably short)

probabilisticmodel

TextGestures

(as few as possible)

probabilisticmodel

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 12

Dasher [http://www.dasher.org.uk]

● Driven by 2D continuous gestures

● Uses a model of language

● Available for

– Windows

– Linux

– Mac OS X

– Pocket PC

– etc.

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 13

Dasher: Screen Layout

● Box sizes are proportional to probabilities

● Probabilities come from a letter-based language model

● P(X) = b P(X, Y) = a

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 14

Dasher: Dynamics

Point where you want to go

● Like driving a car

● Motion sickness?

● Not if you're driving!

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 15

Dasher: Benefits

● Keyboards: one gesture per character

● Dasher: some gestures select many characters

● Works with any language

● Inaccurate gestures can be compensated for by later gestures

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 16

Topic Models

● Humans can read a document and identify the small number of topics that best characterize that document

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 17

Topic Models

● Topics are mixtures of words and documents are mixtures of topics

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 18

Topic Models

● Infer topic information from word-document co-occurrences

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 19

Example Topics [Tenenbaum et al.]

STORYSTORIES

TELLCHARACTER

CHARACTERSAUTHOR

READTOLD

SETTINGTALESPLOT

TELLINGSHORTFICTIONACTION

FIELDMAGNETICMAGNET

WIRENEEDLE

CURRENTCOIL

POLESIRON

COMPASSLINESCORE

ELECTRICDIRECTION

FORCE

SCIENCESTUDY

SCIENTISTSSCIENTIFIC

KNOWLEDGEWORK

RESEARCHCHEMISTRY

TECHNOLOGYMANY

MATHEMATICSBIOLOGYFIELD

PHYSICSLABORATORY

BALLGAMETEAM

FOOTBALLBASEBALLPLAYERS

PLAYFIELDPLAYER

BASKETBALLCOACHPLAYEDPLAYING

HITTENNIS

JOBWORKJOBS

CAREEREXPERIENCE

EMPLOYMENTOPPORTUNITIES

WORKINGTRAINING

SKILLSCAREERS

POSITIONSFIND

POSITIONFIELD

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 20

Transfer between Topics [Mimno]

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 21

Entities and Topics [Newman et al.]

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 22

Topics and Email

● Enron email corpus:

– 250k email messages, 23k people

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 23

Selecting Email Keywords [Dredze et al.]

● Without topics: producst pliers stmt hammer wrench

● With topics: team meeting services lisa ase

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 24

Senders, Recipients, Topics [McCallum et al.]

“Chief Operations Officer” “Government Relations Executive”

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 25

Summary

● Machines can learn a lot from unstructured digital data

● We can use machine learning to build useful applications, some of which you are already using!

Hanna Wallach Machine Learning, Predictive Text, and Topic Models 26

Questions?

top related