Kathy McCoy: Artificial Intelligence, Natural Language Processing, Applications for People with Disabilities

Upload: nirav

Post on 07-Jan-2016



TRANSCRIPT

Page 1: Kathy McCoy

Kathy McCoy

Artificial Intelligence

Natural Language Processing

Applications for People with Disabilities

Page 2: Kathy McCoy

Primary Research Areas

Natural Language Generation: problem of choice
◦ Deep Generation: structure and content of coherent text
◦ Surface Generation: particularly using TAG (multi-lingual generation and machine translation)

Discourse Processing

Second Language Acquisition

Applications for people with disabilities affecting their ability to communicate

Page 3: Kathy McCoy

Projects

Augmentative Communication
◦ Word Prediction and Contextual Information (Keith Trnka)
◦ Using prestored text (Jan Bedrosian+, Linda Hoag+, Tim Walsh)

ICICLE: CALL system for teaching English as a second language to ASL natives (Rashida Davis+)

Text Skimming: for someone who is blind to find an answer to a question (Debbie Yarrington)

Generating Textual Summaries of Graphs (Sandee Carberry, Seniz Demir, Charlie Greenbacker, Peng Wu)

Summarizing Multi-Modal Documents (Charlie Greenbacker, Sandee Carberry)

Page 4: Kathy McCoy

Developing Intelligent Communication Aids for People with Disabilities

Kathleen F. McCoy

Computer and Information Sciences & Center for Applied Science and Engineering in Rehabilitation

University of Delaware

Page 5: Kathy McCoy

Augmentative Communication

Intervention that gives a non-speaking person an alternative means to communicate

User Population
◦ May have severe motor impairments
◦ Unable to speak
◦ Unable to write
◦ Cannot use sign language

Our focus here: adults with no cognitive impairments and very good literacy skills

Page 6: Kathy McCoy

Row-Column Scanning

Page 7: Kathy McCoy

Row-Column Scanning II

Page 8: Kathy McCoy

Can we be faster?

Page 9: Kathy McCoy

Language Representation: Words

Page 10: Kathy McCoy

Still Need to Spell!

Page 11: Kathy McCoy

Predicting Fringe Vocabulary

Word Prediction of Spelled Words (infrequent context-specific words)

Methods
◦ Statistical NLP Methods
◦ Learning from the context of the individual
◦ Other Contextual Clues: Geographic Location, Time of Day, Conversational Partner, Topic of Conversation, Style of the Document

Page 12: Kathy McCoy

Prediction Example

Page 13: Kathy McCoy
Page 14: Kathy McCoy
Page 15: Kathy McCoy

Trigram Model: $P(w \mid h) = P(w \mid w_{-2}\, w_{-1})$
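A minimal sketch of how a trigram model can drive word prediction: it estimates the probability of the next word from counts of the two preceding words and returns the most likely candidates, optionally filtered by the letters typed so far. The function names (`train_trigram`, `predict`), the toy corpus, and the lack of smoothing are illustrative assumptions, not the system described in the talk.

```python
from collections import Counter, defaultdict

def train_trigram(tokens):
    """Count trigrams: context (w_-2, w_-1) -> Counter of following words."""
    counts = defaultdict(Counter)
    for w2, w1, w in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w2, w1)][w] += 1
    return counts

def predict(counts, w2, w1, prefix="", k=3):
    """Return up to k (word, probability) pairs for words that follow
    (w2, w1) and start with the letters typed so far (prefix)."""
    following = counts.get((w2, w1), Counter())
    total = sum(following.values())
    ranked = sorted(
        ((w, c / total) for w, c in following.items() if w.startswith(prefix)),
        key=lambda pair: (-pair[1], pair[0]),  # most probable first
    )
    return ranked[:k]

# Toy corpus (hypothetical); a real model would be trained on far more text.
corpus = "i want to go home . i want to go out . i want to eat now .".split()
model = train_trigram(corpus)
predictions = predict(model, "want", "to")
```

Here `predictions` ranks "go" above "eat" because "want to go" occurs twice in the toy corpus and "want to eat" once. An unseen context returns an empty list, which is where a practical system would back off to bigram or unigram estimates.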

Page 16: Kathy McCoy

Can we do better??

Intuitively, all possible words do not occur with equal likelihood during a conversation.

The topic of the conversation affects the words that will occur.
◦ E.g., when talking about baseball: ball, bases, pitcher, bat, triple...
◦ How often do these same words occur in your algorithms class?

Page 17: Kathy McCoy

Topic Modeling

Goal: Automatically identify the topic of the conversation and increase the probability of related words and decrease probability of unrelated words.

Questions
◦ Topic Representation
◦ Topic Identification
◦ Topic Application
◦ Topic Language Model Use

Page 18: Kathy McCoy

Topic Modeling Approach

Page 19: Kathy McCoy

Topic Identification

Page 20: Kathy McCoy

Topic Identification

Page 21: Kathy McCoy

Topic Application

How do we use those similarity scores?

Essentially weight the contribution of each topic by the amount of similarity that topic has with the current conversation.
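One way to read this weighting is as a similarity-weighted mixture of per-topic language models. The sketch below is an assumption about the form of the combination, not the formula used in the talk. Each topic model is reduced to a hypothetical word-to-probability dictionary, and all probabilities and similarity scores are made-up illustrative values.

```python
def topic_weighted_probability(word, topic_models, similarities):
    """Mix per-topic word probabilities, weighting each topic by its
    similarity to the current conversation (weights are normalized)."""
    total_sim = sum(similarities.values())
    if total_sim == 0:
        return 0.0  # no topic resembles the conversation at all
    mixed = sum(
        similarities[topic] * model.get(word, 0.0)
        for topic, model in topic_models.items()
    )
    return mixed / total_sim

# Hypothetical topic models and similarity scores, for illustration only.
topic_models = {
    "baseball": {"bat": 0.5, "ball": 0.4, "sort": 0.1},
    "algorithms": {"bat": 0.0, "ball": 0.1, "sort": 0.9},
}
similarities = {"baseball": 0.75, "algorithms": 0.25}

p_bat = topic_weighted_probability("bat", topic_models, similarities)
p_sort = topic_weighted_probability("sort", topic_models, similarities)
```

With the conversation judged mostly about baseball, "bat" ends up more probable than "sort" even though "sort" dominates the algorithms topic.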

Page 22: Kathy McCoy
Page 23: Kathy McCoy

Results Using Topics

Page 24: Kathy McCoy

Current Work: Graduation for Keith!

Other kinds of tuning to the user we can do:
◦ Recency
◦ Style (part of speech models)

Older work: Does keystroke savings translate into communication rate enhancement?

Page 25: Kathy McCoy

Using the Web to Create Semantic Relations for a Skimming System for Non-Visual Readers

Debra Yarrington
Dept. of Computer and Information Science
University of Delaware
{yarringt}@eecis.udel.edu

Page 26: Kathy McCoy

Introduction

This talk will discuss factors in designing a system for automatically skimming text documents in response to a question.

System:
◦ Input:
   • Question (potentially complex in nature)
   • Text document
◦ Output:
   • Web page with links
      Links to text related to the question
      Links to text visual skimmers are likely to focus on

Page 27: Kathy McCoy

Goal

The goal of this system is to give nonvisual readers information similar to what visual readers get when skimming through a document in response to a question.

Motivation
◦ Working with college students who were blind and visually impaired
◦ Students took significantly longer to find homework question answers within documents than their visual-reading counterparts

Page 28: Kathy McCoy

SubGoals:

Production of our skimming system will require the attainment of three major goals:

1. Achieving an understanding of what information in a document visual skimmers pay attention to when skimming in response to a question

2. Developing Natural Language Processing (NLP) techniques to automatically identify areas of text visual readers focus on as determined in 1.

3. Developing a user interface to be used in conjunction with screen reading software to deliver the visual skimming experience.

This talk focuses on work done in 1. and 2.

Page 29: Kathy McCoy

Part 1: Visual Skimming Data

Goal: To achieve an understanding of what information visual skimmers pay attention to when skimming through documents to answer questions

Procedure:
◦ Have visual readers skim through a document for a question answer while being tracked by an eye tracking system

Page 30: Kathy McCoy

Gathering Data

14 complex questions and accompanying documents
◦ 10 were 2 pages, 2 were 5 pages, and 2 were 8 pages or longer.
◦ Documents were text documents
   • No images, few subtitles and lists
◦ Examples of questions used:
   • “What effect does China’s rising oil prices have on other sectors of its economy?”
   • “According to Piaget, what techniques do children use to adjust to their environment?”

Individuals skimmed for question answer in a document while being tracked by an eye tracking system.
◦ 43 subjects skimmed for answers to between 6 and 13 questions
   • Total of 513 question-answer skimming results
   • Subjects then answered a multiple choice question

Page 31: Kathy McCoy

Results:

423/510 questions answered correctly
◦ Shows that even for complex questions, subjects were able to successfully answer the question
◦ We wanted to show that the areas subjects paid most attention to when skimming had a connection to the question

Page 32: Kathy McCoy

Eye Tracker Data:

Tobii Eye Tracker:

AOIs:
◦ We could define areas of interest (AOI) in the text document ahead of time
◦ We chose paragraphs, titles, subtitles, and the question as separate AOIs.
◦ We then counted the number of gaze points (gazes of over 100 ms duration) in each AOI

HotSpot and Duration File:
◦ The tracker gave us an image that showed “hot spots”, or locations and durations of where the eyes gazed
◦ A file with locations and durations of gaze points
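The AOI counting step can be sketched as follows. This is a minimal illustration, assuming gaze records arrive as hypothetical `(x, y, duration_ms)` tuples and AOIs are axis-aligned rectangles given as `(left, top, right, bottom)`. The eye tracker's actual export format is not shown in the talk, so none of these names or layouts should be read as Tobii's API.

```python
from collections import Counter

def count_gaze_points(gazes, aois, min_duration_ms=100):
    """Count gazes longer than min_duration_ms that fall inside each AOI."""
    counts = Counter({name: 0 for name in aois})
    for x, y, duration_ms in gazes:
        if duration_ms <= min_duration_ms:
            continue  # too brief to count as a gaze point
        for name, (left, top, right, bottom) in aois.items():
            if left <= x < right and top <= y < bottom:
                counts[name] += 1
                break  # AOIs are assumed not to overlap
    return counts

# Hypothetical page layout and gaze samples, for illustration only.
aois = {
    "title": (0, 0, 800, 50),
    "paragraph_1": (0, 50, 800, 300),
}
gazes = [
    (100, 20, 150),   # on the title, long enough
    (100, 100, 250),  # on paragraph 1
    (200, 120, 80),   # on paragraph 1 but under 100 ms, ignored
    (300, 200, 120),  # on paragraph 1
    (900, 10, 300),   # outside every AOI
]
gaze_counts = count_gaze_points(gazes, aois)
```

Ranking AOIs by these counts gives the "most frequently focused on" paragraphs discussed in the results that follow.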

Page 33: Kathy McCoy

Skimming Data Results:

Individuals do focus on titles and subtitles

Subjects frequently focused on the first paragraph or paragraphs of a document

Most subjects did not focus on the first line of each paragraph
◦ This is a technique available via screenreaders
   • Clearly this does not give screenreader users an experience similar to that of visual skimmers

Page 34: Kathy McCoy

Example of Technique 3:

Page 35: Kathy McCoy

Results Analysis:

We examined AOIs most frequently focused on that did not have physical attributes that would explain the attraction of people’s gazes
◦ Assumption is that these areas were focused on because of their connection to the question.

Page 36: Kathy McCoy

Results Analysis:

Subjects did focus on areas of text containing the answer to the question
◦ Even when answer is not straightforward.
   • Subjects are not matching words
◦ Shows that subjects are making semantic connections between the question and the information they are skimming for

Page 37: Kathy McCoy

Subjects found question answer

Example: “How do people catch the West Nile Virus?”

The paragraph with the most gaze points for the most subjects was:

“In the United States, wild birds, especially crows and jays, are the main reservoir of West Nile virus, but the virus is actually spread by certain species of mosquitoes. Transmission happens when a mosquito bites a bird infected with the West Nile virus and the virus enters the mosquito's bloodstream. It circulates for a few days before settling in the salivary glands. Then the infected mosquito bites an animal or a human and the virus enters the host's bloodstream, where it may cause serious illness. The virus then probably multiplies and moves on to the brain, crossing the blood-brain barrier. Once the virus crosses that barrier and infects the brain or its linings, the brain tissue becomes inflamed and symptoms arise.”

Page 38: Kathy McCoy


Page 39: Kathy McCoy

Subjects focused on areas that have a semantic relationship with the question

E.g., with the question, “Why was Monet’s work criticized by the public?” the second most frequently focused on paragraph was:

In 1874, Manet, Degas, Cezanne, Renoir, Pissarro, Sisley and Monet put together an exhibition, which resulted in a large financial loss for Monet and his friends and marked a return to financial insecurity for Monet. It was only through the help of Manet that Monet was able to remain in Argenteuil. In an attempt to recoup some of his losses, Monet tried to sell some of his paintings at the Hotel Drouot. This, too, was a failure. Despite the financial uncertainty, Monet’s paintings never became morose or even all that sombre. Instead, Monet immersed himself in the task of perfecting a style which still had not been accepted by the world at large. Monet’s compositions from this time were extremely loosely structured, with color applied in strong, distinct strokes as if no reworking of the pigment had been attempted. This technique was calculated to suggest that the artist had indeed captured a spontaneous impression of nature.

This paragraph does not contain the answer

Page 40: Kathy McCoy


Page 41: Kathy McCoy

Part 2:

►Next Step: Developing Natural Language Processing (NLP) techniques to automatically identify areas of text visual readers focus on as determined in 1.

Page 42: Kathy McCoy

Process:

1. Generate keywords from question
2. Weight keywords based on inverse of # of paragraphs in which they occur in the document
3. Generate matching score for each paragraph
   • # of occurrences of each keyword x keyword’s weight
4. Rank paragraph’s likelihood of being related to the question based on matching score
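The four steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the tokenizer, the small `STOPWORDS` set standing in for "function words", and the sample paragraphs are all hypothetical, and keywords that appear in no paragraph are simply skipped.

```python
import re

# Hypothetical stand-in for a real function-word list.
STOPWORDS = {"how", "do", "does", "the", "a", "an", "of", "what", "why",
             "is", "are", "to", "in", "by", "was"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_paragraphs(question, paragraphs):
    """Return (paragraph indices ordered best-first, per-paragraph scores)."""
    # Step 1: keywords are the non-function words of the question.
    keywords = {w for w in tokenize(question) if w not in STOPWORDS}
    tokenized = [tokenize(p) for p in paragraphs]
    # Step 2: weight = 1 / number of paragraphs containing the keyword.
    weights = {}
    for kw in keywords:
        containing = sum(1 for toks in tokenized if kw in toks)
        if containing:
            weights[kw] = 1.0 / containing
    # Step 3: score = sum over keywords of (occurrences x weight).
    scores = [sum(toks.count(kw) * w for kw, w in weights.items())
              for toks in tokenized]
    # Step 4: rank paragraphs by descending score.
    order = sorted(range(len(paragraphs)), key=lambda i: (-scores[i], i))
    return order, scores

paragraphs = [
    "The virus was first detected in 1999.",
    "People catch the virus when an infected mosquito bites them.",
    "Birds are the main reservoir.",
]
order, scores = rank_paragraphs("How do people catch the West Nile virus?", paragraphs)
```

In this toy run "virus" appears in two paragraphs and so carries weight 0.5, while "people" and "catch" appear in one and carry weight 1.0, which puts the second paragraph first.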

Page 43: Kathy McCoy

Baseline

Keyword Sets:

Set 1: All nonfunction words in question. E.g., How does marijuana affect the brain
• Poor results: poor correlation between question and areas of interest

Set 2: Synonyms, hypernyms and hyponyms of the nonfunction words (generated using WordNet)
• Poor results: poor correlation between question and area of interest.

Page 44: Kathy McCoy

Set 3: Topically-Related Keywords

We must explore other ways of identifying text relevant to complex questions

Our solution:
◦ Use the World Wide Web to form clusters of topically-related words
   • Large, covers virtually all topics, constantly updated, constantly available
◦ The topic is the question
◦ The words in the resulting cluster will be matched to paragraphs as described above for ranking relevant text.

Page 45: Kathy McCoy

Procedure: Cluster formation

1. Use content words from question as search engine (Google) query terms
2. Search returns ordered list of relevant URLs with accompanying snippets (we use top 60 “hits”)
3. Retrieve web page from URL
4. Locate snippet within web page (stripped of html)
5. Include 50 content words before snippet and 50 content words after snippet in cluster
   ◦ Keep track of total count of each word in cluster
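Steps 4 and 5 of the procedure can be sketched as below, assuming the search and page retrieval have already happened and the page text is already stripped of HTML. The function names, the tiny `FUNCTION_WORDS` set, and the sample pages are hypothetical. Counting the snippet's own words in the cluster is also an assumption, since the slide only mentions the words before and after it.

```python
import re
from collections import Counter

# Hypothetical stand-in for a real function-word list.
FUNCTION_WORDS = {"the", "a", "an", "of", "is", "by", "in", "and", "to"}

def content_words(text):
    """Lowercased alphanumeric tokens with function words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in FUNCTION_WORDS]

def add_to_cluster(cluster, page_text, snippet, window=50):
    """Add the snippet's content words plus `window` content words on
    each side of it to the cluster. Returns False if the snippet cannot
    be located (the match is exact and case-sensitive)."""
    start = page_text.find(snippet)
    if start == -1:
        return False
    end = start + len(snippet)
    words_before = content_words(page_text[:start])
    before = words_before[-window:] if window else []
    after = content_words(page_text[end:])[:window]
    # Assumption: the snippet's own words are counted as well.
    cluster.update(before + content_words(snippet) + after)
    return True

cluster = Counter()
# Two hypothetical (page text, snippet) pairs standing in for search hits.
hits = [
    ("alpha beta gamma mosquito bites spread the virus delta epsilon zeta",
     "mosquito bites spread the virus"),
    ("the virus is spread by mosquito bites in summer",
     "spread by mosquito bites"),
]
for page_text, snippet in hits:
    add_to_cluster(cluster, page_text, snippet, window=2)
```

Running the loop over every retrieved hit and then sorting the counter by count would yield a word list in the same style as the clusters reported in the results.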

Page 46: Kathy McCoy

Results:

• Semantic relationships are being identified
• These semantic relationships more accurately identify relevant paragraphs of text

Page 47: Kathy McCoy

Results:

Question: How do people catch the West Nile Virus
Query Terms: how people catch west nile virus
Resulting Cluster:

west: 337, virus: 329, nile: 326, infected: 99, people: 89, symptoms: 89, wnv: 84, mosquito: 82, can: 68, mosquitoes: 66, disease: 62, get: 55, home: 55, encephalitis: 44, health: 42, birds: 41, control: 33, may: 32, information: 32

become: 31, humans: 28, cases: 26, horses: 26, infection: 25, blood: 24, spread: 23, search: 23, illness: 22, brain: 21, cause: 20, 1999: 20, dead: 20, fever: 20, develop: 20, wildlife: 19, person: 18, prevention: 18, sick: 18

human: 18, bird: 18, site: 18, borne: 18, links: 17, contact: 17, bite: 17, first: 17, will: 16, found: 16, new: 16, mild: 15, headache: 15, news: 15, inflammation: 15, risk: 15, ca: 15, like: 15, states: 15

transmission: 15, activity: 14, united: 14, horse: 13, summer: 13, aches: 13, map: 13, animals: 13, skip: 13, transmitted: 12, vector: 12, diseases: 12, bitten: 12, county: 12, care: 12, treatment: 12, common: 12, one: 12, york: 11

last: 11, flu: 11, medical: 11, africa: 11, usgs: 11, severe: 11, asia: 10, detected: 10, glands: 10, avian: 10, east: 10, owners: 10, history: 10, days: 10, us: 10, surveillance: 10, muscle: 10, bites: 9, vaccine: 9

Page 48: Kathy McCoy

Results:

Ranking of paragraph with answer to question using Web-based semantic relations in 2-page documents:

Question    Ranking of most relevant paragraph    Total # paragraphs in document
Q1          1                                     30
Q2          1                                     14
Q3          2                                     25
Q4          3                                     21
Q5          3                                     13
Q6          4                                     25
Q7          4                                     22
Q8          7                                     14
Q9          7                                     13
Q10         10                                    15

Page 49: Kathy McCoy

Future Work

Part 1: Analyzing Skimming Data
◦ Look at smaller areas of interest
   • Users may have focused on one specific part of the paragraph
◦ Look at area users focused on first before choosing to focus on a particular area

Page 50: Kathy McCoy

Future Work

Part 2: Developing clusters
◦ Goal: finding best clustering to identify text identified as relevant by visual skimmers
   • Include IDF weighting for Web
      “Monoamine” vs “trying”
   • Reordering query terms
      How Marijuana Affect Brain vs Marijuana Brain Affect How
   • Explore varying number of URLs used to form clusters
   • Explore different window sizes (currently 100 content words)
   • Explore using phrases as query terms
   • Explore using synonyms, hypernyms, and hyponyms as query terms

Page 51: Kathy McCoy

Preliminary Results:

Question: How does Marijuana affect the brain?
Query Terms: how Marijuana affect brain
Resulting Cluster:

marijuana: 259, brain: 210, drug: 100, health: 65, effects: 64, drugs: 57, abuse: 52, thc: 44, use: 41, alcohol: 40, can: 39, science: 37, affected: 37, news: 36, addiction: 35, smoking: 28, affect: 27, like: 26, home: 26

high: 26, study: 25, smoke: 25, receptors: 24, users: 23, long: 23, cocaine: 23, research: 22, pot: 22, tolerance: 22, cannabis: 21, new: 21, term: 20, areas: 19, get: 19, body: 19, memory: 19, mental: 18, may: 18

researchers: 18, topics: 17, nida: 17, heavy: 16, lead: 16, search: 16, many: 15, system: 15, called: 15, treatment: 15, receptor: 15, also: 14, related: 14, driving: 14, rehab: 14, regions: 14, skip: 14, one: 14, scientists: 14

cannabinoid: 13, brains: 13, causes: 13, now: 13, detox: 13, medicine: 12, teens: 12, development: 12, cancer: 12, heroin: 12, prescription: 12, medical: 12, see: 12, time: 12, medications: 12, known: 11, addictive: 11, blog: 11, different: 11

articles: 11, damage: 11, crime: 11, report: 11, tobacco: 10, biology: 10, using: 10, functioning: 10, sites: 10, functions: 10, history: 10, us: 10, nervous: 9, cortex: 9, cells: 9, mind: 9, neurons: 8, proteins: 8, changes: 8

Page 52: Kathy McCoy

Preliminary Results:

Question: What effect does China’s rising oil prices have on other sectors of its economy?
Query Terms: What effect China’s rising oil prices sectors economy
Resulting Cluster:

oil: 262, prices: 177, china: 164, us: 101, news: 94, home: 93, world: 87, energy: 81, high: 73, price: 65, economy: 65, economic: 61, will: 56, dollar: 52, countries: 48, site: 48, member: 46, page: 45, demand: 43, global: 39

chinese: 39, search: 38, business: 37, 2009: 36, research: 35, percent: 34, contact: 33, new: 33, military: 32, production: 32, 2008: 31, rising: 30, market: 29, sector: 29, policy: 28, increase: 28, cato: 28, publications: 28, spending: 28, can: 27

markets: 27, international: 27, growth: 27, gold: 26, video: 25, industry: 25, fuel: 25, may: 25, asia: 24, costs: 24, now: 24, industrial: 23, gas: 23, inflation: 23, last: 23, country: 22, higher: 22, rss: 22, resources: 22, economies: 22

affect: 21, states: 21, article: 21, united: 21, use: 21, services: 21, content: 20, investment: 20, cost: 20, features: 20, data: 20, 2005: 20, financial: 20, day: 20, since: 20, current: 20, products: 19, archive: 19, print: 19, many: 19

crude: 19, impact: 19, opinion: 19, issues: 19, foreign: 18, main: 18, india: 18, economics: 18, money: 18, 2010: 18, supply: 17, support: 17, increases: 17, become: 17, recession: 17, analysis: 17, august: 17, rise: 17, sectors: 17, email: 16, barrel: 16