Say Anything. Selma P. and Parth P. Say What? Data-driven system that allows for interactive...


Upload: barnaby-newman

Post on 16-Dec-2015


TRANSCRIPT

  • Slide 1
  • Say Anything. Selma P. and Parth P.
  • Slide 2
  • Say What? A data-driven system that allows for interactive storytelling: a human author takes turns with a computer author.
  • Slide 3
  • Example
      H: You'll never believe what happened last night.
      C: Leigh laughed at my joke but I couldn't help but think liz would have laughed harder.
      H: The joke wasn't very funny in a ha ha kind of way
      C: It wasn't anything like that, I thought he was going to give me a good night kiss but he ended up licking my cheek, she declared.
      H: It made me sneeze and snort out loud.
      C: And now my nose hurts form the snorting.
  • Slide 6
  • Why do this? Intriguing: blends structure with creativity. Void: a game of sorts; fits in the space between language games and more graphically oriented video games. Foundational work in progress: research.
  • Slide 7
  • Narrative Models (Majewski 2003). Five ways: linear; string of pearls; branching; amusement park; building blocks.
  • Slide 8
  • What's been done before? Façade; http://interactivestoriesonline.com/; http://www.interactivenarratives.org/. Top-down approach; strict domains; technical / non-inclusive.
  • Slide 9
  • How it works
      1. The user enters a sentence.
      2. The user's sentence is used to search the corpus with the term frequency-inverse document frequency (tf-idf) algorithm.
      3. The highest-scoring match is retrieved, and the sentence that follows it in the corpus is output by the computer.
      4. Repeat.
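The retrieval loop described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' or the original authors' implementation: the toy `stories` data, the function names, the plain `log(N / df)` weighting, and the use of cosine similarity to score matches are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_idf(stories):
    """Compute idf(word) = log(N / df) over every sentence in every story."""
    sentences = [s for story in stories for s in story]
    doc_freq = Counter()
    for sentence in sentences:
        doc_freq.update(set(tokenize(sentence)))
    n = len(sentences)
    return {word: math.log(n / df) for word, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse {word: tf * idf} vector; words unseen in the corpus are dropped."""
    counts = Counter(tokenize(text))
    return {w: c * idf[w] for w, c in counts.items() if w in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed scoring function)."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def next_sentence(user_sentence, stories):
    """Return the sentence that follows the best match, or None if nothing matches."""
    idf = build_idf(stories)
    query = tfidf_vector(user_sentence, idf)
    best, best_score = None, 0.0
    for story in stories:
        # The last sentence of a story has no successor, so it is not a candidate.
        for i in range(len(story) - 1):
            score = cosine(query, tfidf_vector(story[i], idf))
            if score > best_score:
                best, best_score = story[i + 1], score
    return best


# Hypothetical toy corpus: each story is an ordered list of sentences.
stories = [
    ["We drove to the lake before sunrise.",
     "The water was freezing but we jumped in anyway."],
    ["My brother told a joke at dinner.",
     "Nobody laughed except the dog."],
]
print(next_sentence("I told a joke last night", stories))
```

A full turn-taking session would call `next_sentence` once per human turn and append both sentences to the growing story.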
  • Slide 10
  • How did they do it?
  • Slide 11
  • Getting Data (DB). Considered manual collection (StoryCorps / Fed. Writers): well curated but biased. Favored blog posts (3.4 million; 1.06 billion words). Extraction: only 17% of the textual material on weblogs is narrational; 3.7 million story segments; 66.5 million sentences. Favored recall over precision; FN preferred over FP.
  • Slide 12
  • Getting Data. Spinn3r.com: 44 million weblog entries. Sampled and hand-annotated 5,270 blogs. Annotated validation set.
  • Slide 13
  • Getting Data. Randomized training/testing. Supervised machine learning. Binary classification problem. Confidence-weighted linear classifier [Dredze 2008].
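To make the "randomized training/testing" and "binary classification" steps concrete, here is a small sketch under stated assumptions. It uses a plain perceptron as a stand-in linear classifier; it is not the confidence-weighted classifier of [Dredze 2008]. The bag-of-words features, the function names, and the eight labeled posts are invented for illustration.

```python
import random
import re
from collections import defaultdict


def features(text):
    """Bag-of-words features: the set of lowercase tokens in a post."""
    return set(re.findall(r"[a-z']+", text.lower()))


def split(examples, test_fraction=0.25, seed=0):
    """Shuffle labeled examples and split them into training and testing sets."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_perceptron(train, epochs=10):
    """Learn one weight per word; labels are +1 (story) or -1 (not a story)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in train:
            score = sum(weights[w] for w in features(text))
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Standard perceptron update: move weights toward the true label.
                for w in features(text):
                    weights[w] += label
    return weights


def predict(weights, text):
    """Classify a post as +1 (story) or -1 (not a story)."""
    score = sum(weights.get(w, 0.0) for w in features(text))
    return 1 if score > 0 else -1


# Hypothetical hand-annotated sample: +1 = personal story, -1 = other content.
data = [
    ("Yesterday I missed the bus and walked home in the rain.", 1),
    ("We got lost on the way to the wedding last night.", 1),
    ("I burned the toast and set off the alarm this morning.", 1),
    ("My sister called and we talked until midnight.", 1),
    ("Top ten phones to buy this year.", -1),
    ("The new release adds faster search and bug fixes.", -1),
    ("Sale ends Friday, order now for free shipping.", -1),
    ("Election results are expected later this week.", -1),
]

train, test = split(data)
weights = train_perceptron(train)
correct = sum(predict(weights, text) == label for text, label in test)
print(f"held-out accuracy: {correct}/{len(test)}")
```

With only a handful of examples the held-out accuracy is not meaningful; the point is the shape of the pipeline: annotate, split at random, train, then evaluate on posts the classifier has not seen.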
  • Slide 14
  • Getting Data. [Pipeline diagram; labels: random data set, sampling, annotation, training/testing, F(x)]
  • Slide 15
  • Corpus (DB) Creation. Crawled blogs and applied the algorithm. (2012): post-processing with parse trees, verb tenses, first-person pronouns.
  • Slide 16
  • Corpus (DB) Creation. [Pipeline diagram; labels: random data set, sampling, annotation, training/testing, F(x), crawl blogs, blog data, F(x), story data]
  • Slide 17
  • Application. Querying the corpus (story data). Optimized: return a list of stories that contain any words matching the user input. Use TF-IDF!
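One common way to "return a list of stories that contain any matching words" without scanning the whole corpus is an inverted index. The sketch below assumes that approach; the slides do not say how the optimization was implemented, and the function names and sample stories are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(stories):
    """Map each word to the set of story ids that contain it."""
    index = defaultdict(set)
    for story_id, text in enumerate(stories):
        for word in set(tokenize(text)):
            index[word].add(story_id)
    return index


def candidates(user_input, index):
    """Return the sorted ids of stories sharing at least one word with the input."""
    found = set()
    for word in set(tokenize(user_input)):
        found |= index.get(word, set())
    return sorted(found)


# Hypothetical story texts standing in for the story data.
sample_stories = [
    "We drove to the lake.",
    "My brother told a joke.",
    "The dog barked all night.",
]
index = build_index(sample_stories)
print(candidates("a joke about a dog", index))
```

Only the candidate stories then need to be scored with TF-IDF, which keeps each query from touching every story in the corpus.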
  • Slide 18
  • Application. [Pipeline diagram; labels: random data set, sampling, annotation, training/testing, F(x), crawl blogs, blog data, F(x), story data, user input, query, match algorithm (tf-idf), computer output]
  • Slide 19
  • Term Frequency Inverse Document Frequency (TF-IDF). TF-IDF is a numerical statistic that reflects how important a word is to a document in a corpus. The term frequency measures how often a word appears in a document. The inverse document frequency measures how rare a word is across the corpus as a whole: it shrinks as the word appears in more documents. It tells us how much information a word provides.
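The definitions above are often written as follows. This is one common formulation, given here as an assumption since the slides do not state which variant the system uses. Here $f_{t,d}$ is the number of times term $t$ occurs in document $d$, $N$ is the number of documents in the corpus, and $n_t$ is the number of documents that contain $t$.

```latex
\mathrm{tf}(t, d) = f_{t,d}, \qquad
\mathrm{idf}(t) = \log \frac{N}{n_t}, \qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
```

For example, with $N = 4$ documents, a word that appears in only one of them has $\mathrm{idf} = \log 4$, while a word that appears in all four has $\mathrm{idf} = \log 1 = 0$ and contributes nothing to a match.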
  • Slide 20
  • Term Frequency Inverse Document Frequency (TF-IDF). [Figure] Image credit: Li (2011)
  • Slide 21
  • Results
  • Slide 22
  • Areas to Improve. Metrics? Entertainment; coherence.
  • Slide 23
  • Areas to Improve. Metrics? Entertainment; coherence; believability / usability. Compare how well the output matches the next sentence the user would have written.
  • Slide 24
  • Areas to Improve. Fails to use all preceding sentences. Only returns the highest-ranked search result.
  • Slide 27
  • Future Work. No narrative plot.
  • Slide 28
  • Work in Progress. Foundational. The last article was published in 2012.
  • Slide 29
  • Thanks
  • Slide 30
  • Discussion Qs
  • Slide 31
  • Improvements?
  • Slide 32
  • Quality Assessment
  • Slide 33
  • Modify Corpus
  • Slide 34
  • Paper Content?