EACL2012: In Search of a Gold Standard in Studies of Deception
DESCRIPTION
Presentation by Jeff Hancock and me on April 23, 2012, in Avignon, France, at the Deception Detection Workshop of the 2012 Conference of the European Chapter of the Association for Computational Linguistics (EACL).
TRANSCRIPT
![Page 1: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/1.jpg)
Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie
![Page 2: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/2.jpg)
In Search of a Gold Standard in Studies of Deception
Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie
![Page 3: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/3.jpg)
Newman-Pennebaker Model (2003)
![Page 4: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/4.jpg)
![Page 5: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/5.jpg)
![Page 6: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/6.jpg)
![Page 7: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/7.jpg)
The NP model is not consistent across contexts
On reflection, why would we expect it to be?
Psychological and persuasion dynamics of deception are highly constrained by context
![Page 8: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/8.jpg)
Context: Deception in Online Reviews
![Page 9: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/9.jpg)
1. Sanctioned Lies
Creating Deception for Research
• Researcher asks participant to lie
• Topics include beliefs, attitudes, feelings, actions
Ex: mock crime
![Page 10: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/10.jpg)
1. Sanctioned Lies
Creating Deception for Research
Adv: researcher can control when and where lie occurs
Limitations: permission to lie, requires high stakes
![Page 11: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/11.jpg)
1. Sanctioned Lies
2. Unsanctioned Lies
Creating Deception for Research
i. Diary Studies
ii. Retrospective Identification
iii. Cheating paradigms
![Page 12: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/12.jpg)
1. Sanctioned Lies
2. Unsanctioned Lies
Creating Deception for Research
Psychology & Communication
![Page 13: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/13.jpg)
1. Sanctioned Lies
2. Unsanctioned Lies
3. Non-gold Standard Approaches
Creating Deception for Research
i. Manual Annotation
ii. Heuristically labeled
iii. Unlabeled (distributional analysis)
Psychology & Communication
Computer Science
![Page 14: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/14.jpg)
1. Sanctioned Lies
2. Unsanctioned Lies
3. Non-gold Standard Approaches
A Novel Method: The Crowd-sourcing Approach…
Creating Deception for Research
![Page 15: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/15.jpg)
The Crowdsourcing Approach
Crowdsourcing divides large projects into small manageable tasks and matches these tasks with humans that will perform them
- harness distributed resources
- maximize speed
- minimize cost
- more powerful than local tech & small research groups
- data collection, access, annotation, and analysis
![Page 16: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/16.jpg)
Amazon's Mechanical Turk
Requesters create a Human Intelligence Task (HIT) to be completed by Workers
HITs are similar to HTML forms and may include:
- the solicitation
- information needed for the Workers to complete the task
- collection of survey information
![Page 17: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/17.jpg)
4 Assumptions of our Crowdsourcing Approach
1. Balanced data set
- Equal # of truthful and deceptive reviews
- Uniform valence: wholly positive or wholly negative data set
2. Both truthful and deceptive reviews cover same set of entities
- Minimize distinguishing features that may be context-based rather than language of deception
3. Data set of reasonable size
- 800 total reviews (400 crowdsourced)
![Page 18: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/18.jpg)
4 Assumptions of our Crowdsourcing Approach
4. Deceptive reviews should be generated under the same basic guidelines that govern the generation of truthful reviews
- Length
- Quality
- Time
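The first two assumptions above lend themselves to a quick automated sanity check on a candidate corpus. The following is a minimal sketch, not the authors' tooling: the `Review` record and its `entity`, `label`, and `valence` fields are hypothetical names introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    entity: str   # e.g. a hotel name (hypothetical field)
    label: str    # "truthful" or "deceptive"
    valence: str  # "positive" or "negative"
    text: str


def check_corpus(reviews):
    """Return a list of violated assumptions (empty list means all checks pass)."""
    problems = []
    counts = Counter(r.label for r in reviews)
    # Assumption 1: equal number of truthful and deceptive reviews.
    if counts["truthful"] != counts["deceptive"]:
        problems.append("unbalanced labels")
    # Assumption 1: uniform valence across the whole data set.
    if len({r.valence for r in reviews}) > 1:
        problems.append("mixed valence")
    # Assumption 2: both classes cover the same set of entities.
    truthful_entities = {r.entity for r in reviews if r.label == "truthful"}
    deceptive_entities = {r.entity for r in reviews if r.label == "deceptive"}
    if truthful_entities != deceptive_entities:
        problems.append("entity sets differ")
    return problems
```

Assumptions 3 and 4 (data set size; comparable length, quality, and time) are left out of this sketch because they depend on thresholds the slides do not state.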
![Page 19: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/19.jpg)
STEP 1: Identify entities to be covered in the reviews
Truthful corpus
– Find all entities (specific hotels) from the real-world database (TripAdvisor)
– Extract all statements (reviews) from those entities
– Identify the subcategories to which these entities belong (Chicago hotels)
![Page 20: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/20.jpg)
STEP 1: Identify entities to be covered in the reviews
![Page 21: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/21.jpg)
STEP 1: Identify entities to be covered in the reviews
Deceptive corpus
– Use entities from truthful corpus to create the prompt for the Turkers
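One way to picture Step 1 is as a small allocation problem: take the entities found in the truthful corpus and spread the target number of deceptive reviews across them. The sketch below is illustrative only. The function names are hypothetical, and the prompt wording is a placeholder, since the actual prompt text is not reproduced in these slides.

```python
def allocate_prompts(entities, total_reviews):
    """Split a target number of deceptive reviews evenly across entities."""
    if not entities:
        raise ValueError("need at least one entity")
    base, extra = divmod(total_reviews, len(entities))
    # The first `extra` entities get one additional review when the split is uneven.
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(entities)}


def build_prompt(entity, subcategory):
    """Fill a placeholder prompt template for one entity (wording is hypothetical)."""
    return f"Write a review of {entity} ({subcategory})."
```

Keeping the per-entity counts equal supports the assumption that truthful and deceptive reviews cover the same set of entities.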
![Page 22: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/22.jpg)
STEP 2: Develop the Mechanical Turk prompt
Survey real solicitations for deception (hotel reviews, doctor reviews, etc)
![Page 23: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/23.jpg)
A Real Solicitation
![Page 24: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/24.jpg)
STEP 2: Develop the Mechanical Turk prompt
Mimic the workflow, vocabulary and tone of the Turkers
![Page 25: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/25.jpg)
Step 3: Attach appropriate warnings to the solicitation
- May not complete this task more than once
- Their work will not be awarded if it is incoherent or off topic
- This review is for academic purposes
Be aware of priming effects and placement of this warning
![Page 26: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/26.jpg)
Step 4: Gather demographic data and comments
Survey mechanism for demographics
– Age, Education, etc.
Qualitative, open-ended comment
– Provides technical information
Incentivize comments
![Page 27: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/27.jpg)
Step 5: Pilot
Pilot the resulting HIT in small batches (10)
Remove all plagiarized results through automated processes (Yahoo! BOSS API)
– Workers do not receive payment for any plagiarized material
Manually evaluate remaining set
– Coherence, Topical, Length of Review
Iterate until:
– No technical complaints
– Experiment quality
Full run of solicitation (400 reviews) by unique workers
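The automated part of the pilot screening in Step 5 could look roughly like the sketch below. It is not the pipeline described above: the slides used the Yahoo! BOSS API to detect plagiarism, whereas this example swaps in a local near-duplicate check with Python's `difflib` against a list of known reviews. The function names, the 0.8 similarity threshold, and the 20-word minimum are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def looks_copied(text, known_texts, threshold=0.8):
    """Flag text that is nearly identical to any known review (stand-in check)."""
    return any(
        SequenceMatcher(None, text.lower(), known.lower()).ratio() >= threshold
        for known in known_texts
    )


def screen_batch(submissions, known_texts, seen_workers, min_words=20):
    """Split a pilot batch into accepted submissions and rejected (worker, reason) pairs.

    `submissions` is a list of (worker_id, text) tuples; `seen_workers` is a set
    that is updated in place so later batches can reuse it.
    """
    accepted, rejected = [], []
    for worker_id, text in submissions:
        if worker_id in seen_workers:
            rejected.append((worker_id, "repeat worker"))
        elif looks_copied(text, known_texts):
            rejected.append((worker_id, "possible plagiarism"))
        elif len(text.split()) < min_words:
            rejected.append((worker_id, "too short"))
        else:
            accepted.append((worker_id, text))
        # Every submitter counts as seen, whether accepted or rejected.
        seen_workers.add(worker_id)
    return accepted, rejected
```

Coherence and topicality still need the manual evaluation listed in Step 5; only the mechanical checks are automated here.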
![Page 28: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/28.jpg)
Let's see it!
![Page 29: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/29.jpg)
Finding the Gold Standard
The resulting set of 400 reviews is then used to train the algorithm on deceptive positive reviews
The algorithm trains separately on the set of 400 truthful* reviews for comparison
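To make the training step concrete, here is a minimal sketch of a text classifier that learns from labeled truthful and deceptive reviews. The slides do not name the algorithm, so the choice of a unigram Naive Bayes model with add-one smoothing is an assumption, and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simple)."""
    return text.lower().split()


def train_naive_bayes(labeled_texts):
    """Train on (label, text) pairs; returns the model as a plain dict."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of documents
    for label, text in labeled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return {"words": word_counts, "docs": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest add-one-smoothed log posterior."""
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, counts in model["words"].items():
        score = math.log(model["docs"][label] / total_docs)
        denom = sum(counts.values()) + vocab_size
        for word in tokenize(text):
            if word in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With a balanced corpus the class priors are equal, so predictions depend only on word usage, which is the point of the balanced-data assumption.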
![Page 30: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/30.jpg)
![Page 31: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/31.jpg)
![Page 32: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/32.jpg)
![Page 33: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/33.jpg)
Discussion & Conclusion
Advantages
• model the deception as closely to real-world as possible
• known deceptive
Limitations
• sanctioned?
• limited knowledge of Turkers
• constrained to certain contexts
• construction of the ‘truthful’ set non-trivial
![Page 34: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/34.jpg)
Discussion & Conclusion
Key Potential:
to create datasets more easily and efficiently in an effort to model deception customized to specific contexts for a Context Constrained Approach to Deception
![Page 35: EACL2012: In Search of a Gold Standard in Studies of Deception](https://reader034.vdocuments.us/reader034/viewer/2022042700/556451a0d8b42a9f128b5760/html5/thumbnails/35.jpg)
In Search of a Gold Standard in Studies of Deception
Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie