eacl2012: in search of a gold standard in studies of deception

Stephanie Gokhman, Jeff Hancock, Poornima Prabhu, Myle Ott, & Claire Cardie

In Search of a Gold Standard in Studies of Deception


Newman-Pennebaker Model (2003)

The NP model is not consistent across contexts

On reflection, why would we expect it to be?

Psychological and persuasion dynamics of deception are highly constrained by context

Context: Deception in Online Reviews

1. Sanctioned Lies

Creating Deception for Research

• Researcher asks participant to lie

• Topics include beliefs, attitudes, feelings, actions

Ex: mock crime


Adv: researcher can control when and where the lie occurs

Limitations: permission to lie, requires high stakes


2. Unsanctioned Lies


i. Diary Studies

ii. Retrospective Identification

iii. Cheating paradigms



3. Non-gold Standard Approaches


i. Manual Annotation

ii. Heuristically labeled

iii. Unlabeled (distributional analysis)

Psychology & Communication

Computer Science

1. Sanctioned Lies

2. Unsanctioned Lies

3. Non-gold Standard Approaches

A Novel Method: The Crowd-sourcing Approach…


The Crowdsourcing Approach

Crowdsourcing divides large projects into small, manageable tasks and matches these tasks with people who will perform them

- harness distributed resources

- maximize speed

- minimize cost

- more powerful than local tech & small research groups

- data collection, access, annotation, and analysis

Amazon's Mechanical Turk

Requesters create a Human Intelligence Task (HIT) to be completed by Workers

HITs are similar to HTML forms and may include:

- the solicitation

- information needed for the Workers to complete the task

- collection of survey information

4 Assumptions of our Crowdsourcing Approach

1. Balanced data set

- Equal number of truthful and deceptive reviews

- Uniform valence: the whole data set is either positive or negative

2. Both truthful and deceptive reviews cover the same set of entities

- Minimize distinguishing features that stem from context rather than from the language of deception

3. Data set of reasonable size

- 800 total reviews (400 crowdsourced)


4. Deceptive reviews should be generated under the same basic guidelines that govern the generation of truthful reviews

- Length

- Quality

- Time
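The first two assumptions can be checked mechanically before any modeling. The sketch below is one possible way to do that, not the authors' tooling: the `Review` record, its field names, and `check_assumptions` are all hypothetical names introduced here for illustration. It does not cover assumption 4 (length, quality, time), which depends on how the reviews were produced.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    # Hypothetical record layout; the field names are assumptions.
    entity: str   # e.g. a hotel name
    label: str    # "truthful" or "deceptive"
    valence: str  # "positive" or "negative"
    text: str


def check_assumptions(reviews):
    """Return a list of human-readable violations (empty if none)."""
    problems = []

    # Assumption 1a: equal number of truthful and deceptive reviews.
    counts = Counter(r.label for r in reviews)
    if counts["truthful"] != counts["deceptive"]:
        problems.append(f"unbalanced labels: {dict(counts)}")

    # Assumption 1b: uniform valence across the whole data set.
    valences = {r.valence for r in reviews}
    if len(valences) > 1:
        problems.append(f"mixed valence: {sorted(valences)}")

    # Assumption 2: both classes cover the same set of entities.
    truthful_entities = {r.entity for r in reviews if r.label == "truthful"}
    deceptive_entities = {r.entity for r in reviews if r.label == "deceptive"}
    if truthful_entities != deceptive_entities:
        diff = sorted(truthful_entities ^ deceptive_entities)
        problems.append(f"entities not shared by both classes: {diff}")

    return problems
```

Running this on every pilot batch as well as on the final corpus would surface an imbalance early, before it is baked into the 800-review set.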

STEP 1: Identify entities to be covered in the reviews

Truthful corpus

– Find all entities (specific hotels) from the real-world database (TripAdvisor)

– Extract all statements (reviews) from those entities

– Identify the subcategories to which these entities belong (Chicago hotels)



Deceptive corpus

– Use entities from the truthful corpus to create the prompt for the Turkers
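Step 1 amounts to collecting the distinct entities in the truthful corpus and issuing one prompt per entity, so that both corpora cover the same hotels. A minimal sketch, assuming each truthful review is a dict with an `entity` key; the function names and the prompt template are hypothetical placeholders, not the wording used in the study:

```python
def entities_in(truthful_reviews):
    """Unique entity names, in first-seen order."""
    return list(dict.fromkeys(r["entity"] for r in truthful_reviews))


def build_prompts(truthful_reviews, template):
    """Map each entity in the truthful corpus to a filled-in prompt.

    `template` must contain an `{entity}` placeholder.
    """
    return {
        entity: template.format(entity=entity)
        for entity in entities_in(truthful_reviews)
    }
```

Because the prompts are derived from the truthful corpus rather than written by hand, an entity can never appear in the deceptive corpus without a truthful counterpart.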

STEP 2: Develop the Mechanical Turk prompt

Survey real solicitations for deception (hotel reviews, doctor reviews, etc.)

A Real Solicitation


Mimic the workflow, vocabulary and tone of the Turkers

Step 3: Attach appropriate warnings to the solicitation

- Workers may not complete this task more than once

- Payment will not be awarded for work that is incoherent or off topic

- This review is for academic purposes

Be aware of priming effects and placement of this warning
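Because the placement of the warnings can prime Workers, it helps to make that placement an explicit, switchable choice while piloting. The sketch below is a plain data structure for drafting HIT text; it does not call the Mechanical Turk API, and `HitDraft` and its fields are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class HitDraft:
    # Hypothetical container for the text shown to Workers.
    solicitation: str
    warnings: list
    warnings_first: bool = False  # placement is a design choice to pilot

    def render(self):
        """Return the HIT text with warnings before or after the prompt."""
        warning_block = "\n".join(f"- {w}" for w in self.warnings)
        if self.warnings_first:
            parts = [warning_block, self.solicitation]
        else:
            parts = [self.solicitation, warning_block]
        return "\n\n".join(parts)
```

Rendering the same draft with `warnings_first` set both ways gives two variants that can be compared in small pilot batches.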

Step 4: Gather demographic data and comments

Survey mechanism for demographics

– Age, education, etc.

Qualitative, open-ended comment

– Provides technical information

Incentivize comments

Step 5: Pilot

Pilot the resulting HIT in small batches (10)

Remove all plagiarized results through automated processes (Yahoo! BOSS API)

– Workers do not receive payment for any plagiarized material

Manually evaluate remaining set

Coherence, topicality, length of review

Iterate until:

- No technical complaints

- Acceptable experiment quality

Full run of solicitation (400 reviews) by unique workers
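The automated part of the pilot loop (one submission per worker, plagiarism removal, a length check) can be sketched as a single screening pass. This is an illustration under stated assumptions: the slides name the Yahoo! BOSS API for plagiarism detection, whereas the sketch substitutes a local word n-gram overlap against a list of known texts. All function names, the `(worker_id, text)` input shape, and the thresholds are hypothetical. Coherence and topicality still require manual evaluation.

```python
def word_ngrams(text, n=5):
    """Set of lowercase word n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(text, known_texts, n=5):
    """Fraction of the text's n-grams that also occur in any known text."""
    grams = word_ngrams(text, n)
    if not grams:
        return 0.0
    known = set()
    for other in known_texts:
        known |= word_ngrams(other, n)
    return len(grams & known) / len(grams)


def screen_batch(submissions, known_texts, min_words=20, max_overlap=0.5):
    """Split (worker_id, text) pairs into accepted and rejected lists."""
    accepted, rejected = [], []
    seen_workers = set()
    for worker_id, text in submissions:
        if worker_id in seen_workers:
            rejected.append((worker_id, "repeat worker"))
            continue
        seen_workers.add(worker_id)
        if overlap_ratio(text, known_texts) > max_overlap:
            rejected.append((worker_id, "possible plagiarism"))
        elif len(text.split()) < min_words:
            rejected.append((worker_id, "too short"))
        else:
            accepted.append((worker_id, text))
    return accepted, rejected
```

Keeping the rejection reason alongside each worker ID makes it easy to see, batch by batch, whether complaints and rejections are falling as the HIT is revised.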

Let's see it!

Finding the Gold Standard

The resulting set of 400 reviews is then used to train the algorithm for deceptive positive reviews

The algorithm trains separately on the set of 400 truthful* reviews for comparison
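The slides do not name the learning algorithm. As a stand-in only, the sketch below uses a unigram multinomial Naive Bayes classifier with add-one smoothing, written in plain Python, to show how a labeled truthful/deceptive corpus would be consumed. The class name and interface are assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Unigram multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # documents per class
        self.word_counts = {}               # class -> word frequencies
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts.setdefault(label, Counter()).update(words)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words unseen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

With a balanced corpus the class priors are equal, so the decision rests entirely on word likelihoods, which is one practical reason the first assumption asks for equal numbers of truthful and deceptive reviews.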

Discussion & Conclusion

Advantages

• models deception as closely to the real world as possible

• known to be deceptive

Limitations

• sanctioned?

• limited knowledge of Turkers

• constrained to certain contexts

• construction of the ‘truthful’ set is non-trivial


Key Potential:

to create datasets more easily and efficiently, in an effort to model deception customized to specific contexts, for a Context Constrained Approach to Deception

