Overcoming Temptation: Theory and Practice Michael Mozer Computer Science Dept. and Institute of Cognitive Science University of Colorado Boulder Adrian F. Ward McCombs School of Business, University of Texas Austin John Lynch Leeds School of Business, University of Colorado Boulder Brett Israelsen, Ian Smith Computer Science, University of Colorado Boulder Shruthi Sukumar Electrical & Computer Engineering, University of Colorado Boulder Shabnam Hakimi Institute of Cognitive Science, University of Colorado Boulder


Post on 12-Jan-2016


Page 1

Overcoming Temptation: Theory and Practice

Michael Mozer
Computer Science Dept. and Institute of Cognitive Science, University of Colorado Boulder

Adrian F. Ward
McCombs School of Business, University of Texas Austin

John Lynch
Leeds School of Business, University of Colorado Boulder

Brett Israelsen, Ian Smith
Computer Science, University of Colorado Boulder

Shruthi Sukumar
Electrical & Computer Engineering, University of Colorado Boulder

Shabnam Hakimi
Institute of Cognitive Science, University of Colorado Boulder

Page 2

Retirement Planning Fail

Among US 55-64 year olds

62% have retirement assets

median savings for those who have assets: $42k

Pre-retirement defection in the US

For every $1 contributed to the accounts of savers under age 55, $0.40 simultaneously flows out of the 401(k)/IRA system, not counting loans (Argento, Bryant, & Sabelhaus, 2014)

National Institute on Retirement Security

Page 3

Can Financial Education Change Behavior?

US Government and nonprofits spent $670M on financial education in 2013.

Financial education explains 0.1% of the variance in financial outcomes (Fernandes, Lynch, & Netemeyer, 2015)

[Figure: Effectiveness of Educational Interventions (r²). Bar chart of effect size (r²) by domain (Social Sciences, Finance), with reference levels for large, medium, and small effects. Finance: r² = .0011]

Page 4

Behavioral Control Problem

Agent acts in the world

Some actions can lead to immediate payoffs

e.g., buy a new car

Other actions can lead to delayed payoffs

e.g., increase contributions to retirement account

How do you incentivize people to stay focused on the long-term?

Page 5

Other Domains

Dieting

Exercise

Cleaning house

Waiting for bus / elevator

Listening to a research talk

Page 6

Delay Discounting Paradigm

A way to quantify preference for now vs. later rewards

Find point of subjective indifference

Yields hyperbolic discounting

Would you rather have $100 now or $X in Y days?
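As a rough illustration of the indifference point, the sketch below assumes the common hyperbolic form V = A / (1 + kD), where A is the amount, D the delay in days, and k a per-person discount parameter. The function names and the value of k are hypothetical and are not taken from the talk.

```python
def hyperbolic_value(amount, delay_days, k):
    """Subjective present value under hyperbolic discounting: A / (1 + k * D)."""
    return amount / (1.0 + k * delay_days)


def indifference_amount(immediate, delay_days, k):
    """Delayed amount X whose discounted value equals the immediate amount."""
    return immediate * (1.0 + k * delay_days)


if __name__ == "__main__":
    k = 0.01  # hypothetical discount parameter (per day)
    for days in (7, 30, 365):
        x = indifference_amount(100.0, days, k)
        print(f"${x:.2f} in {days} days is subjectively equal to $100 now")
```

In an actual delay-discounting study the direction is reversed: the indifference points are measured, and k is fit to them.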

Page 7

Delayed Gratification Paradigm

Marshmallow Test (Mischel & Ebbesen, 1970)

Delay Discounting vs. Delayed Gratification

Delay Discounting: one-shot decision; reveals intrinsic future value

Delayed Gratification: continuous decision; future value confounded with grit

Page 8

Grit, Willpower, and Self Control

All refer to tendency to sustain interest and effort toward a goal

Grit

enduring personality trait

Willpower (= self control)

depends on grit but also varies as a function of mood, time of day, food and beverage intake, ego depletion

Page 9

Formalization Of Delayed Gratification Task

Choice at every instant to

grab small reward ⟸ end

wait for later large reward ⟸ continue

Finite-state machine (FSM) representation

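[Diagram: chain of states 1, 2, 3, 4, 5, ..., τ. In each state, the end action yields the small reward μe; the continue action yields μc and moves to the next state; reaching τ yields the large reward κμe.]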

Page 10

What Is The Optimal Policy?

Optimal policy chooses action at time t that maximizes cumulative summed reward

Or cumulative discounted reward

more discounting -> agent more likely to succumb to temptation

Form of discounting

Exponential vs. hyperbolic
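The two forms of discounting differ in time consistency, which the following minimal sketch illustrates. It assumes the standard forms A·γ^D (exponential) and A / (1 + kD) (hyperbolic); the reward amounts, the gap, and the parameter values are illustrative and not from the talk.

```python
def exponential_value(amount, delay, gamma):
    """Exponentially discounted value: A * gamma ** D."""
    return amount * gamma ** delay


def hyperbolic_value(amount, delay, k):
    """Hyperbolically discounted value: A / (1 + k * D)."""
    return amount / (1.0 + k * delay)


def prefers_later(value_fn, param, front_delay, small=100.0, large=120.0, gap=10):
    """True if the larger reward, arriving `gap` steps after the smaller one,
    has the higher discounted value when both are pushed back by `front_delay`."""
    sooner = value_fn(small, front_delay, param)
    later = value_fn(large, front_delay + gap, param)
    return later > sooner


if __name__ == "__main__":
    GAMMA, K = 0.98, 0.05  # illustrative parameters
    for front_delay in (0, 100):
        print(
            f"front delay {front_delay:3d}: "
            f"exponential prefers later = {prefers_later(exponential_value, GAMMA, front_delay)}, "
            f"hyperbolic prefers later = {prefers_later(hyperbolic_value, K, front_delay)}"
        )
```

Under exponential discounting the preference does not depend on the front delay, because both values are scaled by the same factor. Under hyperbolic discounting the preference can reverse as the rewards draw near.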

Page 11

Value Function = Policy

In state S1

Choose action A if V(S2) > V(S3)

Choose action B otherwise

[Diagram: from state S1, action A leads to S2 and action B leads to S3.]

Page 12

Dynamic Programming

Efficient way of computing value function of optimal policy

The Delayed Gratification FSM has a particularly restricted structure, leading to only a few possible state sequences:

E, CE, CCE, CCCE, CCCCE, CCCCCE, CCCCCCE, CCCCCCCE, CCCCCCCCE, CCCCCCCCCE

[Diagram: delayed-gratification FSM with states 1, ..., τ, end reward μe, continue reward μc, and final large reward κμe.]

Page 13

Dynamic Programming

Dynamic programming finds the value function that satisfies

Depending on discount rate γ, this yields policy that either

ends at time 1

continues until time τ
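The all-or-nothing policy can be reproduced with a few lines of backward induction. This is a sketch under stated assumptions, not the authors' code: ending at any step t < τ pays μe and terminates, continuing pays μc (taken to be 0 here) and advances one step, arriving at τ pays κμe, and the recursion is V(t) = max(μe, μc + γV(t+1)). The function names are hypothetical.

```python
def solve_fsm(tau, mu_e, kappa, gamma, mu_c=0.0):
    """Backward induction on the delayed-gratification chain.

    Returns (value, policy): value[t] for t = 1..tau and
    policy[t] in {"C", "E"} for t = 1..tau-1.
    """
    value = {tau: kappa * mu_e}  # arriving at tau pays the large reward
    policy = {}
    for t in range(tau - 1, 0, -1):
        q_end = mu_e
        q_continue = mu_c + gamma * value[t + 1]
        policy[t] = "C" if q_continue >= q_end else "E"
        value[t] = max(q_end, q_continue)
    return value, policy


def rollout(policy, tau):
    """Action sequence from time 1; the final "E" at tau marks collecting the large reward."""
    seq = ""
    for t in range(1, tau):
        seq += policy[t]
        if policy[t] == "E":
            return seq
    return seq + "E"


if __name__ == "__main__":
    for gamma in (0.95, 0.90):
        _, policy = solve_fsm(tau=10, mu_e=1.0, kappa=2.0, gamma=gamma)
        print(gamma, rollout(policy, tau=10))
```

With μc = 0 the value of continuing grows as τ approaches, so an agent that does not end at time 1 never ends early.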

[Diagram: delayed-gratification FSM with states 1, ..., τ, end reward μe, continue reward μc, and final large reward κμe.]

Page 14

Modeling Human Behavior

DP is not a good model of human behavior

People may wait a while and then succumb to temptation

If you test the same person in the same situation, they may not behave identically each time

What do we need to better model people?

Willpower!

W(t) ~ Gaussian(0, σ²)

σ²: grit

[Diagram: two FSMs over states 1, ..., τ. In the first, ending at any step yields μe. In the second, ending at step t yields μe − w(t). In both, continuing yields μc and the final large reward is κμe.]

Page 15

Willpower Model

State consists of

t: current time

w: agent’s current willpower level

Agent plans optimally given state {t,w}

Takes deterministic action

However, variability in behavior each time task is performed due to fluctuations in w

Page 16

Dynamic Programming With State Uncertainty

Agent can only partially predict future states

Value function is based on expectation over this uncertainty
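One way to make the expectation concrete is sketched below. It rests on assumptions that go beyond the slides: willpower w(t) is drawn independently at each step from Normal(0, σ²), the agent sees w(t) before choosing, ending at step t pays μe − w(t), continuing pays μc (0 here), and arriving at τ pays κμe. Under those assumptions the expected value of the next state is E[max(X, c)] for Gaussian X, which has a standard closed form. All function names are hypothetical.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def expected_max(mean, sigma, c):
    """E[max(X, c)] for X ~ Normal(mean, sigma^2)."""
    d = (mean - c) / sigma
    return mean * norm_cdf(d) + c * norm_cdf(-d) + sigma * norm_pdf(d)


def continuation_values(tau, mu_e, kappa, gamma, sigma, mu_c=0.0):
    """Value of continuing at each decision step t = 1..tau-1 (backward induction)."""
    cont = {}
    next_value = kappa * mu_e  # value on arrival at tau
    for t in range(tau - 1, 0, -1):
        cont[t] = mu_c + gamma * next_value
        # Expected value of being at step t before w(t) is revealed.
        next_value = expected_max(mu_e, sigma, cont[t])
    return cont


def defection_hazards(tau, mu_e, kappa, gamma, sigma, mu_c=0.0):
    """P(end at t | reached t) = P(mu_e - w(t) > continuation value at t)."""
    cont = continuation_values(tau, mu_e, kappa, gamma, sigma, mu_c)
    return {t: norm_cdf((mu_e - c) / sigma) for t, c in cont.items()}


def temptation_resistance(tau, mu_e, kappa, gamma, sigma, mu_c=0.0):
    """Probability that the agent waits all the way to tau."""
    p = 1.0
    for h in defection_hazards(tau, mu_e, kappa, gamma, sigma, mu_c).values():
        p *= 1.0 - h
    return p


if __name__ == "__main__":
    for gamma in (0.89, 0.92, 0.95):
        r = temptation_resistance(tau=10, mu_e=1.0, kappa=2.0, gamma=gamma, sigma=0.25)
        print(f"gamma={gamma:.2f}  resistance={r:.3f}")
```

Because the agent can defect at any step, behavior varies from run to run even though each decision is deterministic given {t, w}.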

Page 17

Expectation Has A (Mostly) Intuitive Form

measure of temptation

Page 18

Theory Predicts Agent’s Temptation Resistance

Page 19

Two Limitations on Human Behavior

Stochastic fluctuations in willpower

parameter σ

Exponential discounting

parameter γ

Agent is optimal subject to these constraints

Canonical notion of grit: Small σ + large γ

Page 20

Simulation

Finish line effect

Low grit moderates effect of discount rate

10 time steps (τ)

delayed reward is 2 × immediate (κ)

Page 21

Temptation Resistance as a Function of γ and σ

[Figure: temptation resistance as a function of γ and σ, on a scale from high to low.]

Page 22

What Magnitude Delayed Reward Leads To Temptation Resistance?

Given a wait time for the delayed reward, what relative magnitude does the reward have to be in order for there to be a 50% chance the agent will wait for it?

Effective discount rate is exponential, as reflected by the log-linear scaling of the delayed reward

Although γ determines the discount rate, σ determines a time-invariant multiplicative factor
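The question on this slide can be posed numerically by searching for the κ at which the probability of waiting equals 0.5. The sketch below is a self-contained illustration under assumptions not stated on the slides: independent Gaussian willpower at each step, μc = 0, and μe = 1. The names `resistance` and `kappa_for_half` are hypothetical.

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def resistance(tau, kappa, gamma, sigma, mu_e=1.0):
    """P(agent waits until tau) under the assumed iid-willpower model."""
    next_value = kappa * mu_e  # value on arrival at tau
    p_wait = 1.0
    for _ in range(tau - 1):  # decision steps tau-1, ..., 1
        cont = gamma * next_value  # value of continuing (mu_c assumed 0)
        d = (mu_e - cont) / sigma
        p_wait *= norm_cdf(-d)  # P(mu_e - w <= cont): agent continues
        # E[max(mu_e - w, cont)] for w ~ Normal(0, sigma^2)
        next_value = mu_e * norm_cdf(d) + cont * norm_cdf(-d) + sigma * norm_pdf(d)
    return p_wait


def kappa_for_half(tau, gamma, sigma, lo=0.0, hi=1e6, iters=200):
    """Bisection for the kappa giving a 50% chance of waiting (resistance rises with kappa)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if resistance(tau, mid, gamma, sigma) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for gamma in (0.89, 0.95):
        for tau in (2, 4, 6, 8, 10):
            k = kappa_for_half(tau, gamma, sigma=0.1)
            print(f"gamma={gamma:.2f} tau={tau:2d} kappa={k:.3f} log(kappa)={math.log(k):.3f}")
```

Printing log(κ) against τ gives a quick check of whether the required reward scales roughly log-linearly with wait time in this simplified version.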

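[Figure: relative magnitude of the delayed reward needed for a 50% chance of waiting, as a function of wait time, for γ = 0.89 and γ = 0.95, with low-grit and high-grit curves.]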

Page 23

Prize-Linked Savings Accounts: Incentivizing A Long-Term Focus

“For every $100 you put in your retirement account, we’ll give you one ticket for a lottery for a $10,000 prize.”

Potential of an immediate reward for focusing on long-term goal

One-size-fits-all solution

Maybe different individuals would benefit from different reward structures: frequent small rewards vs. infrequent large rewards

Page 24

Simulating Prize-Linked Savings Account

Borrow η reward units from delayed reward as incentive

At each time t, hold a lottery for reward ω(t) obtained with probability ρ(t)

Page 25

Risk

Our reward-maximizing framework is risk neutral.

lottery(ρ,ω) is equivalent to lottery(ρ’,ω’) if ρω = ρ’ω’

Risk seeking vs. risk averse behavior

Prospect theory (Kahneman & Tversky, 1979)

When gains are being considered, people underestimate high probabilities and overestimate low probabilities

Risk-sensitive RL (Shen, Tobia, Sommer, & Obermayer, 2013)

replace ρ with a subjective probability,
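To illustrate what replacing ρ with a subjective probability can do, the sketch below uses the one-parameter probability weighting function from Tversky and Kahneman's 1992 cumulative prospect theory, with their reported curvature of 0.61 for gains. This is a stand-in chosen for illustration; the slides do not say which weighting function the model uses, and the function names are hypothetical.

```python
def tk_weight(p, curvature=0.61):
    """Probability weighting w(p) = p^c / (p^c + (1 - p)^c)^(1/c)."""
    num = p ** curvature
    return num / (num + (1.0 - p) ** curvature) ** (1.0 / curvature)


def expected_payout(rho, omega):
    """Risk-neutral value of lottery(rho, omega)."""
    return rho * omega


def subjective_payout(rho, omega, curvature=0.61):
    """Value of the same lottery when rho is replaced by its subjective weight."""
    return tk_weight(rho, curvature) * omega


if __name__ == "__main__":
    # Two lotteries with the same expected payout (0.4 reward units).
    for rho, omega in ((0.01, 40.0), (0.5, 0.8)):
        print(
            f"rho={rho:<5} omega={omega:<5} "
            f"expected={expected_payout(rho, omega):.3f} "
            f"subjective={subjective_payout(rho, omega):.3f}"
        )
```

With this weighting the rare, large prize is valued above its expected payout and the frequent, small prize below it, so the two lotteries are no longer equivalent.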

Page 26

Incorporating Lottery Into Model

Assumes lottery at every time step (TBD)

[Diagram: FSM over states 1, ..., τ with the lottery added. Ending at step t yields μe − w(t); continuing at step t yields μc(t), for t = 1, ..., τ; the final large reward is κμe − η.]

Page 27

Optimization Problem

Given an agent with discount rate γ and grit σ, what is the lottery

L = {ρ(t), ω(t): t = 1 …τ}

that maximizes agent’s temptation resistance?

Page 28

[Figure: results for varying discount rate (γ), other parameters fixed (σ = .10, η = .40, ρ = .01, κ = 2). Panels: γ = 0.950, 0.942, 0.932, 0.921, 0.907, 0.892, 0.874, 0.853, 0.829, 0.800]

Page 29

Interesting Ideas

We can analyze delayed-gratification tasks as an MDP

Grit is helpful if agent does not heavily discount the future; but it can be harmful if the agent does.

behavioral noise can improve performance

Optimal incentive structures depend on an agent’s discount parameter and grit

Page 30

Experimental Explorations of the Model

1. Develop a laboratory task for adults that

involves choice between smaller-sooner and larger-later rewards

requires continual decision making

induces impulsive behavior

2. Demonstrate that model accounts for human behavior

3. Use the model to optimize human behavior

i.e., resist temptation

Page 31

Experiment

Demo

Reward per unit time

short: 1.0 points

long: 1.5 points

Page 32

Experiment

Four-minute duration

Mechanical Turk participants

Up to 25% bonus payment depending on score

25 participants in control condition

Page 33

Accumulated Points Over Time

Page 34

Defection To Short Line

Model parameters: γ = 0.84, σ = 0.25

Page 35

Two Versions Of Model

• Willpower at successive moments is independent

• Willpower follows a random walk

Page 36

Current Directions

Now that we have a model that fits our population, can we determine an incentive structure that boosts the likelihood of waiting in the long line?

Page 37

Current Directions

Fit parameters of model to an individual’s data

Correlate model parameters with standard assessments like the delay discounting paradigm.

Extend theory to handle

uncertainty in the arrival time of the delayed reward (e.g., marshmallow task)

non-terminal temptations (e.g., Starbucks)

compounded interest (e.g., retirement savings)

human learning from experience (e.g., recency effects)

Page 38

Thank You!

Page 39

Game seems to be more interesting than we were intending.

Original intention was to simulate a series of independent episodes, but episodes are interdependent due to variations in line length from one episode to the next.

Page 40

Retirement Planning Fail

Among US 55-64 year olds

62% have retirement assets

median savings for those who have assets: $42k

Among Canadian 55-64 year olds

81% have retirement assets (RRSP or EEP)

median savings for those who have assets: $245k

24% contributed to RRSP in 2011

Pre-retirement defection in the US

For every $1 contributed to the accounts of savers under age 55, $0.40 simultaneously flows out of the 401(k)/IRA system, not counting loans (Argento, Bryant, & Sabelhaus, 2014)

statcan.gc.ca

cbc.ca

National Institute on Retirement Security

Page 41

Optimization Problem

Formalization of delayed gratification task

μe: reward for ending early

κ: relative magnitude of delayed reward

τ: wait time for delayed reward

η: expected lottery payout, Σρ(t)ω(t) = η

Given an agent with discount rate γ and grit σ, what is the lottery L = {ρ(t), ω(t): t = 1 …τ} that maximizes agent’s temptation resistance?

Page 42

Experiment

short vs. long: 1.0 vs. 1.5 points per time step

[Figure panels: control condition, bonus condition]

Page 43
Page 44

Hazard Function For Each Line Length

Defection increases with line length

Finishing line effect

Seems like fewer defections in bonus condition

Page 45

Issue With The Game

Intention was to simulate a series of independent episodes but they are interdependent because

time limitation

information provided about next episode’s line length

Page 46

Optimization Problem

Formalization of delayed gratification task

μc: reward for continuing

μe: reward for ending early

κ: relative magnitude of delayed reward

τ: wait time for delayed reward

η: expected lottery payout, Σρ(t)ω(t) = η

Given an agent with discount rate γ and grit σ, what is the lottery L = {ρ(t), ω(t): t = 1 …τ} that maximizes agent’s temptation resistance?

Simplified

ρ(i) = ρ(j) = ρ

ω(i) = ω(j) = η/ρ

Page 47

Varying η and ρ

γ = .92, σ = .10