Designing a Safe Motivational System for Intelligent Machines
Mark R. Waser


Page 1: Designing a Safe Motivational System for Intelligent Machines

Designing a Safe Motivational System

for Intelligent Machines

Mark R. Waser

Page 2: Designing a Safe Motivational System for Intelligent Machines

Inflammatory Statements

• Human intelligence REQUIRES ethics

• All humans want the same things

• Ethics are universal

• Ethics are SIMPLE in concept

• Difference in power is irrelevant (to ethics)

Evolution has “designed” you to disagree with the above five points

Page 3: Designing a Safe Motivational System for Intelligent Machines

Definitions

• Human – goal-directed entity

• Goals – a destination OR a direction

• Restrictions – conditional overriding goals

• Motivation – incentive to move

• Actions – determined by goals + motivations

• Path (or direction)

• Preferences, Rules-of-Thumb and Defaults

• Ethics (the *goal* includes the path)

• Safety

(disguised assumptions)

Page 4: Designing a Safe Motivational System for Intelligent Machines

http://www.markzug.com/

Asimov's 3 Laws:

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2. A robot must obey orders given to it by human beings except where such orders would conflict with the First Law.

3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Page 5: Designing a Safe Motivational System for Intelligent Machines

Four Possible Scenarios

• Asimov’s early robots (little foresight, helpful but easily confused or conflicted)

• Immediate shutdown/suicide

• VIKI from the movie “I, Robot” (generalize to “bubble-wrapping” humanity)

• Asimov’s late robots (further generalize to self-exile with invisible continuing assistance)

Page 6: Designing a Safe Motivational System for Intelligent Machines

SIAI’s Definitions

• Friendly AI - an AI that takes actions that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent; nice rather than hostile

• Coherent Extrapolated Volition of Humanity (CEV) - “In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together.”

goals & motivations

Page 7: Designing a Safe Motivational System for Intelligent Machines

SIAI’s First Law

An AI must be

beneficial to humans and humanity

(benevolent rather than malevolent)

But . . .

What is beneficial?

What are humans and humanity?

Page 8: Designing a Safe Motivational System for Intelligent Machines

Value Formula

Values (good/bad) are *entirely* derivative/relative with respect to some goal (CEV)

Value = f(x, y), where
• x is a set of circumstances (world state),
• y is a set of (proposed) actions, and
• f is an evaluation of how well your goal is advanced

Value = f(x, y, t, e), where
• t is the time point at which goal progress is judged, and
• e is the set of entities which the goal covers
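As an illustration of the two signatures above, here is a minimal sketch (the slides give only the signatures, so every name, type, and the toy evaluation below are assumptions):

```python
# Minimal sketch of the value formula; all names, types, and the toy
# evaluation are illustrative assumptions, not from the slides.
from typing import Callable, FrozenSet, Sequence

WorldState = dict      # x: a set of circumstances (world state), toy representation
Action = str           # one element of y: a (proposed) action
Entity = str           # one element of e: an entity that the goal covers

# Value = f(x, y): f evaluates how well the goal is advanced.
ValueFn = Callable[[WorldState, Sequence[Action]], float]

# Value = f(x, y, t, e): the same evaluation, judged at time point t
# over the set of entities e that the goal covers.
ExtendedValueFn = Callable[[WorldState, Sequence[Action], float, FrozenSet[Entity]], float]

def cooperation_goal(x: WorldState, y: Sequence[Action],
                     t: float, e: FrozenSet[Entity]) -> float:
    """Toy f: fraction of covered entities whose needs are met in the world
    state that actions y would (hypothetically) produce by time point t."""
    projected = x  # a real system would simulate y forward to time point t
    if not e:
        return 0.0
    met = sum(1 for entity in e if projected.get(entity, {}).get("needs_met", False))
    return met / len(e)
```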

Page 9: Designing a Safe Motivational System for Intelligent Machines

Questions

• Is this moral relativism?

• Are values complex?

• Must our goal (CEV) be complex?

Page 10: Designing a Safe Motivational System for Intelligent Machines

Copernicus!

Page 11: Designing a Safe Motivational System for Intelligent Machines

Assume that “beneficial” were a relatively simple formula (like z² + c)

Mandelbrot set

Page 12: Designing a Safe Motivational System for Intelligent Machines

Assume further that we are trying to determine that formula (beneficial) by looking at the results (color) one example (pixel) at a time

Color Illusions
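A brief illustration of the analogy (not from the slides): the Mandelbrot rule z → z² + c is a one-line formula, yet judging it only by its outputs, one example at a time, makes it look arbitrarily complex, which is the sense in which a simple formula for “beneficial” could still defy reverse-engineering from cases.

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """True if c appears to stay bounded under the one-line rule z -> z**2 + c."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:   # once |z| exceeds 2, the orbit provably escapes
            return False
    return True

# Judging the "formula" one example (pixel) at a time:
print(in_mandelbrot(-1.0 + 0.0j))  # True: the orbit cycles 0, -1, 0, -1, ...
print(in_mandelbrot(1.0 + 0.0j))   # False: the orbit 0, 1, 2, 5, ... escapes
```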

Page 13: Designing a Safe Motivational System for Intelligent Machines

Current Situation of Ethics

• Two formulas (beneficial to humans and humanity & beneficial to me)

• As long as you aren’t caught, all the incentive is to shade towards the second

• Evolution has “designed” humans to be able to shade to the second (Trivers, Hauser)

• Further, for very intelligent people, it is far more advantageous for ethics to be complex

Page 14: Designing a Safe Motivational System for Intelligent Machines

Definition

Ethics *IS*

What is beneficial for the community

OR

What maximizes cooperation

Page 15: Designing a Safe Motivational System for Intelligent Machines

Goal(s)/Omohundro Drives

1. AIs will want to self-improve

2. AIs will want to be rational

3. AIs will try to preserve their utility

4. AIs will try to prevent counterfeit utility

5. AIs will be self-protective

6. AIs will want to acquire resources and use them efficiently

Page 16: Designing a Safe Motivational System for Intelligent Machines

“Without explicit goals to the contrary, AIs are likely to behave like human sociopaths in their pursuit of resources.”

GDEs (goal-directed entities):

7. GDEs will want cooperation and to be part of a community

8. GDEs will want FREEDOM!

Page 17: Designing a Safe Motivational System for Intelligent Machines

Humans . . .

• Are classified as obligatorily gregarious because we come from a long lineage for which life in groups is not an option but a survival strategy (Frans de Waal, 2006)

• Evolved to be extremely social because mass cooperation, in the form of community, is the best way to survive and thrive

• Have empathy not only because it helps to understand and predict the actions of others but, more importantly, prevents us from doing anti-social things that will inevitably hurt us in the long run (although we generally won’t believe this)

• Have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our subconscious “sense of morality”

Page 18: Designing a Safe Motivational System for Intelligent Machines

Circles of Morality

Relationships and Loyalty

Moral Sombrero

Page 19: Designing a Safe Motivational System for Intelligent Machines

Redefining Friendly Entity

• Friendly Entity (“Friendly”) - an entity with goals and motivations that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent

• Friendly Entity (“Friendly”) - an entity with goals and motivations that are, on the whole, beneficial to the community of Friendlies (i.e. the set of all Friendlies, known or unknown); benevolent rather than malevolent

Page 20: Designing a Safe Motivational System for Intelligent Machines

Friendliness’s First Law

An entity must be

beneficial to the community of Friendlies

(benevolent rather than malevolent)

But . . .

What is beneficial?

What are humans and humanity?

Page 21: Designing a Safe Motivational System for Intelligent Machines

What is beneficial?

• Cooperation (minimize conflicts & frictions)

• Omohundro drives

• Increasing the size of the community (both growing and preventing defection)

• To meet the needs/goals of each member of the community better than any alternative (as judged by them -- without interference or gaming)

Page 22: Designing a Safe Motivational System for Intelligent Machines

What is harmful?

• Blocking/Perverting Omohundro Drives

• Lying

• Single-goaled entities

• Over-optimization (achievable top level goals)

• The fact that we do not maintain our top-level goal and have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our “moral sense”

Page 23: Designing a Safe Motivational System for Intelligent Machines

OPTIMAL < the community’s sense of what is correct (ethical)

This makes ethics much more complex because it includes the cultural history.

The anti-gaming drive to maintain utility adds friction/resistance to the discussion of ethics.

Page 24: Designing a Safe Motivational System for Intelligent Machines

ONE non-organ donor + avoiding a defensive arms race > SIX dying patients

Credit to: Eric Baum, What Is Thought?
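One way to read the slide above as arithmetic (a hedged illustration; the numbers are arbitrary stand-ins, not from the slides or from Baum): the comparison flips once the value formula's time point t and entity set e expand from the six patients to the whole community.

```python
# Toy numbers chosen only to show the structure of the argument, not to quantify it.
six_dying_patients = 6        # lives saved by harvesting the organs
one_non_organ_donor = 1       # life taken
defensive_arms_race = 20      # assumed long-run cost to every community member of
                              # having to defend against being harvested

naive_value = six_dying_patients - one_non_organ_donor   # f(x, y): +5
community_value = naive_value - defensive_arms_race       # f(x, y, t, e): -15

print(naive_value > 0)        # True:  the narrow calculation says "harvest"
print(community_value > 0)    # False: the community's sense of what is correct wins
```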

Page 25: Designing a Safe Motivational System for Intelligent Machines

Triangle diagram (logical view): GOAL(S) = CEV, ACTIONS, and stimuli that implement moral rules of thumb

Page 26: Designing a Safe Motivational System for Intelligent Machines

Sloman’s architecture for a human-like agent (Sloman 1999)

Page 27: Designing a Safe Motivational System for Intelligent Machines

Inflammatory Statements

• Human intelligence REQUIRES ethics

• All humans want the same things

• Ethics are universal

• Ethics are SIMPLE in concept

• Difference in power is irrelevant (to ethics)

Evolution has “designed” you to disagree with the above five points

Page 28: Designing a Safe Motivational System for Intelligent Machines

Next . . . .

Copies of this powerpoint available from [email protected]

CEV Candidate #1:

We wish that all entities were Friendlies

Necessary? Sufficient/Complete? Possible?