

  • Can Winograd Schemas Replace Turing Test for Defining Human-Level AI?

    By Evan Ackerman
    Posted 29 Jul 2014 | 16:50 GMT

    Illustration: Getty Images

    Earlier this year, a chatbot called Eugene Goostman "beat" a Turing Test for artificial intelligence (http://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/virtual-tween-passes-turing-test) as part of a contest organized by a U.K. university. Almost immediately, it became obvious that rather than proving that a piece of software had achieved human-level intelligence, all that this particular competition had shown was that a piece of software had gotten fairly adept at fooling humans into thinking that they were talking to another human, which is very different from a measure of the ability to "think." (In fact, some observers didn't think the bot was very clever at all (http://www.scottaaronson.com/blog/?p=1858).)

    Clearly, a better test is needed, and we may have one, in the form of a type of question called a Winograd schema that's easy for a human to answer, but a serious challenge for a computer.

    The problem with the Turing Test is that it's not really a test of whether an artificial intelligence program is capable of thinking: it's a test of whether an AI program can fool a human. And humans are really, really dumb. We fall for all kinds of tricks that a well-programmed AI can use to convince us that we're talking to a real person who can think.

    For example, the Eugene Goostman chatbot pretends to be a 13-year-old boy, because 13-year-old boys are often erratic idiots (I've been one), and that will excuse many circumstances in which the AI simply fails (http://www.scottaaronson.com/blog/?p=1858). So really, the chat bot is not intelligent at all; it's just really good at making you overlook the times when it's stupid, while emphasizing the periodic interactions when its algorithm knows how to answer the questions that you ask it.

    Conceptually, the Turing Test is still valid, but we need a better practical process for testing artificial intelligence. A new AI contest, sponsored by Nuance Communications and CommonsenseReasoning.org, is offering a US $25,000 prize to an AI that can successfully answer what are called Winograd schemas, named (http://cs.nyu.edu/davise/papers/WSKR2012.pdf) after Terry Winograd (http://hci.stanford.edu/winograd/), a professor of computer science at Stanford University.

    Here's an example of one:

    The trophy doesn't fit in the brown suitcase because it is too big. What is too big?

    The trophy, obviously. But it's not obvious. It's obvious to us, because we know all about trophies and suitcases. We don't even have to "think" about it; it's almost intuitive. But for a computer program, it's unclear what the "it" refers to. To be successful at answering a question like this, an artificial intelligence must have some background knowledge and the ability to reason.

    Here's another one:

    Jim comforted Kevin because he was so upset. Who was upset?

    These are the rules the Winograd schemas have to follow:

    1. Two parties are mentioned in a sentence by noun phrases. They can be two males, two females, two inanimate objects or two groups of people or objects.
    2. A pronoun or possessive adjective is used in the sentence in reference to one of the parties, but is also of the right sort for the second party. In the case of males, it is “he/him/his”; for females, it is “she/her/her”; for inanimate objects, it is “it/it/its”; and for groups, it is “they/them/their.”
    3. The question involves determining the referent of the pronoun or possessive adjective. Answer 0 is always the first party mentioned in the sentence (but repeated from the sentence for clarity), and Answer 1 is the second party.
    4. There is a word (called the special word) that appears in the sentence and possibly the question. When it is replaced by another word (called the alternate word), everything still makes perfect sense, but the answer changes.
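
    To make the four rules concrete, here is a minimal Python sketch of how a single schema might be stored and checked. It is only an illustration, not the contest's actual format: the class name WinogradSchema, its fields, and the is_well_formed helper are hypothetical, and the alternate word "small" for the trophy example is an assumption, since the example above gives only "big".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    """One schema: a sentence template plus the two word variants (rule 4)."""
    template: str        # sentence and question; {word} marks the special word's slot
    parties: tuple       # (Answer 0, Answer 1): first and second party (rules 1 and 3)
    pronoun: str         # ambiguous pronoun that fits either party (rule 2)
    special_word: str
    alternate_word: str
    special_answer: int  # index into parties when the special word is used

    def render(self, use_alternate=False):
        word = self.alternate_word if use_alternate else self.special_word
        return self.template.format(word=word)

    def answer(self, use_alternate=False):
        # Rule 4: swapping in the alternate word flips the correct referent.
        index = 1 - self.special_answer if use_alternate else self.special_answer
        return self.parties[index]


def is_well_formed(schema):
    """Cheap structural checks only; they say nothing about how hard the schema is."""
    text = schema.render().lower()
    words = text.replace(".", " ").replace("?", " ").split()
    return (
        len(schema.parties) == 2
        and all(party.lower() in text for party in schema.parties)
        and schema.pronoun.lower() in words
        and schema.special_word != schema.alternate_word
        and schema.special_answer in (0, 1)
    )


trophy = WinogradSchema(
    template="The trophy doesn't fit in the brown suitcase because it is too {word}. "
             "What is too {word}?",
    parties=("the trophy", "the brown suitcase"),
    pronoun="it",
    special_word="big",
    alternate_word="small",  # assumed for illustration; the article does not give it
    special_answer=0,
)

for use_alternate in (False, True):
    print(trophy.render(use_alternate), "->", trophy.answer(use_alternate))
```

    Storing both variants in one record is a deliberate choice in this sketch: a program that guesses from surface word statistics alone has to get both the "big" and the "small" version right, which is the point of rule 4.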

    For more details (including some examples of ways in which certain Winograd schemas can include clues that an AI could exploit), this paper (http://www.aaai.org/ocs/index.php/KR/KR12/paper/view/4492/4924) is easy to understand and well worth reading. In fact, it's so well worth reading that I'm going to steal their conclusion and post it here:


  • Like Turing, we believe that getting the behaviour right is the primary concern in developing an artificially intelligent system. We further agree that English comprehension in the broadest sense is an excellent indicator of intelligent behaviour. Where we have a slight disagreement with Turing is whether a free-form conversation in English is the right vehicle. Our WS [Winograd schemas] challenge does not allow a subject to hide behind a smokescreen of verbal tricks, playfulness, or canned responses. Assuming a subject is willing to take a WS test at all, much will be learned quite unambiguously about the subject in a few minutes. What we have proposed here is certainly less demanding than an intelligent conversation about sonnets (say), as imagined by Turing; it does, however, offer a test challenge that is less subject to abuse.

    It's worth pointing out that we're a bit skeptical that you can really "test" for human-level AI in this manner. With a highly structured test with specific questions and answers that are unambiguously right or wrong, there's a lot of potential for a clever (but not thinking) AI to find ways to exploit it. The question, then, becomes whether "intelligence" is simply a technological system that is sufficiently complex to correctly answer a series of questions that a slightly more complex biological system (us) has arbitrarily decided constitute a measurement of what thinking requires.

    It seems inevitable that at some point, we'll have to say that true intelligence is feeling as well as thinking, and "Blade Runner" is way ahead of us.

    [ Winograd Schema Challenge (http://commonsensereasoning.org/winograd.html) ] via [ BusinessWire (http://www.businesswire.com/news/home/20140728005207/en/Nuance-Announces-Winograd-Schema-Challenge-Advance-Artificial) ]

    Can Winograd Schemas Replace Turing Test for D... http://spectrum.ieee.org/automaton/robotics/arti...

    2 of 2 08/05/2014 09:00 PM