psychology

Psychology BHP315111

MODULE 2 – Operant Conditioning

Operant Conditioning

• While classical conditioning is important and useful, it can’t explain every learned behaviour. In particular, it can’t explain voluntary behaviour: behaviour that we can control.

• Many behaviours are not triggered by a specific stimulus, e.g. reading a book, playing tennis or robbing a bank.

• Much of our learning occurs by trial and error: if we do something and people like it, we usually do it again (e.g. you wear an outfit, someone compliments it, so you wear it again).

Operant Conditioning

• We all adjust our behaviour according to the outcomes or consequences it produces. Operant conditioning is the learning that takes place as a result of these consequences.

• Trial and error learning is part of operant conditioning and is just what its name suggests: attempting to learn, or to solve a problem, by trying alternative possibilities until a correct one is achieved. Once learned, the behaviour will usually be performed quickly and with fewer errors.

Trial and Error Learning

• Trial and error learning is also known as instrumental learning, as the individual is instrumental in learning the correct response. More recently, both instrumental learning and trial and error learning have been referred to as operant conditioning.

• The individual operates on the environment to solve a problem (e.g. working out how to use a new iPhone).

Operant Conditioning

• Operant conditioning involves:

– Motivation (a desire to attain a goal)

– Exploration (an increase in activity)

– Incorrect and correct responses

– Reward (the correct response is made and rewarded)

• Receiving a reward of some kind leads to the repeated performance of the correct response, strengthening the association between the behaviour and the outcome.

Thorndike

• Edward Thorndike (1874–1949) carried out the first studies of what would later be called operant conditioning. He studied animal intelligence at around the same time as Pavlov studied his dogs.

• Thorndike put a hungry cat in a ‘puzzle box’ and placed a piece of fish just outside the box, where the cat could see and smell it but not reach it. To escape, the cat had to operate a latch that released a door on the side of the box, by pushing down on a paddle (lever) inside the box. Thorndike measured the time it took the cat to escape.

Thorndike’s Cats

• At first, the cat tried numerous ineffective strategies (trial and error). It tried to squeeze through the bars or stretch its paws to reach the food, and it clawed and bit the bars for ten minutes.

• Eventually it accidentally pushed the lever and the door opened. The cat was rewarded with both its release and the fish treat.

• When the cat was put back in the box it went through another series of incorrect responses before eventually pushing the lever again, and again being rewarded with the food.

Thorndike’s Cats

• The cat became progressively quicker at escaping from the box, and after about seven trials it would go directly to the lever, push it, and get out immediately.

• Pushing the lever was no longer a random pattern of behaviour – it was a deliberate response that the cat had learnt due to the consequences of making the response. When the correct response was followed by a reward (escape and food), the cat demonstrated this behaviour with increasing frequency.

• This study was the first experiment that led Thorndike to call this process ‘trial and error learning’.

Law of Effect

• This experiment led Thorndike to develop the law of effect, which states that a behaviour followed by ‘satisfying’ consequences is more likely to occur again, and a behaviour followed by ‘annoying’ consequences is less likely to occur again.
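To make the law of effect concrete, here is a minimal Python sketch of a cat in a puzzle box. It is a toy model, not Thorndike’s own method: the response names, starting strengths, step sizes and the rule that a failed attempt counts as mildly ‘annoying’ are all hypothetical assumptions. Each response has a strength, the cat picks responses in proportion to their strength, and consequences nudge those strengths up or down.

```python
import random

# Hypothetical starting strengths: at first every response is equally likely.
INITIAL_STRENGTHS = {"claw bars": 1.0, "squeeze through bars": 1.0,
                     "stretch paw": 1.0, "push paddle": 1.0}


def choose(strengths, rng):
    """Pick a response with probability proportional to its strength."""
    responses = list(strengths)
    weights = [strengths[r] for r in responses]
    return rng.choices(responses, weights=weights, k=1)[0]


def apply_law_of_effect(strengths, response, consequence, step=0.5, floor=0.05):
    """Strengthen a response after a 'satisfying' consequence and weaken it
    (never below `floor`) after an 'annoying' one. Step sizes are arbitrary."""
    if consequence == "satisfying":
        strengths[response] += step
    elif consequence == "annoying":
        strengths[response] = max(floor, strengths[response] - step)
    return strengths


def run_trial(strengths, rng):
    """One trial in the puzzle box; returns the number of attempts to escape."""
    attempts = 0
    while True:
        attempts += 1
        response = choose(strengths, rng)
        if response == "push paddle":
            # Escape plus fish: a satisfying consequence.
            apply_law_of_effect(strengths, response, "satisfying")
            return attempts
        # Assumption: a failed attempt counts as mildly annoying.
        apply_law_of_effect(strengths, response, "annoying", step=0.1)


if __name__ == "__main__":
    rng = random.Random(1)
    strengths = dict(INITIAL_STRENGTHS)
    print([run_trial(strengths, rng) for _ in range(10)])
```

Running it prints the number of attempts needed on each of ten trials. Because pushing the paddle is strengthened on every trial while the other responses are weakened, the counts tend to fall, echoing the cat’s increasing speed of escape.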

http://www.youtube.com/watch?v=Vk6H7Ukp6To

Learning Activity 1 – pg 465 (Grivas)



Operant Conditioning

• Thorndike started this line of thinking, but it didn’t become ‘operant conditioning’ until Burrhus Skinner came along. He referred to the responses observed in trial and error learning as operants. An operant is a response (or set of responses) that occurs and acts on the environment to produce some kind of effect. An operant is therefore a response or behaviour that generates consequences.

Operant Conditioning

• This type of conditioning is therefore based on the principle that an organism will tend to repeat behaviours (operants) that have desirable consequences (e.g. a treat), or that enable it to avoid undesirable consequences (e.g. detention).

• Organisms will tend not to repeat behaviours that have undesirable consequences.

Skinner

• Burrhus Frederic Skinner (B.F. Skinner) was inspired by Thorndike. In the 1930s he began his own experiments and coined the term operant conditioning, to show that organisms learn to operate on the environment to produce desired consequences.

• In Thorndike’s study, the cat’s behaviour had an effect on its environment: it opened the door. Because the consequences of the cat’s action were positive, the likelihood of the response happening again increased.

Respondent Conditioning

• A student might behave cooperatively if this behaviour operates on the environment to produce a desired consequence (early dismissal).

– Being cooperative is the operant response; it is conditioned by the early dismissal.

• Skinner coined the term respondent conditioning for what we know as classical conditioning, because the learner’s response has no environmental consequences. Pavlov’s dogs did not have to do anything to get fed; the food simply came!

Skinner’s Beliefs

• Skinner believed that all behaviour could be explained by the relationships between the behaviour, its antecedents (events that come before it), and its consequences. Any behaviour that is followed by a consequence will change in strength and frequency, depending on the nature of that consequence.

– Strength: becomes more, or less, established

– Frequency: occurs more, or less, often

– Consequence: Reward or punishment

A Skinner Box

• Skinner created an apparatus called a Skinner box. In this box, animals learn to make a particular response whose consequences can be controlled by the researcher. The box contains a lever that delivers food, as well as lights and buzzers, and some versions have floors that can deliver an electric shock.

• The box was connected to a recorder that showed how often each response was made (frequency) and the rate of the response (speed).

A Skinner Box

• He mostly used rats, although he later moved on to pigeons. Rats were conditioned to press a lever; pigeons were conditioned to peck at a disk.

• At first, a rat would scurry around randomly and accidentally press the lever. After many repetitions, the rat’s behaviour became less random and it eventually pressed the lever consistently. The rat was rewarded for the correct response.

• Skinner referred to different types of rewards as reinforcers.

• http://www.youtube.com/watch?v=jDLNHwquiAc


Reinforcement

• When you are training your dog to shake hands and you give it a biscuit, pat it on the head or say ‘good dog’ when it behaves the way you want, you are using reinforcement.

• If you use an umbrella to stop yourself from getting wet, your use of the umbrella is being strengthened by another kind of reinforcement.

• Reinforcement may mean receiving a pleasant stimulus (biscuit) or escaping an unpleasant stimulus (avoiding getting wet).

Reinforcers

• A reinforcer is an object or event that increases the probability that an operant behaviour will occur again. ‘Reinforcer’ is often used interchangeably with ‘reward’.

• Reinforcers are only called ‘reinforcers’ if they actually reinforce behaviour. Eating chocolate is a pleasurable experience, but it’s only a reinforcer if it promotes or strengthens a particular response.

Schedules of Reinforcement

• Reinforcement can happen after every correct response (continuous schedule), or only on some occasions when there is a correct response (partial reinforcement schedule).

• In the early stages of conditioning, learning is usually most rapid if the correct response is reinforced every time it occurs – continuous reinforcement.

Schedules of Reinforcement

• Once a correct response consistently occurs, a different reinforcement schedule can be used to maintain it by reinforcing only some correct responses: partial reinforcement.

• Skinner accidentally ran low on food pellets during his Skinner box experiments, so he was forced to give pellets less often. He found that responses maintained by this partial reinforcement schedule are stronger and less likely to weaken than those maintained by continuous reinforcement.

Schedules of Reinforcement

• A schedule of reinforcement refers to the frequency (how many times) and the manner in which a desired response is reinforced.

– Reinforcement after a certain number of correct responses (ratio), or after a certain amount of time has elapsed (interval)

– Reinforcement on a regular basis, e.g. every 6th response or every 30 seconds (fixed), or at an unpredictable rate (variable). Read pg 471-472 Grivas

Schedules of Reinforcement

• Fixed-ratio schedule – 1:10 – a reinforcer is given after every 10 correct responses

• Variable-ratio schedule – 1:10, then 1:5, then 1:12 – the reinforcer is given after a different number of responses each time, but the number averages out to a set mean (ratio)

• Fixed-interval schedule – a fixed period of time: the first correct response after a set period (e.g. 20 seconds) is reinforced

• Variable-interval schedule – irregular periods of time that average out to a set mean period (e.g. 30 seconds); the first correct response after each period is reinforced
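The four schedules differ only in the rule that decides whether a given correct response earns a reinforcer. A minimal Python sketch, assuming each response is timestamped in seconds: ratio schedules count responses, while interval schedules check the clock. The class names are hypothetical, and the way the variable schedules draw their requirements (uniformly around the mean) is an illustrative assumption rather than a standard from the texts.

```python
import random


class FixedRatio:
    """Reinforce every n-th correct response (e.g. n=10 for 1:10)."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after a varying number of responses averaging `mean`.
    Assumption: each requirement is drawn uniformly from 1..2*mean-1."""
    def __init__(self, mean, rng):
        self.mean = mean
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response made once `interval` seconds have passed
    since the last reinforcer (or since the session started at t=0)."""
    def __init__(self, interval):
        self.interval = interval
        self.available_at = interval

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but each wait is random with mean `mean` seconds.
    Assumption: waits are drawn uniformly from [0, 2*mean]."""
    def __init__(self, mean, rng):
        self.mean = mean
        self.rng = rng
        self.available_at = self._draw()

    def _draw(self):
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


def count_reinforcers(schedule, response_times):
    """How many of the responses (given as times in seconds) were reinforced."""
    return sum(schedule.respond(t) for t in response_times)


if __name__ == "__main__":
    presses = range(1, 121)  # one lever press per second for two minutes
    rng = random.Random(0)
    for schedule in (FixedRatio(10), VariableRatio(10, rng),
                     FixedInterval(20), VariableInterval(30, rng)):
        print(type(schedule).__name__, count_reinforcers(schedule, presses))
```

For a rat pressing the lever once per second for two minutes, `FixedRatio(10)` reinforces 12 presses and `FixedInterval(20)` reinforces 6. The variable versions reinforce at unpredictable points, so the learner cannot tell which response will pay off.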

Positive Reinforcement

• Some examples:

– The food pellet in Skinner’s box (a hungry rat)

– An A in an exam (someone who studies conscientiously)

– A favourite book to read (a girl being potty trained)

– Prize (competing at something)

• Providing a satisfying consequence (reward) increases the likelihood of a desired response.

• Doug Seus and Bart the Bear (ForTheGrizzly) http://www.youtube.com/watch?v=Af3G8aGk62U


Negative Reinforcement

• Some examples:

– Umbrella (avoid wet clothes)

– Panadol (relieve a headache)

– Pressing a lever (avoid a mild shock)

– Turning off the TV (avoid a scary movie)

• A negative reinforcer is any unpleasant or aversive stimulus that, when removed or avoided, strengthens the likelihood of a desired response. It is the removal of this stimulus that reinforces the behaviour: if a response gets rid of something unpleasant, the organism is more likely to make that response again.

Positive or Negative Reinforcement

• The important distinction is:

– Positive reinforcers are given

– Negative reinforcers are removed or avoided

• Both procedures lead to desirable outcomes, and each procedure strengthens or reinforces the behaviour that is desired.

Primary and Secondary Reinforcers

• Primary reinforcers are things such as food, water or sex that are satisfying and require no learning on the part of the subject to become pleasurable.

– e.g. making yourself study for 2 hours before rewarding yourself with chocolate (brain scans)

• A secondary reinforcer is any stimulus that has acquired its reinforcing power through experience. Secondary reinforcers are learned, by being paired with primary reinforcers or other secondary reinforcers.

– e.g. coupons, money, grades, praise

Punishment – Positive

• If you drive faster than the speed limit, you get a fine. This is intended to reduce your speeding behaviour in the future. If you continue to speed, you may be disqualified from driving.

• This is an example of punishment of the unwanted behaviour with the intention of reducing or eliminating the behaviour.

• Punishment is the delivery of an unpleasant stimulus following a response (smack, fine, growl, slap, shock).

Response Cost – Negative Punishment

• Response cost is the removal of a pleasant stimulus following a response (e.g. having your iPhone taken away). This weakens, or decreases the likelihood of, an undesirable response recurring, by removing something pleasant.

• The difference?

– Positive punishment: introduction of an unpleasant stimulus following an undesirable response.

– Response cost: withdrawal of a pleasant stimulus following an undesirable response.
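Because these four procedures are easy to mix up, the two questions that separate them (is the stimulus pleasant or unpleasant, and is it given or removed?) can be written as a lookup table. This Python sketch is only an illustrative summary of the definitions in this module; the type and function names are hypothetical.

```python
from enum import Enum


class Stimulus(Enum):
    PLEASANT = "pleasant"
    UNPLEASANT = "unpleasant"


class Change(Enum):
    GIVEN = "given"      # the stimulus is presented after the response
    REMOVED = "removed"  # the stimulus is taken away (or avoided) after the response


# (kind of stimulus, what happens to it) -> (procedure, intended effect on response)
PROCEDURES = {
    (Stimulus.PLEASANT, Change.GIVEN): ("positive reinforcement", "increase"),
    (Stimulus.UNPLEASANT, Change.REMOVED): ("negative reinforcement", "increase"),
    (Stimulus.UNPLEASANT, Change.GIVEN): ("positive punishment", "decrease"),
    (Stimulus.PLEASANT, Change.REMOVED): ("response cost", "decrease"),
}


def classify(stimulus, change):
    """Name the operant procedure and its intended effect on the response."""
    return PROCEDURES[(stimulus, change)]


if __name__ == "__main__":
    examples = [
        ("biscuit for shaking hands", Stimulus.PLEASANT, Change.GIVEN),
        ("umbrella keeps you dry", Stimulus.UNPLEASANT, Change.REMOVED),
        ("speeding fine", Stimulus.UNPLEASANT, Change.GIVEN),
        ("iPhone taken away", Stimulus.PLEASANT, Change.REMOVED),
    ]
    for label, stimulus, change in examples:
        name, effect = classify(stimulus, change)
        print(f"{label}: {name} ({effect}s the behaviour)")
```

Note that "positive" and "negative" describe whether something is given or removed, not whether the outcome is good or bad; the intended effect is decided by whether the stimulus is pleasant.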

Reinforcement

• Reinforcement is intended to increase the likelihood of a behaviour being repeated, and punishment is intended to decrease it.

• In O.C., what happens after the desired response is performed is very important for the strength of learning and the rate at which it occurs. The time between the response and the consequence, as well as the appropriateness of the consequences used, are important in determining how effective learning is.

Order of Presentation

• To use reinforcement and punishment effectively, you must always deliver the consequence after the response occurs, NEVER before.

• If you reinforce someone’s use of the word ‘I’ in conversation with a smile, they are more likely to use it. If you smile before they say ‘I’, the smile cannot reinforce the word, so they are less likely to use it. Once you’ve reinforced this behaviour, if you then remove your reinforcement (the smile), they will make ‘I’ statements less often.

Timing

• Reinforcement and punishment are most effective when given immediately after the response has occurred. This helps to ensure that the learner associates the response with the consequence.

• Timing also influences the strength of the response. If there is a considerable delay, learning will generally be very slow to progress and may not occur at all!

• Some reinforcers can still work after a delay if the learner has been promised that they will come.

Appropriateness

• For any stimulus to be a reinforcer, it must actually provide a pleasing or satisfying consequence (reward). A free spot in a university course is not going to be a good reward for a student who wants to be a mechanic!

• Sometimes you can’t tell if it will be an appropriate reward until you have given it. It also can’t be assumed that what works in one situation will work in another.

Appropriateness

• You also need to make sure the punishment is appropriate. It must provide a consequence that is unpleasant and likely to decrease the undesirable behaviour.

• An inappropriate punishment can have the opposite effect: an attention-starved Grade 8 student who talks in class and is told off may act up more, because he has got the attention he wanted.

Key Processes

• The same key processes occur in both classical and operant conditioning:

– Acquisition

– Extinction

– Stimulus generalisation

– Stimulus discrimination

– Spontaneous recovery

• They do differ slightly in how they occur, however.

Acquisition

• Acquisition is the stage in which learning takes place and the specific response becomes established. The difference from classical conditioning is that behaviours are often more complex in operant conditioning, as classical conditioning usually deals only with reflexive, involuntary responses.

• The speed at which the response is established depends on which schedule of reinforcement is used.

Shaping

• Shaping is a procedure where reinforcement is given for any response that successively approximates and ultimately leads to the final desired response, or target behaviour.

• It’s also known as the method of successive approximations.

• Read p215 Plotnik

• http://www.youtube.com/watch?feature=player_embedded&v=teLoNYvOf90




Shaping

• Skinner also used shaping to get a pigeon to make a 360-degree turn. First he reinforced the pigeon with a food pellet every time it moved slightly to the left. Once this response had been conditioned, Skinner would only reinforce the pigeon when it turned a little further left, and so on, until it made a full turn.

• By limiting reinforcement to only those responses that gradually edged towards the target behaviour, Skinner could condition the pigeon to complete circles regularly.

• With this method, once a closer approximation has been reached, earlier approximations are no longer reinforced.
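Shaping can be sketched as a rule that reinforces only turns meeting the current criterion and raises the criterion after a few successes. In this minimal Python sketch, the turn angles (degrees to the left), the starting criterion, the step size and the number of successes needed before moving on are all hypothetical parameters, not values from Skinner’s experiments.

```python
class Shaper:
    """Successive approximations: reinforce only turns (in degrees to the left)
    that meet the current criterion, and raise the criterion by `step` after
    `successes_needed` reinforced turns, until the target turn is reached."""

    def __init__(self, start=15, step=15, target=360, successes_needed=3):
        self.criterion = start
        self.step = step
        self.target = target
        self.successes_needed = successes_needed
        self.successes = 0

    def reinforce(self, angle):
        """Return True (give a food pellet) if this turn meets the criterion."""
        if angle < self.criterion:
            return False  # earlier, smaller approximations no longer earn food
        self.successes += 1
        if self.successes >= self.successes_needed and self.criterion < self.target:
            self.criterion = min(self.target, self.criterion + self.step)
            self.successes = 0
        return True

    def done(self):
        """True once the full target turn has been reinforced enough times."""
        return (self.criterion >= self.target
                and self.successes >= self.successes_needed)


if __name__ == "__main__":
    shaper = Shaper(start=90, step=90, target=360, successes_needed=2)
    for angle in [45, 100, 95, 100, 200, 190, 280, 300, 300, 360, 365]:
        pellet = shaper.reinforce(angle)
        print(f"turn {angle:3d} deg -> pellet: {pellet}, next criterion: {shaper.criterion}")
    print("target behaviour reached:", shaper.done())
```

In the sample run, a 100-degree turn that earned food early on goes unrewarded once the criterion has moved up to 180 degrees, which is the "never reinforce previous approximations" rule in action.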

Extinction

• Extinction is the gradual decrease in the strength or rate of a conditioned response following consistent non-reinforcement of the response.

• Extinction is said to have occurred when a conditioned response is no longer present.

• Basically, you stop reinforcing the behaviour, e.g. you stop giving pellets when the rat presses the lever.

Extinction

• Isn’t this similar to partial reinforcement, though?

• Extinction is less likely to occur when partial reinforcement is used. The uncertainty of the reinforcement leads to a greater tendency for the response to continue.

• This is why gambling is a hard addiction to break; the gambler is highly motivated to win, knows that there’s a chance of a big reward, and has an expectation that the reward will occur sooner or later.

Spontaneous Recovery

• As in classical conditioning (C.C.), extinction in O.C. is often not permanent.

• After you think a behaviour has become extinct, spontaneous recovery can occur and the organism will once again show the response without any reinforcement. The recovered response is likely to be weaker, though, and won’t last long.

• Generally, the longer the time since extinction, the stronger the recovered response.

Stimulus Generalisation

• This occurs when the correct response is made to another, similar stimulus, although usually at a reduced level.

• Skinner’s pigeon pecked at different coloured lights, even though the original light it pecked at was green. This response, however, was less frequent than the original behaviour.

• We do this in everyday life, e.g. reacting to a car backfiring as if it were a starter’s gun.

Stimulus Discrimination

• Skinner also taught his pigeon to peck only at the green light, not at any other light. The light would change colours, and every time the green light came on the pigeon would peck it and be reinforced, but ONLY for the green light.

• Sniffer dogs need stimulus discrimination. They are trained using highly specialised O.C., and are chosen from animals that already have a highly developed sense of smell (olfaction).

• When we change our behaviour depending on who we are with, we are using stimulus discrimination (e.g. what we wear, or whether we are shy or outgoing).

Learned Helplessness

• Martin Seligman carried out a study of learned helplessness. He harnessed dogs so that they couldn’t escape electric shocks.

• At first they whimpered, howled and tried to escape the shocks, but eventually they gave up and lay on the floor without struggling, showing signs resembling human depression (psychological stress responses).

Learned Helplessness

• The next day he placed the dogs in a shuttlebox where they could easily escape, but they made no effort to escape and failed to learn even when they occasionally did escape.

• The dogs had come to expect that they could not get away; they had learned to be helpless.

• Learned helplessness consists of the expectancy that one cannot escape aversive events and the motivational and learning deficits that result from this belief.

Two-factor Learning

• An ice-cream truck approaches with its bell ringing. A boy named Justin hears the bell and thinks about ice-cream. As he does, his mouth waters. Justin runs to the truck, buys an ice-cream and eats it.

• What kind of learning is this? Both!

• In the real world, classical and operant conditioning are often intertwined. This is called two-factor learning.

Two-factor Learning

• Justin’s behaviour reflects both kinds of learning.

– Classical conditioning: Justin salivates each time he hears the bell

– Operant conditioning: Justin runs to the truck to buy and eat ice-cream (positive reinforcement)

• The difference is that Justin’s involuntary responses are altered by classical conditioning, while his voluntary responses are shaped by operant conditioning.

YouTube

• http://www.youtube.com/watch?feature=player_embedded&v=WQQheOYRQlM Classical & Operant

• http://www.youtube.com/watch?feature=player_embedded&v=B_9ZZaPDtPk Operant

• http://www.youtube.com/watch?feature=player_embedded&v=99sWFCNoJTE Classical & Operant




REVIEW!

• Trial and error learning
• Instrumental learning
• Operant Conditioning
• Thorndike & Law of effect
• Skinner’s box
• Schedules of reinforcement
• Positive reinforcement
• Negative reinforcement
• Punishment

• Order of presentation
• Timing
• Appropriateness

• Shaping
• Acquisition & Extinction
• Spontaneous recovery
• Stimulus generalisation
• Stimulus discrimination
• Learned helplessness
• Two-factor learning

References

• Westen, D., Burton, L., & Kowalski, R. (2006). Psychology. Queensland, Australia: John Wiley & Sons Australia, Ltd.

• Cribb, B., Gridley, H., McKersie, C., Kennedy, G., Anin, N., & Rice, J. (2004). Essential VCE Psychology. Cambridge, UK: Cambridge University Press.

• Plotnik, R. (2002). Introduction to Psychology (6th ed.). CA, USA: Wadsworth Group.
