session 2 data collection
7/17/2019 Session 2 Data Collection
http://slidepdf.com/reader/full/session-2-data-collection 1/50
Dr. Rohit Joshi, IIM Shillong
Data, Models and Decisions
PGP 13-15
Why Study Statistics?
Decision Makers Use Statistics To:
Present and describe business data and information properly
Draw conclusions about large populations, using information collected from samples
Make reliable forecasts about a business activity
Improve business processes
Why Collect Data?
A marketing research analyst needs to assess the effectiveness of a new television advertisement.
A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.
An operations manager wants to monitor a manufacturing process to find out whether the quality of product being manufactured is conforming to company standards.
An auditor wants to review the financial transactions of a company in order to determine whether the company is in compliance with generally accepted accounting principles.
Types of Statistics
Statistics
The branch of mathematics that transforms data into useful information for decision makers.
Descriptive Statistics
  Collecting, summarizing, and describing data
Inferential Statistics
  Drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics
Collect data
  e.g., Survey
Present data
  e.g., Tables and graphs
Characterize data
  e.g., Sample mean = $\frac{\sum X_i}{n}$
Inferential Statistics
Estimation
  e.g., Estimate the population mean weight using the sample average weight
Hypothesis testing
  e.g., Test the claim that the population average weight is 65 kg
Drawing conclusions and/or making decisions concerning a population based on sample results.
Basic Vocabulary of Statistics
VARIABLE
A variable is a characteristic of an item or individual.
DATA
Data are the different values associated with a variable.
POPULATION
A population consists of all the items or individuals about which you want to draw a conclusion.
SAMPLE
A sample is the portion of a population selected for analysis.
PARAMETER
A parameter is a numerical measure that describes a characteristic of a population.
STATISTIC
A statistic is a numerical measure that describes a characteristic of a sample.
Population vs. Sample
Population: measures used to describe the population are called parameters
Sample: measures computed from sample data are called statistics
Sources of Data
Primary Sources: The data collector is the one using the data for analysis
  Data from a political survey
  Data collected from an experiment
  Observed data
Secondary Sources: The person performing data analysis is not the data collector
  Analyzing census data
  Examining data from print journals or data published on the internet
Types of Variables
Categorical (qualitative) variables have values that can only be placed into categories, such as "yes" and "no."
Numerical (quantitative) variables have values that represent quantities.
Types of Data
Data
  Categorical (defined categories)
    Examples: Marital Status, Political Party, Eye Color
  Numerical
    Discrete (counted items)
      Examples: Number of Children, Defects per hour
    Continuous (measured characteristics)
      Examples: Weight, Voltage
Probability
Empirical classic probability
  Based on historical data
  Computed after performing the experiment
  Number of times an event occurred divided by the number of trials
  Objective: everyone correctly using the method assigns an identical probability
Subjective probability
  Different individuals may (correctly) assign different numeric probabilities to the same event
Mutually Exclusive event
Collectively Exhaustive event
Equally Likely event
Random Variable
A random variable x takes on a defined set of values with different probabilities.
  For example, if you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with probability one-sixth.
  For example, if you poll people about their voting preferences, the percentage of the sample that responds "Yes on Proposition 100" is also a random variable (the percentage will be slightly different every time you poll).
Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over ("frequentist" view)
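The frequentist reading above can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequencies`, the seed, and the number of rolls are illustrative choices, not part of the slides.

```python
import random
from collections import Counter


def relative_frequencies(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return the share of each face."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


freqs = relative_frequencies(60_000)
for face, share in freqs.items():
    # Each share should sit close to 1/6 (about 0.167) when n_rolls is large.
    print(face, round(share, 3))
```

With more rolls the observed shares drift closer to one-sixth, which is the sense in which probability describes long-run frequency.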
Random variables can be discrete or continuous
Discrete random variables have a countable number of outcomes
  Examples: Dead/alive, dice, counts, etc.
Continuous random variables have an infinite continuum of possible values.
  Examples: blood pressure, weight, the speed of a car, the real numbers from 1 to 6.
Probability functions
A probability function maps the possible values of x against their respective probabilities of occurrence, p(x)
p(x) is a number from 0 to 1.0.
The area under a probability function is always 1.
Discrete example: roll of a die
[Figure: bar chart of p(x) against x = 1, 2, ..., 6; every bar has height 1/6]
$\sum_{\text{all } x} P(x) = 1$
Probability mass function (pmf)
x    p(x)
1    p(x=1) = 1/6
2    p(x=2) = 1/6
3    p(x=3) = 1/6
4    p(x=4) = 1/6
5    p(x=5) = 1/6
6    p(x=6) = 1/6
     Total  = 1.0
Cumulative distribution function (CDF)
[Figure: step plot of P(x) against x = 1, 2, ..., 6, rising through 1/6, 1/3, 1/2, 2/3, 5/6 to 1.0]
Cumulative distribution function
x    P(x≤A)
1    P(x≤1) = 1/6
2    P(x≤2) = 2/6
3    P(x≤3) = 3/6
4    P(x≤4) = 4/6
5    P(x≤5) = 5/6
6    P(x≤6) = 6/6
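The cumulative table is just a running total of the pmf. A minimal sketch for the fair die, using exact fractions (the variable names `pmf` and `cdf` are illustrative):

```python
from fractions import Fraction
from itertools import accumulate

faces = list(range(1, 7))

# pmf: every face of a fair die has probability 1/6.
pmf = {x: Fraction(1, 6) for x in faces}

# cdf: P(X <= x) is the running sum of the pmf up to and including x.
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

for x in faces:
    print(x, pmf[x], cdf[x])
```

Note that `Fraction` reduces automatically, so 3/6 prints as 1/2 and 6/6 prints as 1.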
Practice Problem:
The number of patients seen in the ER in any given hour is a random variable represented by x. The probability distribution for x is:
x      10   11   12   13   14
P(x)   .4   .2   .2   .1   .1
Find the probability that in a given hour:
a. exactly 14 patients arrive     p(x=14) = .1
b. at least 12 patients arrive    p(x≥12) = (.2 + .1 + .1) = .4
c. at most 11 patients arrive     p(x≤11) = (.4 + .2) = .6
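The three answers can be checked by summing the tabulated probabilities over the values that satisfy each condition. A short sketch, with the helper name `prob` chosen for illustration:

```python
from fractions import Fraction

# Probability distribution of hourly ER arrivals, as tabulated above.
dist = {
    10: Fraction(4, 10),
    11: Fraction(2, 10),
    12: Fraction(2, 10),
    13: Fraction(1, 10),
    14: Fraction(1, 10),
}


def prob(event):
    """Sum p(x) over every x for which event(x) is true."""
    return sum(p for x, p in dist.items() if event(x))


p_exactly_14 = prob(lambda x: x == 14)
p_at_least_12 = prob(lambda x: x >= 12)
p_at_most_11 = prob(lambda x: x <= 11)

print(float(p_exactly_14), float(p_at_least_12), float(p_at_most_11))
```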
Review Question 1
If you toss a die, what's the probability that you roll a 3 or less?
a. 1/6
b. 1/3
c. 1/2
d. 5/6
e. 1.0
Review Question 1
If you toss a die, what's the probability that you roll a 3 or less?
a. 1/6
b. 1/3
c. 1/2 (answer)
d. 5/6
e. 1.0
Review Question 2
Two dice are rolled and the sum of the face values is six. What is the probability that at least one of the dice came up a 3?
a. 1/5
b. 2/3
c. 1/2
d. 5/6
e. 1.0
Review Question 2
Two dice are rolled and the sum of the face values is six. What is the probability that at least one of the dice came up a 3?
a. 1/5 (answer)
b. 2/3
c. 1/2
d. 5/6
e. 1.0
How can you get a 6 on two dice?
1-5, 5-1, 2-4, 4-2, 3-3
One of these five has a 3.
∴ 1/5
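The counting argument can be reproduced by brute-force enumeration of all 36 ordered outcomes. A minimal sketch, assuming two fair and distinguishable dice:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two dice.
rolls = list(product(range(1, 7), repeat=2))

# Condition on the sum being six, then look for a 3 among those outcomes.
sum_is_six = [r for r in rolls if sum(r) == 6]
with_a_three = [r for r in sum_is_six if 3 in r]

p_three_given_six = Fraction(len(with_a_three), len(sum_is_six))
print(sum_is_six)
print(p_three_given_six)
```

Restricting the sample space to the five outcomes that sum to six is exactly the conditioning step in the slide's reasoning.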
Example: Suppose we flip two identical coins simultaneously. What is the probability of obtaining a head on the first coin (call event A) and a head on the second coin (call event B)?
Example: A card is drawn from a well shuffled pack of playing cards. What is the probability that it will be either a spade or a queen?
Example: In a DMD class there are 123 students of which 93 students are males and 30 are females. Of these, 36 males and 18 females plan to major in Marketing. A student is selected at random from this class and it is found that this student plans to be a Marketing major. What is the probability that the student is a male?
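The slides pose these three questions without answers. One way to work them is sketched below, under assumptions that are not stated in the source: the coins are fair and independent, the pack is a standard 52-card deck, and the class counts are read as 36 male and 18 female Marketing majors.

```python
from fractions import Fraction

# Example 1: multiplication rule for independent events A and B.
p_a = Fraction(1, 2)  # head on the first coin (assumed fair)
p_b = Fraction(1, 2)  # head on the second coin (assumed fair)
p_both_heads = p_a * p_b

# Example 2: addition rule, subtracting the overlap (the queen of spades).
p_spade = Fraction(13, 52)
p_queen = Fraction(4, 52)
p_queen_of_spades = Fraction(1, 52)
p_spade_or_queen = p_spade + p_queen - p_queen_of_spades

# Example 3: conditional probability P(male | Marketing major).
male_marketing, female_marketing = 36, 18
p_male_given_marketing = Fraction(male_marketing, male_marketing + female_marketing)

print(p_both_heads, p_spade_or_queen, p_male_given_marketing)
```

The three examples line up with the multiplication rule, the addition rule, and conditional probability respectively.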
Continuous case
The probability function that accompanies a continuous random variable is a continuous mathematical function that integrates to 1.
For example, recall the negative exponential function (in probability, this is called an "exponential distribution"):
$f(x) = e^{-x}$
This function integrates to 1:
$\int_0^{+\infty} e^{-x}\,dx = \left. -e^{-x} \right|_0^{+\infty} = 0 + 1 = 1$
For example, the probability of x falling within 1 to 2:
[Figure: curve p(x) = e^{-x} with the area between x = 1 and x = 2 shaded]
Clinical example: Survival times after lung transplant may roughly follow an exponential function.
Then, the probability that a patient will die in the second year after surgery (between years 1 and 2) is 23%.
$P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \left. -e^{-x} \right|_1^2 = -e^{-2} - (-e^{-1}) = -.135 + .368 = .23$
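The same area can be computed from the exponential cumulative distribution function, $1 - e^{-x}$, evaluated at the two endpoints. A minimal sketch assuming a rate of 1, as in the slide; the helper name `exp_cdf` is illustrative:

```python
import math


def exp_cdf(x, rate=1.0):
    """P(X <= x) for an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


# Probability of falling between year 1 and year 2.
p_second_year = exp_cdf(2) - exp_cdf(1)
print(round(p_second_year, 2))
```

Subtracting the two CDF values is equivalent to evaluating the definite integral from 1 to 2.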
Expected Value and Variance
All probability distributions are characterized by an expected value (mean) and a variance (standard deviation squared).
Expected value, formally
Discrete case:
$E(X) = \sum_{\text{all } x} x_i \, p(x_i)$
Continuous case:
$E(X) = \int_{\text{all } x} x_i \, p(x_i) \, dx$
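As a worked instance of the discrete formula, here is a sketch that computes the expected value and variance of a single fair die roll with exact fractions. The die is an illustrative choice carried over from the earlier slides.

```python
from fractions import Fraction

# pmf of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum of x * p(x) over all x.
expected = sum(x * p for x, p in pmf.items())

# Var(X) = sum of (x - E(X))^2 * p(x) over all x.
variance = sum((x - expected) ** 2 * p for x, p in pmf.items())

print(expected, variance)
```

The expected value of 7/2 is not itself a possible outcome, which is a useful reminder that the mean is a long-run average rather than a typical roll.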
A Situation
Acme Fruit and Vegetable Wholesalers buys tomatoes, then sells them to retailers. Acme currently pays Rs. 2000 per container. Tomatoes sold on the same day bring Rs. 5000 per container. Tomatoes are extremely perishable, so any containers not sold on the same day are worthless and must be disposed of (assume at no cost). The distribution manager's problem is to determine the optimum number he should order each day. On days when he stocks more than he sells, his profit is reduced by the cost of the unsold containers. On the other hand, when retailers request more containers than he has in stock, he loses sales and makes a smaller profit than he could have.
Developing Payoff table
Acme currently pays Rs. 2000 per container. Tomatoes sold on the same day bring Rs. 5000 per container. Profit = Rs. 3000 per container.
Payoff table in Rs. '00
                    ACTIONS (Quantity ordered)
EVENTS (Demand)     A1 = 10   A2 = 11   A3 = 12   A4 = 13
D1 = 10               300       280       260       240
D2 = 11               300       330       310       290
D3 = 12               300       330       360       340
D4 = 13               300       330       360       390
When D ≥ Q, P = 30Q, and when D < Q, P = 30D - 20(Q - D), where D is the demand, Q is the quantity ordered, and P is the payoff in Rs. '00.
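The payoff table can be regenerated from the buying and selling prices. The sketch below assumes the reading of the slide in which a container costs 20 and sells for 50 (both in hundreds of rupees), which reproduces the profit of 30 per container sold; the function name `payoff` is illustrative.

```python
PRICE = 50  # selling price per container, in Rs. '00 (assumed reading of the slide)
COST = 20   # purchase cost per container, in Rs. '00 (assumed reading of the slide)


def payoff(demand, ordered):
    """Profit when `ordered` containers are stocked and `demand` are requested."""
    sold = min(demand, ordered)          # cannot sell more than is in stock
    return PRICE * sold - COST * ordered  # unsold containers are a pure loss


quantities = [10, 11, 12, 13]

# One row per demand level, one column per order quantity.
table = {d: [payoff(d, q) for q in quantities] for d in quantities}
for d, row in table.items():
    print(f"D = {d}:", row)
```

Writing the payoff as revenue on units sold minus cost on units ordered covers both the overstock and the understock case with a single expression.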
Probability of Occurrence principle
Let us suppose the Manager kept a record of his sales for the past 100 days.
Daily Sales   Number of days sold   Probability of each number being sold
D1 = 10              15                       0.15
D2 = 11              20                       0.20
D3 = 12              40                       0.40
D4 = 13              25                       0.25
The expected value (EV) of decision alternative d_i is defined as:
$EV(d_i) = \sum_{j=1}^{N} P(s_j) V_{ij}$
where: N = the number of states of nature
P(s_j) = the probability of state of nature s_j
V_ij = the payoff corresponding to decision alternative d_i and state of nature s_j
Expected profit from stocking 10 containers
ACTION (Quantity ordered is 10)
EVENTS (Demand)   Conditional profit (1)   Probability of selling (2)   Expected profit = (1) x (2)
D1 = 10                  300                       0.15                        45
D2 = 11                  300                       0.20                        60
D3 = 12                  300                       0.40                       120
D4 = 13                  300                       0.25                        75
Total EV                                                                      300

Expected profit from stocking 11 containers
ACTION (Quantity ordered is 11)
EVENTS (Demand)   Conditional profit (1)   Probability of selling (2)   Expected profit = (1) x (2)
D1 = 10                  280                       0.15                        42
D2 = 11                  330                       0.20                        66
D3 = 12                  330                       0.40                       132
D4 = 13                  330                       0.25                        82.5
Total EV                                                                      322.50
Expected profit from stocking 12 containers
ACTION (Quantity ordered is 12)
EVENTS (Demand)   Conditional profit (1)   Probability of selling (2)   Expected profit = (1) x (2)
D1 = 10                  260                       0.15                        39
D2 = 11                  310                       0.20                        62
D3 = 12                  360                       0.40                       144
D4 = 13                  360                       0.25                        90
Total EV                                                                      335

Expected profit from stocking 13 containers
ACTION (Quantity ordered is 13)
EVENTS (Demand)   Conditional profit (1)   Probability of selling (2)   Expected profit = (1) x (2)
D1 = 10                  240                       0.15                        36
D2 = 11                  290                       0.20                        58
D3 = 12                  340                       0.40                       136
D4 = 13                  390                       0.25                        97.5
Total EV                                                                      327.50
Strategy adopted: stock 12 containers, the order quantity with the highest expected profit (EV = 335).
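The four expected-profit tables follow one pattern, so they can be computed in a loop. This sketch assumes the 100-day sales record reads 15, 20, 40 and 25 days for demands of 10 to 13, and a payoff of 50 per container sold less 20 per container ordered (in Rs. '00); all names are illustrative.

```python
# Demand level -> number of days it was observed (out of 100); assumed reading of the slide.
days_observed = {10: 15, 11: 20, 12: 40, 13: 25}
total_days = sum(days_observed.values())


def payoff(demand, ordered):
    """Profit in Rs. '00: 50 per container sold, less 20 per container ordered."""
    sold = min(demand, ordered)
    return 50 * sold - 20 * ordered


# EV(q) = sum over demand levels of P(demand) * payoff(demand, q).
expected_profit = {
    q: sum(days * payoff(d, q) for d, days in days_observed.items()) / total_days
    for q in days_observed
}

best_quantity = max(expected_profit, key=expected_profit.get)
print(expected_profit)
print("Order quantity with the highest expected profit:", best_quantity)
```

Under these assumptions the expected-value criterion picks the order quantity whose total EV is largest across the four tables.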
Important discrete probability distribution: The binomial
The Binomial Distribution: Properties
A fixed number of observations, n
  e.g., 15 tosses of a coin; ten light bulbs taken from a warehouse
Two mutually exclusive and collectively exhaustive categories
  e.g., head or tail in each toss of a coin; defective or not defective light bulb; having a boy or girl
  Generally called "success" and "failure"
  Probability of success is p, probability of failure is 1 - p
Constant probability for each observation
The outcome of one observation does not affect the outcome of the other
Two sampling methods
  Infinite population without replacement
  Finite population with replacement
Binomial distribution
Take the example of 5 coin tosses. What's the probability that you flip exactly 3 heads in 5 coin tosses?
Binomial distribution, generally
Note the general pattern emerging: if you have only two possible outcomes (call them 1/0 or yes/no or success/failure) in n independent trials, then the probability of exactly X "successes" =
$\binom{n}{X} p^{X} (1-p)^{n-X}$
n = number of trials
X = number of successes out of n trials
p = probability of success
1 - p = probability of failure
Binomial distribution: example
If I toss a coin 20 times, what's the probability of getting exactly 10 heads?
$\binom{20}{10} (.5)^{10} (.5)^{10} = .176$
Binomial distribution: example
If I toss a coin 20 times, what's the probability of getting 2 or fewer heads?
$\binom{20}{0} (.5)^{0} (.5)^{20} = \frac{20!}{20!\,0!} (.5)^{20} = 9.5 \times 10^{-7}$
$\binom{20}{1} (.5)^{1} (.5)^{19} = \frac{20!}{19!\,1!} (.5)^{20} = 20 \times 9.5 \times 10^{-7} = 1.9 \times 10^{-5}$
$\binom{20}{2} (.5)^{2} (.5)^{18} = \frac{20!}{18!\,2!} (.5)^{20} = 190 \times 9.5 \times 10^{-7} = 1.8 \times 10^{-4}$
Adding the three terms: $P(X \le 2) \approx 2.0 \times 10^{-4}$
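Both coin-toss examples can be checked with a direct implementation of the binomial formula. A minimal sketch using only the standard library; the function name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 10 heads in 20 tosses of a fair coin.
p_exactly_10 = binomial_pmf(10, 20, 0.5)

# 2 or fewer heads in 20 tosses: add the terms for 0, 1 and 2 heads.
p_at_most_2 = sum(binomial_pmf(k, 20, 0.5) for k in range(3))

print(round(p_exactly_10, 3))
print(p_at_most_2)
```

For "or fewer" questions the individual terms must be summed; the last term dominates here, but the first two still contribute to the total.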
All probability distributions are characterized by an expected value and a variance:
If X follows a binomial distribution with parameters n and p: X ~ Bin(n, p)
Mean
$\mu = E(X) = np$
Variance and Standard Deviation
$\sigma^2 = np(1-p)$
$\sigma = \sqrt{np(1-p)}$
Where n = sample size
p = probability of success
(1 - p) = probability of failure
Applications
A manufacturing plant labels items as either defective or acceptable
A firm bidding for contracts will either get a contract or not
A marketing research firm receives survey responses of "yes I will buy" or "no I will not"
New job applicants either accept the offer or reject it
Your team either wins or loses the football game at the company picnic
The Hypergeometric Distribution
The binomial distribution is applicable when selecting from a finite population with replacement or from an infinite population without replacement.
The hypergeometric distribution is applicable when selecting from a finite population without replacement.
The Hypergeometric Distribution
$P(X) = \frac{\binom{A}{X} \binom{N-A}{n-X}}{\binom{N}{n}}$
Where
N = population size
A = number of successes in the population
N - A = number of failures in the population
n = sample size
X = number of successes in the sample
n - X = number of failures in the sample
The Hypergeometric Distribution: Example
Three different computers are checked from the 10 in the department. 4 of the 10 computers have illegal software loaded. What is the probability that 2 of the 3 selected computers have illegal software loaded?
So, N = 10, n = 3, A = 4, X = 2
$P(X = 2) = \frac{\binom{A}{X} \binom{N-A}{n-X}}{\binom{N}{n}} = \frac{\binom{4}{2} \binom{6}{1}}{\binom{10}{3}} = \frac{(6)(6)}{120} = 0.3$
The probability that 2 of the 3 selected computers have illegal software loaded is .30, or 30%.
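The computer-audit example can be reproduced with a direct implementation of the hypergeometric formula. A minimal sketch with the slide's values N = 10, A = 4, n = 3, X = 2; the function name `hypergeom_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, N, A, n):
    """P(exactly x successes in a sample of n drawn without replacement
    from a population of N that contains A successes)."""
    return Fraction(comb(A, x) * comb(N - A, n - x), comb(N, n))


p_two_illegal = hypergeom_pmf(2, N=10, A=4, n=3)
print(p_two_illegal, float(p_two_illegal))
```

Using `Fraction` keeps the result exact, so the 36/120 from the slide reduces cleanly to 3/10.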
The Hypergeometric Distribution: Characteristics
The mean of the hypergeometric distribution is:
$\mu = E(X) = \frac{nA}{N}$
The standard deviation is:
$\sigma = \sqrt{\frac{nA(N-A)}{N^2}} \cdot \sqrt{\frac{N-n}{N-1}}$
Where $\sqrt{\frac{N-n}{N-1}}$ is called the "Finite Population Correction Factor" from sampling without replacement from a finite population
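As a sanity check, the closed-form mean and standard deviation can be compared with values computed directly from the pmf. The sketch below reuses the population from the computer-audit example (N = 10, A = 4, n = 3) as an illustrative choice.

```python
from math import comb, isclose, sqrt

N, A, n = 10, 4, 3  # population size, successes in population, sample size

# Closed-form results, including the finite population correction factor.
mean_formula = n * A / N
sd_formula = sqrt(n * A * (N - A) / N**2) * sqrt((N - n) / (N - 1))

# The same quantities computed from the hypergeometric pmf itself.
pmf = {x: comb(A, x) * comb(N - A, n - x) / comb(N, n) for x in range(min(A, n) + 1)}
mean_direct = sum(x * p for x, p in pmf.items())
sd_direct = sqrt(sum((x - mean_direct) ** 2 * p for x, p in pmf.items()))

print(mean_formula, mean_direct)
print(sd_formula, sd_direct)
```

Without the correction factor the standard deviation would be that of a binomial with p = A/N, so the factor is what accounts for sampling without replacement.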
The Poisson Distribution: Definitions
An area of opportunity is a continuous unit or interval of time, volume, or such area in which more than one occurrence of an event can occur.
  e.g., The number of scratches in a car's paint
  e.g., The number of mosquito bites on a person
  e.g., The number of computer crashes in a day
The Poisson Distribution: Properties
Apply the Poisson Distribution when:
You wish to count the number of times an event occurs in a given area of opportunity
The probability that an event occurs in one area of opportunity is the same for all areas of opportunity
The number of events that occur in one area of opportunity is independent of the number of events that occur in the other areas of opportunity
The probability that two or more events occur in an area of opportunity approaches zero as the area of opportunity becomes smaller
The average number of events per unit is λ (lambda)
The Poisson Distribution: Formula
$P(X) = \frac{e^{-\lambda} \lambda^{X}}{X!}$
where:
P(X) = the probability of X events in an area of opportunity
λ = expected number of events
e = mathematical constant approximated by 2.71828
An example
Suppose that, on average, 5 cars enter a parking lot per minute. What is the probability that in a given minute, 7 cars will enter?
$P(7) = \frac{e^{-\lambda} \lambda^{X}}{X!} = \frac{e^{-5} \, 5^{7}}{7!} = 0.104$
So, there is a 10.4% chance 7 cars will enter the parking lot in a given minute.
Mean = Variance = λ
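The parking-lot figure can be reproduced with a direct implementation of the Poisson formula. The sketch assumes an average of 5 cars per minute, the rate consistent with the 10.4% answer; the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x events when the expected number of events is lam."""
    return exp(-lam) * lam**x / factorial(x)


# Probability that exactly 7 cars enter in a minute when the average is 5 per minute.
p_seven_cars = poisson_pmf(7, 5)
print(round(p_seven_cars, 3))

# The probabilities over all counts should add up to (essentially) 1.
total = sum(poisson_pmf(x, 5) for x in range(50))
print(total)
```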