


1. Basics of R
2. Basic probability with R

CA200
Dublin City University

(based on the book by Prof. Jane M. Horgan)


Installing R

• Go to the CRAN website at http://cran.r-project.org/
• Click ‘Download and Install R’
• Choose an operating system, e.g. Windows
• Choose the ‘base’ package
• Select the setup program (e.g. R *.exe)
• Press the option ‘Run’
• R is now installed :)


R Documentation - Manuals

http://cran.r-project.org/

• An Introduction to R
• The R Language Definition
• Writing R Extensions
• R Data Import/Export
• R Installation and Administration
• R Internals
• The R Reference Index


Basics

– 6+7*3/2 #general expression
  [1] 16.5

– x <- 1:4 #integers are assigned to the vector x
  x #print x
  [1] 1 2 3 4

– x2 <- x**2 #square each element, or x2 <- x^2
  x2
  [1] 1 4 9 16

– X <- 10 #case sensitive!
  prod1 <- X*x
  prod1
  [1] 10 20 30 40
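The arithmetic above works element by element. As a small sketch of that behaviour, the lines below repeat the calculation and add one recycling example; the vector y is a name introduced here for illustration only.

```r
x <- 1:4          # vector of integers 1, 2, 3, 4
X <- 10           # a different object from x (R is case sensitive)

x2 <- x^2         # element-wise square, same as x**2
prod1 <- X * x    # the single value X is recycled across every element of x

y <- c(10, 20)    # hypothetical second vector, shorter than x
x * y             # y is recycled: 1*10, 2*20, 3*10, 4*20
```

Recycling is silent when the longer length is a multiple of the shorter one, so it is worth checking vector lengths with length() before combining them.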


Basics

• <- is the assignment operator
• R is case sensitive: x and X are different variables
• Variable names can consist of any combination of upper- and lower-case letters and numbers, but cannot begin with _ or a numeral
• Objects: the entities that R creates and manipulates, e.g. variables, arrays, strings, functions
• Workspace: all objects created in R are stored in a workspace.
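A minimal sketch of these points, using the standard functions ls(), rm() and exists(); the object name my.value_2 is made up for the example.

```r
x <- 1:4
X <- 10
identical(x, X)          # FALSE: x and X are separate objects

my.value_2 <- 3          # hypothetical name: letters, digits, '.' and '_' are allowed
ls()                     # lists the objects currently stored in the workspace
rm(my.value_2)           # removes one object from the workspace
exists("my.value_2")     # FALSE once the object has been removed
```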


Getting Help

• Click the Help button on the toolbar
• help()
• help.start()
• demo()
• ?read.table
• help.search("data.entry")
• apropos("boxplot") returns matching names, e.g. "boxplot", "boxplot.default", "boxplot.stats"


Data Entry

• Entering data from the screen to a vector
• Example:

downtime <- c(0, 1, 2, 12, 12, 14, 18, 21, 21, 23, 24, 25, 28, 29, 30, 30, 30, 33, 36, 44, 45, 47, 51)

mean(downtime)
[1] 25.04348
median(downtime)
[1] 25
range(downtime)
[1] 0 51
sd(downtime)
[1] 14.27164
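A few further base R summaries can be applied to the same vector. The sketch below is only a suggestion of what one might look at next; it uses the slide's downtime data and standard functions.

```r
downtime <- c(0, 1, 2, 12, 12, 14, 18, 21, 21, 23, 24, 25, 28, 29,
              30, 30, 30, 33, 36, 44, 45, 47, 51)

length(downtime)                    # number of observations
summary(downtime)                   # minimum, quartiles, median, mean and maximum in one call
var(downtime)                       # sample variance; its square root is sd(downtime)
quantile(downtime, c(0.25, 0.75))   # lower and upper quartiles
```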


1. Basics of R
2. Basic probability with R


Basic definitions

• Experiment: a process of observation that leads to a single outcome that cannot be predicted with certainty
  Examples:
  1. Pull a card from a deck
  2. Toss a coin
  3. Response time

• Sample space: the set of all outcomes of an experiment, usually denoted by S

• Event, denoted by E, is any subset of S
  1. E = Spades
  2. E = Head
  3. E = Component is functioning

• P(E) denotes the probability of the event E
  1. P(E) = P(Spades)
  2. P(E) = P(Head)
  3. P(E) = P(Component is functioning)
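To make the definitions concrete, the card example can be written out in R. This is a sketch under the assumption of a standard 52-card deck with every card equally likely; the object names ranks, suits, S and E are chosen here for illustration.

```r
# Build the sample space S for "pull a card from a deck": 13 ranks x 4 suits
ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("Spades", "Hearts", "Diamonds", "Clubs")
S <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

E <- S[S$suit == "Spades", ]   # the event E = Spades is a subset of S

nrow(S)             # 52 outcomes in the sample space
nrow(E)             # 13 outcomes in the event
nrow(E) / nrow(S)   # P(Spades) = 0.25 when every card is equally likely
```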


Calculating Probabilities

• CLASSICAL APPROACH:
  Assumes all outcomes of the experiment are equally likely:
  P(E) = (number of outcomes in E) / (number of outcomes in S)
  Example: Roll a fair die. E = even number, so P(E) = 3/6 = 0.5

• RELATIVE FREQUENCY APPROACH:
  Interprets the probability as the relative frequency of the event over a long series of experiments:
  P(E) ≈ (number of times E occurs) / (number of experiments)
  Example: Roll a die a large number of times and observe the number of times an even number occurs
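The two approaches can be compared for the die example. The following is a sketch, not part of the original slides: it assumes a fair die, simulates the rolls with sample(), and uses an arbitrary seed and number of rolls.

```r
set.seed(1)                    # arbitrary seed so the run is repeatable

die <- 1:6
E <- c(2, 4, 6)                # event: an even number

# Classical approach: favourable outcomes / all outcomes
p_classical <- length(E) / length(die)

# Relative frequency approach: simulate many rolls and count how often E occurs
rolls <- sample(die, size = 10000, replace = TRUE)
p_relfreq <- mean(rolls %in% E)

p_classical    # 0.5
p_relfreq      # close to 0.5; the exact value depends on the simulated rolls
```

Increasing the number of rolls should bring the relative frequency closer to the classical value.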


Permutations

The number of ordered sequences where repetition is not allowed, i.e. no element can appear more than once in the sequence.

Ordered samples (sequences) of size k from n:
nPk = n * (n-1) * ... * (n-k+1) = n!/(n-k)!

In R use function: prod

Example 1. Three elements {1,2,3}. How many sequences of two elements from these three?

Example 1 - Solution:
(1,2); (1,3); (2,1); (2,3); (3,1); (3,2)
Six ordered sequences altogether.
3P2 = 3 * 2 = 6

Solution in R:
prod(3:2)
[1] 6


Permutations - examples

Q2. Four elements {1,2,3,4}. How many sequences of two elements from these four?

Q3. Four elements {1,2,3,4}. How many sequences of three elements from these four?

Q2-solution:
(1,2); (1,3); (1,4); (2,1); (2,3); (2,4); (3,1); (3,2); (3,4); (4,1); (4,2); (4,3)
Twelve ordered sequences altogether.
4P2 = 4 * 3 = 12

In R:
prod(4:3)
[1] 12

Q3-solution:
(1,2,3); (1,3,2); (1,2,4); (1,4,2); (1,3,4); (1,4,3)
(2,1,3); (2,3,1); (2,1,4); (2,4,1); (2,3,4); (2,4,3)
(3,1,2); (3,2,1); (3,2,4); (3,4,2); (3,1,4); (3,4,1)
(4,1,3); (4,3,1); (4,1,2); (4,2,1); (4,3,2); (4,2,3)
Twenty-four ordered sequences.
4P3 = 4 * 3 * 2 = 24

In R:
prod(4:2)
[1] 24
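The prod() calls in these solutions all follow the same pattern, n * (n-1) * ... * (n-k+1). One possible way to wrap that pattern is sketched below; perm is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: number of ordered samples of size k from n, without repetition
perm <- function(n, k) {
  if (k == 0) return(1)        # exactly one empty sequence
  prod(n:(n - k + 1))          # n * (n-1) * ... * (n-k+1)
}

perm(3, 2)                          # 6, as in Example 1
perm(4, 2)                          # 12, as in Q2
perm(4, 3)                          # 24, as in Q3
factorial(4) / factorial(4 - 3)     # the same count via n!/(n-k)!
```

The k == 0 guard is needed because n:(n + 1) would otherwise count upwards and give the wrong product.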


Combinations

The number of unordered sets of distinct elements, i.e. repetition is not allowed.

Number of ways of selecting k distinct elements from n or, equivalently, the number of unordered samples of size k, without replacement, from n:
nCk = nPk / k! = n!/(k!(n-k)!)

In R use function: choose

Example 4. Three elements {1,2,3}. How many sets (combinations) of two elements from these three?

Example 4 - Solution:
{1,2}; {1,3}; {2,3}
Three unordered sets altogether.
3C2 = (3*2)/(2*1) = 3

In R:
choose(3, 2)
[1] 3


Combinations - examples

Q5. Four elements {1,2,3,4}. How many combinations of two elements from these four?
Q6. Four elements {1,2,3,4}. How many combinations of three elements from these four?
Q7. If a box contains 75 good IC chips and 25 defective chips, and 12 chips are selected at random, find the probability that all chips are good.

Q5-solution:
{1,2}; {1,3}; {1,4}; {2,3}; {2,4}; {3,4}
Six unordered sets altogether.
4C2 = (4*3)/(2*1) = 6

In R:
choose(4, 2)
[1] 6

Q6-solution:
{1,2,3}; {1,2,4}; {2,3,4}; {3,1,4}
Four unordered sets.
4C3 = (4*3*2)/(3*2*1) = 4

In R:
choose(4, 3)
[1] 4

Q7-Solution:
E – all chips are good
P(E) = 75C12 / 100C12

In R:
choose(75, 12)/choose(100, 12)
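The sets in Q5 and Q6 can also be listed with the base R function combn(), and the chip probability in Q7 can be cross-checked against the hypergeometric function dhyper(). The sketch below assumes the 12 chips are drawn without replacement; the name p_all_good is introduced here.

```r
combn(4, 2)          # each column is one unordered pair from {1,2,3,4}
ncol(combn(4, 2))    # 6, agrees with choose(4, 2)
ncol(combn(4, 3))    # 4, agrees with choose(4, 3)

# Q7: all 12 selected chips are good, drawn from 75 good + 25 defective
p_all_good <- choose(75, 12) / choose(100, 12)
p_all_good

# Cross-check with the hypergeometric pmf: 12 good chips in a sample of 12
dhyper(12, m = 75, n = 25, k = 12)
```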


Bayes’ rule - example

Q8. In a computer installation, 200 programs are written each week, 120 in C++ and 80 in Java. 60% of the programs written in C++ compile on the first run and 80% of the Java programs compile on the first run.
a) If a randomly selected program compiles on the first run, what is the probability that it was written in C++?

Q8 – Solution:
Let us denote:
C++ - event that the program is written in C++
J - event that the program is written in Java
E - event that the selected program compiles on the first run

P(E) = P(C++)P(E|C++) + P(J)P(E|J) = 120/200 * 72/120 + 80/200 * 64/80 = 0.68
P(C++|E) = P(C++)P(E|C++) / P(E) = 0.36/0.68 = 0.529

In R:
total_prob <- ((80/200)*(64/80)) + ((120/200)*(72/120)) #total probability
condit_c <- (120/200)*(72/120)/total_prob #posterior probability of C++
condit_c
[1] 0.5294118
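The same calculation can be arranged with named vectors, so that the posterior probability for every language comes out at once. This is one possible layout, not the slide's own; the names prior, likelihood, joint and posterior are assumptions made for the sketch.

```r
prior      <- c(Cpp = 120/200, Java = 80/200)   # P(C++), P(J)
likelihood <- c(Cpp = 0.60,    Java = 0.80)     # P(E|C++), P(E|J)

joint      <- prior * likelihood     # P(language and E)
total_prob <- sum(joint)             # law of total probability, P(E)
posterior  <- joint / total_prob     # Bayes' rule, P(language | E)

total_prob   # 0.68
posterior    # about 0.529 for C++ and 0.471 for Java
```

Because the posterior values are divided by their own total, they sum to 1, which gives a quick sanity check.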


Solve the following exercises using R:

(E2.1) A card is drawn from a shuffled pack of 52 cards. What is the probability of drawing a ten or a spade?

(E2.2) Records of service requests at a garage and their probabilities are as follows:

Daily Demand    Probability
5               0.3
6               0.7

Daily demand is independent (e.g. tomorrow’s demand is independent of today’s demand). What is the probability that over a two-day period the number of requests will be a) 10 requests, b) 11 requests and c) 12 requests?

(E2.3) Analysis of a questionnaire completed by holiday makers showed that 0.75 classified their holiday as good at resort Costa Lotta. The probability of hot weather in this resort is 0.6. If the probability of regarding the holiday as good given hot weather is 0.9, what is the probability that there was hot weather if a holiday maker considers his holiday good?


Binomial distribution

Def: Conditions:
1. An experiment consists of n repeated trials
2. Each trial has two possible outcomes
   a) A success, with probability p that is constant from trial to trial
   b) A failure, with probability q = 1 − p
3. Repeated trials are independent

X = number of successes in n trials; X is a BINOMIAL RANDOM VARIABLE.

The probability of getting exactly x successes in n trials is:
P(X = x) = nCx * p^x * q^(n−x), x = 0, 1, ..., n

In R: use the root “binom” with the prefix “d” for the pdf. The function is dbinom(x, n, p)
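As a check on what dbinom() computes, the binomial probability nCx * p^x * (1 − p)^(n − x) can be written out by hand and compared with the built-in function. The helper name binom_pmf and the values n = 4, p = 0.3 are illustrative assumptions.

```r
# Hypothetical helper: binomial probability written out from the formula
binom_pmf <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

n <- 4; p <- 0.3       # illustrative values only
x <- 0:n

binom_pmf(x, n, p)                                # probabilities from the formula
all.equal(binom_pmf(x, n, p), dbinom(x, n, p))    # TRUE: matches the built-in function
sum(dbinom(x, n, p))                              # the probabilities sum to 1
```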


Binomial - examples

Q9. Five terminals on an on-line computer system are attached to a communication line to the central computer system. The probability that any terminal is ready to transmit is 0.95.

Q9-solution:
Let us denote:
X = number of ready terminals
p – probability of success in one trial
n – number of trials

p = 0.95
q = 1 – p = 0.05
n = 5
P(X=0) = P(0 ready terminals in 5): FFFFF
P(X=1) = P(1 ready terminal in 5): SFFFF, FSFFF, FFSFF, FFFSF, FFFFS

In R:
For all the probabilities P(X = x), x = 0, ..., 5:
x <- 0:5
dbinom(x, size = 5, prob = 0.95)
[1] 0.0000003125 0.0000296875 0.0011281250 0.0214343750 0.2036265625 0.7737809375

The probability of getting exactly 3 ready terminals from five:
dbinom(x = 3, size = 5, prob = .95)
[1] 0.02143438
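dbinom() gives the probability of each exact count. For cumulative probabilities such as "at most 3 terminals ready", base R provides pbinom(). The sketch below uses the same n = 5 and p = 0.95; the names pmf and cdf are introduced here.

```r
x <- 0:5
pmf <- dbinom(x, size = 5, prob = 0.95)   # P(X = x)
cdf <- pbinom(x, size = 5, prob = 0.95)   # P(X <= x)

all.equal(cdf, cumsum(pmf))               # TRUE: the cdf is the running sum of the pmf

pbinom(3, size = 5, prob = 0.95)          # P(at most 3 terminals ready)
1 - pbinom(3, size = 5, prob = 0.95)      # P(at least 4 terminals ready)
```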


Binomial - examples

Q9-solution – using plot function

plot(x, dbinom(x, size = 5, prob = 0.95),
     xlab = "Number of Ready Terminals", ylab = "P(X = x)",
     type = "h", main = "Ready Terminals, n = 5, p = 0.95")


Binomial - examples

Q10. A fair coin is tossed 10 times. Success and failure are “heads” and “tails”, respectively, each with probability 0.5.

Q10-solution:
X = number of heads (successes) obtained

p = 0.50
n = 10
P(X=0) = P(exactly 0 heads in 10 tosses)
P(X=1) = P(exactly 1 head in 10 tosses)
P(X=2) = P(exactly 2 heads in 10 tosses)
…

In R:
x <- 0:10
round(dbinom(x, 10, 0.5), 4)
[1] 0.0010 0.0098 0.0439 0.1172 0.2051 0.2461 0.2051 0.1172 0.0439 0.0098 0.0010
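These theoretical probabilities can be compared with a simulation, in the spirit of the relative frequency approach. The sketch below uses rbinom() with an arbitrary seed and 10,000 simulated experiments; the names heads, empirical and theoretical are assumptions of the example.

```r
set.seed(42)                                   # arbitrary seed for repeatability

heads <- rbinom(10000, size = 10, prob = 0.5)  # heads in each of 10,000 simulated experiments

# Relative frequency of each count 0..10, in the same order as dbinom(0:10, 10, 0.5)
empirical   <- as.vector(table(factor(heads, levels = 0:10))) / length(heads)
theoretical <- dbinom(0:10, 10, 0.5)

round(cbind(x = 0:10, empirical, theoretical), 4)
```

The two columns should agree to roughly two decimal places; the exact empirical values depend on the simulated tosses.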


Binomial - examples

Q10-solution – using plot function

plot(x, dbinom(x, size = 10, prob = 0.5),
     xlab = "Number of Heads", ylab = "P(X = x)",
     type = "h", main = "Heads, n = 10, p = 0.5")