What is Probability and Statistics and Why Should You Care?

CS 3130: Probability and Statistics for Engineers

August 26, 2014


What is Probability?

Definition: Probability theory is the study of the mathematical rules that govern random events.

But what is randomness?

Informally, a random event is an event in which we do not know the outcome without observing it.

Probability tells us what we can say about such events, given our assumptions about the possible outcomes.
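
As a small illustration of "assumptions about the possible outcomes", the sketch below assumes a fair six-sided die (an example chosen here, not one from the slides) and compares the probability implied by that assumption with the frequency seen in simulated rolls. The function name `estimate_probability` and the seed are illustrative choices.

```python
import random

def estimate_probability(event, outcomes, trials=100_000, seed=0):
    """Estimate P(event) by drawing one outcome uniformly at random, many times."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if event(rng.choice(outcomes)))
    return hits / trials

# Assumption: six equally likely faces.
die = [1, 2, 3, 4, 5, 6]

def is_even(face):
    return face % 2 == 0

# What the assumption tells us before observing anything.
exact = sum(1 for face in die if is_even(face)) / len(die)

# What we observe when the random event is actually carried out.
estimate = estimate_probability(is_even, die)

print(f"exact P(even) = {exact:.3f}, simulated frequency = {estimate:.3f}")
```

No single roll can be predicted, yet the long-run frequency settles near the value derived from the assumed outcomes.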


What is Statistics?

Definition: Statistics is the application of probability to the collection, analysis, and description of random data.

Statistics is used to:
- Design experiments
- Summarize data
- Make conclusions about the world
- Explore complex data


Applications of Probability and Statistics

Computer Science:
- Machine Learning
- Data Mining
- Artificial Intelligence
- Simulation
- Image Processing
- Computer Graphics
- Visualization
- Software Testing
- Algorithms

Electrical Engineering:
- Signal Processing
- Telecommunications
- Information Theory
- Control Theory
- Instrumentation, Sensors
- Hardware/Electronics Testing


Applications of Probability and Statistics

General:
- Gambling (not recommended)
- Stock Market Analysis
- Politics
- Sports
- Demographics
- Medicine
- Economics
- All Sciences!!


Alan Turing: Connecting CS and Probability

- “Father of Computer Science”
- Most famous for:
  - Computability, Turing machine
  - Stored-program computer
  - Turing test
  - WWII cryptanalysis
- Wrote a dissertation on probability theory!
- Turing used probability and statistics to crack Enigma


Application: Machine Learning

Machine Learning builds statistical models of data in order to recognize complex patterns and to make decisions based on these observations.

Examples:
- Classification (recognition of faces or handwriting)
- Prediction (stock market, elections)
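
To make "statistical model" concrete, here is a deliberately tiny classification sketch. It is not a method from the lecture: it assumes each example is a single number, models each class only by the mean of its training values, and assigns a new value to the class with the nearest mean. The data and the names `fit_class_means` and `classify` are hypothetical.

```python
import statistics

def fit_class_means(samples):
    """Build the model: one mean per class label.

    samples maps each label to a list of numeric feature values.
    """
    return {label: statistics.mean(values) for label, values in samples.items()}

def classify(x, class_means):
    """Decide: pick the label whose class mean is closest to x."""
    return min(class_means, key=lambda label: abs(x - class_means[label]))

# Hypothetical training data: one numeric feature per example.
training = {
    "small": [1.0, 1.2, 0.8, 1.1],
    "large": [3.0, 2.9, 3.3, 3.1],
}

means = fit_class_means(training)
print(classify(1.4, means))
print(classify(2.7, means))
```

Real face or handwriting recognition uses far richer models, but the pattern is the same: summarize the observed data statistically, then use the summary to make a decision about new data.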


Application: Randomized Algorithms

- Some algorithms benefit from using random steps rather than deterministic ones
- Example: primality testing
  - Testing for all possible divisors is slow for large numbers
  - Instead test a random selection of divisors
  - Can be confident of primality up to a certain degree
- Example: stochastic optimization methods
  - Optimizations can get “stuck” in the wrong answer, depending on how they are initialized
  - Re-run the algorithm with several random initializations
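
The primality example can be sketched with the Miller-Rabin test, which is one standard randomized test and is named here as an assumption, since the slide does not say which test it has in mind. Note that Miller-Rabin draws random bases ("witnesses") and checks a modular-arithmetic condition rather than literally trying random divisors. The function name and the default of 20 rounds are illustrative.

```python
import random

def is_probable_prime(n, rounds=20, rng=None):
    """Miller-Rabin test: False means certainly composite, True means probably prime."""
    rng = rng or random.Random()
    if n < 2:
        return False
    # Cheap deterministic check against a few small primes first.
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue  # this base gives no evidence against primality
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is definitely composite
    return True

rng = random.Random(0)
print(is_probable_prime(2**61 - 1, rng=rng))                # a known Mersenne prime
print(is_probable_prime((2**31 - 1) * (2**61 - 1), rng=rng))  # product of two primes
```

For a composite `n`, each round fails to expose it with probability at most 1/4, so after `rounds` independent rounds the chance of a wrong "probably prime" answer is at most 4^(-rounds). That is the sense in which one can be confident of primality up to a certain degree.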


Application: Computer Graphics

- Ray tracing models light photons bouncing around a scene
- Impossible to model every photon
- Monte Carlo ray tracing simulates a random selection of photons

Image by Steve Parker (U of U)
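
The idea of simulating only a random selection of photons can be shown with a toy estimate. This is not a ray tracer: it assumes a point light at the origin emitting uniformly in all directions and a single sphere as the only object, and it estimates what fraction of emitted rays hit the sphere. The scene, function names, and parameters are all hypothetical; the closed-form answer is included only to check the estimate.

```python
import math
import random

def fraction_of_rays_hitting_sphere(distance, radius, n_rays=100_000, seed=0):
    """Monte Carlo estimate of the fraction of rays from the origin that hit a
    sphere of the given radius centered at (0, 0, distance); needs distance > radius."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rays):
        # Uniform random direction on the unit sphere.
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r_xy = math.sqrt(1.0 - z * z)
        direction = (r_xy * math.cos(phi), r_xy * math.sin(phi), z)
        # Ray-sphere test: solve |t * direction - center|^2 = radius^2 for t > 0.
        b = direction[2] * distance  # dot(direction, center)
        discriminant = b * b - (distance * distance - radius * radius)
        if b > 0 and discriminant >= 0:
            hits += 1
    return hits / n_rays

def exact_fraction(distance, radius):
    """Solid angle subtended by the sphere, divided by the full sphere (4*pi)."""
    return 0.5 * (1.0 - math.sqrt(1.0 - (radius / distance) ** 2))

estimate = fraction_of_rays_hitting_sphere(distance=2.0, radius=1.0)
print(f"estimate = {estimate:.4f}, exact = {exact_fraction(2.0, 1.0):.4f}")
```

The estimate improves as more rays are sampled, with error shrinking roughly like one over the square root of the number of rays.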


Application: Visualization

- Scientific data contains uncertainty
- Visualizations can be misleading as to “truth”
- Current research focuses on how to visualize uncertainty

Johnson and Sanderson, IEEE Comp. Graph. and App., 2003


Application: Medical Image Analysis

- Must deal with noisy image data
- Example: finding an anatomical structure in a 3D image
- Often includes statistical analysis of resulting data

Fletcher et al, NeuroImage, 2010


“Big Data” and “Analytics”

- The amount of digital data is exploding!
- Big data analysis is statistics + scalable CS.
- Examples: social media, internet purchases, news articles, scientific data, medical data

Source: IDC/EMC Digital Universe Study


[Figure (slide by Chris Johnson, Feb. 2011): chart in exabytes (10^18 bytes) comparing all digital info, new digital info per year, all human documents in 40,000 years, all spoken words in all lives, and the amount human minds can store in 1 year.]

Every two days we create as much data as we did from the beginning of mankind until 2003!

Sources: Lesk, Berkeley SIMS, Landauer, EMC, TechCrunch, Smart Planet


How Much is an Exabyte?

How many trees does it take to print out an exabyte?

1 exabyte = 1000 petabytes, which could hold approximately 500,000,000,000,000 pages of standard printed text.

It takes one tree to produce 94,200 pages of a book.

Thus it would take about 5,307,855,626 trees to print an exabyte of data (500,000,000,000,000 / 94,200).

In 2005, there were 400,246,300,201 trees on Earth.

So all the trees on the entire planet could hold about 75 exabytes of printed data.

Sources: http://www.whatsabyte.com/ and http://wiki.answers.com (slide by Chris Johnson)
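
The tree arithmetic can be redone directly from the three inputs stated on the slide (pages per exabyte, pages per tree, trees on Earth). The sketch below only repeats that division, so its result is no better than those inputs; the constant names are illustrative.

```python
# Inputs as stated on the slide (not independently verified here).
PAGES_PER_EXABYTE = 500_000_000_000_000
PAGES_PER_TREE = 94_200
TREES_ON_EARTH = 400_246_300_201

# Trees needed to print one exabyte as pages of text.
trees_per_exabyte = PAGES_PER_EXABYTE / PAGES_PER_TREE

# Exabytes that could be printed using every tree on Earth.
exabytes_from_all_trees = TREES_ON_EARTH / trees_per_exabyte

print(f"trees per exabyte: {trees_per_exabyte:,.0f}")
print(f"exabytes printable from all trees: {exabytes_from_all_trees:.1f}")
```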


The Scientific Method

[Diagram linking four stages: make hypothesis, gather data, compute statistics, draw conclusion]

1. Define the question
2. Background research, observation
3. Formulate a hypothesis
4. Design and run an experiment
5. Analyze the results

Experimental measurements are noisy (randomness).

Statistics is critical in the last two steps!
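
A minimal sketch of why noise makes statistics necessary when running and analyzing an experiment: it simulates repeated measurements of one quantity, then reports the sample mean with a standard error instead of trusting any single reading. The true value, the Gaussian noise model, and the sample size are assumptions made for illustration.

```python
import math
import random
import statistics

def run_experiment(true_value, noise_sd, n, rng):
    """Simulate n noisy measurements of true_value (Gaussian noise assumed)."""
    return [rng.gauss(true_value, noise_sd) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
data = run_experiment(true_value=10.0, noise_sd=0.5, n=100, rng=rng)

# Analyze the results: point estimate plus a measure of its uncertainty.
mean = statistics.mean(data)
std_error = statistics.stdev(data) / math.sqrt(len(data))

# Approximate 95% interval using the normal approximation.
low, high = mean - 1.96 * std_error, mean + 1.96 * std_error

print(f"single reading: {data[0]:.3f}")
print(f"mean of {len(data)} readings: {mean:.3f} +/- {std_error:.3f}")
print(f"approximate 95% interval: [{low:.3f}, {high:.3f}]")
```

The standard error shrinks like one over the square root of the number of measurements, which is one reason experiment design (how many measurements to take) is itself a statistical question.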

Data Science

[Figure: pipeline diagram with the labels "enormous noisy data", "data squashing", "working data", "data mining", "draw conclusion", and "model uncertainty and bound error"]

1. Process/Squash enormous available data

2. Mine working data (calculate many statistics)

3. Analyze the results / Draw conclusions

Every step is subject to noise and involves statistics.

What statistics can and cannot do!
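One way to picture the three steps is the Python sketch below. It treats "squashing" as uniform random sampling, which is only one possible reading and an assumption of this example, and it bounds the error of the estimated mean with Hoeffding's inequality for values in [0, 1]. The synthetic data, the sample size, and the helper names `squash` and `hoeffding_radius` are hypothetical.

```python
import math
import random

def squash(data, k, rng):
    """Step 1 (assumed approach): keep k items drawn uniformly with replacement."""
    return [rng.choice(data) for _ in range(k)]

def hoeffding_radius(k, delta):
    """Radius r such that the mean of k i.i.d. samples of values in [0, 1]
    is within r of the true mean with probability at least 1 - delta."""
    return math.sqrt(math.log(2 / delta) / (2 * k))

rng = random.Random(0)
big_data = [rng.random() for _ in range(200_000)]  # synthetic stand-in for enormous data

working = squash(big_data, k=2_000, rng=rng)                # 1. squash
estimate = sum(working) / len(working)                      # 2. mine (one statistic)
radius = hoeffding_radius(len(working), delta=0.05)         # 3. bound the error

print(f"mean is about {estimate:.3f} +/- {radius:.3f} (with probability >= 0.95)")
```

The bound depends only on the sample size and the confidence level, not on the size of the original data, which is why a small working set can still support a conclusion with a stated error.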

What You Should Do Now

1. Check out the class web page: www.cs.utah.edu/~jeffp/teaching/cs3130.html

2. Download the book (start reading Ch 1 & 2)

3. Download and install R on your machine (take a look at the R tutorial)