stat 31, section 1, last time course organization & website...

Post on 17-Jan-2018


Stat 31, Section 1, Last Time

• Course Organization & Website: https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31sec1Home.html

• What is Statistics?
• Data types and structure
• Get going in EXCEL
• Exploratory Data Analysis
• Bar Graphs

Stat 31, Student Poll Results

[Bar graph “Stat 31, Sec 1: Majors”: counts (axis 0 to 35) by major: Business, Bus. +, Biology, Public Policy, Environment, Health Pol., Poli Sci, OR - Actuary, Undecided, Other]

As indicated on “Student Info” form:

Big changes from the past:

More biology

More diversity

Stat 31, Student Poll Results

“Have you taken an AP Exam?”

Only ~10% had & grades generally low

So don’t worry if you haven’t…

Major Concept: Distributions

“Distribution” = “Patterns of data” = “way data is spread out”

e.g. Bar Graph is visual display of categorical “distribution”

Exploratory Data Analysis 3

Visual Display of Quantitative Distributions:

1. Stem and Leaf Plots: Not Recommended

(Main motivation was pencil and paper statistical analysis, but now have better graphical methods readily accessible)

A limited special case of….

Visual Disp: Quantitative Dist’ns

2. Histograms

Idea: Apply bar graph idea, by creating categories, called “class intervals” or “classes” or “bins”

Histograms

Idea: put numbers into “bins”, bar heights are counts, or “frequencies”

Example: data 1.3, 3.6, 1.9, 3.1, 1.5, on an axis marked 0, 1, 2, 3, 4
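The binning idea can be sketched in a few lines of Python. This is only an illustration, taking the example values to be 1.3, 3.6, 1.9, 3.1, 1.5 and the bin edges to be 0, 1, 2, 3, 4. The helper name `bin_counts` and the convention that each bin includes its left edge but not its right edge are assumptions, not something the slides specify.

```python
def bin_counts(values, edges):
    """Count how many values fall in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

data = [1.3, 3.6, 1.9, 3.1, 1.5]   # example values (assumed reading of the slide)
edges = [0, 1, 2, 3, 4]            # bin edges (assumed from the axis marks)

counts = bin_counts(data, edges)
for left, right, c in zip(edges, edges[1:], counts):
    # One text "bar" per bin; bar height = frequency
    print(f"[{left}, {right}): {'#' * c} ({c})")
```

Three values land in the bin from 1 to 2 and two in the bin from 3 to 4, so those are the only bars with nonzero height.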

Class Histogram Example

Buffalo, N. Y. (Annual) Snowfall Data

Raw Data: https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31Eg2Raw.xls

63 years, ranging from ~30 - ~120 (inches)

Buffalo Snowfall Data


Histogram Analysis (pre-done): https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31Eg2Done.xls

Buffalo Snowfall Data, I

A. EXCEL default (of bin edges)
• Unround numbers for bin edges
• Data “centered around 90”
• Most data between 50 and 130
• Asymmetric Distribution

Buffalo Snowfall Data, II

B. Smaller bins
• Chosen by me
• Binwidth = 5, << ~13 from EXCEL default
• Nicer edge numbers
• Data centered around 84 (now more precise)
• Bar graph rougher (fewer points in each bin)
• Suggests 3 main groups (called “modes”)

(can’t see this above: bin width counts)

Buffalo Snowfall Data, III

C. Larger bins
• Chosen by me
• Binwidth = 30, >> ~13 from EXCEL default
• Bar graph is “smooth” (since many points in each bin)
• Only one mode???
• Quite symmetric?

(different from above: bin width counts)

Buffalo Snowfall Data, IV

D. What’s under the hood (how to do this):

i. Tools → Data Analysis → Histogram (& Chart Output)

(may need Data Analysis “Add-in”)

ii. Massage pic (especially bar width)

iii. Sigma (Σ) min, max

iv. Bin range: create first two & drag

v. Histogram, using input bin edges
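For readers who want to see the same steps outside EXCEL, here is a rough Python sketch: find the min and max, lay out evenly spaced bin edges of a chosen width, then count. The values in `snow` are made up for illustration and are NOT the Buffalo data; `make_edges` and `bin_counts` are hypothetical helper names. Bins here include their left edge only, which may differ from how EXCEL treats values that fall exactly on an edge.

```python
import math

def make_edges(lo, hi, width):
    """Evenly spaced edges, aligned to multiples of width, covering lo..hi."""
    start = math.floor(lo / width) * width
    n_bins = math.floor((hi - start) / width) + 1
    return [start + k * width for k in range(n_bins + 1)]

def bin_counts(values, edges):
    """Frequencies for equal-width bins [edge, edge + width)."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[int((v - edges[0]) // width)] += 1
    return counts

# Made-up values in roughly the 30 to 120 inch range (not the real data)
snow = [31, 47, 52, 58, 66, 71, 78, 79, 83, 84, 90, 98, 104, 110, 120]
lo, hi = min(snow), max(snow)          # the "min, max" step

for width in (5, 30):                  # try a small and a large binwidth
    edges = make_edges(lo, hi, width)  # the "bin range" step
    print(width, edges[0], edges[-1], bin_counts(snow, edges))

wide_edges = make_edges(lo, hi, 30)
wide_counts = bin_counts(snow, wide_edges)
```

Changing `width` and rerunning takes one edit, which is the kind of quick experimentation with binwidths the slides encourage.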


Histogram HW

HW: 1.21
• Use Excel and histograms
• Get data from CDrom
• Do both:
– Excel Default bins
– Bins set to: 0,10,20,…,240
• Which gives answers closer to answers in back of book?
• Turn in only one page

Histogram Binwidths

Nice example from Webster West, U.S.C.:

http://www.stat.sc.edu/~west/applets/histogram.html

Control Binwidth with slider:
• Undersmoothing?
• About right?
• Oversmoothing?

(critical to visual impression)

Histogram Binwidth Example

Hidalgo Stamp Data

From Mexico in 1800s

How many sources of paper?

How many modes: 1, 2, 5, 7, 10?

Histogram Binwidth Example

How many modes?

Caution: Answer depends on binwidth

(a serious and current statistical research problem)

Stamps Data Histogram

How many modes?

2nd Caution: Answer also depends on bin location

(i.e. “shift” of bins)
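Both cautions can be seen on a tiny made-up data set (NOT the stamp data). The sketch below counts “modes” as peaks in the bar heights, which is one simple working definition chosen for illustration; `bar_heights` and `count_modes` are hypothetical helper names.

```python
import math

def bar_heights(values, width, shift=0):
    """Histogram counts for bins [shift + k*width, shift + (k+1)*width)."""
    idx = [math.floor((v - shift) / width) for v in values]
    lo, hi = min(idx), max(idx)
    counts = [0] * (hi - lo + 1)
    for i in idx:
        counts[i - lo] += 1
    return counts

def count_modes(counts):
    """Number of peaks, treating a flat run of equal bars as one bar."""
    heights = [0]
    for c in counts:
        if c != heights[-1]:
            heights.append(c)
    heights.append(0)
    return sum(
        1
        for i in range(1, len(heights) - 1)
        if heights[i] > heights[i - 1] and heights[i] > heights[i + 1]
    )

data = [2, 2, 2, 3, 4, 5, 5, 5]   # made-up values with two clusters

narrow = bar_heights(data, width=1)            # [3, 1, 1, 3]
wide = bar_heights(data, width=2)              # [4, 4]
shifted = bar_heights(data, width=2, shift=1)  # [3, 2, 3]

for name, counts in (("narrow", narrow), ("wide", wide), ("shifted", shifted)):
    print(name, counts, "modes:", count_modes(counts))
```

With binwidth 1 there are two modes, with binwidth 2 only one, and shifting the binwidth-2 bins by 1 brings the second mode back, all from the same eight numbers.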

Histogram Bins

For this course: Try several binwidths, to “get the idea”

Weakness of EXCEL (we will see several): This is inconvenient

Comparison of Histograms

Class Example: Study Habits Data

Idea: Compare Study Habits of Males vs. Females (measured by some “survey score”, perhaps of questionable value?)

https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31Eg4Done.xls

Study Habits Data

EXCEL default histograms:

• Populations look similar???

• Careful: Binwidth very big…

• Careful: Different bin ranges…

• Need smaller binwidths, and common scales

Study Habits Data

Better Choice: Binwidths = 10, same bins for both

• Clear difference, easy to see

• Females higher “on average”

• Males are “more spread”

• 1 “exceptional value”, really true???
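The “same bins for both” idea amounts to building one set of bin edges from the pooled min and max, then counting each group against those shared edges. A minimal Python sketch follows; the scores are invented for illustration and are NOT the class data, and `common_edges` and `counts_in` are hypothetical helper names.

```python
import math

def common_edges(group_a, group_b, width):
    """One set of equal-width bin edges covering both groups."""
    lo = min(min(group_a), min(group_b))
    hi = max(max(group_a), max(group_b))
    start = math.floor(lo / width) * width
    stop = (math.floor(hi / width) + 1) * width
    n_bins = round((stop - start) / width)
    return [start + k * width for k in range(n_bins + 1)]

def counts_in(values, edges):
    """Frequencies for equal-width bins [edge, edge + width)."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[int((v - edges[0]) // width)] += 1
    return counts

# Invented survey scores, for illustration only
females = [112, 125, 131, 138, 144, 150, 156]
males = [78, 95, 104, 118, 127, 141, 163]

width = 10
edges = common_edges(females, males, width)
f_counts = counts_in(females, edges)
m_counts = counts_in(males, edges)

# Side-by-side text bars on a common scale
for left, f, m in zip(edges, f_counts, m_counts):
    print(f"{left:>4}-{left + width:<4} F {'#' * f:<3} M {'#' * m}")
```

Because both rows of bars share the same edges, differences in center and spread can be read straight across.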

Things to look for (in histo’s)

1. Population Center Point (Study Habits Data)

2. Population Spread (Study Habits Data)

3. Shape - Symmetric vs. Skewed

Right Skewed:

Left Skewed:

4. Modes - Unexpected clusters

5. Outliers - “unusual data points”

Comparison of Histograms HW

HW: 1.25b, 1.27, 1.29, 1.22
• Work in this order
• Get data from CDrom
• Use EXCEL and histograms
• Odd answers in back
• You choose the bins (if you miss something in answers, change this)
• Turn in at most one page for each

Plotting Bivariate Data

Toy Example:

(1,2)

(3,1)

(-1,0)

(2,-1)

[Scatterplot “Toy Scatterplot, Separate Points”: x axis from -2 to 4, y axis from -1.5 to 2.5]

Plotting Bivariate Data

Sometimes: Can see more insightful patterns by connecting points

[Scatterplot “Toy Scatterplot, Connected points”: x axis from -2 to 4, y axis from -1.5 to 2.5]

Plotting Bivariate Data

Sometimes: Useful to switch off points, and only look at lines/curves

[Scatterplot “Toy Scatterplot, Lines Only”: x axis from -2 to 4, y axis from -1.5 to 2.5]

Plotting Bivariate Data

Common Name: “Scatterplot”

A look under the hood:

EXCEL: Chart Wizard (colored bar icon)

• Chart Type: XY (scatter)

• Subtype controls points only, or lines

• Later steps similar to above

(can massage the pic!)
