What does a researcher want of statistics?

Post on 03-Jan-2016


What does a researcher want of statistics?

1. How variable is it?
2. Does “my pet thing” work?
3. Why do the things differ?
4. Why does it fail from time to time?
5. Why do patients have different fates, and where is the hope for them?
6. What would be the outcome of a perturbation?

“I had fun and got it in addition to my cool microscope images!”

“I have done a statistical analysis of my results, and now give me my PhD, pleeeease!..”

Generally speaking, all of statistics is about finding relations between variables.

Basic concepts to understand
• Variability
• Variable
• Relation
• Signal vs. noise
• Factor vs. response (outcome), independent vs. dependent variables
• Statistical test
• Null hypothesis
• Power
• Experimental design
• Distribution

Deterministic vs. stochastic data

Two graph concepts: Histograms

Histograms show quantities of objects of particular qualities as variable-height columns.

[Figure: histogram; x-axis: Distance in chromosome, b.p.; y-axis: No of obs]
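The idea behind a histogram can be sketched in a few lines of Python: sort the values into equal-width bins and count how many fall into each. This is only an illustration. The distances are invented, not taken from the slide's data set, and the function name `histogram` and the bin width of 2000 b.p. are assumptions made for the example.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin [k*w, (k+1)*w)."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical distances in b.p., invented for illustration only.
distances = [120, 450, 980, 1500, 1900, 2100, 2500, 7300, 12800]
bins = histogram(distances, bin_width=2000)

# Text rendering: one row per non-empty bin, one '#' per observation.
for left_edge, count in bins.items():
    print(f"{left_edge:>6}-{left_edge + 2000:<6} {'#' * count}")
```

Most observations land in the first bin and a few lie far to the right, which is the same long-tailed shape as in the chromosome-distance figure.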

Two graph concepts: Scatterplots

Scatterplots show objects arranged by 2 particular qualities as coordinates.

[Figure: Scatterplot (Iris dat 5v*150c); x-axis: SEPALLEN; y-axis: SEPALWID; fitted line: SEPALWID = 3.4189 - 0.0619*x]

Two graph concepts: Histograms vs. scatterplots

[Figure: Matrix Plot (Iris dat 5v*150c); variables: SEPALLEN, SEPALWID, PETALLEN, PETALWID]

Normal distribution

[Figure: histogram; x-axis: SEPALWID; y-axis: No of obs]

Not a normal distribution

[Figure: histogram; x-axis: Distance in chromosome, b.p.; y-axis: No of obs]

• Variance: Var = Sum((deviation from mean)^2) / (N - 1)

• Standard deviation: SD = square root of Var

• Skewness: deviation of the distribution from symmetry

• Kurtosis: “peakedness” of the distribution

• Standard error (of the mean): SE = SD / square root of N
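The definitions above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the data are invented, the function name `describe` is hypothetical, and skewness and kurtosis use the simple moment-based formulas (kurtosis as excess kurtosis, so a normal distribution gives about 0). Statistical packages often apply small-sample corrections, so their numbers may differ slightly.

```python
import math

def describe(xs):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]

    var = sum(d ** 2 for d in dev) / (n - 1)   # sample variance
    sd = math.sqrt(var)                        # standard deviation
    se = sd / math.sqrt(n)                     # standard error of the mean

    # Central moments (divided by n) for the shape measures.
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skewness = m3 / m2 ** 1.5                  # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2 - 3                # excess kurtosis

    return {"mean": mean, "var": var, "sd": sd, "se": se,
            "skewness": skewness, "kurtosis": kurtosis}

# Invented sample with a longer right tail.
stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(f"{name:>8}: {value:.4f}")
```

A positive skewness here reflects the longer right tail of the sample.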

Kurtosis: positive

[Figure: histogram; x-axis: SEPALWID; y-axis: No of obs]

Kurtosis: negative

[Figure: histogram; x-axis: SEPALWID; y-axis: No of obs]

Skewness

[Figure: histogram; x-axis: SEPALWID; y-axis: No of obs]

Analysis of correlations

• Simple linear correlation (Pearson r):

r = Mean(CoVar) / (StDev(X) x StDev(Y))

CoVar = (Deviation Xi from mean X) x (Deviation Yi from mean Y)

• How to interpret the values of correlations
– Positive: the higher X, the higher Y
– Negative: the higher X, the lower Y
– ~0: no relation
Strength (rule of thumb):
– |r| > 0.7: strong
– 0.25 < |r| < 0.7: medium
– |r| < 0.25: weak
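The Pearson formula and the rule-of-thumb strength labels can be sketched as follows. The data and the function names `pearson_r` and `strength` are invented for illustration. The slide does not say where values exactly equal to 0.25 or 0.7 belong, so treating them as "medium" is an assumption. Note that the mean of the cross-products and both standard deviations must use the same divisor (here N) for r to stay within [-1, 1].

```python
import math

def pearson_r(xs, ys):
    """r = Mean(CoVar) / (StDev(X) * StDev(Y)), all computed with divisor N."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covar = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return covar / (sd_x * sd_y)

def strength(r):
    """Label |r| with the slide's thresholds (boundary handling is assumed)."""
    a = abs(r)
    if a > 0.7:
        return "strong"
    if a >= 0.25:
        return "medium"
    return "weak"

# Invented example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.4f} ({strength(r)}), r^2 = {r * r:.4f}")
```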

• Outliers

• Correlations in non-homogeneous groups

• Nonlinear relations between variables

• Measuring nonlinear relations

• Spurious correlations

• Multiple comparisons and Bonferroni correction

• Coefficient of determination: r^2

• How to determine whether two correlation coefficients are significant

• Other correlation coefficients
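As an illustration of the Bonferroni correction mentioned above: when m correlations are tested at once, each p-value is compared with alpha / m rather than alpha. The p-values below are invented (think of the 6 pairwise correlations among 4 variables), and the function name `bonferroni` is hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the corrected threshold and a per-test significance flag."""
    m = len(p_values)
    threshold = alpha / m
    return threshold, [p <= threshold for p in p_values]

# Hypothetical p-values for 6 pairwise correlations among 4 variables.
p_values = [0.001, 0.012, 0.030, 0.047, 0.20, 0.65]
threshold, significant = bonferroni(p_values)

print(f"per-test threshold: {threshold:.4f}")
for p, ok in zip(p_values, significant):
    print(f"p = {p:<6} {'significant' if ok else 'not significant'}")
```

Four of these p-values are below 0.05 when taken one at a time, but only one survives the correction.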

When should it not work?

[Figure: scatterplot with marginal histograms; x-axis: INCOME; y-axis: ASSETS]

• Graphs
• 2D graphs
• Scatterplots w/Histograms

[Figure: Matrix Plot (Iris dat 5v*150c); variables: SEPALLEN, SEPALWID, PETALLEN, PETALWID]

Exploratory examination of correlation matrices

When should it not work?

[Figure: scatterplot with marginal histograms; x-axis: NewVar; y-axis: Var2]

Normalize it!

[Figure: scatterplot with marginal histograms; x-axis: NewVar1; y-axis: NewVar2]

E.g. NewX = log(X)
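A small sketch of what the log transform does to a linear correlation. The data are invented: X grows geometrically and Y follows a power law of X, so the raw values are strongly right-skewed and a few large points dominate. After taking logs of both variables the relation becomes a straight line. The helper `pearson_r` is written out here only so the example is self-contained.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed with divisor N throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covar = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return covar / (sd_x * sd_y)

# Invented, strongly right-skewed data: x = 2, 4, ..., 1024 and y = x^3.
x = [2 ** k for k in range(1, 11)]
y = [v ** 3 for v in x]

r_raw = pearson_r(x, y)

# NewX = log(X), NewY = log(Y)
log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]
r_log = pearson_r(log_x, log_y)

print(f"raw data:        r = {r_raw:.4f}")
print(f"log-transformed: r = {r_log:.4f}")
```

The transform only helps when all values are positive and the skew is of this multiplicative kind; it is one option, not a universal fix.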

Causality

There is no way to establish from a correlation which variable affects which.

It is just about a relation.

• Casewise vs. pairwise deletion of missing data

• How to identify biases caused by pairwise deletion of missing data

• Pairwise deletion of missing data vs. mean substitution
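The three ways of handling missing data listed above can be compared on a toy table. This is a sketch with invented data, where `None` marks a missing value and all function names are hypothetical. Casewise deletion keeps only complete rows. Pairwise deletion keeps, for each pair of variables, every row where both are present, so different correlations may rest on different subsets of cases. Mean substitution fills each gap with the column mean.

```python
def casewise_rows(rows):
    """Keep only rows with no missing values."""
    return [r for r in rows if None not in r]

def pairwise_pairs(rows, i, j):
    """Keep (x_i, x_j) pairs from rows where both variables are present."""
    return [(r[i], r[j]) for r in rows
            if r[i] is not None and r[j] is not None]

def mean_substitute(rows):
    """Replace each missing value with the mean of its column."""
    means = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present))
    return [tuple(m if v is None else v for v, m in zip(r, means))
            for r in rows]

# Invented data: 5 cases, 3 variables, None = missing.
rows = [
    (1.0, 2.0, 3.0),
    (2.0, None, 1.0),
    (3.0, 4.0, None),
    (4.0, 5.0, 6.0),
    (None, 6.0, 2.0),
]

complete = casewise_rows(rows)
n_pairwise = {(i, j): len(pairwise_pairs(rows, i, j))
              for i in range(3) for j in range(i + 1, 3)}
filled = mean_substitute(rows)

print("casewise N:", len(complete))
print("pairwise N per variable pair:", n_pairwise)
print("mean-substituted last row:", filled[4])
```

Here casewise deletion leaves 2 cases, while each pairwise correlation uses 3 cases, and not the same 3 for every pair.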

StatSoft’s Statistica

• A perfect, almost universal tool for researchers ranging from “very beginner” to “advanced professional”.

• An old piece of software with an intricate development history

• Most of the methods can be found in >1 module
• Most of the modules contain >1 method
• No method is perfect
• No module is complete
• Most of the special modules are unavailable in the basic “budget” license
