Lies, Damned Lies, and Health Physics: Some Random Comments About Statistics in Health Physics
Post on 03-Jan-2016
1
Lies, Damned Lies, and Health Physics: Some Random Comments About Statistics in Health Physics
Savannah River Chapter of the Health Physics Society
Aiken, SC, April 15, 2011
Tom LaBone
2
“It is easy to lie with statistics.” “It is hard to tell the truth without statistics.”
Andrejs Dunkels
“There are three kinds of lies: lies, damned lies, and statistics.”
Mark Twain
3
Today
Informal, mostly apocryphal discussion of what statistics really is, who practices statistics and how they do it, and why all of this is important to you as a health physicist
Main message of talk
A good working knowledge of statistics is essential in any endeavor where data are collected and analyzed (e.g., health physics)
Everyone in the room should become a statistician (of sorts)
No math is used in this presentation and no health physicists were harmed during its preparation
4
Health Physics and Statistics
Some HP “stat” books I used in school
G. F. Knoll, Radiation Detection and Measurement, 1st Edition, 1979
J. Shapiro, Radiation Protection, 1st Edition, 1972
H. Cember, Introduction to Health Physics, 1st Edition, 1969
R. D. Evans, The Atomic Nucleus, 1955
P. R. Bevington, Data Reduction and Error Analysis for the Physical Sciences, 1st Edition, 1969
Statistics was a tool, a “wrench to turn a nut”
Is that all it is?
5
“Humans are good, she knew, at discerning subtle patterns that are really there, but equally so at imagining them when they are altogether absent.”
Carl Sagan in Contact
What is Statistics?
6
Signals and Noise
Useful information comes to us in the form of signals that form distinct patterns
The signals are contaminated with varying degrees of noise, which can make it difficult to see the signal
7
Seeing Patterns
In our evolutionary history, seeing patterns where none existed may have been less harmful than missing patterns that did exist
That noise in the grass – is it just the wind or is it a lion?
So, we as a species got very good at seeing patterns, even in the absence of a signal
8
Apophenia
Apophenia is the experience of seeing meaningful patterns or connections in random or meaningless data
What do you see below?
9
Viking 1 Orbiter Mars Global Surveyor
Face on Mars
10
Face in Food, et cetera
11
Face in Data
12
Statistics is …
… a science that helps us to differentiate signal from noise and make decisions with a known probability of being wrong
… a very practical, decision-oriented methodology developed to tame our natural tendency to be Apopheniacs
… based on the idea that variability and noise are natural and unavoidable
… a relatively modern science that is actively evolving, especially since cheap, powerful computers became available
13
Really, What is Statistics?
Chris Chatfield, Problem Solving: A Statistician’s Guide
“Statistics is concerned with collecting, analyzing, and interpreting data in the best possible way, where the meaning of ‘best’ depends on the particular circumstances of the practical situation”
14
Exploratory Data Analysis
Look at data (usually with graphics) and use our ability to see patterns in the data to
Suggest hypotheses to test
Assess validity of assumptions on which statistical inference will be based
Support the selection of appropriate inferential tests
Suggest ideas for further data collection
15
Fecal Samples, Air Filters
[Figure: two scatterplot matrices of Pu239, Cm244, and Am241 results, with each point labeled by sample number. “Fecals as of 3/5/2011”: slopes 0.236, 1.38, and 4.56. “Kinectrics Filters All”: slopes 0.316, 2.02, and 6.09.]
16
Confirmatory Data Analysis
Use statistical tests to answer questions about the data along with the risks of reaching the wrong conclusion
Is the material on the filters the same material that is in the fecal samples?
Are the Pu-239 to Am-241 ratios in the fecal samples and air samples the same once we account for random noise?
17
Fecal Samples
[Figure: Am-241 (mBq) versus Pu-239 (mBq); 95% CI = (1.33, 1.46)]
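One way to put an interval around an isotope ratio like the one in the plot is to treat the ratio as the slope of a line through the origin and bootstrap it. The sketch below is not the method used in the talk; the data are synthetic, and the assumed ratio of 1.4, the noise level, and the function names are illustrative choices only.

```python
import random


def slope_through_origin(x, y):
    """Least-squares slope b for the no-intercept model y = b * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(
            slope_through_origin([x[i] for i in idx], [y[i] for i in idx])
        )
    slopes.sort()
    lower = slopes[int(n_boot * (1 - level) / 2)]
    upper = slopes[int(n_boot * (1 + level) / 2) - 1]
    return lower, upper


# Synthetic stand-in for paired results (ASSUMPTION: not the real samples).
ASSUMED_RATIO = 1.4
data_rng = random.Random(42)
pu = [data_rng.uniform(0.5, 12.0) for _ in range(200)]
am = [ASSUMED_RATIO * p + data_rng.gauss(0.0, 1.0) for p in pu]

estimate = slope_through_origin(pu, am)
ci = bootstrap_ci(pu, am)
print(f"slope = {estimate:.3f}, 95% bootstrap CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Comparing two such intervals by eye is only a rough check; whether two ratios differ is a question for a test on the difference, chosen before looking at the data.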
18
Data Dredging
Are the two Pu-239 to Am-241 ratios the same?
If this question was asked before we saw the data, we can proceed with the test to answer it
If this question was inspired by the data, then we should not test the same data to get the answer
Referred to as data snooping, data dredging, etc.
Cancer clusters
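The cost of letting the data pick the question can be seen in a small simulation. The sketch below generates pure noise for 20 groups, then compares a test on a group chosen in advance with a test on whichever group looks most extreme. The group count, sample size, and function name are assumptions made for illustration.

```python
import random


def false_positive_rates(n_trials=2000, n_groups=20, n_per_group=30, seed=7):
    """Return (pre-specified, data-inspired) false positive rates on pure noise."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% cutoff for a standard normal statistic
    prespecified_hits = 0
    dredged_hits = 0
    for _trial in range(n_trials):
        z_scores = []
        for _group in range(n_groups):
            total = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group))
            mean = total / n_per_group
            # Known sigma = 1, so z = mean * sqrt(n).
            z_scores.append(mean * n_per_group ** 0.5)
        # Question asked before seeing the data: always test group 0.
        if abs(z_scores[0]) > z_crit:
            prespecified_hits += 1
        # Question inspired by the data: test the most extreme group.
        if max(abs(z) for z in z_scores) > z_crit:
            dredged_hits += 1
    return prespecified_hits / n_trials, dredged_hits / n_trials


prespecified_rate, dredged_rate = false_positive_rates()
print(f"pre-specified: {prespecified_rate:.3f}, data-inspired: {dredged_rate:.3f}")
```

With 20 independent groups and no real signal, the chance that at least one crosses the 5% cutoff is 1 - 0.95^20, or about 0.64, which is the rate the data-inspired test should approach.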
19
Statistical Method
Define the problem
Formulate your questions in such a way that unambiguous answers are possible
Collect data
Collect data capable of answering your question
Analyze the data
Present the results
In terms your audience can understand
20
“It is better to solve the right problem the wrong way than to solve the wrong problem the right way.”
Richard Hamming
“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.”
John Tukey
Define the Problem
21
Data Collection
Collect data that are capable of answering the question asked (Data Quality Objectives)
Designed experiments
Observational studies
Sampling
You select samples from a population in order to make inferences about the population
22
GIGO
The collection of data is often the most time-consuming and expensive part of a study
Reverend Bayes and all of his horses can’t fix a bum dataset
23
Analyze the Data
All statistical procedures have assumptions
In practice, the assumptions of any given statistical procedure are violated to some degree
Can the validity of the assumptions be verified?
Can the validity of the answer be verified?
How robust is your statistical procedure to violations of its assumptions?
Simple approximate solutions you can understand may be better than complex exact solutions that you can’t
Augment standard statistical analyses with simulations
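As one illustration of augmenting a standard analysis with simulation, the sketch below checks how often a nominal 95% t-interval for the mean actually covers the true mean, first when the normality assumption holds and then when the data are strongly skewed. The sample size of 10, the lognormal shape, and the function names are assumptions chosen for the example.

```python
import math
import random
import statistics

N = 10
T_CRIT_DF9 = 2.262  # 97.5th percentile of Student's t with N - 1 = 9 df


def coverage(draw, true_mean, n_trials=5000, seed=3):
    """Fraction of nominal 95% t-intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sample = [draw(rng) for _ in range(N)]
        centre = statistics.fmean(sample)
        half_width = T_CRIT_DF9 * statistics.stdev(sample) / math.sqrt(N)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / n_trials


# Assumption satisfied: normal data with mean 5.
normal_cov = coverage(lambda rng: rng.gauss(5.0, 1.0), true_mean=5.0)
# Assumption violated: lognormal(0, 1) data, whose mean is exp(0.5).
skewed_cov = coverage(
    lambda rng: rng.lognormvariate(0.0, 1.0), true_mean=math.exp(0.5)
)
print(f"normal data: {normal_cov:.3f}, skewed data: {skewed_cov:.3f}")
```

If the simulated coverage for data shaped like yours falls well short of the nominal 95%, the procedure is not robust to that violation and the stated confidence level should not be taken at face value.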
24
Present Results
Technical answer versus the functional answer
“the null hypothesis is not rejected”
technically “not rejected” ≠ “accepted”
functionally “not rejected” = “accepted”
Statistical significance and practical significance
Apply “so what” test to your answers
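The gap between statistical and practical significance can be shown with arithmetic alone. The sketch below computes the two-sided p-value of a two-sample z-test for a fixed, tiny true difference (one hundredth of a standard deviation, an assumed value) at increasing sample sizes; the function name is hypothetical.

```python
import math


def two_sided_p(delta, sigma, n_per_group):
    """z statistic and two-sided p-value for a two-sample z-test, known sigma."""
    standard_error = sigma * math.sqrt(2.0 / n_per_group)
    z = delta / standard_error
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2.0))


DELTA = 0.01  # ASSUMPTION: observed difference equals 0.01 sigma
results = {}
for n in (10, 100, 10_000, 1_000_000):
    z, p = two_sided_p(DELTA, sigma=1.0, n_per_group=n)
    results[n] = p
    print(f"n per group = {n:>9,}: z = {z:6.3f}, p = {p:.3g}")
```

The difference is the same in every row; only the sample size changes. A small p-value at the largest n says the difference is detectable, not that it matters, which is what the “so what” test is for.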
25
What is a Statistician?
“Powerful spirits should only be called by the master himself”
Goethe, The Sorcerer's Apprentice
26
What is a Statistician?
Based on Chatfield’s definition of statistics, anyone who makes decisions based on the analysis of data might be called a statistician
However, the title statistician is usually reserved for a professional who has specialized training in the concepts, theoretical bases, and methodologies of statistics
Key difference between the sorcerer and his apprentice
Contrary to what you might think, there is a lot of subjectivity and professional judgment in the practice of statistics
Statistics is vast in scope and detail, and the apprentice does not know what he does not know
“It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so.”
Mark Twain
27
The Sorcerer’s Apprentice
We may not be statisticians, but we are clearly doing statistics, often without adult supervision
Doing our own statistics is a good thing, but we need to become better students of the black arts and consult the master before the brooms get out of control
“Should I refuse a good dinner simply because I do not understand the processes of digestion?”
Oliver Heaviside
[On being criticized for using formal mathematical manipulations without understanding how they worked]
28
How We Can be Better Statisticians
Master the basics
Learn the language
Play with your data
Use better software
Perform reproducible work
Consult with a real statistician
29
Master the Basics
Khan Academy
http://www.khanacademy.org/
30
Statistics MS/Certificate Distance Programs
University of South Carolina
Colorado State University
Texas A&M University
Penn State University
31
Concepts and Terminology
Specialized Concepts
Population versus sample, for example
Statistics has a very precise language all its own
“the null hypothesis is not rejected”
“not rejected” ≠ “accepted”
Questions and answers are not right unless you use the proper language to convey the proper concept
Some statisticians can be intolerant of laymen who misuse the language of statistics
Learn to phrase questions and interpret answers properly
32
Exploratory Statistics
Learn to play with your data and see if it is trying to tell you something new
Study graphs of your data
“There is no data that can be displayed in a pie chart, that cannot be displayed BETTER in some other type of chart.”
John Tukey
33
Software used for Statistics
I use the following software for statistical calculations (in order of usage)
R
Minitab
SAS
Spreadsheet (e.g., MS Excel, Gnumeric)
There are many others
34
Spreadsheets (Excel)
What some people can do in Excel is nothing short of amazing (but should they be doing it?)
Amarillo Slim beat tennis champ Bobby Riggs at Ping-Pong, using a frying pan instead of a paddle
Spreadsheet Addiction by Patrick Burns
http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html
Problems with spreadsheet implementation
Excel has a long history of doing bad stats
Problems with spreadsheet paradigm
Reproducible science
35
9/28/2007
http://www.msnbc.msn.com/id/21033161/from/RS.1/
M. G. Almiron et al., On the Numerical Accuracy of Spreadsheets, Journal of Statistical Software 34(4), 2010
36
Reproducible Research
Reproducible research refers to the idea that the ultimate product of research is the paper along with the full computational environment used to produce the results in the paper, such as the code, data, etc. necessary for reproduction of the results
[Diagram: Raw Data, Data Massaging, Calculations, Plots and Tables, Final Paper]
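A scripted analysis is one way to make each stage from raw data to reported numbers repeatable. The sketch below is a toy pipeline under stated assumptions: the "raw data" are synthetic values from a fixed seed standing in for a data file, the only cleaning rule is dropping missing results, and all function names are hypothetical.

```python
import hashlib
import json
import random
import statistics


def raw_data(seed=2011, n=25):
    """Stand-in for reading a raw data file; values are synthetic."""
    rng = random.Random(seed)
    return [
        None if rng.random() < 0.1 else round(rng.gauss(1.0, 0.5), 3)
        for _ in range(n)
    ]


def massage(values):
    """Documented cleaning rule: drop missing results, change nothing else."""
    return [v for v in values if v is not None]


def calculate(values):
    """Summary statistics that would feed the plots and tables."""
    return {
        "n": len(values),
        "mean": round(statistics.fmean(values), 4),
        "sd": round(statistics.stdev(values), 4),
    }


def run_pipeline():
    """Run every stage and return a report that also fingerprints the input."""
    raw = raw_data()
    results = calculate(massage(raw))
    results["raw_sha256"] = hashlib.sha256(json.dumps(raw).encode()).hexdigest()
    return json.dumps(results, sort_keys=True)


report = run_pipeline()
print(report)
```

Because every step is code, rerunning the script reproduces the report exactly, and the stored hash shows whether the raw data changed between runs. Hand edits in a spreadsheet leave no such trail.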
37
The R Project for Statistical Computing
R is a language and environment for statistical computing and graphics
R is available as Free Software under the terms of the GNU General Public License in source code form
It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS
Download from http://www.r-project.org/
38
Advantages of R
Command line interface rather than a GUI
Promotes reproducible statistics
Open source
Flexible licensing
Availability of source code for peer review
Bugs are public knowledge and are fixed quickly
New tests and methods tend to appear first in R
Many dozens of recently published books devoted to R
Free (and very good) community support available
39
Consult with a Statistician
If you are going to involve a statistician, do it at the study design and data collection phases
If not, at least estimate how much it will cost to collect the data all over again
Anybody can analyze compelling data
“To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.”
Sir Ronald Fisher
40
Twisted Answers to Crooked Questions
As health physicists there are times when a decision will be made, with or without good data and a proper statistical analysis
In such situations we base our decisions on professional judgment, often augmented with “statistics”
We must not fool ourselves about what we are doing
… of all the wrong answers we have to choose from, this one is the best
We have no right to expect a statistician to endorse such mischief
41
The Apprentice Should Beware of …
The Management Prior
Being bamboozled by other people’s statistics
“The only right way to do this is X [insert statistical method here]”
Being seduced by complexity
42
Statistics in the Workplace: Musings of a Sorcerer's Apprentice
Presentation to USC Stat Club, March 26, 2009
Main message
A degree in statistics is a “Swiss Army Knife” that is very useful in any endeavor where data are collected and analyzed
Everyone in the room should become a health physicist (I had no takers)