Week 10 fraud copy
![Page 1: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/1.jpg)
Think:
Bing It On!
Compares Bing to Google
![Page 2: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/2.jpg)
![Page 3: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/3.jpg)
How would you design this?
Tell me:
![Page 4: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/4.jpg)
Me?
And I’m guessing:
Hypothesis: Students in Toronto do not prefer one search engine (SE) to another.
![Page 5: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/5.jpg)
How?
100 Senecans will be surveyed by 10 paid surveyors.
Asked to compare two frames with fonts, colours and text sizes randomized.
Search terms Senecans choose.
Choose frame they like best: Google or Bing.
Results not revealed to participants.
![Page 6: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/6.jpg)
Why?
Identify sample and population I’m trying to sample.
Removing my bias by asking surveyors.
Surveyors will not know how survey is designed.
“Double blind”
![Page 7: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/7.jpg)
Why 100?
10 is too few.
1000 is too many.
For sufficiently large n, the distribution of $\hat{p}$ will be closely approximated by a normal distribution with the same mean and variance.[1] Using this approximation, it can be shown that around 95% of this distribution's probability lies within 2 standard deviations of the mean. Because of this, an interval of the form
$$\left(\hat{p} - 2\sqrt{\frac{0.25}{n}},\ \hat{p} + 2\sqrt{\frac{0.25}{n}}\right)$$
will form a 95% confidence interval for the true proportion. If this interval needs to be no more than W units wide, the equation
$$4\sqrt{\frac{0.25}{n}} = W$$
can be solved for n, yielding[2][3] $n = 4/W^2 = 1/B^2$, where B is the error bound on the estimate, i.e., the estimate is usually given as within ± B. So, for B = 10% one requires n = 100, for B = 5% one needs n = 400, for B = 3% the requirement approximates to n = 1000, while for B = 1% a sample size of n = 10000 is required. These numbers are quoted often in news reports of opinion polls and other sample surveys.
“Sample Size Determination”
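The n = 1/B² rule quoted above is easy to check by hand or in a few lines of code. The sketch below is only an illustration: the function name `sample_size_for_bound` is made up here, and the bound is passed in percentage points so the arithmetic stays exact.

```python
import math


def sample_size_for_bound(bound_percent):
    """Respondents needed for a rough 95% bound of +/- bound_percent points.

    Uses the worst-case rule n = 1 / B^2 from the quoted passage, with B
    written in percentage points: n = 10000 / bound_percent^2, rounded up.
    """
    return math.ceil(10000 / (bound_percent * bound_percent))


if __name__ == "__main__":
    for bound in (10, 5, 3, 1):
        print(f"+/- {bound}% needs about {sample_size_for_bound(bound)} respondents")
```

For B = 3% the rule gives a little over 1100, which the quoted passage rounds down to "approximately 1000".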
![Page 8: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/8.jpg)
Say that works
60 prefer Bing
40 prefer Google
What does that mean?
![Page 9: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/9.jpg)
I have no idea!
Well, sort of.
60% (±10%, p=.05) prefer Bing to Google
You tell me, what does that mean?
![Page 10: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/10.jpg)
Maybe nothing?Maybe something?
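One way to see why the answer is "maybe": apply the two-standard-deviation interval from the sample size slide to the 60 out of 100 result. This is a rough sketch that assumes the same worst-case variance of 0.25 used in that rule; the function name `rough_95_interval` is invented for the example.

```python
import math


def rough_95_interval(successes, n):
    """Approximate 95% interval: p_hat +/- 2 * sqrt(0.25 / n)."""
    p_hat = successes / n
    half_width = 2 * math.sqrt(0.25 / n)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    low, high = rough_95_interval(60, 100)
    print(f"Bing preference: {low:.2f} to {high:.2f}")
```

With 100 respondents the interval runs from about 0.50 to 0.70, so its lower end touches an even split. The survey does not clearly rule out "no preference".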
![Page 11: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/11.jpg)
Look: that was as easy as it gets!
Population identification, sample size calculation, double blinding, within two standard deviations, after stripping CSS—all that before I do the statistics
Which I can’t understand!
![Page 12: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/12.jpg)
Good methodology
● Design your experiment beforehand
● Run the experiment according to design
● Without peeking
– Or changing
● Collect all data
● Interpret all data
● Make all data available
● Analyze data according to good analysis principles.
![Page 13: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/13.jpg)
Ducklings
You have no idea how to do this.
No idea.
Neither do I.
![Page 14: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/14.jpg)
Questions
How many people do you need to survey?
How do you test them?
Double blind?
Blind?
What do you ask them?
![Page 15: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/15.jpg)
You have to do this
● It’s too easy to fool yourself
![Page 16: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/16.jpg)
Let’s review
Publish or perish?
Who perishes? And where do they publish?
![Page 17: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/17.jpg)
Journals
What are the most prestigious journals in the world?
How do you know?
![Page 18: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/18.jpg)
Impact factor
Nature
Proceedings of the National Academy of Sciences
Science
Physical Review Letters
Journal of the American Chemical Society
Physical Review B
Journal of Biological Chemistry
Applied Physics Letters
New England Journal of Medicine
Cell
(Eigenfactor.org data for 2011, most recent available)
![Page 19: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/19.jpg)
Roughly
Number of in-citations
Number of out-citations
![Page 20: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/20.jpg)
But?
Top-ranked are mostly medicine with some physics
No computers in top 100
Bioinformatics: 68
![Page 21: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/21.jpg)
Get published
Or get fired.
Science, Nature, Cell, NEJM, JAMA
You get ‘tenure’: never fired, made for life.
![Page 22: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/22.jpg)
Yoshitaka Fujii
● Japanese researcher in anaesthesiology
– Worked in Canada too
● Published 212 papers in 20 years (about one a month)
(Hmmmmm).
![Page 23: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/23.jpg)
You’ll never guess
He made them up.
● 172 are demonstrably false.
![Page 24: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/24.jpg)
As an aside:
● Retractions still need work:
– Of Fujii’s first ten articles on Google Scholar (GS)
● 4 were clearly retracted
● 1 was less clearly retracted
● 5 were not labelled as retracted
![Page 25: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/25.jpg)
Jan Hendrik Schön
● Nano-physics genius!
– Won $100,000 as best young scientist
– Published, at his best, one paper every eight days
● Including in Science and Nature
– The very best journals in the world.
![Page 26: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/26.jpg)
Now
● He has 10 friends on Facebook.
– I’m one!
His PhD was revoked.
Disappeared
![Page 27: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/27.jpg)
You’ll never guess
● He made all of his data up.
– [Movie time! 35:00]
![Page 28: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/28.jpg)
So?
● What’s the problem?
● So they lied. Nobody died.
● (Well, probably. Fujii was a doctor.)
![Page 29: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/29.jpg)
As I see it
● Money
– Millions of dollars
● Reputation
– Bell Labs, universities, colleagues, students
● Work: Reid Chesterfield spent 5 years trying to replicate Schön’s work.
![Page 30: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/30.jpg)
Mohammad
His supervisor spent months trying to replicate Schön’s work
(That’s hundreds of thousands of dollars)
![Page 31: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/31.jpg)
Another kind
● Damages to the scientific enterprise:
– Science has to be open to catch cheaters
– But openness makes researchers look bad
![Page 32: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/32.jpg)
Kinds of fraud
● Fabrication
● Falsification
● Other
![Page 33: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/33.jpg)
Fraud
“Fabrication of data involves totally inventing a data set, falsification refers to manipulation of equipment or changing data such that the research is not accurately represented in the research report.” (Stroebe, Postmes and Spears)
![Page 34: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/34.jpg)
Fabrication
● Pretty clear: you make up the data.
![Page 35: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/35.jpg)
Falsification
● Changing or interpreting the data:
“There is no rigid mathematical definition of what constitutes an outlier; determining whether or not an observation is an outlier is ultimately a subjective exercise.”
![Page 36: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/36.jpg)
Outliers
● How do you deal with them?
– Bill Gates walks in the room
● Median and mean income?
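The Bill Gates example is easy to make concrete. In this sketch every income figure is invented for illustration, including the one standing in for the billionaire. The point is only how differently the mean and the median react to a single extreme value.

```python
import statistics


def summarize(incomes):
    """Return (mean, median) for a list of incomes."""
    return statistics.mean(incomes), statistics.median(incomes)


if __name__ == "__main__":
    # Hypothetical incomes for five people in a room.
    room = [40_000, 45_000, 50_000, 55_000, 60_000]
    # The same room after one (hypothetical) billionaire walks in.
    room_with_outlier = room + [10_000_000_000]

    for label, incomes in (("before", room), ("after", room_with_outlier)):
        mean, median = summarize(incomes)
        print(f"{label}: mean = {mean:,.0f}, median = {median:,.0f}")
```

The median barely moves while the mean jumps by orders of magnitude. That is why the choice of statistic, and the choice of which points count as outliers, can change the story the data tells.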
![Page 37: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/37.jpg)
(How) Do you eliminate that variation?
![Page 38: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/38.jpg)
Data picking
● Say you want to show that monkeys flip a coin to heads more often than humans. How do you do it?
● Not investigate. Show
![Page 39: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/39.jpg)
● 1) Each flip 1 coin 100 times
● 2) Each flip 10 coins 10 times
● 3) Each flip 100 coins 1 time
![Page 40: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/40.jpg)
Then...
● Re-design your experiment!
![Page 41: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/41.jpg)
Then...
● Monkeys and humans each flipped 10 coins....
![Page 42: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/42.jpg)
Aha.
● This is (abuse of) methodology
– And why I keep saying it matters!
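The coin-flip trick can be simulated. The sketch below assumes fair coins for both monkeys and humans, runs all three designs from the earlier slide, and reports only the designs where monkeys happened to come out ahead. All names (`DESIGNS`, `designs_favouring_monkeys`, `cherry_pick_rate`) are invented for this illustration.

```python
import random

# The three designs from the slide: (coins per round, rounds), 100 flips each.
DESIGNS = {
    "1 coin, 100 times": (1, 100),
    "10 coins, 10 times": (10, 10),
    "100 coins, 1 time": (100, 1),
}


def count_heads(rng, coins, times):
    """Count heads from fair flips; monkeys and humans use the same coins."""
    return sum(rng.random() < 0.5 for _ in range(coins * times))


def designs_favouring_monkeys(rng):
    """Run every design once and keep only those where monkeys beat humans."""
    winners = []
    for name, (coins, times) in DESIGNS.items():
        monkeys = count_heads(rng, coins, times)
        humans = count_heads(rng, coins, times)
        if monkeys > humans:
            winners.append((name, monkeys, humans))
    return winners


def cherry_pick_rate(trials=2000, seed=0):
    """Share of attempts in which at least one design 'shows' the effect."""
    rng = random.Random(seed)
    hits = sum(bool(designs_favouring_monkeys(rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(f"At least one flattering design: {cherry_pick_rate():.2f} of attempts")
```

Under these assumptions there is no real difference between the groups, yet a researcher free to pick the design after seeing the data gets a reportable "result" roughly 85% of the time.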
![Page 43: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/43.jpg)
Google Scholar vs MAS
What does that tell you?
![Page 44: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/44.jpg)
Google Scholar vs MAS
● That GS has better searching than MAS
● Or that GS has worse searching than MAS!
![Page 45: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/45.jpg)
Check this out
[Chart: Column A plotted against x from 0 to 60, y from 0 to 1, with a fitted trend line labelled “Linear (Column A)”]
![Page 46: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/46.jpg)
Clearly
A strong trend:
Decreasing over x
Despite what appear to be sinusoidal variations
![Page 47: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/47.jpg)
One problem:
Made the data with random numbers
– And a few tricks
● No R value
● Lighten points
● Darken line
● Compress y for sharpness
● Regenerate data if necessary
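The "regenerate data if necessary" trick can be sketched in code. The example below is an assumption-laden illustration, not the procedure actually used for the chart: it draws 60 uniform random values, fits a least-squares line, and simply tries again until the slope is as negative as a hypothetical target (`target_slope`).

```python
import random


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def regenerate_until_trend(rng, n_points=60, target_slope=-0.003, max_tries=10_000):
    """Redraw pure noise until the fitted line slopes down 'convincingly'."""
    xs = list(range(n_points))
    for attempt in range(1, max_tries + 1):
        ys = [rng.random() for _ in xs]  # no trend is built into the data
        slope = least_squares_slope(xs, ys)
        if slope <= target_slope:
            return attempt, slope, ys
    raise RuntimeError("no sufficiently steep slope found; loosen target_slope")


if __name__ == "__main__":
    attempt, slope, _ = regenerate_until_trend(random.Random(0))
    print(f"Got a 'decreasing trend' (slope {slope:.4f}) on attempt {attempt}")
```

Reporting only the final draw, without the number of attempts or an R value, hides the fact that the trend was selected rather than observed.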
![Page 48: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/48.jpg)
Also
Choose line of best fit:
Linear? Moving average? Exponential? Log?
![Page 49: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/49.jpg)
Of course
● That’s not nearly the only way!
– Repeat the whole experiment
– Blinding
– Survey design
– Outlier elimination
● And so on.
![Page 50: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/50.jpg)
So: It’s easy
It’s so, so easy to cheat!
Let’s do it:
![Page 51: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/51.jpg)
Google vs Bing
Say you wanted to show that Bing > Google.
How would you?
![Page 52: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/52.jpg)
Population is, er, everyone!
Sample 1000 in Seattle
Sample young white men in Seattle
Redo sample!
Remove double blind
Remove single blind
10 in a row for Google? Outlier!
Choose best 100 of 1000 in Seattle
Repeat that ‘experiment’ to find the 20th out of 20.
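The "20th out of 20" line can be read as a reference to the p = .05 threshold: if nothing is really going on, about one run in twenty will still look significant. That reading is an assumption, and the function below is a hypothetical helper that works out how likely a repeat-until-it-works strategy is to pay off.

```python
def chance_of_false_positive(runs, alpha=0.05):
    """Probability that at least one of `runs` independent null experiments
    crosses the significance threshold `alpha` purely by chance."""
    return 1 - (1 - alpha) ** runs


if __name__ == "__main__":
    for runs in (1, 5, 20):
        print(f"{runs:2d} runs: {chance_of_false_positive(runs):.2f}")
```

With 20 repeats the chance of at least one spurious "Bing wins" result is about 64%, and publishing only that one run makes it look like a clean finding.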
![Page 53: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/53.jpg)
Why?
● Career pressure
– Publish or perish
– Past glories
● Overconfidence
● Temptation because of irreproducibility
![Page 54: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/54.jpg)
How do they get caught?
● Data that is too good
● Draw suspicion in publication
● Ratted out by underlings
![Page 55: Week 10 fraud copy](https://reader034.vdocuments.us/reader034/viewer/2022042816/5598755a1a28ab42478b475a/html5/thumbnails/55.jpg)
Lessons:
● Don’t cheat well
● Don’t cheat much