Post on 19-Dec-2015
Data Mining Disasters
A Report
Mary McGlohon
SIGBOVIK Commission for Workplace Safety
Data Mining Safety
•Data mining disasters are a hazard to the progress of scientific research.
•We will review some common mining disasters and make recommendations for their prevention.
Numeric Overflow
“In 2007, numeric floods were responsible for over $600 million in property damages.” - Department of Made-Up Statistics
ERROR::NUMERICOVERFLOW
Nobody expected the breach of the levees.
•Also caused loss of several hundred nerd-hours.
•1 nerd-hour = 1 grad-student-hour = 0.25 faculty-hours = 6 undergrad-hours
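The exchange rates above can be encoded as a small conversion table that routes every unit through nerd-hours. Note that 1 nerd-hour = 0.25 faculty-hours means a faculty-hour is worth 4 nerd-hours, and therefore 24 undergrad-hours. This is a hypothetical sketch: the `convert` helper and the table name are illustrative, and the figure of 300 lost nerd-hours is an assumption, since the slide only says "several hundred".

```python
# Exchange rates from the slide, expressed as "units per nerd-hour".
UNITS_PER_NERD_HOUR = {
    "nerd-hour": 1.0,
    "grad-student-hour": 1.0,
    "faculty-hour": 0.25,
    "undergrad-hour": 6.0,
}


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert between labor units by going through nerd-hours (hypothetical helper)."""
    nerd_hours = amount / UNITS_PER_NERD_HOUR[from_unit]
    return nerd_hours * UNITS_PER_NERD_HOUR[to_unit]


# Illustrative figure only: the slide says "several hundred", so 300 is an assumption.
lost = 300.0
for unit in ("faculty-hour", "undergrad-hour"):
    print(f"{lost:g} nerd-hours = {convert(lost, 'nerd-hour', unit):g} {unit}s")
```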
•Recommendation: A drowning researcher’s best bet is to grab onto a floating log.
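One technical reading of the "floating log" advice (our interpretation, not stated on the slide) is to work with floating-point logarithms instead of raw magnitudes. The sketch below uses the standard log-sum-exp trick: shifting every exponent by the maximum keeps the largest term at exp(0) = 1, so nothing overflows. The function names are hypothetical; in CPython, `math.exp` raises `OverflowError` for arguments above roughly 709.

```python
import math


def naive_log_sum_exp(xs):
    # Exponentiates directly: math.exp(1000.0) raises OverflowError.
    return math.log(sum(math.exp(x) for x in xs))


def log_sum_exp(xs):
    # Shift by the maximum so the largest term is exp(0) = 1 and no term overflows.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


scores = [1000.0, 1000.0, 999.0]
try:
    naive_log_sum_exp(scores)
except OverflowError as err:
    print("naive version overflowed:", err)
print("stable version:", log_sum_exp(scores))
```

The stable version returns the same value the naive formula would have produced with unlimited precision, because log(Σ eˣ) = m + log(Σ eˣ⁻ᵐ) for any constant m.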
Power Law Failures
•Occurs when researchers confuse heavy-tailed distributions such as:
• Power Law (incl. Pareto, Zipf)
• Lognormal
• Weibull
• Burr
• Log-gamma
• Log-Log-Log-Log-Mushroom-Mushroom
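A small illustration of how the confusion happens: the standard maximum-likelihood estimator for a continuous power law, α̂ = 1 + n / Σ ln(xᵢ / x_min), returns an exponent for any positive data, including data that is not power-law at all. The sketch below, under assumed parameters (seed, sample size, x_min = 1, lognormal σ = 2), fits it to a genuine Pareto sample and to a lognormal sample; the helper name `power_law_alpha_mle` is hypothetical. A fitted exponent alone therefore does not show which heavy-tailed distribution generated the data.

```python
import math
import random


def power_law_alpha_mle(samples, x_min):
    """MLE of alpha for a continuous power law p(x) ∝ x**(-alpha), x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need samples strictly above x_min")
    return 1.0 + len(tail) / log_sum


rng = random.Random(2008)  # arbitrary seed
n = 100_000
# random.paretovariate(a) has x_min = 1 and density a / x**(a + 1),
# i.e. a power law with exponent alpha = a + 1 (here 2.5).
pareto = [rng.paretovariate(1.5) for _ in range(n)]
# Lognormal data: not a power law, but its upper tail can resemble one.
lognormal = [rng.lognormvariate(0.0, 2.0) for _ in range(n)]

alpha_pareto = power_law_alpha_mle(pareto, x_min=1.0)
alpha_lognormal = power_law_alpha_mle(lognormal, x_min=1.0)
print(f"Pareto sample:    alpha ≈ {alpha_pareto:.3f}")
print(f"Lognormal sample: alpha ≈ {alpha_lognormal:.3f}")
```

Telling the candidates apart needs more than a point estimate, for example goodness-of-fit tests or likelihood-ratio comparisons between the fitted distributions.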
•Many natural phenomena have heavy tails.
• Magnitude of earthquakes
• Size of human settlements
• Degree distribution of “real” graphs
• Time-to-response of CS professors' email
• Your mom
•However, confusing one heavy-tailed distribution for another has resulted in...
•Related danger: Statisticians, computer scientists, and physicists wasting valuable nerd-hours in religious arguments over which heavy-tailed distribution is being followed.
•Statisticians get mean when they get religious. (SIGBOVIK07)
•Recommendation: Calm the hell down.
Decision Tree Forest Fires
•Pruning is used to prevent overfitting.
•When overpruning occurs, trees are burned to stumps.
•This spreads, torching entire forests.
(Aww...)
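The pruning slide can be illustrated with a deliberately tiny decision tree on a single feature, where a hypothetical `max_depth` parameter stands in for pruning strength (depth-limited pre-pruning, not a post-pruning method such as reduced-error or cost-complexity pruning). Splits are chosen greedily by misclassification count. With `max_depth=1` the tree is burned down to a stump, and with `max_depth=0` only a single leaf remains. The data set is made up for illustration, and only training accuracy is shown.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    label: int                          # majority label; the prediction if this is a leaf
    threshold: Optional[float] = None   # split point: x <= threshold goes left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def majority(ys: List[int]) -> int:
    return Counter(ys).most_common(1)[0][0]


def errors(ys: List[int]) -> int:
    # Misclassifications if this group simply predicts its majority label.
    return len(ys) - Counter(ys).most_common(1)[0][1] if ys else 0


def build(xs: List[float], ys: List[int], max_depth: int) -> Node:
    node = Node(label=majority(ys))
    if max_depth == 0 or len(set(ys)) == 1:
        return node  # depth budget spent (pruned) or node already pure
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # midpoint between distinct values: both sides non-empty
        cost = (errors([y for x, y in zip(xs, ys) if x <= t])
                + errors([y for x, y in zip(xs, ys) if x > t]))
        if best is None or cost < best[0]:
            best = (cost, t)
    if best is None:
        return node  # all x values identical: no split possible
    t = best[1]
    node.threshold = t
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    node.left = build([x for x, _ in left], [y for _, y in left], max_depth - 1)
    node.right = build([x for x, _ in right], [y for _, y in right], max_depth - 1)
    return node


def predict(node: Node, x: float) -> int:
    while node.threshold is not None:
        node = node.left if x <= node.threshold else node.right
    return node.label


def leaves(node: Node) -> int:
    if node.threshold is None:
        return 1
    return leaves(node.left) + leaves(node.right)


def accuracy(node: Node, xs, ys) -> float:
    return sum(predict(node, x) == y for x, y in zip(xs, ys)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 1, 0, 0, 1, 1]
for depth in (0, 1, 10):
    tree = build(xs, ys, depth)
    print(f"max_depth={depth}: {leaves(tree)} leaves, "
          f"training accuracy {accuracy(tree, xs, ys):.2f}")
```

Choosing the pruning level is a trade-off: too little pruning lets the tree memorize noise, while too much leaves a stump that cannot represent the pattern at all.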
•Recommendation: Researchers should obtain a burning permit before pruning with fire.
•Smoking while researching is not recommended; if you choose to do so, make sure your “butts are out”.
Voting Fraud by One-Armed Bandits
•Cascading failures from other fields may cause disasters in data mining.
•Fatal mistake: combining the related subfields of voting mechanisms and one-armed bandit problems.
•One-armed bandits commit voting fraud by:
• Impersonating real voting machines.
• Cramming cake into voting machines.
• (The cake is a lie.)
Other Safety Measures
•Cool mining helmets
Conclusion
•The Commission for Workplace Safety hopes this has raised awareness of potential data mining disasters.
•When faced with data-mining disasters,
• Remain Calm.
• Blame it on off-by-one errors, lack of rigor in proofs of correctness, or whatever government agency is funding the project.