Automating Complex Data Analysis
By John F. McGowan, Ph.D.
Mathematical Software
to Bay Area SAS User's Group (BASAS)
August 31, 2017
At Genentech, Building 42 in South San Francisco
E-Mail: [email protected]
Web: www.mathematical-software.com
http://www.basas.com
IS NOT SaaS
State of the Art
SAS Programming Language
SAS JMP (“Jump”)
And Many More...
State of the Art
SLOW
EXPENSIVE
ERROR PRONE
UNCONVINCING
What do I mean by AUTOMATION?
Drag and Drop Analysis
Automation
DATA
Analyst in a Box
FINAL REPORT
Source Code for Analysis (e.g. SAS)
Step by Step Log of Analysis
All Relevant Data with Sources
Analyst in a Box
● Extract key numbers and information about the numbers (meta-data) from the input data file or files: report, research article, table of data, Excel spreadsheet, etc.
● Key information about the numbers (meta-data) includes the units of measurement (subjects, milligrams, weeks), how values are measured, how data is collected, etc.
● Perform appropriate analysis implied by key information (meta-data). Example: statistical power analysis for safety from clinical trials data.
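As a rough illustration of the steps above, the sketch below attaches meta-data to a number and lets the units decide which analysis runs. The talk does not say which power calculation is meant. The code assumes the simplest safety question: how likely is a trial of n subjects to observe at least one adverse event that occurs at a given true rate? The names `Quantity`, `prob_at_least_one_event`, `subjects_needed`, and `analyze`, and the example numbers, are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass
class Quantity:
    """A number plus the key information (meta-data) needed to interpret it."""
    value: float
    units: str          # e.g. "subjects", "milligrams", "weeks"
    how_measured: str   # free-text note on measurement or collection


def prob_at_least_one_event(n_subjects: int, event_rate: float) -> float:
    """Chance of seeing >= 1 adverse event among n independent subjects."""
    return 1.0 - (1.0 - event_rate) ** n_subjects


def subjects_needed(event_rate: float, power: float = 0.95) -> int:
    """Smallest n whose chance of seeing >= 1 event reaches `power`."""
    return math.ceil(math.log(1.0 - power) / math.log(1.0 - event_rate))


def analyze(q: Quantity, event_rate: float) -> str:
    """Pick the analysis implied by the units; only 'subjects' is handled here."""
    if q.units != "subjects":
        return f"No analysis implemented for units '{q.units}'."
    p = prob_at_least_one_event(int(q.value), event_rate)
    return (f"With {int(q.value)} subjects, the chance of observing at least one "
            f"event of true rate {event_rate:g} is {p:.1%}.")


# Hypothetical input: an enrollment count extracted from a trial report.
enrolled = Quantity(value=300, units="subjects", how_measured="enrollment count")
print(analyze(enrolled, event_rate=0.01))
print(subjects_needed(event_rate=0.01))
```

A real system would need many more unit-to-analysis rules than the single branch shown here. The point is only that the meta-data, not the bare number, selects the analysis.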
Analyst in a Box
● Find a mathematical model for the data
– Recognize data resembles a mathematical function, equation, or other mathematical object
– Fit possible model to the data
– Evaluate goodness of fit
– Is something wrong with the data?
– Find another model if agreement with data is bad
– REPEAT UNTIL FIND A GOOD MODEL OR GIVE UP
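The loop above can be sketched in a few lines, under strong simplifying assumptions. The candidate list below (linear, exponential, power law) is a small, hand-picked set and not the method from the talk. Goodness of fit is judged by R² against an arbitrary threshold. The "recognize" step is replaced by simply trying each candidate in order. All function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(ys, predicted):
    """Coefficient of determination, computed on the original y scale."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def fit_linear(xs, ys):
    a, b = fit_line(xs, ys)
    return lambda x: a + b * x


def fit_exponential(xs, ys):
    # y = A * exp(b*x) is a straight line in (x, log y); needs y > 0.
    a, b = fit_line(xs, [math.log(y) for y in ys])
    return lambda x: math.exp(a + b * x)


def fit_power_law(xs, ys):
    # y = A * x**b is a straight line in (log x, log y); needs x, y > 0.
    a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return lambda x: math.exp(a) * x ** b


CANDIDATES = [("linear", fit_linear),
              ("exponential", fit_exponential),
              ("power law", fit_power_law)]


def select_model(xs, ys, threshold=0.99):
    """Try candidates in order; return (name, model, r2), or None to give up."""
    for name, fit in CANDIDATES:
        try:
            model = fit(xs, ys)
            r2 = r_squared(ys, [model(x) for x in xs])
        except (ValueError, ZeroDivisionError):
            continue  # the data cannot be used with this model (e.g. log of y <= 0)
        if r2 >= threshold:
            return name, model, r2
    return None  # no candidate agreed with the data well enough


xs = [1, 2, 3, 4, 5, 6]
ys = [2.0 * math.exp(0.8 * x) for x in xs]  # synthetic data, exponential by construction
result = select_model(xs, ys)
print(result[0] if result else "no good model found")
```

The fixed candidate list is what makes this tractable. With an unbounded space of possible models, choosing which candidates to try is the hard part, and this sketch does not address it.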
Analyst in a Box: Outputs
● Log of the analysis steps
● Source code for tools such as SAS, SPSS, Python, R so anyone can reproduce the results.
● Generate an accessible final report. PLAIN ENGLISH
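One possible way to produce these three outputs together is to record, for every step, both a plain-English description and the code that performs it. The sketch below assumes that design. The class `AnalysisRecord` and its methods are hypothetical, and it emits Python; a real tool might emit SAS or R instead.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisRecord:
    """Collects what was done so the results can be reproduced and explained."""
    steps: list = field(default_factory=list)       # human-readable log entries
    code_lines: list = field(default_factory=list)  # code equivalent of each step

    def record(self, description: str, code: str) -> None:
        """Store one analysis step together with the code that reproduces it."""
        self.steps.append(description)
        self.code_lines.append(code)

    def log(self) -> str:
        """Numbered step-by-step log of the analysis."""
        return "\n".join(f"Step {i}: {s}" for i, s in enumerate(self.steps, start=1))

    def source_code(self) -> str:
        """Script that replays the recorded steps."""
        return "\n".join(self.code_lines)

    def report(self, conclusion: str) -> str:
        """Short plain-English summary for the final report."""
        return f"The analysis took {len(self.steps)} steps. {conclusion}"


values = [4.0, 5.0, 6.0]  # made-up example data
rec = AnalysisRecord()
rec.record("Loaded 3 measurements.", "values = [4.0, 5.0, 6.0]")
mean = sum(values) / len(values)
rec.record("Computed the average of the measurements.",
           "mean = sum(values) / len(values)")

print(rec.log())
print(rec.source_code())
print(rec.report(f"The average measurement was {mean:g}."))
```

Because the description and the code are stored side by side, the log, the script, and the report cannot drift apart.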
Technical Obstacle I: Model Selection
● Many (infinite number) mathematical models possible!
Obstacle: Model Selection
● Technically most challenging part of automating complex data analysis is selection of the mathematical model or models used.
● Infinite number of possible mathematical models.
● Relies heavily on human judgment and pattern recognition: this data looks like this mathematical function to the analyst.
Technical Obstacle II: Key information on numbers
What is UNRATE?
Obstacle: Key information on numbers
● Key information on numbers (meta-data) includes the units of measurement, how the data was measured and collected, the definition of values.
● Key information on numbers (meta-data) is generally in semi-structured text: scientific articles, technical reports, data tables, FDA approval documents, etc.
● Data table column headers, if present, usually provide insufficient information on the numbers for a full analysis. Example: What is UNRATE?
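To make the UNRATE example concrete, the sketch below pairs a bare column header with a hand-written meta-data entry. It assumes that UNRATE refers to the series of that name published by FRED (Federal Reserve Economic Data); the slides do not say so explicitly. The dictionary layout and the `describe` function are hypothetical. An automated system would have to extract such entries from documentation rather than have them typed in.

```python
# Hand-written meta-data for one column header. Assumes "UNRATE" means the
# FRED series of that name; the entry is illustrative, not extracted.
DATA_DICTIONARY = {
    "UNRATE": {
        "definition": "unemployment rate",
        "units": "percent",
        "frequency": "monthly",
        "adjustment": "seasonally adjusted",
        "source": "U.S. Bureau of Labor Statistics, via FRED",
    },
}


def describe(header: str) -> str:
    """Return a plain-English description of a column, or flag missing meta-data."""
    meta = DATA_DICTIONARY.get(header)
    if meta is None:
        return f"{header}: no meta-data available; cannot choose an analysis."
    return (f"{header}: {meta['definition']}, in {meta['units']}, "
            f"{meta['frequency']}, {meta['adjustment']} ({meta['source']}).")


print(describe("UNRATE"))
print(describe("X1"))  # a header with no dictionary entry
```

The header alone carries none of this. Everything in the description comes from the dictionary entry, which is exactly the information a column name fails to provide.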
It Can Be Done!
DEMO
Conclusion
● Automation of Complex Data Analysis can SPEED UP the analysis.
● Automation can SAVE MONEY
● Automation can SAVE LIVES
● Automation can INCREASE THE PERSUASIVENESS OF THE RESULTS
● Automation can ENABLE THIRD PARTY AUDITING OF RESULTS
What You Can Do
● Please let me know your specific big problems in complex data analysis!
● What would you like to see automated and why?
● Web: http://www.mathematical-software.com
● E-Mail: [email protected]