Automating Complex Data Analysis By John F. McGowan, Ph.D. Mathematical Software to Bay Area SAS User's Group (BASAS) August 31, 2017 At Genentech, Building 42 in South San Francisco E-Mail: [email protected] Web: www.mathematical-software.com


Page 1

Automating Complex Data Analysis

By John F. McGowan, Ph.D.
Mathematical Software

to Bay Area SAS User's Group (BASAS)

August 31, 2017
At Genentech, Building 42 in South San Francisco

E-Mail: [email protected]
Web: www.mathematical-software.com

Page 2

http://www.basas.com

Page 3

IS NOT SaaS

Page 4

State of the Art

Page 5

SAS Programming Language

Page 6

SAS JMP (“Jump”)

Page 7

SAS JMP (“Jump”)

Page 8

And Many More...

Page 9

State of the Art

SLOW

EXPENSIVE

ERROR-PRONE

UNCONVINCING

Page 10

What do I mean by AUTOMATION?

Drag-and-Drop Analysis

Page 11

Automation

Diagram: DATA → Analyst in a Box → FINAL REPORT; also produced: Source Code for Analysis (e.g. SAS), Step by Step Log of Analysis, All Relevant Data with Sources

Page 12

Analyst in a Box

● Extract key numbers and information about the numbers (meta-data) from the input data file or files: report, research article, table of data, Excel spreadsheet, etc.

● Key information about the numbers (meta-data) includes the units of measurement (subjects, milligrams, weeks), how values are measured, how data is collected, etc.

● Perform the appropriate analysis implied by the key information (meta-data). Example: statistical power analysis for safety from clinical trial data.
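The last bullet mentions statistical power analysis for safety. A minimal sketch of one simple version, assuming that each trial subject independently experiences a given adverse event with the same true probability p: the chance of observing at least one event among n subjects is then 1 - (1 - p)^n, and the smallest trial size that reaches a target power follows by solving for n. The function names are hypothetical, and this is an illustration, not the analysis from the talk.

```python
import math


def prob_detect_at_least_one(p: float, n: int) -> float:
    """P(at least one adverse event among n subjects), assuming each
    subject independently has the event with true probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 1.0 - (1.0 - p) ** n


def subjects_needed(p: float, power: float = 0.95) -> int:
    """Smallest n with P(at least one event) >= power.

    Solving 1 - (1 - p)**n >= power gives n >= log(1 - power) / log(1 - p);
    the two loops correct any floating-point rounding in that estimate.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < power < 1.0:
        raise ValueError("power must be strictly between 0 and 1")
    n = max(1, math.ceil(math.log(1.0 - power) / math.log(1.0 - p)))
    while prob_detect_at_least_one(p, n) < power:
        n += 1
    while n > 1 and prob_detect_at_least_one(p, n - 1) >= power:
        n -= 1
    return n


if __name__ == "__main__":
    # An adverse event affecting 1% of patients: how many patients are
    # needed for a 95% chance of seeing it at least once?
    print(subjects_needed(0.01, 0.95))  # 299
```

The result, 299 subjects, is close to the familiar "rule of three" approximation n ≈ 3/p = 300.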

Page 13

Analyst in a Box

● Find a mathematical model for the data

– Recognize that the data resembles a mathematical function, equation, or other mathematical object

– Fit a candidate model to the data

– Evaluate goodness of fit

– Is something wrong with the data?

– Find another model if the agreement with the data is poor

– REPEAT UNTIL A GOOD MODEL IS FOUND OR GIVE UP
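The loop on this slide can be sketched in a few lines, assuming a hypothetical, deliberately tiny list of candidate models (a straight line and an exponential, the latter fitted as a straight line in log y), R² on the original scale as the goodness-of-fit measure, and an arbitrary acceptance threshold of 0.99. A real system would need far more candidates and better diagnostics; none of the names here come from the talk.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Model = Tuple[str, Callable[[float], float]]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; assumes the xs are not all equal."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(ys: Sequence[float], preds: Sequence[float]) -> float:
    """Coefficient of determination on the original scale of y."""
    my = sum(ys) / len(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    if ss_tot == 0.0:
        return 1.0 if ss_res == 0.0 else 0.0
    return 1.0 - ss_res / ss_tot


def linear(xs, ys) -> Optional[Model]:
    a, b = fit_line(xs, ys)
    return "linear", lambda x: a + b * x


def exponential(xs, ys) -> Optional[Model]:
    # y = A * exp(k*x), fitted as a straight line in log(y); needs y > 0.
    if any(y <= 0 for y in ys):
        return None
    log_a, k = fit_line(xs, [math.log(y) for y in ys])
    return "exponential", lambda x: math.exp(log_a + k * x)


# Hypothetical candidate list, tried in order.
CANDIDATES: List[Callable[..., Optional[Model]]] = [linear, exponential]


def find_model(xs, ys, threshold: float = 0.99) -> Optional[Tuple[str, float]]:
    """Try each candidate until one fits well enough; None means 'give up'."""
    # Crudest possible check for "something wrong with the data".
    if any(not math.isfinite(v) for v in list(xs) + list(ys)):
        raise ValueError("data contains NaN or infinite values")
    for candidate in CANDIDATES:
        model = candidate(xs, ys)
        if model is None:  # candidate does not apply to this data
            continue
        name, f = model
        score = r_squared(ys, [f(x) for x in xs])
        if score >= threshold:
            return name, score
    return None
```

For example, `find_model([0, 1, 2, 3, 4, 5], [2 * math.exp(0.5 * x) for x in range(6)])` rejects the straight line (R² of roughly 0.89) and accepts the exponential. Returning `None` corresponds to the "give up" branch.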

Page 14

Analyst in a Box: Outputs

● Log of the analysis steps

● Source code for tools such as SAS, SPSS, Python, or R, so anyone can reproduce the results.

● An accessible final report in PLAIN ENGLISH
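A minimal sketch of how the step-by-step log and the reproducible source code can come from the same record, assuming each analysis step is stored with a plain-English description and the code that performs it. The class and method names are hypothetical, and the emitted script is Python for brevity; the same idea would apply to SAS, SPSS, or R output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    description: str  # plain-English explanation, reused in the log
    code: str         # source code that reproduces this step


@dataclass
class AnalysisRecorder:
    """Hypothetical recorder: each step is written once, then emitted
    both as a human-readable log and as a re-runnable script."""
    steps: List[Step] = field(default_factory=list)

    def record(self, description: str, code: str) -> None:
        self.steps.append(Step(description, code))

    def log(self) -> str:
        """Step-by-step log of the analysis."""
        return "\n".join(f"Step {i}: {s.description}"
                         for i, s in enumerate(self.steps, start=1))

    def script(self) -> str:
        """Standalone Python script that repeats every step in order."""
        lines = []
        for i, s in enumerate(self.steps, start=1):
            lines.append(f"# Step {i}: {s.description}")
            lines.append(s.code)
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rec = AnalysisRecorder()
    rec.record("Load the measurements", "ys = [2.0, 4.0, 6.0]")
    rec.record("Compute the mean", "mean = sum(ys) / len(ys)")
    print(rec.log())
    namespace = {}
    exec(rec.script(), namespace)  # a third party can re-run the emitted script
    print(namespace["mean"])       # 4.0
```

Because the log and the script are generated from one list of steps, they cannot drift apart, which is what makes third-party reproduction practical.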

Page 15

Technical Obstacle I: Model Selection

● Many (an infinite number of) mathematical models are possible!

Page 16

Obstacle: Model Selection

● Technically, the most challenging part of automating complex data analysis is selecting the mathematical model or models to use.

● There is an infinite number of possible mathematical models.

● Model selection relies heavily on human judgment and pattern recognition: to the analyst, this data looks like this mathematical function.
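One standard way to make part of this judgment mechanical, at least within a restricted family of models, is an information criterion. The sketch below is an illustration, not the method from the talk: it fits polynomials of increasing degree by least squares and scores each with the Akaike information criterion for Gaussian errors, AIC = n ln(RSS/n) + 2k up to an additive constant, where k is the number of fitted coefficients; the lowest score wins. It assumes distinct x values, more data points than coefficients, and noisy data (RSS > 0), and it can only compare the handful of candidates it is given, not search an infinite space.

```python
import math
from typing import Dict, List, Sequence, Tuple


def polyfit(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares coefficients c[0..degree] of y = sum(c[i] * x**i),
    found by solving the normal equations with Gaussian elimination."""
    m = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    for col in range(m):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    coef = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, m))) / a[i][i]
    return coef


def aic(xs: Sequence[float], ys: Sequence[float], coef: List[float]) -> float:
    """AIC for least squares with Gaussian errors, up to an additive constant."""
    n = len(xs)
    rss = sum((y - sum(c * x ** i for i, c in enumerate(coef))) ** 2
              for x, y in zip(xs, ys))
    if rss <= 0.0:
        raise ValueError("perfect fit: AIC is undefined for RSS = 0")
    return n * math.log(rss / n) + 2 * len(coef)


def select_degree(xs, ys, max_degree: int = 3) -> Tuple[int, Dict[int, float]]:
    """Return the polynomial degree with the lowest AIC, plus all scores."""
    scores = {d: aic(xs, ys, polyfit(xs, ys, d)) for d in range(1, max_degree + 1)}
    return min(scores, key=scores.get), scores
```

The 2k term is what keeps the search from always preferring the most flexible model: an extra coefficient must reduce the residual error enough to pay for itself.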

Page 17

Technical Obstacle II: Key information on numbers

What is UNRATE?

Page 18

Obstacle: Key information on numbers

Page 19

Obstacle: Key information on numbers

Page 20

Obstacle: Key information on numbers

● Key information on numbers (meta-data) includes the units of measurement, how the data was measured and collected, and the definitions of the values.

● Key information on numbers (meta-data) is generally found in semi-structured text: scientific articles, technical reports, data tables, FDA approval documents, etc.

● Data table column headers, if present, usually provide insufficient information on the numbers for a full analysis. Example: What is UNRATE?
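In the FRED database maintained by the Federal Reserve Bank of St. Louis, UNRATE is the series code for the U.S. unemployment rate: monthly, in percent, seasonally adjusted, with the U.S. Bureau of Labor Statistics as the source. The five-letter header carries none of that. The sketch below keeps such facts next to a column and reports which ones are still missing before an analysis can run. The `Column` class and the `REQUIRED` field list are hypothetical choices for illustration, not a standard schema, and no data values are included.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical minimum set of facts an automated analyst needs before it
# can choose an analysis; a real system would need a much richer schema.
REQUIRED = ("definition", "units", "frequency", "source")


@dataclass
class Column:
    name: str
    values: List[float] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)

    def missing_metadata(self) -> List[str]:
        """Required metadata fields that are absent or empty."""
        return [key for key in REQUIRED if not self.metadata.get(key)]


# All that a bare column header tells us.
bare = Column("UNRATE")

# The same series after its description has been read (FRED series UNRATE).
described = Column(
    "UNRATE",
    metadata={
        "definition": "unemployed persons as a percentage of the labor force",
        "units": "percent",
        "frequency": "monthly",
        "seasonal_adjustment": "seasonally adjusted",
        "measurement": "Current Population Survey (household survey)",
        "source": "U.S. Bureau of Labor Statistics",
    },
)

if __name__ == "__main__":
    print(bare.missing_metadata())       # ['definition', 'units', 'frequency', 'source']
    print(described.missing_metadata())  # []
```

The hard part the slides point to is not storing these fields but extracting them automatically from semi-structured text such as series notes, articles, and reports.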

Page 21

It Can Be Done!

Page 22

DEMO

Page 23

Conclusion

● Automation of Complex Data Analysis can SPEED UP the analysis.

● Automation can SAVE MONEY

● Automation can SAVE LIVES

● Automation can INCREASE THE PERSUASIVENESS OF THE RESULTS

● Automation can ENABLE THIRD-PARTY AUDITING OF RESULTS

Page 24

What You Can Do

● Please let me know your specific big problems in complex data analysis!

● What would you like to see automated and why?

● Web: http://www.mathematical-software.com

● E-Mail: [email protected]