data science 101 - ibmpublic.dhe.ibm.com/systems/power/community/aix/... · transforming data 8...

11
Data Science 101 Chris Parsons 14 February 2018

Upload: others

Post on 14-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Science 101 - IBMpublic.dhe.ibm.com/systems/power/community/aix/... · Transforming Data 8 Tumor Proliferation Assessment –mitosis detection Images from electron-microscope

Data Science 101Chris Parsons

14 February 2018

Page 2: Data Science 101 - IBMpublic.dhe.ibm.com/systems/power/community/aix/... · Transforming Data 8 Tumor Proliferation Assessment –mitosis detection Images from electron-microscope

© 2016 IBM Corporation

Agenda

• Disclaimer..• What format does my data need to be for the Machine Learning

frameworks?• Transforming Data• Options/Alternatives• Moving Forward..

2

Page 3: Data Science 101 - IBMpublic.dhe.ibm.com/systems/power/community/aix/... · Transforming Data 8 Tumor Proliferation Assessment –mitosis detection Images from electron-microscope

© 2016 IBM Corporation

How do I get data into PowerAI?

3

Page 4: Data Science 101 - IBMpublic.dhe.ibm.com/systems/power/community/aix/... · Transforming Data 8 Tumor Proliferation Assessment –mitosis detection Images from electron-microscope

© 2016 IBM Corporation

What format does my data need to be in for Machine Learning?

4

• Labeled data– Metadata (this is a dog, cat, etc.)– Sub folders named appropriately (/dog /cat)– CSV. (prevalent)

• TensorFlow– TFRecords

• Caffe– Blobs

Page 5: Data Science 101 - IBMpublic.dhe.ibm.com/systems/power/community/aix/... · Transforming Data 8 Tumor Proliferation Assessment –mitosis detection Images from electron-microscope

© 2016 IBM Corporation

Transforming Data

5Credit - Udacity

Page 6: Data Science 101 - IBMpublic.dhe.ibm.com/systems/power/community/aix/... · Transforming Data 8 Tumor Proliferation Assessment –mitosis detection Images from electron-microscope

© 2016 IBM Corporation

Transforming Data (null)

6

Page 7: Data Science 101 - IBMpublic.dhe.ibm.com/systems/power/community/aix/... · Transforming Data 8 Tumor Proliferation Assessment –mitosis detection Images from electron-microscope

© 2016 IBM Corporation

Transforming Data

7

Page 8: Data Science 101 - IBMpublic.dhe.ibm.com/systems/power/community/aix/... · Transforming Data 8 Tumor Proliferation Assessment –mitosis detection Images from electron-microscope

© 2016 IBM Corporation

Transforming Data

8

Tumor Proliferation Assessment – mitosis detectionImages from electron-microscope Size of image - 70K * 60K

Framework

Format Input Size (Faster R-CNN)

Caffe LMDB 1K*1KTensorFlow

TensorRecord

1K*1K

Data Transformation

Data Distribution among training, validation and testing

Data Shuffle

Page 9: Data Science 101 - IBMpublic.dhe.ibm.com/systems/power/community/aix/... · Transforming Data 8 Tumor Proliferation Assessment –mitosis detection Images from electron-microscope

© 2016 IBM Corporation

Options/Alternatives

9

Page 10: Data Science 101 - IBMpublic.dhe.ibm.com/systems/power/community/aix/... · Transforming Data 8 Tumor Proliferation Assessment –mitosis detection Images from electron-microscope

© 2016 IBM Corporation

DSX

• Access data from local files– Local CSV data. – Loaded and transformed to DSX “Asset” – Drag and drop/file explorer

• Access HDFS data– Cloudera/Hortonworks/BigInsights

• Access RDB Data– Scala, Python, R APIs– Db2, Netezza, Informix, Oracle, Mongo

• Remote data?10

Page 11: Data Science 101 - IBMpublic.dhe.ibm.com/systems/power/community/aix/... · Transforming Data 8 Tumor Proliferation Assessment –mitosis detection Images from electron-microscope

| 1114February2018