cv m4 l01 - cs.ou.edu

73
CV_M4_L01 Andrew H. Fagg: Machine Learning Practice 1

Upload: others

Post on 30-Jul-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CV M4 L01 - cs.ou.edu

• CV_M4_L01

Andrew H. Fagg: Machine Learning Practice 1

Page 2: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Representing Data

Andrew H. Fagg: Machine Learning Practice 2

Page 3: CV M4 L01 - cs.ou.edu

Connecting Real World Data to our ML Tools

Often have a huge disconnect between the two. Our ML

tools often rely on:

• Well-defined formatting of the data

• Cut into distinct examples. Each example:

– List of property values. Most often assume each example

consists of the same properties.

– Label / expected output value (for supervised problems)

Andrew H. Fagg: Machine Learning Practice 3

Page 4: CV M4 L01 - cs.ou.edu

Connecting Real World Data to our ML Tools

(cont) Our ML tools often assume:

• Properties are numerical

• Statistical independence between the different examples

• All examples are drawn from the same statistical

distribution

Andrew H. Fagg: Machine Learning Practice 4

Page 5: CV M4 L01 - cs.ou.edu

Connecting Real World Data to our ML Tools

Real world data can:

• Be weakly formatted

• Properties can be enumerated types (e.g., strings such as

“circle”, “square”)

• Values can be incorrect

• Values can be missing

• Different examples can have different properties

• Distribution that we draw examples from can be changing

in time

Andrew H. Fagg: Machine Learning Practice 5

Page 6: CV M4 L01 - cs.ou.edu

Connecting Real World Data to our ML Tools

Transforming the raw data to a well-formatted form is a key

first step:

• This step can take much of our project time, depending on

the form of the data

• How careful we are in taking this step can dramatically

affect everything else we do

• As a byproduct of this step, it is important to really

understand the nature of your data

Andrew H. Fagg: Machine Learning Practice 6

Page 7: CV M4 L01 - cs.ou.edu

Roadmap

• Pandas package

– Importing data from standard formats

– Data massaging

• Numpy package

– Efficient representation of numerical data

• Matplotlib package

– Matlab-like visualization package

Andrew H. Fagg: Machine Learning Practice 7

Page 8: CV M4 L01 - cs.ou.edu

Pandas

Toolkit for data handling and analysis

• File I/O, including csv files

• Hooks for visualization

• Basic statistics

• Data selection and massaging

• SQL-type operations

Andrew H. Fagg: Machine Learning Practice 8

Page 9: CV M4 L01 - cs.ou.edu

Classes Provided by Pandas

Two primary Python classes:

• Series: 1D data

– Indexed by integer location in the array or by some index

variable (index values can be numerical or strings)

• DataFrame: 2D data

– Each dimension indexed by integer index or other index

variable

– Most common for us: examples (rows) x features (columns)

Andrew H. Fagg: Machine Learning Practice 9

Page 10: CV M4 L01 - cs.ou.edu

Some Useful DataFrame Operations

• Data exploration:

– Show row / column index names

– Compute statistics for individual columns

• Create a new DataFrame that contains a subset of the

rows and/or columns

• Remove or repair rows and/or columns that contain

invalid data

• Export data to a numpy array for use with ML methods

Andrew H. Fagg: Machine Learning Practice 10

Page 11: CV M4 L01 - cs.ou.edu

Numpy

Numerical methods package

• Representation of vectors, matrices, tensors

– Vector: yet another way of representing a list of numbers

• Implementation of many linear algebra type operations

– Computing matrix inverses, Singular Value Decomposition …

• Basis for many ML packages, including Scikit-Learn

Andrew H. Fagg: Machine Learning Practice 11

Page 12: CV M4 L01 - cs.ou.edu

Andrew H. Fagg: Machine Learning Practice 12

Page 13: CV M4 L01 - cs.ou.edu

• CV_M4_L02

Andrew H. Fagg: Machine Learning Practice 13

Page 14: CV M4 L01 - cs.ou.edu

Real-Time Activity Recognition for

Assistive Robotics

OU Crawling Assistant (Kolobe, Fagg, Miller, Ding) Scientific American (Oct 2016)

Andrew H. Fagg: Machine Learning Practice

Page 15: CV M4 L01 - cs.ou.edu

Infants Learning to Crawl

• Learning to crawl is in part a reinforcement learning

process:

– Initially: making novel things happen (such as the body rolling

or shifting a bit) is rewarding

– Eventually: it becomes rewarding to grasp toys (or car keys)

• These rewards are important:

– Practice many types of motor skills

– Drives the development of spatial skills

Andrew H. Fagg: Machine Learning Practice 15

Page 16: CV M4 L01 - cs.ou.edu

Infants at Risk for Cerebral Palsy

• Initial exploratory movements do not result in interesting

things happening

• These infants show a dramatic delay in the onset of

crawling

• This impacts the learning of other motor skills & the

development of spatial skills

Andrew H. Fagg: Machine Learning Practice 16

Page 17: CV M4 L01 - cs.ou.edu

SIPPC Crawling Assistant

Wide-Angle

Cameras

Vertical

Lifts

6-Axis

Load Cell

Infant

Support

Covered Omni-

Wheels

EEG

Head

Net

Andrew H. Fagg: Machine Learning Practice

Page 18: CV M4 L01 - cs.ou.edu

Kinematic Capture Suit

IMU-based kinematic suit

• 12 sensors mounted in suit

• Real-time reconstruction of

body posture

• Recognition of crawling-like

actions

Lower leg Thigh

Back sensor

and central

processor

Shoulder

Upper

arm

ForearmFoot

Southerland (2012)

Andrew H. Fagg: Machine Learning Practice

Page 19: CV M4 L01 - cs.ou.edu

Infant-Robot Interaction

Three modes of interaction:

• Force control: robot velocity is linearly related to ground

reaction forces

• Power steering: small ground reaction forces produce a

substantial robot movement

• Gesture-based control: recognized crawling-like

movements produce robot movement

Andrew H. Fagg: Machine Learning Practice

Page 20: CV M4 L01 - cs.ou.edu

Machine Learning Questions

• Predict robot motion from kinematic data

• Predict visual attention from kinematic and robot data

• Predict limb motion from EEG data

• Predict visual attention from EEG data

• …

Andrew H. Fagg: Machine Learning Practice

Page 21: CV M4 L01 - cs.ou.edu

Andrew H. Fagg: Machine Learning Practice 21

Page 22: CV M4 L01 - cs.ou.edu

• CV_M4_L03

Andrew H. Fagg: Machine Learning Practice 22

Page 23: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Introduction to Pandas

Andrew H. Fagg: Machine Learning Practice 23

Page 24: CV M4 L01 - cs.ou.edu

Pandas Roadmap

• Importing data from Comma Separated Values (CVS) file

• Exploring data

• Indexing rows and columns

Andrew H. Fagg: Machine Learning Practice 24

Page 25: CV M4 L01 - cs.ou.edu

Pandas

• Live example

Andrew H. Fagg: Machine Learning Practice 25

Page 26: CV M4 L01 - cs.ou.edu

• CV_M4_L04

Andrew H. Fagg: Machine Learning Practice 26

Page 27: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Pandas: Basic Plotting

Andrew H. Fagg: Machine Learning Practice 27

Page 28: CV M4 L01 - cs.ou.edu

• Live example

Andrew H. Fagg: Machine Learning Practice 28

Page 29: CV M4 L01 - cs.ou.edu

• CV_M4_L05

Andrew H. Fagg: Machine Learning Practice 29

Page 30: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Introduction to Numpy

Andrew H. Fagg: Machine Learning Practice 30

Page 31: CV M4 L01 - cs.ou.edu

Numpy Rodmap

• Transforming Pandas data to a Numpy matrix

• Indexing Numpy matrices

• Combining vectors to create a matrix

Andrew H. Fagg: Machine Learning Practice 31

Page 32: CV M4 L01 - cs.ou.edu

• Live example

Andrew H. Fagg: Machine Learning Practice 32

Page 33: CV M4 L01 - cs.ou.edu

• CV_M4_L06

Andrew H. Fagg: Machine Learning Practice 33

Page 34: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Visualization with Matplotlib

Andrew H. Fagg: Machine Learning Practice 34

Page 35: CV M4 L01 - cs.ou.edu

Matplotlib Roadmap

• Creating temporal figures

• Creating scatter plots

• Tuning the display of figure elements

• Subplots

• Repairing a Pandas dataset & visualizing the results

Andrew H. Fagg: Machine Learning Practice 35

Page 36: CV M4 L01 - cs.ou.edu

• Live example

Andrew H. Fagg: Machine Learning Practice 36

Page 37: CV M4 L01 - cs.ou.edu

• CV_M4_L07

Andrew H. Fagg: Machine Learning Practice 37

Page 38: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Pipelines

Andrew H. Fagg: Machine Learning Practice 38

Page 39: CV M4 L01 - cs.ou.edu

Pipelines

• Data processing often involves multiple computational

steps, only some of which involve ML

• The Scikit-Learn Pipeline class provides a clean interface

for expressing these steps

– Each step (or pipeline element) is implemented by a class that

adheres to a standard interface

– This allows us to mix-and-match elements for different

purposes

Andrew H. Fagg: Machine Learning Practice 39

Page 40: CV M4 L01 - cs.ou.edu

Flavors of Pipeline Element Classes

A pipeline element class is some combination of:

• Estimator

• Transformer

• Predictor

Andrew H. Fagg: Machine Learning Practice 40

Page 41: CV M4 L01 - cs.ou.edu

Flavors of Pipeline Element Classes

Estimator: given a dataset, compute some measure or

some model parameters

• Implements the fit() method

– Takes as input one or two datasets (input data & desired

output)

• Our ML methods are estimators

Andrew H. Fagg: Machine Learning Practice 41

Page 42: CV M4 L01 - cs.ou.edu

Flavors of Pipeline Element Classes

Transformer: modifies a dataset in some way

• Implements the transform() method

– Takes as input one dataset

– Returns a dataset

• Transformers can be used to clean a dataset before it is

used by a ML method

Andrew H. Fagg: Machine Learning Practice 42

Page 43: CV M4 L01 - cs.ou.edu

Flavors of Pipeline Element Classes

Predictor: predicts some quantity given a dataset

• Implements the predict() method

– Takes as input one dataset and returns a different dataset

• Implements a score() method that evaluates a prediction

– Takes as input an input dataset and an expected output dataset

– Returns a score

Andrew H. Fagg: Machine Learning Practice 43

Page 44: CV M4 L01 - cs.ou.edu

Pipeline Notes

• Pipeline elements are classes in and of themselves

• The Pipeline class is also a pipeline elements

– So, we can nest pipelines!

• Python classes can inherit from multiple classes

– An element can be both an Estimator and a Predictor

• Datasets are generally Pandas objects or Numpy tensors

– A particular pipeline element will use only one type as an input

and one type as an output

Andrew H. Fagg: Machine Learning Practice 44

Page 45: CV M4 L01 - cs.ou.edu

• Live example

Andrew H. Fagg: Machine Learning Practice 45

Page 46: CV M4 L01 - cs.ou.edu

• CV_M4_L07

Andrew H. Fagg: Machine Learning Practice 46

Page 47: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Creating Pipeline Elements

Andrew H. Fagg: Machine Learning Practice 47

Page 48: CV M4 L01 - cs.ou.edu

• Live example

Andrew H. Fagg: Machine Learning Practice 48

Page 49: CV M4 L01 - cs.ou.edu

• CV_M4_L08b

Andrew H. Fagg: Machine Learning Practice 49

Page 50: CV M4 L01 - cs.ou.edu

• CV_M4_L09

Andrew H. Fagg: Machine Learning Practice 50

Page 51: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Creating Pipeline Element Classes

Andrew H. Fagg: Machine Learning Practice 51

Page 52: CV M4 L01 - cs.ou.edu

• Live example

Andrew H. Fagg: Machine Learning Practice 52

Page 53: CV M4 L01 - cs.ou.edu

• CV_M4_L10

Andrew H. Fagg: Machine Learning Practice 53

Page 54: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Pipeline Example: Computing Derivatives

Andrew H. Fagg: Machine Learning Practice 54

Page 55: CV M4 L01 - cs.ou.edu

Computing Derivatives

Numerical differentiation of a timeseries x:

• For each time t:

𝑥 𝑡 ≈𝑥 𝑡 + 1 − 𝑥[𝑡]

∆𝑡

• Often will want to include some filtering to address the

discrete nature of the data (though we won’t do this here)

Andrew H. Fagg: Machine Learning Practice 55

Page 56: CV M4 L01 - cs.ou.edu

• Live example

Andrew H. Fagg: Machine Learning Practice 56

Page 57: CV M4 L01 - cs.ou.edu

• CV_M4_L11

Andrew H. Fagg: Machine Learning Practice 57

Page 58: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Pipeline Example: Linear Imputer

Andrew H. Fagg: Machine Learning Practice 58

Page 59: CV M4 L01 - cs.ou.edu

Linear Imputer

For our implementation: we will take advantage of the

DataFrame.interpolate() method

Andrew H. Fagg: Machine Learning Practice 59

Page 60: CV M4 L01 - cs.ou.edu

• Live example

Andrew H. Fagg: Machine Learning Practice 60

Page 61: CV M4 L01 - cs.ou.edu

• CV_M4_L12

Andrew H. Fagg: Machine Learning Practice 61

Page 62: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Pipeline Example: Building a New Pipeline

Andrew H. Fagg: Machine Learning Practice 62

Page 63: CV M4 L01 - cs.ou.edu

• Live example

Andrew H. Fagg: Machine Learning Practice 63

Page 64: CV M4 L01 - cs.ou.edu

• CV_M4_L13

Andrew H. Fagg: Machine Learning Practice 64

Page 65: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Representing Categorical Data

Andrew H. Fagg: Machine Learning Practice 65

Page 66: CV M4 L01 - cs.ou.edu

Handling Categorical Data

• Discrete, finite set of values

– Most often the different values are strings or symbols

– Also known as an enumerated type

• Most ML algorithms only address numerical data, so need

some way of transforming from categorical values to

some numerical representation

Andrew H. Fagg: Machine Learning Practice 66

Page 67: CV M4 L01 - cs.ou.edu

Handling Categorical Data

Often done in stages:

• Identify the set of possible categorical values

• Transform these values into an integer index

– Order is arbitrary

• Transform the integer index into a 1-hot encoding

– Array of bits: one bit per possible index value

– For a given categorical value, only one bit is one and all others

are zeros

• Different from book: use OneHotEncoder to do all of this!

Andrew H. Fagg: Machine Learning Practice 67

Page 68: CV M4 L01 - cs.ou.edu

• Live example

Andrew H. Fagg: Machine Learning Practice 68

Page 69: CV M4 L01 - cs.ou.edu

• CV_M4_L14

Andrew H. Fagg: Machine Learning Practice 69

Page 70: CV M4 L01 - cs.ou.edu

CS/DSA 5970: Machine Learning Practice

Example: Adding Data to a DataFrame

Andrew H. Fagg: Machine Learning Practice 70

Page 71: CV M4 L01 - cs.ou.edu

Example: Adding Data to a DataFrame

Our example:

• Create a discrete label as a function of Z

• Convert discrete label to a 1-Hot Encoding

• Add these columns to the original DataFrame

Andrew H. Fagg: Machine Learning Practice 71

Page 72: CV M4 L01 - cs.ou.edu

• Live example

Andrew H. Fagg: Machine Learning Practice 72

Page 73: CV M4 L01 - cs.ou.edu

Andrew H. Fagg: Machine Learning Practice 73