Gentle introduction to Machine Learning


Uploaded by roman-orac on 07-Apr-2017


Page 1: Gentle introduction to Machine Learning


Roman Orac, 1Tap Machine Learning & Data Analysis

A Gentle introduction to Machine Learning


1Tap is a Fully Automated Accounting Platform for the Self-Employed*

* Sole traders, sole proprietors, freelancers, contractors, independents and non-incorporated businesses


The self-employed can’t buy the stuff they want

Profit…Welfare…

Taxes…

No idea

That is a problem for the new year...

Denied...

Hopefully I get better real soon...

Credit…



Our Mission

Making Self-Employment > Employment


1Tap Receipts

1. Take a photo
2. Data extracted
3. Tax return updated
4. Customers love it


The foundation of our apps

Ruby on Rails

RESTful JSON API

4.0 Code Climate GPA


Enough about us... What is Machine Learning, Anyway?


What is Machine Learning?

[Figure: training data is pre-processed and fed to a machine learning algorithm, which produces a classifier; the classifier turns new samples into predictions]

● Machine Learning is the science of getting computers to act without being explicitly programmed
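The pipeline in the figure (training data, pre-processing, algorithm, classifier, prediction) can be sketched in a few lines of Python. This is a minimal illustration only: the slides do not name an algorithm, so it assumes a k-nearest-neighbours classifier with min-max scaling as the pre-processing step, and the training points and the labels "A"/"B" are invented.

```python
import math
from collections import Counter

# Hypothetical training data: (features, label) pairs. The values are invented
# purely to illustrate the pipeline; they are not real measurements.
TRAINING_DATA = [
    ([1.0, 20.0], "A"),
    ([1.2, 22.0], "A"),
    ([3.0, 60.0], "B"),
    ([3.2, 58.0], "B"),
]

def fit_scaler(rows):
    """Pre-processing: learn each feature's (min, max) from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale(row, ranges):
    """Map each feature into [0, 1] using the ranges learned on the training data."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, ranges)]

def train(data):
    """'Learning' for k-NN is simply storing the pre-processed training samples."""
    ranges = fit_scaler([features for features, _ in data])
    samples = [(scale(features, ranges), label) for features, label in data]
    return ranges, samples

def predict(model, features, k=3):
    """Classify a new sample by majority vote among its k nearest training samples."""
    ranges, samples = model
    x = scale(features, ranges)
    nearest = sorted(samples, key=lambda s: math.dist(x, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

model = train(TRAINING_DATA)
print(predict(model, [1.1, 21.0]))  # → A
print(predict(model, [2.9, 59.0]))  # → B
```

Note that new samples go through the same scaling learned from the training data; fitting the scaler on new samples separately would put them on a different scale from the stored training points.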


Predict survival on the Titanic

In 1912 the Titanic sank, killing 1,502 out of 2,224 passengers and crew.

Some groups of people were more likely to survive than others.
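The two figures above also give a rough yardstick for any survival model: a "model" that predicts that everyone died is already right about 67.5% of the time, so a useful classifier has to beat that. This arithmetic uses only the numbers on this slide; the passenger sample a model is actually trained on may have a different survival rate.

```python
passengers_and_crew = 2224
killed = 1502

survived = passengers_and_crew - killed           # people who survived
survival_rate = survived / passengers_and_crew    # fraction who survived
baseline_accuracy = killed / passengers_and_crew  # accuracy of always predicting "died"

print(f"{survived} survived ({survival_rate:.1%})")     # → 722 survived (32.5%)
print(f"baseline accuracy: {baseline_accuracy:.1%}")    # → baseline accuracy: 67.5%
```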


Let’s look at the data

Abbreviations:
● Embarked: Port of embarkation
  ○ C = Cherbourg
  ○ Q = Queenstown
  ○ S = Southampton
● Parch: Number of parents/children aboard
● Pclass: Passenger's class
● SibSp: Number of siblings/spouses aboard
● Survived: Survived (1) or died (0)
● Ticket: Ticket number
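A minimal sketch of decoding records that use these columns, assuming the data arrives as CSV with exactly these column names. The two rows and the ticket values (HYPO-1, HYPO-2) are invented placeholders, and the derived field relatives_aboard (SibSp + Parch) is an illustrative addition, not part of the original data.

```python
import csv
import io

# Port codes used in the "Embarked" column.
PORTS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# Hypothetical two-row extract using the columns listed above; values are invented.
SAMPLE_CSV = """Survived,Pclass,SibSp,Parch,Ticket,Embarked
1,1,1,0,HYPO-1,C
0,3,0,2,HYPO-2,S
"""

def describe(row):
    """Turn one raw CSV row (all strings) into readable, typed fields."""
    return {
        "survived": row["Survived"] == "1",                        # 1 = survived, 0 = died
        "class": int(row["Pclass"]),                               # passenger class 1, 2 or 3
        "relatives_aboard": int(row["SibSp"]) + int(row["Parch"]), # siblings/spouses + parents/children
        "port": PORTS.get(row["Embarked"], "unknown"),             # decode the port letter
    }

rows = [describe(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for r in rows:
    print(r)
```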


Understanding the data
● Distributions of the fares paid by passengers who survived and by those who did not survive
● Many passengers with cheaper fares died
● Is fare a good predictive variable?
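One quick way to probe the question on the slide is to compare the fares of the two groups and the survival rate on either side of a fare threshold. This is a sketch on ten invented (fare, survived) pairs, not the real passenger list, and the threshold of 10.0 is arbitrary. A large gap hints that fare carries signal, though on real data fare is closely tied to passenger class, so it does not show that fare alone drives survival.

```python
from statistics import median

# Hypothetical (fare, survived) pairs, invented for illustration only.
passengers = [(7.25, 0), (7.9, 0), (8.05, 0), (13.0, 1), (26.0, 0),
              (53.1, 1), (71.3, 1), (7.75, 1), (8.46, 0), (30.0, 1)]

def survival_rate_by_fare(data, threshold):
    """Return (survival rate below threshold, survival rate at or above threshold)."""
    cheap = [s for fare, s in data if fare < threshold]
    pricey = [s for fare, s in data if fare >= threshold]
    return sum(cheap) / len(cheap), sum(pricey) / len(pricey)

survived_fares = [f for f, s in passengers if s == 1]
died_fares = [f for f, s in passengers if s == 0]
print(median(survived_fares), median(died_fares))  # → 30.0 8.05

cheap_rate, pricey_rate = survival_rate_by_fare(passengers, 10.0)
print(cheap_rate, pricey_rate)  # → 0.2 0.8
```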


Most Important Step: Data preprocessing

[Figure: original data is turned into preprocessed data by the preprocessing step]

● Clean the data
● Encode attributes
● Fill in missing values
● Add new attributes
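The four preprocessing steps can be sketched on a handful of passenger-like records. Every concrete choice here is an assumption for illustration, since the slide does not say how 1Tap performs each step: "cleaning" means normalising inconsistent text, missing ages get the median age, a missing port gets the most frequent port, categories are one-hot encoded, and the new attribute is a hypothetical family_size = SibSp + Parch + 1. The records are invented.

```python
from collections import Counter
from statistics import median

# Hypothetical raw records; None marks a missing value. All values are invented.
RAW = [
    {"Sex": "male", "Age": 22.0, "SibSp": 1, "Parch": 0, "Embarked": "S"},
    {"Sex": "female", "Age": None, "SibSp": 0, "Parch": 0, "Embarked": "C"},
    {"Sex": " Female ", "Age": 38.0, "SibSp": 1, "Parch": 2, "Embarked": None},
    {"Sex": "male", "Age": 35.0, "SibSp": 0, "Parch": 0, "Embarked": "S"},
]

def preprocess(records):
    # Clean the data: normalise spelling and whitespace (on copies, so the input is untouched).
    cleaned = [dict(r, Sex=r["Sex"].strip().lower()) for r in records]

    # Fill in missing values: median of the known ages, most frequent known port.
    age_fill = median(r["Age"] for r in cleaned if r["Age"] is not None)
    port_fill = Counter(r["Embarked"] for r in cleaned if r["Embarked"]).most_common(1)[0][0]

    processed = []
    for r in cleaned:
        age = r["Age"] if r["Age"] is not None else age_fill
        port = r["Embarked"] or port_fill
        processed.append({
            # Encode attributes: text categories become numbers (one-hot for the port).
            "is_female": int(r["Sex"] == "female"),
            "age": age,
            "port_C": int(port == "C"),
            "port_Q": int(port == "Q"),
            "port_S": int(port == "S"),
            # Add new attributes: hypothetical family size derived from SibSp and Parch.
            "family_size": r["SibSp"] + r["Parch"] + 1,
        })
    return processed

processed = preprocess(RAW)
for row in processed:
    print(row)
```

The fill values are computed from the data itself; on a real project they would be learned on the training set only and reused unchanged for new samples.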


Decision Tree
● Use the training set to build a decision tree model
● Use the model to predict new samples
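The two steps on this slide (build a tree from the training set, then use it to predict) can be sketched with a small CART-style tree that picks threshold splits by Gini impurity. The slide does not say which tree algorithm or library was used, so this is an assumed, simplified stand-in; the features [is_female, pclass] and the eight training rows are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (weighted impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Step 1: grow the tree recursively; each leaf stores its majority class."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, f, t = split
    left = [(row, label) for row, label in zip(X, y) if row[f] <= t]
    right = [(row, label) for row, label in zip(X, y) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Step 2: walk from the root to a leaf and return that leaf's class."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical training set: each row is [is_female, pclass]; label 1 = survived.
X = [[1, 1], [1, 1], [1, 2], [1, 3], [0, 1], [0, 3], [0, 3], [0, 2]]
y = [1, 1, 1, 0, 0, 0, 0, 0]

tree = build_tree(X, y)
print(predict(tree, [1, 1]), predict(tree, [0, 1]), predict(tree, [1, 3]))  # → 1 0 0
```

On this toy data the root splits on is_female and the female branch then splits on pclass <= 2; max_depth caps how far the tree can grow, which is the usual guard against memorising the training set.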


What types of problems do we solve with ML at 1Tap?


Receipt categorization

Initial receipt categorization:
● based on the company’s industry
● deterministic categorization
● many miscategorizations
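To make "deterministic categorization based on industry" concrete, here is a minimal sketch of such a rule. The industry names, categories and the table itself are hypothetical, not 1Tap's actual rules. The point is structural: the receipt's content never influences the result, so every receipt from a given user lands in the same category.

```python
# Hypothetical industry -> default category table (illustrative, not 1Tap's rules).
INDUSTRY_DEFAULT_CATEGORY = {
    "photography": "Equipment",
    "software development": "Software & Subscriptions",
    "construction": "Materials",
}

def categorize_deterministic(user_industry, receipt_text):
    """Category depends only on the user's industry; receipt_text is deliberately
    ignored, which is the source of the miscategorizations."""
    return INDUSTRY_DEFAULT_CATEGORY.get(user_industry, "Uncategorized")

print(categorize_deterministic("photography", "Camera lens"))          # → Equipment
print(categorize_deterministic("photography", "Train ticket return"))  # → Equipment
```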

The Numbers
● 600K categorized receipts
● 40K users
● 80K new receipts every month


Receipt categorization with ML

Categorizing receipts in a smarter, more contextual way


Receipt categorization with ML

● Features:
  ○ user’s profession
  ○ vendor name, date, expense total and receipt text
● Preprocessing:
  ○ Filter receipts
  ○ Recategorize the most obvious receipts
● Train a classifier that categorizes receipts
● This approach improves categorization because the receipt text adds context
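A minimal sketch of "train a classifier that categorizes receipts", using a multinomial naive Bayes model over words from the receipt text plus one token each for the user's profession and the vendor. Naive Bayes is a swapped-in choice for illustration (the slides do not name the classifier), the date and expense total are left out for brevity, and the four labelled receipts and their categories are invented.

```python
import math
import re
from collections import Counter, defaultdict

def features(receipt):
    """Words from the receipt text, plus prefixed profession and vendor tokens
    (the prefixes keep them from colliding with ordinary words)."""
    words = re.findall(r"[a-z0-9]+", receipt["text"].lower())
    return words + ["prof=" + receipt["profession"].lower(),
                    "vendor=" + receipt["vendor"].lower()]

class NaiveBayesCategorizer:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, receipts, categories):
        self.class_counts = Counter(categories)
        self.token_counts = defaultdict(Counter)
        for receipt, category in zip(receipts, categories):
            self.token_counts[category].update(features(receipt))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, receipt):
        total = sum(self.class_counts.values())
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in features(receipt) if t in self.vocab]
        best, best_score = None, -math.inf
        for category, n in self.class_counts.items():
            counts = self.token_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the category
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best, best_score = category, score
        return best

# Hypothetical labelled receipts, invented for illustration.
TRAIN = [
    ({"profession": "Photographer", "vendor": "Jessops", "text": "Camera lens filter"}, "Equipment"),
    ({"profession": "Photographer", "vendor": "Trainline", "text": "Train ticket return"}, "Travel"),
    ({"profession": "Developer", "vendor": "Trainline", "text": "Train ticket single"}, "Travel"),
    ({"profession": "Developer", "vendor": "PC World", "text": "Laptop charger cable"}, "Equipment"),
]

model = NaiveBayesCategorizer().fit([r for r, _ in TRAIN], [c for _, c in TRAIN])
print(model.predict({"profession": "Developer", "vendor": "Jessops", "text": "Lens cap"}))     # → Equipment
print(model.predict({"profession": "Photographer", "vendor": "Trainline", "text": "Ticket"}))  # → Travel
```

Unlike an industry lookup, the same user can get different categories for different receipts, because the text and vendor tokens shift the scores. Date and total could be added the same way, for example as bucketed tokens such as a hypothetical "total=under_50".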


Questions?


Come talk to us over pizza!

Nejc, Human Resources

Roman, Machine Learning

Vesna, Head of Product