A Gentle Introduction to Machine Learning

Post on 07-Apr-2017

Category:

Technology

TRANSCRIPT

Roman Orac, 1Tap Machine Learning & Data Analysis

A Gentle Introduction to Machine Learning

1Tap is a Fully Automated Accounting Platform

For the Self Employed*

* Sole Trader, Sole Proprietor, Freelancer, Contractor, Independent, Non Incorporated Businesses

The Self Employed can’t buy the stuff they want

Profit…Welfare…

Taxes…

No idea

That is a problem for the new year...

Denied...

Hopefully I get better real soon...

Credit…

Making Self Employment > Employment

Our Mission

1Tap Receipts

1. Take a photo

2. Data Extracted

3. Tax Return updated

4. Customers Love it

The foundation of our apps

Ruby on Rails

Restful JSON API

4.0 Code Climate GPA

Enough about us… What is Machine Learning Anyway?

What is Machine Learning?

[Diagram: Training data, Pre-processing, Machine Learning algorithm, Classifier, New samples, Prediction]

● Machine Learning is the science of getting computers to act without being explicitly programmed

Predict survival on the Titanic

In 1912 the Titanic sank, killing 1,502 out of 2,224 passengers and crew.

Some groups of people were more likely to survive than others.
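To put the two figures on the slide in proportion, here is a small worked calculation. It uses only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted on the slide.
deaths = 1502
aboard = 2224

survivors = aboard - deaths
death_rate = deaths / aboard
survival_rate = survivors / aboard

print(f"survivors: {survivors}")
print(f"death rate: {death_rate:.1%}")
print(f"survival rate: {survival_rate:.1%}")
```

About two in three people aboard died, so a model that always predicts "died" would already be right roughly two thirds of the time on the full passenger list. A useful classifier has to beat that baseline.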

Let’s look at the data

Abbreviations

● Embarked: Port of embarkation
  ○ C = Cherbourg
  ○ Q = Queenstown
  ○ S = Southampton
● Parch: Number of parents/children aboard
● Pclass: Passenger's class
● SibSp: Number of siblings/spouses aboard
● Survived: Survived (1) or died (0)
● Ticket: Ticket number

Understanding the data

● Distributions of the fare of passengers who survived or did not survive
● Many passengers with cheaper fares died
● Is fare a good predictive variable?
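One simple way to probe the fare question is to compare a summary statistic of the fare across the two outcomes. The sketch below does this with the median. The rows are made up for illustration and are not real Titanic records; the function name is hypothetical.

```python
from statistics import median

# Made-up illustrative rows, NOT real Titanic records: (fare, survived).
passengers = [
    (7.25, 0), (8.05, 0), (7.90, 0), (13.00, 0), (26.55, 0),
    (7.75, 1), (26.00, 1), (53.10, 1), (71.28, 1), (80.00, 1),
]

def fares_by_outcome(rows):
    """Split fares into two lists keyed by the Survived label (0 or 1)."""
    groups = {0: [], 1: []}
    for fare, survived in rows:
        groups[survived].append(fare)
    return groups

groups = fares_by_outcome(passengers)
median_died = median(groups[0])
median_survived = median(groups[1])

print(f"median fare, died:     {median_died:.2f}")
print(f"median fare, survived: {median_survived:.2f}")
```

If the two medians sit far apart on real data, fare carries some signal. It may still be a proxy for another attribute such as Pclass, so it is worth checking both together.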

Most Important Step: Data preprocessing

[Diagram: Original data → preprocessing → Preprocessed data]

● Clean the data
● Encode attributes
● Fill in missing values
● Add new attributes
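The four preprocessing steps can be sketched on a handful of rows. This is a minimal illustration under stated assumptions, not the pipeline used in the talk: the rows are made up, the "Age" column is assumed (it is not in the abbreviation list above), filling a missing port with "S" and a missing age with the median are arbitrary choices, and "FamilySize" is a hypothetical new attribute.

```python
from statistics import median

# Made-up rows for illustration. "Age" is an assumed column.
raw_rows = [
    {"Pclass": 3, "Age": 22.0, "SibSp": 1, "Parch": 0, "Embarked": "S"},
    {"Pclass": 1, "Age": None, "SibSp": 1, "Parch": 0, "Embarked": "C"},
    {"Pclass": 3, "Age": 26.0, "SibSp": 0, "Parch": 2, "Embarked": " q "},
    {"Pclass": 2, "Age": 35.0, "SibSp": 0, "Parch": 0, "Embarked": None},
]

# Encode attributes: map each port letter to an integer code.
EMBARKED_CODES = {"C": 0, "Q": 1, "S": 2}

def preprocess(rows):
    """Clean, encode, fill in missing values, and add one new attribute."""
    known_ages = [r["Age"] for r in rows if r["Age"] is not None]
    age_fill = median(known_ages)  # fill value for missing ages
    cleaned = []
    for r in rows:
        # Clean stray whitespace and case; a missing port falls back to "S" (assumption).
        port = (r["Embarked"] or "S").strip().upper()
        cleaned.append({
            "Pclass": r["Pclass"],
            "Age": r["Age"] if r["Age"] is not None else age_fill,
            "Embarked": EMBARKED_CODES[port],
            # New attribute: the passenger plus siblings/spouses and parents/children.
            "FamilySize": r["SibSp"] + r["Parch"] + 1,
        })
    return cleaned

processed = preprocess(raw_rows)
for row in processed:
    print(row)
```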

Decision Tree

● Use training set and build a decision tree model
● Use the model to predict new samples
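The two steps, build from a training set and then predict new samples, can be sketched with a tiny hand-written decision tree. This is an assumption-laden illustration, not the code from the talk: it splits on thresholds chosen by Gini impurity, the training rows are made up, and the two features (Pclass, fare) are chosen only for the example. In practice a library implementation would normally be used instead.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree; leaves hold the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Made-up training set, NOT real Titanic records: [Pclass, fare] -> Survived.
train_rows = [[3, 7.25], [3, 8.05], [3, 7.90], [2, 13.00],
              [1, 53.10], [1, 71.28], [2, 26.00], [1, 80.00]]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]

tree = build_tree(train_rows, train_labels)
print(tree)
print(predict(tree, [3, 7.50]), predict(tree, [1, 60.0]))
```

On this toy data a single split on fare separates the two classes. Real data is far noisier, which is why the depth limit and a held-out test set matter.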

What types of problems do we solve with ML at 1Tap?

Receipt categorization

● Initial receipt categorization based on company’s industry
● Deterministic categorization
● Many mis-categorizations
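A deterministic, industry-based rule can be pictured as a fixed lookup table. The sketch below is hypothetical throughout: the slide does not say which company's industry is meant (the vendor's or the user's business), so the example assumes the vendor's, and the industries, categories and function name are invented for illustration.

```python
# Hypothetical mapping from a company's industry to a default expense category.
DEFAULT_CATEGORY_BY_INDUSTRY = {
    "petrol station": "Travel",
    "supermarket": "Food",
    "stationery shop": "Office supplies",
}

def categorize_by_industry(industry, fallback="Uncategorized"):
    """Return the fixed category for an industry, ignoring what was actually bought."""
    return DEFAULT_CATEGORY_BY_INDUSTRY.get(industry.strip().lower(), fallback)

# Printer paper bought at a supermarket is still filed under "Food".
print(categorize_by_industry("Supermarket"))
print(categorize_by_industry("Hardware store"))
```

Because the rule never looks at the receipt itself, every receipt from the same kind of company gets the same category, which is one plausible source of the mis-categorizations mentioned above.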

The Numbers

● 600K categorized receipts
● 40K users
● 80K new receipts every month

Receipt categorization with ML

Categorizing receipts in a smarter and more contextual way

● Features:
  ○ user’s profession
  ○ vendor name, date, expense total and text
● Preprocessing:
  ○ Filter receipts
  ○ Recategorize most obvious receipts

● Train a classifier that categorizes receipts

● This approach improves categorization as receipt text adds more context
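To make the idea of a text-aware classifier concrete, here is a minimal sketch using a multinomial naive Bayes model over receipt words. The slides do not name the algorithm 1Tap uses, so naive Bayes is a stand-in chosen for brevity. The receipts, categories and function names are all hypothetical, and the date and expense total features are left out to keep the example short.

```python
import math
from collections import Counter, defaultdict

def tokens(receipt):
    """Turn a receipt into a bag of lowercase tokens: vendor, text words, profession."""
    words = (receipt["vendor"] + " " + receipt["text"]).lower().split()
    return words + ["profession=" + receipt["profession"].lower()]

def train(receipts, categories):
    """Count tokens per category; these counts are the whole model."""
    token_counts = defaultdict(Counter)
    category_counts = Counter(categories)
    vocab = set()
    for receipt, category in zip(receipts, categories):
        for tok in tokens(receipt):
            token_counts[category][tok] += 1
            vocab.add(tok)
    return token_counts, category_counts, vocab

def categorize(model, receipt):
    """Pick the category with the highest log-probability (add-one smoothing)."""
    token_counts, category_counts, vocab = model
    total = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n in category_counts.items():
        score = math.log(n / total)
        denom = sum(token_counts[category].values()) + len(vocab)
        for tok in tokens(receipt):
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((token_counts[category][tok] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical training receipts, invented for illustration.
training_receipts = [
    {"profession": "Taxi driver", "vendor": "Shell", "text": "unleaded fuel litres"},
    {"profession": "Taxi driver", "vendor": "BP", "text": "diesel fuel"},
    {"profession": "Designer", "vendor": "Staples", "text": "printer paper ink"},
    {"profession": "Designer", "vendor": "Ryman", "text": "paper notebooks pens"},
]
training_categories = ["Travel", "Travel", "Office supplies", "Office supplies"]

model = train(training_receipts, training_categories)
new_receipt = {"profession": "Designer", "vendor": "Tesco", "text": "printer paper"}
print(categorize(model, new_receipt))
```

The vendor "Tesco" never appears in the training receipts, yet the words "printer" and "paper" plus the profession still point to a category. That is the kind of extra context the receipt text contributes.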

Receipt categorization with ML

Questions?

Come talk to us over pizza!

Nejc, Human Resources

Roman, Machine Learning

Vesna, Head of Product
