A Gentle Introduction to Machine Learning
Post on 07-Apr-2017
Roman Orac, 1Tap Machine Learning & Data Analysis
A Gentle Introduction to Machine Learning
1Tap is a Fully Automated Accounting Platform
For the Self Employed*
* Sole Trader, Sole Proprietor, Freelancer, Contractor, Independent, Non Incorporated Businesses
The Self Employed can’t buy the stuff they want
Profit… No idea
Taxes… That is a problem for the new year...
Welfare… Hopefully I get better real soon...
Credit… Denied...
Making Self Employment > Employment
Our Mission
1Tap Receipts
1. Take a photo
2. Data Extracted
3. Tax Return updated
4. Customers Love it
The foundation of our apps
Ruby on Rails
Restful JSON API
4.0 Code Climate GPA
Enough about us… What is Machine Learning Anyway?
What is Machine Learning?
[Diagram: training data is pre-processed and passed to a Machine Learning algorithm, which produces a classifier; the classifier turns new samples into predictions.]
● Machine Learning is the science of getting computers to act without being explicitly programmed
Predict survival on the Titanic
In 1912 the Titanic sank, killing 1,502 out of 2,224 passengers and crew.
Some groups of people were more likely to survive than others.
Let’s look at the data
Abbreviations
● Embarked: Port of embarkation
○ C = Cherbourg
○ Q = Queenstown
○ S = Southampton
● Parch: Number of parents/children aboard
● Pclass: Passenger's class
● SibSp: Number of siblings/spouses aboard
● Survived: Survived (1) or died (0)
● Ticket: Ticket number
Understanding the data
● Distributions of the fare of passengers who survived or did not survive
● Many passengers with cheaper fares died
● Is fare a good predictive variable?
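One quick way to probe the question above is to compare a summary statistic of the fare for the two outcome groups. The sketch below is only an illustration: the rows are invented toy values (not real Titanic records), the column name `Fare` is an assumption, and the helper `median_fare_by_outcome` is hypothetical.

```python
from statistics import median

# Toy rows invented for illustration; these are NOT real Titanic records.
passengers = [
    {"Fare": 7.25, "Survived": 0},
    {"Fare": 8.05, "Survived": 0},
    {"Fare": 13.00, "Survived": 0},
    {"Fare": 26.00, "Survived": 1},
    {"Fare": 53.10, "Survived": 1},
    {"Fare": 71.28, "Survived": 1},
]

def median_fare_by_outcome(rows):
    """Group fares by the Survived flag and return the median fare of each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["Survived"], []).append(row["Fare"])
    return {outcome: median(fares) for outcome, fares in groups.items()}

summary = median_fare_by_outcome(passengers)
print(summary)
```

If the two medians are far apart on the real data, fare is likely to carry some predictive signal; if they are close, it probably adds little on its own.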
Most Important Step: Data preprocessing
Original data → preprocessing → Preprocessed data
● Clean the data
● Encode attributes
● Fill in missing values
● Add new attributes
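The four preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration under assumptions, not the preprocessing used in the talk: the rows are invented, the `Fare` column name, the numeric codes for the ports, the median fill strategy and the derived `FamilySize` attribute are all choices made for this example.

```python
from statistics import median

# Arbitrary numeric codes for the ports of embarkation (an assumption of this sketch).
EMBARKED_CODES = {"C": 0, "Q": 1, "S": 2}

def preprocess(rows):
    """Clean, encode, fill in and extend a list of passenger dicts."""
    # Fill in missing values: use the median of the fares that are known.
    known_fares = [r["Fare"] for r in rows if r["Fare"] is not None]
    fare_fill = median(known_fares)
    processed = []
    for r in rows:
        # Clean: normalise the port letter; the Ticket identifier is dropped.
        port = r["Embarked"].strip().upper()
        processed.append({
            "Pclass": r["Pclass"],
            # Encode: map the port letter to a number.
            "Embarked": EMBARKED_CODES[port],
            "Fare": r["Fare"] if r["Fare"] is not None else fare_fill,
            # Add a new attribute: the passenger plus relatives aboard.
            "FamilySize": r["SibSp"] + r["Parch"] + 1,
        })
    return processed

# Toy rows invented for illustration; these are NOT real Titanic records.
raw = [
    {"Pclass": 3, "Embarked": "S", "Fare": 7.25, "SibSp": 1, "Parch": 0, "Ticket": "T-001"},
    {"Pclass": 1, "Embarked": " c", "Fare": None, "SibSp": 1, "Parch": 0, "Ticket": "T-002"},
    {"Pclass": 2, "Embarked": "q", "Fare": 13.0, "SibSp": 0, "Parch": 2, "Ticket": "T-003"},
]
processed = preprocess(raw)
print(processed)
```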
Decision Tree
● Use training set and build a decision tree model
● Use the model to predict new samples
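The two steps above (build a model from a training set, then predict new samples) can be illustrated with a very small decision tree written from scratch. This is a sketch under assumptions: it splits on Gini impurity, which the slides do not specify, the features (`Pclass`, fare) and the training rows are invented toy values, and all function names are hypothetical. In practice a library implementation would normally be used instead.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (feature index, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursively build a tree; leaves are class labels, nodes are tuples."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, l) for r, l in zip(X, y) if r[f] <= t]
    right = [(r, l) for r, l in zip(X, y) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Toy training set invented for illustration: [Pclass, fare] -> Survived.
X_train = [[3, 7.25], [3, 8.05], [2, 13.0], [2, 26.0], [1, 53.1], [1, 71.28]]
y_train = [0, 0, 0, 1, 1, 1]

tree = build_tree(X_train, y_train)
print(tree)
print(predict(tree, [3, 7.9]), predict(tree, [1, 80.0]))
```

On this toy data a single split on the fare separates the two classes, so the tree has one decision node and two leaves.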
What types of problems do we solve with ML at 1Tap?
Receipt categorization
Initial receipt categorization
● Based on the company’s industry
● Deterministic categorization
● Many miscategorizations
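A deterministic, industry-based rule can be pictured as a fixed lookup table. The sketch below assumes that "company" means the vendor that issued the receipt; the industries, categories and the function name are invented for illustration and are not 1Tap's actual rules.

```python
# Hypothetical industry-to-category table; all names are invented for illustration.
CATEGORY_BY_INDUSTRY = {
    "fuel station": "Travel",
    "supermarket": "Food",
    "electronics store": "Equipment",
}

def categorize_by_industry(industry):
    """Deterministic rule: the vendor's industry alone decides the category."""
    return CATEGORY_BY_INDUSTRY.get(industry, "Uncategorized")

# The rule never looks at the receipt itself, so printer paper bought at a
# supermarket is still labelled "Food".
print(categorize_by_industry("supermarket"))
```

Because the same industry always maps to the same category, any purchase that does not fit the vendor's typical category is miscategorized.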
The Numbers
● 600K categorized receipts
● 40K users
● 80K new receipts every month
Receipt categorization with ML
Categorizing receipts in a smarter and more contextual way
● Features:
○ user’s profession
○ vendor name, date, expense total and text
● Preprocessing:
○ Filter receipts
○ Recategorize most obvious receipts
● Train a classifier that categorizes receipts
● This approach improves categorization as receipt text adds more context
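To make the idea concrete, here is a minimal sketch of a receipt classifier. The slides do not say which algorithm is used, so this example swaps in a multinomial Naive Bayes with add-one smoothing over a bag of words. It uses only the profession, vendor name and receipt text (date and expense total are ignored), and the receipts, vendors, categories and function names are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

def tokens(receipt):
    """Turn a receipt into tokens; the profession is prefixed to keep it distinct."""
    words = (receipt["vendor"] + " " + receipt["text"]).lower().split()
    return ["profession=" + receipt["profession"].lower()] + words

def train(receipts, labels):
    """Count how often each category and each (category, token) pair occurs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for receipt, label in zip(receipts, labels):
        word_counts[label].update(tokens(receipt))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def classify(model, receipt):
    """Return the category with the highest log-probability for the receipt."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens(receipt):
            if w in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training receipts invented for illustration.
train_receipts = [
    {"profession": "Taxi driver", "vendor": "FuelStop", "text": "unleaded fuel 40 litres"},
    {"profession": "Taxi driver", "vendor": "QuickWash", "text": "car wash"},
    {"profession": "Designer", "vendor": "PaperCo", "text": "printer paper ink"},
    {"profession": "Designer", "vendor": "CafeUno", "text": "coffee sandwich"},
]
train_labels = ["Travel", "Travel", "Office", "Meals"]

model = train(train_receipts, train_labels)
new_receipt = {"profession": "Designer", "vendor": "InkWorld", "text": "ink cartridges"}
print(classify(model, new_receipt))
```

The vendor "InkWorld" never appears in the training data, yet the word "ink" in the receipt text still points to the right category, which is the kind of extra context the receipt text provides.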
Questions?
Come talk to us over pizza!
Nejc, Human Resources
Roman, Machine Learning
Vesna, Head of Product