[@IndeedEng] Imhotep Workshop
DESCRIPTION
Link to video: http://youtu.be/LBDZFtqL-ck?list=UURVEh0SlyrZNTeIbEDwj3wQ
We are excited to announce the open source availability of Imhotep, the interactive data analytics platform that powers data-driven decision making at Indeed. In a previous talk, we explained how we developed Imhotep, a distributed system for building decision trees for machine learning. We went on to describe how we build large scale interactive analytics tools using the same platform. Next we showed how our engineering and product organizations use Imhotep to focus on key metrics at scale. During this session, Product Manager Tom Bergman provided examples of valuable insights that can be gained by using Imhotep. After the presentation, attendees explored their own data in Imhotep. Product engineers were on hand to answer questions.
TRANSCRIPT
Imhotep Workshop
http://go.indeed.com/iws
engineering.indeed.com/talks
@IndeedEng Workshop: Interactive Analytics with Imhotep
Tom Bergman, Product Manager
Is anybody there???
Does this thing work???
Are we winning???
Harder questions???
Imhotep is Indeed’s highly scalable
open-source analytics platform.
What is Imhotep?
● Imhotep Daemons
● Imhotep Query Language (IQL)
● IQL Web Client
● TSV/CSV Uploader
Imhotep open source included:
● Easy upload & compression
● Fast, Interactive queries
What does Imhotep do?
Imhotep Philosophy
● Quickly refine your questions
Interactive
SOME TIME LATER…
Time to the right question
Oh, bummer. Wrong question. Let’s try again.
Nope. Nope. YES!
Next question?
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
Ask all the questions!
Cool! Really?
Wow... Awesome
Oh… Ah! INSIGHT! …
● Data should not be down-sampled
Ground Truth
● Web-based to facilitate sharing
Show me the data
● Instantaneous sharing
Cache Rules Everything Around Me
● Easily queryable
Easy Access
● Dataset >
● Document >
● Field >
● Term >
Imhotep Data Structures
● DB Table
● Denormalized Row
● Column
● Value
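The two lists above pair each Imhotep concept with a relational counterpart: a dataset is like a DB table, a document like a denormalized row, a field like a column, and a term like a value. A minimal Python sketch of that hierarchy follows. The class names, method names, and toy data are illustrative assumptions, not Imhotep's API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One denormalized row: maps field name -> term (value)."""
    fields: dict


@dataclass
class Dataset:
    """Analogous to a DB table: a named collection of documents."""
    name: str
    documents: list = field(default_factory=list)

    def terms(self, field_name):
        """Return the distinct terms (values) seen for one field (column)."""
        return {
            doc.fields[field_name]
            for doc in self.documents
            if field_name in doc.fields
        }


# Hypothetical toy data, not taken from the talk.
searchresults = Dataset("searchresults", [
    Document({"country": "ie", "clicked": 1}),
    Document({"country": "us", "clicked": 0}),
    Document({"country": "ie", "clicked": 0}),
])

print(searchresults.terms("country"))
```

Reading it top-down mirrors the slide: one dataset holds many documents, each document holds fields, and each field holds a term.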
Imhotep Query Language (IQL)
IQL - Imhotep Query Language
Expressive SQL-like language for aggregate analytics.
IQL queries - requirements
Dataset, Date range
IQL queries - optional
Dataset, Date range
Filters, Group by, Metrics
from searchresults
'2013-12-05'
'2013-12-10'
where country=ie
and jobagedays<1
group by time(1d)
select clicked, count()
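The query above has five parts: a required dataset and date range, then optional filters, group by, and metrics. As a rough illustration of how those parts fit together, the sketch below assembles the same query string from its pieces. `build_iql` is a hypothetical helper written for this example, not part of Imhotep, and the quoting of dates simply mirrors the slide.

```python
def build_iql(dataset, start, end, filters=(), group_by=(), metrics=()):
    """Assemble an IQL-style query string from its five parts.

    dataset, start and end are required; the other parts are optional.
    """
    parts = [f"from {dataset} '{start}' '{end}'"]
    if filters:
        parts.append("where " + " and ".join(filters))
    if group_by:
        parts.append("group by " + ", ".join(group_by))
    if metrics:
        parts.append("select " + ", ".join(metrics))
    return " ".join(parts)


query = build_iql(
    "searchresults", "2013-12-05", "2013-12-10",
    filters=["country=ie", "jobagedays<1"],
    group_by=["time(1d)"],
    metrics=["clicked", "count()"],
)
print(query)
# from searchresults '2013-12-05' '2013-12-10' where country=ie and jobagedays<1 group by time(1d) select clicked, count()
```

Calling the helper with only the three required arguments yields just the `from` clause, which matches the "requirements" slide.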
IQL - Dataset
Dataset
from searchresults
'2013-12-05'
'2013-12-10'
where country=ie
and jobagedays<1
group by time(1d)
select clicked, count()
IQL - Date Range
Date Range
from searchresults
'2013-12-05'
'2013-12-10'
where country=ie
and jobagedays<1
group by time(1d)
select clicked, count()
IQL - Filters
Filters
from searchresults
'2013-12-05'
'2013-12-10'
where country=ie
and jobagedays<1
group by time(1d)
select clicked, count()
IQL - Group by
Groups
IQL - Metrics
from searchresults
'2013-12-05'
'2013-12-10'
where country=ie
and jobagedays<1
group by time(1d)
select clicked, count()
Metrics
IQL - Example Results
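To make the meaning of the example query concrete, here is a plain-Python sketch of what it computes: restrict to the date range, apply the filters, bucket by day, then sum `clicked` and count documents per bucket. The documents are hypothetical toy data, the end date is assumed to be exclusive, and days with no matching documents are simply omitted. This is not how Imhotep executes queries, only an illustration of the semantics.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical documents; field names follow the query on the slides.
docs = [
    {"time": datetime(2013, 12, 5, 9, 30), "country": "ie", "jobagedays": 0, "clicked": 1},
    {"time": datetime(2013, 12, 5, 14, 0), "country": "ie", "jobagedays": 0, "clicked": 0},
    {"time": datetime(2013, 12, 6, 8, 15), "country": "ie", "jobagedays": 0, "clicked": 1},
    {"time": datetime(2013, 12, 6, 11, 0), "country": "us", "jobagedays": 0, "clicked": 1},   # wrong country
    {"time": datetime(2013, 12, 7, 10, 0), "country": "ie", "jobagedays": 3, "clicked": 1},   # job too old
    {"time": datetime(2013, 12, 11, 10, 0), "country": "ie", "jobagedays": 0, "clicked": 1},  # outside range
]


def run_query(docs, start, end):
    """Return {day: (sum of clicked, document count)} for matching documents."""
    groups = defaultdict(lambda: [0, 0])
    for doc in docs:
        if not (start <= doc["time"] < end):                     # date range (end assumed exclusive)
            continue
        if doc["country"] != "ie" or doc["jobagedays"] >= 1:     # where country=ie and jobagedays<1
            continue
        bucket = groups[doc["time"].date()]                      # group by time(1d)
        bucket[0] += doc["clicked"]                              # select clicked
        bucket[1] += 1                                           # select count()
    return {day: tuple(totals) for day, totals in sorted(groups.items())}


results = run_query(docs, datetime(2013, 12, 5), datetime(2013, 12, 10))
for day, (clicked, count) in results.items():
    print(day, clicked, count)
# 2013-12-05 1 2
# 2013-12-06 1 1
```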
Extract Transform Load (ETL)
● Depends on your data schema
Extract
● De-Normalize Data
● Formatting
Transform
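As an illustration of the transform step, the sketch below denormalizes two small source tables so that every output row carries all the attributes it needs, and tidies formatting along the way. The table layout, field names, and values are hypothetical; a real transform depends on your own schema.

```python
# Hypothetical normalized source data (for example, from a relational DB).
jobs = {
    101: {"title": "Nurse ", "country": "IE"},
    102: {"title": "Engineer", "country": "us"},
}
clicks = [
    {"unixtime": 1386201600, "job_id": 101},
    {"unixtime": 1386205200, "job_id": 102},
]


def denormalize(clicks, jobs):
    """Copy job attributes onto every click so each row stands alone."""
    rows = []
    for click in clicks:
        job = jobs[click["job_id"]]
        rows.append({
            "unixtime": click["unixtime"],
            "jobid": click["job_id"],
            # Formatting: trim whitespace and normalize case.
            "title": job["title"].strip().lower(),
            "country": job["country"].strip().lower(),
        })
    return rows


rows = denormalize(clicks, jobs)
for row in rows:
    print(row)
```

The trade-off is the usual one for denormalized data: rows repeat the job attributes, but queries no longer need a join.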
● Jobsearch
● Ad Clicks
● Resume Contacts
Example Datasets
● TSV/CSV Uploader
● Java API
Load
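For the TSV route, the data first has to be written out as tab-separated text. The sketch below does that with Python's standard `csv` module. The header row of field names and the `unixtime` column are assumptions made for this example; check the uploader documentation for the file naming and timestamp conventions it actually requires.

```python
import csv
import io


def to_tsv(rows, fieldnames):
    """Serialize a list of dict rows to TSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical rows, already denormalized.
rows = [
    {"unixtime": 1386201600, "country": "ie", "clicked": 1},
    {"unixtime": 1386205200, "country": "us", "clicked": 0},
]
tsv_text = to_tsv(rows, ["unixtime", "country", "clicked"])
print(tsv_text, end="")
```

Writing to an in-memory buffer keeps the example self-contained; passing an open file object to `csv.DictWriter` instead would write the same content to disk.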
● Massive compression
● Fast Boolean Search
Inverted Index
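A generic inverted index can be sketched in a few lines: each (field, term) pair maps to a sorted list of the documents that contain it, and a Boolean AND becomes an intersection of two such lists. Sorted id lists also tend to compress well, for example by storing the gaps between ids. The code below is a textbook illustration with toy data, not Imhotep's actual index format.

```python
from collections import defaultdict


def build_index(docs):
    """Map (field, term) -> sorted list of ids of documents containing it."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for field_name, term in doc.items():
            index[(field_name, term)].append(doc_id)  # ids arrive in increasing order
    return index


def intersect(a, b):
    """Boolean AND of two sorted doc-id lists using a linear merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical documents.
docs = [
    {"country": "ie", "clicked": 1},
    {"country": "us", "clicked": 1},
    {"country": "ie", "clicked": 0},
    {"country": "ie", "clicked": 1},
]
index = build_index(docs)
matches = intersect(index[("country", "ie")], index[("clicked", 1)])
print(matches)  # [0, 3]
```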
● Runs on any* computer(s)
● Can deploy to AWS using a handy AWS CloudFormation script
Open Source Package
● Create S3 buckets
● Create EC2 Key Pair
● Run CloudFormation script
CloudFormation Setup
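The three setup steps can also be scripted. The sketch below assumes boto3 clients are passed in by the caller and uses only the standard boto3 calls `create_bucket`, `create_key_pair`, and `create_stack`. The template URL and the template's parameter names are not given in the slides, so they are left for the caller to supply from the Imhotep documentation; `provision` itself is a hypothetical helper.

```python
def provision(s3, ec2, cfn, bucket_names, key_name, stack_name,
              template_url, parameters):
    """Run the three setup steps with caller-supplied boto3 clients.

    template_url and parameters must come from the Imhotep documentation;
    nothing here guesses the template's real parameter names.
    """
    # 1. Create S3 buckets. Outside us-east-1, create_bucket also needs
    #    a CreateBucketConfiguration with the region.
    for name in bucket_names:
        s3.create_bucket(Bucket=name)

    # 2. Create an EC2 key pair. The private key is returned only once.
    key = ec2.create_key_pair(KeyName=key_name)

    # 3. Launch the CloudFormation stack from the template.
    stack = cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=[
            {"ParameterKey": k, "ParameterValue": v}
            for k, v in parameters.items()
        ],
    )
    return key["KeyMaterial"], stack["StackId"]


# Usage sketch (requires boto3 and AWS credentials):
#   import boto3
#   provision(boto3.client("s3"), boto3.client("ec2"),
#             boto3.client("cloudformation"), ...)
```

Taking the clients as arguments keeps the function free of any hard-coded region or credentials and makes it easy to exercise with stand-in objects.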
DEMO
Q&A
Helpful Workshop Links
http://go.indeed.com/iws