Imhotep Workshop
http://go.indeed.com/iws
engineering.indeed.com/talks
@IndeedEng Workshop: Interactive Analytics with Imhotep
Tom Bergman, Product Manager
Is anybody there???
Does this thing work???
Are we winning???
Harder questions???
Imhotep is Indeed’s highly scalable
open-source analytics platform.
What is Imhotep?
● Imhotep Daemons
● Imhotep Query Language (IQL)
● IQL Web Client
● TSV/CSV Uploader
Imhotep open source included:
● Easy upload & compression
● Fast, Interactive queries
What does Imhotep do?
Imhotep Philosophy
● Quickly refine your questions
Interactive
SOME TIME LATER…
Time to the right question
Oh, bummer. Wrong question. Let’s try again.
Nope. Nope. YES!
Next question?
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
Ask all the questions!
Cool! Really?
Wow... Awesome
Oh… Ah! INSIGHT! …
● Data should not be down-sampled
Ground Truth
● Web-based to facilitate sharing
Show me the data
● Instantaneous sharing
Cache Rules Everything Around Me
● Easily queryable
Easy Access
● Dataset >
● Document >
● Field >
● Term >
Imhotep Data Structures
● DB Table
● Denormalized Row
● Column
● Value
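The mapping above (dataset, document, field, term) can be modeled in a few lines of Python. This is only an illustrative sketch, not Imhotep's storage format or API; the `Dataset` class, its methods, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Illustrative model only: a dataset (like a DB table) holds documents."""
    name: str
    # Each document is a denormalized row: a mapping of field -> term.
    documents: list = field(default_factory=list)

    def add(self, **fields):
        """Append one document built from keyword arguments."""
        self.documents.append(dict(fields))

    def terms(self, field_name):
        """Distinct terms (values) that occur in one field (column)."""
        return sorted({doc[field_name] for doc in self.documents if field_name in doc})


# Invented sample documents, using field names from the example query.
searchresults = Dataset("searchresults")
searchresults.add(country="ie", jobagedays=0, clicked=1)
searchresults.add(country="us", jobagedays=3, clicked=0)
searchresults.add(country="ie", jobagedays=2, clicked=0)

print(searchresults.terms("country"))  # ['ie', 'us']
```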
Imhotep Query Language (IQL)
IQL - Imhotep Query Language
Expressive SQL-like language for aggregate analytics.
IQL queries - requirements
Dataset, Date range
IQL queries - optional
Dataset, Date range
Filters, Group by, Metrics
from searchresults
'2013-12-05'
'2013-12-10'
where country=ie
and jobagedays<1
group by time(1d)
select clicked, count()
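The split into required parts (dataset, date range) and optional parts (filters, group by, metrics) can be sketched as a small string builder. `build_iql` is a hypothetical helper written for this illustration and is not part of Imhotep; it simply emits the clauses in the order shown on the slide.

```python
def build_iql(dataset, start, end, where=None, group_by=None, select=None):
    """Assemble an IQL query string (hypothetical helper).

    dataset, start and end are required; the other clauses are optional.
    """
    parts = [f"from {dataset} '{start}' '{end}'"]
    if where:
        parts.append("where " + " and ".join(where))
    if group_by:
        parts.append("group by " + ", ".join(group_by))
    if select:
        parts.append("select " + ", ".join(select))
    return " ".join(parts)


query = build_iql(
    "searchresults", "2013-12-05", "2013-12-10",
    where=["country=ie", "jobagedays<1"],
    group_by=["time(1d)"],
    select=["clicked", "count()"],
)
print(query)
```

Calling `build_iql("searchresults", "2013-12-05", "2013-12-10")` with no optional arguments yields just the `from` clause, which mirrors the "requirements" slide.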
IQL - Dataset
Dataset
from searchresults
'2013-12-05'
'2013-12-10'
where country=ie
and jobagedays<1
group by time(1d)
select clicked, count()
IQL - Date Range
Date Range
from searchresults
'2013-12-05'
'2013-12-10'
where country=ie
and jobagedays<1
group by time(1d)
select clicked, count()
IQL - Filters
Filters
from searchresults
'2013-12-05'
'2013-12-10'
where country=ie
and jobagedays<1
group by time(1d)
select clicked, count()
IQL - Group by
Groups
IQL - Metrics
from searchresults
'2013-12-05'
'2013-12-10'
where country=ie
and jobagedays<1
group by time(1d)
select clicked, count()
Metrics
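To make the roles of filters, groups, and metrics concrete, here is a rough plain-Python equivalent of the example query. It is a sketch under stated assumptions, not how Imhotep executes queries: the documents are invented, the `time` field name is assumed, the end date is treated as exclusive, and `select clicked` is read as the sum of `clicked` per group. Unlike a real time bucketing, it only reports days that have matching documents.

```python
from collections import defaultdict

# Invented sample documents; "time" is an assumed field name.
docs = [
    {"time": "2013-12-05T09:00:00", "country": "ie", "jobagedays": 0, "clicked": 1},
    {"time": "2013-12-05T10:30:00", "country": "ie", "jobagedays": 0, "clicked": 0},
    {"time": "2013-12-05T11:00:00", "country": "us", "jobagedays": 0, "clicked": 1},
    {"time": "2013-12-06T08:15:00", "country": "ie", "jobagedays": 3, "clicked": 1},
    {"time": "2013-12-06T12:00:00", "country": "ie", "jobagedays": 0, "clicked": 1},
    {"time": "2013-12-10T00:05:00", "country": "ie", "jobagedays": 0, "clicked": 1},
]


def run_example_query(docs, start, end):
    """Emulate: where country=ie and jobagedays<1 group by time(1d)
    select clicked, count()."""
    groups = defaultdict(lambda: [0, 0])  # day -> [sum of clicked, count()]
    for doc in docs:
        day = doc["time"][:10]  # group by time(1d): bucket by calendar day
        # Date range (end treated as exclusive, an assumption).
        if not (start <= day < end):
            continue
        # Filters.
        if doc["country"] != "ie" or doc["jobagedays"] >= 1:
            continue
        # Metrics.
        groups[day][0] += doc["clicked"]
        groups[day][1] += 1
    return {day: tuple(values) for day, values in sorted(groups.items())}


result = run_example_query(docs, "2013-12-05", "2013-12-10")
for day, (clicked, count) in result.items():
    print(day, clicked, count)
```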
IQL - Example Results
Extract Transform Load (ETL)
● Up to your data schema
Extract
● Denormalize data
● Formatting
Transform
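A minimal sketch of the transform step: joining normalized records into flat rows and formatting them as TSV. The table contents, field names, and the header-row layout are assumptions made for illustration; check the uploader documentation for the exact file conventions it expects.

```python
import csv
import io

# Invented normalized source data: a jobs lookup and click events.
jobs = {
    101: {"title": "nurse", "country": "ie"},
    102: {"title": "driver", "country": "us"},
}
events = [
    {"time": 1386230400, "job_id": 101, "clicked": 1},  # Unix timestamps (seconds)
    {"time": 1386234000, "job_id": 102, "clicked": 0},
]


def denormalize(events, jobs):
    """Copy the job attributes onto every event so each row stands alone."""
    for event in events:
        job = jobs[event["job_id"]]
        yield {
            "time": event["time"],
            "country": job["country"],
            "title": job["title"],
            "clicked": event["clicked"],
        }


def to_tsv(rows, fieldnames):
    """Format rows as tab-separated text with a header row of field names."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


tsv = to_tsv(denormalize(events, jobs), ["time", "country", "title", "clicked"])
print(tsv, end="")
```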
● Jobsearch
● Ad Clicks
● Resume Contacts
Example Datasets
● TSV/CSV Uploader
● Java API
Load
● Massive compression
● Fast Boolean Search
Inverted Index
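The idea behind an inverted index can be sketched as follows: each (field, term) pair maps to the sorted list of documents that contain it, so a Boolean AND becomes a set intersection. This is a generic textbook illustration with invented data, not Imhotep's implementation. The `gaps` helper shows one common reason such lists compress well (sorted document IDs can be stored as small differences); the slides do not say which encoding Imhotep uses.

```python
from collections import defaultdict

# Invented sample documents; the list position serves as the document ID.
docs = [
    {"country": "ie", "clicked": "1"},
    {"country": "us", "clicked": "0"},
    {"country": "ie", "clicked": "0"},
    {"country": "ie", "clicked": "1"},
]


def build_index(docs):
    """Map each (field, term) pair to the ascending list of document IDs."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for field, term in doc.items():
            index[(field, term)].append(doc_id)
    return index


def search_and(index, *pairs):
    """Boolean AND: documents that contain every given (field, term) pair."""
    postings = [set(index.get(pair, [])) for pair in pairs]
    return sorted(set.intersection(*postings)) if postings else []


def gaps(posting):
    """Delta-encode a sorted ID list as differences between neighbors."""
    return [b - a for a, b in zip([0] + posting, posting)]


index = build_index(docs)
print(search_and(index, ("country", "ie"), ("clicked", "1")))  # [0, 3]
print(gaps(index[("country", "ie")]))  # [0, 2, 1]
```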
● Runs on any* computer(s)
● Can deploy to AWS using a handy AWS CloudFormation script
Open-source Package
● Create S3 buckets
● Create EC2 Key Pair
● Run CloudFormation script
CloudFormation Setup
DEMO
Q&A
Helpful Workshop Links
http://go.indeed.com/iws