[@indeedeng] imhotep workshop

Imhotep Workshop http://go.indeed.com/iws

Uploaded by: indeedeng

Posted on 05-Jul-2015

Category: Technology


DESCRIPTION

Link to video: http://youtu.be/LBDZFtqL-ck?list=UURVEh0SlyrZNTeIbEDwj3wQ

We are excited to announce the open source availability of Imhotep, the interactive data analytics platform that powers data-driven decision making at Indeed. In a previous talk, we explained how we developed Imhotep, a distributed system for building decision trees for machine learning. We went on to describe how we build large scale interactive analytics tools using the same platform. Next we showed how our engineering and product organizations use Imhotep to focus on key metrics at scale. During this session, Product Manager Tom Bergman provided examples of valuable insights that can be gained by using Imhotep. After the presentation, attendees explored their own data in Imhotep. Product engineers were on hand to answer questions.

TRANSCRIPT

Page 1: [@IndeedEng] Imhotep Workshop

Imhotep Workshop

http://go.indeed.com/iws

Page 2: [@IndeedEng] Imhotep Workshop

engineering.indeed.com/talks

Page 3: [@IndeedEng] Imhotep Workshop

@IndeedEng Workshop: Interactive Analytics with Imhotep

Page 4: [@IndeedEng] Imhotep Workshop

Tom Bergman, Product Manager

Page 5: [@IndeedEng] Imhotep Workshop

Is anybody there???

Page 6: [@IndeedEng] Imhotep Workshop

Does this thing work???

Page 7: [@IndeedEng] Imhotep Workshop

Are we winning???

Page 8: [@IndeedEng] Imhotep Workshop

Harder questions???

Page 9: [@IndeedEng] Imhotep Workshop

What is Imhotep?

Imhotep is Indeed’s highly scalable, open-source analytics platform.

Page 10: [@IndeedEng] Imhotep Workshop

Imhotep open source includes:

● Imhotep Daemons

● Imhotep Query Language (IQL)

● IQL Web Client

● TSV/CSV Uploader

Page 11: [@IndeedEng] Imhotep Workshop

What does Imhotep do?

● Easy upload & compression

● Fast, interactive queries

Page 12: [@IndeedEng] Imhotep Workshop

Imhotep Philosophy

Page 13: [@IndeedEng] Imhotep Workshop

Interactive

● Quickly refine your questions

Page 14: [@IndeedEng] Imhotep Workshop

SOME TIME LATER…

Time to the right question

Oh, bummer. Wrong question. Let’s try again.

Nope. Nope. YES!

Next question?

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation.

Page 15: [@IndeedEng] Imhotep Workshop

Ask all the questions!

Cool! Really?

Wow... Awesome

Oh… Ah! INSIGHT! …

Page 16: [@IndeedEng] Imhotep Workshop

Ground Truth

● Data should not be down-sampled

Page 17: [@IndeedEng] Imhotep Workshop
Page 18: [@IndeedEng] Imhotep Workshop

Show me the data

● Web-based to facilitate sharing

Page 19: [@IndeedEng] Imhotep Workshop
Page 20: [@IndeedEng] Imhotep Workshop
Page 21: [@IndeedEng] Imhotep Workshop

Cache Rules Everything Around Me

● Instantaneous sharing

Page 22: [@IndeedEng] Imhotep Workshop

Easy Access

● Easily queryable

Page 23: [@IndeedEng] Imhotep Workshop

Imhotep Data Structures

● Dataset > DB Table

● Document > Denormalized Row

● Field > Column

● Term > Value
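The mapping above can be pictured with a small in-memory model. This is only an illustrative sketch: the `Dataset` class, its methods, and the sample rows are hypothetical and are not part of Imhotep's API. It just shows a dataset holding documents (denormalized rows), each with fields (columns) whose values are terms.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for an Imhotep dataset (comparable to a DB table)."""
    name: str
    # Each document is one denormalized row: a dict of field name -> term.
    documents: list = field(default_factory=list)

    def add(self, **fields):
        """Append one document built from keyword arguments."""
        self.documents.append(fields)

    def terms(self, field_name):
        """Return the distinct terms (values) seen for one field (column)."""
        return sorted({doc[field_name] for doc in self.documents
                       if field_name in doc})


# Made-up sample rows, named after the dataset used in the IQL slides.
searchresults = Dataset("searchresults")
searchresults.add(country="ie", jobagedays=0, clicked=1)
searchresults.add(country="ie", jobagedays=3, clicked=0)
searchresults.add(country="us", jobagedays=0, clicked=1)

print(searchresults.terms("country"))  # ['ie', 'us']
```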

Page 24: [@IndeedEng] Imhotep Workshop

Imhotep Query Language (IQL)

Page 25: [@IndeedEng] Imhotep Workshop

IQL - Imhotep Query Language

Expressive SQL-like language for aggregate analytics.

Page 26: [@IndeedEng] Imhotep Workshop

IQL queries - requirements

● Dataset

● Date range

Page 27: [@IndeedEng] Imhotep Workshop

IQL queries - optional

● Dataset

● Date range

● Filters

● Group by

● Metrics
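The split between required and optional parts can be sketched as a small helper that assembles a query string. The function `build_iql` is hypothetical (it is not shipped with Imhotep); it assumes the clause order used in the example query on the following slides, and it assumes multiple group-by expressions and metrics are comma-separated.

```python
def build_iql(dataset, start, end, filters=(), group_by=(), metrics=()):
    """Assemble an IQL-style query string (illustrative helper, not an Imhotep API).

    dataset, start and end are required; the other clauses are optional.
    """
    if not dataset or not start or not end:
        raise ValueError("dataset and date range are required")
    parts = [f"from {dataset} '{start}' '{end}'"]
    if filters:
        # Filters are combined with "and", as in the slide example.
        parts.append("where " + " and ".join(filters))
    if group_by:
        parts.append("group by " + ", ".join(group_by))
    if metrics:
        parts.append("select " + ", ".join(metrics))
    return " ".join(parts)


query = build_iql(
    "searchresults", "2013-12-05", "2013-12-10",
    filters=["country=ie", "jobagedays<1"],
    group_by=["time(1d)"],
    metrics=["clicked", "count()"],
)
print(query)
```

Leaving out the optional arguments yields just the `from` clause with its date range, which matches the minimum the slides list as required.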

Page 28: [@IndeedEng] Imhotep Workshop

IQL - Dataset

from searchresults '2013-12-05' '2013-12-10'
where country=ie and jobagedays<1
group by time(1d)
select clicked, count()

Dataset: searchresults

Page 29: [@IndeedEng] Imhotep Workshop

IQL - Date Range

from searchresults '2013-12-05' '2013-12-10'
where country=ie and jobagedays<1
group by time(1d)
select clicked, count()

Date Range: '2013-12-05' '2013-12-10'

Page 30: [@IndeedEng] Imhotep Workshop

IQL - Filters

from searchresults '2013-12-05' '2013-12-10'
where country=ie and jobagedays<1
group by time(1d)
select clicked, count()

Filters: country=ie and jobagedays<1

Page 31: [@IndeedEng] Imhotep Workshop

IQL - Group by

from searchresults '2013-12-05' '2013-12-10'
where country=ie and jobagedays<1
group by time(1d)
select clicked, count()

Groups: time(1d)

Page 32: [@IndeedEng] Imhotep Workshop

IQL - Metrics

from searchresults '2013-12-05' '2013-12-10'
where country=ie and jobagedays<1
group by time(1d)
select clicked, count()

Metrics: clicked, count()
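To make the five parts of the example query concrete, here is a rough Python equivalent run over a handful of made-up documents. It is a sketch of the semantics only, not how Imhotep executes queries. Assumptions: the end date is treated as exclusive, `clicked` is summed per group, `count()` counts matching documents, and the sample data and the `run_query` function are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Made-up documents standing in for the searchresults dataset.
docs = [
    {"time": "2013-12-05T09:00:00", "country": "ie", "jobagedays": 0, "clicked": 1},
    {"time": "2013-12-05T10:30:00", "country": "ie", "jobagedays": 0, "clicked": 0},
    {"time": "2013-12-05T11:00:00", "country": "ie", "jobagedays": 4, "clicked": 1},
    {"time": "2013-12-06T08:15:00", "country": "us", "jobagedays": 0, "clicked": 1},
    {"time": "2013-12-06T12:00:00", "country": "ie", "jobagedays": 0, "clicked": 1},
    {"time": "2013-12-11T00:00:00", "country": "ie", "jobagedays": 0, "clicked": 1},
]


def run_query(docs, start, end):
    """Emulate: where country=ie and jobagedays<1 group by time(1d) select clicked, count()."""
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)      # assumed exclusive
    buckets = defaultdict(lambda: [0, 0])     # day -> [sum of clicked, document count]
    for doc in docs:
        when = datetime.fromisoformat(doc["time"])
        if not (start_dt <= when < end_dt):   # date range
            continue
        if doc["country"] != "ie" or doc["jobagedays"] >= 1:  # filters
            continue
        bucket = buckets[when.date().isoformat()]             # group by time(1d)
        bucket[0] += doc["clicked"]                           # metric: clicked
        bucket[1] += 1                                        # metric: count()
    return {day: tuple(values) for day, values in sorted(buckets.items())}


results = run_query(docs, "2013-12-05", "2013-12-10")
print(results)  # {'2013-12-05': (1, 2), '2013-12-06': (1, 1)}
```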

Page 33: [@IndeedEng] Imhotep Workshop

IQL - Example Results

Page 34: [@IndeedEng] Imhotep Workshop
Page 35: [@IndeedEng] Imhotep Workshop

Extract Transform Load (ETL)

Page 36: [@IndeedEng] Imhotep Workshop

Extract

● Up to your data schema

Page 37: [@IndeedEng] Imhotep Workshop

Transform

● Denormalize data

● Formatting
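As an illustration of the denormalization step, the sketch below folds a lookup table into each event so that every output row is self-contained. The `jobs` and `clicks` structures, the `job_` prefix, and the `denormalize` function are all hypothetical; the slide does not prescribe a schema.

```python
# Normalized inputs: a lookup table of jobs and a list of click events.
jobs = {
    101: {"title": "Engineer", "country": "ie"},
    102: {"title": "Designer", "country": "us"},
}
clicks = [
    {"jobid": 101, "unixtime": 1386234000},
    {"jobid": 102, "unixtime": 1386237600},
    {"jobid": 999, "unixtime": 1386241200},  # no matching job record
]


def denormalize(clicks, jobs):
    """Copy the referenced job's attributes onto each click row."""
    rows = []
    for click in clicks:
        row = dict(click)                   # keep the event's own fields
        job = jobs.get(click["jobid"], {})  # missing jobs contribute no fields
        row.update({f"job_{key}": value for key, value in job.items()})
        rows.append(row)
    return rows


rows = denormalize(clicks, jobs)
print(rows[0])
```

After this step each row carries everything needed to filter or group on job attributes without a join at query time.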

Page 38: [@IndeedEng] Imhotep Workshop

Example Datasets

● Jobsearch

● Ad Clicks

● Resume Contacts

Page 39: [@IndeedEng] Imhotep Workshop

Load

● TSV/CSV Uploader

Page 40: [@IndeedEng] Imhotep Workshop

Load

● TSV/CSV Uploader

● Java API
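Before a file can go through a TSV uploader, the denormalized rows have to be written out as tab-separated text. The sketch below does that with Python's standard `csv` module. The field names and the use of a header row are assumptions made for illustration; the uploader's actual file naming and header requirements are not covered on the slide.

```python
import csv
import io


def to_tsv(rows, fieldnames):
    """Serialize dict rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows; "time" holds Unix timestamps in seconds.
sample_rows = [
    {"time": 1386234000, "country": "ie", "clicked": 1},
    {"time": 1386237600, "country": "us", "clicked": 0},
]
tsv_text = to_tsv(sample_rows, ["time", "country", "clicked"])
print(tsv_text)
```

Writing to a real file instead only requires passing an open file handle (opened with `newline=""`) in place of the `StringIO` buffer.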

Page 41: [@IndeedEng] Imhotep Workshop

Inverted Index

● Massive compression

● Fast Boolean Search

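A toy inverted index shows why Boolean search over this structure is fast: each (field, term) pair maps to the list of documents containing it, so an AND query is an intersection of those lists. This is a minimal sketch with hypothetical function names and sample data. It does not reflect Imhotep's actual on-disk format or its compression scheme.

```python
from collections import defaultdict


def build_index(docs):
    """Map field -> term -> list of document ids (in increasing order)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, doc in enumerate(docs):
        for field_name, term in doc.items():
            index[field_name][term].append(doc_id)
    return index


def search_and(index, **criteria):
    """Return ids of documents matching every field=term criterion."""
    result = None
    for field_name, term in criteria.items():
        ids = set(index.get(field_name, {}).get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])


# Made-up documents; ids are their positions in the list.
sample_docs = [
    {"country": "ie", "clicked": 1},
    {"country": "ie", "clicked": 0},
    {"country": "us", "clicked": 1},
    {"country": "ie", "clicked": 1},
]
index = build_index(sample_docs)
print(search_and(index, country="ie", clicked=1))  # [0, 3]
```

Each distinct term is stored once regardless of how many documents contain it, which is one reason this layout lends itself to compression.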

Page 42: [@IndeedEng] Imhotep Workshop

Open-source Package

● Run on any* computer(s)

● Can deploy to AWS using a handy AWS CloudFormation script

Page 43: [@IndeedEng] Imhotep Workshop

CloudFormation Setup

● Create S3 buckets

● Create EC2 Key Pair

● Run CloudFormation script

Page 44: [@IndeedEng] Imhotep Workshop

DEMO

Page 45: [@IndeedEng] Imhotep Workshop

Q&A

Page 46: [@IndeedEng] Imhotep Workshop

Helpful Workshop Links

http://go.indeed.com/iws