[@IndeedEng] Imhotep Workshop

Post on 05-Jul-2015

Category: Technology

DESCRIPTION

Link to video: http://youtu.be/LBDZFtqL-ck?list=UURVEh0SlyrZNTeIbEDwj3wQ

We are excited to announce the open source availability of Imhotep, the interactive data analytics platform that powers data-driven decision making at Indeed. In a previous talk, we explained how we developed Imhotep, a distributed system for building decision trees for machine learning. We went on to describe how we build large scale interactive analytics tools using the same platform. Next we showed how our engineering and product organizations use Imhotep to focus on key metrics at scale. During this session, Product Manager Tom Bergman provided examples of valuable insights that can be gained by using Imhotep. After the presentation, attendees explored their own data in Imhotep. Product engineers were on hand to answer questions.

TRANSCRIPT

Imhotep Workshop

http://go.indeed.com/iws

engineering.indeed.com/talks

@IndeedEng Workshop: Interactive Analytics with Imhotep

Tom Bergman, Product Manager

Is anybody there???

Does this thing work???

Are we winning???

Harder questions???

Imhotep is Indeed’s highly scalable open-source analytics platform.

What is Imhotep?

● Imhotep Daemons

● Imhotep Query Language (IQL)

● IQL Web Client

● TSV/CSV Uploader

Imhotep open source included:

● Easy upload & compression

● Fast, Interactive queries

What does Imhotep do?

Imhotep Philosophy

● Quickly refine your questions

Interactive

SOME TIME LATER…

Time to the right question

Oh, bummer. Wrong question. Let’s try again.

Nope. Nope. YES!

Next question?

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation.

Ask all the questions!

Cool! Really?

Wow... Awesome

Oh… Ah! INSIGHT! …

● Data should not be down-sampled

Ground Truth

● Web-based to facilitate sharing

Show me the data

● Instantaneous sharing

Cache Rules Everything Around Me

● Easily queryable

Easy Access

● Dataset > DB Table

● Document > Denormalized Row

● Field > Column

● Term > Value

Imhotep Data Structures
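The mapping above (dataset, document, field, term) can be pictured with a small in-memory sketch. The sample documents and the helper names `fields` and `terms` are hypothetical and exist only for illustration; this is not how Imhotep stores data.

```python
# Hypothetical toy dataset: each document is one denormalized row.
dataset = [
    {"country": "ie", "jobagedays": 0, "clicked": 1},
    {"country": "ie", "jobagedays": 3, "clicked": 0},
    {"country": "us", "jobagedays": 0, "clicked": 1},
]


def fields(docs):
    """Return the sorted field (column) names seen in any document."""
    return sorted({name for doc in docs for name in doc})


def terms(docs, field):
    """Return the sorted distinct terms (values) that a field takes."""
    return sorted({doc[field] for doc in docs if field in doc})


print(fields(dataset))            # ['clicked', 'country', 'jobagedays']
print(terms(dataset, "country"))  # ['ie', 'us']
```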

Imhotep Query Language (IQL)

IQL - Imhotep Query Language

Expressive SQL-like language for aggregate analytics.

IQL queries - requirements

Dataset, Date range

IQL queries - optional

Dataset, Date range

Filters, Group by, Metrics

from searchresults '2013-12-05' '2013-12-10'
where country=ie and jobagedays<1
group by time(1d)
select clicked, count()
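To make the five query parts concrete, here is a minimal sketch that assembles the example query from its pieces. The function `build_iql` and its parameter names are hypothetical helpers written for this illustration; they are not part of Imhotep or the IQL web client.

```python
def build_iql(dataset, start, end, filters=(), group_by=(), metrics=()):
    """Assemble an IQL-style query string in the clause order shown on the slides.

    Dataset and date range are required; the other three parts are optional.
    """
    parts = [f"from {dataset} '{start}' '{end}'"]
    if filters:
        parts.append("where " + " and ".join(filters))
    if group_by:
        parts.append("group by " + ", ".join(group_by))
    if metrics:
        parts.append("select " + ", ".join(metrics))
    return " ".join(parts)


query = build_iql(
    "searchresults",
    "2013-12-05",
    "2013-12-10",
    filters=["country=ie", "jobagedays<1"],
    group_by=["time(1d)"],
    metrics=["clicked", "count()"],
)
print(query)
```

Calling `build_iql("searchresults", "2013-12-05", "2013-12-10")` with no optional parts yields only the `from` clause, which mirrors the split between required and optional parts.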

IQL - Dataset

Dataset

from searchresults '2013-12-05' '2013-12-10'
where country=ie and jobagedays<1
group by time(1d)
select clicked, count()

IQL - Date Range

Date Range

from searchresults '2013-12-05' '2013-12-10'
where country=ie and jobagedays<1
group by time(1d)
select clicked, count()

IQL - Filters

Filters

from searchresults '2013-12-05' '2013-12-10'
where country=ie and jobagedays<1
group by time(1d)
select clicked, count()

IQL - Group by

Groups

IQL - Metrics

from searchresults '2013-12-05' '2013-12-10'
where country=ie and jobagedays<1
group by time(1d)
select clicked, count()

Metrics
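As a rough illustration of what the example query computes, the sketch below evaluates it over a handful of in-memory documents. It rests on several assumptions: the sample documents are invented, `run_query` is a hypothetical helper, the end date is treated as exclusive, and `clicked` is taken to mean the sum of the `clicked` field within each group. It is plain Python, not Imhotep's execution model.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical documents; timestamps are ISO strings for readability.
docs = [
    {"time": "2013-12-05T09:00:00", "country": "ie", "jobagedays": 0, "clicked": 1},
    {"time": "2013-12-05T17:30:00", "country": "ie", "jobagedays": 0, "clicked": 0},
    {"time": "2013-12-06T08:15:00", "country": "ie", "jobagedays": 0, "clicked": 1},
    {"time": "2013-12-06T10:00:00", "country": "ie", "jobagedays": 4, "clicked": 1},
    {"time": "2013-12-07T12:00:00", "country": "us", "jobagedays": 0, "clicked": 1},
]


def run_query(docs, start, end):
    """Filter, bucket by day, then aggregate (sum of clicked, document count)."""
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)  # assumed exclusive
    buckets = defaultdict(lambda: [0, 0])
    for doc in docs:
        ts = datetime.fromisoformat(doc["time"])
        if not (start_dt <= ts < end_dt):
            continue  # outside the date range
        if doc["country"] != "ie" or not doc["jobagedays"] < 1:
            continue  # fails the where clause
        day = ts.date().isoformat()  # group by time(1d)
        buckets[day][0] += doc["clicked"]  # select clicked
        buckets[day][1] += 1               # select count()
    return {day: tuple(values) for day, values in sorted(buckets.items())}


result = run_query(docs, "2013-12-05", "2013-12-10")
print(result)  # {'2013-12-05': (1, 2), '2013-12-06': (1, 1)}
```

Days with no matching documents are simply absent from this sketch's output; whether a real query reports them as empty buckets is not covered by the slides.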

IQL - Example Results

Extract Transform Load (ETL)

● Up to your data schema

Extract

● De-Normalize Data

● Formatting

Transform
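A minimal sketch of the transform step, assuming click events that reference jobs by id. The source tables, field names, and the helpers `denormalize` and `to_tsv` are all hypothetical, and the header-row TSV layout is an assumption for illustration rather than a statement of what the uploader requires.

```python
import csv
import io

# Hypothetical normalized sources: click events reference jobs by id.
jobs = {
    101: {"title": "Nurse", "country": "ie"},
    102: {"title": "Welder", "country": "us"},
}
clicks = [
    {"time": 1386230400, "job_id": 101, "clicked": 1},  # epoch seconds
    {"time": 1386234000, "job_id": 102, "clicked": 0},
]


def denormalize(clicks, jobs):
    """Copy the referenced job's attributes into each click row."""
    rows = []
    for click in clicks:
        job = jobs[click["job_id"]]
        rows.append({
            "time": click["time"],
            "title": job["title"],
            "country": job["country"],
            "clicked": click["clicked"],
        })
    return rows


def to_tsv(rows):
    """Format rows as tab-separated text with one header line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


tsv_text = to_tsv(denormalize(clicks, jobs))
print(tsv_text)
```

Each output row carries everything needed to filter or group on it, so no join is required at query time.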

● Jobsearch

● Ad Clicks

● Resume Contacts

Example Datasets

● TSV/CSV Uploader

● Java API

Load

● Massive compression

● Fast Boolean Search

Inverted Index
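The two bullets above can be illustrated with a toy inverted index. This is a generic sketch of the technique, not Imhotep's actual implementation: the sample documents and the helpers `build_index`, `match_all`, and `delta_encode` are hypothetical, and delta encoding is shown only as one common reason sorted posting lists compress well.

```python
from collections import defaultdict

# Hypothetical documents; a document's id is its position in the list.
docs = [
    {"country": "ie", "clicked": 1},
    {"country": "ie", "clicked": 0},
    {"country": "us", "clicked": 1},
    {"country": "ie", "clicked": 1},
]


def build_index(docs):
    """Map field -> term -> ascending list of document ids."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, doc in enumerate(docs):
        for field, term in doc.items():
            index[field][term].append(doc_id)
    return index


def match_all(index, **conditions):
    """Boolean AND: intersect the posting lists of every field=term condition."""
    postings = [set(index[field][term]) for field, term in conditions.items()]
    return sorted(set.intersection(*postings)) if postings else []


def delta_encode(postings):
    """Store gaps between sorted ids; small gaps need fewer bits to encode."""
    encoded, previous = [], 0
    for doc_id in postings:
        encoded.append(doc_id - previous)
        previous = doc_id
    return encoded


index = build_index(docs)
print(match_all(index, country="ie", clicked=1))  # [0, 3]
print(delta_encode(index["country"]["ie"]))       # [0, 1, 2]
```

A filter such as `country=ie` never scans the documents themselves; it only reads the posting list for that term, which is why boolean search over an inverted index is fast.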

● Runs on any* computer(s)

● Can deploy to AWS using a handy AWS CloudFormation script

Open Source Package

● Create S3 buckets

● Create EC2 Key Pair

● Run CloudFormation script

CloudFormation Setup

DEMO

Q&A

Helpful Workshop Links: http://go.indeed.com/iws
