introduction to infobright april 16, 2014

Introduction to Infobright

Confidential – Do Not Distribute

Agenda

Today’s Analytic Challenges

The Infobright Analytic Platform

Getting Started

The Rise of Machine Data

50 billion connected devices

$7.4B mobile advertising billings

190 Exabytes of data from:
– Web logs
– Sensor data
– Call data records
– Transaction records


More than just “Big” Data

Function
• Transactional
• Analytics

Data Refresh
• Dynamic
• Static

Query
• Pre-planned
• Ad-hoc
• Rough
• Approximate

Data
• Structured
• Unstructured
• Semi-structured

“Real time” Analytics: The New Imperative

Identify security threats & fraud

Troubleshoot networks

Optimize online/mobile ads

Plan capacity scale-out

Competitive positioning

Infobright Powers Big Data Analytics


Who is Infobright

Global provider of database analytics platforms to over 450 direct and OEM customers in the telecom, digital media and marketing, financial services, solution provider, energy and healthcare markets

Key Benefits of the Knowledge Grid Architecture


Column vs. Row: What is the best use case?

Row Oriented

All the columns are needed

Transactional processing is required

Column Oriented

Only relevant columns are needed

Reports are aggregates (sum, count, average, etc.)

Column vs. Row: How it Works

50 days' worth of data, 1 million rows/day

Disk I/O is the primary limiting factor

A row-oriented design forces the database to retrieve all column data

As table size increases, so do the indexes

Load speed degrades because indexes must be rebuilt as data is added; this causes huge sorts (another very slow operation)

(Diagram: table of 30 columns × 50M rows)

Column vs. Row: How it Works

Query:
– Select Column 11, filtering on Column 17, for the 3rd week (day 15 – day 21)

(Diagram: table of 30 columns × 50M rows)


Row-based results
– Eliminate 43 days
– 7 million rows retrieved
– 210 million data elements retrieved

(Diagram: table of 30 columns × 50M rows)


Column-based results
– Eliminate 43 days
– Eliminate 28 of the 30 columns
– 14 million data elements

(Diagram: table of 30 columns × 50M rows)
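
The element counts on the row-based and column-based slides follow from simple arithmetic. The sketch below reproduces them in Python; the variable names are illustrative, and it assumes the query touches exactly two columns (Column 11 and Column 17), which is what "eliminate 28 of the 30 columns" implies.

```python
# Worked arithmetic for the row-based vs. column-based result slides.
ROWS_PER_DAY = 1_000_000      # from the slides: 1 million rows per day
TOTAL_COLUMNS = 30            # from the slides: 30 columns

first_day, last_day = 15, 21  # the 3rd week
days_needed = last_day - first_day + 1      # 7 days; the other 43 are eliminated
rows_needed = days_needed * ROWS_PER_DAY    # 7 million rows

# Row-oriented: every column of every matching row is retrieved.
row_store_elements = rows_needed * TOTAL_COLUMNS

# Column-oriented: only Column 11 and Column 17 are retrieved (assumption).
columns_needed = 2
column_store_elements = rows_needed * columns_needed

print(f"row-oriented:    {row_store_elements:,} data elements")
print(f"column-oriented: {column_store_elements:,} data elements")
print(f"reduction:       {row_store_elements // column_store_elements}x")
```

Under these assumptions the column-oriented layout reads one fifteenth of the data elements for the same query.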

Data Loading Process: Data Packs

Bulk load input data

(Diagram: each column of the input is split into 64K Data Packs: column A into A1, A2, A3 … A-n; column B into B1, B2, B3 … B-n; column C into C1, C2, C3 … C-n)
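
The splitting step can be sketched in a few lines of Python. This is an illustration only, not Infobright's loader: `build_data_packs` is a hypothetical helper, and "64K" is assumed here to mean 65,536 values per pack.

```python
from typing import Dict, List, Sequence, Tuple

PACK_SIZE = 64 * 1024  # "64K" from the slide; assumed to mean 65,536 values


def build_data_packs(
    rows: Sequence[Tuple],
    column_names: Sequence[str],
    pack_size: int = PACK_SIZE,
) -> Dict[str, List[list]]:
    """Split row-oriented input into per-column packs of at most pack_size values."""
    packs: Dict[str, List[list]] = {name: [] for name in column_names}
    for col_index, name in enumerate(column_names):
        # Pull one column out of the row-oriented input.
        column_values = [row[col_index] for row in rows]
        # Cut that column into consecutive fixed-size chunks.
        for start in range(0, len(column_values), pack_size):
            packs[name].append(column_values[start:start + pack_size])
    return packs


# Hypothetical input: 150,000 rows with three columns A, B and C.
rows = [(i, i * 2, i % 7) for i in range(150_000)]
packs = build_data_packs(rows, ["A", "B", "C"])
for name, column_packs in packs.items():
    print(name, [len(p) for p in column_packs])
```

Each column ends up with its own sequence of packs (A1, A2, A3 and so on), and pack number k of every column covers the same range of rows.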

Data Loading Process: Compression & Knowledge Grid

Data packs compressed

On-disk storage

In-memory Knowledge Grid

What Your Data Looks Like Now

Original Data: 10 TB

Compressed Data: 500 GB

The Knowledge Grid: How it works

Knowledge Nodes answer the query directly, or

Identify only required Data Packs, minimizing decompression, and

Predict required data in advance based on workload

All driven by a granular computing engine

Queries with the Knowledge Grid: How it Works

Query: How are my sales doing this year?

Granular engine iterates on Knowledge Grid

Each pass eliminates Data Packs

If any Data Packs are needed to resolve query, only those are decompressed

Knowledge Grid

Compressed Data
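
A rough sketch of the idea in Python, for the "sales this year" question. It assumes the Knowledge Grid keeps, per Data Pack, the minimum and maximum year and a precomputed sum of sales; the slides do not list the exact statistics stored, so `PackNode` and its fields are hypothetical, and plain lists stand in for compressed packs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PackNode:
    """Hypothetical per-pack metadata held in memory."""
    year_min: int
    year_max: int
    sales_sum: float


def build_nodes(year_packs: List[List[int]], sales_packs: List[List[float]]) -> List[PackNode]:
    # Computed once at load time, before the packs are compressed.
    return [PackNode(min(y), max(y), float(sum(s))) for y, s in zip(year_packs, sales_packs)]


def classify(node: PackNode, year: int) -> str:
    if node.year_max < year or node.year_min > year:
        return "no match"    # no row in the pack can qualify
    if node.year_min == node.year_max == year:
        return "all match"   # every row in the pack qualifies
    return "suspect"         # the pack must be opened to decide


def sales_for_year(nodes, year_packs, sales_packs, year) -> Tuple[float, int]:
    total, decompressed = 0.0, 0
    for node, years, sales in zip(nodes, year_packs, sales_packs):
        status = classify(node, year)
        if status == "no match":
            continue                    # pack eliminated using metadata alone
        if status == "all match":
            total += node.sales_sum     # answered directly from metadata
        else:
            decompressed += 1           # only suspect packs are read
            total += sum(s for y, s in zip(years, sales) if y == year)
    return total, decompressed


year_packs = [[2012, 2012, 2013], [2013, 2014, 2014], [2014, 2014, 2014]]
sales_packs = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0], [70.0, 80.0, 90.0]]
nodes = build_nodes(year_packs, sales_packs)
total, decompressed = sales_for_year(nodes, year_packs, sales_packs, 2014)
print(total, decompressed)
```

In this toy data, one pack is skipped, one is answered from its stored sum, and only one has to be opened.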


SELECT count(*)
FROM employees
WHERE salary > 100000
  AND age < 35
  AND job = 'DBA'
  AND state = 'TX'

Columns: salary, age, job, state

Legend: No Match, Suspect, All Match



All packs ignored

All packs ignored

All packs ignored

Only this pack will be decompressed
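
How the per-column statuses combine under AND can be sketched as follows. The grid values are hypothetical (the slide's actual colouring is not recoverable from the text), and `combine_and` and `columns_to_decompress` are illustrative names rather than Infobright functions.

```python
from enum import Enum
from typing import Dict, List


class Status(Enum):
    NO_MATCH = "No Match"
    SUSPECT = "Suspect"
    ALL_MATCH = "All Match"


def combine_and(statuses: List[Status]) -> Status:
    """Combine the per-column statuses of one row of packs under AND."""
    if any(s is Status.NO_MATCH for s in statuses):
        return Status.NO_MATCH      # one failing condition rules out every row
    if all(s is Status.ALL_MATCH for s in statuses):
        return Status.ALL_MATCH     # every row qualifies; count comes from metadata
    return Status.SUSPECT


def columns_to_decompress(row: Dict[str, Status]) -> List[str]:
    """Return the columns whose packs must be opened for this row of packs."""
    if combine_and(list(row.values())) is not Status.SUSPECT:
        return []
    # Columns marked All Match need no inspection; only suspect ones do.
    return [column for column, status in row.items() if status is Status.SUSPECT]


N, S, A = Status.NO_MATCH, Status.SUSPECT, Status.ALL_MATCH

# Hypothetical grid: one dict per row of packs, one status per WHERE condition.
grid = [
    {"salary": S, "age": A, "job": N, "state": S},
    {"salary": N, "age": S, "job": S, "state": A},
    {"salary": S, "age": S, "job": A, "state": N},
    {"salary": A, "age": S, "job": A, "state": A},
]

for number, row in enumerate(grid, start=1):
    todo = columns_to_decompress(row)
    print(number, "ignored" if not todo else f"decompress {todo}")
```

With this grid, the first three rows of packs are ignored because at least one condition cannot match, and only the age pack in the last row is decompressed.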

Working with Infobright & Hadoop

General-purpose database solutions typically mean:
– Significant administration, ongoing tuning and indexing
– More hardware
– Less flexibility for macroscopic investigative analytics
– Higher total cost of ownership

(Diagram: Hadoop, Connector, Infobright Enterprise Edition, BI Tools)

Customer Example: JDSU

Requirements

Low Admin: Do not want to force users to rely on DBAs to keep the solution running

Load Speeds: Ingestion rates continue to increase, placing a heavy burden on solutions

High Compression: Want to keep longer histories in less space

Lower TCO: Resulting in better value for customers, better margins for providers

Results

Stripped away the “DBA tax” required by previous versions

Ingesting over 1 TB/hour, with significant headroom beyond that

Over 3X the retention period and a simultaneous 5X reduction in storage requirements

Lower TCO for users, higher margins for JDSU

Little to No Admin

Fast Load Speeds

20:1+ Compression

Exceptional Ad Hoc Query Performance

Very Low TCO


Customer Example: LiveRail

Requirements

Low Admin: Reduce the requirements for labor-intensive reporting

Ad Hoc Query Capabilities: Ability to mine data for investigative analytics

High Compression: Want to keep longer histories in less space

Lower TCO: Robust analytics platform without excessive outlay of capital or people

Results

Eliminated the need for staff to run customized reports using Hive

Developed a portal where customers can run their own ad hoc reporting

Minimal resources required to house the Infobright repository for reporting

Better results for customers, lower costs and higher margins for LiveRail

Little to No Admin

Fast Load Speeds

20:1+ Compression


Very Low TCO


Customer Example: JC Decaux

Requirements

Low Admin: Reduce the requirements for labor-intensive reporting

Ad Hoc Query Capabilities: Consolidate and issue timely reports from disparate data sources

High Compression: Existing Oracle-based system couldn’t handle the volume of data

Lower TCO: Minimize admin required for managing Oracle and work with Hadoop

Results

Ability to create essential reports in less than three minutes

Fast queries: Queries originally taking 15+ minutes using MySQL reduced to seconds

Fast uploads: Data loads that used to take two hours now complete in 20 minutes

Fast deployment: System implemented in three months

Little to No Admin

Fast Load Speeds

20:1+ Compression


Very Low TCO


Getting Started with Infobright

Download our trial: https://www.infobright.com/index.php/products/download-iee-trial/

Follow us on Twitter

Follow us on LinkedIn

Join our community
