Introduction to Infobright, April 16, 2014
DESCRIPTION
The explosive growth of machine-generated data is creating challenges not only in storage and management but also in turning that data into actionable information. On April 16, 2014, our webinar "Introduction to Infobright" explores how Infobright accelerates ad-hoc query performance and reduces costs.
TRANSCRIPT
Introduction to Infobright
Confidential – Do Not Distribute
Agenda
Today’s Analytic Challenges
The Infobright Analytic Platform
Getting Started
The Rise of Machine Data
50 billion connected devices
$7.4B mobile advertising billings
190 Exabytes of data from:
– Web logs
– Sensor data
– Call data records
– Transaction records
More than just “Big” Data
Function: Transactional, Analytics
Data Refresh: Dynamic, Static
Query: Pre-planned, Ad-hoc, Rough, Approximate
Data: Structured, Unstructured, Semi-structured
“Real time” Analytics: The New Imperative
Identify security threats & fraud
Troubleshoot networks
Optimize online/mobile ads
Plan capacity scale-out
Competitive positioning
Infobright Powers Big Data Analytics
Who is Infobright
Global provider of database analytics platforms to over 450 direct and OEM customers in the telecom, digital media and marketing, financial services, solution provider, energy and healthcare markets
Key Benefits of the Knowledge Grid Architecture
Column vs. Row: What is the best use case?
Row Oriented
All the columns are needed
Transactional processing is required
Column Oriented
Only relevant columns are needed
Reports are aggregates (sum, count, average, etc.)
Column vs. Row: How it Works
50 days worth of data, 1 million rows / day
Disk I/O is the primary limiting factor
A row-oriented design forces the database to retrieve all column data
As table size increases so do the indexes
Load speed degrades since indexes need to be recreated as data is added; this causes huge sorts (another very slow operation)
Diagram: 30 Columns × 50M Rows
Column vs. Row: How it Works
Query:
– Select Column 11, Where Column 17 for the 3rd week (day 15 – day 21)
Column vs. Row: How it Works
Row-based results
– Eliminate 43 days
– 7 million rows retrieved
– 210 million data elements retrieved
Column vs. Row: How it Works
Column-based results
– Eliminate 43 days
– Eliminate 28 of the 30 columns
– 14 million data elements
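The element counts quoted for the row-based and column-based results follow from simple multiplication. As a rough check, the sketch below reproduces them in Python. It assumes the query touches exactly two columns (Column 11 and Column 17), which is inferred from "Eliminate 28 of the 30 columns"; the variable names are illustrative only.

```python
# Figures taken from the slides: 50 days of data, 1 million rows per day, 30 columns.
ROWS_PER_DAY = 1_000_000
DAYS_TOTAL = 50
COLUMNS_TOTAL = 30

# The query covers one week (day 15 to day 21) and, by assumption, two columns.
DAYS_NEEDED = 7
COLUMNS_NEEDED = 2

days_eliminated = DAYS_TOTAL - DAYS_NEEDED      # days skipped by either design
rows_needed = ROWS_PER_DAY * DAYS_NEEDED        # rows inside the requested week

# A row store reads every column of each qualifying row.
row_store_elements = rows_needed * COLUMNS_TOTAL

# A column store reads only the columns the query refers to.
column_store_elements = rows_needed * COLUMNS_NEEDED

print(f"days eliminated:       {days_eliminated}")
print(f"rows retrieved:        {rows_needed:,}")
print(f"row-store elements:    {row_store_elements:,}")
print(f"column-store elements: {column_store_elements:,}")
print(f"reduction factor:      {row_store_elements // column_store_elements}x")
```

Under these assumptions the column-oriented layout touches one fifteenth of the data elements that the row-oriented layout does for this query.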
Data Loading Process: Data Packs
Bulk load input data
Data Packs (64K each):
A1, A2, A3, … A-n
B1, B2, B3, … B-n
C1, C2, C3, … C-n
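The idea of cutting each column into fixed-size Data Packs during a bulk load can be sketched as follows. This is a minimal illustration, not Infobright's loader: it assumes "64K" means 65,536 values per pack, and the function name `split_into_packs` is hypothetical.

```python
# Assumption: "64K" on the slide is read as 65,536 values per Data Pack.
PACK_SIZE = 65_536


def split_into_packs(column, pack_size=PACK_SIZE):
    """Cut one column's values into consecutive packs of at most pack_size values."""
    return [column[i:i + pack_size] for i in range(0, len(column), pack_size)]


# Hypothetical column A with 150,000 values yields packs A1, A2, A3.
column_a = list(range(150_000))
packs_a = split_into_packs(column_a)

for number, pack in enumerate(packs_a, start=1):
    print(f"A{number}: {len(pack):,} values")
```

The last pack of a column is simply shorter when the row count is not a multiple of the pack size.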
Data Loading Process: Compression & Knowledge Grid
Data packs (64K each) compressed
On-disk storage
In-memory Knowledge Grid
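A rough sketch of this step: each pack is compressed for on-disk storage while a small summary of it stays in memory. The slides do not say which compression algorithms are used or what a Knowledge Grid entry contains, so everything here is an assumption for illustration: `zlib` stands in for the real compression, and the summary holds only minimum, maximum and count. `build_pack` is a hypothetical name.

```python
import struct
import zlib


def build_pack(values):
    """Return (compressed bytes for disk, small summary kept in memory)."""
    # Serialize the integers as 64-bit little-endian values, then compress them.
    raw = struct.pack(f"<{len(values)}q", *values)
    compressed = zlib.compress(raw)
    # Assumed summary contents; the real Knowledge Grid may store other statistics.
    summary = {"min": min(values), "max": max(values), "count": len(values)}
    return compressed, summary


def read_pack(compressed, count):
    """Decompress a pack back into its list of integers."""
    return list(struct.unpack(f"<{count}q", zlib.decompress(compressed)))


values = list(range(1_000))
compressed, summary = build_pack(values)
print(summary, f"{len(compressed)} compressed bytes vs {len(values) * 8} raw bytes")
```

Keeping the summary separate from the compressed bytes is what lets a query consult the summary first and decompress a pack only when needed.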
What Your Data Looks Like Now
Original Data: 10 TB
Compressed Data: 500 GB
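The two sizes above imply a compression ratio that can be checked with one division. The sketch assumes decimal units (1 TB = 1,000 GB); the function name is illustrative.

```python
def compression_ratio(original_gb, compressed_gb):
    """Ratio of original size to compressed size."""
    return original_gb / compressed_gb


# Assumption: 10 TB is taken as 10,000 GB (decimal units).
ratio = compression_ratio(10_000, 500)
print(f"{ratio:.0f}:1")
```

With binary units (1 TB = 1,024 GB) the result would be slightly higher, so the figure is best read as approximate.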
The Knowledge Grid: How it works
Knowledge Nodes answer the query directly, or
Identify only required Data Packs, minimizing decompression, and
Predict required data in advance based on workload
All driven by a granular computing engine
Queries with the Knowledge Grid: How it Works
Query: How are my sales doing this year?
Granular engine iterates on Knowledge Grid
Each pass eliminates Data Packs
If any Data Packs are needed to resolve query, only those are decompressed
Knowledge Grid
Compressed Data
Queries with the Knowledge Grid: How it Works
SELECT count(*)
FROM employees
WHERE salary > 100000
AND age < 35
AND job = 'DBA'
AND state = 'TX'
Columns: salary, age, job, state
Pack status: No Match, Suspect, All Match
All packs ignored
All packs ignored
All packs ignored
Only this pack will be decompressed
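The pack elimination shown for the employees query can be sketched as a three-way classification. This is an illustration under stated assumptions, not Infobright's engine: it assumes the Knowledge Grid keeps a (min, max) pair for numeric columns and the set of distinct values for text columns, and the four row groups below are made-up data chosen to mirror the slide (three groups ignored, one decompressed).

```python
from enum import Enum


class Status(Enum):
    NO_MATCH = "No Match"
    SUSPECT = "Suspect"
    ALL_MATCH = "All Match"


def greater_than(lo, hi, threshold):
    """Classify a pack for `column > threshold` from its (min, max)."""
    if lo > threshold:
        return Status.ALL_MATCH
    if hi <= threshold:
        return Status.NO_MATCH
    return Status.SUSPECT


def less_than(lo, hi, threshold):
    """Classify a pack for `column < threshold` from its (min, max)."""
    if hi < threshold:
        return Status.ALL_MATCH
    if lo >= threshold:
        return Status.NO_MATCH
    return Status.SUSPECT


def equals(distinct, target):
    """Classify a pack for `column = target` from its set of distinct values."""
    if target not in distinct:
        return Status.NO_MATCH
    if distinct == {target}:
        return Status.ALL_MATCH
    return Status.SUSPECT


def classify_row_group(meta):
    """Combine the four AND-ed conditions of the query for one row group."""
    statuses = [
        greater_than(*meta["salary"], 100000),
        less_than(*meta["age"], 35),
        equals(meta["job"], "DBA"),
        equals(meta["state"], "TX"),
    ]
    if Status.NO_MATCH in statuses:
        return Status.NO_MATCH      # one failing condition rules out every row
    if all(s is Status.ALL_MATCH for s in statuses):
        return Status.ALL_MATCH     # every row counts; no decompression needed
    return Status.SUSPECT           # must decompress to check individual rows


# Hypothetical per-pack metadata for four row groups.
ROW_GROUPS = [
    {"salary": (30000, 90000), "age": (20, 60), "job": {"DBA", "Dev"}, "state": {"TX"}},
    {"salary": (110000, 200000), "age": (40, 60), "job": {"DBA"}, "state": {"TX"}},
    {"salary": (50000, 150000), "age": (22, 50), "job": {"DBA", "Dev"}, "state": {"CA", "NY"}},
    {"salary": (80000, 120000), "age": (25, 34), "job": {"DBA"}, "state": {"TX", "CA"}},
]

results = [classify_row_group(meta) for meta in ROW_GROUPS]
to_decompress = [i for i, status in enumerate(results) if status is Status.SUSPECT]

for i, status in enumerate(results):
    print(f"row group {i}: {status.value}")
print("row groups to decompress:", to_decompress)
```

Because the conditions are joined with AND, a single "No Match" column is enough to skip the whole row group, which is why three of the four groups are never decompressed in this example.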
Working with Infobright & Hadoop
General-purpose database solutions require:
– Significant administration, ongoing tuning and indexing
– More hardware
– Less flexibility for macroscopic investigative analytics
– Higher total cost of ownership
Hadoop Connector
Infobright Enterprise Edition
BI Tools
Customer Example: JDSU
Requirements
Low Admin: Do not want users to need DBAs to keep the solution running
Load Speeds: Ingestion rates continue to increase, placing a heavy burden on solutions
High Compression: Want to keep longer histories in less space
Lower TCO: Resulting in better value for customers, better margins for providers
Results
Stripped away the “DBA tax” required by previous versions
Ingesting over 1 TB/hour, with significant headroom beyond that
Over 3X the retention period and a simultaneous 5X reduction in storage requirements
Lower TCO for users, higher margins for JDSU
Little to No Admin
Fast Load Speeds
20:1+ Compression
Exceptional Ad Hoc Query Performance
Very Low TCO
Customer Example: LiveRail
Requirements
Low Admin: Reduce the requirements for labor-intensive reporting
Ad Hoc Query Capabilities: Ability to mine data for investigative analytics
High Compression: Want to keep longer histories in less space
Lower TCO: Robust analytics platform without excessive outlay of capital or people
Results
Eliminated the need for staff to run customized reports using Hive
Developed a portal where customers can run their own ad hoc reporting
Minimal resources required to house the Infobright repository for reporting
Better results for customers, lower costs and higher margins for LiveRail
Little to No Admin
Fast Load Speeds
20:1+ Compression
Exceptional Ad Hoc Query Performance
Very Low TCO
Customer Example: JC Decaux
Requirements
Low Admin: Reduce the requirements for labor-intensive reporting
Ad Hoc Query Capabilities: Consolidate and issue timely reports from disparate data sources
High Compression: Existing Oracle-based system couldn’t handle the volume of data
Lower TCO: Minimize admin required for managing Oracle and work with Hadoop
Results
Ability to create essential reports in less than three minutes
Fast queries: Queries originally taking 15+ minutes using MySQL reduced to seconds
Fast uploads: Data loads that used to take two hours now take 20 minutes
Fast deployment: System implemented in three months
Little to No Admin
Fast Load Speeds
20:1+ Compression
Exceptional Ad Hoc Query Performance
Very Low TCO
Getting Started with Infobright
Download our trial
Follow us on Twitter
Follow us on LinkedIn
Join our community