The New Frontier: Optimizing Big Data Exploration
TRANSCRIPT
Twitter Tag: #briefr
The Briefing Room
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
This Month: BIG DATA
March: CLOUD
April: BIG DATA
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
@robinbloor
Cirro
• Cirro provides a single method to access any type of data, on any platform, in any environment
• Its product suite consists of Cirro Data Hub, Analyst for Excel and Multi Store – all designed to remove complexity from Big Data analytics
• Cirro’s products are cloud based and can run in public, private and on-premise environments
Guest: Mark Theissen
Mark is CEO at Cirro. He is a respected analytics and data warehousing expert with more than 22 years in the industry. Most recently Mark was the worldwide data warehousing technical lead at Microsoft following the acquisition of DATAllegro. At DATAllegro Mark was the COO and a member of the board of directors. Prior to joining DATAllegro, Mark was Vice President and Research Lead at META Group (Gartner Group) for Enterprise Analytics Strategies, covering data warehousing, business intelligence and data integration markets. Before META, Mark was VP of Professional Services at Accruent, where he was responsible for domestic and overseas services and operations. Mark has a BS in Computer Information Systems from Chapman University and an MBA from the University of California, Irvine.
©2014 Cirro Inc. All rights reserved.
Cirro is the ONLY Solution that can:
• Access any data
• On any platform
• Without ETL or the cost and complexity of a semantic layer
“What used to take 2-4 weeks is now done in a matter of minutes. Cirro is a ‘game-changing’ approach to visualizing multi-structured big data and integrating it with other data sources.”
Director of Business Intelligence
On Demand Distributed Analysis
Cirro Enterprise Data Hub
[Diagram: Cirro Data Hub. Client labels: Visualization Tools, BI Tools, Excel, CLI. Hub labels: Real-time Federation, Data Language Translation, Data Movement & Management. Source labels: RDBMS, HDFS, NoSQL, Legacy, SaaS]
How Federation Works
I have a table on SQL Server that needs to join to tables on Oracle and Hadoop
Row processing pushed into data systems
[Diagram: Oracle, Hadoop, SQL Server. Labels: Standard SQL; SQL predicates, local joins; SQL predicates; MapReduce; 50k Rows; 5k Rows; 50m Rows; Limited movement]
[Diagram: Oracle, Hadoop, SQL Server. Labels: Standard SQL; SQL join, aggregation]
[Diagram: Oracle, Hadoop, SQL Server. Labels: Standard SQL; Results]
Results Destination Options
• UI Tools
• Data Marts; in the Cloud or Data Center
• BI Server
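The federation flow on the slides above can be sketched in Python. This is a minimal illustration, not Cirro's implementation: three in-memory SQLite databases stand in for SQL Server, Oracle and Hadoop, and the tables, columns and the `federated_revenue_by_region` helper are all hypothetical. Choosing SQL Server as the join site is also an assumption. Each source applies its predicate locally, only the filtered rows move, and the final join and aggregation run inside one data system.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one data system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical stand-ins for the three systems named on the slide.
sql_server = make_source(
    "CREATE TABLE customers (cust_id INTEGER, region TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "EMEA"), (2, "APAC"), (3, "EMEA")],
)
oracle = make_source(
    "CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 2, 40.0), (12, 3, 900.0), (13, 1, 15.0)],
)
hadoop = make_source(
    "CREATE TABLE clicks (cust_id INTEGER, page TEXT)",
    "INSERT INTO clicks VALUES (?, ?)",
    [(1, "pricing"), (1, "home"), (2, "pricing"), (3, "pricing")],
)


def federated_revenue_by_region(min_amount, page):
    """Push predicates to each source, move only filtered rows, join at one site."""
    # Step 1: row processing is pushed into the data systems.
    orders = oracle.execute(
        "SELECT cust_id, amount FROM orders WHERE amount >= ?", (min_amount,)
    ).fetchall()
    visitors = hadoop.execute(
        "SELECT DISTINCT cust_id FROM clicks WHERE page = ?", (page,)
    ).fetchall()

    # Step 2: limited movement, only the filtered rows are shipped to the join site.
    sql_server.execute("CREATE TEMP TABLE t_orders (cust_id INTEGER, amount REAL)")
    sql_server.execute("CREATE TEMP TABLE t_visitors (cust_id INTEGER)")
    sql_server.executemany("INSERT INTO t_orders VALUES (?, ?)", orders)
    sql_server.executemany("INSERT INTO t_visitors VALUES (?)", visitors)

    # Step 3: the final join and aggregation run inside the data system.
    result = sql_server.execute(
        """
        SELECT c.region, SUM(o.amount)
        FROM customers c
        JOIN t_orders o ON o.cust_id = c.cust_id
        JOIN t_visitors v ON v.cust_id = c.cust_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()
    sql_server.execute("DROP TABLE t_orders")
    sql_server.execute("DROP TABLE t_visitors")
    return result, len(orders) + len(visitors)
```

Calling `federated_revenue_by_region(100.0, "pricing")` returns the aggregated rows together with the number of rows that had to move, which makes the effect of the pushed-down predicates visible.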
Completing The Solution…
• Cirro Data Hub – Federated query processing
  • Use any tool
  • The fastest distributed processing possible
• Cirro Analyst
  • Data discovery
  • Mash up data like never before
  • Go beyond SQL
  • Publish
• Cirro Multi Store
  • Stage, Store, Process
  • Highly scalable
Next Generation Data Federation
• Designed & Built for Big Data
• Compatible with structured, semi-structured & unstructured data
• Works in the cloud, in the data center, or both
• Real-Time Federation
• Queries are dynamically optimized and executed, taking the processing to the data
• Enables ad-hoc query and exploration of all data
• No Semantic Layer Required
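One way to picture "taking the processing to the data" is a planner that picks the join site requiring the fewest rows to be shipped. The sketch below is an assumption about how such a choice could be made, not a description of Cirro's optimizer. The `pick_join_site` function is hypothetical, and the assignment of row counts to systems is purely illustrative.

```python
def pick_join_site(filtered_rows):
    """Return (site, rows_moved) for the site that minimizes rows shipped.

    filtered_rows maps each system to the number of rows left after its
    local predicates have run. Joining at a site means every other system
    ships its filtered rows there.
    """
    total = sum(filtered_rows.values())
    # Rows moved to a site = all filtered rows except the ones already there.
    site = min(filtered_rows, key=lambda name: total - filtered_rows[name])
    return site, total - filtered_rows[site]


# Assumed, illustrative row counts per system after local filtering.
rows = {"Oracle": 50_000, "Hadoop": 50_000_000, "SQL Server": 5_000}
site, moved = pick_join_site(rows)
print(site, moved)
```

With these figures the largest intermediate result stays where it is and only the two smaller ones travel. A real optimizer would also weigh network bandwidth and each system's join performance.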
Ask Questions You Couldn’t Ask Before
Cirro Federation vs. Data Virtualization
Cirro
• Excellent for data exploration and discovery
• Excellent for ad-hoc queries
• True federated processing – minimal data movement, no server bottlenecks – ‘pushes processing to the data’
• Easy setup, maintenance & administration
• Hadoop – can execute Hive & Impala queries along with MapReduce programs
• Pathway to NoSQL
Data Virtualization
• Not appropriate for data exploration or discovery – requires you to know the questions you want to ask in advance of accessing the data
• Not true federated processing – final joins and aggregations done on Virtualization Server
• Good for structured data processing workloads
• Labor intensive setup, maintenance & administration for modeling and semantic layer
• Hadoop – limited to Hive access
Data Federation Use Cases
• On Demand Distributed Data Analysis
• Data warehouse offloading
• Business intelligence federation
• Self-service data exploration and discovery
• Entry point for private cloud analytics
• SaaS, Hadoop and/or NoSQL integration with enterprise data sources
• Simplify application development
On Demand Intra-Day Analytics
Financial & Energy / Utility Markets
Business Challenge
• Data that drives trading analytics & decisions sits in data silos
• Inability to analyze data ‘fast enough’ to make informed trading decisions
• ETL tools and manual data consolidation are too slow and inflexible for hourly or daily iterative analysis
• Inability to join traditional data with cloud sources
Solution
• Cirro Data Hub; Cirro Analyst
Results
• On demand analytics supports faster trend analysis and the ability to identify data anomalies
• Cross-platform data access reduced from weeks to minutes
• Flexible/iterative using in-house BI tools
• Enables self-service data mash-ups by analysts across all data sources
Ask Questions You Couldn’t Ask Before
Last Market Price
Oracle - Pricing DW Transaction Data
Tableau Actionable Visualizations
Subscription Market Data
Ask Questions You Couldn’t Ask Before
Anonymous Behavior
Transactional Data
Ads viewed/clicked
Actionable Visualizations
The Business Impact
Improved Business Operations
• Agility; conducting analysis previously unavailable
• Competitive advantage
• Supports ad-hoc analysis, fastest time to value
• Leverage in-house BI tools – no new tools to learn
Streamline IT Processes
• Traditional architectures not designed for Big Data
• Easily add new data sources – RDBMS, Hadoop, NoSQL
• Easy to install, use & manage
• Future proof analytics developed
Cost Savings
• Reduction in license costs on EDW and RDBMS
• Time & cost savings associated with data staging, modeling, ETL work, etc.
• No new BI applications to buy - use existing BI tools
• No new skills to develop
The Visible “Big Data” Trend
• Corporate data volumes grow at about 55% per annum - exponentially
• Data has been growing at this rate for, maybe, 40 years
• There is nothing new about big data; it clings to an established exponential trend
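The growth claim above can be checked with compound-growth arithmetic. This is a short sketch that takes the slide's 55% per annum figure at face value; the helper names are illustrative.

```python
import math


def growth_factor(annual_rate, years):
    """Total multiplication after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


def doubling_time(annual_rate):
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


print(f"doubling time: {doubling_time(0.55):.2f} years")
print(f"growth over 6 years: {growth_factor(0.55, 6):.1f}x")
print(f"growth over 40 years: {growth_factor(0.55, 40):.2e}x")
```

At 55% per year, data volume doubles roughly every 1.6 years and grows about 14-fold in six years, which is why a steady percentage rate still produces very large absolute numbers over four decades.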
The Invisible Trend: Moore’s Law Cubed
• The biggest databases are new databases
• They grow at the cube of Moore’s Law
• Moore’s Law = 10x every 6 years
• VLDB: 1000x every 6 years
  • 1991/2 megabytes
  • 1997/8 gigabytes
  • 2003/4 terabytes
  • 2009/10 petabytes
  • 2015/16 exabytes
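The arithmetic on this slide can be worked through directly. This is a sketch using the slide's own figures (10x every 6 years for Moore's Law, cubed for the largest databases); the `annual_rate` function name is illustrative.

```python
def annual_rate(factor, years):
    """Constant yearly growth rate that yields `factor` over `years`."""
    return factor ** (1.0 / years) - 1.0


moore_factor = 10                 # slide: Moore's Law = 10x every 6 years
vldb_factor = moore_factor ** 3   # cubed: 1000x every 6 years

print(f"Moore's Law: {annual_rate(moore_factor, 6):.0%} per year")
print(f"VLDB:        {annual_rate(vldb_factor, 6):.0%} per year")

# Each 6-year step multiplies size by 1000, i.e. one unit prefix.
units = ["megabytes", "gigabytes", "terabytes", "petabytes", "exabytes"]
timeline = {1991 + 6 * i: unit for i, unit in enumerate(units)}
print(timeline)
```

Ten-fold growth in six years works out to about 47% per year, while 1000-fold growth in six years is about 216% per year, so the largest databases roughly triple in size annually on this trend.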
Whys and Wherefores?
• Why do we assemble such gargantuan heaps of data?
• While the data volume has grown like bamboo in spring, the size of executables has not
• Why not just move the processing to the data?
• This is surely an option worth exploring – maybe it is even one of the foundations for Big Data Architecture…
Questions are Easy, Answers Difficult
The WORKLOAD Conundrum
The DISTRIBUTION Conundrum
The DATA FLOW Conundrum
The REAL-TIME Conundrum
• What are the primary applications where Cirro makes a big impact?
• What is (roughly) the largest number of data sources Cirro federates in any implementation?
• What are the most resources deployed for the largest Cirro implementation? How much memory?
• How does “fault tolerance” work?
• How difficult is it to develop applications employing Cirro? Is it significantly different to a DBMS?
• Are any companies adopting this technology strategically?
• Which technologies/companies do you regard as competition?
Upcoming Topics
www.insideanalysis.com
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
This Month: BIG DATA
March: CLOUD
April: BIG DATA