The Briefing Room
Twitter Tag: #briefr
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
December: Innovators
January: Big Data
February: Analytics
March: Data in Motion
Innovators
• Charles Babbage conceived the Analytical Engine in 1834.
• Automation and ease of use have driven innovation in computing ever since.
• The Cloud and Big Data are raising the bar.
Robin Bloor is Chief Analyst at The Bloor Group
Analyst: Robin Bloor
Cirro
• Cirro provides a single method to access any type of data, on any platform, in any environment.
• Its product suite consists of Cirro Data Hub, Analyst for Excel and Multi Store, all designed to remove complexity from Big Data analytics.
• Cirro’s products are cloud based and can run in public, private and on-premises environments.
Mark Theissen
Mark is CEO at Cirro. He is a respected analytics and data warehousing expert with more than 22 years in the industry. Most recently, Mark was the worldwide data warehousing technical lead at Microsoft following the acquisition of DATAllegro. At DATAllegro, Mark was the COO and a member of the board of directors. Prior to joining DATAllegro, Mark was Vice President and Research Lead at META Group (Gartner Group) for Enterprise Analytics Strategies, covering data warehousing, business intelligence and data integration markets. Before META, Mark was VP of Professional Services at Accruent, where he was responsible for domestic and overseas services and operations. Mark has a BS in Computer Information Systems from Chapman University and an MBA from the University of California, Irvine.
©2012 Cirro Inc. All rights reserved.
Corporate Overview
Bringing Big Data to the Desktop
The Big Data Dilemma
Accessing Big Data
Incumbent Approach Hadoop Approach
What the Market Needs
An enterprise data hub to access any type of data, on any platform, in any environment
The Enterprise Data Hub
Simplifying the Access to Your Data
Conventional Approach: people manage the access to data
• Structured-unstructured mashups
• SQL (multiple versions)
• Java
• Sqoop
• MapReduce
• Hive/Hadoop install & config
• Hive/Sqoop install & config
• Source control
• Database management
Cirro Approach: Cirro Data Hub manages access to data
• Cirro Data Hub
• Access tool
Architecture Overview
Cirro Data Hub
• Cost-based federation optimizer
• Smart caching
• Dynamic optimization
• Normalized cost estimates
• Metadata for unstructured sources
Cirro Function Library
• Library of functions
• Logic to build complex specific formulas
Cirro Analyst
• Excel plug-in that allows analysts to explore & process Big Data and traditional data
Cirro Multi Store (optional)
• Pre-built structured/unstructured data store
• Used for holding data or additional workspace
Typical Deployment
IT Staff
• Programmers
• Developers
• DBAs
• Extend, add proprietary functions to CFL
Excel Analyst Users
• Design views
• Minimal IT support
• Publish views
• Data exploration
• Analysis
Data Consumers
• Tableau
• Business Objects
• Other BI tools
• Access CDH views via ODBC & JDBC across all data types
Cirro Data Hub
• Cirro Function Library
• Proprietary MapReduce
• Custom views
Data sources and access paths
• RDBMS: Oracle, Teradata, MySQL, SQL, Vertica
• HQL
• NoSQL: Splunk, Cassandra, MongoDB
• MapReduce
• MapReduce
• Hadoop Distributed File System
• Hive
Sample Use Case
Summarize the number of tweets per hour with certain keywords from a raw Twitter feed.
Requirements:
• Use raw Twitter data files in Hadoop
• Keywords stored in SQL table for easy manipulation
• Results into Tableau/Excel for visualization
Too Many Skills, Coding, Processing
Write mapper/reducer in Java using development tool:
• parse Twitter text - convert to lower case - parse words - exclude common words - group words by hour
Import Java classes into Hadoop
Execute command line hadoop using CLI
• bin/hadoop jar TwitterParse /home/cloudera/WordCount.jar /usr/tweet/input /usr/local/output -libjars
Move result into Hive using JDBC SQL tool
• create table output1 (text STRING, created_at STRING, count BIGINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE
• LOAD DATA INPATH '/usr/data/1-88f1-864e22e77801/part*' OVERWRITE INTO TABLE output1
Move SQL table with keywords to Hive through Sqoop using CLI
• export --connect jdbc:mySQL://10.17.185.44/mytable --password mypasswd --username root --table words --export-dir '/home/cloudera/inputfile
• create table mytable (word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE
• LOAD DATA INPATH '/home/cloudera/inputfile/part*' OVERWRITE INTO TABLE mytable
Run Hive query using JDBC SQL tool
• select a.text, a.created_at, a.count from output1 a join mytable b on (a.text = b.word)
Import results into Excel using Excel
The same task as Excel cell formulas using Cirro functions:
B1=TwitterParse("/user/twitter/sample","text,created_at")
B2=ToLower(B1,"text")
B3=WordSeparate(B2,"text")
B4=Exclude(B3,"text")
B5=GroupBy(B4,"text,created_at")
B6=Cirro_Match(B5,"text","MYSQL.KeyWords","word",C9)
Results displayed at cell C9
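To make the logic of the steps above concrete, here is a minimal Python sketch of the same pipeline: lower-case the tweet text, split it into words, drop common words, bucket by hour, and keep only words that appear in a keyword list. It is an illustration only and not how Cirro or the Hadoop job on the slide is implemented. The function name, the stop-word list, the ISO 8601 timestamp format and the sample tweets are all assumptions made for the example; the keyword list stands in for the SQL keyword table.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical stop-word list; the slide only says "exclude common words".
COMMON_WORDS = {"the", "a", "is", "to", "and", "of", "in", "for"}


def keyword_counts_per_hour(tweets, keywords):
    """Count tweets per (keyword, hour) bucket.

    tweets   -- iterable of (created_at, text) pairs; created_at is assumed
                to be an ISO 8601 string such as "2012-12-12T18:03:21"
    keywords -- iterable of words, standing in for the SQL keyword table
    """
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    for created_at, text in tweets:
        # Truncate the timestamp to the hour it falls in.
        hour = datetime.fromisoformat(created_at).strftime("%Y-%m-%d %H:00")
        # Lower-case, split into words, and de-duplicate within one tweet
        # so that each tweet is counted once per keyword.
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        for word in words - COMMON_WORDS:
            if word in wanted:
                counts[(word, hour)] += 1
    return counts


# Made-up sample data for illustration.
sample_tweets = [
    ("2012-12-12T18:03:21", "Hadoop is the new data reservoir"),
    ("2012-12-12T18:47:02", "Big Data and Hadoop for analysts"),
    ("2012-12-12T19:05:10", "Excel meets hadoop"),
]
result = keyword_counts_per_hour(sample_tweets, ["hadoop", "excel"])
for (word, hour), n in sorted(result.items()):
    print(word, hour, n)
```

Each tweet contributes at most one count per keyword, which matches the stated goal of counting tweets per hour rather than word occurrences.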
Analyst: Robin Bloor
Perceptions & Questions
The Bloor Group
Big Data, Hot Data?
Hadoop & The Big Data Dynamic
Hadoop has become the de facto reservoir for data
– We witnessed something like this a long time ago, with ISAM files - before the advent of RDBMS
– The difference this time is that Hadoop has an ecosystem and it is growing
– Big Data (usually caught first by Hadoop) is mostly new data and mostly event data
– Hadoop is not (yet) a performance engine. It is an all-purpose capability
– It is delivering business benefits in a big way: it is hot...
BI Categories
HINDSIGHT: Regular reporting/operational BI, Excel
OVERSIGHT: Dashboards, OLAP, BPM, Excel
INSIGHT: Data mining, statistical analysis (trends and relationships)
FORESIGHT: Predictive analytics
The New BI Universe (?)
Data Sources
Hadoop and Hadoop ++
Standard SQL
NoSQL
Graph DBMS, XML DBMS, Flat files
Metadata Hub?
Problems Of The Data Layer
Hadoop is capable of ETL and often used for ETL, but that usually involves coding of a kind
A connectivity architecture is needed
IT REQUIRES SIMPLE CONNECTORS
Point-to-point connectivity usually was, is and may always be a bad idea
BI tools, which had good-enough interfaces to RDBMS, don’t link to Hadoop directly, and probably shouldn’t
The data layer is more complicated than it was and its complexity is increasing
Hadoop is multi-role and hence can spawn multiple instances
• How would one use the Cirro Multi Store?
• Which companies/products do you regard as competitors (either direct or close competitors)?
• How does a Cirro implementation proceed, i.e., where do you start, what are the medium-term goals, what do you replace?
• Conceptually a hub for the data layer is attractive. But how well does it scale out?
• Can the hub be physically distributed, i.e., one logical instance with multiple physical instances?
• How does your proprietary MapReduce differ from Hadoop MapReduce?
• Is there any aspect of BI that you don’t or can’t cater for (CEP, data governance, MDM, etc.)?
Upcoming Topics
January: Big Data
February: Analytics
March: Data in Motion
2013 Editorial Calendar www.insideanalysis.com
Thank You for Your Attention