Making the Most of In-Memory: More than Speed
DESCRIPTION
The Briefing Room with Robin Bloor and Kognitio. Live Webcast, Oct. 1, 2013. Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=7539482&rKey=bc304aa8dac7b781 Everyone’s talking about in-memory these days, and the term has become synonymous with speed. But pinning data into memory is just the beginning, and it’s about more than speed. In-memory solutions need a tailored architecture, one that can take full advantage of RAM processing in every respect, and this requires an approach that considers memory and CPU from the ground up. Register for this episode of The Briefing Room to hear veteran Analyst Robin Bloor explain how memory is on the fast track to supersede disk, at least with respect to advanced analytics. He’ll be briefed by Kognitio CTO Roger Gaskell, who has pioneered the in-memory analytical platform since its inception in 1989. Gaskell will also discuss how this type of solution changes the landscape for the modern data architecture and how it affects advanced analytical capabilities. Visit InsideAnalysis.com for more information.
TRANSCRIPT
The Briefing Room
Making the Most of In-Memory: More than Speed
Twitter Tag: #briefr
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
This Month: DATA PROCESSING
November: DATA DISCOVERY & VISUALIZATION
December: INNOVATORS
Data Processing
“Efficiency is doing things right; effectiveness is doing the right things.”
~Peter Drucker
Analyst: Robin Bloor
Robin Bloor is Chief Analyst at The Bloor Group
Kognitio
• Founded in 1989, Kognitio is both an in-memory database and an analytical engine
• The Kognitio Analytical Platform can be deployed as software, as an appliance, or in the cloud
• The platform enables flexible, ad hoc queries on complex data sets, including data from Hadoop, and it offers scale-up and scale-out capabilities
Guest: Roger Gaskell
Roger Gaskell is the Chief Technology Officer and one of the founding members of the Kognitio Development Team. He has overall responsibility for all product development, strategic direction and the roadmap of new innovation for the Kognitio Analytical Platform. Roger has been instrumental in all generations of the product to date. Over this time, it has evolved from an appliance-based system in the original beta offering in 1989, to hardware-independent software for x86 processing, then to a cloud-based Platform-as-a-Service offering in the mid-1990s. Prior to Kognitio, Roger was test and development manager at AB Electronics. During this time his primary responsibility was for the famous BBC Micro Computer and the development and testing of the first mass production of personal computers for IBM.
Making the most of in-memory platforms
October 2013
What is an “In-memory” analytical platform?
A database where queries are run from data held in computer memory (RAM) rather than mechanical disk
Memory = Fast / Disk = Slow
Analytics go much quicker – SIMPLE? Unfortunately, it’s not as simple as that….
Why in-memory: RAM is faster than disk (really!)
Actually, this is only part of the story: analytics completely changes the characteristics of the database workload.
• Simple reporting & transactional processing is all about “filtering” the data of interest
• Analytics is all about complex “crunching” of the data once it is filtered
• Crunching needs processing power & consumes CPU cycles
• Storing data on physical disks severely limits the rate at which data can be provided to the CPUs
• Accessing data directly from RAM allows much more CPU power to be deployed
Analytics is about “CRUNCHING” through data
• To understand what is happening in the data
• Joins
• Sorts
• Aggregations
• Grouping
• Analytical Functions
Crunching is CPU cycle-intensive & CPU-bound
• In-memory analytical platforms are therefore CPU-bound
  – Assume disk I/O speeds are not a bottleneck
  – In-memory removes the disk I/O bottleneck
More complex analytics = the more pronounced this becomes
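The difference between filtering and crunching can be sketched in a few lines of Python. This is only an illustration, not Kognitio code: the sample rows and the function names `filter_rows` and `crunch` are made up for the example. The filter touches each row once and discards most of them, while the crunch step groups, aggregates and sorts what is left, which is where the CPU cycles go.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount). Invented for illustration.
rows = [
    ("north", "a", 10.0),
    ("south", "a", 5.0),
    ("north", "b", 7.5),
    ("south", "b", 2.5),
    ("north", "a", 4.0),
]

def filter_rows(rows, region):
    """Filtering: one cheap comparison per row to pick the data of interest."""
    return [r for r in rows if r[0] == region]

def crunch(rows):
    """Crunching: group by product, aggregate the amounts, then sort by total."""
    totals = defaultdict(float)
    for _region, product, amount in rows:
        totals[product] += amount  # grouping + aggregation
    # Sort products by total amount, largest first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

north = filter_rows(rows, "north")
ranking = crunch(north)
print(ranking)
```

On real data volumes the grouping, aggregation and sort dominate the run time once the disk is no longer the limiting factor.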
For analytics, the CPU is king
Being CPU-bound fundamentally changes a system’s design philosophy
Interactive / ad hoc analytics: THINK data to core ratios ≈ <10GB data per CPU core
Disk I/O bound:
• CPUs wait for data from disk
• No need for efficient coding
• Parallelization ineffective
CPU bound:
• Every CPU cycle is precious – efficient coding
• Parallelization = scalable performance
• Advanced techniques minimize CPU cycles
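The data-to-core rule of thumb above lends itself to a quick sizing calculation. The sketch below is an assumption-laden illustration: the function names, the 2 TB data size and the 16-cores-per-server figure are hypothetical, and the 10 GB per core value is taken as an upper bound from the slide rather than as a vendor sizing formula.

```python
import math

# Rule of thumb from the slide: keep under roughly 10 GB of data per CPU core.
GB_PER_CORE_GUIDELINE = 10

def min_cores_for(data_gb, gb_per_core=GB_PER_CORE_GUIDELINE):
    """Smallest whole number of cores keeping the ratio at or below gb_per_core."""
    if data_gb < 0 or gb_per_core <= 0:
        raise ValueError("data_gb must be >= 0 and gb_per_core must be > 0")
    return math.ceil(data_gb / gb_per_core)

def servers_needed(data_gb, cores_per_server, gb_per_core=GB_PER_CORE_GUIDELINE):
    """Whole servers required to supply that many cores (hypothetical helper)."""
    if cores_per_server <= 0:
        raise ValueError("cores_per_server must be > 0")
    return math.ceil(min_cores_for(data_gb, gb_per_core) / cores_per_server)

# Example with assumed numbers: 2,000 GB of in-memory data, 16 cores per server.
print(min_cores_for(2000))       # 200 cores
print(servers_needed(2000, 16))  # 13 servers
```

Because the guideline is a ceiling, more complex analytics would argue for a lower `gb_per_core` value and therefore more cores for the same data.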
Why now?
[Chart: price of RAM on a logarithmic (base 10) scale, 1987 to 2010, shown alongside interest in in-memory]
Mature BI being overtaken
Numbers, tables, charts, indicators
…accessed with ease and simplicity Historical information, latency
But BI and BI tools have plateaued! Decision Support
Progression into advanced analytics & data science
It’s now all about doing more math …a lot more math
Machine learning algorithms
Dynamic Simulation
Statistical Analysis
Clustering
Behaviour modelling
Thus more complex methods – real-time
[Chart: Analytical Complexity plotted against Technology/Automation; labels: Reporting & BPM, Campaign Management, Fraud detection, Dynamic Interaction]
How to efficiently exploit RAM
• A large cache is not in-memory
  – In-memory platforms hold data in structures that take advantage of the properties of RAM
  – Caches are copies of frequently used disk blocks
• Platform designed specifically to exploit the random-access nature of memory
  – Different algorithms
  – CPU cycles are precious – code efficiency paramount
  – Advanced techniques used to reduce code path length
    • Dynamic Machine Code Generation
    • Extended CPU instruction sets
• Parallelize everything
  – Scale-out and Scale-up
  – Fully and efficiently use every CPU core, in every CPU, in every server
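The “parallelize everything” point can be illustrated with a scatter/aggregate/merge pattern: split the data into partitions, aggregate each partition independently, then merge the partial results. The sketch below is a generic illustration, not a description of how Kognitio implements it, and all names in it are hypothetical. It uses threads only to keep the example portable; in CPython a `ProcessPoolExecutor` (or separate servers) would be needed for real CPU parallelism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """Split rows into n slices, dealing them out round-robin."""
    return [rows[i::n] for i in range(n)]

def partial_sum(rows):
    """Aggregate one partition on its own: sum of amount per key."""
    totals = Counter()
    for key, amount in rows:
        totals[key] += amount
    return totals

def parallel_sum_by_key(rows, workers=4):
    """Scatter partitions to workers, then merge the partial aggregates."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, parts))
    merged = Counter()
    for p in partials:
        merged.update(p)  # Counter.update adds values for matching keys
    return dict(merged)

# Invented sample data: (key, amount) pairs.
sample = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(parallel_sum_by_key(sample, workers=3))
```

The merge step is cheap because each partial result is already aggregated, which is why this pattern scales with the number of cores for sums, counts and similar aggregations.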
Analytical Platform Reference Architecture
Persistence Layer
Hadoop Clusters
Enterprise Data Warehouses
Legacy Systems
Kognitio Storage
Reporting
Analytical Platform Layer
Near-line Storage (optional)
Application & Client Layer
All BI Tools All OLAP Clients Excel
Cloud Storage
Perceptions & Questions
Analyst: Robin Bloor
Big Data, Maybe — Big Parallelism, Yes
Many latency-reducing changes are afoot:
• Hadoop is a data lake – It’s about latency
• CPU and memory rule – The old database is dying
• Grids, not clusters – A server is now a cluster
• Scaling Up AND Scaling Out – “Only scaling out” is last year’s story
• SSD will replace spinning disk – But it will never compete with RAM
Why the Excitement?
What are the “new” applications?
BIG DATA capture and staging
BIG DATA ANALYTICS
LITTLE DATA ANALYTICS
OPERATIONAL INTELLIGENCE
A “Modern” Workload
Query Light &
Math Heavy
Where the Rubber Meets the Road
It isn’t really about application latency any more, it’s about business process latency (business time!). This can have many aspects:
• The collapse of data flows – take the processing to the data
• Data warehouse offload
• Full process automation
• Lower latency = NEW BUSINESS PROCESSES
The Question
Exactly how do we take advantage of these changes?
This is a BUSINESS question AND a TECHNICAL question.
The question for most organizations is:
• Low latency is exciting, but where do you see the clear business opportunities?
• There seems to be a conundrum about where to store “slow” data:
  – Hadoop?
  – Traditional data warehouse?
  – New data warehouse?
• Is the split between the application and the data real any more?
• In your opinion, does the Enterprise need a new architecture?
• How is it possible to define and monitor service levels with in-memory applications?
• Whither data governance?
Upcoming Topics
www.insideanalysis.com
This Month: DATA PROCESSING
November: DATA DISCOVERY & VISUALIZATION
December: INNOVATORS
Thank You for Your Attention