building real-time data pipelines with kafka, spark, and memsql
TRANSCRIPT
Building Real-Time Data Pipelines with Kafka, Spark, and MemSQL
PHX Data Conference 29 Oct 2016 @garyorenstein @memsql
(c) Gary Orenstein and MemSQL
About Me: Gary Orenstein
• MemSQL - real-time database
• Fusion-io (SanDisk) - flash memory solutions
• Compellent (Dell) - enterprise storage
• experience in networking, caching, file systems
• co-author of two O'Reilly books
• Building Real-Time Data Pipelines (2015)
• The Path to Predictive Analytics and Machine Learning (2016)
Digital businesses' inexhaustible demand for faster performance,
greater scalability and deeper real-time insight is boosting the market for IMC technologies, which is expected
to reach $13 billion by 2020. - Gartner
MemSQL Basics
The Database Platform For Real-Time Analytics
• In-Memory (plus disk)
• Relational (SQL)
• Multi-model (JSON, Geospatial)
• Distributed (100s of nodes)
Combining streaming, database, and data warehouse workloads
2014
Building Real-Time Platforms with MemSQL
and Apache Spark
Strata New York
Combine the power of a real-time transformation tier
with the power of a real-time distributed, persistent database,
making Spark results more accessible to all
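The pattern on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the hard-coded `EVENTS` list stands in for messages from a Kafka topic, `transform` stands in for the Spark transformation tier, and Python's built-in `sqlite3` in-memory database stands in for the distributed SQL store. None of these names come from the talk. The point is that once transformed results land in a SQL table, anyone who can write a query can read them.

```python
import sqlite3

# Hypothetical sensor events standing in for messages read from a Kafka topic.
EVENTS = [
    {"sensor": "a", "reading": 10.0},
    {"sensor": "b", "reading": 4.0},
    {"sensor": "a", "reading": 14.0},
    {"sensor": "b", "reading": 6.0},
]


def micro_batches(events, size):
    """Yield fixed-size slices, mimicking a streaming micro-batch loop."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


def transform(batch):
    """Stand-in for the transformation tier: keep only the columns we store."""
    return [(event["sensor"], event["reading"]) for event in batch]


def run_pipeline(events, batch_size=2):
    # An in-memory sqlite3 database is an assumption used as a stand-in
    # for the real-time distributed, persistent database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, reading REAL)")
    for batch in micro_batches(events, batch_size):
        conn.executemany("INSERT INTO readings VALUES (?, ?)", transform(batch))
        conn.commit()  # each micro-batch becomes queryable as soon as it commits
    return conn


conn = run_pipeline(EVENTS)
rows = conn.execute(
    "SELECT sensor, AVG(reading) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)
```

The final query is plain SQL over the pipeline's output, which is the sense in which results become accessible beyond the people writing the streaming job.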
1. We're finished with batch and the world is moving to streaming and real-time
2. Topologies need to change
3. Messaging semantics need to improve
Familiar data integration patterns centered on physical data
movement (bulk/batch data movement, for example) are no longer a sufficient solution for
enabling a digital business. - Gartner
I hate batch processing so much that I won't even use the dishwasher. I just wash, dry, and put away real
time. - Ed Weissman (@edw519)
Germany Just Got Almost All of Its Power From Renewable Energy
May 15, 2016
Bloomberg: http://www.bloomberg.com/news/articles/2016-05-16/germany-just-got-almost-all-of-its-power-from-renewable-energy
Investment in renewables reached $286 billion worldwide
in 2015
BBC: http://www.bbc.com/news/science-environment-36420750
Enabling predictive analytics
• Use existing models from SAS
• Create models in Spark MLlib
• Predictive scoring as part of the pipeline
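As a rough illustration of scoring inside the pipeline, the sketch below attaches a model score to each record as it streams through a stage. Everything specific here is an assumption made for the example: the logistic-regression form, the `WEIGHTS` and `BIAS` values, and the field names are hypothetical, and a real deployment would load coefficients exported from a model trained elsewhere (for example in SAS or Spark MLlib).

```python
import math

# Hypothetical logistic-regression coefficients, not taken from the talk.
WEIGHTS = {"temperature": 0.08, "vibration": 1.5}
BIAS = -8.0


def score(record):
    """Return the model's probability for a single record."""
    z = BIAS + sum(WEIGHTS[name] * record[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def scoring_stage(records):
    """Pipeline stage: yield each record with its score attached."""
    for record in records:
        yield {**record, "failure_probability": score(record)}


# Hypothetical incoming records standing in for a live stream.
incoming = [
    {"device": "pump-1", "temperature": 60.0, "vibration": 0.5},
    {"device": "pump-2", "temperature": 95.0, "vibration": 3.0},
]
scored = list(scoring_stage(incoming))
for row in scored:
    print(row["device"], round(row["failure_probability"], 3))
```

Because the stage is a generator, each record is scored as it arrives rather than in a later batch job, so the score can be stored alongside the raw fields and queried immediately.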
Business Intelligence Details
• Natively connect to BI tools like Tableau
• Also Zoomdata, Looker, MicroStrategy
• Business analysts inside your company can use a tool they know and love
Join our co-founder and CTO, Nikita Shamgunov,
plus engineering and product experts