big data and mstr bridge the elephant
DESCRIPTION
Presentation: “Big Data and MicroStrategy: Building a Bridge for the Elephant”
Intelligent engineering of an agile business requires the ability to connect the vast array of requirements, technologies and data that build up over time, while avoiding the pitfalls commonly encountered on the road to giving users comprehensive yet nimble business analytics with MicroStrategy. The Google generation, armed with iPads and Droid phones, brings big, bold ideas on how “Big Data” will solve the new wave of business problems; traditional users know that addressing them requires more than just embracing buzzwords like “sentiment”, “R” and “Hadoop.” Overall success requires building a bridge between the stable, proven, mature BI solutions in place today and the disruptive new world. Enabling deeper analytics, predictive modeling and social media analysis in combination with scalable self-service dashboards, reporting and analytics is no longer an idea but a MUST DO. This informative presentation describes these business challenges and how an organization leveraged the Kognitio Analytical Platform under MicroStrategy to build such a bridge.
TRANSCRIPT
![Page 1: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/1.jpg)
Big Data and MicroStrategy: Building a Bridge for the Elephant
Jan 2013
Paul Groom, Chief Innovation Officer
![Page 2: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/2.jpg)
Let’s start at…
The End.
![Page 3: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/3.jpg)
Panacea
![Page 4: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/4.jpg)
You…built the DWE
![Page 5: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/5.jpg)
You…built the BICC
![Page 6: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/6.jpg)
and yes you built… lots of cool reports and dashboards
![Page 7: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/7.jpg)
Epilogue
A comfortable status quo
![Page 8: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/8.jpg)
How are you really judged?
• Fast?
• Consistent?
• All users?
![Page 9: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/9.jpg)
![Page 10: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/10.jpg)
Rrrrrriiiiiiinnnnnngggggg!
Back to the real world
![Page 11: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/11.jpg)
Disruption
![Page 12: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/12.jpg)
Disruptor: New Data
![Page 13: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/13.jpg)
Disruptor: Social Media & Sentiment
![Page 14: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/14.jpg)
Disruptor: Data?
![Page 15: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/15.jpg)
Disruptor: More Connected Users
![Page 16: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/16.jpg)
Disruptor: Data Discovery Tools
Choices for engaging quickly with data
Business users’ heads distracted from core BI!
![Page 17: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/17.jpg)
BI Wild West
![Page 18: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/18.jpg)
Where it matters
![Page 19: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/19.jpg)
![Page 20: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/20.jpg)
Lots of variety of DW and EDW
![Page 21: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/21.jpg)
The Reality of the DW
analytical workload
![Page 22: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/22.jpg)
EDW says no or not now!
…and CFO says no big upgrades
![Page 23: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/23.jpg)
Pragmatism
…OK, so you enable plenty of caching, limit drill anywhere and add Intelligent Cubes
![Page 24: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/24.jpg)
![Page 25: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/25.jpg)
And then came…
![Page 26: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/26.jpg)
http://oris-rake.deviantart.com/
Boon or Distraction
![Page 27: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/27.jpg)
Scalable, resilient, bit bucket
![Page 28: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/28.jpg)
Experimenting
© 20th Century Fox
![Page 29: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/29.jpg)
The Hadoop stack
[Diagram: the Hadoop stack, with components HDFS, HBase, MapReduce, Oozie, ZooKeeper/Ambari, HCatalog, Pig and Hive]
![Page 30: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/30.jpg)
Hadoop Performance Reality
• Hadoop is batch oriented
• HDFS access is fast but crude
• MapReduce is powerful but has overheads
  – ~30 second base response time
  – Too much latency in stack and processing model
  – Trade-off in optimization and latency
• MapReduce complex
  – Typically multiple Java routines
https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
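To put the ~30 second base response time in perspective, here is a rough back-of-the-envelope sketch. The function name and the 2-second comparison figure are illustrative assumptions, not numbers from the presentation; only the 30-second floor comes from the slide.

```python
def max_sequential_queries_per_hour(base_latency_s: float, work_s: float = 0.0) -> float:
    """Upper bound on back-to-back queries a single user session can run in one hour.

    base_latency_s: fixed start-up cost paid by every query (job scheduling, etc.)
    work_s:         time spent actually processing data (0 for a trivial query)
    """
    per_query = base_latency_s + work_s
    if per_query <= 0:
        raise ValueError("total time per query must be positive")
    return 3600.0 / per_query


# ~30 s floor quoted on the slide, even for a query that does almost no work.
batch_rate = max_sequential_queries_per_hour(30.0)

# Assumed figure for an interactive system, for comparison only.
interactive_rate = max_sequential_queries_per_hour(2.0)

print(f"batch floor: {batch_rate:.0f} queries/hour per session")
print(f"interactive: {interactive_rate:.0f} queries/hour per session")
```

Under these assumptions a single analyst clicking through a dashboard is capped at a couple of queries per minute before any real work is done, which is the latency problem the slide describes.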
![Page 31: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/31.jpg)
SQL to the Rescue
• So MapReduce is complicated
  – use Hive (SQL) as the easy way out
[Diagram: the Hadoop stack, with components HDFS, HBase, MapReduce, Oozie, ZooKeeper/Ambari, HCatalog, Pig and Hive]
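The contrast between hand-written map/reduce logic and a declarative query can be sketched in a few lines. This is only an illustration: Python's built-in `sqlite3` stands in for Hive here, the `sales` table and its rows are made up, and real MapReduce jobs are typically written as multiple Java routines rather than Python functions.

```python
import sqlite3
from collections import defaultdict

# Made-up sample data: (region, amount)
rows = [("uk", 10), ("us", 25), ("uk", 5), ("de", 7), ("us", 3)]


# --- Map/reduce style: the developer writes both phases by hand ---
def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount


def reduce_phase(pairs):
    """Sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


mr_totals = reduce_phase(map_phase(rows))

# --- SQL style: one declarative statement (sqlite3 used as a stand-in engine) ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
conn.close()

print(mr_totals)
print(sql_totals)
```

Both paths produce the same totals per region; the SQL version simply leaves the planning of the map and reduce steps to the engine.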
![Page 32: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/32.jpg)
Hive
• Simplifies access
• Only basic SQL support
• Concurrency needs careful system admin
• It’s not a silver bullet for interactive BI usage
“Hive is great, but Hadoop’s execution engine makes even the smallest queries take minutes!”
![Page 33: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/33.jpg)
Hadoop just too slow for interactive BI!
…loss of train-of-thought
Conclusion
“while Hadoop shines as a processing platform, it is painfully slow as a query tool”
![Page 34: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/34.jpg)
Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real time queries. As a result it should not be compared with systems like Oracle where analysis is done on a significantly smaller amount of data but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes. For Hive queries response times for even the smallest jobs can be of the order of 5-10 minutes and for larger jobs this may even run into hours.
I remain skeptical on the practical performance of the Hive query approach and have yet to talk to any beta customers. A more practical approach is loading some of the Hadoop data into the in-memory cube with the new Hadoop connector.
![Page 35: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/35.jpg)
![Page 36: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/36.jpg)
Why can’t Hadoop be in-memory?
Why can’t I have giant icubes?
![Page 37: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/37.jpg)
Lots of these
Not so many of these
Remember…
Hadoop inherently disk oriented
Typically low ratio of CPU to Disk
![Page 38: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/38.jpg)
Larger cubes
Issues: Time to Populate, Proliferation
![Page 39: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/39.jpg)
Alternative - In-memory Processing
Analytics requires CPU, RAM keeps the data close
Cores do the work!
Scale with the data
![Page 40: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/40.jpg)
Goals: Minimise Disruption, Cut Latency
• Don’t change the existing BI and analytics
• Support more creative and dynamic BI
• Don’t introduce yet more slow disk
  – Help the DW investment
• No complex ETL, just pull data as required
• Pull data simply and intelligently from Hadoop
• Simplify – fewer cubes, caches
• Improve sharing of data
• Increase concurrency and throughput
  – It’s all about queries per hour!
• Minimal DBA requirement
![Page 41: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/41.jpg)
![Page 42: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/42.jpg)
Kognitio Hadoop Connectors
HDFS Connector
• Connector defines access to HDFS file system
• External table accesses row-based data in HDFS
• Dynamic access or “pin” data into memory
• Selected HDFS file(s) loaded into memory
Filter Agent Connector
• Connector uploads agent to Hadoop nodes
• Query passes selections and relevant predicates to agent
• Data filtering and projection takes place locally on each Hadoop node
• Only data of interest is loaded into memory via parallel load streams
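The filter-agent idea (filter and project on each node, then load only the surviving data over parallel streams) can be sketched as follows. This is a conceptual illustration only, not Kognitio's API: `NODES`, `filter_agent` and `load_into_memory` are hypothetical names, the rows are made up, and threads stand in for the parallel load streams.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands for the rows stored on one Hadoop node.
NODES = [
    [
        {"id": 1, "region": "uk", "amount": 10, "note": "a"},
        {"id": 2, "region": "us", "amount": 25, "note": "b"},
    ],
    [
        {"id": 3, "region": "uk", "amount": 5, "note": "c"},
        {"id": 4, "region": "de", "amount": 7, "note": "d"},
    ],
]


def filter_agent(node_rows, predicate, columns):
    """Runs 'locally on the node': keep matching rows, then keep only the requested columns."""
    return [{c: row[c] for c in columns} for row in node_rows if predicate(row)]


def load_into_memory(nodes, predicate, columns):
    """Open one load stream per node and gather only the data of interest in memory."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        # pool.map preserves node order, so the result is deterministic.
        streams = pool.map(lambda rows: filter_agent(rows, predicate, columns), nodes)
        return [row for stream in streams for row in stream]


# The query passes its selection (columns) and predicate down to the agents.
in_memory = load_into_memory(NODES, lambda r: r["region"] == "uk", ["id", "amount"])
print(in_memory)
```

The point of the design is that unwanted rows and columns never leave the node, so the volume moved into memory depends on the query rather than on the size of the files in HDFS.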
![Page 43: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/43.jpg)
BI – Central Governance
Centrally defined data models
Persist data in natural store
Fetch when needed, agile
Available to all tools
Analytical power
![Page 44: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/44.jpg)
Engineering for Success
Thomas Herbrich
![Page 45: Big data and mstr bridge the elephant](https://reader030.vdocuments.us/reader030/viewer/2022020306/547339e7b4af9fae0a8b525c/html5/thumbnails/45.jpg)
connect
www.kognitio.com
twitter.com/kognitio
linkedin.com/companies/kognitio
tinyurl.com/kognitio
youtube.com/kognitio
NA: +1 855 KOGNITIO
EMEA: +44 1344 300 770