high performance is no longer a “nice to have” in ...files.meetup.com/14317202/twingo event...

MicroStrategy and Big Data

Presented by: Javier Valladares,

November 2015

Agenda

MicroStrategy Overview

Leveraging Hadoop for BI

Customer cases

MicroStrategy Enterprise Analytics

A powerful Business Intelligence solution that meets the needs of Business and IT in a single platform.

Best for IT | Best for business

Data Discovery and Visualization

• Exceptional Ease of Use

• Schema-free

• Data Preparation + Blending

• Rapid prototyping

• Agile and visual analysis

Geo-Spatial | Apps | Graphs | Dashboards | Banded Reports | OLAP Reports | Data Discovery | Predictive Analytics

• Reusable Object Model

• Single Security Architecture

• Single Metadata

• Optimized Multi-Source Data Access

• Design Once Deploy Everywhere

• Enterprise Reporting on any device

• Highest User Scale

• High Data Scale

• Fastest Query Performance

• Secure, Personalized Analytics for 10,000s

MicroStrategy Analytics Platform | MicroStrategy Desktop

Alerts & Distribution

Rapid | Intuitive | Powerful | Scalable | Extensible | Governed | Highly Performant | Secure

Visualizations

What is Hadoop?

• Hadoop is an open source software framework that supports distributed storage and processing of large datasets on commodity hardware.

• It is a software platform designed to store and process quantities of data that are too large for a single device or server.

• It has two main components:

• HDFS: Hadoop Distributed File System. It is the “secret sauce” that enables Hadoop to store huge files. It is a scalable file system that distributes and stores data across all machines in a Hadoop cluster.

• MapReduce: the system used to efficiently process the large amounts of data Hadoop stores in HDFS. Originally created by Google, its strength lies in the ability to divide a single large data processing job into smaller tasks.
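The idea of dividing one large job into smaller tasks can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce phases on a single machine; the function names (`map_task`, `shuffle`, `reduce_task`, `word_count`) and the sample lines are assumptions made for this example and are not part of Hadoop's API.

```python
from collections import defaultdict
from itertools import chain


def map_task(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce step: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines, chunk_size=2):
    # Split the single large job into smaller chunks; on a cluster each
    # chunk would be handled by a separate map task.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    mapped = chain.from_iterable(map_task(chunk) for chunk in chunks)
    return dict(reduce_task(k, v) for k, v in shuffle(mapped).items())


lines = ["big data on Hadoop", "Hadoop stores big files", "data data data"]
counts = word_count(lines)
print(counts["data"], counts["hadoop"])
```

The result does not depend on how the input is chunked, which is what lets the framework spread map tasks across machines.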

What is the reason behind Hadoop’s value proposition?

Extreme Scalability and Reliability

Hadoop provides scalable and reliable data storage that is designed to span large clusters of commodity servers.

Affordable Data Storage

Storing large volumes and a wide variety of data in a relational source is no longer practical because it is expensive; Hadoop offers much cheaper data storage.

Highly Flexible

Hadoop bypasses the need to specify a schema or structure for the data up front. It allows you to dump the data and ask questions later.
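The "Highly Flexible" point, storing data without a schema and asking questions later, is often called schema-on-read. A minimal sketch, assuming newline-delimited JSON records; the record fields and the helper `read_with_schema` are hypothetical and chosen only for illustration.

```python
import json

# Raw records are stored exactly as they arrive; no schema is declared up front.
raw_store = [
    '{"user": "a1", "page": "/home", "ms": 120}',
    '{"user": "b2", "page": "/cart", "coupon": "SPRING"}',
    '{"user": "a1", "page": "/cart", "ms": 340, "device": "mobile"}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: keep only the requested fields,
    filling in None where a record does not carry one."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# First question, asked after the data was stored: which pages were visited?
pages = [r["page"] for r in read_with_schema(raw_store, ["page"])]

# A later question uses a different projection of the same raw data.
timings = [
    r["ms"] for r in read_with_schema(raw_store, ["user", "ms"]) if r["ms"] is not None
]
print(pages, sum(timings))
```

Records with extra or missing fields are accepted as they are; structure is imposed only by the question being asked.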

Challenges with Big Data Analytics

Performance: Organizations seeking to implement advanced analytics on Hadoop struggle to achieve high performance. MapReduce persists intermediate results to disk after each pass through the data; as a result, iterative algorithms implemented in MapReduce run significantly slower than they do on a distributed in-memory platform.

Data Federation: Many real-world applications require integration across projects, which is challenging due to the multiple analytic point solutions introduced by Hadoop.

Time to Market: Enterprises are always keen to shorten their time to market, which is challenging when dealing with several types of sources for varied types of data, each queried with a different technology.

Data Cleansing: Enterprises find it challenging to cleanse varied forms of data to make it ready for analytics.

Query Execution Times in an environment with Hadoop

[Chart: execution time (seconds to minutes) against data volume (GBs to PBs) for in-memory, RDBMS, and Hadoop sources]

Support for More Big Data Sources

Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single Database

Bring All Relevant Data to Decision Makers

Source categories: Data Warehouse Appliances; MapReduce & NoSQL Databases; Relational Databases; Multidimensional Databases; Columnar Databases; SaaS-Based App Data; User / Departmental Data

Sources named on the slide: HANA, BigInsights, Parallel Data Warehouse, Elastic MapReduce, Analysis Services, Redshift, Clipboard, MicroStrategy Dataset, Google Analytics, Zendesk, HDFS, Generic Web Services (SOAP, REST), Generic Web Services with OAuth, and many more


Usage Patterns for MicroStrategy with Hadoop as a Data Source

Maturity of Data Access

1. Visually explore a subject-matter extract in-memory through a one-time query to Hadoop

2. Self-service parameterized queries directly to Hadoop

3. Model-driven access to Hadoop

4. Query a multi-source schema model and drill down among Intelligent Cubes, EDW, Hive

[Diagram labels: RDBMS; ETL; Multi-dimensional Business Model]

Ways to Connect and Query Hadoop

#1 SQL on Hadoop/HDFS

[Diagram labels: MicroStrategy Analytics Platform; Hive ODBC Connector; Hadoop Distribution; Hive; Hadoop HDFS]

• This is the most popular way of querying Hadoop, via Hive/Impala.

• Hive allows users who aren’t familiar with programming to access and analyze big data in a less technical way, using a SQL-like syntax called Hive Query Language (HiveQL). Hive is used for complex, long-running tasks and analyses on large sets of data, e.g. analyzing the performance of every store within a particular region for a chain retailer.

• Impala: Like Hive, Impala also uses SQL syntax to query Hadoop. Impala is used for analyses that you want to run and return quickly on a small subset of your data, e.g. analyzing company finances for a daily or weekly report. It is not ideal for complex data manipulation, data preparation, etc.

• Hive is a screwdriver and Impala is a drill bit.
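To make the store-performance example concrete, here is the kind of SQL a business user might write. This is a sketch under stated assumptions: the `sales` table, its columns, and the sample rows are hypothetical, and Python's built-in `sqlite3` module stands in for Hive so that the example runs without a cluster. The query itself uses only basic SELECT, WHERE, GROUP BY, and ORDER BY clauses, which HiveQL and Impala SQL should also accept.

```python
import sqlite3

# sqlite3 is only a stand-in for a Hive or Impala connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id TEXT, region TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("S1", "North", 1200.0),
        ("S2", "North", 800.0),
        ("S1", "North", 300.0),
        ("S3", "South", 950.0),
    ],
)

# Total revenue of every store within one region, best store first.
QUERY = """
SELECT store_id, SUM(revenue) AS total_revenue
FROM sales
WHERE region = 'North'
GROUP BY store_id
ORDER BY total_revenue DESC
"""

rows = conn.execute(QUERY).fetchall()
for store_id, total in rows:
    print(store_id, total)
```

Against a real cluster the same statement would be sent through the Hive ODBC connector rather than `sqlite3`; only the connection changes, not the query.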

How does MicroStrategy integrate with Hadoop?

• MicroStrategy certifies Cloudera Impala, Google BigQuery and Pivotal HAWQ as a data source.

• MicroStrategy optimizes and certifies Hadoop/Hive as a data source.

• MicroStrategy certifies Spark/Shark on HDFS.

• MicroStrategy also provides a connector to execute Freeform Pig-Latin reports.

[Diagram labels: SQL on Hadoop; Apache Hive; Apache Shark/Spark; Apache Pig]

Ways to Connect and Query Hadoop

#2 Tap into Hadoop Natively (NEW)

[Diagram labels: MicroStrategy Analytics Platform; Big Data Engine / Hadoop Gateway; Hadoop HDFS]

• We launched this connectivity with v10. Big Data Engine (BDE) is a native YARN-based application that enables direct access to HDFS.

• YARN (Yet Another Resource Negotiator) is the prerequisite for Enterprise Hadoop, providing resource management and a central platform to deliver consistent operations, security, and data governance tools across Hadoop clusters.

• Use case: faster loading of data from HDFS, leveraging our in-memory layer for analytics.

How does Big Data Engine work?

• Big Data Engine has two components:

• Big Data Query Engine (BDQE)

• Big Data Execution Engine (BDEE)

• The I-Server sends the query to the BDQE. The BDQE then assigns sub-tasks to the relevant BDEEs; a BDEE runs on each data node of the Hadoop cluster.

• The BDEEs work in parallel, perform the needed aggregation and wrangling, and push the data to the I-Server.

• The BDE Streamer merges the data from each BDEE and passes the final result to the Analytical Engine, which either publishes the cube or renders it directly in VI.

[Diagram labels: Hadoop Cluster; Name Node; Data Node with data partition and Big Data Execution Engine (repeated per node); Big Data Query Engine; BDE Streamer; In-memory Cubes (PRIME); I-Server]
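The flow described for Big Data Engine is a scatter-gather pattern: fan a query out to every data node, aggregate locally, then merge the partial results. The sketch below illustrates that pattern only. It is not MicroStrategy code; the functions `query_engine`, `execution_engine`, and `streamer`, the node names, and the sample rows are all hypothetical stand-ins, and threads on one machine take the place of real data nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data partitions, one per data node: (region, amount) rows.
PARTITIONS = {
    "node-1": [("EMEA", 10), ("APAC", 5), ("EMEA", 7)],
    "node-2": [("APAC", 3), ("AMER", 8)],
    "node-3": [("EMEA", 1), ("AMER", 2), ("AMER", 4)],
}


def execution_engine(rows):
    """Stand-in for one execution engine: aggregate locally on its own partition."""
    partial = Counter()
    for region, amount in rows:
        partial[region] += amount
    return partial


def query_engine(partitions):
    """Stand-in for the query engine: hand one sub-task to each node, in parallel."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return list(pool.map(execution_engine, partitions.values()))


def streamer(partials):
    """Stand-in for the streamer: merge the partial results into one final result."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the counts together
    return dict(merged)


result = streamer(query_engine(PARTITIONS))
print(result)
```

Because each node returns an already aggregated partial result, far less data travels to the merging step than if raw rows were shipped.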

Three Steps for Self Service Access to Hadoop with Native Connectivity

Web logs, survey/feedback forms, machine generated data…

1. Import Data from HDFS directly

2. Cleanse, Refine with Data Wrangler

• Cleanse, refine and transform data from HDFS to make it ready for analysis.

• Designed for business users

3. Analyze with Visual Insight

• Get full insights from Hadoop/HDFS data using Visual Insight
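The cleanse, refine, and transform step is done interactively in Data Wrangler, but the kind of work involved can be sketched in code. The following is an illustration only: the log layout, the regular expression, and the `cleanse` function are assumptions made for this example and say nothing about how Data Wrangler is implemented.

```python
import re
from collections import Counter

# Hypothetical raw lines; real web logs in HDFS will have their own layout.
RAW_LOGS = [
    "2015-11-02 10:01:07 GET /Home 200",
    "2015-11-02 10:01:09 get /cart/ 200",
    "corrupted ### line",
    "2015-11-02 10:02:15 POST /cart 500",
    "",
]

LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (\S+) (\d{3})$")


def cleanse(lines):
    """Drop malformed lines and normalize the fields of the rest."""
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # cleanse: skip rows that do not parse
        date, time, method, path, status = match.groups()
        yield {
            "date": date,
            "time": time,
            "method": method.upper(),                 # refine: consistent casing
            "path": path.lower().rstrip("/") or "/",  # refine: one spelling per page
            "status": int(status),                    # transform: text to number
        }


rows = list(cleanse(RAW_LOGS))
hits_per_path = Counter(row["path"] for row in rows)
print(len(rows), dict(hits_per_path))
```

After this step every row has the same fields and types, which is what makes the data ready for analysis in a visual tool.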


Some experience

Retail

Key BI Characteristics:

INDUSTRY: Retail (Online Commerce)

BI COMPONENTS: Reports, Dashboards, VI

USERS: ~200

DATABASE: Hadoop, Oracle

HADOOP DISTRIBUTION: Apache

VOLUME OF DATA: Petabytes

TYPE OF DATA: Web Logs, Online behavior

APPLICATIONS: Sales Analysis

Business Use and Benefits

• Analyzing web logs/online behavior stored in Hadoop. Dashboards and VI analysis run against in-memory cubes, while ad-hoc reports run live against the Hadoop data using a combination of Hive/Shark

• Match customer transactions in Oracle DWH against clickstream data in Hadoop to gather a holistic view of the online customer

• End users do not need to code with MapReduce

• Developers are more productive delivering self service BI through a tool instead of coding custom user interfaces
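Matching warehouse transactions against clickstream data amounts to joining two sources on a shared customer key. A minimal sketch, assuming both sources expose a `customer_id` field; the field names, the sample rows, and the `blend` function are hypothetical and are not taken from the customer's implementation.

```python
from collections import defaultdict

# Hypothetical rows: transactions as they might come from the warehouse ...
transactions = [
    {"customer_id": "c1", "order_total": 60.0},
    {"customer_id": "c2", "order_total": 12.5},
    {"customer_id": "c1", "order_total": 20.0},
]
# ... and page views as they might come from clickstream logs in Hadoop.
clicks = [
    {"customer_id": "c1", "page": "/shoes"},
    {"customer_id": "c1", "page": "/checkout"},
    {"customer_id": "c3", "page": "/home"},
    {"customer_id": "c2", "page": "/sale"},
]


def blend(transactions, clicks):
    """Join both sources on customer_id into one profile per customer."""
    profiles = defaultdict(lambda: {"orders": 0, "revenue": 0.0, "page_views": 0})
    for t in transactions:
        profile = profiles[t["customer_id"]]
        profile["orders"] += 1
        profile["revenue"] += t["order_total"]
    for c in clicks:
        profiles[c["customer_id"]]["page_views"] += 1
    return dict(profiles)


profiles = blend(transactions, clicks)
print(profiles["c1"])
```

Customers who appear in only one source, such as `c3` here, are kept with zeros for the missing side, so browsing-only visitors remain visible in the combined view.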

Entertainment

Key BI Characteristics:

INDUSTRY: Entertainment

BI COMPONENTS: Traditional Reports, VI

USERS: ~200

DATABASE: Hadoop, Teradata

HADOOP DISTRIBUTION: Amazon EMR

VOLUME OF DATA: Petabytes

TYPE OF DATA: Log and Events data from Streaming Service

APPLICATIONS: Sales Analysis

• Sales analysis, typically around a new launch in a new region: quick report analysis to understand the new accounts, number of hours of viewing, etc.

• Directly querying and reporting from MicroStrategy on logs via Hive

• Able to make better Sales decisions

• Short-lived analytics on the use of streaming service

• Easy access for analysts to Hadoop data without using MapReduce

• Shortcut the ETL to warehouse cycle that would otherwise take weeks

• Extend business model to create own content: https://en.wikipedia.org/wiki/List_of_original_programs_distributed_by_Netflix


Digital Media

Key BI Characteristics:

INDUSTRY: Digital Media

BI COMPONENTS: 1 Application; Reports, VI, Dashboards

DATABASE: Hadoop, Hive, Impala

HADOOP DISTRIBUTION: Cloudera

VOLUME OF DATA: Over 1 Billion traffic attribute combinations

APPLICATIONS: Traffic Attribute Multiplier

• The Traffic Attribute Multiplier application is helping Adconion to:

o Target their digital ads better

o Shorten the time to prepare and tune models

o Provide better ad delivery ROI for their customers

• Leveraging MicroStrategy’s integration with Impala and its rich visualization library makes the application scalable and easy for business users to consume

• Data blending and data clustering for better business insights

• Achieved 2.4% improvement in ad budget spending efficiency

• Evaluated MSTR against Tableau, Pentaho, and Jaspersoft and chose us for our completeness
