BIG DATA & Advanced Analytics Roadshow

Big Data-as-a-Service Demos

DEMO OVERVIEW

1. Hadoop and Spark on-premises

2. Provisioning HDInsight clusters, Azure SQL DW databases, Machine Learning, Stream Analytics & Power BI

3. Enabling independent scaling of compute & storage

4. Pricing it up: Deriving insights from terabytes of data for under $10/day

DEPLOYMENT MODELS

On-Premises Deployment

• Microsoft Analytics Platform System (APS)

• Oracle Big Data Appliance

• Hortonworks Data Platform (HDP)

• Cloudera (CDH)

• Pivotal Data Computing Appliance (DCA)

Big Data-as-a-Service

• Azure HDInsight

• Azure SQL Data Warehouse

• Amazon Elastic MapReduce

• Amazon Redshift

ON-PREMISES DEMO: HADOOP / SPARK

Objectives:

• Import data from the local file system to HDFS:

  hadoop fs -put <localsrc> ... <HDFS_dest_Path>

• Create Hive external tables

• Run a sample covariance script using HiveQL

• Run the same covariance script using Spark SQL

Hadoop Component: Hive

What is Hive

• Hive is a SQL-like data warehousing layer that sits on top of MapReduce.

• Hive Query Language (HQL) is translated into MapReduce jobs, yet the language is familiar to SQL professionals.

• Used for batch and interactive processing

• Supports ACID operations, UDFs, UDTFs, UDAFs, and window functions

• Supports cubes, dimensions, and star schemas

• Supports Storage Based Authorization, SQL Standard Based Authorization, and authentication

YARN Application: Spark

What is Spark

The Spark core is complemented by a set of powerful, higher-level libraries that can be used seamlessly in the same application.

Spark Core API and Execution Model

• RDDs & DAG

• Scala

• Python

• Java

• R

WHAT IT MEANS: COVARIANCE

Covariance (noun)

In finance, covariance measures the degree to which two stocks move together or apart from each other. It is a statistical measure of how one investment moves in relation to another, and investors can use it to seek out different investment options based on their risk profile.

A positive covariance means that asset returns move together. If investment instruments or stocks tend to be up or down during the same time periods, they have positive covariance.

A negative covariance means that returns move inversely. If one investment instrument tends to be up while the other is down, they have negative covariance.
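The definition above can be made concrete with a small calculation. The following is a minimal Python sketch, not part of the original demo, that computes the population covariance as E[XY] - E[X]E[Y], the same form used by the HiveQL query in this deck. The function name `covariance` and the three price series are made up for illustration.

```python
from statistics import fmean

def covariance(xs, ys):
    """Population covariance, computed as E[XY] - E[X] * E[Y]."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_xy = fmean([x * y for x, y in zip(xs, ys)])
    return mean_xy - fmean(xs) * fmean(ys)

# Hypothetical prices for three made-up instruments (illustration only).
a = [10.0, 11.0, 12.0, 13.0]
b = [20.0, 22.0, 24.0, 26.0]  # rises whenever a rises
c = [30.0, 28.0, 26.0, 24.0]  # falls whenever a rises

print(covariance(a, b))  # positive: the two series move together
print(covariance(a, c))  # negative: the two series move inversely
```

The sign carries the interpretation described on this slide; the magnitude depends on the price scale, so it is not directly comparable across pairs.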

CODE

HIVEQL

-- Monthly covariance of daily high prices for every pair of stocks: E[XY] - E[X]E[Y]
SELECT a.STOCK_SYMBOL, b.STOCK_SYMBOL, month(a.STOCK_DATE),
       (AVG(a.STOCK_PRICE_HIGH * b.STOCK_PRICE_HIGH) - (AVG(a.STOCK_PRICE_HIGH) * AVG(b.STOCK_PRICE_HIGH)))
FROM NYSE a JOIN NYSE b ON a.STOCK_DATE = b.STOCK_DATE
WHERE a.STOCK_SYMBOL < b.STOCK_SYMBOL
GROUP BY a.STOCK_SYMBOL, b.STOCK_SYMBOL, month(a.STOCK_DATE);
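To try the shape of this query without a Hadoop cluster, the self-join can be run against an in-memory SQLite table from Python. This is only a sketch under stated assumptions: SQLite stands in for Hive, `strftime('%m', ...)` replaces Hive's `month()`, dates are assumed to be ISO-formatted text, and the symbols and prices are invented sample rows rather than the demo's NYSE data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NYSE (STOCK_SYMBOL TEXT, STOCK_DATE TEXT, STOCK_PRICE_HIGH REAL)"
)

# Hypothetical sample rows (made-up symbols and prices, illustration only).
rows = [
    ("AAA", "2010-01-04", 10.0), ("AAA", "2010-01-05", 12.0),
    ("BBB", "2010-01-04", 20.0), ("BBB", "2010-01-05", 24.0),
    ("CCC", "2010-01-04", 30.0), ("CCC", "2010-01-05", 26.0),
]
conn.executemany("INSERT INTO NYSE VALUES (?, ?, ?)", rows)

# Same structure as the HiveQL query: pair each stock with every other stock
# on the same date, then compute E[XY] - E[X]E[Y] per pair and month.
query = """
SELECT a.STOCK_SYMBOL AS symbol_a,
       b.STOCK_SYMBOL AS symbol_b,
       CAST(strftime('%m', a.STOCK_DATE) AS INTEGER) AS stock_month,
       AVG(a.STOCK_PRICE_HIGH * b.STOCK_PRICE_HIGH)
         - AVG(a.STOCK_PRICE_HIGH) * AVG(b.STOCK_PRICE_HIGH) AS covariance
FROM NYSE a JOIN NYSE b ON a.STOCK_DATE = b.STOCK_DATE
WHERE a.STOCK_SYMBOL < b.STOCK_SYMBOL
GROUP BY a.STOCK_SYMBOL, b.STOCK_SYMBOL,
         CAST(strftime('%m', a.STOCK_DATE) AS INTEGER)
ORDER BY 1, 2, 3
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

The condition `a.STOCK_SYMBOL < b.STOCK_SYMBOL` keeps each unordered pair once and drops self-pairs, which is why three symbols yield three rows per month.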

THE RESULT

STOCKS QRR AND QTM

This pair has positive covariance in more months than negative, so there is a high probability that the two stocks move in the same direction.

STOCKS QRR AND QXM

This pair has mostly negative covariance, so there is a greater probability that the two stock prices move in opposite directions.

STOCKS QTM AND QXM

This pair has positive covariance in most months, so the two stocks tend to move in the same direction most of the time.
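The readings above amount to comparing how many months show positive versus negative covariance for a pair. A minimal Python sketch of that tally follows; the function name `summarize_pair` and the monthly values are hypothetical and are not the demo's actual output.

```python
def summarize_pair(monthly_covariances):
    """Count positive and negative months and describe the dominant tendency."""
    positive = sum(1 for c in monthly_covariances if c > 0)
    negative = sum(1 for c in monthly_covariances if c < 0)
    if positive > negative:
        tendency = "mostly move together"
    elif negative > positive:
        tendency = "mostly move inversely"
    else:
        tendency = "no clear tendency"
    return positive, negative, tendency

# Hypothetical monthly covariances for one stock pair (illustration only).
hypothetical = [0.8, 1.2, -0.3, 0.5, -0.1, 0.9]
print(summarize_pair(hypothetical))
```

Counting signs ignores magnitude, so this is a rough heuristic for reading the result table rather than a statistical test.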

DEMO

HDINSIGHT & AZURE SQL DW

• PROVISIONING

• SCALING

• DATA INGESTION

• QUERYING

AZURE: PRICING IT UP

https://azure.microsoft.com/en-us/pricing/calculator/



Orion [email protected]

Twitter: @oriongm

Questions?