BIG DATA & Advanced Analytics Roadshow Big Data-as-a-Service Demos

Page 1: BIG DATA & Advanced Analytics Roadshow...Hadoop and SPARK on- premises. Provisioning HDInsight clusters, Azure SQL DW databases, Machine Learning, Stream Analytics & Power BI. Enabling

DEMO OVERVIEW

1. Hadoop and Spark on-premises

2. Provisioning HDInsight clusters, Azure SQL DW databases, Machine Learning, Stream Analytics & Power BI

3. Enabling independent scaling of compute & storage

4. Pricing it up: Deriving insights from terabytes of data for under $10/day

DEPLOYMENT MODELS

On-Premises Deployment

• Microsoft Analytics Platform System (APS)
• Oracle Big Data Appliance
• Hortonworks Data Platform (HDP)
• Cloudera (CDH)
• Pivotal Data Computing Appliance (DCA)

Big Data-as-a-Service

• Azure HDInsight
• Azure SQL Data Warehouse
• Amazon Elastic MapReduce
• Amazon Redshift

ON-PREMISES DEMO: HADOOP / SPARK

Objectives:

• Import data from local to HDFS: hadoop fs -put <localsrc> ... <HDFS_dest_Path>
• Create Hive external tables
• Run sample covariance script using HiveQL
• Run the same covariance script using Spark SQL

Hadoop Component: Hive

What is Hive

• Hive is a SQL-like data warehousing layer that sits on top of MapReduce.

• Hive Query Language (HQL) is translated into MapReduce jobs, yet the language is familiar to SQL professionals.

• Used for batch & interactive processing

• Supports ACID operations, UDFs, UDTFs, UDAFs, and window functions

• Supports cubes, dimensions, and star schemas

• Supports Storage Based Authorization and SQL Standard Based Authorization and Authentication

YARN Application: Spark

What is Spark

The Spark core is complemented by a set of powerful, higher-level libraries which can be seamlessly used in the same application.

Spark Core API and Execution Model

• RDDs & DAG
• Scala
• Python
• Java
• R

COVARIANCE: WHAT IT MEANS

Covariance (noun)

Covariance is a statistical measure of how one investment moves in relation to another. In finance, it represents the degree to which two stocks move together or apart. Investors can use covariance to choose among investment options according to their risk profile.

A positive covariance means that asset returns moved together. If two investment instruments or stocks tend to be up or down during the same time periods, they have positive covariance.

A negative covariance means that returns move inversely. If one investment instrument tends to be up while the other is down, they have negative covariance.
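The definition above can be checked on a toy series. The sketch below is not from the demo: the function name and the price lists are hypothetical, and it computes the population covariance in the form E[XY] - E[X]E[Y], which is the same form the HiveQL expression in this deck uses.

```python
from statistics import mean


def covariance(xs, ys):
    """Population covariance computed as E[XY] - E[X] * E[Y]."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("both series must be non-empty and of equal length")
    return mean(x * y for x, y in zip(xs, ys)) - mean(xs) * mean(ys)


# Hypothetical daily prices, invented for illustration only.
stock_a = [10.0, 11.0, 12.0, 13.0]
stock_b = [20.0, 22.0, 24.0, 26.0]  # rises whenever stock_a rises
stock_c = [26.0, 24.0, 22.0, 20.0]  # falls whenever stock_a rises

cov_ab = covariance(stock_a, stock_b)  # positive: the two move together
cov_ac = covariance(stock_a, stock_c)  # negative: the two move inversely
print(cov_ab, cov_ac)
```

The sign carries the interpretation on this slide; the magnitude depends on the price scale, so it is not directly comparable across pairs.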

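CODE: HIVEQL

-- Monthly covariance of daily high prices for every pair of stocks.
select a.STOCK_SYMBOL, b.STOCK_SYMBOL, month(a.STOCK_DATE),
       -- covariance computed as E[XY] - E[X] * E[Y]
       (AVG(a.STOCK_PRICE_HIGH * b.STOCK_PRICE_HIGH) - (AVG(a.STOCK_PRICE_HIGH) * AVG(b.STOCK_PRICE_HIGH)))
from NYSE a join NYSE b on a.STOCK_DATE = b.STOCK_DATE
-- keep each unordered pair once and skip self-pairs
where a.STOCK_SYMBOL < b.STOCK_SYMBOL
group by a.STOCK_SYMBOL, b.STOCK_SYMBOL, month(a.STOCK_DATE);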
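To see what the self-join and grouping produce without a cluster, the query can be tried against an in-memory SQLite database. This is only a stand-in for Hive: SQLite has no month() function, so strftime('%m', ...) is used instead, and the six rows are made-up values, not the NYSE dataset from the demo. The table and column names follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NYSE (STOCK_SYMBOL TEXT, STOCK_DATE TEXT, STOCK_PRICE_HIGH REAL)"
)

# Hypothetical rows: three symbols, two trading days in the same month.
rows = [
    ("QRR", "2010-01-04", 10.0), ("QRR", "2010-01-05", 12.0),
    ("QTM", "2010-01-04", 20.0), ("QTM", "2010-01-05", 24.0),
    ("QXM", "2010-01-04", 30.0), ("QXM", "2010-01-05", 26.0),
]
conn.executemany("INSERT INTO NYSE VALUES (?, ?, ?)", rows)

# Same shape as the HiveQL: self-join on date, one row per symbol pair and month.
query = """
SELECT a.STOCK_SYMBOL, b.STOCK_SYMBOL,
       CAST(strftime('%m', a.STOCK_DATE) AS INTEGER),
       AVG(a.STOCK_PRICE_HIGH * b.STOCK_PRICE_HIGH)
         - AVG(a.STOCK_PRICE_HIGH) * AVG(b.STOCK_PRICE_HIGH)
FROM NYSE a JOIN NYSE b ON a.STOCK_DATE = b.STOCK_DATE
WHERE a.STOCK_SYMBOL < b.STOCK_SYMBOL
GROUP BY a.STOCK_SYMBOL, b.STOCK_SYMBOL, strftime('%m', a.STOCK_DATE)
ORDER BY 1, 2, 3
"""
results = conn.execute(query).fetchall()
for symbol_a, symbol_b, month, cov in results:
    print(symbol_a, symbol_b, month, cov)
```

The condition a.STOCK_SYMBOL < b.STOCK_SYMBOL is what limits the output to three pairs for three symbols; without it, each pair would appear twice and every stock would also be paired with itself.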

THE RESULT

STOCKS QRR AND QTM

These have positive covariance in more months than negative, so the two stocks are likely to move in the same direction.

STOCKS QRR AND QXM

These mostly have negative covariance, so their prices are more likely to move in opposite directions.

STOCKS QTM AND QXM

These have positive covariance in most months, so they tend to move in the same direction most of the time.
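The reading of the results above amounts to comparing how many months show positive versus negative covariance for a pair. A minimal sketch of that rule follows; the function name, labels, and sample values are hypothetical, and the deck does not state that the demo applied exactly this rule.

```python
def dominant_direction(monthly_covariances):
    """Label a stock pair by which covariance sign occurs in more months."""
    positive = sum(1 for c in monthly_covariances if c > 0)
    negative = sum(1 for c in monthly_covariances if c < 0)
    if positive > negative:
        return "together"
    if negative > positive:
        return "inverse"
    return "no clear tendency"


# Hypothetical monthly covariances for one pair, invented for illustration.
example = [2.1, 0.4, -0.3, 1.7]
label = dominant_direction(example)
print(label)
```

Counting signs ignores magnitude, so one strongly negative month weighs the same as one weakly positive month.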

DEMO: HDINSIGHT & AZURE SQL DW

• Provisioning
• Scaling
• Data ingestion
• Querying

AZURE: PRICING IT UP

https://azure.microsoft.com/en-us/pricing/calculator/

Orion [email protected]

Twitter: @oriongm

BIG DATA & Advanced Analytics Roadshow

Questions?