BIG DATA & Advanced Analytics Roadshow
Big Data-as-a-Service Demos
DEMO OVERVIEW
1. Hadoop and Spark on-premises
2. Provisioning HDInsight clusters, Azure SQL DW databases, Machine Learning, Stream Analytics & Power BI
3. Enabling independent scaling of compute & storage
4. Pricing it up: Deriving insights from terabytes of data for under $10/day
DEPLOYMENT MODELS
On-Premises Deployment
• Microsoft Analytics Platform System (APS)
• Oracle Big Data Appliance
• Hortonworks Data Platform (HDP)
• Cloudera (CDH)
• Pivotal Data Computing Appliance (DCA)
Big Data-as-a-Service
• Azure HDInsight
• Azure SQL Data Warehouse
• Amazon Elastic MapReduce
• Amazon Redshift
hadoop fs -put <localsrc> ... <HDFS_dest_Path>
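The upload command above can also be scripted. The following is a minimal Python sketch, assuming the hadoop CLI is installed and on the PATH. The function names build_put_command and put_to_hdfs and the example paths are hypothetical and not part of the demo.

```python
import subprocess


def build_put_command(local_sources, hdfs_dest, overwrite=False):
    """Build the argv list for `hadoop fs -put <localsrc> ... <HDFS_dest_Path>`."""
    if not local_sources:
        raise ValueError("at least one local source path is required")
    cmd = ["hadoop", "fs", "-put"]
    if overwrite:
        cmd.append("-f")  # -f overwrites the destination if it already exists
    cmd.extend(local_sources)
    cmd.append(hdfs_dest)
    return cmd


def put_to_hdfs(local_sources, hdfs_dest, overwrite=False):
    """Run the upload; needs a configured Hadoop client on this machine."""
    subprocess.run(build_put_command(local_sources, hdfs_dest, overwrite), check=True)


# Hypothetical paths, for illustration only; this only prints the command.
print(" ".join(build_put_command(["nyse_2009.csv", "nyse_2010.csv"], "/data/nyse")))
```

Keeping the command builder separate from the call that executes it lets the command be inspected or logged without a cluster.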
ON-PREMISES DEMO: HADOOP / SPARK
Objectives:
• Import data from local storage to HDFS
• Create Hive external tables
• Run a sample covariance script using HiveQL
• Run the same covariance script using Spark SQL
Hadoop Component: Hive
What is Hive
• Hive is a SQL-like data warehousing layer that sits on top of MapReduce.
• Hive Query Language (HQL) is translated into MapReduce jobs, yet the language is familiar to SQL professionals.
• Used for batch & interactive processing
• Supports ACID operations, UDFs, UDTFs, UDAFs, and window functions
• Supports cubes, dimensions, and star schemas
• Supports Storage Based Authorization and SQL Standard Based Authorization and Authentication
YARN Application: Spark
What is Spark
The Spark core is complemented by a set of powerful, higher-level libraries which can be seamlessly used in the same application.
Spark Core API and Execution Model
• RDDs & DAG
• Scala
• Python
• Java
• R
COVARIANCE: WHAT IT MEANS
Covariance (noun)
Covariance is a statistical measure of how one investment moves in relation to another. In finance, it represents the degree to which two stocks move together or apart. Investors can use covariance to choose among investment options according to their risk profile.
A positive covariance means that asset returns moved together. If investment instruments or stocks tend to be up or down during the same time periods, they have positive covariance.
A negative covariance means that returns move inversely. If one investment instrument tends to be up while the other is down, they have negative covariance.
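A small worked example may make the sign of the covariance concrete. This is a minimal Python sketch using the identity cov(X, Y) = E[XY] - E[X]E[Y], which is the same form as the AVG(a*b) - AVG(a)*AVG(b) expression in the deck's HiveQL script. The function name and the price values are made up for illustration.

```python
def covariance(xs, ys):
    """Population covariance computed as E[XY] - E[X] * E[Y]."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return mean_xy - mean_x * mean_y


# Made-up daily highs, for illustration only.
stock_a = [10.0, 11.0, 12.0, 13.0]
stock_b = [20.0, 21.0, 23.0, 24.0]  # rises as stock_a rises
stock_c = [30.0, 29.0, 27.0, 26.0]  # falls as stock_a rises

print(covariance(stock_a, stock_b))  # positive: the two series move together
print(covariance(stock_a, stock_c))  # negative: the two series move inversely
```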
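CODE: HIVEQL
-- Monthly covariance of daily highs for every pair of stocks.
-- The self-join pairs prices on the same date; a.STOCK_SYMBOL < b.STOCK_SYMBOL keeps each pair once.
select a.STOCK_SYMBOL, b.STOCK_SYMBOL, month(a.STOCK_DATE),
  (AVG(a.STOCK_PRICE_HIGH * b.STOCK_PRICE_HIGH) - (AVG(a.STOCK_PRICE_HIGH) * AVG(b.STOCK_PRICE_HIGH)))
from NYSE a join NYSE b on a.STOCK_DATE = b.STOCK_DATE
where a.STOCK_SYMBOL < b.STOCK_SYMBOL
group by a.STOCK_SYMBOL, b.STOCK_SYMBOL, month(a.STOCK_DATE);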
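The query logic can be tried without a cluster. The sketch below runs an equivalent query with Python's built-in sqlite3 module as a stand-in for Hive. This is an assumption made for illustration, not the demo's setup: SQLite has no month() function, so strftime('%m', ...) is used instead, dates are stored as ISO text, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NYSE (STOCK_SYMBOL TEXT, STOCK_DATE TEXT, STOCK_PRICE_HIGH REAL)"
)

# Made-up rows, for illustration only.
rows = [
    ("QRR", "2010-01-04", 10.0), ("QTM", "2010-01-04", 20.0),
    ("QRR", "2010-01-05", 11.0), ("QTM", "2010-01-05", 21.0),
    ("QRR", "2010-01-06", 12.0), ("QTM", "2010-01-06", 23.0),
    ("QRR", "2010-01-07", 13.0), ("QTM", "2010-01-07", 24.0),
]
conn.executemany("INSERT INTO NYSE VALUES (?, ?, ?)", rows)

# Same shape as the HiveQL query: self-join on date, one row per pair and month.
query = """
SELECT a.STOCK_SYMBOL, b.STOCK_SYMBOL,
       CAST(strftime('%m', a.STOCK_DATE) AS INTEGER) AS stock_month,
       AVG(a.STOCK_PRICE_HIGH * b.STOCK_PRICE_HIGH)
         - AVG(a.STOCK_PRICE_HIGH) * AVG(b.STOCK_PRICE_HIGH) AS covariance
FROM NYSE a JOIN NYSE b ON a.STOCK_DATE = b.STOCK_DATE
WHERE a.STOCK_SYMBOL < b.STOCK_SYMBOL
GROUP BY a.STOCK_SYMBOL, b.STOCK_SYMBOL, strftime('%m', a.STOCK_DATE)
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```

With only two symbols and a single month in the sample data, the query returns one row for the pair.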
THE RESULT
STOCKS QRR AND QTM
This pair has positive covariance in more months than negative covariance, so there is a high probability that the two stocks move together in the same direction.
STOCKS QRR AND QXM
This pair has mostly negative covariance, so there is a greater probability that the two stock prices move in inverse directions.
STOCKS QTM AND QXM
This pair has positive covariance in most months, so the two stocks tend to move in the same direction most of the time.
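The reading of the results above amounts to counting, per stock pair, the months with positive versus negative covariance. A minimal Python sketch of that tally follows. The function name, the placeholder symbols AAA, BBB, and CCC, and the values are hypothetical and are not the demo's output.

```python
from collections import defaultdict


def summarize_signs(monthly_cov):
    """Count months with positive and negative covariance per stock pair.

    monthly_cov: iterable of (symbol_a, symbol_b, month, covariance) tuples,
    the same shape as the rows produced by the covariance query.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for sym_a, sym_b, _month, cov in monthly_cov:
        if cov > 0:
            counts[(sym_a, sym_b)]["positive"] += 1
        elif cov < 0:
            counts[(sym_a, sym_b)]["negative"] += 1
    return {pair: dict(c) for pair, c in counts.items()}


# Made-up monthly covariances with placeholder symbols, for illustration only.
sample = [
    ("AAA", "BBB", 1, 0.8), ("AAA", "BBB", 2, 1.1), ("AAA", "BBB", 3, -0.2),
    ("AAA", "CCC", 1, -0.5), ("AAA", "CCC", 2, -0.9), ("AAA", "CCC", 3, 0.1),
]
summary = summarize_signs(sample)
for pair, c in summary.items():
    direction = "together" if c["positive"] > c["negative"] else "inversely"
    print(pair, c, "tend to move", direction)
```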
DEMO: HDINSIGHT & AZURE SQL DW
• Provisioning
• Scaling
• Data ingestion
• Querying
AZURE: PRICING IT UP
https://azure.microsoft.com/en-us/pricing/calculator/