![Page 1: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/1.jpg)
Tejas Shah – Senior Program Manager
Microsoft Data Platform Team
Twitter: @mr_tejs
LinkedIn: https://www.linkedin.com/in/tejas-shah-72a62027/
Introduction to SQL Server 2019 Big Data Clusters
![Page 2: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/2.jpg)
The Big Data Landscape
![Page 3: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/3.jpg)
Data Growth
Computing and Storage advances impact data collection abilities
• Computing and Storage technologies allow greater data collection points
• They also allow historical data to be stored for longer, so that over time it becomes part of the storage lineage
• Walmart is a classic example of data proliferation and leverage
![Page 4: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/4.jpg)
Use-Cases
Every industry classification benefits from Big Data; Retail and Finance lead the way
| Industry Sector | Primary Use-Cases |
| --- | --- |
| Retail | Demand prediction; In-store analytics; Supply chain optimization; Customer retention; Cost/Revenue analytics; HR analytics; Inventory control |
| Finance | Cyberattack prevention; Fraud detection; Customer segmentation; Market analysis; Risk analysis; Blockchain; Customer retention |
| Healthcare | Fiscal control analytics; Disease prevention prediction and classification; Clinical trials optimization; Patient load analysis; Episode analytics |
| Public Sector | Revenue prediction; Education effectiveness analysis; Transportation analysis and prediction; Energy demand and supply prediction and control; Defense readiness predictions and threat analysis |
| Manufacturing | Predictive Maintenance (PdM); Anomaly detection; Pattern analysis |
| Agriculture | Food safety analysis; Crop forecasting; Market forecasting; Pipeline optimization |
![Page 5: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/5.jpg)
Scale-Out Processing
![Page 6: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/6.jpg)
Scaled Processing and Scaled Storage
The foundations of scale
Hadoop, Spark
![Page 7: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/7.jpg)
Virtualization
Hardware Abstraction
Building on hardware, you can create a complete “PC” on top of a hypervisor layer, which abstracts the hardware. You still own the operating system and everything above it.
This allows for scale by ring-fencing OS-level dependencies.
![Page 8: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/8.jpg)
Containers
Abstracting the OS, allowing complete portability
Containers go one level further than the hypervisor and focus on binaries and applications
Storage and networking are a consideration
Scale is achieved through multiple containers
![Page 9: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/9.jpg)
[Diagram: a Kubernetes Master managing multiple Nodes; each Node runs kubelet and kube-proxy and hosts Pods]
![Page 10: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/10.jpg)
[Diagram: a Kubernetes Master orchestrating Web Tier, Business Logic, and Data Tier workloads]
![Page 11: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/11.jpg)
[Diagram: SQL Server on Linux, Windows, and Containers; deployed On Premises, in Public/Private cloud, or Hybrid]
![Page 12: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/12.jpg)
SQL Server 2019 Big Data Cluster – Complete Architecture
![Page 13: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/13.jpg)
[Diagram: a Kubernetes Master orchestrating Web Tier, Business Logic, and Data Tier workloads]
![Page 14: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/14.jpg)
LOB Apps
Application calls go to the SQL Server Master Instance. Relational, multi-type, Graph, and ML features are supported. No code change is required.
![Page 15: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/15.jpg)
[Diagram: Big Data Cluster architecture]
Control Plane: Kubernetes Master, SQL Server Master, SQL Cluster Administration Portal, Knox Gateway, Livy, HIVE, Grafana Dashboard, Kibana Dashboard
Compute Plane: Compute Pools (SQL Server), App Pool (Job (SSIS), Web Apps, ML Server)
Data Plane: SQL Data Pools (SQL Server), Storage Pools (SQL Server, Spark, HDFS)
PolyBase Connector
![Page 16: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/16.jpg)
SQL Server 2019 Big Data – Data Virtualization
![Page 17: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/17.jpg)
[Diagram: Big Data Cluster architecture, with Control, Compute, and Data Planes]
![Page 18: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/18.jpg)
Multiple Data Sources
Data Virtualization: scale-out calls go through the SQL Server Master Instance using External Tables, then through the Compute Pool using PolyBase Connectors at the source.
[Diagram labels: HDFS, Compute Pool, NoSQL]
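The external-table approach described on this slide can be sketched as the T-SQL that would be issued against the master instance. The snippet below is a minimal illustration that only builds the DDL text; it does not connect to anything. The data source name, host, credential, and table names are hypothetical and do not come from the deck, and the `sqlserver://` location is just one possible connector target.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    """One column of the external table (name and T-SQL type)."""
    name: str
    sql_type: str


def external_data_source_ddl(name: str, location: str, credential: str) -> str:
    """Build a CREATE EXTERNAL DATA SOURCE statement pointing at a remote source."""
    return (
        f"CREATE EXTERNAL DATA SOURCE [{name}]\n"
        f"WITH (LOCATION = '{location}', CREDENTIAL = [{credential}]);"
    )


def external_table_ddl(schema: str, table: str, columns: List[Column],
                       data_source: str, remote_location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement; no data is copied, only metadata is defined."""
    cols = ",\n".join(f"    [{c.name}] {c.sql_type}" for c in columns)
    return (
        f"CREATE EXTERNAL TABLE [{schema}].[{table}] (\n{cols}\n)\n"
        f"WITH (LOCATION = '{remote_location}', DATA_SOURCE = [{data_source}]);"
    )


if __name__ == "__main__":
    # All names below are assumptions made for illustration only.
    print(external_data_source_ddl("RemoteSales", "sqlserver://sales-host:1433", "SalesCredential"))
    print(external_table_ddl(
        "dbo", "customers_ext",
        [Column("id", "INT"), Column("name", "NVARCHAR(100)")],
        "RemoteSales", "SalesDb.dbo.Customers",
    ))
```

Once such a table exists, applications would query `dbo.customers_ext` like any local table, which is the sense in which the data is virtualized rather than moved.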
![Page 19: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/19.jpg)
[Diagram: a PolyBase External Table reaching RDBMS, NoSQL, and Scale-Out sources, each through a PolyBase Connector]
![Page 20: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/20.jpg)
SQL Server 2019 Big Data Cluster – Data Mart
![Page 21: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/21.jpg)
[Diagram: Big Data Cluster architecture, with Control, Compute, and Data Planes]
![Page 22: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/22.jpg)
Data Persistence Using Multiple Data Sources
Data Virtualization: scale-out calls go through the SQL Server Master Instance using External Tables, then through the Compute Pool using PolyBase Connectors at the source. Results are stored in the shards of the Data Pool.
[Diagram labels: HDFS, Compute Pool, Data Pool, NoSQL]
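To make the idea of storing results in shards concrete, here is a small in-memory sketch. It assumes rows are placed round-robin across shards (the slide does not say how rows are assigned) and that a query is answered by computing partial aggregates per shard and merging them. The function names and sample rows are hypothetical; this is not how the Data Pool is implemented, only an illustration of the scale-out pattern.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def distribute_round_robin(rows: List[Row], shard_count: int) -> List[List[Row]]:
    """Assign row i to shard i % shard_count, so shards stay evenly filled."""
    shards: List[List[Row]] = [[] for _ in range(shard_count)]
    for i, row in enumerate(rows):
        shards[i % shard_count].append(row)
    return shards


def scatter_gather_sum(shards: List[List[Row]], key: str, value: str) -> Dict[object, int]:
    """Compute a grouped sum per shard, then merge the partial results."""
    partials = []
    for shard in shards:
        local: Dict[object, int] = defaultdict(int)
        for row in shard:
            local[row[key]] += row[value]
        partials.append(local)

    merged: Dict[object, int] = defaultdict(int)
    for partial in partials:
        for group, subtotal in partial.items():
            merged[group] += subtotal
    return dict(merged)


if __name__ == "__main__":
    # Hypothetical sample data standing in for ingested results.
    rows = [
        {"region": "east", "amount": 10},
        {"region": "west", "amount": 5},
        {"region": "east", "amount": 7},
        {"region": "west", "amount": 1},
        {"region": "east", "amount": 2},
    ]
    shards = distribute_round_robin(rows, 2)
    print([len(s) for s in shards])
    print(scatter_gather_sum(shards, "region", "amount"))
```

Because each shard only scans its own rows, adding shards spreads the scan work, while the merge step touches only the small partial results.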
![Page 23: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/23.jpg)
[Diagram: RDBMS, Cosmos DB, and HDFS sources, each read through a PolyBase Connector via the Compute Pool into the SQL Server Data Pool (Shards)]
![Page 24: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/24.jpg)
Example: SQL Server Big Data Cluster – Data Mart
![Page 25: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/25.jpg)
SQL Server 2019 Big Data Cluster – Data Lake, Machine Learning and Spark
![Page 26: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/26.jpg)
[Diagram: Big Data Cluster architecture, with Control, Compute, and Data Planes]
![Page 27: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/27.jpg)
[Diagram labels: Storage Pool, HDFS, Compute Pool, App Pool, Data Pool, NoSQL]
Multiple Data Sources
Data Virtualization: scale-out calls through the SQL Server Master Instance using External Tables, through the Compute Pool to the Data Pool
Scaled Data Analysis
Data Mart: scale-out calls through the SQL Server Master Instance using External Tables into the Data Pool. Direct calls to a Data Lake (HDFS) using the Storage Pool.
Data Science
Data engineering and pipelines for models with big data, using Notebooks and other tools through to Spark, ingesting and processing data using the Storage Pool
AI Enablement
Prediction and classification scoring for AI apps using the App Pool
![Page 28: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/28.jpg)
Example: Spark Query Notebook
![Page 29: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/29.jpg)
SQL Server 2019 Big Data Cluster – Installation, Tools, Management and Monitoring
![Page 30: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/30.jpg)
[Diagram: Big Data Cluster architecture, with Control, Compute, and Data Planes]
![Page 31: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/31.jpg)
Example: SQL Server Big Data Cluster – Management and Monitoring
![Page 32: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/32.jpg)
[Diagram: Big Data Cluster architecture, with Control, Compute, and Data Planes]
![Page 33: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/33.jpg)
Takeaways
A SQL Server 2019 Big Data Cluster includes SQL Server together with HDFS and the Spark compute engine as one package for big data processing, machine learning, and AI
Spark is a distributed compute engine that provides a unified framework for end-to-end big data processing pipelines, including machine learning and AI
You can use SQL Server 2019 to create a secure, hybrid machine learning architecture, starting with data preparation, then training a machine learning model, operationalizing the model, and using it for scoring
![Page 34: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/34.jpg)
Resources
Official documentation – aka.ms/bdc
In-depth training – aka.ms/sqlworkshops
![Page 35: Introduction to SQL Server 2019 Big Data Clusters · Introduction to SQL Server 2019 Big Data Clusters. The Big Data Landscape. Data Growth ... Kibana Dashboard SQL ServerSpark SQL](https://reader034.vdocuments.us/reader034/viewer/2022052314/5f1a9f223ab9e31138585728/html5/thumbnails/35.jpg)
THANK YOU