near real-time data warehousing the final frontier? · 2019. 10. 21. · “real time ” low...
NEAR REAL-TIME DATA WAREHOUSING – THE FINAL FRONTIER?
1. Data Warehouse
Let’s start with a simple definition
"A data warehouse is a copy of transaction data specifically structured for query and analysis."
Ralph Kimball
Data Integration Methods
▹ Traditional ETL: batch based, high latency
▹ CDC Replication: "real time", low latency
▹ Real-Time Streaming: low latency, in-line in-memory transformation
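To make the contrast between traditional ETL and CDC replication concrete, here is a minimal sketch. It assumes an in-memory dictionary stands in for both the source and the warehouse table, and that changes arrive as hypothetical `(operation, key, row)` tuples; neither the function names nor the change format come from the slides or from any particular product.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, int, Row]  # (operation, primary key, row payload): an assumed format

def batch_etl_load(source: Dict[int, Row]) -> Dict[int, Row]:
    """Traditional ETL: copy a full snapshot of the source in one scheduled run."""
    return {key: dict(row) for key, row in source.items()}

def apply_cdc(target: Dict[int, Row], changes: List[Change]) -> None:
    """CDC replication: apply only the captured changes, in the order they occurred."""
    for op, key, row in changes:
        if op in ("insert", "update"):
            target[key] = dict(row)
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")

# Initial load with a batch run, then keep the copy current with changes only.
source = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
warehouse = batch_etl_load(source)

changes = [
    ("update", 2, {"name": "Robert"}),
    ("insert", 3, {"name": "Cy"}),
    ("delete", 1, {}),
]
apply_cdc(warehouse, changes)
print(warehouse)
```

The batch load touches every row on each run, while the CDC path only touches rows that changed, which is one reason it can run with much lower latency.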
Data Movement
▹ Batching
▹ Micro Batching
▹ Streaming
Main Characteristics
Batching
▹ New data elements grouped into a batch
▹ Based on a time-based batch interval
Micro Batching
▹ New data elements grouped into smaller batches at shorter intervals
▹ Real-time analytics not essential
Streaming
▹ Event driven architecture
▹ Low latency is critical
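The three movement styles above can be sketched in a few lines. This is an illustrative example only: the event tuples, the 10-second and 2-second intervals, and the function names are assumptions chosen for the demonstration, not values from the slides.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, payload): an assumed shape

def group_by_interval(events: Iterable[Event], interval_s: float) -> List[List[str]]:
    """Group events into batches, one batch per fixed time window."""
    windows: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in events:
        windows[int(ts // interval_s)].append(payload)
    return [windows[k] for k in sorted(windows)]

def stream(events: Iterable[Event], handler: Callable[[str], None]) -> None:
    """Event driven: hand each event to the handler as soon as it arrives."""
    for _, payload in events:
        handler(payload)

events = [(0.5, "a"), (1.2, "b"), (1.9, "c"), (3.1, "d"), (7.4, "e")]

# Batching: one long interval, so everything lands in a single batch.
batches = group_by_interval(events, interval_s=10.0)

# Micro batching: same grouping logic, just a much shorter interval.
micro_batches = group_by_interval(events, interval_s=2.0)

# Streaming: no grouping at all, each event is processed individually.
streamed: List[str] = []
stream(events, streamed.append)

print(batches)
print(micro_batches)
print(streamed)
```

Batching and micro batching share the same mechanism and differ only in the interval, whereas streaming drops the interval entirely and reacts per event.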
2. Big Data
Yet another simple definition...
"larger, more complex data sets, especially from new data sources."
source: oracle.com
BIG DATA
The final nail in the coffin?
3 V's:
1. Velocity
2. Volume
3. Variety
3. Data Lake
Last definition – I promise...
"A centralized repository that allows you to store all your structured and unstructured data at any scale."
source: amazon.com
Are Apples & Pears the same?
Big Data – Data Lake
▹ Schema on Read
▹ Data stored in raw form
▹ "Freeform"
Data Warehouse
▹ Schema on Write
▹ Structured Data
▹ Query limitations
▹ Value of data clear from the outset
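The schema-on-write versus schema-on-read distinction can be illustrated with a small sketch. Everything named here is hypothetical: the `SCHEMA` dictionary, the list-backed "warehouse" and "lake", and the helper functions are stand-ins for illustration, not the API of any real warehouse or lake product.

```python
import json
from typing import Dict, List

SCHEMA = {"order_id": int, "amount": float}  # hypothetical warehouse table schema

def write_to_warehouse(table: List[Dict], record: Dict) -> None:
    """Schema on write: validate the structure before the record is stored."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for field, expected in SCHEMA.items():
        if not isinstance(record[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")
    table.append(record)

def write_to_lake(lake: List[str], raw: str) -> None:
    """Schema on read: store the raw payload untouched, whatever its shape."""
    lake.append(raw)

def read_amounts(lake: List[str]) -> List[float]:
    """Apply a schema only at query time, skipping records that do not fit."""
    amounts = []
    for raw in lake:
        try:
            amounts.append(float(json.loads(raw)["amount"]))
        except (ValueError, KeyError, TypeError):
            continue
    return amounts

warehouse: List[Dict] = []
lake: List[str] = []

write_to_warehouse(warehouse, {"order_id": 1, "amount": 9.5})
try:
    write_to_warehouse(warehouse, {"order_id": 2, "note": "free text"})
    rejected = False
except ValueError:
    rejected = True  # the warehouse refuses data that does not match its schema

write_to_lake(lake, '{"order_id": 1, "amount": 9.5}')
write_to_lake(lake, '{"clickstream": "page_view"}')
write_to_lake(lake, "not json at all")

amounts = read_amounts(lake)
print(rejected, len(warehouse), len(lake), amounts)
```

The warehouse pays the validation cost up front and rejects anything unexpected, while the lake accepts everything and defers interpretation to whoever reads it.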
Complementary Technologies
source: DellEMC.com
Use Cases
Data Warehouse
▹ Highly curated data
▹ Structured standard reporting
Data Lake
▹ Data Scientist access to raw data
▹ Flexible, reactive business model
THE BEST OF BOTH WORLDS?
Streaming
▹ Next generation Data Integration
▹ In-flight
▹ Real-time
▹ Can load into data lakes and data warehouses
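As a rough sketch of the idea that a single stream can feed both targets, the example below passes each event through once, writing a raw copy to a lake and a transformed row to a warehouse. The event fields, the `to_warehouse_row` helper, and the list-backed sinks are all assumptions made for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, List

def event_source() -> Iterator[Dict]:
    """Stand-in for a real event stream; yields hypothetical order events."""
    yield {"order_id": 1, "amount_cents": 1250, "channel": "web"}
    yield {"order_id": 2, "amount_cents": 300, "channel": "app"}

def to_warehouse_row(event: Dict) -> Dict:
    """In-flight transformation: reshape the event into a curated row."""
    return {"order_id": event["order_id"], "amount": event["amount_cents"] / 100}

def run_pipeline(events: Iterable[Dict], lake: List[str], warehouse: List[Dict]) -> None:
    """Process events one at a time and load both sinks as each event passes through."""
    for event in events:
        lake.append(json.dumps(event))             # raw copy for the data lake
        warehouse.append(to_warehouse_row(event))  # structured row for the warehouse

lake: List[str] = []
warehouse: List[Dict] = []
run_pipeline(event_source(), lake, warehouse)

print(lake)
print(warehouse)
```

Because the transformation happens while the event is moving, there is no separate staging step between capture and load in this sketch.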