Enabling Next Gen Analytics with Azure Data Lake and StreamSets
TRANSCRIPT
Enabling Next Gen Analytics with Azure Data Lake
Microsoft Azure
Microsoft Cloud
Global
Trusted
Hybrid
Big Data Definition
Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.
– Gartner, Big Data Definition*
* Gartner, Big Data (Stamford, CT.: Gartner, 2016), URL: http://www.gartner.com/it-glossary/big-data/
Big Data as a Cornerstone of Cortana Intelligence
Action: People, Automated Systems, Apps, Web, Mobile, Bots
Intelligence: Cortana, Bot Framework, Cognitive Services
Dashboards & Visualizations: Power BI
Information Management: Event Hubs, Data Catalog, Data Factory
Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
Big Data Stores: SQL Data Warehouse, Data Lake Store
Data Sources: Apps, Sensors and devices, Data
However, there are challenges to Big Data…
Obtaining skills and capabilities
Determining how to get value
Integrating with existing IT investments
* Gartner: Survey Analysis – Hadoop Adoption Drivers and Challenges (Stamford, CT.: Gartner, 2015)
Azure HDInsight
A Cloud Spark and Hadoop service for the Enterprise
Reliable with an industry leading SLA
Enterprise-grade security and monitoring
Productive platform for developers and scientists
Cost effective cloud scale
Integration with leading ISV applications
Easy for administrators to manage
63% lower TCO than deploying your own Hadoop on-premises*
*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”
HDInsight Application Platform
• One-click deploy experience for installing apps.
• Fully managed PaaS offering.
• Access to entire cluster and secure by default.
• Install apps on new or existing clusters.
• Ease of authoring and deployment.
• Certified partners only.
Hybrid cloud, a reality today
74% of enterprises believe a hybrid cloud will enable business growth¹
82% of enterprises have a hybrid cloud strategy, up from 74% a year ago²
Workload requirements
Regulation
Sensitive data
Customization
Latency
Legacy support
Introduction to StreamSets for Microsoft Azure
Who is StreamSets?
Enterprise Data DNA
StreamSets Mission
Empower enterprises to harness their data in motion.
Top-tier Investors
Commercial Customers Across Verticals
150,000 downloads
⅓ of the Fortune 100
Products
StreamSets Dataflow Performance Manager™ (DPM)
StreamSets Data Collector™ (open source)
Strong Partner Ecosystem
Open Source Success
StreamSets Solution
Desired Business Outcomes
● Developer & operator efficiency
● On-time delivery
● Data trust & governance
Data-in-motion middleware that ensures data trust.
StreamSets Products
StreamSets Data Collector (SDC): Open source tooling and engine to build complex any-to-any dataflows.
StreamSets Dataflow Performance Manager (DPM): Cloud service to map, measure and master dataflow operations.
DATAFLOW LIFECYCLE
DEVELOP: ● Developers ● Scientists ● Architects
OPERATE: ● Operators ● Stewards ● Architects
EVOLVE (Proactive)
REMEDIATE (Reactive)
StreamSets Deployment Models
Install on Local Machine
Install on Azure VM
StreamSets and Microsoft Azure in Use in a Major Bank
The Customer
● Forbes Global 500 financial services company.
● Adopting and moving to the cloud at a rapid pace.
● Growing rapidly through both acquisitions and organic growth.
Key Challenges Related to Data Movement
● Numerous legacy tools, both customer-built and vendor-built.
● Security policy changes very hard to manage.
● Lack of security governance due to fragmentation of tools and lack of standardization.
● Difficulty onboarding new data sources as soon as they are created (technology change).
● Data drift (unexpected changes) very hard to manage at scale.
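To make "data drift" concrete, here is a minimal sketch of one kind of drift, schema drift, where incoming records gain fields, lose fields, or change a field's type. It is an illustration only, not how StreamSets detects drift: the helpers `infer_schema` and `detect_schema_drift` are hypothetical, records are assumed to arrive as Python dictionaries, and the field names in the example are made up.

```python
from typing import Any, Dict, Iterable, List


def infer_schema(record: Dict[str, Any]) -> Dict[str, str]:
    """Map each field name to the Python type name of its value (hypothetical helper)."""
    return {field: type(value).__name__ for field, value in record.items()}


def detect_schema_drift(
    expected: Dict[str, str], records: Iterable[Dict[str, Any]]
) -> List[str]:
    """Compare each incoming record against an expected schema (hypothetical helper).

    Returns human-readable drift events: fields that appeared,
    fields that disappeared, and fields whose type changed.
    """
    events: List[str] = []
    for i, record in enumerate(records):
        actual = infer_schema(record)
        # Fields present in the record but not in the expected schema.
        for field in sorted(actual.keys() - expected.keys()):
            events.append(f"record {i}: new field '{field}' ({actual[field]})")
        # Fields the schema expects but the record no longer carries.
        for field in sorted(expected.keys() - actual.keys()):
            events.append(f"record {i}: missing field '{field}'")
        # Fields present in both whose value type differs.
        for field in sorted(expected.keys() & actual.keys()):
            if expected[field] != actual[field]:
                events.append(
                    f"record {i}: field '{field}' changed type "
                    f"{expected[field]} -> {actual[field]}"
                )
    return events


if __name__ == "__main__":
    expected = {"account_id": "int", "amount": "float", "currency": "str"}
    incoming = [
        {"account_id": 1, "amount": 10.5, "currency": "USD"},
        {"account_id": "A-2", "amount": 3.0, "currency": "EUR"},
        {"account_id": 3, "amount": 7.25, "channel": "mobile"},
    ]
    for event in detect_schema_drift(expected, incoming):
        print(event)
```

Run on the sample records, the first record raises no events, the second reports that `account_id` changed type, and the third reports a new `channel` field and a missing `currency` field. At scale, the point is that such checks run continuously inside the pipeline rather than surfacing later as downstream failures.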
Key Factors for the Customer to Consider StreamSets
● KPIs
● Delivery guarantees
● Multiple types of origins and destinations using a single tool.
● Works natively with Microsoft Azure, as part of HDInsight or on an Azure virtual machine, or deployed on-premises.
● Visualization of actual data transfers.
● Ability to define security boundaries, actors, etc.
● Repeating pattern
Customer’s Business Objectives
● Short-lived compute and long-term storage (ADLS, Azure Blob), which in turn enable fine-grained cost control.
● Ability to build a microanalytics framework. For instance, instead of processing the entire dataset, build smaller micro datasets and derive results faster (faster iteration).
● Move away from a traditional data lake to Azure Data Lake to manage cost and scale.
Use Cases for StreamSets
1. Data movement from on-premises to Azure Data Lake
2. Consolidating migration tools into a single tool
3. Building disaster recovery (DR) for HDInsight Kafka workloads
Resources / Q & A
StreamSets Data Collector @ Azure Marketplace: https://azure.microsoft.com/en-us/marketplace/partners/streamsets/streamsets-data-collector/
Ingest Data into Microsoft Azure Data Lake (YouTube): https://www.youtube.com/watch?v=c1dVnOK7Luw
StreamSets Community: https://streamsets.com/community/
StreamSets Dataflow Performance Manager Product Information: https://streamsets.com/products/dpm/
Thanks!