#TalendConnect
Best Practices for Unleashing the Power of Data Lakes
Isabelle Nuage & Christophe Toum, Big Data Products, Talend
Self-service data lake, cafeteria style
Using sensor data collected in real time to improve gas turbine reliability and operational performance, and to extend lifetime value.
Why Do We Need a Data Lake?
“Data lakes are enterprise-wide data management platforms for analyzing disparate sources of data in its native format.” (Gartner)
Business Value
Reducing cost:
• ETL offload
• EDW offload/optimization
• Data archiving
Generating new opportunities:
• Customer acquisition and retention
• Real-time engagement
• Pricing optimization
• Demand forecasting
• Risk and fraud
• Predictive maintenance
• Smart products…
But Data Lakes Bring New Challenges
High-end users vs. the rest of us
Challenges: complexity, poor governance and control, no reuse
Data Lake – Conceptual Architecture
Acquire / Ingest
Understand & Improve
Curate & Govern
Deliver / Self-service
Scale
Best Practices for a Successful Data Lake
• Accelerate Data Ingestion
• Understand & Govern Your Data
• Remove Silos
• Unify Data Management
• Deliver Data to a Wide Audience
Continuously refreshed data
Continuous data delivery and data processes
Accelerate Data Ingestion
• Wide connectivity
• Batch and streaming ubiquity
• Scale with volume and variety
Pitfalls:
• Hand coding
• Fragmented tools
Understand & Govern Your Data
• Add context to data (provenance, semantics…)
• Optimize data with curation, stewardship, preparation…
• Use a collaborative process
Pitfalls:
• Authoritative governance
• Inconsistent framework
Remove Silos / Unify Data Management
• Pervasive data quality (DQ), masking…
• Consistent operationalization
• Single platform for all use cases and personas
Pitfalls:
• Fragmented tools
• Hand coding
• Shadow IT
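One way to picture "pervasive DQ, masking" is masking applied as one shared, reusable step that every pipeline calls, instead of being hand-coded per project. The sketch below is a minimal illustration under stated assumptions: records are Python dicts, and `MASK_POLICY`, `mask_value` and `mask_records` are hypothetical names, not a Talend component or API. Salted hashing gives a stable pseudonym (still usable as a join key), while `last4` keeps only the trailing digits of a card number.

```python
import hashlib
from typing import Dict, Iterable, List, Optional

# Hypothetical masking policy: field name -> rule. Illustrative only,
# not a Talend component or configuration format.
MASK_POLICY: Dict[str, str] = {
    "email": "hash",         # stable pseudonym, still usable as a join key
    "card_number": "last4",  # keep only the last four digits
}


def mask_value(value: str, rule: str, salt: str) -> str:
    """Apply a single masking rule to one string value."""
    if rule == "hash":
        # Salted SHA-256, truncated to 16 hex chars: same input -> same output.
        return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]
    if rule == "last4":
        digits = [c for c in value if c.isdigit()]
        return "*" * max(len(digits) - 4, 0) + "".join(digits[-4:])
    raise ValueError(f"unknown masking rule: {rule!r}")


def mask_records(
    records: Iterable[Dict[str, Optional[str]]],
    policy: Dict[str, str] = MASK_POLICY,
    salt: str = "replace-with-managed-secret",
) -> List[Dict[str, Optional[str]]]:
    """Return masked copies of the records; fields outside the policy pass through."""
    masked = []
    for record in records:
        out = dict(record)  # copy so the source record is not mutated
        for field, rule in policy.items():
            if out.get(field) is not None:
                out[field] = mask_value(out[field], rule, salt)
        masked.append(out)
    return masked
```

For example, `mask_records([{"id": "1", "email": "ana@example.com", "card_number": "4111 1111 1111 1234"}])` leaves `id` unchanged, replaces the email with a 16-character hex pseudonym, and turns the card number into twelve asterisks followed by `1234`. The default salt is a placeholder; in practice it would come from a managed secret store, since a known salt makes hashed values easier to reverse by brute force.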
Deliver Data to a Wide Audience
• Make data accessible
• Governed self-service
• Scalable operationalization
Pitfalls:
• Unmanaged autonomy
• Self-service tools for the tech-savvy
GET READY FOR CHANGE
Ingestion Best Practices
Data sources: Transactions; Messages & Events; Logs; Sensors
Processing: Collect / Distribute; Track; Streaming; Windowing; Alert
Consumers: Data Analytics & Data Science; Real-time Data Visualization; Real-time Indicators / Scorecard
Example: NYC Taxi Data Streaming
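The Streaming, Windowing and Alert stages listed for this slide can be illustrated with a tumbling (fixed, non-overlapping) window that counts events per key and raises an alert when a count crosses a threshold. This is a minimal sketch, not the Talend demo: the event shape `(epoch seconds, pickup zone)`, the 60-second window and the names `tumbling_counts` and `alerts` are all hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical event shape: (epoch seconds, pickup zone). Not the actual
# NYC taxi schema or the Talend demo's format.
Event = Tuple[int, str]
Window = Tuple[int, Dict[str, int]]


def tumbling_counts(events: Iterable[Event], window_s: int = 60) -> Iterator[Window]:
    """Count events per zone in fixed, non-overlapping windows.

    Assumes events arrive in timestamp order; yields (window_start, counts)
    each time a window closes, plus the final partial window at the end.
    """
    current_start: Optional[int] = None
    counts: DefaultDict[str, int] = defaultdict(int)
    for ts, zone in events:
        start = ts - ts % window_s  # align timestamp to its window boundary
        if current_start is None:
            current_start = start
        elif start != current_start:
            # First event of a new window: emit the completed one.
            yield current_start, dict(counts)
            counts = defaultdict(int)
            current_start = start
        counts[zone] += 1
    if current_start is not None:
        yield current_start, dict(counts)


def alerts(windows: Iterable[Window], threshold: int) -> Iterator[Tuple[int, str, int]]:
    """Yield (window_start, zone, count) wherever a zone's count exceeds threshold."""
    for start, counts in windows:
        for zone, n in counts.items():
            if n > threshold:
                yield start, zone, n
```

Both functions are generators, so they can be chained over an unbounded stream, for example `for start, zone, n in alerts(tumbling_counts(stream, 60), threshold=50): ...`. Stream engines such as Spark Structured Streaming handle late, out-of-order events with watermarks; this sketch does not, and an out-of-order event would simply close the current window early.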
NYC Taxi Data Streaming
Disclaimer
• The future features described in this presentation are under consideration by Talend and are not commitments for future products, technologies, or services.
• The roadmap is subject to change, and Talend does not guarantee the features or release dates.
Roadmap 2017
Addressing the needs of large enterprises
Big Data
1st on Spark 2.0 &
Data Prep on Big Data
Data Prep &
Data Ingestion
Cloud Self-service
Data Stewardship &
Self-service connectors
Governance
Apache Atlas
Key Takeaways
• Analyze far more data to find more opportunities for innovation and transformation
• Real-time data streaming brings increased agility
• To unleash data lakes, data governance is essential
Free Trial: Talend Big Data Sandbox
• A ready-to-run Docker environment
• A step-by-step expert guide
• Real-world scenarios using Spark, Kafka, MapReduce & NoSQL
www.talend.com/BigDataSandbox
Hit the Easy Button for Hadoop, Spark and Machine Learning
Thank You