Talend Big Data Capabilities - 2014
TRANSCRIPT
1© Talend 2014
Talend: Solutions
Overview
Rajan Kanitkar
• Senior Solutions Engineer
• Rajan Kanitkar is a Pre-Sales Consultant with Talend. He has been active in the broader Data Integration space for the past 15 years and has experience with several leading-edge software companies in these areas. His specialties at Talend include Data Integration (DI), Big Data (BD), Data Quality (DQ), and Master Data Management (MDM).
• Contact: [email protected]
About the Presenter
Talend Big Data Platform
Hadoop, MapReduce, NoSQL capabilities …
The Big Data Ecosystem
• Hadoop: the core project
• HDFS: the Hadoop Distributed File System
• MapReduce: the software framework for distributed processing of large data sets
• Hive: a data warehouse infrastructure that provides data summarization and a querying language
• Pig: a high-level data-flow language and execution framework for parallel computation
• HBase: the Hadoop database. Use it when you need random, real-time read/write access to your Big Data
• And many more: Sqoop, HCatalog, ZooKeeper, Oozie, Cassandra, MongoDB, Flume, Impala, Stinger, Neo4j, etc.
Talend’s Solution
Code Generator
• Java: ETL, day-to-day integration, run everywhere
• SQL: ELT, DW appliance (Teradata, Netezza…)
• MapReduce + Pig + HiveQL + Sqoop + …: Hadoop, highly scalable, Hadoop grid
• Camel: message transformation, high frequency
• No black-box engine
• Enables lightweight, distributed, customizable and parallelizable runtime
• Standards-based
Key differentiator of our next-gen architecture…
Trying to get from this…
Talend Big Data – “pure Hadoop”
Design visually in MapReduce and optimize before deploying on Hadoop
to this…
Native Map/Reduce Jobs
• Create classic ETL patterns using native Map/Reduce
- Only data management solution on the market to generate native Map/Reduce code
• Reduce the need for big data coding skills
• Zero pre-installation on the Hadoop cluster
• Hadoop is the “engine” for data processing
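As a rough illustration of what a "classic ETL pattern" looks like when expressed as Map/Reduce, here is a minimal in-memory sketch of an aggregate-by-key step. It is not code generated by Talend and it does not use Hadoop; the sample records, the field layout, and every function name (map_phase, shuffle, reduce_phase, sales_mapper, sum_reducer, run_job) are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical input: one CSV record per line (region, product, amount).
SALES = ["east,widget,10.0", "west,widget,4.5", "east,gadget,2.5"]


def map_phase(records, mapper):
    """Apply the mapper to every record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def sales_mapper(line):
    """Parse one record and emit (region, amount)."""
    region, _product, amount = line.split(",")
    yield region, float(amount)


def sum_reducer(_key, values):
    """Aggregate all amounts seen for one region."""
    return sum(values)


def run_job(records):
    """Chain map, shuffle and reduce into one aggregate-by-key job."""
    pairs = map_phase(records, sales_mapper)
    return reduce_phase(shuffle(pairs), sum_reducer)


if __name__ == "__main__":
    print(run_job(SALES))
```

On a real cluster the shuffle is performed by the framework and the mapper and reducer run in parallel across nodes; the sketch only shows how a "group and sum" ETL step splits into those two functions.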
MapReduce 2.0, YARN, Storm, Spark
• YARN: ensures predictable performance & QoS for all apps
• Enables apps to run “IN” Hadoop rather than “ON” it
• In Labs: streaming with Apache Storm
• In Labs: mini-batch and in-memory processing with Apache Spark
Applications Run Natively IN Hadoop
HDFS2 (Redundant, Reliable Storage)
YARN (Cluster Resource Management)
BATCH (MapReduce)
INTERACTIVE (Tez)
STREAMING (Storm, Spark)
GRAPH (Giraph)
NoSQL (MongoDB)
EVENTS (Falcon)
ONLINE (HBase)
OTHER (Search)
Source: Hortonworks
HDFS2 (Redundant, Reliable Storage)
YARN (Cluster Resource Management)
BATCH (MapReduce)
INTERACTIVE (Tez)
STREAMING (Storm, Spark)
GRAPH (Giraph)
NoSQL (MongoDB)
EVENTS (Falcon)
ONLINE (HBase)
OTHER (Search)
Talend: Ingest – Transform – Deliver
• INGEST (Ingestion): Sqoop, Flume, HDFS API, HBase API, Hive, 800+
• TRANSFORM (Data Refinement): Profile, Parse, Map, CDC, Cleanse, Standardize, Machine Learning, Match
• DELIVER (as an API): ActiveMQ, Karaf, Camel, CXF, Kafka, Storm
• Meta, Security, MDM, iPaaS, Govern, HA
Talend Big Data Sandbox & Talend Big Data Jumpstart
Delivering instant value from all your data
BIG DATA CHALLENGES
The Big Data Customer Discussion
Top Big Data Challenges
Source: Gartner - Survey Analysis: Big Data Adoption in 2013 Shows Substance Behind the Hype - 12 September 2013 - G00255160
Talend directly addresses these challenges
Talend’s Solution
TALEND BIG DATA SANDBOX
30-day customer trial
Cookbook Step-by-Step Directions
• Completely Self-contained Demo Sandbox
• Key Scenarios:
- Twitter Analysis
- Clickstream Analysis
- Web Log analysis
- ETL Offload
• Scenario Summaries
- Social Media insights
- Channel optimization
- Customer insights
- Data Warehouse Cost Reduction
Ready for Launch
• Announcements
- Public announcement: Tuesday 15th July
- Newsletter was sent 9th July
• Customer Nurture campaign
- Scenario reminders, videos & links
- Reminder to Talend AE
• Two Routes for 5.5
- Sandbox Download publicly available – 15th July
- Jumpstart and AE ‘access’ – 15th July
• Links for the 15th (Sandbox download)
- Public: http://www.talend.com/talend-big-data-sandbox
- Account Exec: send download link for customer to fill in:
• https://info.talend.com/prodevaltpbdsandbox
TALEND BIG DATA JUMPSTART
A ‘guided tour’ of the Sandbox
Why the ‘Jumpstart’?
Practical Guided Tour
• Led by a Talend Solutions Engineer
• Learn about the Talend Studio
• See how to execute Hadoop processes
- Map/Reduce with YARN
- Pig
- HDFS
• See NoSQL Examples
- Hive
- HBase
- MongoDB
- Cassandra
Key benefits
• NO Configuration/Development
• INSTANT results now, for the Future
• Valuable prototypes for FREE
• Working on the top THREE Hadoop Distributions
3 Simple Messages
• Sandbox is customer-led, Jumpstart is sales-led
• Jumpstart is the best way to ‘get Talend’
- Google: Talend Jumpstart
• Work to get the best conversation & involve pre-sales
Sandbox
- Talend Jumpstart Sandbox - virtual image installed with:
• Apache Hadoop distributions provided by Hortonworks, Cloudera & MapR
• Pre-configured Talend Platform for Big Data 5.5*
• Four scenarios for you to try:
– Clickstream data
– Twitter sentiment
– Apache weblogs
– ETL Offload
• Demonstrations of several NoSQL databases
*Includes Talend Studio (graphical IDE), team working, management, data quality and advanced big data features.
www.talend.com/products/platform-for-big-data
SHOW ME
Talend Demo