Talend Big Data Capabilities Overview
Talend: Solutions Overview
Presenter: Rajan Kanitkar
Talend Big Data Overview
© Talend 2012
The Drivers for Big Data
Volume
Velocity
Variety
How to process big data?
Hadoop: the de facto standard for big data processing
What is Hadoop?
Apache Hadoop, an open-source software library, is a framework that allows for the distributed processing of large data sets across clusters of commodity hardware using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
The Big Data Ecosystem
Hadoop: the core project
HDFS: the Hadoop Distributed File System
MapReduce: the software framework for distributed processing of large data sets
Hive: a data warehouse infrastructure that provides data summarization and a querying language
Pig: a high-level data-flow language and execution framework for parallel computation
HBase: the Hadoop database, for when you need random, real-time read/write access to your big data
And many, many more: Sqoop, HCatalog, ZooKeeper, Oozie, Cassandra, MongoDB, Flume, Impala, Stinger, Neo4j, etc.
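The MapReduce model listed above can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not Hadoop's API and not code generated by Talend; the function names (map_phase, shuffle, reduce_phase, run_job) and the sample lines are assumptions made for illustration. On a real cluster the framework would run the map and reduce steps in parallel across machines and perform the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence (hypothetical local driver)."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input: two lines of text standing in for files on HDFS.
lines = ["big data big clusters", "data on commodity hardware"]
counts = run_job(lines)
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, both steps can be distributed without shared state, which is what lets the model scale out over commodity hardware.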
Thanks to you all! Google, Amazon, Facebook, Twitter, Yahoo, 10gen, Cloudera, Hortonworks, MapR, etc.
Talend Big Data Overview
Key Differentiator of Our Next-Gen Architecture…
Standards-based code generator:
• Java: ETL, day-to-day integration, run everywhere
• SQL: ELT, DW appliance (Teradata, Netezza…)
• MapReduce: Hadoop, highly scalable, Hadoop grid
• Camel: message transformation, high frequency
• ?: future-proof
No black-box engine: enables a lightweight, distributed, customizable and parallelizable runtime
Talend Unique Integration Solution
Best-of-Breed Solutions + Talend Unified Platform = Unique Integration Solution
Best-of-breed solutions: Data Quality, Data Integration, MDM, ESB, BPM
Talend Unified Platform:
1 Studio: Comprehensive Eclipse-based user interface
2 Repository: Consolidated metadata & project information
3 Deployment: Web-based deployment & scheduling
4 Execution: Same container for batch processing, message routing & services
5 Monitoring: Single web-based monitoring console
Talend Big Data Product Strategy
4 strategic pillars
Big Data Integration
▶ Land data in a Big Data cluster without coding
▶ Code generation for MapReduce, HDFS, HBase, Pig, Hive, HCatalog, etc.
Big Data Manipulation
▶ Simplify manipulation, such as sort and filter
▶ Run computationally expensive functions using Hadoop
Big Data Quality & Governance
▶ Identify linkages & duplicates, validate big data
▶ Match component, execute basic quality features
Big Data Project Management
▶ Place frameworks around big data projects
▶ Common Repository, scheduling, monitoring
…an open source ecosystem
Talend Open Studio for Big Data
• Improves efficiency of big data job design with a graphical interface
• Generates Hadoop code and runs transforms inside Hadoop
• Native support for HDFS, Pig, HBase, HCatalog, Sqoop and Hive
• 100% open source under an Apache License
• Standards based
Vision: Democratize big data
…an open source ecosystem
Talend Platform for Big Data
• Builds on Talend Open Studio for Big Data
• Adds data quality, advanced scalability and management functions
• Massively parallel data processing with MapReduce
• Shared Repository and remote deployment
• Data quality and profiling
• Data cleansing
• Reporting and dashboards
• Commercial support, warranty/IP indemnity under a subscription license
Vision: Democratize big data
Talend Big Data Partnerships
Hadoop Distributions
Talend Big Data Partners
Demonstration: ETL for Big Data with Talend
Extract
Transform
Load
Talend Demo 2013
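The three demo stages (extract, transform, load) can be sketched in a few lines. This is an illustrative outline in plain Python, not the job shown in the demonstration and not Talend-generated code; the sample CSV data, the column names and the in-memory list standing in for a target such as HDFS or Hive are all assumptions.

```python
import csv
import io

# Hypothetical source data; a real job would read from files or a database.
RAW = """id,name,amount
1, alice ,10.5
2,bob,not_a_number
3, carol ,7
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and capitalize names, cast types, skip invalid amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount is not numeric
        clean.append({
            "id": int(row["id"]),
            "name": row["name"].strip().title(),
            "amount": amount,
        })
    return clean


def load(rows, target):
    """Load: append rows to the target store and return how many were written."""
    target.extend(rows)
    return len(rows)


warehouse = []  # stand-in for the real target system (assumption)
loaded = load(transform(extract(RAW)), warehouse)
print(loaded, warehouse)
```

Keeping each stage as a separate function mirrors how an ETL job is assembled from distinct components, so any one stage can be swapped out (for example, a different source or target) without touching the others.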