TRANSCRIPT
Solution Spotlight Presents
© Talend 2011 2
Integration with CDH in Talend
Talend, Global Leader in Open Source Integration Solutions
Connect external data to Hadoop/HDFS
Leverage MapReduce in Talend job design
Market Positioning – Products
Data Quality: data profiling & data cleansing
Data Integration
- Analytics (ETL): extract, transform & load for decision support systems
- Operational Integration: data replication & synchronization, data migration & capture, application upgrade, etc.
MDM: reference data management
Application Integration: connect applications & services
Solution Positioning
Talend Open Profiler: identify data quality problems
- Free, GPL, no limitations
- Custom indicators
Talend Open Studio: create data flows
- Free, GPL, no limitations
- Unlimited data flows
- 450+ components included
Talend MDM Community Edition: manage master data
- Free, GPL, no limitations
- Active data model
- Lightweight business user UI
Talend Data Quality: cleanse & track
- Specific components
- Reports
- Data Quality Portal
Talend Integration Suite: deploy data integration
- Teamwork
- Automated deployment & load balancing
- Scheduling & monitoring
Talend MDM Enterprise Edition: deploy large-scale MDM
- Full permissions management
- Validation rules
- Complex workflows
Talend LCp: manage best practices
- Testing Platform
- Repository Manager
- Project Audit
Talend Unified Platform: common, unified environment
- Front end: UI (Eclipse, Web)
- Back end: repository
Talend ESB: industrialize deployment of Apache-based ESB
- Free, Apache-based ESB
- Fully functional Enterprise Service Bus
Talend ASF: deploy large-scale enterprise SOA
- Governance & security
- Advanced monitoring
Hadoop Integration
Hadoop Integration Overview
Talend Integration Suite (TIS) key features:
- Graphical flow design
- Connecting a set of 450+ connectors to Hadoop
- Providing HDFS input/output: read/write from any source to HDFS, Hive, HBase, and Sequence Files
- Processing data inside Hadoop
  - Using ELT with HiveQL
  - Leveraging Pig
  - Aggregation and cleansing inside Hadoop
- Mass import/export between Hadoop and RDBMS
- Automated deployment
- Time- and event-based scheduler
- Failover / load balancing
- Centralized monitoring of integration processes
- Shared repository / metadata management
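To make "aggregation and cleansing inside Hadoop" concrete, here is a minimal sketch of the MapReduce pattern in plain Python, with no Hadoop dependency and not Talend-generated code. The record layout (`customer_id,amount`) and the helper names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical: the map step cleanses each raw line and emits key/value pairs, the shuffle groups values by key (the framework's job between map and reduce), and the reduce step aggregates per key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Cleanse each hypothetical 'customer_id,amount' line and emit (key, value)."""
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not parts[0]:
            continue  # cleansing: drop malformed records
        try:
            amount = float(parts[1])
        except ValueError:
            continue  # cleansing: drop non-numeric amounts
        yield parts[0].upper(), amount  # normalize key casing


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Aggregate (sum) the values of each key; keys sorted for stable output."""
    return {key: sum(values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    raw = ["c1, 10.5", "C1,4.5", "c2,7", "bad line", "c3,abc", ",3"]
    print(reduce_phase(shuffle(map_phase(raw))))  # prints {'C1': 15.0, 'C2': 7.0}
```

In a real cluster the map and reduce steps run in parallel across nodes; keeping them as separate pure functions here mirrors that split.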
Graphical Interface
- Graphical flow design
- Java code generation from flow design
- Native Hadoop code generation (Java API)
- Metadata integration with Hive
- Connect external sources into HDFS (450+ connectors)
- Aggregate, cleanse, and transform data in HDFS
- Leverage MapReduce, Pig, and Hive for data processing
- Real-time debugger & direct job deployment on a Hadoop cluster
- Sqoop connectors for mass export/import to RDBMS
Typical Use Case Scenario
- Landing raw data into Hadoop with tHDFS components
- Processing & transformation with Hive ELT or the tPig series of components
- Loading to a traditional RDBMS or Hive for analysis and reporting
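The three stages of this scenario can be sketched end to end under loudly stated assumptions: a Python list stands in for the HDFS landing zone, a plain function stands in for the Hive ELT or Pig transformation, and an in-memory SQLite database stands in for the target RDBMS. The tab-separated `user, page` log format, the `page_hits` table, and the `land`/`transform`/`load` helpers are all hypothetical; a real Talend job would use the tHDFS, Hive, Pig, or Sqoop components instead.

```python
import sqlite3
from collections import Counter
from typing import Iterable, List, Tuple


def land(raw_lines: Iterable[str]) -> List[str]:
    """Stage 1 (stand-in for tHDFS landing): store raw lines unchanged."""
    return list(raw_lines)


def transform(landed: List[str]) -> List[Tuple[str, int]]:
    """Stage 2 (stand-in for Hive ELT / Pig): cleanse, then count hits per page."""
    counts: Counter = Counter()
    for line in landed:
        fields = line.strip().split("\t")
        if len(fields) != 2:
            continue  # cleansing: drop lines that are not 'user<TAB>page'
        _user, page = fields
        counts[page.lower()] += 1  # normalize page casing before aggregating
    return sorted(counts.items())


def load(rows: List[Tuple[str, int]], conn: sqlite3.Connection) -> None:
    """Stage 3: load aggregated rows into a relational table for reporting."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_hits (page TEXT PRIMARY KEY, hits INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO page_hits VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(land(["u1\t/Home", "u2\t/home", "u1\t/cart", "garbage"])), conn)
    print(conn.execute("SELECT page, hits FROM page_hits ORDER BY page").fetchall())
    # prints [('/cart', 1), ('/home', 2)]
```

Keeping each stage as its own function mirrors the scenario: raw data is landed first and left untouched, so the transformation can be rerun without re-ingesting the source.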
Hadoop Connectors
Resources
Visit our website at http://www.talend.com
Watch the pre-recorded webinar about our Hadoop integration: http://www.talend.com/webinar/archive/
Send questions to [email protected] or [email protected]
Download Talend software: http://www.talend.com/download.php
Join Talend community to ask technical questions and connect: http://www.talendforge.org
http://www.cloudera.com/partners