Talend Big Data Capabilities Overview

Posted on 27-Jan-2015

Page 1: Talend Big Data Capabilities Overview

Talend: Solutions Overview

Presenter: Rajan Kanitkar

Page 2: Talend Big Data Capabilities Overview

Talend Big Data Overview

Page 3: Talend Big Data Capabilities Overview

© Talend 2012

The Drivers for Big Data

Volume

Velocity

Variety

Page 4: Talend Big Data Capabilities Overview

How to process big data?

The de facto standard for big data processing

Page 5: Talend Big Data Capabilities Overview

What is Hadoop?

Apache Hadoop, an open-source software library, is a framework that allows for the distributed processing of large data sets across clusters of commodity hardware using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Page 6: Talend Big Data Capabilities Overview

The Big Data Ecosystem

Hadoop: the core project

HDFS: the Hadoop Distributed File System

MapReduce: the software framework for distributed processing of large data sets

Hive: a data warehouse infrastructure that provides data summarization and a querying language

Pig: a high-level data-flow language and execution framework for parallel computation

HBase: the Hadoop database, for when you need random, real-time read/write access to your big data

And many more: Sqoop, HCatalog, ZooKeeper, Oozie, Cassandra, MongoDB, Flume, Impala, Stinger, Neo4j, etc.

Thanks to you all! Google, Amazon, Facebook, Twitter, Yahoo, 10gen, Cloudera, Hortonworks, MapR, etc.
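
To make the MapReduce entry in the list above more concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions and not part of any Hadoop or Talend API. It only shows the map, shuffle and reduce steps that the framework would distribute across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data big"]))
```

In a real cluster the map and reduce calls would run in parallel on different machines, with input read from HDFS; the sketch keeps everything in one process so the data flow is easy to follow.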

Page 7: Talend Big Data Capabilities Overview

Talend Big Data Overview

Page 8: Talend Big Data Capabilities Overview

Key Differentiator of Our Next-Gen Architecture…

Standards-Based Code Generator

JAVA: ETL, day-to-day integration, run everywhere

SQL: ELT, DW appliance (Teradata, Netezza…)

MapReduce: Hadoop, highly scalable, Hadoop grid

CAMEL: message transformation, high frequency

?: future-proof

No black-box engine: enables a lightweight, distributed, customizable and parallelizable runtime

Page 9: Talend Big Data Capabilities Overview

Talend Unique Integration Solution

1 Studio: Comprehensive Eclipse-based user interface

2 Repository: Consolidated metadata & project information

3 Deployment: Web-based deployment & scheduling

4 Execution: Same container for batch processing, message routing & services

5 Monitoring: Single web-based monitoring console

Data Quality, Data Integration, MDM, ESB, BPM

Best-of-Breed Solutions + Talend Unified Platform = Unique Integration Solution

Page 10: Talend Big Data Capabilities Overview

Talend Big Data Product Strategy

Big Data Integration

▶ Land data in a Big Data cluster without coding

▶ Code generation for MapReduce, HDFS, HBase, Pig, Hive, HCatalog, etc.

Big Data Manipulation

▶ Simplify manipulation, such as sort and filter

▶ Computationally expensive functions using Hadoop

Big Data Quality & Governance

▶ Identify linkages & duplicates, validate big data

▶ Match component, execute basic quality features

Big Data Project Management

▶ Place frameworks around big data projects

▶ Common Repository, scheduling, monitoring

4 strategic pillars
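
As an illustration of the "identify linkages & duplicates" idea under Big Data Quality & Governance, the following is a minimal sketch in plain Python. It is not Talend's match component; the record fields (`id`, `name`, `email`) and the helper names `match_key` and `find_duplicates` are hypothetical, and the matching rule (exact match after normalization) is a deliberately simple assumption.

```python
from collections import defaultdict


def match_key(record):
    """Build a normalized key: lowercase, trimmed, whitespace collapsed."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def find_duplicates(records):
    """Return lists of record ids that share the same normalized key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "Ann  Lee", "email": "ANN@example.com"},
        {"id": 2, "name": "ann lee", "email": "ann@example.com "},
        {"id": 3, "name": "Bob Ray", "email": "bob@example.com"},
    ]
    print(find_duplicates(sample))
```

Grouping by a key is the same shape as a MapReduce job (key emission, then grouping), which is why this kind of check can be pushed down to a Hadoop cluster for large data sets.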

Page 11: Talend Big Data Capabilities Overview

…an open source ecosystem

Talend Open Studio for Big Data

• Improves efficiency of big data job design with a graphical interface

• Generates Hadoop code and runs transforms inside Hadoop

• Native support for HDFS, Pig, HBase, HCatalog, Sqoop and Hive

• 100% open source under an Apache License

• Standards based

Vision: Democratize big data

Page 12: Talend Big Data Capabilities Overview

…an open source ecosystem

Talend Platform for Big Data

• Builds on Talend Open Studio for Big Data

• Adds data quality, advanced scalability and management functions

• MapReduce massively parallel data processing

• Shared Repository and remote deployment

• Data quality and profiling

• Data cleansing

• Reporting and dashboards

• Commercial support, warranty/IP indemnity under a subscription license

Vision: Democratize big data

Page 13: Talend Big Data Capabilities Overview

Talend Big Data Partnerships

Hadoop Distributions

Talend Big Data Partners

Page 14: Talend Big Data Capabilities Overview

Demonstration: ETL for Big Data with Talend

Extract

Transform

Load
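
The transcript does not record what the demonstration job did, so the following is only a generic sketch of the three ETL stages named on this slide, written in plain Python with the standard library. The sample data (`RAW_CSV`), the table name `sales_by_region` and the function names are all assumptions made for illustration; an in-memory SQLite database stands in for the load target.

```python
import csv
import io
import sqlite3

# Hypothetical input: one row has a missing amount.
RAW_CSV = "region,amount\nnorth,10.5\nsouth,\nnorth,4.5\neast,7\n"


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip rows without an amount, then total per region."""
    totals = {}
    for row in rows:
        if not row["amount"]:
            continue
        region = row["region"]
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(conn, rows):
    """Load: write the aggregated rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)
    conn.commit()


def run_etl(text):
    """Chain the three stages and read back what was loaded."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    query = "SELECT region, total FROM sales_by_region ORDER BY region"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(run_etl(RAW_CSV))
```

Keeping each stage as its own function mirrors how a graphical ETL job is laid out as separate input, transformation and output components.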

Page 15: Talend Big Data Capabilities Overview

Talend Demo 2013