Talend Big Data Capabilities - 2014


Upload: rajan-kanitkar

Posted on 12-Jul-2015


TRANSCRIPT

Page 1: Talend Big Data Capabilities - 2014

1© Talend 2014

Talend: Solutions

Overview

Page 2

About the Presenter

Rajan Kanitkar

• Senior Solutions Engineer

• Rajan Kanitkar is a Pre-Sales Consultant with Talend. He has been active in the broader Data Integration space for the past 15 years and has worked with several leading-edge software companies in this area. At Talend, his areas of specialty include Data Integration (DI), Big Data (BD), Data Quality (DQ), and Master Data Management (MDM).

• Contact: [email protected]

Page 3

Talend Big Data Platform

Hadoop, MapReduce, NoSQL capabilities …

Page 4

The Big Data Ecosystem

• Hadoop: the core project

• HDFS: the Hadoop Distributed File System

• MapReduce: the software framework for distributed processing of large data sets

• Hive: a data warehouse infrastructure that provides data summarization and querying through a SQL-like language (HiveQL)

• Pig: a high-level data-flow language and execution framework for parallel computation

• HBase: the Hadoop database. Use it when you need random, real-time read/write access to your Big Data

• And many more: Sqoop, HCatalog, ZooKeeper, Oozie, Cassandra, MongoDB, Flume, Impala, Stinger, Neo4j, etc.
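To make the MapReduce entry above concrete, here is a minimal, single-process Python sketch of the map, shuffle and reduce phases, using the classic word-count example. It only simulates the programming model; it does not call Hadoop, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value under its key (Hadoop does this between phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: aggregate each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "MapReduce processes data stored in HDFS"]
    print(word_count(docs))
```

On a real cluster the same three phases run across many machines, and the framework itself moves mapper output to the reducers during the shuffle.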

Page 5

Talend’s Solution

Page 6

Key Differentiator of Our Next-Gen Architecture…

Standards-Based Code Generator

• No black-box engine

• Enables a lightweight, distributed, customizable and parallelizable runtime

• JAVA: ETL for day-to-day integration; runs everywhere

• SQL: ELT on DW appliances (Teradata, Netezza…)

• MapReduce + Pig + HiveQL + Sqoop + …: highly scalable processing on the Hadoop grid

• CAMEL: high-frequency message transformation
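The slide contrasts ETL (transform inside the integration job, e.g. generated Java) with ELT (load first, then push the transformation down to the database as SQL). The Python sketch below illustrates that difference under stated assumptions: an in-memory SQLite database stands in for a DW appliance such as Teradata or Netezza, and the table and function names are hypothetical. It is not Talend-generated code.

```python
import sqlite3
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (customer, amount_cents); hypothetical source schema


def etl(source_rows: List[Row], target: sqlite3.Connection) -> None:
    """ETL: transform in the integration engine (here, Python), then load the result."""
    totals: Dict[str, int] = {}
    for customer, cents in source_rows:
        totals[customer] = totals.get(customer, 0) + cents
    target.execute("CREATE TABLE customer_totals (customer TEXT, total_cents INTEGER)")
    target.executemany("INSERT INTO customer_totals VALUES (?, ?)", totals.items())


def elt(source_rows: List[Row], target: sqlite3.Connection) -> None:
    """ELT: load raw rows first, then push the transformation down as SQL."""
    target.execute("CREATE TABLE raw_orders (customer TEXT, amount_cents INTEGER)")
    target.executemany("INSERT INTO raw_orders VALUES (?, ?)", source_rows)
    target.execute(
        "CREATE TABLE customer_totals AS "
        "SELECT customer, SUM(amount_cents) AS total_cents "
        "FROM raw_orders GROUP BY customer"
    )


def read_totals(conn: sqlite3.Connection) -> List[Row]:
    return conn.execute(
        "SELECT customer, total_cents FROM customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    rows = [("acme", 1200), ("globex", 500), ("acme", 300)]
    etl_db, elt_db = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    etl(rows, etl_db)
    elt(rows, elt_db)
    print(read_totals(etl_db), read_totals(elt_db))
```

Both paths produce the same totals; what differs is where the aggregation runs, which is why ELT suits targets with a strong SQL engine.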

Page 7

Trying to get from this…

Page 8

to this…

Talend Big Data – “pure Hadoop”

Visually design MapReduce jobs and optimize them before deploying on Hadoop

Page 9

Native Map/Reduce Jobs

• Create classic ETL patterns using native Map/Reduce

- Only data management solution on the market to generate native Map/Reduce code

• Reduce the need for big data coding skills

• Zero pre-installation on the Hadoop cluster

• Hadoop is the “engine” for data processing
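One classic ETL pattern is a lookup, i.e. joining each fact record to its reference data. Expressed as native Map/Reduce, this is commonly done as a reduce-side join: mappers tag each record with its source and emit it under the join key, and reducers combine the records that share a key. The Python sketch below simulates that flow in a single process; it illustrates the general technique, not the code Talend generates, and all names in it are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Customer = Tuple[str, str]            # (customer_id, name)
Order = Tuple[str, str, int]          # (order_id, customer_id, amount)
Tagged = Tuple[str, Tuple[str, tuple]]  # (join_key, (source_tag, record))
Joined = Tuple[str, str, str, int]    # (order_id, customer_id, name, amount)


def map_customers(customers: List[Customer]) -> Iterator[Tagged]:
    # Mapper for the lookup input: key by customer_id, tag as "C".
    for cust_id, name in customers:
        yield cust_id, ("C", (name,))


def map_orders(orders: List[Order]) -> Iterator[Tagged]:
    # Mapper for the fact input: key by customer_id, tag as "O".
    for order_id, cust_id, amount in orders:
        yield cust_id, ("O", (order_id, amount))


def reduce_join(key: str, values: List[Tuple[str, tuple]]) -> Iterator[Joined]:
    """Inner join: pair every order with the customer record sharing its key."""
    names = [rec[0] for tag, rec in values if tag == "C"]
    orders = [rec for tag, rec in values if tag == "O"]
    for name in names:
        for order_id, amount in orders:
            yield order_id, key, name, amount


def reduce_side_join(customers: List[Customer], orders: List[Order]) -> List[Joined]:
    groups: Dict[str, List[Tuple[str, tuple]]] = defaultdict(list)
    # Shuffle: group the output of both mappers by customer_id.
    for key, tagged in list(map_customers(customers)) + list(map_orders(orders)):
        groups[key].append(tagged)
    joined: List[Joined] = []
    for key in sorted(groups):  # Hadoop also hands keys to each reducer in sorted order
        joined.extend(reduce_join(key, groups[key]))
    return joined
```

Orders whose customer_id has no match are dropped, as in an SQL inner join; an outer join would emit them with an empty name instead.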

Page 10

MapReduce 2.0, YARN, Storm, Spark

• YARN: ensures predictable performance and QoS for all apps

• Enables apps to run “IN” Hadoop rather than “ON” it

• In Labs: streaming with Apache Storm

• In Labs: mini-batch and in-memory processing with Apache Spark

Applications Run Natively IN Hadoop (stack, top to bottom):

• Applications: BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, Spark), GRAPH (Giraph), NoSQL (MongoDB), EVENTS (Falcon), ONLINE (HBase), OTHER (Search)

• YARN (Cluster Resource Management)

• HDFS2 (Redundant, Reliable Storage)

Source: Hortonworks

Page 11


Talend: Ingest – Transform – Deliver (overlaid on the YARN/HDFS2 stack)

• INGEST (Ingestion): Sqoop, Flume, HDFS API, HBase API, Hive, 800+ more

• TRANSFORM (Data Refinement): Profile, Parse, Map, CDC, Cleanse, Standardize, Machine Learning, Match

• DELIVER (as an API): ActiveMQ, Karaf, Camel, CXF, Kafka, Storm, Meta, Security, MDM, iPaaS, Govern, HA
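CDC (change data capture) in the TRANSFORM list identifies which records were inserted, updated or deleted since the last load. When a source exposes no change log, one simple approximation is to compare two keyed snapshots, sketched below in Python. This is an assumption-laden illustration (records are dicts keyed by primary key); production CDC tools often read database transaction logs or triggers instead, and the function name `snapshot_diff` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]  # assumed shape: column name -> value


def snapshot_diff(
    old: Dict[str, Record], new: Dict[str, Record]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two snapshots keyed by primary key.

    Returns (inserted_keys, updated_keys, deleted_keys), each sorted.
    """
    inserted = sorted(k for k in new if k not in old)
    deleted = sorted(k for k in old if k not in new)
    # A key present in both snapshots counts as updated if any column differs.
    updated = sorted(k for k in new if k in old and new[k] != old[k])
    return inserted, updated, deleted


if __name__ == "__main__":
    yesterday = {"1": {"name": "Ann"}, "2": {"name": "Bob"}}
    today = {"1": {"name": "Ann"}, "2": {"name": "Robert"}, "3": {"name": "Cy"}}
    print(snapshot_diff(yesterday, today))
```

Diffing full snapshots reads every row on each run, which is the main reason log-based CDC is preferred for large tables.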

Page 12

Talend Big Data Sandbox & Talend Big Data Jumpstart

Delivering instant value from all your data

Page 13

BIG DATA CHALLENGES

The Big Data Customer Discussion

Page 14

Top Big Data Challenges

Source: Gartner - Survey Analysis: Big Data Adoption in 2013 Shows Substance Behind the Hype - 12 September 2013 - G00255160

Talend directly addresses these challenges

Page 15

Talend’s Solution

Page 16

TALEND BIG DATA SANDBOX

30-day customer trial

Page 17

Cookbook Step-by-Step Directions

• Completely Self-contained Demo Sandbox

• Key Scenarios:

- Twitter Analysis

- Clickstream Analysis

- Web Log analysis

- ETL Offload

• Scenario Summaries

- Social Media insights

- Channel optimization

- Customer insights

- Data Warehouse Cost Reduction
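The Web Log analysis scenario starts by parsing raw Apache access logs. As a rough illustration, the Python sketch below parses lines in the Apache Common Log Format (`host ident user [time] "request" status bytes`) and counts HTTP status codes and requested paths. The regular expression and the `summarize` helper are hypothetical and not taken from the sandbox; Combined Log Format lines also match, because the trailing referer and user-agent fields are ignored.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple

# Apache Common Log Format: host ident user [time] "request" status bytes.
# re.match anchors at the start only, so extra Combined-format fields are ignored.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


class LogSummary(NamedTuple):
    status_counts: Counter
    path_counts: Counter
    malformed: int


def summarize(lines: Iterable[str]) -> LogSummary:
    statuses: Counter = Counter()
    paths: Counter = Counter()
    malformed = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            malformed += 1  # e.g. truncated lines or requests logged as "-"
            continue
        statuses[match["status"]] += 1
        paths[match["path"]] += 1
    return LogSummary(statuses, paths, malformed)


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.1" 404 -',
    ]
    print(summarize(sample))
```

In a Hadoop setting the per-line parsing would sit in the map phase and the counting in the reduce phase, so the same logic scales to very large log volumes.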

Page 18

Ready for Launch

• Announcements

- Public announcement Tuesday 15th

- Newsletter sent 9th July

• Customer nurture campaign

- Scenario reminders, videos & links

- Reminder to Talend AE

• Two Routes for 5.5

- Sandbox Download publicly available – 15th July

- Jumpstart and AE ‘access’ – 15th July

• Links for the 15th (Sandbox download)

- Public: http://www.talend.com/talend-big-data-sandbox

- Account Exec: send the customer this download form to fill in:

• https://info.talend.com/prodevaltpbdsandbox

Page 19

TALEND BIG DATA JUMPSTART

A ‘guided tour’ of the Sandbox

Page 20

Why the ‘Jumpstart’?

Practical Guided Tour

• Led by a Talend Solutions Engineer

• Learn about the Talend Studio

• See how to execute Hadoop processes

- Map/Reduce with YARN

- Pig

- HDFS

• See Hive and NoSQL Examples

- Hive

- HBase

- MongoDB

- Cassandra

Page 21

Key benefits

• NO Configuration/Development

• INSTANT results now, for the Future

• Valuable prototypes for FREE

• Working on the top THREE Hadoop Distributions

Page 22

3 Simple Messages

• Sandbox is customer-led, Jumpstart is sales-led

• Jumpstart is the best way to ‘get Talend’

- Google: Talend Jumpstart

• Work to get the best conversation & involve pre-sales

Page 23

Sandbox

- Talend Jumpstart Sandbox - virtual image installed with:

• Apache Hadoop distributions provided by Hortonworks, Cloudera & MapR

• Pre-configured Talend Platform for Big Data 5.5*

• Four scenarios for you to try:

– Clickstream data

– Twitter sentiment

– Apache weblogs

– ETL Offload

• Demonstrations of several NoSQL databases

*Includes Talend Studio (graphical IDE), team working, management, data quality and advanced big data features.

www.talend.com/products/platform-for-big-data
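For the Clickstream scenario, a common first step is sessionization: splitting each visitor's clicks into sessions separated by a period of inactivity. The Python sketch below assumes clicks arrive as `(user_id, epoch_seconds, url)` tuples and uses a 30-minute inactivity timeout, a widely used convention rather than a value from these slides; it is not the sandbox's actual job, and `sessionize` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Click = Tuple[str, int, str]  # (user_id, epoch_seconds, url); assumed input shape
SESSION_TIMEOUT_SECONDS = 30 * 60  # assumed 30-minute inactivity gap


def sessionize(
    clicks: Iterable[Click], timeout: int = SESSION_TIMEOUT_SECONDS
) -> Dict[str, List[List[str]]]:
    """Group each user's clicks into sessions (lists of URLs), in time order."""
    by_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user, ts, url in clicks:
        by_user[user].append((ts, url))

    sessions: Dict[str, List[List[str]]] = {}
    for user, events in by_user.items():
        events.sort()  # order each user's clicks by timestamp
        user_sessions: List[List[str]] = []
        last_ts: Optional[int] = None
        for ts, url in events:
            # Start a new session on the first click or after a gap longer than timeout.
            if last_ts is None or ts - last_ts > timeout:
                user_sessions.append([])
            user_sessions[-1].append(url)
            last_ts = ts
        sessions[user] = user_sessions
    return sessions
```

Grouping by user first mirrors how a MapReduce version would key clicks by user_id, so that each reducer sees one visitor's full history.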

Page 24

SHOW ME

Talend Demo