Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)


Posted on 02-Jul-2015

DESCRIPTION

Whether you are new to Hadoop or already run a mature cluster in production, scale will be a critical factor in your success with Hadoop. Are you ready to take the next big step as you scale out your data architecture? In this webinar, Talend and Hortonworks help you learn how to implement an effective big data and Hadoop strategy across your IT infrastructure. You will learn:

  • How to grow a pilot into production
  • How to scale out architecture and systems affordably
  • How to leverage the flexibility of Hadoop to optimize your data integration processes

Recording: http://www.talend.com/resources/webinars/starting-small-and-scaling-big-with-hadoop

TRANSCRIPT

  • 1. Starting Small and Scaling Big with Hadoop. November 20, 2014. (Talend 2014)

  • 2. Your Speakers Today: Jim Walker, Director, Product Marketing; Julien Sauvage, Director, Product Marketing.

  • 3. (Hortonworks Inc. 2011-2014. All Rights Reserved.) Digital universe in 2013: 2.3 zettabytes. Digital universe in 2020: 40 zettabytes, and a $50B Hadoop market. 85% of growth comes from new types of data, with machine-generated data increasing 15x. Analyst consensus estimates enterprise data growth of 50x year over year through 2020. (1 zettabyte (ZB) = 1 million petabytes (PB). Sources: IDC and IDG Enterprise.)

  • 4. A Shift from Reactive to Proactive Interactions. HDP and Hadoop allow organizations to shift interactions from reactive (post-transaction) to proactive (pre-decision):
    - Advertising: from mass branding to 1x1 targeting
    - Financial services: from educated investing to automated algorithms
    - Healthcare: from mass treatment to designer medicine
    - Retail: from static branding to real-time personalization
    - Telco: from break-then-fix to repair-before-break

  • 5. HDP Realized Cost Savings with EDW Optimization.
    - Archive data away from the EDW: move cold or rarely used data to Hadoop as an active archive, and store more data for longer.
    - Offload costly ETL processes: free your EDW to perform high-value functions like analytics and operations, not ETL, and use Hadoop for advanced ETL.
    - Optimize the value of your EDW: use Hadoop to refine new data sources, such as web and machine data, for new analytical context.
    Diagram: analytics (data marts, business analytics, visualization & dashboards); data systems (enterprise data warehouse for hot data; HDP 2.2 with ELT for cold data, deeper archive and new sources); systems of record (RDBMS, ERP, CRM, other); new sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured). Hadoop helps you optimize and reduce costs associated with your EDW.

  • 6. Realize Dramatic Savings for Cost of Storage.
    - Cost efficiencies: reduce costs associated with expensive archive systems; utilize existing relationships with hardware vendors; open source software.
    - Active archive: provide access to archived data instead of letting it collect dust.
    Chart: fully-loaded cost per raw TB of data (min-max cost) for MPP, SAN, engineered systems, NAS, Hadoop and cloud storage. Hadoop enables scalable compute and storage at a compelling cost structure: storage/compute costs drop from $19/GB to $0.23/GB.

  • 7. Unlock New Applications from New Types of Data. Data types: sentiment & web, clickstream & behavior, machine & sensor, geographic, server logs, structured & unstructured. Use cases by industry:
    - Financial services: new account risk screens; trading risk; insurance underwriting
    - Telecom: call detail records (CDR); infrastructure investment; real-time bandwidth allocation
    - Retail: 360 view of the customer; localized, personalized promotions; website optimization
    - Manufacturing: supply chain and logistics; assembly line quality assurance; crowd-sourced quality assurance
    - Healthcare: use genomic data in medical trials; monitor patient vitals in real time
    - Pharmaceuticals: recruit and retain patients for drug trials; improve prescription adherence
    - Oil & gas: unify exploration & production data; monitor rig safety in real time
    - Government: ETL offload/federal budgetary pressures; sentiment analysis for government programs

  • 8. End Game: Data Lake, an Architectural Shift. Diagram (axes: scale, scope): RDBMS, MPP and EDW alongside a data lake enabled by YARN. Unlocking the data lake means a single data repository on shared infrastructure; multiple business apps accessing all the data; a shift from reactive to proactive interactions; and new insight across the entire enterprise, for new analytic apps or IT optimization. HDP 2.1 layers: governance & integration, security, operations, data access, data management, YARN.

  • 9. Enterprise Goals for the Modern Data Architecture:
    - Consolidate siloed data sets, structured and unstructured
    - Central data set on a single cluster
    - Multiple workloads across batch, interactive and real time
    - Central services for security, governance and operations
    - Preserve existing investment in current tools and platforms
    - Single view of the customer, product, supply chain
    Diagram: applications (business analytics, custom applications, packaged applications); data system (RDBMS, EDW, MPP, and YARN: Data Operating System running batch, interactive and real-time workloads over HDFS, the Hadoop Distributed File System); sources (existing systems such as CRM, ERP and other, plus clickstream, web & social, geolocation, sensor & machine, server logs, unstructured).

  • 10. HDP Delivers a Comprehensive Data Management Platform: Hortonworks Data Platform 2.2.
    - YARN: Data Operating System (cluster resource management), over HDFS (Hadoop Distributed File System)
    - Batch, interactive and real-time data access: script (Pig), SQL (Hive) and Java/Scala (Cascading), all on Tez; stream (Storm); search (Solr); NoSQL (HBase, Accumulo); Slider; in-memory (Spark); others (ISV engines)
    - Governance: data workflow, lifecycle & governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS)
    - Operations: provision, manage & monitor (Ambari, ZooKeeper); scheduling (Oozie)
    - Security: authentication, authorization, accounting, data protection (storage: HDFS; resources: YARN; access: Hive; pipeline: Falcon; cluster: Knox, Ranger)
    - Deployment choice: Linux, Windows, on-premises, cloud
    YARN is the architectural center of HDP: it enables batch, interactive and real-time workloads, provides comprehensive enterprise capabilities, offers the widest range of deployment options, and is delivered completely in the open.

  • 11. HDP and Talend in the Modern Data Architecture. Diagram: sources (existing systems; clickstream, web & social, geolocation, sensor & machine, server logs, unstructured); data system (RDBMS, EDW, and HDP 2.1 with governance & integration, security, operations, data access, data management, YARN); applications (BusinessObjects BI); operational tools, dev & data tools, infrastructure; plus Hadoop 2.0/YARN, data quality, Pig, Hive, ETL/ELT, HBase and NoSQL.
    - Deep partnerships: Hortonworks engages in deep engineering relationships with the leaders in the data center, such as Microsoft, Teradata, Red Hat, HP, SAS and SAP.
    - Broad partnerships: over 600 partners work with us to certify their applications on Hadoop so they can extend big data to their users.

  • 12. Connecting the Data-Driven Enterprise.

  • 13. The Talend Platform.

  • 14. Still Hand-Coding Data Integration?
    - Hand-coding: unproductive; needs specialized skills; hard to maintain; limited support
    - Talend Enterprise: 800+ drag-and-drop components; generates optimized code; collaboration & management; Gold support (SLAs)

  • 15. Encumbered with Legacy ETL?
    - Legacy ETL: proprietary engine; hard to scale to big data; expensive
    - Talend Enterprise: open; generates native code; low TCO

  • 16. Future-Proof Architecture: day-to-day integration (ETL, Java); DW appliance (ELT, SQL); highly scalable (Hadoop, MapReduce); message transformation (Camel); the next big thing.

  • 17. Infinite Scale: ONE cluster to deploy, ONE cluster to manage, ONE cluster to monitor, ONE cluster to scale, ONE cluster to update, ONE cluster to pay for! And it will be 100x faster in 2 years.

  • 18. Unlock New Applications from New Types of Data: the same industry and use-case grid as slide 7, annotated "integration jobs +++".

  • 19. Simplify Real-Time Big Data: 100x performance increase; < 1 sec response; new components for streaming data; address new use cases (last-minute defense, dynamic pricing, real-time fraud detection, etc.).

  • 20. The Talend Solution: scalable; generates native code; future-proof; built-in data quality; more productive; open source; innovative; agile; open source platform; learn once, expand many times; easy; subscription pricing per developer; predictable cost; lowest TCO. "The ease of use of the Talend platform allows us to deliver"

  • 21. The Three Drivers of Success: product innovation (big data, cloud); market adoption (customers, community, partners); industry recognition (visionary, leader, multi-award winner).

  • 22. Customer Case Study: Product Inventory and Pricing.

  • 23. The Old Way to Do Forecasting: at the product-category level (e.g., HALLOWEEN).

  • 24. Data Explosion in Size: multiple SKUs (10,000s of products such as Halloween masks, Halloween candies and pumpkins) x multiple stores (1,000s).

  • 25. Need for a Modern Architecture: data at rest and data in motion across DAO, Cassandra, OLTP, Hadoop and EDW, serving BI, visualization & analytics. Graphical; generates code; runs on Hadoop.

  • 26. A New EDW Ecosystem: enterprise intelligence & advanced analytics (SSAS); enterprise data warehouse; advanced analytics platform; data refinery & ingest engine; fast data cache.

  • 27. Talend + Hortonworks = Open = Awesome! Pure open source governed cluster; don't need to recode or reformat data; no vendor lock-in; subscription models; most recent releases of Apache projects; we are always aligned and up to date.

  • 28. The Forrester Wave: Big Data Hadoop Solutions, Q1 2014. Hortonworks: a Leader in Hadoop. Hortonworks loves and lives open source innovation. World-class support and services: Hortonworks' customer support received a maximum score, significantly higher than both Cloudera and MapR.

  • 29. Questions? Jim Walker (@jaymce), Julien Sauvage (@sauvageju).

  • 30. Check Out Our Talend + Hortonworks Sandbox! http://www.talend.com
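
The ETL-offload theme running through the deck (slides 5, 7 and 16) is easier to picture with a concrete job. The sketch below is a hypothetical, minimal illustration, not Talend-generated code. It follows the Hadoop Streaming convention: the mapper and reducer read text lines and emit tab-separated key/value pairs, and the framework sorts by key between the two phases. Here that shuffle/sort is simulated locally with `sorted()`. The log format (Apache common log), the field position of the status code, and the names `mapper`, `reducer`, `run_local` and `SAMPLE_LOG` are assumptions chosen for illustration.

```python
"""Local simulation of a Hadoop Streaming-style ETL step: count HTTP status
codes in server logs.

Assumption (hypothetical): logs follow the Apache common log format, so after
splitting on whitespace the status code is field index 8.
"""
from itertools import groupby
from typing import Iterable, Iterator, List

# Hypothetical sample input; the last record is deliberately malformed.
SAMPLE_LOG = [
    '10.0.0.1 - - [20/Nov/2014:10:00:00 +0000] "GET /a HTTP/1.1" 200 512',
    '10.0.0.2 - - [20/Nov/2014:10:00:01 +0000] "GET /b HTTP/1.1" 404 0',
    '10.0.0.1 - - [20/Nov/2014:10:00:02 +0000] "GET /a HTTP/1.1" 200 512',
    "malformed line",
]


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'status<TAB>1' per well-formed line; silently skip bad records."""
    for line in lines:
        fields = line.split()
        if len(fields) > 8 and fields[8].isdigit():
            yield f"{fields[8]}\t1"


def reducer(sorted_pairs: Iterable[str]) -> Iterator[str]:
    """Sum values per key. Input must be sorted by key, which Hadoop's
    shuffle/sort phase guarantees between map and reduce."""
    parsed = (pair.split("\t", 1) for pair in sorted_pairs)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_local(lines: Iterable[str]) -> List[str]:
    """Stand-in for a cluster run: map, sort (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for row in run_local(SAMPLE_LOG):
        print(row)  # prints "200\t2" then "404\t1"
```

On a real cluster, each function would typically live in its own script that reads `sys.stdin` and writes to stdout, and the scripts would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options (the jar's location varies by distribution). Because summing counts is associative, the same reducer logic could also serve as a `-combiner` to cut shuffle traffic.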
