Hadoop and Modern Data Architecture


Upload: bilot

Post on 13-Jan-2017


TRANSCRIPT

Page 1: Hadoop and Modern Data Architecture

01/05/2023 www.bilot.fi 1

#azurehadoop

Page 2: Hadoop and Modern Data Architecture


Hosts

• Tuomas Autio (Bilot), Head of Big Data & Business Lead (BI); [email protected]; @BigDataTuomas

• Mikko Mattila (Bilot), Solution Lead, Analytics; [email protected]; @MattilaJMikko

• Antti Alila (Microsoft), Product Manager, Azure; [email protected]

• Mats Johansson (Hortonworks), Solution Architect; mjohansson@hortonworks.com

• Pasi Vuorela (Hortonworks), Sales Manager Nordics; pvuorela@hortonworks.com

Page 3: Hadoop and Modern Data Architecture

Hadoop and Modern Data Architecture

Breakfast seminar 26.4.2016

Page 4: Hadoop and Modern Data Architecture

Agenda

• Introductions

• Microsoft and Azure Marketplace

• Hadoop and modern data architecture + demo

• Hortonworks, HDP and HDF

• Case study by Hortonworks

• Wrap-up & next steps


Page 5: Hadoop and Modern Data Architecture

Key takeaways from today: what to expect

• What Hadoop is

• How Hadoop fits into enterprise architecture

• What Hadoop means for my organizational structure

• Big data is relevant to every industry

• Real-world use cases


“Hadoop plays a significant role in filling that gap in the market. An open standard approach is needed to keep up with the pace. Old technologies are not capable of connecting billions of things.”, GE’s CIO Vince Campisi

“Spark [on top of Hadoop] has been ‘instrumental in where we’ve gotten to’”, Vinoth Chandar, Uber

“100% of large (over $1 billion) enterprises will adopt Hadoop by 2020”, Forrester’s Principal Analyst Mike Gualtieri

“Hadoop is the most important technological part of digitalization”, SAP’s CTO Quentin Clark

“Who cares about Hadoop on Linux? Microsoft (yes, really) … We want Azure to be a place where all operating systems can run”, T. K. "Ranga" Rengarajan, Microsoft's corporate VP, Data Platform

Page 6: Hadoop and Modern Data Architecture


Bilot stands for BI

Bilot’s offering for Analytics & Big Data, Tuomas Autio, Bilot

Page 7: Hadoop and Modern Data Architecture

About us

• 130+ experts

• 100+ customers

• €16M turnover in 2014

• +40% average growth

• 15 nationalities

• 100% owned by employees

• 10 years

• 2 country HQs

Page 8: Hadoop and Modern Data Architecture

Bilot’s portfolio and analytics?

Page 9: Hadoop and Modern Data Architecture


Our customers* are recognized leaders in their markets

*) >120 customers in total. And increasing…

Page 10: Hadoop and Modern Data Architecture


Microsoft and Azure Marketplace

Antti Alila, Microsoft

Page 11: Hadoop and Modern Data Architecture


Hadoop 101

Tuomas Autio, Bilot

Page 12: Hadoop and Modern Data Architecture


(Old) architectures under pressure

Reporting & applications: analytics, custom applications, packaged applications

Data systems: EDW, RDBMS, MPP

(1) (Traditional) data sources: POS, ERP, CRM

(2) New data sources: social media, click-stream, marketing data, server logs / RFID, sensor / machine data, geo locations, unstructured documents

Page 13: Hadoop and Modern Data Architecture

Quick Intro to Hadoop


• Hadoop is an open-source framework for distributed storage and processing of large data sets on clusters of commodity servers

• Managed by the Apache Software Foundation

• De facto standard for big data

• Enterprise Hadoop distributions: Hortonworks HDP (“Red Hat” of Hadoop), HDP for Windows, IBM, Microsoft Azure HDInsight (HDP), Cloudera, MapR, AWS (EMR), Rackspace

• >50% of US Fortune 100 companies use Hadoop; ~60% CAGR (market projected at $50bn by 2020)

• ~25 Hadoop instances in Finland, ~10 of them known to be in production (well behind the US and Central European markets)

[Figure: Hadoop 2.x framework]
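The “distributed file storage” above is HDFS: each file is cut into fixed-size blocks (128 MB by default in Hadoop 2.x) and every block is stored on several DataNodes (three replicas by default). The toy Python sketch below only illustrates that idea in memory; the round-robin placement and the node names are simplifying assumptions, not HDFS's real rack-aware placement policy.

```python
from dataclasses import dataclass

# Hadoop 2.x defaults (hdfs-site.xml: dfs.blocksize, dfs.replication)
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


@dataclass
class BlockPlacement:
    block_id: int
    offset: int   # byte offset of the block in the file
    length: int   # bytes in this block; the last one may be shorter
    nodes: list   # DataNodes holding a replica


def place_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to `replication` nodes.

    Simplification (not real HDFS): replicas go to consecutive nodes in
    round-robin order, so every replica of a block sits on a different node.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placements = []
    offset, block_id = 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        nodes = [datanodes[(block_id + r) % len(datanodes)] for r in range(replication)]
        placements.append(BlockPlacement(block_id, offset, length, nodes))
        offset += length
        block_id += 1
    return placements


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical 5-node cluster
    for p in place_blocks(1024 * 1024 * 1024 + 1, cluster):  # 1 GiB + 1 byte
        print(p.block_id, p.length, p.nodes)
```

For a 1 GiB + 1 byte file this gives nine blocks (eight full blocks and a final 1-byte block), each on three different nodes, so losing any single node still leaves two copies of every block.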

Page 14: Hadoop and Modern Data Architecture


Key features

• Cluster of commodity servers that scales out “infinitely” and affordably

• Performance grows linearly as nodes are added

• Distributed processing

• Schemaless (schema-on-read)

• Hadoop stores files in a distributed file system (HDFS)

• Fast for big data: processing runs wherever the data is located in the cluster

• Resilient to failure

• Flexible

• Cost-effective
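“Distributed processing” and processing data where it is located refer to Hadoop's MapReduce model: map tasks run independently on each block (ideally on the node that stores it), the framework groups the intermediate pairs by key, and reduce tasks aggregate each group. The single-process Python sketch below mimics those three phases for a word count; it does not use the Hadoop API, and the two input blocks are made-up sample data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(block):
    """Map task: emit (word, 1) for every word in one input block."""
    for line in block:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (done by the framework between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce task: sum the counts for one word."""
    return key, sum(values)


def word_count(blocks):
    # Each block is mapped on its own; on a cluster these map tasks would run
    # in parallel on the nodes that hold the blocks.
    intermediate = chain.from_iterable(map_phase(block) for block in blocks)
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


if __name__ == "__main__":
    sample_blocks = [
        ["Hadoop stores files", "Hadoop processes files"],  # block 1 (made-up data)
        ["Spark runs on Hadoop"],                            # block 2 (made-up data)
    ]
    print(word_count(sample_blocks))
```

Because `map_phase` only sees one block at a time, the blocks could be processed on different machines in parallel; only the much smaller (word, count) pairs need to move across the network during the shuffle.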

Page 15: Hadoop and Modern Data Architecture


USE CASE

BUT

“Haters to the left! Kill the fear! Just get it started and go!”, Symantec’s Cloud Platform Engineering Leader David Lin

Value compounds with use, as more use cases, sources and time periods join the data lake

Page 16: Hadoop and Modern Data Architecture


“Hadoop – It’s damn hard to use”, anonymous CXO


Mitigation: the right team and skills!

IT and the business MUST work together to create maximum value

Typical (new) roles needed in the organization:

• The Data Architect

• The Data Scientist

• The Business Analyst

• The Developer

• The Administrators

Page 17: Hadoop and Modern Data Architecture


Modern Data Architecture

Mikko Mattila, Bilot

Page 18: Hadoop and Modern Data Architecture


Why Hadoop will succeed

IKEA’s business idea: “to offer a wide range of home furnishings with good design and function at prices so low that as many people as possible will be able to afford them”

Page 19: Hadoop and Modern Data Architecture


Why Hadoop will succeed

“Hadoop is a software package at such a low price that almost every company is already able to afford it”

“Hadoop and other open source big data projects provide a huge range of IT software for the areas of data management and system integration”

“Hadoop tools are designed to solve problems that are impossible for traditional commercial tools”

Page 20: Hadoop and Modern Data Architecture


Hadoop Scenario 1: pre-process ETL

• Shift the ETL pre-processing done in the data warehouse staging area to Hadoop

• Moves work from high-cost data warehousing to lower-cost Hadoop clusters
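As a rough illustration of Scenario 1, the kind of pre-processing that moves off the staging data warehouse is parsing, validating and aggregating raw records before the compact result is loaded into the DW. The Python sketch below assumes a hypothetical raw sales extract with `store`, `date` and `amount` columns; on a real cluster the same logic would typically be written as a Hive, Pig or Spark job rather than plain Python.

```python
import csv
import io
from collections import defaultdict

# Made-up raw extract: one bad amount and one row without a store.
RAW = """store,date,amount
S1,2016-04-26,19.90
S1,2016-04-26,not_a_number
S2,2016-04-26,5.00
S1,2016-04-26,10.10
,2016-04-26,3.00
"""


def preprocess(raw_csv):
    """Validate raw rows and aggregate them to daily totals per store.

    Returns ({(store, date): total}, number_of_rejected_rows). Only the small
    aggregated result would be loaded into the data warehouse.
    """
    totals = defaultdict(float)
    rejected = 0
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected += 1  # unparsable amount
            continue
        if not row["store"]:
            rejected += 1  # missing business key
            continue
        totals[(row["store"], row["date"])] += amount
    return {key: round(total, 2) for key, total in totals.items()}, rejected


if __name__ == "__main__":
    print(preprocess(RAW))
```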

Page 21: Hadoop and Modern Data Architecture


Hadoop Scenario 2: hot and cold storage

• Offload large volumes of historical data into cold storage on Hadoop

• Keep the data warehouse for hot data to support BI and analytics

• When data from cold storage is needed, it can be moved back into the warehouse

[Figure: staging; hot data in DW; cold data in Hadoop]
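Scenario 2 boils down to an age-based routing rule. The sketch below assumes a hypothetical 365-day hot window and records carrying a `date` field: newer records stay in the DW, older ones are offloaded to Hadoop, and `recall` selects a date range of cold records to move back when it is needed.

```python
from datetime import date, timedelta

HOT_DAYS = 365  # hypothetical hot-data window kept in the DW


def route(records, today, hot_days=HOT_DAYS):
    """Split records into hot (stay in DW) and cold (offload to Hadoop) by age."""
    cutoff = today - timedelta(days=hot_days)
    hot, cold = [], []
    for record in records:
        (hot if record["date"] >= cutoff else cold).append(record)
    return hot, cold


def recall(cold, start, end):
    """Pick the cold records in [start, end] to move back into the DW."""
    return [record for record in cold if start <= record["date"] <= end]


if __name__ == "__main__":
    records = [
        {"id": 1, "date": date(2016, 1, 1)},
        {"id": 2, "date": date(2014, 6, 1)},
        {"id": 3, "date": date(2013, 3, 1)},
    ]
    hot, cold = route(records, today=date(2016, 4, 26))
    print([r["id"] for r in hot], [r["id"] for r in cold])
    print(recall(cold, date(2014, 1, 1), date(2014, 12, 31)))
```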

Page 22: Hadoop and Modern Data Architecture


Hadoop Scenario 3: true data discovery

• Keep the data warehouse for operational BI and analytics

• Allow data scientists to make new discoveries in raw data (no predefined format or structure)

• Operationalize discoveries back into the warehouse

[Figure: staging]
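Working on raw data “with no format or structure” relies on schema-on-read: the data lake keeps records exactly as they arrived, and each analysis decides at read time which fields it needs. The Python sketch below uses made-up JSON clickstream events; `read_with_schema` is a hypothetical helper that projects the requested fields and skips lines that cannot be parsed, similar in spirit to defining a Hive external table over raw files.

```python
import json

# Made-up raw clickstream events, stored exactly as received.
RAW_EVENTS = [
    '{"user": "u1", "page": "/product/42", "ts": "2016-04-26T09:00:00"}',
    '{"user": "u2", "page": "/cart", "ts": "2016-04-26T09:01:00", "device": "mobile"}',
    "corrupted line",
    '{"user": "u1", "page": "/product/42", "ts": "2016-04-26T09:05:00"}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: keep only `fields`, fill missing ones with None."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparsable raw line: skipped here, but still kept in the lake
        yield {name: record.get(name) for name in fields}


if __name__ == "__main__":
    # Two different questions over the same raw data, each with its own schema.
    print(list(read_with_schema(RAW_EVENTS, ["user", "page"])))
    print(list(read_with_schema(RAW_EVENTS, ["user", "device"])))
```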

Page 23: Hadoop and Modern Data Architecture

Hadoop ecosystem: All you need for modern analytics architecture as open source

Data sources:

• Traditional enterprise software and files: RDBMS, ERP

• Online systems (logs or streams): webshops, mobile applications, contact centers, ERP, CRM systems etc.

• (Un)structured data & documents

• Clickstream, server logs / RFID

• Sentiment, social media, sensors

Processing:

• ETL + DW

Digital organization vs. traditional organization

Page 24: Hadoop and Modern Data Architecture

Hadoop ecosystem: All you need for modern analytics architecture as open source

Data sources:

• Traditional enterprise software and files: RDBMS, ERP

• Online systems (logs or streams): webshops, mobile applications, contact centers, ERP, CRM systems etc.

• (Un)structured data & documents

• Clickstream, server logs / RFID

• Sentiment, social media, sensors

Processing:

• Real-time stream, log data and RDBMS change capture (Flume or Hortonworks DataFlow)

• Message queue and history (Kafka)

• Complex event processing (Storm, Spark Streaming, Kafka Streams, Flink)

• Real-time machine interface for applications

• ETL + DW

Digital organization vs. traditional organization
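The “message queue and history” role of Kafka comes from its design as an append-only log: messages are retained, and each consumer tracks its own offset, so a real-time consumer and a later batch or replay consumer can read the same stream independently. The in-memory Python sketch below only models that idea; `TopicLog` and `Consumer` are hypothetical classes, not the Kafka client API, and real Kafka adds partitions, retention policies, durable storage and consumer groups.

```python
class TopicLog:
    """Append-only log: messages are retained and addressed by offset."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def read(self, offset, max_messages=100):
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Tracks its own offset, so consumers read the same log independently."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_messages=100):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    clicks = TopicLog()
    realtime = Consumer(clicks)
    for event in ["view /product/42", "view /cart", "order 42"]:
        clicks.append(event)
    print(realtime.poll())          # the real-time consumer picks up new events
    print(Consumer(clicks).poll())  # a batch job can replay the history from offset 0
```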

Page 25: Hadoop and Modern Data Architecture

Hadoop ecosystem: All you need for modern analytics architecture as open source

Data sources:

• Traditional enterprise software and files: RDBMS, ERP

• Online systems (logs or streams): webshops, mobile applications, contact centers, ERP, CRM systems etc.

• (Un)structured data & documents

• Clickstream, server logs / RFID

• Sentiment, social media, sensors

Processing:

• Real-time stream, log data and RDBMS change capture (Flume or Hortonworks DataFlow)

• Message queue and history (Kafka)

• Complex event processing (Storm, Spark Streaming, Kafka Streams, Flink)

• Real-time machine interface for applications

• File system (HDFS) + core services

• Batch processing

• Interactive processing & queries (Spark & Hive)

• ETL + DW

Users: BI user

Digital organization vs. traditional organization

Page 26: Hadoop and Modern Data Architecture

Hadoop ecosystem: All you need for modern analytics architecture as open source

Data sources:

• Traditional enterprise software and files: RDBMS, ERP

• Online systems (logs or streams): webshops, mobile applications, contact centers, ERP, CRM systems etc.

• (Un)structured data & documents

• Clickstream, server logs / RFID

• Sentiment, social media, sensors

Processing:

• Real-time stream, log data and RDBMS change capture (Flume or Hortonworks DataFlow)

• Message queue and history (Kafka)

• Complex event processing (Storm, Spark Streaming, Kafka Streams, Flink)

• Real-time machine interface for applications

• File system (HDFS) + core services

• RDBMS -> HDFS batch load (Sqoop)

• Batch processing (MapReduce & Pig Latin)

• Interactive processing & queries (Spark & Hive)

• Statistical analysis (Spark)

• ETL + DW

Users: BI user, data scientist

Digital organization vs. traditional organization
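Sqoop's batch load from an RDBMS into HDFS is often run in incremental mode, where only rows whose check column (for example an increasing id) is greater than the last imported value are fetched. The Python sketch below reproduces that selection logic against an in-memory SQLite table; the `orders` table and its columns are made up, and the step of writing the rows to HDFS is left out.

```python
import sqlite3


def incremental_import(conn, table, check_column, last_value):
    """Return (new_rows, new_last_value) for rows where check_column > last_value.

    `table` and `check_column` are interpolated into the SQL, so they must come
    from trusted configuration, never from user input.
    """
    cursor = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    rows = cursor.fetchall()
    if not rows:
        return [], last_value  # nothing new since the previous run
    columns = [description[0] for description in cursor.description]
    return rows, rows[-1][columns.index(check_column)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, "chair", 49.0), (2, "table", 99.0)])
    rows, last = incremental_import(conn, "orders", "id", 0)     # first run: both rows
    conn.execute("INSERT INTO orders VALUES (3, 'lamp', 19.0)")
    rows, last = incremental_import(conn, "orders", "id", last)  # next run: only id 3
    print(rows, last)
```

On a cluster the corresponding Sqoop 1 invocation would look roughly like `sqoop import --connect <jdbc-url> --table orders --incremental append --check-column id --last-value 2`; when run as a saved job (`sqoop job`), Sqoop records the new last value for the next run.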


Page 28: Hadoop and Modern Data Architecture

Hadoop ecosystem: All you need for modern analytics architecture as open source

Data sources:

• Traditional enterprise software and files: RDBMS, ERP

• Online systems (logs or streams): webshops, mobile applications, contact centers, ERP, CRM systems etc.

• (Un)structured data & documents

• Clickstream, server logs / RFID

• Sentiment, social media, sensors

Processing:

• Real-time stream, log data and RDBMS change capture (Flume or Hortonworks DataFlow)

• Message queue and history (Kafka)

• Complex event processing (Storm, Spark Streaming, Kafka Streams, Flink)

• Real-time machine interface for applications

• File system (HDFS) + core services

• RDBMS -> HDFS batch load (Sqoop)

• Batch processing (MapReduce & Pig Latin)

• Interactive processing & queries (Spark & Hive)

• Statistical analysis (Spark)

• NoSQL database for interactive use (HBase)

• ETL + DW

Users: BI user, data scientist

Digital organization vs. traditional organization
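HBase serves interactive lookups because rows are kept sorted by row key, so a single-row get or a scan over a key prefix touches only a small, contiguous key range. The Python sketch below imitates that with a sorted key list; `SortedKV` is a toy stand-in rather than the HBase API, and the `customer#timestamp` row-key design is an assumed example.

```python
import bisect


class SortedKV:
    """Toy stand-in for an HBase table: rows kept sorted by row key."""

    def __init__(self):
        self._keys = []  # sorted row keys
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        """Yield rows whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            yield key, self._rows[key]


def row_key(customer_id, timestamp):
    # Hypothetical design: customer id first, so one customer's events are adjacent.
    return f"{customer_id}#{timestamp}"


if __name__ == "__main__":
    events = SortedKV()
    events.put(row_key("c1", "2016-04-26T10:00"), {"page": "/cart"})
    events.put(row_key("c10", "2016-04-26T09:30"), {"page": "/product/7"})
    events.put(row_key("c1", "2016-04-26T09:00"), {"page": "/product/42"})
    print(list(events.scan_prefix("c1#")))
```

The `#` separator matters: without it, a prefix scan for customer `c1` would also return rows of `c10`.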

Page 29: Hadoop and Modern Data Architecture

Hadoop ecosystem: All you need for modern analytics architecture as open source

Data sources:

• Traditional enterprise software and files: RDBMS, ERP

• Online systems (logs or streams): webshops, mobile applications, contact centers, ERP, CRM systems etc.

• (Un)structured data & documents

• Clickstream, server logs / RFID

• Sentiment, social media, sensors

Processing:

• Real-time stream, log data and RDBMS change capture (Flume or Hortonworks DataFlow)

• Message queue and history (Kafka)

• Complex event processing (Storm, Spark Streaming, Kafka Streams, Flink)

• Real-time machine interface for applications

• File system (HDFS) + core services

• RDBMS -> HDFS batch load (Sqoop)

• Batch processing (MapReduce & Pig Latin)

• Interactive processing & queries (Spark & Hive)

• Statistical analysis (Spark)

• NoSQL database for interactive use (HBase)

• ETL + DW

Logical data warehouse:

• O/JDBC, MDX, REST inbound interfaces

• Data virtualization: virtual data models / security

• O/JDBC, MDX, REST outbound interfaces

Users: BI user, data scientist, traditional BI tools

Digital organization vs. traditional organization
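The data virtualization layer presents one logical data warehouse over several physical stores: a query is answered by combining sources at query time instead of copying data, and security can be applied in the virtual model. The Python sketch below shows the idea with an in-memory SQLite table standing in for the DW and a list standing in for a Hive result; all table names, data and the column-level rule are hypothetical.

```python
import sqlite3

# Made-up aggregate as it might come from a Hive query over clickstream data in HDFS.
HADOOP_PAGE_VIEWS = [{"product_id": 42, "views": 120}, {"product_id": 7, "views": 15}]


def dw_connection():
    """In-memory SQLite database standing in for the traditional DW."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (product_id INTEGER, name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [(42, "chair", 49.0), (7, "lamp", 19.0)])
    return conn


def virtual_product_view(dw, page_views, allowed_columns):
    """Join DW rows with Hadoop data at query time; expose only allowed columns."""
    views = {row["product_id"]: row["views"] for row in page_views}
    query = "SELECT product_id, name, price FROM products ORDER BY product_id"
    for product_id, name, price in dw.execute(query):
        row = {"product_id": product_id, "name": name, "price": price,
               "views": views.get(product_id, 0)}
        yield {column: value for column, value in row.items() if column in allowed_columns}


if __name__ == "__main__":
    # e.g. a role that may see product names and traffic but not prices
    print(list(virtual_product_view(dw_connection(), HADOOP_PAGE_VIEWS, {"name", "views"})))
```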


Page 31: Hadoop and Modern Data Architecture


Example use case: Dynamic Pricing

Dynamic pricing will become more and more common in the future

The use of dynamic pricing should be a business decision, not restricted by your technical capabilities

Dynamic pricing, from “same price for everyone in every store” to “the more you visit booking pages, the higher the price”

Page 32: Hadoop and Modern Data Architecture


Dynamic Omnichannel Pricing

• Channels: store, online channel, consumption (IoT), consumer buying

• Price cache (SmartPricing Accelerator, SPA) with pricing rules

• Pricing inputs: price list, customer, product, basket size, history, warehouse levels, delivery time / type, website activity, IoT consumption

• Streaming data: orders / clickstream, sensor data, POS data; MQ; CEP

• Analytics and pricing simulations (SmartPricing); supply chain management (+ other sources); batch processing & history

• Second- and minute-level price optimization; monthly-level price optimization
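The price cache and pricing rules can be pictured as a function from the inputs listed on the slide (price list, basket size, warehouse levels, website activity and so on) to a price, with the latest result cached for all channels. The rules and thresholds in this Python sketch (5% off for baskets of five or more items, +10% when stock is below 10 units, +5% above 100 product views per hour) are invented for illustration and are not the SmartPricing Accelerator's actual logic.

```python
from dataclasses import dataclass


@dataclass
class PricingContext:
    list_price: float
    basket_size: int           # items in the customer's basket
    warehouse_level: int       # units in stock
    site_views_last_hour: int  # website activity for the product


def apply_rules(ctx):
    """Compute a price from the list price using hypothetical pricing rules."""
    price = ctx.list_price
    if ctx.basket_size >= 5:
        price *= 0.95  # assumed rule: basket-size discount
    if ctx.warehouse_level < 10:
        price *= 1.10  # assumed rule: low-stock surcharge
    if ctx.site_views_last_hour > 100:
        price *= 1.05  # assumed rule: high-demand surcharge
    return round(price, 2)


class PriceCache:
    """Holds the latest computed price per product for fast lookups by every channel."""

    def __init__(self):
        self._prices = {}

    def update(self, product_id, ctx):
        self._prices[product_id] = apply_rules(ctx)
        return self._prices[product_id]

    def get(self, product_id):
        return self._prices[product_id]


if __name__ == "__main__":
    cache = PriceCache()
    cache.update("p42", PricingContext(list_price=100.0, basket_size=1,
                                       warehouse_level=5, site_views_last_hour=200))
    print(cache.get("p42"))
```

Keeping the rule evaluation separate from the cache lets second-level updates (from CEP) and monthly re-optimizations (from batch analytics) write to the same cache.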

Page 33: Hadoop and Modern Data Architecture


Demo Scope

• Webshop (HTML5 + Tomcat server): consumer buying and clickstream

• Log file sniffing to stream: Flume-ng

• MQ: Kafka

• CEP: identifies the “product page” and the viewed product, and sends a request to increase the price; every visit to a “product page” increases the price by 5%

• Pricing updates

• Data warehouse: MS SQL Server (orders, product categories, suppliers)

• Analytics: HDFS + Hive, MS Power BI
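The CEP step of the demo, identifying product-page views in the clickstream and raising the price by 5% per visit, can be sketched in a few lines of Python. The URL pattern `/product/<id>`, the access-log format and rounding the price to cents after every visit are assumptions; in the demo the log lines arrive via Flume-ng and Kafka rather than from a Python list.

```python
import re

# Assumed URL pattern for product pages in the demo webshop's access log.
PRODUCT_PAGE = re.compile(r'"GET /product/(\d+)[ ?]')


def product_views(log_lines):
    """Yield the product id of every log line that is a product-page view."""
    for line in log_lines:
        match = PRODUCT_PAGE.search(line)
        if match:
            yield int(match.group(1))


def apply_visits(prices, log_lines, step=0.05):
    """Raise a product's price by `step` (5%) for each visit to its product page."""
    for product_id in product_views(log_lines):
        if product_id in prices:  # ignore products without a managed price
            prices[product_id] = round(prices[product_id] * (1 + step), 2)
    return prices


if __name__ == "__main__":
    access_log = [
        '10.0.0.1 - - [26/Apr/2016:09:00:00 +0300] "GET /product/42 HTTP/1.1" 200 512',
        '10.0.0.2 - - [26/Apr/2016:09:00:05 +0300] "GET /cart HTTP/1.1" 200 128',
        '10.0.0.1 - - [26/Apr/2016:09:01:00 +0300] "GET /product/42?ref=home HTTP/1.1" 200 512',
    ]
    print(apply_visits({42: 100.0, 7: 20.0}, access_log))
```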

Page 34: Hadoop and Modern Data Architecture


DEMO: Real-time Dynamic Webshop Pricing and Real-time Reporting (Hadoop)

Mikko Mattila, Bilot

Page 35: Hadoop and Modern Data Architecture


Hortonworks: HDP & HDF

References & Use Cases

Pasi Vuorela, Hortonworks

Page 36: Hadoop and Modern Data Architecture


Hortonworks Technical Case Study

Mats Johansson, Hortonworks

Page 37: Hadoop and Modern Data Architecture


Next Steps?

Tuomas Autio, Bilot

Page 38: Hadoop and Modern Data Architecture


Bilot’s Hadoop Accelerator Program

1. Business Strategy (0.5 day)

• Content: intro to Hadoop, vision, use cases, prioritization

• Deliverables: insight for a Hadoop-enabled business; list of prioritized Hadoop use cases

2. Hadoop Bootcamp (1 day)

• Content: 1 use case; deep dive with business, IT and operations; business case

• Deliverables: business case for the PoC use case; “How to get there?”

3. Proof of Concept (2-8 weeks)

• Content: platform deployed on Azure; integrations + use case; look & feel; test drive

• Deliverables (technical): up-and-running system and technical evaluation

4. Proof of Solution (2-3 months)

• Content: scalability; security; tools and methods; cloud/on-prem; licences / support descriptions

• Deliverables: confirmed business case; plans for a scalable and secure Hadoop solution ready for implementation

5. Build & Implement (3-6 months)

• Content: implementation; agile development; roll-out and roadmap; change management begins

• Deliverables: Hadoop implemented; roadmap for further use cases

6. Run

• Content: Hadoop as a Service; AMS; data-driven enterprise / organization development

• Deliverables: fully functional Hadoop environment; continuous support model; organizational adaptation

Stages: PoC / pilot, then production implementation

Contact Bilot to hear more

Page 39: Hadoop and Modern Data Architecture


Interested? Contact us for a tailored demo and workshop!

Bilot is Hortonworks’ first systems integrator partner in Finland and a Microsoft Gold Partner

Real customer use cases and industry examples are available for demo. Contact us for your own tailored session!

In the pre-PoC phase, for sandboxing and light demo purposes, we can use Azure or Bilot’s 5-node on-premises HDP cluster

Mikko Mattila

Solution Lead, Analytics

[email protected]

@MattilaJMikko

Tuomas Autio

Head of Big Data & BI Business Lead

[email protected]

@BigDataTuomas