Hadoop and Modern Data Architecture
01/05/2023 www.bilot.fi 1
#azurehadoop
Hosts
Tuomas Autio
Bilot
Head of Big Data & Business Lead (BI)
@BigDataTuomas
Mikko Mattila
Bilot
Solution Lead, Analytics
@MattilaJMikko
Antti Alila
Microsoft
Product Manager, Azure
Mats Johansson
Hortonworks
Solution Architect
mjohansson@hortonworks.com
Pasi Vuorela
Hortonworks
Sales Manager Nordics
pvuorela@hortonworks.com
Hadoop and Modern Data Architecture
Breakfast seminar 26.4.2016
Agenda
• Introductions
• Microsoft and Azure Marketplace
• Hadoop and modern data architecture + demo
• Hortonworks, HDP and HDF
• Case study by Hortonworks
• Wrap-up & next steps
Key take-aways from today: what to expect
• What Hadoop is
• How Hadoop fits into enterprise architecture
• What Hadoop means for my organizational structure
• Big data is relevant to every industry
• Real world use cases
”Hadoop plays significant role filling that gap in the market. Open standard approach is needed to keep up with the pace. Old technologies are not capable for billions of things to be connected.” GE’s CIO Vince Campisi
”Spark [on top of Hadoop] has been ‘instrumental in where we’ve gotten to’” Vinoth Chandar, Uber
”100% of large (over $1 bn) enterprises will adopt Hadoop by 2020” Forrester’s Principal Analyst Mike Gualtieri
“Hadoop is the most important technological part of the digitalization” SAP’s CTO Quentin Clark
“Who cares about Hadoop on Linux? Microsoft (yes, really) … We want Azure to be a place where all operating systems can run” T. K. "Ranga" Rengarajan, Microsoft's corporate VP, Data Platform
Bilot stands for BI
Bilot’s offering for Analytics & Big data, Tuomas Autio, Bilot
About us
130+ EXPERTS
100+ CUSTOMERS
16 M€ TURNOVER IN 2014
+40% AVERAGE GROWTH
15 NATIONALITIES
100% OWNED BY EMPLOYEES
10 YEARS
2 COUNTRY HQ’S
Bilot’s portfolio and analytics?
Our customers* are recognized leaders in their markets
*) >120 customers in total. And increasing…
Microsoft and Azure Marketplace
Antti Alila, Microsoft
Hadoop 101
Tuomas Autio, Bilot
(Old) Architectures under pressure
• Reporting & applications: analytics, custom applications, packaged applications
• Data systems: RDBMS, EDW, MPP
• (1) (Traditional) data sources: ERP, CRM, POS, …
• (2) New data sources: social media, click-stream, marketing data, server logs / RFID, sensor / machine data, geo locations, unstructured documents
Quick Intro to Hadoop
• Hadoop is an open source framework for distributed storage and processing of large data sets
• Managed by the Apache Software Foundation
• De facto standard for big data
• Enterprise Hadoop distributions: Hortonworks HDP (”Red Hat” of Hadoop), HDP for Windows, IBM, Microsoft Azure HDInsight (HDP), Cloudera, MapR, AWS (EMR), Rackspace
• >50% of US Fortune 100 companies use Hadoop, ~60% CAGR (market estimated at $50bn in 2020)
• ~25 Finnish instances, ~10 known production instances in Finland (well behind the US and Central European markets)
Hadoop 2.x framework
Key Features
• Cluster of commodity servers, scales out ”infinitely” and affordably
• Linear growth of performance
• Distributed processing
• Schemaless
• Hadoop stores files in a distributed file system
• Fast (for big data), maps data wherever it is located in the cluster
• Resilient to failure
• Flexible
• Cost effective
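The ”distributed processing” and ”maps data wherever it is located” features can be illustrated with a minimal single-machine sketch of the map/reduce pattern. This is plain Python, not the Hadoop API: the `blocks` list is a hypothetical stand-in for file blocks stored on different cluster nodes, and the function names are made up for this example.

```python
from collections import Counter
from functools import reduce

# Hypothetical stand-in for file blocks stored on different cluster nodes.
blocks = [
    "hadoop stores files",
    "hadoop scales out",
    "files are split into blocks",
]

def map_block(block: str) -> Counter:
    """Map step: count words locally, where the block is stored."""
    return Counter(block.split())

def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge the partial counts of two blocks."""
    return left + right

# On a real cluster each map task would run in parallel next to its block.
partial_counts = [map_block(block) for block in blocks]
word_counts = reduce(merge_counts, partial_counts, Counter())
print(word_counts["hadoop"], word_counts["files"])
```

Because each block is counted independently, adding more nodes adds more map tasks that can run side by side, which is the intuition behind the scale-out claim above.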
USE CASE
BUT
”Haters to the left! Kill the fear! Just get it started and go!”, Symantec’s Cloud Platform Engineering Leader David Lin
Value compounds with use, as more use cases, sources, and time periods join the data lake
”Hadoop – It’s damn hard to use”, anonymous CXO
Mitigation: Right Team and skills!
IT and the Business MUST Work Together to Create Maximum Value
Typical (new) roles needed in the organization:
• The Data Architect
• The Data Scientist
• The Business Analyst
• The Developer
• The Administrators
Modern Data Architecture
Mikko Mattila, Bilot
Why Hadoop will succeed
IKEA’s Business Idea
“to offer a wide range of home furnishings with good design and function at prices so low that as many people as possible will be able to afford them”
Why Hadoop will succeed
“HADOOP IS A SOFTWARE PACKAGE AT SUCH A LOW PRICE THAT ALMOST EVERY COMPANY IS ABLE TO AFFORD IT ALREADY”
“HADOOP AND OTHER OPEN SOURCE BIG DATA PROJECTS PROVIDE A HUGE RANGE OF IT SOFTWARE FOR AREAS OF DATA MANAGEMENT AND SYSTEM INTEGRATION”
“HADOOP TOOLS ARE DESIGNED TO SOLVE ISSUES IMPOSSIBLE FOR TRADITIONAL COMMERCIAL TOOLS”
Hadoop Scenario 1: pre-process ETL
• Shift ETL pre-processing from the data warehouse staging area to Hadoop
• Moves work from high-cost data warehousing to lower-cost Hadoop clusters
Hadoop Scenario 2: hot and cold storage
• Offload large volumes of historical data into cold storage in Hadoop
• Keep the data warehouse for hot data to allow BI and analytics
• When data from cold storage is needed, it can be moved back into the warehouse
Diagram labels: Staging, Hot data in DW, Cold data in Hadoop
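The hot/cold split in Scenario 2 can be sketched as a simple partition by record age. The 365-day window, the `order_date` field, and the function name are assumptions made for illustration; the slides do not say how the cut-off is chosen.

```python
from datetime import date, timedelta

# Assumed retention rule for illustration: rows newer than 365 days stay "hot".
HOT_WINDOW = timedelta(days=365)

def split_hot_cold(rows, today):
    """Partition rows into hot (stay in the DW) and cold (offload to Hadoop)."""
    cutoff = today - HOT_WINDOW
    hot = [row for row in rows if row["order_date"] >= cutoff]
    cold = [row for row in rows if row["order_date"] < cutoff]
    return hot, cold

# Hypothetical sample rows.
rows = [
    {"order_id": 1, "order_date": date(2013, 3, 1)},
    {"order_id": 2, "order_date": date(2015, 11, 20)},
    {"order_id": 3, "order_date": date(2016, 4, 1)},
]
hot, cold = split_hot_cold(rows, today=date(2016, 4, 26))
print([row["order_id"] for row in hot], [row["order_id"] for row in cold])
```

Moving data back into the warehouse is the same filter run in the other direction, which is why the cut-off is best kept as a single configurable value.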
Hadoop Scenario 3: true data discovery
• Keep the data warehouse for operational BI and analytics
• Allow data scientists to make new discoveries on raw data (no format or structure)
• Operationalize discoveries back into the warehouse
Diagram label: Staging
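Working on raw data with ”no format or structure” usually means the structure is applied when the data is read, not when it is stored. The sketch below shows that idea in plain Python; the event fields (`type`, `page`, `user`) and the sample lines are hypothetical and not taken from the presentation.

```python
import json

# Hypothetical raw events, stored as-is without a predefined schema.
raw_lines = [
    '{"type": "click", "page": "/product/42", "user": "a"}',
    '{"type": "sensor", "temp_c": 21.5}',
    'not json at all',
    '{"type": "click", "page": "/cart", "user": "b"}',
]

def read_clicks(lines):
    """Apply a schema at read time: keep only parsable click events."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # raw data may contain records that do not parse
        if isinstance(event, dict) and event.get("type") == "click":
            yield {"page": event["page"], "user": event["user"]}

clicks = list(read_clicks(raw_lines))
print(clicks)
```

A different question (for example about sensor readings) would read the same raw lines with a different filter, without reloading or remodelling the stored data.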
Hadoop ecosystem: All you need for modern analytics architecture as open source
• Organization types: digital organization, traditional organization
• Source systems: online systems (logs or streams); traditional enterprise software and files; RDBMS; ERP; webshops, mobile applications, contact centers, ERP, CRM systems etc.
• Source data: (un)structured data & documents, clickstream, server logs / RFID, sentiment, social media, sensor data
• ETL + DW
• Real-time stream, log data and RDBMS change capture (Flume or Hortonworks DataFlow)
• Message queue and history (Kafka)
• Complex event processing (Storm, Spark Streaming, Kafka Streams, Flink)
• Real-time machine interface for applications
• RDBMS -> HDFS batch load (Sqoop)
• File system (HDFS) + core services
• Batch processing (MapReduce & Pig Latin)
• Interactive processing & queries (Spark & Hive)
• Statistical analysis (Spark)
• NoSQL database for interactive use (HBase)
• Data virtualization: virtual data models / security, logical data warehouse
• ODBC/JDBC, MDX, REST inbound and outbound interfaces
• Users and tools: BI user, data scientist, traditional BI tools
Example use case: Dynamic Pricing
Dynamic pricing will become more and more common in the future
Usage of dynamic pricing should be a business decision, not one restricted by your technical capabilities
Dynamic Pricing
Same price for everyone in every store
The more you visit booking pages, the higher the price
Dynamic OmniChannel Pricing
• Channels: store, on-line channel, consumption (IoT), consumer buying
• Data feeds: POS data, orders / click-stream, sensor data
• MQ
• CEP
• Price Cache (SmartPricing Accelerator, SPA), pricing rules
• Pricing inputs: price list, customer, product, basket size, history, warehouse levels, delivery time / type, website activity, IoT consumption
• Analytics and pricing simulations (SmartPricing)
• Supply chain management (+ other sources)
• Batch processing & history
• Second- and minute-level price optimization, monthly-level price optimization
Demo Scope
• Components: consumer buying, web shop, click-stream, MQ, CEP, pricing updates, data warehouse, analytics
• Technologies: HTML5 + Tomcat server, Flume-ng (log file sniffing to stream), Kafka, HDFS + Hive, MS SQL Server, MS Power BI
• Data: click-stream; orders, product categories, suppliers
• Every visit to a ”product page” increases the price by 5%
• Identifies the ”product page” and the viewed product, then sends a request to increase the price
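The demo's pricing rule can be sketched in a few lines of Python. This is not the demo's code: the log line format, the `/product/<id>` URL pattern, and the function name are assumptions, and the sketch assumes the 5% is applied to the current price on every visit (compounding). In the demo the log lines would arrive through Flume-ng and Kafka; here they are a plain list.

```python
import re

PRICE_INCREASE = 0.05  # from the demo: each product page visit raises the price by 5%

# Assumed URL pattern for a product page; the demo's real log format is not shown.
PRODUCT_PAGE = re.compile(r"GET /product/(\w+)")

def apply_clickstream(prices, log_lines):
    """Return new prices after raising each viewed product's price by 5% per visit."""
    updated = dict(prices)  # leave the input price list untouched
    for line in log_lines:
        match = PRODUCT_PAGE.search(line)
        if match and match.group(1) in updated:
            product_id = match.group(1)
            updated[product_id] = round(updated[product_id] * (1 + PRICE_INCREASE), 2)
    return updated

# Hypothetical products and web server log lines.
prices = {"sofa": 100.00, "lamp": 20.00}
log_lines = [
    "GET /product/sofa HTTP/1.1",
    "GET /cart HTTP/1.1",
    "GET /product/sofa HTTP/1.1",
    "GET /product/lamp HTTP/1.1",
]
new_prices = apply_clickstream(prices, log_lines)
print(new_prices)
```

Lines that are not product pages (such as the cart view) are ignored, which mirrors the ”identify the product page, then request a price increase” step described above.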
DEMO
Real-time Dynamic Webshop Pricing and real-time Reporting (Hadoop), Mikko Mattila, Bilot
Hortonworks: HDP & HDF
References & Use Cases
Pasi Vuorela, Hortonworks
Hortonworks Technical Case Study
Mats Johansson, Hortonworks
Next Steps?
Tuomas Autio, Bilot
Bilot’s Hadoop Accelerator Program
1. Business Strategy (0.5 day)
• Content: intro to Hadoop, vision, use cases, prioritization
• Deliverables: insight for Hadoop-enabled business; list of prioritized Hadoop use cases
2. Hadoop bootcamp (1 day)
• Content: 1 use case; deep dive with business, IT, and operations; business case
• Deliverables: business case for PoC use case; “How to get there?”
3. Proof of Concept (2-8 weeks)
• Content: platform deployed on Azure, integrations + use case, look & feel, test drive
• Deliverables: technical: up-and-running system and technical evaluation; confirmed business case
4. Proof of Solution (2-3 months)
• Content: scalability, security, tools and methods, cloud/on-prem, licences / support descriptions
• Deliverables: plans for scalable and secure Hadoop solution ready for implementation
5. Build & Implement (3-6 months)
• Content: implementation, agile dev, roll-out and roadmap, change mgmt. begins
• Deliverables: Hadoop implemented; roadmap for further use cases
6. Run
• Content: Hadoop as a Service, AMS, data driven enterprise / organization dev
• Deliverables: fully functional Hadoop environment, continuous support model, organizational adaptation
Phases: PoC / Pilot, Production implementation
Contact Bilot to hear more
Interested? Contact us for a tailored demo and workshop!
Bilot is Hortonworks’ first systems integrator partner in Finland and Microsoft’s Gold Partner
Real customer use cases and industry examples available for demo. Contact us for your own tailored session!
In the pre-PoC phase, for sandboxing and light demo purposes, we can utilize Azure or Bilot’s 5-node on-premises HDP cluster
Mikko Mattila
Solution Lead, Analytics
@MattilaJMikko
Tuomas Autio
Head of Big Data & BI Business Lead
@BigDataTuomas