sharing bisnis big data v3 part2

Accelerating Startup Growth with Big Data. Dwika Sudrajat, IT Consultant. Florida, Hong Kong & Jakarta. November 23rd, 2016. Part 2. email: [email protected] Florida: +1-407-2502812 Hong Kong: +852-54152971 Jakarta: +62-8161108571 FB: dwika.sudrajat TW: @dwikasudrajat managingconsultant.blogspot.com dwikasudrajat.blogspot.com dwikasudrajat.wordpress.com

Upload: dwika-sudrajat

Post on 15-Apr-2017


Page 1: Sharing  bisnis big data v3 part2

Accelerating Startup Growth with Big Data

Dwika Sudrajat, IT Consultant

Florida, Hong Kong & Jakarta. November 23rd, 2016

Part 2
▐ email: [email protected]
▐ Florida: +1-407-2502812
▐ Hong Kong: +852-54152971
▐ Jakarta: +62-8161108571
▐ FB: dwika.sudrajat
▐ TW: @dwikasudrajat
▐ managingconsultant.blogspot.com
▐ dwikasudrajat.blogspot.com
▐ dwikasudrajat.wordpress.com

Page 2: Sharing  bisnis big data v3 part2

Case Study: Facebook

FOUNDED IN 2004

Page 3: Sharing  bisnis big data v3 part2

INDEX

1. ARCHITECTURE OVERVIEW
2. SOFTWARE & SCALABILITY: LAMP (LINUX & APACHE, MYSQL, PHP & HIPHOP), DISADVANTAGES OF LAMP, MEMCACHED
3. HADOOP ECOSYSTEM: HADOOP, HIVE

Page 4: Sharing  bisnis big data v3 part2

MORE THAN 60,000 SERVERS

THE NEWEST DATA CENTER IS BUILT ENTIRELY ON SELF-DESIGNED HARDWARE, RECENTLY UNVEILED AS THE “OPEN COMPUTE PROJECT”

300 TB OF DATA STORED IN MEMCACHED PROCESSES

Scaling challenge

Page 5: Sharing  bisnis big data v3 part2

THE HADOOP AND HIVE CLUSTER IS MADE OF 3,000 SERVERS, EACH WITH 8 CORES, 32 GB RAM AND 12 TB OF DISK

100 BILLION HITS, 50 BILLION PHOTOS, 3 TRILLION OBJECTS CACHED, 130 TB OF LOGS PER DAY

TOTAL: 24,000 CORES, 96 TB RAM AND 36 PB OF DISK
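
A quick consistency check on the cluster figures, assuming decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB). Dividing each stated total by the per-server spec gives the server count that total implies; all three come out at 3,000. The variable names are illustrative only.

```python
# Per-server specs and cluster totals as stated on the slide
# (decimal units are an assumption of this sketch).
CORES_PER_SERVER = 8
RAM_GB_PER_SERVER = 32
DISK_TB_PER_SERVER = 12

TOTAL_CORES = 24_000
TOTAL_RAM_TB = 96
TOTAL_DISK_PB = 36

# Server count implied by each total; the three values should agree.
servers_from_cores = TOTAL_CORES // CORES_PER_SERVER
servers_from_ram = TOTAL_RAM_TB * 1000 // RAM_GB_PER_SERVER
servers_from_disk = TOTAL_DISK_PB * 1000 // DISK_TB_PER_SERVER

print(servers_from_cores, servers_from_ram, servers_from_disk)  # 3000 3000 3000
```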

Page 6: Sharing  bisnis big data v3 part2

ARCHITECTURE OVERVIEW: Front end & Back end

[Diagram labels: VISITORS, WEB SERVER, STAFF; FRONTEND (presentation layer); BACKEND (presentation layer); business & data access layers; DATABASE]

Page 7: Sharing  bisnis big data v3 part2


Architecture of Facebook

Components of Facebook

Page 8: Sharing  bisnis big data v3 part2

PHP & HIPHOP

PARSER → STATIC ANALYZER → PRE-OPTIMIZER → TYPE INFERENCE ENGINE → POST-OPTIMIZER → CODE GENERATOR → g++

FACEBOOK’S HIPHOP IS A SOURCE CODE TRANSFORMER THAT CONVERTS PHP INTO C++ AND COMPILES IT WITH G++, THUS PROVIDING A HIGH-PERFORMANCE TEMPLATING AND WEB LOGIC EXECUTION LAYER

Page 9: Sharing  bisnis big data v3 part2

DISADVANTAGES OF LAMP

FACEBOOK HAS REALIZED THAT THERE ARE DISADVANTAGES TO USING THE LAMP STACK: IT IS NOT NECESSARILY OPTIMIZED FOR WEBSITES OF THIS SIZE AND IS THEREFORE DIFFICULT TO SCALE.

PHP IS NOT THE FASTEST-EXECUTING LANGUAGE, AND ITS EXTENSION FRAMEWORK IS DIFFICULT TO USE

[Diagram labels: Browser, Web/App Server, Database; HTTP Request, HTML; HTTP Request, API/FQL, Response, FBML]

Page 10: Sharing  bisnis big data v3 part2

CLIENT / SERVER: PUT/GET/REMOVE (Sync)

MemCached

[Diagram: Web Server 1, Web Server 2 and Web Server 3 (clients); MemCached Server Partition 1, Partition 2, Partition 3 and Partition 4 (servers)]
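
The client/partition layout on this slide can be sketched as follows. This is a minimal illustration, not Facebook's implementation: the `PartitionedCache` class is hypothetical, plain dicts stand in for memcached servers, and simple modulo hashing is an assumption (production memcached clients often use consistent hashing instead).

```python
import zlib


class PartitionedCache:
    """Toy stand-in for a set of memcached partitions (plain dicts here)."""

    def __init__(self, num_partitions=4):
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition_for(self, key):
        # A stable hash, so every web server maps a key to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_for(key)][key] = value

    def get(self, key):
        return self.partitions[self._partition_for(key)].get(key)

    def remove(self, key):
        self.partitions[self._partition_for(key)].pop(key, None)


cache = PartitionedCache(num_partitions=4)
cache.put("user:42:name", "alice")
print(cache.get("user:42:name"))  # alice
cache.remove("user:42:name")
print(cache.get("user:42:name"))  # None
```

Because the partition is derived from the key alone, any of the web servers can locate a cached value without coordinating with the others.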

Page 11: Sharing  bisnis big data v3 part2

Memory Management using Memcached


Memcached based Architecture

Protects the main database from high read demands from users
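
One common way a cache shields a database from reads is the look-aside (cache-aside) pattern: check the cache first and query the database only on a miss. The sketch below assumes that pattern; the `Database` class and `cached_read` function are hypothetical, and a dict stands in for memcached.

```python
class Database:
    """Hypothetical backing store that counts how often it is read."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def query(self, key):
        self.reads += 1
        return self.rows.get(key)


def cached_read(key, cache, db):
    # Look in the cache first; fall back to the database only on a miss.
    value = cache.get(key)
    if value is None:
        value = db.query(key)
        if value is not None:
            cache[key] = value  # populate the cache for later readers
    return value


db = Database({"profile:1": "alice"})
cache = {}
for _ in range(1000):
    cached_read("profile:1", cache, db)

print(db.reads)  # 1
```

A thousand reads of the same key reach the database once; the rest are served from memory.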

Page 12: Sharing  bisnis big data v3 part2

HADOOP HIVE

APACHE HIVE IS A DATA WAREHOUSE INFRASTRUCTURE BUILT ON TOP OF HADOOP FOR PROVIDING DATA SUMMARIZATION, QUERY AND ANALYSIS, DEVELOPED BY FB

HADOOP WAS BUILT TO ORGANIZE AND STORE MASSIVE AMOUNTS OF DATA

HIVE ALLOWS USERS TO EXPLORE AND STRUCTURE THAT DATA, ANALYZE IT AND THEN TURN IT INTO BUSINESS INSIGHT

FAMILIAR, SCALABLE & EXTENSIBLE, FAST, INFORMATIVE
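
Hive exposes a SQL-like query language (HiveQL), which is what makes it "familiar". To illustrate the kind of summarization query involved, the sketch below uses Python's built-in SQLite as a stand-in for Hive; it is not Hive itself, and the `page_views` table and its columns are hypothetical.

```python
import sqlite3

# SQLite in memory stands in for a Hive table purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [(1, "home"), (2, "home"), (1, "photos"), (3, "home")],
)

# A summarization query: views per page, most viewed first.
rows = conn.execute(
    "SELECT page, COUNT(*) AS views "
    "FROM page_views GROUP BY page ORDER BY views DESC"
).fetchall()

print(rows)  # [('home', 3), ('photos', 1)]
```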

Page 13: Sharing  bisnis big data v3 part2

SCRIBE: Server logs

IT IS A SERVER FOR AGGREGATING LOG DATA STREAMED IN REAL TIME FROM MANY OTHER SERVERS. IT IS A SCALABLE FRAMEWORK USEFUL FOR RECORDING A WIDE RANGE OF DATA.

IT IS BUILT ON TOP OF THRIFT.

DATA SUCH AS LOGINS, CLICKS AND FEEDS TRANSIT THROUGH SCRIBE AND ARE AGGREGATED AND STORED IN HDFS USING SCRIBE-HDFS, ALLOWING EXTENDED ANALYSIS USING MAPREDUCE
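
Once the logs are in HDFS, a MapReduce job can summarize them. The following is a single-process sketch of the map, shuffle and reduce steps, counting events by type. The log line format and function names are hypothetical, and a real job would run distributed across the cluster rather than in one Python process.

```python
from collections import defaultdict

# Hypothetical log lines; the real Scribe record format is not shown in the slides.
log_lines = [
    "login user=1",
    "click user=1",
    "click user=2",
    "feed user=3",
    "click user=3",
]


def map_phase(line):
    # Emit (event_type, 1) for each log line.
    event = line.split()[0]
    yield event, 1


def shuffle(pairs):
    # Group all emitted values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Sum the counts for one event type.
    return key, sum(values)


pairs = [pair for line in log_lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(counts)  # {'login': 1, 'click': 3, 'feed': 1}
```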

Page 14: Sharing  bisnis big data v3 part2

MOVES DATA FROM THE SERVER TO A CENTRAL REPOSITORY

Page 15: Sharing  bisnis big data v3 part2

Storing


• Apache Hadoop is being used in three broad types of systems:

• as a warehouse for web analytics

• as storage for a distributed database

• and for MySQL database backups.

Page 16: Sharing  bisnis big data v3 part2

VARNISH CACHE

IT IS USED FOR HTTP PROXYING. THEY CHOSE IT FOR ITS HIGH PERFORMANCE AND EFFICIENCY

[Diagram: Request and Response pass through the Caching Proxy, which sits in front of the Web Server]

WEB APPLICATION ACCELERATOR
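
The idea of a caching proxy in front of a web server can be sketched as below. This is not Varnish's API (Varnish is configured in its own language, VCL); the `CachingProxy` class, the fixed time-to-live policy and the `web_server` function are assumptions made for illustration.

```python
import time


class CachingProxy:
    """Toy caching proxy: serves stored responses until they expire."""

    def __init__(self, origin, ttl_seconds=60.0, clock=time.monotonic):
        self.origin = origin      # callable standing in for the web server
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}           # url -> (expires_at, response)

    def handle(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]       # cache hit: the web server is not contacted
        response = self.origin(url)
        self.store[url] = (now + self.ttl, response)
        return response


origin_hits = []


def web_server(url):
    # Hypothetical origin server that records each request it receives.
    origin_hits.append(url)
    return f"<html>{url}</html>"


proxy = CachingProxy(web_server, ttl_seconds=60.0)
proxy.handle("/home")
proxy.handle("/home")
print(len(origin_hits))  # 1
```

The second request for the same URL is answered by the proxy, which is where the acceleration comes from.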

Page 17: Sharing  bisnis big data v3 part2

HAYSTACK

THE STORAGE OF THE BILLIONS OF PHOTOS POSTED BY USERS IS HANDLED BY THIS AD-HOC STORAGE SOLUTION DEVELOPED BY FACEBOOK, WHICH BRINGS LOW-LEVEL OPTIMIZATIONS AND APPEND-ONLY WRITES
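
Append-only writes can be illustrated with a small sketch: every photo is appended to the end of one large blob, and an in-memory index records where it starts and how long it is. This is not Haystack's actual on-disk format; the `AppendOnlyStore` class is hypothetical and an in-memory buffer stands in for a file on disk.

```python
import io


class AppendOnlyStore:
    """Toy append-only blob store with an in-memory (offset, size) index."""

    def __init__(self):
        self.blob = io.BytesIO()  # stands in for one large file on disk
        self.index = {}           # photo_id -> (offset, size)

    def write(self, photo_id, data):
        # Always write at the end; existing bytes are never modified.
        offset = self.blob.seek(0, io.SEEK_END)
        self.blob.write(data)
        self.index[photo_id] = (offset, len(data))

    def read(self, photo_id):
        # One seek and one read per photo, using the index entry.
        offset, size = self.index[photo_id]
        self.blob.seek(offset)
        return self.blob.read(size)


store = AppendOnlyStore()
store.write("p1", b"first-photo")
store.write("p2", b"second")
print(store.read("p1"))  # b'first-photo'
```

In this sketch, rewriting an existing photo ID appends a new copy and repoints the index, leaving the old bytes in place.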

Page 18: Sharing  bisnis big data v3 part2

QUESTIONS?


Q&A

Thanks