hadoop for carrier
Post on 05-Dec-2014
Welcome to Flytxt
Leveraging Hadoop Cluster for Carrier-Grade Applications
Copyright © 2011 Flytxt B.V. All rights reserved.
Service discovery
No Personalization
Mammoth Data
Data Analysis
600-800 GB of CDR (call detail records) per day
◦ GPRS signaling: 50 GB/day
◦ 3G signaling: 300 GB/day
◦ Voice: 100 GB/day
◦ SMS: 200 GB/day
100 - 200 GB/day of Web Data
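As a quick sanity check on the figures above, the per-source CDR volumes can be summed and combined with the web data range. This is only a sketch of the arithmetic; the 30-day month and the decimal conversion (1 TB = 1000 GB) are assumptions, not figures from the slides.

```python
# Daily CDR volumes in GB, as listed on the slide.
cdr_gb_per_day = {
    "GPRS signaling": 50,
    "3G signaling": 300,
    "Voice": 100,
    "SMS": 200,
}
# Web data range in GB/day (low, high), as listed on the slide.
web_gb_per_day = (100, 200)

cdr_total = sum(cdr_gb_per_day.values())
total_low = cdr_total + web_gb_per_day[0]
total_high = cdr_total + web_gb_per_day[1]

# Assumption: 30-day month, 1 TB = 1000 GB.
monthly_tb_low = total_low * 30 / 1000
monthly_tb_high = total_high * 30 / 1000

print(f"CDR total: {cdr_total} GB/day")
print(f"CDR + web: {total_low}-{total_high} GB/day")
print(f"Per 30-day month: {monthly_tb_low}-{monthly_tb_high} TB")
```

The itemized CDR sources add up to a value inside the quoted 600-800 GB/day range.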
What is Hadoop
Framework for distributed processing of large data sets across clusters
Consists of
◦ Hadoop Distributed File System, aka HDFS (file system)
◦ Hadoop MapReduce (programming model)
Characteristics
◦ Performance shall scale linearly
◦ Compute should move to data
◦ Simple core, modular and extensible
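The MapReduce programming model can be illustrated in a few lines of plain Python. This is a single-process sketch of the map, shuffle and reduce phases, not Hadoop's API; the record layout (`type,subscriber,size`) and all function names are hypothetical.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical record layout: "type,subscriber,size".
def cdr_mapper(line):
    record_type, _subscriber, size = line.split(",")
    yield record_type, int(size)

def sum_reducer(_key, values):
    return sum(values)

sample_cdrs = ["voice,alice,120", "sms,bob,1", "voice,carol,60", "sms,alice,1"]
totals = reduce_phase(shuffle(map_phase(sample_cdrs, cdr_mapper)), sum_reducer)
print(totals)
```

On a real cluster the map and reduce functions run in parallel on the nodes that hold the data blocks, which is what "compute should move to data" refers to.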
ETL, aka Extract, Transform & Load
Current bottlenecks
◦ Data resides on multiple nodes, zones and VM instances, with no elegant, reliable and efficient way of extracting it
◦ Loading terabytes of data into a database is slow
◦ Parallel computing is not possible in conventional BI ETL
◦ User profile and application data reside in a database that can scale only vertically
ETL the Hadoop Way
Structured data
sqoop import --connect jdbc:mysql://db.example.com/website --table USERS --as-sequencefile
Unstructured data
Flume
A distributed data collection server
◦ Scalable
◦ Configurable
◦ Extensible
◦ Manageable
Built around the concept of flows
◦ A single flow corresponds to a type of data source
◦ Supports compression, batching and reliability setups per flow
Data comes in through a source
◦ Optionally processed by one or more decorators
◦ And transmitted out via a sink
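The source, decorator and sink idea can be sketched as a chain of Python generators. This is a conceptual illustration only and does not use Flume's configuration language or API; `list_source`, `batch`, `compress` and `run_flow` are hypothetical names.

```python
import gzip

def list_source(lines):
    """Source: emits one event (bytes) per input line."""
    for line in lines:
        yield line.encode("utf-8")

def batch(size):
    """Decorator factory: joins up to `size` events into one larger event."""
    def decorate(events):
        buffer = []
        for event in events:
            buffer.append(event)
            if len(buffer) == size:
                yield b"\n".join(buffer)
                buffer = []
        if buffer:  # flush the final partial batch
            yield b"\n".join(buffer)
    return decorate

def compress(events):
    """Decorator: gzip-compresses each event."""
    for event in events:
        yield gzip.compress(event)

def run_flow(source, decorators, sink):
    """Wire source -> decorators (in order) -> sink."""
    stream = iter(source)
    for decorator in decorators:
        stream = decorator(stream)
    for event in stream:
        sink(event)

collected = []  # stands in for a sink such as a file or HDFS writer
run_flow(list_source(["cdr-1", "cdr-2", "cdr-3"]), [batch(2), compress], collected.append)
print(len(collected), gzip.decompress(collected[0]))
```

Because each decorator only wraps a stream of events, batching and compression can be enabled per flow without changing the source or the sink.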
Hadoop Storage system
Pig
MapReduce is very powerful, but:
◦ It requires a Java programmer
◦ The user has to re-invent common functionality (join, filter, etc.)
Execution engine atop Hadoop
Pig provides a higher-level language, Pig Latin
Opens the system to non-Java programmers
Provides common operations like join, group, filter, sort
Pig usage
Web log processing
Data processing for web search platforms
Ad hoc queries across large data sets
Rapid prototyping of algorithms for processing large data sets
Pig runs on the local machine and the job gets executed in the Hadoop cluster (the -x local flag below runs the same script in local mode, without a cluster)
$ cd /usr/share/cloudera/pig/
$ bin/pig -x local
grunt> log = LOAD 'excite-small.log' AS (user, timestamp, query);
grunt> grpd = GROUP log BY user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> STORE cntd INTO 'output';
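For readers unfamiliar with Pig Latin, the script on this slide groups the log by user and counts the records in each group. A rough plain-Python equivalent is sketched below; it assumes tab-separated fields (Pig's default delimiter for LOAD), and the sample rows are made up for illustration.

```python
from collections import Counter

def count_queries_per_user(lines):
    """Equivalent of GROUP log BY user followed by COUNT(log) per group."""
    counts = Counter()
    for line in lines:
        # Fields: user, timestamp, query (tab-separated, an assumption).
        user, _timestamp, _query = line.rstrip("\n").split("\t", 2)
        counts[user] += 1
    return counts

# Made-up sample rows in the assumed layout.
sample_log = [
    "u1\t970916105432\tyahoo chat",
    "u2\t970916001949\tminnesota weather",
    "u1\t970916105555\tweather",
]
per_user = count_queries_per_user(sample_log)
for user, count in sorted(per_user.items()):
    print(user, count)
```

The Pig version expresses the same logic in four statements and leaves parallel execution to the framework.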
Hive
System for querying and managing structured data
Built on top of Hadoop
Uses MapReduce for execution
SQL-like syntax; supports
◦ FROM-clause subquery
◦ ANSI join (equi-join only)
◦ Multi-table insert
◦ Multi group-by
◦ Sampling
◦ Object traversal
Engagement
◦ Summarization
◦ Ad hoc analysis
◦ Spam detection
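Two of the listed constructs, a FROM-clause subquery and an equi-join, can be tried out with Python's built-in sqlite3 module. SQLite only stands in for Hive here so the query is runnable without a cluster; the tables, columns and values are hypothetical, and Hive-specific features such as multi-table insert are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, plan TEXT);
CREATE TABLE data_usage (user_id INTEGER, mb INTEGER);
INSERT INTO users VALUES (1, 'prepaid'), (2, 'postpaid');
INSERT INTO data_usage VALUES (1, 10), (1, 30), (2, 5);
""")

# Subquery in the FROM clause (summarization per user),
# then an equi-join back to the users table.
rows = conn.execute("""
SELECT u.plan, t.total_mb
FROM (SELECT user_id, SUM(mb) AS total_mb
      FROM data_usage
      GROUP BY user_id) t
JOIN users u ON (u.user_id = t.user_id)
ORDER BY u.plan
""").fetchall()

for plan, total_mb in rows:
    print(plan, total_mb)
conn.close()
```

In Hive, a query of this shape would be compiled into MapReduce jobs rather than executed by a local engine.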
Hive: component
Feature                          Hive               Pig
Language                         SQL-like           Pig Latin
Schemas/Types                    Yes (explicit)     Yes (implicit)
Partitions                       Yes                No
Server                           Optional (Thrift)  No
User Defined Functions           Yes                Yes
Custom Serializer/Deserializer   Yes                Yes
DFS Direct Access                Yes (implicit)     Yes (explicit)
Join/Order/Sort                  Yes                Yes
Shell                            Yes                Yes
Streaming                        Yes                No
Web Interface                    Yes                No
JDBC/ODBC                        Yes (limited)      No
Thank you