
www.huawei.com

Security Level:

HUAWEI TECHNOLOGIES CO., LTD.

Challenges of Big Data Platform

Contents

• Categories of data in carrier network
• Network insight
• Customer behavior insight
• Society activity insight
• Challenges

Five Categories of Data

Enterprise Management
E-Learning, ERP, HR, Account, CMS
Structured (table); Unstructured (graphics, text, video)

Network Element
SDR, CDR, CHR, MR, NE Log, Counter
Semi-structured (signaling call records); Unstructured (time series data)

BSS
Billing, Order, User Profile, CRM, MKT, report
Structured (table)

OSS
NE Parameter, NE Config, Alert, Perf, NE Log
Structured (table); Unstructured (time series data)

VAS
Order, Usage, Service, Content
Structured (table, point sets); Semi-structured (column cluster); Unstructured (graphics, text, video, time-series data)

ISP GIS

100TB~10TB 10PB / Year, 1~3 years accumulation
100GB~10TB xxTB / Year
1TB~100TB xxGB / Year
100TB~10PB 100TB / Year

Generated manually; generated by machine; generated manually; generated by machine; generated manually

Business Domain

Volume

Source

Characteristics

NodeB RNC SGSN GGSN/DPI

Probe or NE Integration

OSS BSS VAS

Enterprise management domain

ERP HRM

SCM FRM MRP

Evolutions of data analytic business in big data era

Data Volume/Flow, Velocity, Data Variety
Columns: Data set > 100TB; Data flow rate; Accumulation rate (> 60% scenarios); Requirements on scale-out; Data format and sources

Statistics, offline, isolated

Operation Report
  Data set: statistics data
  Data flow rate: offline
  Accumulation rate: statistic scenario, low accumulation rate
  Scale-out: no
  Data format and sources: CRM, Billing; structured

Billing Verification
  Data set: <100T
  Data flow rate: offline
  Accumulation rate: fixed
  Scale-out: no
  Data format and sources: Billing; structured

Large volume, real-time, convergence of various data types

Network optimization
  Data set: network equipment data, 10PB
  Scale-out: elastic data processing cluster of over 100 servers, handling 1PB of data
  Data format and sources: data from NEs, such as RAN, PS, etc.

Customer experience
  Data set: network data, 10PB
  Data flow rate: ~200Gbps
  Accumulation rate: archive 1 year's data
  Scale-out: elastic data processing cluster of over 100 servers, handling 1PB of data
  Data format and sources: network signaling, xDR, traffic statistics, NE configuration data; semi-structured data makes up the majority

Precise marketing
  Data set: customer profile, 100GB~300GB
  Data flow rate: ~100,000 packages/s
  Accumulation rate: fixed volume
  Scale-out: in-memory computing
  Data format and sources: CRM, billing, xDR; structured and semi-structured data

[Figure: operational analytic system (operation reports/KPI reports, statistics). Labels: order, Performance, NE data, OSS, BSS, VAS Data, HR/FRM/SRM Enterprise management, Stats of network management performance, HR and financial reports, NPM/SQM, CEM, Network schedule (statistics), Alerts, CRM/Billing, AD promotion, Offer design, Business, Indicator, Evolution]

Past: the typical analytic business was operation analysis, based on statistics and on offline, isolated data.
Nowadays: new businesses such as network optimization and customer experience involve large volumes, real-time processing and many kinds of data types.

Data evolutions driven by carrier business

Three categories of Big Data business (Business Evolution)

Network Insight
• Analytics based on network data, combined with user data, to adjust the network layout
• Focus on network status (location, equipment workload) to adjust the network dynamically

Customer Insight
• Analytics based on user data, combined with network equipment data, to recognize the characteristics of customer behavior
• Understand who is using the network and which services they consume, in order to optimize the business

Society Insight
• Analytics based on the laws behind the data, to dig out the value of the data
• Use these laws to guide carriers in developing new, valuable business

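Presenter notes:
Traffic operation: analyze BSS-domain service revenue and user consumption data, network KPI data from the network elements, and service data and terminal characteristic data from the business analysis system, in order to achieve dynamic and precise allocation of network resources, tiered management of service traffic, identifiable users, distinguishable services, optimizable traffic, a manageable network and flexible charging.
Experience operation: use network probe data and user behavior data to achieve user insight and capability exposure, turning traffic into services and experience into gold.
Value operation: analyze tiered user value, reconstruct and mine user value, identify users' basic and value-added needs, retain users with basic service needs, and attract and cultivate users with high-value needs.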

Categories and characteristics of carrier big data business

Customer Insight Network Insight Society Insight

Capability

Business

Data representation and query

Data storage and integration

ETL

Data UP

Complaints

User account

User consuming

TS

Summary Data

MR Log xDR

DPI Dial test

Traffic test

NE data Operational data

order

Account

CRM CBS IPCC

VAS Network

Marketing

Achieved data VAS and External data

LBS Internet

VAS usage User profile

xDR Traffic statistics

Log

Ad-hoc query
Query is not complex
Large volume, 10PB level, low cost
Raw data
High-performance loading
Real-time response
High concurrency
Multi-dimension
Complex query
Low data volume
Summarized data
Moderate volume
Mixed raw data and summarized data
Complex data model, real-time update
Data visualization, rich and complex models
Complex data mining algorithms, needing guidance from data scientists and industry experts
Data volume varies across domains, 10PB level on average; requires low cost
Cross-domain data integration
Real time, high concurrency
High performance, low cost
Complex models and algorithms
Complex query

Business requirements on Network Insight: requirements and data processing procedure

For a carrier network to serve 40M users, there are several challenges:
• Volume: 120T -> 5.6P
• Integration: 33 nodes -> 6 nodes
• Query response time: 100s -> 15s
• Multi-dimension analytics

20M users, 25Gbps, 60 days' raw data, 120TB
Target (40M users): 200Gbps, 1 year's raw data, 5.6PB

Data summarization and storage

① Archive and query raw data
• 1 year's data, 5.6P
• Compression rate: 10:1
• Support a few ad-hoc queries

② Raw data summarization
• Feeding rate: 90,000 rows/s
• Ensure stable query performance

③ Statistics/analysis libs
• Support complex queries involving 10 tables
• 20 concurrent reporting queries, responding within 15 seconds

Data analytics and processing

④ Multi-dimension analytics
• Multi-dimension: 14 dimensions
• General analytics: combinations of 5 to 9 dimensions of SDR
• BKPI: combinations of 10 to 14 dimensions
• Second-level response time on 1.4 billion rows
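To get a feel for why item ④ is demanding, the sketch below counts how many distinct dimension subsets the stated ranges allow. It assumes that any of the 14 dimensions may be freely combined, which the slide does not state; the function and variable names are illustrative only.

```python
from math import comb

TOTAL_DIMENSIONS = 14  # from the slide: 14 analysis dimensions


def count_combinations(total: int, smallest: int, largest: int) -> int:
    """Count the dimension subsets whose size lies in smallest..largest."""
    return sum(comb(total, k) for k in range(smallest, largest + 1))


# Assumption: every subset of dimensions is a valid analysis view.
general = count_combinations(TOTAL_DIMENSIONS, 5, 9)   # SDR general analytics
bkpi = count_combinations(TOTAL_DIMENSIONS, 10, 14)    # BKPI analytics

print(general, bkpi)  # 13442 1471
```

Under that assumption there are far too many views to precompute one by one, which is consistent with the requirement for second-level responses directly on 1.4 billion rows.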

[Figure: data processing procedure. Labels: Data ingress (PS, CS, NMS, EMS), Data preprocessing, Data management, Archive, DW, Summarize (② ③), Data analytics and processing, Multi-dimension analytics, Data representation]

60 days, 120T -> 1 year, 5.6P; 140k records/s -> 354k records/s
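The growth figures above can be turned into per-day and post-compression numbers with simple arithmetic. The sketch below is only a back-of-the-envelope check; it assumes decimal units (1 PB = 1000 TB) and a 365-day year, neither of which the slides specify.

```python
# Figures taken from the slides; unit conventions are assumptions.
CURRENT_RAW_TB = 120        # 60 days of raw data for 20M users
CURRENT_DAYS = 60
TARGET_RAW_TB = 5600        # 5.6 PB, assuming 1 PB = 1000 TB
TARGET_DAYS = 365           # "1 year", assumed to be 365 days
COMPRESSION_RATIO = 10      # 10:1 compression from requirement ①


def daily_volume_tb(total_tb: float, days: int) -> float:
    """Average raw data volume per day, in TB."""
    return total_tb / days


current_daily = daily_volume_tb(CURRENT_RAW_TB, CURRENT_DAYS)
target_daily = daily_volume_tb(TARGET_RAW_TB, TARGET_DAYS)
stored_tb = TARGET_RAW_TB / COMPRESSION_RATIO       # archive size after compression
record_rate_growth = 354_000 / 140_000              # records/s, target vs. current

print(f"current: {current_daily:.1f} TB/day, target: {target_daily:.1f} TB/day")
print(f"archive after compression: {stored_tb:.0f} TB")
print(f"record rate grows by a factor of {record_rate_growth:.2f}")
```

Read this way, the daily raw volume grows from about 2 TB to about 15 TB, and the one-year archive still needs roughly 560 TB after 10:1 compression.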

Business requirements on Customer Insight

Promote electronic magazines to people taking public transport

Promote Wi-Fi offers to people in coffee shops without Wi-Fi service

Promote cosmetics vouchers to women in shopping markets

8 AM: go to office

Working days, weekends, holidays, vacations

Big Data Platform: get the subscriber's location; based on behavior, analyze users' consumption characteristics, favorite content and offers

Precise AD promotion based on user behavior information and refined event content requirements from suppliers

Business requirements on Customer Insight

Requirements and data processing procedure

Distributed/Distributed DBMS query engine
Characteristic profile
ingress

Distributed platform: Distributed database, Distributed file system, Distributed computation, Hardware
Infrastructure (data mining, analysis): Statistics analysis, classification, aggregation, predicates, association
Service capabilities (information archive, process): Content classification, retrieve, Location service, visualization, Text processing, Graphic service
Application: Dynamic policy, Item inquiry, Network analysis, Performance assessment, Traffic analysis, Finance analysis
Customer insight, Marketing management, ……

Query:
• Point queries and analytic queries from RTD
• Exploratory queries such as customer segmentation require full table scans and multi-table joins
• Queries on 1024 predefined KPIs
• Tags and labeling, 500+ indicators, 50+ graphic computations

Data mining:
• Customized models (user modeling): user/item/content/properties/similarity, Min Hash (CF)
• Behavior targeting: customer profiling based on behavior and values
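The data mining bullets name Min Hash for similarity-based collaborative filtering without giving details. The following is a minimal sketch of the general MinHash technique, estimating the Jaccard similarity between the sets of items two users consumed; it is not taken from the slides, and the item names, the SHA-1 based hash family and the signature length of 128 are all assumptions made for illustration.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """One member of a seeded hash family, built from SHA-1 (an assumption)."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: set[str], num_hashes: int = 128) -> list[int]:
    """For each hash function keep the minimum hash value over the set."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical users described by the services they consumed.
user_a = {"news", "video", "music", "maps"}
user_b = {"news", "video", "music", "games"}   # true Jaccard with user_a: 3/5
user_c = {"banking", "email"}                  # shares nothing with user_a

sig_a, sig_b, sig_c = (minhash_signature(u) for u in (user_a, user_b, user_c))
print(estimate_jaccard(sig_a, sig_b), estimate_jaccard(sig_a, sig_c))
```

The appeal for a platform of this scale is that signatures have a fixed length regardless of how many items a user touched, so candidate neighbors can be compared without joining the full usage tables.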

Pain point 1: Poor OLAP performance: minute-level response time on several hundred GB of data. The OLAP system is built on a ROLAP solution, such as Cognos, DB2, etc.

Pain point 2: Poor DW performance and high cost: raw data storage and computation consume more than 70% of the DW's capacity, reaching the maximum volume and capability of a traditional database.

Pain point 3: High software/hardware cost: the solution is composed of high-end servers, disk arrays and a commercial DBMS, with expensive licenses and hardware.

Two general requirements on BI technologies: a high-performance, low-cost DW, and analysis and mining algorithms based on user behaviors and values.

Business requirements on society Insight

Traffic application: congestion information can be derived from telco signaling data

Population analytics: traffic planning, city resource distribution, abnormal events

Focus on anonymous wireless users and location-based applications for government, industry and enterprise

Data Sources: MR Data (Time, IMSI, Longitude, Latitude, RNCID, CellID)
Data Preprocessing: Data Cleaning, Data Integration, Data Exploration, Data Selection; Map preprocessing (district segmentation: extract district coordinates; road segmentation: extract road coordinates)
Data Analysis: Population Density, OD Table, OD transportation Mode Classification, Traffic Congestion Detection
Visualization: UniBI Reporting Tools; OD Graph & Matrix, Population Density, OD transport classification, Traffic congestion detection
HDFS + Map/Reduce; HDFS + HQL

Business requirements on society Insight

The goal is to dig out the laws of group activity by applying data mining algorithms to maps and dimensional data. The core part is the data analysis layer.
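As an illustration of the data analysis layer, the sketch below derives two of the named outputs, population density and an OD (origin-destination) table, from MR records with the fields listed on the slide. It is a deliberately simplified, single-machine sketch: the time encoding (seconds since midnight), the use of the first and last observed cell as origin and destination, and all sample values are assumptions, and the slides indicate that the real processing runs on HDFS with Map/Reduce.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class MRRecord(NamedTuple):
    time: int          # seconds since midnight (assumed encoding)
    imsi: str          # anonymized subscriber identifier
    longitude: float
    latitude: float
    rnc_id: int
    cell_id: int


def population_by_cell_hour(records: list[MRRecord]) -> dict[tuple[int, int], int]:
    """Count distinct subscribers seen in each (cell, hour) bucket."""
    seen: dict[tuple[int, int], set[str]] = defaultdict(set)
    for r in records:
        seen[(r.cell_id, r.time // 3600)].add(r.imsi)
    return {key: len(imsis) for key, imsis in seen.items()}


def od_counts(records: list[MRRecord]) -> Counter:
    """Count (origin cell, destination cell) pairs, one per moving subscriber.

    Simplification: origin is the first cell and destination the last cell
    observed for a subscriber; subscribers who never change cell are skipped.
    """
    first_last: dict[str, tuple[int, int]] = {}
    for r in sorted(records, key=lambda rec: rec.time):
        origin, _ = first_last.get(r.imsi, (r.cell_id, r.cell_id))
        first_last[r.imsi] = (origin, r.cell_id)
    return Counter(pair for pair in first_last.values() if pair[0] != pair[1])


# Made-up sample: A and B travel from cell 101 to cell 202, C stays in 202.
sample = [
    MRRecord(8 * 3600, "A", 0.0, 0.0, 1, 101),
    MRRecord(8 * 3600 + 600, "B", 0.0, 0.0, 1, 101),
    MRRecord(9 * 3600, "A", 0.1, 0.1, 1, 202),
    MRRecord(9 * 3600 + 60, "B", 0.1, 0.1, 1, 202),
    MRRecord(9 * 3600 + 120, "C", 0.1, 0.1, 1, 202),
]
density = population_by_cell_hour(sample)
od = od_counts(sample)
print(density)
print(od)
```

Both functions are plain group-and-count operations keyed by cell or by subscriber, which is why they map naturally onto the Map/Reduce and HQL tooling named in the pipeline.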

Summary of big data business requirements

Requirements

Data storage and computation
• MPP DB: support 10PB-level volume; linear scalability to 100+ nodes; respond to queries on 0.1 billion rows within 1 minute; 10:1 compression ratio
• Real-time analytics in-memory DB: 100TB, columnar, wide tables with 2000-5000 columns, 30,000 updates/s, ad-hoc queries answered within 3 seconds, to support real-time business policy adjustment and real-time KPI calculation
• Streaming processing: 1 million events per second; 1 microsecond latency for each event

Data analytics
• MOLAP: support SQL and MDX; <5s response time in 80~90% of scenarios; 1s response latency on TB-scale data with a hundred dimensions
• Real-time dashboard
• Data mining: high accuracy, various algorithms, online data mining, quick response
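The headline numbers in these requirements can be restated as throughput budgets. The sketch below is rough arithmetic only; it assumes that an MPP query is a full scan spread evenly over exactly 100 nodes (the slide says "100+") and applies Little's law to the streaming figures, neither of which is stated in the source.

```python
# MPP DB: queries on 0.1 billion rows answered within 1 minute.
MPP_ROWS = 100_000_000
MPP_SECONDS = 60
MPP_NODES = 100  # assumption: lower bound of "100+ nodes", evenly loaded

rows_per_second = MPP_ROWS / MPP_SECONDS
rows_per_node_per_second = rows_per_second / MPP_NODES

# In-memory DB: 30,000 updates/s sustained over a full day.
UPDATES_PER_SECOND = 30_000
updates_per_day = UPDATES_PER_SECOND * 86_400

# Streaming: 1 million events/s at 1 microsecond latency per event.
# Little's law: average events in flight = arrival rate * time in system.
STREAM_EVENTS_PER_SECOND = 1_000_000
STREAM_LATENCY_MICROSECONDS = 1
events_in_flight = STREAM_EVENTS_PER_SECOND * STREAM_LATENCY_MICROSECONDS / 1_000_000

print(f"MPP scan rate: {rows_per_second:,.0f} rows/s overall, "
      f"{rows_per_node_per_second:,.0f} rows/s per node")
print(f"in-memory DB: {updates_per_day:,} updates/day")
print(f"streaming: {events_in_flight} event(s) in flight on average")
```

Under these assumptions the streaming target leaves room for only about one event in flight at a time, which suggests the 1 microsecond figure is by far the tightest of the listed requirements.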

• Huawei product lines are attempting to build new big data businesses.

• Huawei product lines have various requirements on big data components, mainly MPP DB, in-memory analytics DB, streaming computation, MOLAP, parallel computation, and analytics & mining algorithms.

Thank you www.huawei.com

Copyright©2011 Huawei Technologies Co., Ltd. All Rights Reserved. The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.