Changes of Huawei Big Data Platform
Contents
Categories of data in carrier network
Network insight
Customer behavior insight
Society activity insight
Challenges
Five Categories of Data

Business Domain: Enterprise Management
Data: E-Learning, ERP, HR, Account, CMS
Characteristics: Structured (table); Unstructured (graphics, text, video)

Business Domain: Network Element
Data: SDR, CDR, CHR, MR, NE Log, Counter
Characteristics: Semi-structured (signaling call records); Unstructured (time series data)

Business Domain: BSS
Data: Billing, Order, User Profile, CRM, MKT report
Characteristics: Structured (table)

Business Domain: OSS
Data: NE Parameter, NE Config, Alert, Perf, NE Log
Characteristics: Structured (table); Unstructured (time series data)

Business Domain: VAS
Data: Order, Usage, Service Content
Characteristics: Structured (table, point sets); Semi-structured (column cluster); Unstructured (graphics, text, video, time-series data)

ISP GIS

Volume: 100TB~10TB, 10PB / Year, 1~3 years accumulation; 100GB~10TB, xxTB / Year; 1TB~100TB, xxGB / Year; 100TB~10PB, 100TB / Year
Source: Generated manually; Generated by machine; Generated manually; Generated by machine; Generated manually

NodeB, RNC, SGSN, GGSN/DPI: Probe or NE Integration
OSS, BSS, VAS
Enterprise management domain: ERP, HRM, SCM, FRM, MRP
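Evolutions of data analytic business in big data era

Dimensions: Data Volume/Flow, Velocity, Data Variety
Fields per business: data set (>100TB), data flow rate, accumulation rate (>60% scenarios), requirements on scale-out, data format and sources

Statistics, offline, isolated
• Operation Report
  Data set: statistics data
  Data flow rate: offline
  Accumulation rate: statistic scenario, low accumulation rate
  Requirements on scale-out: no
  Data format and sources: CRM, Billing; structured
• Billing Verification
  Data set: <100T
  Data flow rate: offline
  Accumulation rate: fixed
  Requirements on scale-out: no
  Data format and sources: Billing; structured

Large volume, real-time, convergence of various data types
• Network optimization
  Data set: network equipment data, 10PB
  Requirements on scale-out: elastic data processing cluster of over 100 servers, handling 1PB of data
  Data format and sources: data from NEs, such as RAN, PS, etc.
• Customer experience
  Data set: network data, 10PB
  Data flow rate: ~200Gbps
  Accumulation rate: archive 1 year's data
  Requirements on scale-out: elastic data processing cluster of over 100 servers, handling 1PB of data
  Data format and sources: network signaling, xDR, traffic statistics, NE configuration data; semi-structured data makes up the majority
• Precise marketing
  Data set: customer profile, 100GB~300GB
  Data flow rate: ~100,000 packages/s
  Accumulation rate: fixed volume
  Requirements on scale-out: in-memory computing
  Data format and sources: CRM, billing, xDR; structured data, semi-structured data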
[Figure: Business / Indicator / Evolution. Data sources: OSS (performance, NE data, alerts), BSS (order, CRM/Billing), VAS data, enterprise management (HR/FRM/SRM). Analytic businesses: operational analytic system (operation reports/KPI reports, statistics), stats of network management performance, HR and financial reports, network schedule (statistics), NPM/SQM, CEM, AD promotion, offer design.]
Past: the typical analytic business was operation analysis, based on statistical, offline, isolated data. Nowadays: new businesses, such as network optimization and customer experience, involve large volumes of real-time data of various types.
Data evolutions driven by carrier business
Three categories of Big Data business

Business Evolution

Network Insight
• Analytics based on network data, combined with user data, to adjust the network layout
• Focus on network status (location, equipment workload) to adjust the network dynamically

Customer Insight
• Analytics based on user data, combined with network equipment data, to recognize the characteristics of customer behavior
• Understand who is using the network and which services they consume, in order to optimize the business

Society Insight
• Analytics based on the laws behind the data, to dig out the value of the data
• Based on these laws, guide the carrier in developing valuable new business
Categories and characteristics of carrier big data business

Columns: Customer Insight, Network Insight, Society Insight
Rows: Capability, Business, Data representation and query, Data storage and integration, ETL, Data

Data items (in slide order):
• UP, Complaints, User account, User consuming, TS
• Summary Data
• MR, Log, xDR, DPI, Dial test, Traffic test, NE data, Operational data
• order, Account, CRM, CBS, IPCC, VAS, Network, Marketing
• Achieved data, VAS and External data
• LBS, Internet, VAS usage, User profile, xDR, Traffic statistics, Log
Characteristics (in slide order):
• Ad-hoc query; query is not complex; large volume, 10PB level, low cost; raw data; high-performance loading
• Real-time response; high concurrency; multi-dimension; complex query; low data volume; summarized data
• Moderate volume; mix of raw data and summarized data; complex data model; real-time update
• Data visualization, rich and complex models; complex data mining algorithms that need guidance from data scientists and industry experts; data volume varies across domains, averaging the 10PB level, and requires low cost; cross-domain data integration
• Real time, high concurrency; high performance, low cost; complex models and algorithms; complex query
Business requirements on Network Insight
Requirements and data processing procedure
For a carrier network serving 40M users, there are several challenges:
• Volume: 120T -> 5.6P
• Integration: 33 nodes -> 6 nodes
• Query response time: 100s -> 15s
• Multi-dimension analytics
Target (40M users)

Data summarization and storage
① Archive and query raw data
• 1 year's data, 5.6P
• Compression rate: 10:1
• Support a few ad-hoc queries
② Raw data summarization
• Feeding rate: 90,000 rows/s
• Ensure stable query performance
③ Statistics/analysis libs
• Support complex queries involving 10 tables
• 20 concurrent reporting queries, responding within 15 seconds

Data analytics and processing
④ Multi-dimension analytics
• Multi-dimension: 14 dimensions
• General analytics: combinations of 5 to 9 dimensions of SDR
• BKPI: combinations of 10 to 14 dimensions in BKPI
• Second-level response time on 1.4 billion rows
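To give a feel for the multi-dimension requirement, the sketch below counts how many distinct dimension subsets fall into the two ranges quoted above (5 to 9 and 10 to 14 out of 14 dimensions). The function name is hypothetical, and treating each unordered subset of dimensions as one possible grouping is an assumption made for illustration, not something the slides state.

```python
from math import comb


def count_dimension_combinations(total_dims: int, min_dims: int, max_dims: int) -> int:
    """Number of unordered subsets of `total_dims` dimensions whose size
    lies between `min_dims` and `max_dims` (inclusive)."""
    return sum(comb(total_dims, k) for k in range(min_dims, max_dims + 1))


if __name__ == "__main__":
    # General analytics: any 5 to 9 of the 14 dimensions.
    print("5-9 of 14:", count_dimension_combinations(14, 5, 9))
    # BKPI: any 10 to 14 of the 14 dimensions.
    print("10-14 of 14:", count_dimension_combinations(14, 10, 14))
```

Under this assumption there are 13,442 possible groupings in the 5 to 9 range and 1,471 in the 10 to 14 range, which suggests why precomputing every grouping over 1.4 billion rows is costly.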
20M users, 25Gbps, 60 days' raw data, 120TB -> 40M users, 200Gbps, 1 year's raw data, 5.6PB
[Figure: data processing procedure. Data ingress from PS, CS, NMS and EMS; data preprocessing; data management (archive ①, summarize ②, DW ③); data analytics and processing (multi-dimension analytics ④); data representation. Scale: 60 days, 120T, 140k records/s -> 1 year, 5.6P, 354k records/s.]
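The volume figures can be sanity-checked with a little arithmetic. The sketch below is a rough estimate only: it assumes decimal units (1TB = 10^12 bytes), a 365-day year, and that the 140k and 354k records/s rates pair with the 60-day/120T and 1-year/5.6P cases respectively, which is inferred from the order in which they appear. Function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
TB = 10 ** 12  # decimal terabyte (assumption)
PB = 10 ** 15  # decimal petabyte (assumption)


def implied_bytes_per_record(volume_bytes: float, records_per_second: float, days: int) -> float:
    """Average record size implied by a total volume, an ingest rate and a retention period."""
    total_records = records_per_second * days * SECONDS_PER_DAY
    return volume_bytes / total_records


def compressed_size(volume_bytes: float, ratio: float) -> float:
    """Stored size after applying a compression ratio such as 10 (for 10:1)."""
    return volume_bytes / ratio


if __name__ == "__main__":
    print("current bytes/record:", implied_bytes_per_record(120 * TB, 140_000, 60))
    print("target bytes/record:", implied_bytes_per_record(5.6 * PB, 354_000, 365))
    print("target stored size (PB) at 10:1:", compressed_size(5.6 * PB, 10) / PB)
```

With 10:1 compression, one year of raw data at the target scale would occupy roughly 0.56PB of storage under these assumptions.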
Business requirements on Customer Insight
Promote electronic magazines to people taking public transport
Promote Wi-Fi offers to people in coffee shops without Wi-Fi service
Promote cosmetics vouchers to women in shopping malls
8 AM: go to office
Working days, weekends, holidays, vacations
Big Data Platform: get the subscriber's location and, based on behaviors, analyze users' consumption characteristics and favorite content and offers
Precise AD promotion based on user behavior information and refined event content requirements from suppliers
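The promotion examples above can be read as simple context rules. The following is a minimal sketch, assuming a hypothetical `SubscriberContext` record whose fields and location labels are invented for illustration; a real platform would derive them from location and behavior data rather than receive them ready-made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubscriberContext:
    # All fields and label values are hypothetical placeholders.
    location_type: str              # e.g. "public_transport", "coffee_shop", "shopping_mall"
    gender: Optional[str] = None    # e.g. "female", "male", or None if unknown
    venue_has_wifi: bool = True


def pick_promotion(ctx: SubscriberContext) -> Optional[str]:
    """Return the promotion matching the subscriber's context, or None."""
    if ctx.location_type == "public_transport":
        return "electronic magazine"
    if ctx.location_type == "coffee_shop" and not ctx.venue_has_wifi:
        return "Wi-Fi offer"
    if ctx.location_type == "shopping_mall" and ctx.gender == "female":
        return "cosmetics voucher"
    return None


if __name__ == "__main__":
    print(pick_promotion(SubscriberContext("coffee_shop", venue_has_wifi=False)))
```

Hard-coded rules like these only illustrate the idea; the slides point toward profiles learned from behavior rather than fixed conditions.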
Business requirements on Customer Insight
Requirements and data processing procedure
[Figure: platform layers. Hardware. Distributed platform: distributed database, distributed file system, distributed computation, distributed DBMS query engine. Infrastructure (data mining, analysis): statistics analysis, classification, aggregation, predicates, association. Service capabilities (information archive, process): content classification, retrieval, location service, visualization, text processing, graphic service. Application: dynamic policy, item inquiry, network analysis, performance assessment, traffic analysis, finance analysis, customer insight, marketing management, ... Other labels: characteristic profile, ingress.]
Query:
• Point queries and analytic queries from RTD
• Exploratory queries, such as customer segmentation, require full table scans and multi-table joins
• Queries on 1024 predefined KPIs
• Tags and labeling: 500+ indicators, 50+ graphic computations
Data mining:
• Customized models (user modeling): user/item/content/properties/similarity, MinHash (CF)
• Behavior targeting: customer profiling based on behavior and values
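The data mining list mentions MinHash for collaborative filtering. As a minimal sketch of that technique, the code below estimates the Jaccard similarity between two users' item sets from MinHash signatures. The function names, the use of BLAKE2b as the seeded hash family, and the sample item names are assumptions for illustration; the slides do not describe the actual implementation.

```python
import hashlib
from typing import Iterable, List


def _seeded_hash(seed: int, item: str) -> int:
    """64-bit hash of `item` under hash function number `seed`."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items: Iterable[str], num_hashes: int = 256) -> List[int]:
    """For each hash function, keep the minimum hash over the set.
    `items` must be non-empty, otherwise min() raises ValueError."""
    item_list = list(items)
    return [min(_seeded_hash(seed, it) for it in item_list) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of signature positions that agree, which estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Exact Jaccard similarity, for comparison with the estimate."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    user_a = {"news", "music", "video", "games"}
    user_b = {"news", "music", "video", "maps"}
    est = estimate_jaccard(minhash_signature(user_a), minhash_signature(user_b))
    print("estimated:", est, "exact:", exact_jaccard(user_a, user_b))
```

The appeal for collaborative filtering is that signatures have a fixed length regardless of how many items a user has touched, so similar users can be compared without scanning their full histories.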
Pain point 1: Poor OLAP performance: minute-level response time on several hundred GB of data. The OLAP system is built on a ROLAP solution, such as Cognos, DB2, etc.
Pain point 2: Poor DW performance and high cost: raw data storage and computation take up more than 70% of the DW's capacity, reaching the maximum volume and capability of a traditional database.
Pain point 3: High software and hardware cost: the solution is composed of high-end servers, disk arrays and a commercial DBMS, with expensive licenses and hardware.
Two general requirements on BI technologies: a high-performance, low-cost DW, and analysis and mining algorithms based on user behaviors and values.
Business requirements on Society Insight
Traffic application: congestion information can be derived from telco signaling data
Population analytics: traffic planning, distribution of city resources, abnormal events
Focus on anonymous wireless users and location-based applications for government, industry and enterprise
[Figure: society insight processing pipeline. Data sources: MR data (Time, IMSI, Longitude, Latitude, RNCID, CellID). Data preprocessing: data cleaning, data integration, data exploration, data selection (HDFS + Map/Reduce, HDFS + HQL); map preprocessing: district segmentation (extract district coordinates), road segmentation (extract road coordinates). Data analysis: OD table, OD transportation mode classification, population density, traffic congestion detection. Visualization: OD graph and matrix, population density, OD transport classification, traffic congestion detection (UniBI reporting tools).]
Business requirements on Society Insight: uncover the laws of group activity by applying data mining algorithms to maps and dimensional data. The core part is the data analysis layer.
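As a rough illustration of what the data analysis layer might compute from MR data, the sketch below derives a per-cell population count and a simple origin-destination (OD) table. It runs in memory, whereas the slides name HDFS with Map/Reduce or HQL. The record type reuses the MR field names from the slides, while the time unit, the "first cell to last cell" definition of a trip, and all function names are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple

Cell = Tuple[int, int]  # (RNCID, CellID)


class MRRecord(NamedTuple):
    time: int          # seconds since some epoch (assumed unit)
    imsi: str          # subscriber identifier, anonymized upstream (assumption)
    longitude: float
    latitude: float
    rnc_id: int
    cell_id: int


def population_density(records: List[MRRecord]) -> Dict[Cell, int]:
    """Count distinct subscribers observed in each cell."""
    seen = defaultdict(set)
    for r in records:
        seen[(r.rnc_id, r.cell_id)].add(r.imsi)
    return {cell: len(imsis) for cell, imsis in seen.items()}


def od_table(records: List[MRRecord]) -> Counter:
    """Count subscribers moving from their first observed cell to their last."""
    first: Dict[str, Cell] = {}
    last: Dict[str, Cell] = {}
    for r in sorted(records, key=lambda rec: rec.time):
        cell = (r.rnc_id, r.cell_id)
        first.setdefault(r.imsi, cell)  # keep the earliest cell per subscriber
        last[r.imsi] = cell             # overwrite until the latest cell remains
    trips: Counter = Counter()
    for imsi, origin in first.items():
        destination = last[imsi]
        if origin != destination:       # subscribers who stayed put are not trips
            trips[(origin, destination)] += 1
    return trips


if __name__ == "__main__":
    sample = [
        MRRecord(0, "a", 0.0, 0.0, 1, 10),
        MRRecord(10, "a", 0.0, 0.0, 1, 11),
        MRRecord(5, "b", 0.0, 0.0, 1, 10),
    ]
    print(population_density(sample))
    print(od_table(sample))
```

A production version would also bucket records by time window and map cells onto the districts and roads produced by the map preprocessing step.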
Summary of big data business requirements

Requirements

Data storage and computation
• MPP DB: support 10PB-level volume; linear scalability to 100+ nodes; respond to queries on 0.1 billion rows within 1 minute; 10:1 compression ratio
• Real-time analytics in-memory DB: 100TB, columnar, wide tables with 2000-5000 columns, 30,000 updates/s, ad-hoc queries answered within 3 seconds, to support real-time business policy adjustment and real-time KPI calculation
• Streaming processing: 1 million events per second; 1 microsecond latency for each event
• MOLAP: support SQL and MDX; <5s response time in 80~90% of scenarios; 1s response latency on TB-scale data with a hundred dimensions

Data analytics
• Real-time dashboard
• Data mining: high accuracy, various algorithms, online data mining, quick response
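The targets above imply some simple throughput figures. The sketch below is back-of-envelope arithmetic only: spreading the MPP scan evenly over 100 nodes and processing stream events strictly one after another are assumptions made for illustration, and the function names are hypothetical.

```python
def required_rows_per_second(rows: int, seconds: float) -> float:
    """Scan rate needed to touch `rows` rows within `seconds`."""
    return rows / seconds


def per_node_rate(total_rate: float, nodes: int) -> float:
    """Per-node share of a rate, assuming a perfectly even spread across nodes."""
    return total_rate / nodes


def serial_events_per_second(latency_ns: int) -> float:
    """Events per second a single strictly serial pipeline sustains
    when each event takes `latency_ns` nanoseconds."""
    return 1_000_000_000 / latency_ns


if __name__ == "__main__":
    scan_rate = required_rows_per_second(100_000_000, 60)
    print("MPP scan rate (rows/s):", scan_rate)
    print("per node over 100 nodes (rows/s):", per_node_rate(scan_rate, 100))
    print("serial events/s at 1 microsecond each:", serial_events_per_second(1_000))
```

Under these assumptions the MPP target works out to about 1.67 million rows per second in total, and a 1 microsecond per-event budget is exactly what a single serial pipeline needs to reach 1 million events per second.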
• Huawei product lines are attempting to build new big data businesses.
• Huawei product lines have various requirements on big data components, mainly MPP DB, in-memory analytics DB, streaming computation, MOLAP, parallel computation, and analytics and mining algorithms.
Thank you www.huawei.com
Copyright©2011 Huawei Technologies Co., Ltd. All Rights Reserved. The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.