The Practice of Presto & Alluxio in E-Commerce Big
Data Platform
2019-06-20
Tao Huang, JD.com Big Data Platform Engineer
Contents
1. JD BDP: Introduction to the JD.com BDP architecture
2. Practice with Presto in BDP: Introduction to Presto and our practice in JD BDP
3. Presto & Alluxio Stack: Our use case of Presto & Alluxio
4. Ongoing Exploration: The features we are exploring
1. JD BDP
JD BDP
Cluster scale
• Tens of thousands of nodes
• Thousands of users
Computing ability
• Tens of PB offline data daily
• Millions of jobs daily
Storage capacity
• Hundreds of PB data
• Tens of PB daily increase
Business scale
• Tens of business units
• Hundreds of data models
BDP architecture
2. Practice with Presto in BDP
Presto Architecture
Our Work on Presto
01 Cluster Scaling
02 Job Isolation
03 ERP Authorization
04 Operation & Maintenance
Presto on YARN
YARN: unified resource management
Dynamic resource: Presto worker scaling
Configuration: configure Presto in the Web UI
Plugin: load/unload plugins
PowerServer for operation and maintenance
Plugin Manager
• export query result
• update plugin
Dynamic Configuration
• route query to cluster
• adjust resource group
ERP Authorization
• track users' queries
• security
Auto Maintenance
• dynamic auto-scale
• start/stop cluster
Intelligent Scheduler
Periodic Queries
◉ controllable data range
◉ high query frequency
◉ high data reuse rate
◉ high proportion
Unpredictable Queries
◉ controllable data range
◉ low query frequency
◉ low data reuse rate
◉ low proportion
Application Scenario
Presto Jobs in BDP
3. Presto & Alluxio Stack
Data Ecosystem with Alluxio
Presto + Alluxio = Better Together
Higher query throughput
Consistent low query latency
Eliminates network traffic
JD Contribution to Alluxio
BusinessStrategy
• New Web UI: ui-grid based sort/pagination/filter; add an input field
• Watermark Evict Strategy: the high watermark starts eviction; the low watermark stops eviction
• Cache Consistency: check at startup; check every time
• JVM Pause Monitor: monitor JVM pauses periodically; log messages and metrics
• Shell Command: cp/ls/load/rm/format
• Bugfix: deadlock; thrift add timeout time…
• Change Log Level: shell; RESTful API
• Test: SyncQueryAlluxioTools…
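The "monitor JVM pauses periodically" item is commonly implemented with a sleep-overshoot check: a thread sleeps for a fixed interval and treats any extra elapsed time as a pause. The slides do not show JD's implementation, so the following is only a language-neutral sketch of that general technique in Python; the function name `detect_pauses` and its parameters are hypothetical.

```python
import time


def detect_pauses(sleep_ms, warn_threshold_ms, iterations,
                  clock=time.monotonic, sleep=time.sleep):
    """Return the extra delay (ms) of every sleep that overshot the threshold.

    Hypothetical helper, not an Alluxio API. `clock` and `sleep` are
    injectable so the loop can be exercised without real waiting.
    """
    pauses = []
    for _ in range(iterations):
        start = clock()
        sleep(sleep_ms / 1000.0)
        elapsed_ms = (clock() - start) * 1000.0
        # Time beyond the requested sleep is attributed to a pause
        # (for a JVM this is typically a long GC).
        extra_ms = elapsed_ms - sleep_ms
        if extra_ms > warn_threshold_ms:
            pauses.append(extra_ms)  # a real monitor would log and export a metric here
    return pauses
```

In a real monitor the loop would run in a daemon thread for the lifetime of the process, and each detected pause would be written to the log and counted in a metric.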
Watermark Evict Strategy
Sync Evict Strategy, Async Evict Strategy
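The watermark rule (start evicting at the high watermark, stop at the low watermark) can be illustrated with a small sketch. This is not Alluxio's evictor code: the class `WatermarkCache`, the LRU victim order, and the byte thresholds are assumptions made for illustration, and eviction here runs synchronously inside `put`.

```python
from collections import OrderedDict


class WatermarkCache:
    """Illustrative block cache with high/low watermark eviction (hypothetical)."""

    def __init__(self, high_watermark_bytes, low_watermark_bytes):
        if not 0 <= low_watermark_bytes < high_watermark_bytes:
            raise ValueError("low watermark must be below high watermark")
        self.high = high_watermark_bytes
        self.low = low_watermark_bytes
        self.blocks = OrderedDict()  # block_id -> size, least recently added first
        self.used = 0

    def put(self, block_id, size):
        """Store a block and return the ids evicted as a result."""
        if block_id in self.blocks:
            self.used -= self.blocks.pop(block_id)
        self.blocks[block_id] = size
        self.used += size
        evicted = []
        # Eviction starts only once usage exceeds the high watermark ...
        if self.used > self.high:
            # ... and continues until usage is back at or below the low watermark.
            while self.used > self.low and self.blocks:
                victim, victim_size = self.blocks.popitem(last=False)
                self.used -= victim_size
                evicted.append(victim)
        return evicted
```

Evicting down to the low watermark in one pass frees a batch of space at once, so later writes do not trigger an eviction each time. An asynchronous variant would run the same loop in a background thread instead of inside `put`.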
Cache Consistency
Keep Alluxio & HDFS Consistent
To ensure that dirty data is not read, there are three ways to trigger a file consistency check:
• RPC API: when a client requests metadata through getFileId, getFileInfo, listStatus, etc., the Alluxio master checks file cache consistency.
• RESTful API: calling reloadMetaData triggers Alluxio to reload all metadata.
• Alluxio master startup: file cache consistency is checked while the master starts up.
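A consistency check of this kind can be sketched as a comparison between cached metadata and the under-storage (HDFS) metadata. The slides do not say which fields are compared, so using file length and modification time is an assumption, and `ConsistencyChecker`, `FileMeta`, and `ufs_stat` are hypothetical names rather than Alluxio APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class FileMeta:
    length: int
    mtime_ms: int


class ConsistencyChecker:
    """Illustrative master-side check of cached metadata against the under storage."""

    def __init__(self, ufs_stat: Callable[[str], Optional[FileMeta]]):
        self._ufs_stat = ufs_stat  # returns None when the file no longer exists
        self._cached: Dict[str, FileMeta] = {}

    def cache(self, path: str, meta: FileMeta) -> None:
        self._cached[path] = meta

    def get_file_info(self, path: str) -> Optional[FileMeta]:
        """Per-request trigger: check before serving metadata to a client."""
        self._check(path)
        return self._cached.get(path)

    def reload_all(self) -> List[str]:
        """Bulk trigger (explicit reload or startup): return the stale paths."""
        return [p for p in list(self._cached) if not self._check(p)]

    def _check(self, path: str) -> bool:
        cached = self._cached.get(path)
        actual = self._ufs_stat(path)
        if cached == actual:
            return True
        # Stale entry: drop it if the file is gone, otherwise refresh it.
        if actual is None:
            self._cached.pop(path, None)
        else:
            self._cached[path] = actual
        return False
```

Checking on every metadata request keeps reads fresh at the cost of one extra under-storage lookup per call, while the bulk path covers files that changed while nobody was asking for them.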
Presto on Alluxio
Why Presto on Alluxio?
High Performance
Consistent Low Query Latency
Eliminate Network Traffic
Others: Fault-tolerant & Pluggable
When we adopted Alluxio for Presto, we made some changes and added some useful features.
• Alluxio led to a 10x performance improvement
• Hundreds of nodes
• More than 2 years in production environment
4. Ongoing Exploration
Presto Exploration
Presto Master Load Balancing
As the amount of data grows, clusters become larger and the number of query tasks keeps increasing, so the master will become a performance bottleneck. Improving Presto to balance load on the master will be a challenge.
Thread Level Resource Isolation
Execution tasks running on the workers compete for resources, especially jobs in the test phase. Restricting execution tasks with cgroups would reduce the mutual impact among queries.
Unify Larger Clusters
Large-scale clusters help improve resource utilization. In the past year, we have reduced the number of clusters from more than 100 to 20. While ensuring query efficiency, we will further increase the cluster size to reduce the number of clusters.
Alluxio Exploration
Exploring more application scenarios
Storing MapReduce/Spark shuffle data, to reduce disk storage pressure and speed up access to shuffle data.
Porting HDFS Authentication to Alluxio
We are going to port the custom authentication on our HDFS to Alluxio.
HDFS RBF or Alluxio
We have tried HDFS router-based federation (RBF), but its performance does not meet our online requirements. We found that Alluxio also has forwarding capabilities, and we hope that it will perform better. That is what we are working on.
Thank You!