Timeline Service v.2 at Hadoop Summit SJ 2016
TRANSCRIPT
(Big Data)²
How YARN Timeline Service v.2 Unlocks 360-Degree Platform Insights at Scale
Sangjin Lee @sjlee (Twitter)
Li Lu (Hortonworks)
Vrushali Channapattan @vrushalivc (Twitter)
Outline
• Why v.2?
• Highlights
• Developing for Timeline Service v.2
• Setting up Timeline Service v.2
• Milestones
• Demo
Why v.2?
• YARN Timeline Service v.1.x
• Gained good adoption: Tez, Hive, Pig, etc.
• Keeps improving with v.1.5 APIs and storage implementation
• Still facing some fundamental challenges...
Why v.2?
• Scalability and reliability challenges
• Single instance of Timeline Server
• Storage (single local LevelDB instance)
• Usability
• Flow
• Metrics and configuration as first-class citizens
• Metrics aggregation up the entity hierarchy
Highlights (v.1 → v.2)
• Single writer/reader Timeline Server → distributed writer/collector architecture
• Single local LevelDB storage → scalable storage (HBase)
• v.1 entity model → new v.2 entity model
• No aggregation → metrics aggregation
• REST API → richer query REST API
Architecture
• Separation of writers (“collectors”) and readers
• Distributed collectors: one collector for each app
• Dedicated RM collector for RM-generated data
• Collector discovery via RM
• Pluggable storage with HBase as default storage
Distributed collectors & readers
[Diagram: on the worker node running the AM, the AM writes app metrics/events to a per-app timeline collector in the NM; NMs on worker nodes running containers write container events/metrics to the same collector; a dedicated RM timeline collector receives app/container events from the RM. All collectors write to the storage backend, and a pool of timeline readers serves user queries from that storage. The write flow and the read flow are fully separated.]
Collector discovery
[Diagram: ① the RM starts the AM container on a worker node's NM, which spins up the timeline collector; ② the NM reports the collector's location (app id => address) to the RM via node heartbeat; ③ the RM hands the address to the AM's timeline client in the allocate response.]
New entity model
• Flows and flow runs as parents of YARN application entities
• First-class configuration (key-value pairs)
• First-class metrics (single-value or time series)
• Designed to handle multi-cluster environments out of the box
What is a flow?
• A flow is a group of YARN applications that are launched as parts of a logical app
• Oozie, Scalding, Pig, etc.
• name: “frequent_visitor_stat”
• run id: 1466097809000
• version: “b9b9068”
Configuration and metrics
• Now explicit top-level attributes of entities
• Fine-grained updates and queries made possible
• “update metric A to value x”
• “query entities where config A = B”
container 1_1
metric: A = 10
metric: B = 100
config: "Foo" = "bar"
(After an update “set metric A to 50”, the same entity reads:)
container 1_1
metric: A = 50
metric: B = 100
config: "Foo" = "bar"
HBase Storage
• Scalable backend
• Row Key structure
• efficient range scans
• KeyPrefixRegionSplitPolicy
• Filter pushdown
• Coprocessors for flow aggregation (“readless” aggregation)
• Cell tags for metadata (application id, aggregation operation)
• Cell timestamps generated during put
• left-shifted with the app id added to avoid overwrites
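One way that left-shift trick can work is to multiply the wall-clock time by a large constant and mix in an app-id-derived suffix, so two collectors writing the same cell in the same millisecond do not overwrite each other. The multiplier and the use of hashCode below are our assumptions for illustration, not the exact production values:

```java
// Sketch: make put-time cell timestamps collision-free by left-shifting
// (multiplying) the wall-clock time and adding an app-id-derived suffix.
public class CellTimestampSketch {
  static final long MULTIPLIER = 1_000_000L;  // assumed for illustration

  static long cellTimestamp(long wallClockMs, String appId) {
    long suffix = Math.abs(appId.hashCode()) % MULTIPLIER;
    return wallClockMs * MULTIPLIER + suffix;
  }

  public static void main(String[] args) {
    long ts = 1466097809000L;
    long t1 = cellTimestamp(ts, "application_1466097809000_0001");
    long t2 = cellTimestamp(ts, "application_1466097809000_0002");
    // same wall-clock millisecond, distinct cell timestamps
    System.out.println(t1 != t2);
    // the original millisecond is still recoverable
    System.out.println(t1 / MULTIPLIER == ts);
  }
}
```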
Tables in HBase
• flow run
• application
• entity
• flow activity
• app to flow
table: flow run
Row key:
clusterId!userName!flowName!inverted(flowRunId)
most recent flow run stored first
coprocessor enabled
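Since HBase scans row keys in ascending order, storing inverted(flowRunId) makes the newest run sort first. A minimal sketch, assuming inverted(x) = Long.MAX_VALUE - x:

```java
// HBase returns rows in ascending key order, so the run id (a timestamp)
// is stored inverted: a larger (newer) run id yields a smaller key, and
// the most recent flow run is scanned first.
public class InvertedRunIdSketch {
  static long invert(long runId) {
    return Long.MAX_VALUE - runId;
  }

  public static void main(String[] args) {
    long olderRun = 1466097809000L;
    long newerRun = 1466097999000L;
    // the newer run sorts before the older one after inversion
    System.out.println(invert(newerRun) < invert(olderRun));
    // and the inversion is reversible
    System.out.println(invert(invert(olderRun)) == olderRun);
  }
}
```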
table: application
Row key:
clusterId!userName!flowName!inverted(flowRunId)!AppId
applications within a flow run stored together
most recent flow run stored first
table: entity
Row key:
userName!clusterId!flowName!inverted(flowRunId)!AppId!entityType!entityId
entities within an application within a flow run stored together per type
• for example, all containers within a YARN application will be stored together
pre-split table
stores information per entity, like info, relatesTo, relatedTo, events, metrics, config
table: flow activity
Row key:
clusterId!inverted(TopOfTheDay)!userName!flowName
shows the flows that ran on that day
stores information per flow, like number of runs, the run ids, versions
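The daily bucketing can be sketched the same way: truncate the run timestamp to the top of the day and invert it, so the most recent day scans first. This assumes UTC day boundaries and the same Long.MAX_VALUE inversion as the run id:

```java
// Sketch of the flow activity key's inverted(TopOfTheDay) component:
// truncate a timestamp to midnight and invert it, so newer days get
// smaller (earlier-scanning) keys. UTC day boundaries assumed.
public class TopOfDaySketch {
  static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

  static long topOfDay(long ts) {
    return ts - (ts % MILLIS_PER_DAY);
  }

  static long invert(long ts) {
    return Long.MAX_VALUE - ts;
  }

  public static void main(String[] args) {
    long run1 = 1466097809000L;       // some time during a UTC day
    long run2 = run1 + 3_600_000L;    // one hour later, same UTC day
    // both runs land in the same daily bucket
    System.out.println(topOfDay(run1) == topOfDay(run2));
    // a later day gets a smaller inverted key, so it scans first
    System.out.println(invert(topOfDay(run1 + MILLIS_PER_DAY))
        < invert(topOfDay(run1)));
  }
}
```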
table: appToFlow
Row key:
clusterId!appId
stores mapping of appId to flowName and flowRunId
Metrics aggregation
• Application level
• Rolls up sub-application metrics
• Performed in real time, in memory, in the collectors
• Flow run level
• Rolls up app level metrics
• Performed in HBase region servers via coprocessors
• Offline aggregation (TBD)
• Rolls up on user, queue, and flow offline periodically
• Phoenix tables
App aggregation (in the collector):
Container 1_1 “bytes”: 23 + Container 1_2 “bytes”: 135 → App1 “bytes”: 158
Container 2_1 “bytes”: 50 → App2 “bytes”: 50
Container 3_1 “bytes”: 64 → App3 “bytes”: 64
Flow aggregation (in HBase):
App1 + App2 → flow1 “bytes”: 208
App3 → flow2 “bytes”: 64
Offline aggregation:
flow1 + flow2 → user1 “bytes”: 272, queue1 “bytes”: 272
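The rollup above amounts to summing the “bytes” metric up the hierarchy; in the real system the app-level sums happen in the collectors and the flow-level sums in HBase coprocessors, but the arithmetic is the same. A sketch using the slide's numbers:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of metrics aggregation up the entity hierarchy: each level's
// value is the sum of its children's values for the same metric.
public class AggregationSketch {
  static Map<String, Long> rollup(Map<String, Map<String, Long>> children) {
    Map<String, Long> sums = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, Long>> e : children.entrySet()) {
      sums.put(e.getKey(),
          e.getValue().values().stream().mapToLong(Long::longValue).sum());
    }
    return sums;
  }

  public static void main(String[] args) {
    // container "bytes" metrics, grouped by app (numbers from the slide)
    Map<String, Map<String, Long>> byApp = new LinkedHashMap<>();
    byApp.put("App1", Map.of("container_1_1", 23L, "container_1_2", 135L));
    byApp.put("App2", Map.of("container_2_1", 50L));
    byApp.put("App3", Map.of("container_3_1", 64L));
    Map<String, Long> appSums = rollup(byApp);    // app-level aggregation
    System.out.println(appSums.get("App1"));

    // apps grouped by flow
    Map<String, Map<String, Long>> byFlow = new LinkedHashMap<>();
    byFlow.put("flow1", Map.of("App1", appSums.get("App1"),
                               "App2", appSums.get("App2")));
    byFlow.put("flow2", Map.of("App3", appSums.get("App3")));
    Map<String, Long> flowSums = rollup(byFlow);  // flow-level aggregation
    System.out.println(flowSums.get("flow1"));

    // flows grouped by user (offline aggregation)
    Map<String, Map<String, Long>> byUser = new LinkedHashMap<>();
    byUser.put("user1", Map.of("flow1", flowSums.get("flow1"),
                               "flow2", flowSums.get("flow2")));
    System.out.println(rollup(byUser).get("user1"));
  }
}
```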
FlowRun aggregation via the HBase coprocessor
[Diagram: the individual app metric cells stored in HBase are summed by the coprocessor into a single flow run metric sum when the flow run table is read.]
Reader REST API: paths
• URLs under /ws/v2/timeline
• Canonical REST-style URLs: /ws/v2/timeline/clusters/cluster_name/users/user_name/flows/flow_name/runs/run_id
• Path elements may be omitted if they can be inferred
• flow context can be inferred from the app id
• default cluster is assumed if the cluster is omitted
Reader REST API: query params
• limit, createdTimeStart, createdTimeEnd: constrain the entities
• fields (ALL | EVENTS | INFO | CONFIGS | METRICS | RELATES_TO | IS_RELATED_TO): limit the contents to return
• metricsToRetrieve, confsToRetrieve: further limit the contents to return
• metricsLimit: limits the number of values in a time series
Reader REST API: query params
• relatesTo, isRelatedTo: filters by association
• *Filters: filters by info, config, metric, event, …
• Supports complex filters including operators
• metricFilter=(((metric1 eq 50) AND (metric2 gt 40)) OR (metric1 lt 20))
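Putting the paths and params together, a reader query URL might be composed as below. The host, port, and the particular path/filter combination are illustrative placeholders, and the sketch only builds the URL; it sends no request:

```java
import java.net.URLEncoder;

// Sketch: compose a reader REST query URL against /ws/v2/timeline,
// URL-encoding the complex metric filter expression. Host/port and the
// path elements are made-up placeholders.
public class ReaderQuerySketch {
  public static void main(String[] args) throws Exception {
    String base = "http://timeline-reader.example.com:8188/ws/v2/timeline";
    String path = "/clusters/yarn-cluster/users/user1"
        + "/flows/frequent_visitor_stat/runs/1466097809000";
    String filter = "(((metric1 eq 50) AND (metric2 gt 40))"
        + " OR (metric1 lt 20))";
    String url = base + path
        + "?fields=METRICS"
        + "&metricsLimit=10"
        + "&metricFilter=" + URLEncoder.encode(filter, "UTF-8");
    System.out.println(url.contains("/ws/v2/timeline/clusters/"));
    System.out.println(url.contains("metricsLimit=10"));
  }
}
```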
Developing: TimelineClient
In your application master:

  // create TimelineClient v.2 style
  TimelineClient client = TimelineClient.createTimelineClient(appId);
  client.init(conf);
  client.start();

  // bind it to AM/RM client to receive the collector address
  amRMClient.registerTimelineClient(client);

  // create and write timeline entities
  TimelineEntity entity = new TimelineEntity();
  client.putEntities(entity);

  // when the app is complete, stop the timeline client
  client.stop();
Developing: Flow context
In your app submitter:

  ApplicationSubmissionContext appContext =
      app.getApplicationSubmissionContext();

  // set the flow context as YARN application tags
  Set<String> tags = new HashSet<>();
  tags.add(TimelineUtils.generateFlowNameTag("distributed grep"));
  tags.add(TimelineUtils.generateFlowVersionTag(
      "3df8b0d6100530080d2e0decf9e528e57c42a90a"));
  tags.add(TimelineUtils.generateFlowRunIdTag(System.currentTimeMillis()));

  appContext.setApplicationTags(tags);
Setting up Timeline Service v.2
• Set up the HBase cluster (1.1.x)
• Add the timeline service jar to HBase
• Install the flow run coprocessor
• Create tables via the TimelineSchemaCreator utility
• Configure the YARN cluster
• Enable Timeline Service v.2
• Add hbase-site.xml for the timeline collectors and readers
• Start the timeline reader daemon
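As a sketch, the "Enable Timeline Service v.2" step amounts to something like the following yarn-site.xml fragment; verify the property names and values against the Hadoop version you run:

```xml
<!-- Hedged sketch of enabling Timeline Service v.2 in yarn-site.xml;
     check these against your Hadoop version's documentation. -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.version</name>
  <value>2.0f</value>
</property>
```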
Milestone 1 ("Alpha 1")
• Merge discussion (YARN-2928) in progress as we speak!
✓ Complete end-to-end read/write flow
✓ Real-time application and flow aggregation
✓ New entity model
✓ HBase storage
✓ Rich REST API
✓ Integration with Distributed Shell and MapReduce
✓ YARN generic events and system metrics
Milestones - Future
• Milestone 2 (“Alpha 2”)
• Integration with new YARN UI
• Integration with more frameworks
• Beta
• Freeze API and storage schema
• Security
• Collectors as containers
• Storage fault tolerance
• Production-ready
• Migration-ready
Demo
Contributors
• Li Lu, Junping Du, Vinod Kumar Vavilapalli (Hortonworks)
• Varun Saxena, Naganarasimha G. R. (Huawei)
• Sangjin Lee, Vrushali Channapattan, Joep Rottinghuis (Twitter)
• Zhijie Shen (now at Facebook)
• The HBase and Phoenix community!
Thank you!