eron wright - introducing flink on mesos

17
Introducing Flink on Mesos Eron Wright – [email protected] DELL EMC @eronwright

Upload: flink-forward

Post on 16-Apr-2017

189 views

Category:

Data & Analytics


6 download

TRANSCRIPT

Page 1: Eron Wright - Introducing Flink on Mesos

Introducing Flink on MesosEron Wright – [email protected] EMC@eronwright

Page 2: Eron Wright - Introducing Flink on Mesos

2of 15

What is Apache Mesos?

• A popular cluster manager (similar to YARN)• Makes available CPU, memory, & disk resources• Unique capabilities for storage services• Emerging as a foundation for data-centric, converged

infrastructure

• Provides a programming model for using cluster resources• A Mesos program is called a “framework”

• Packaged into an open-source distribution called DCOS• Prescribes best practices related to Mesos frameworks,

related services, etc.

Page 3: Eron Wright - Introducing Flink on Mesos

3of 15

Why Flink on Mesos?

• Flink works best on a cluster manager– Easy to scale each job independently– Externalize scheduling logic (fairness, quota, …)– Good job isolation

• Flink can benefit from unique Mesos capabilities– Disk resources– Dynamic resource management– Unique management features (e.g. inverse offers for controlled downscaling & maintenance)

Page 4: Eron Wright - Introducing Flink on Mesos

Demo

Page 5: Eron Wright - Introducing Flink on Mesos

Flink Master Process

Page 6: Eron Wright - Introducing Flink on Mesos

6of 15

Introduction

Flink Master Process

• The Flink Master Process is:– The “Application Master” for a single Flink cluster– A Mesos framework!

• Hosts numerous components:– Job Manager– Resource Manager (acts as Mesos scheduler)– Artifact Server (HTTP server for Mesos fetcher)

• Responsible for TM scaling and recovery– Handles JobManager scale change requests– Stores task state in ZooKeeper

host1host2

Master

JM

RM

HTTPD

TM TM

Mesos

Page 7: Eron Wright - Introducing Flink on Mesos

7of 15

How it Works

Flink Master Process

• Offer handling:– Uses Netflix Fenzo as an optimizer– Gathers offers until all tasks launched

• Recovery:– Stores intentional state in ZooKeeper– Master uses leader election– Mesos allows some time for recovery before killing

tasks

• Monitoring:– Detects task failure; launches replacement

automatically.

host1host2

Master

TM TM

4. Launch

Mesos2. Resource Offers

1. Register

5. Fetch (HTTP)

6. Status update

3. Optimize

Page 8: Eron Wright - Introducing Flink on Mesos

8of 15

Configuration

Flink Master Process (Con’t)

• Framework Info– mesos.resourcemanager.framework.secret– mesos.resourcemanager.framework.principal– mesos.resourcemanager.framework.role

• Mesos Master Info– mesos.master: (IP address or ZK lookup info)– mesos.failover-timeout

Note: no port configuration is necessary; Mesos automatically assigns ports.

Page 9: Eron Wright - Introducing Flink on Mesos

Dispatcher

Page 10: Eron Wright - Introducing Flink on Mesos

10of 15

Introduction

Dispatcher

• A highly-available service for launching Flink clusters.

• A Mesos framework!

• Accessed via REST by the CLI

• DCOS compatibility: – HTTP-based– Accessible via the Admin Router– (future) JWT authentication

• Aligned with FLIP-6

host1

1D

1C

1B

1A

host2

2D

2C

2B

2A

host3

3D

3C

3B

3A

host4

4D

4C

4B

4ADispatcher

Master

TM TM

TMTM

Master

CLI

TM

Mesos

Page 11: Eron Wright - Introducing Flink on Mesos

11of 15

Framework Hierarchy

Dispatcher (Con’t)

• Nesting of frameworks is a common Mesos pattern. Here, Marathon launches the dispatcher, which launches the Flink Master Process, etc.

• Architecturally, it avoids a dependency on the Marathon API. For example, Aurora could be used here in place of Marathon.

Dispatcher

Master

Marathon

TM

(Task)

(Task)

(Task)

Page 12: Eron Wright - Introducing Flink on Mesos

12of 15

Launching a Session

Dispatcher (Con’t)

• Use: mesos-session.sh

• CLI uploads files to dispatcher via HTTP– Flink Configuration– Supplemental files (--ship)– Keytabs– Certificates

• Dispatcher adds additional elements:– Configuration

› ZooKeeper Namespace

– Flink JAR– …

host1

1D

1C

1B

1A

host2

2D

2C

2B

2A

host3

3D

3C

3B

3A

host4

4D

4C

4B

4ADispatcher

Master

TM TM

CLI

HTTP(S)

TM

HTTP(S)

Mesos

Page 13: Eron Wright - Introducing Flink on Mesos

13of 15

Dispatcher Deployment Modes

Dispatcher (Con’t)

• Dispatcher is usable in two ways

• Remote Mode:– Recommended for detached execution

• Local Mode:– Recommended for simple, interactive sessions

(e.g. flink shell)

3C

3B

3A

4C

4B

4ADispatcher

Master

Master

CLI

HTTP(S)

3C

3B

3A

4C

4B

4A

Master

CLI + Dispatcher

Local Mode Remote Mode

Page 14: Eron Wright - Introducing Flink on Mesos

Summary

Page 15: Eron Wright - Introducing Flink on Mesos

15of 15

Future Directions

• Dynamic Scaling– Add/remove Task Managers in response to scale changes over a job’s lifetime– Support Mesos maintenance procedures (e.g. inverse offers)

• Dispatcher Evolution (FLIP-6)– Generalize to support all deployment scenarios, unified CLI – Provide a centralized Web UI (incl. job history)– Authentication Support (e.g. OAuth 2.0)

• Docker Image Support– Tracking the “Mesos unified containerizer”

• Mesos Disk Support– Allocate multiple disks for Task Manager temp space– Scale up the I/O

Page 16: Eron Wright - Introducing Flink on Mesos

16of 15

Project Status

• Targeted for: Flink 1.2

• Contributors:– Eron Wright (Dell EMC)– Maximilian Michels (data Artisans)

• Design Doc: – Mesos Integration on Google Docs

• JIRAs:– FLINK-1984 – Integrate Flink with Apache Mesos

• Code:– https://github.com/EronWright/flink/tree/feature-FLINK-1984-T2

Page 17: Eron Wright - Introducing Flink on Mesos