AWS Batch: Simplifying Batch Computing in the Cloud
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Adrian Hornsby, Technical Evangelist @ AWS
Twitter: @adhorn
Email: [email protected]
• Technical Evangelist, Developer Advocate,
… Software Engineer
• My @home is in Finland
• Previously:
• Solutions Architect @AWS
• Lead Cloud Architect @Dreambroker
• Director of Engineering, Software Engineer, DevOps, Manager, ... @Hdm
• Researcher @Nokia Research Center
• and a bunch of other stuff.
• Love climbing and ginger shots.
What to expect from this session
• Batch processing overview
• AWS Batch platform walkthrough
• API overview
• Demo(s)
• Show me the code!
• Usage patterns
What is batch computing?
Run jobs asynchronously and automatically across one or more
computers.
Jobs may have dependencies, making the sequencing and scheduling of
multiple jobs complex and challenging.
Early Batch APIs (19th Century)
• Processing of data stored on decks of punched
cards
• Tabulating machine by Herman Hollerith,
used for the 1890 United States Census.
• Each card stored a separate record of data
with different fields.
• Cards were processed by the machine one
by one, all in the same way, as a batch.
IBM Type 285 tabulators (1936) being used for batch
processing of punch cards (in stack on each machine) with
human operators at U.S. Social Security Administration
Batch in Linux
echo "cc -o foo foo.c" | at 1145 jan 31
> job 1 at Wed Jan 31 11:45:00 2018
$ at 1145 jan 31
at> cc -o foo foo.c
at> ^D
$ atq                 # list pending jobs
$ atrm <job_number>   # remove a pending job
Batch computing today
• In-house compute clusters powered by open source or
commercial job schedulers.
• Often composed of a large array of identical,
undifferentiated processors, all of the same vintage and
built to the same specifications.
It’s like trying to fit a square into a circle
Batch computing today …
AWS Batch
Overview & Concepts
AWS Batch in a nutshell
• Fully managed batch primitives
• Focus on your applications (shell scripts, Linux executables,
Docker images) and their resource requirements
• We take care of the rest!
AWS Batch advantages
• Reduces operational complexities
• Saves time
• Reduces costs
AWS Batch Components
• Jobs
• Job definitions
• Job queues
• Job Scheduler
• Compute environments
Components relation
[Diagram: a Batch compute environment (marked as a regional service) with its compute resources, serving Batch queues 0, 1, and 2, which are labeled by priority. Job 1 and Job 2 are linked by a "Depends On" arrow. Job definitions 1 through n each carry a container property.]
Jobs
Jobs are the unit of work executed by AWS Batch as containerized
applications running on Amazon EC2.
Containerized jobs can reference a container image, command, and
parameters.
Or, users can fetch a .zip containing their application and run it on an
Amazon Linux container.
Submit Job
aws batch submit-job --cli-input-json file://submit_job.json --region us-east-1
Submit Job with dependency
aws batch submit-job --cli-input-json file://submit_job.json --region us-east-1
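The slides do not show the contents of submit_job.json. A minimal sketch of what such a file might contain, assuming the SubmitJob request fields jobName, jobQueue, jobDefinition, dependsOn, and containerOverrides. The job, queue, and definition names and the job ID are hypothetical placeholders, not values from the talk:

```python
import json


def build_submit_job_request(job_name, job_queue, job_definition,
                             depends_on_job_ids=(), command=None):
    """Build a dict shaped like the JSON input of `aws batch submit-job`."""
    request = {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
    }
    # A job with dependencies lists the IDs of the jobs it waits for.
    if depends_on_job_ids:
        request["dependsOn"] = [{"jobId": job_id} for job_id in depends_on_job_ids]
    # Parameters from the job definition can be overridden per submission.
    if command is not None:
        request["containerOverrides"] = {"command": list(command)}
    return request


# All names and the job ID below are hypothetical.
request = build_submit_job_request(
    job_name="second-step",
    job_queue="example-queue",
    job_definition="example-job-def",
    depends_on_job_ids=["00000000-0000-0000-0000-000000000000"],
    command=["echo", "hello"],
)
submit_job_json = json.dumps(request, indent=2)
print(submit_job_json)
```

Saving the printed JSON as submit_job.json would let the command above pick it up. Leaving out depends_on_job_ids gives the plain "Submit Job" case.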
Job States
Jobs submitted to a queue can have the following states:
SUBMITTED: Accepted into the queue, but not yet evaluated for execution
PENDING: Your job has dependencies on other jobs which have not yet completed
RUNNABLE: Your job has been evaluated by the scheduler and is ready to run
STARTING: Your job is in the process of being scheduled to a compute resource
RUNNING: Your job is currently running
SUCCEEDED: Your job has finished with exit code 0
FAILED: Your job finished with a non-zero exit code, was cancelled or terminated.
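The state descriptions above can be sketched as a small model. This is only an illustration of the rules as listed on the slide, not how the service is implemented; the helper names state_before_run and final_state are hypothetical:

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "SUBMITTED"
    PENDING = "PENDING"
    RUNNABLE = "RUNNABLE"
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# A job does not leave these states once it reaches them.
TERMINAL_STATES = {JobState.SUCCEEDED, JobState.FAILED}


def state_before_run(unfinished_dependencies):
    """PENDING while dependencies are incomplete, otherwise RUNNABLE."""
    return JobState.PENDING if unfinished_dependencies else JobState.RUNNABLE


def final_state(exit_code, cancelled_or_terminated=False):
    """SUCCEEDED only for exit code 0 with no cancellation or termination."""
    if cancelled_or_terminated or exit_code != 0:
        return JobState.FAILED
    return JobState.SUCCEEDED


print(state_before_run(["job-a"]).value)
print(final_state(0).value)
print(final_state(137).value)
```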
Job Definition
AWS Batch job definitions specify how jobs are to be run.
Some of the attributes specified in a job definition:
• IAM role associated with the job
• vCPU and memory requirements
• Mount points
• Container properties
• Environment variables
• Retry strategy
• While each job must reference a job definition, many parameters
can be overridden.
Create Job Definition
aws batch register-job-definition --region us-east-1 --cli-input-json file://job_def.json
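The slides do not show job_def.json. A minimal sketch of a file covering the attributes listed earlier (IAM role, vCPU and memory, mount points, environment variables, retry strategy), assuming the RegisterJobDefinition request fields as they existed around the time of this talk. The names, role ARN, image, and paths are hypothetical placeholders:

```python
import json

job_definition = {
    "jobDefinitionName": "example-job-def",      # hypothetical name
    "type": "container",
    "containerProperties": {
        "image": "amazonlinux",                  # any Docker image
        "vcpus": 1,                              # vCPU requirement
        "memory": 512,                           # memory requirement in MiB
        "command": ["echo", "hello"],
        # IAM role associated with the job (hypothetical ARN).
        "jobRoleArn": "arn:aws:iam::123456789012:role/example-batch-job-role",
        "environment": [{"name": "STAGE", "value": "demo"}],
        # A host volume and the mount point that exposes it in the container.
        "volumes": [{"name": "scratch", "host": {"sourcePath": "/tmp"}}],
        "mountPoints": [
            {"sourceVolume": "scratch", "containerPath": "/scratch", "readOnly": False}
        ],
    },
    "retryStrategy": {"attempts": 2},
}

job_def_json = json.dumps(job_definition, indent=2)
print(job_def_json)
```

Saving the printed JSON as job_def.json would give the command above something to register. Newer API versions express vCPU and memory through resourceRequirements instead, so treat the field choice here as an assumption.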
Job Queue
Jobs are submitted to a job queue, where they reside until they are
able to be scheduled to a compute resource. Information related to
completed jobs persists in the queue for 24 hours.
Job queues support priorities and multiple queues can schedule work
to the same compute environment.
Create Job Queue
aws batch create-job-queue --region us-east-1 --cli-input-json file://job_queue.json
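The slides do not show job_queue.json. A minimal sketch, assuming the CreateJobQueue request fields jobQueueName, state, priority, and computeEnvironmentOrder. The queue and compute environment names are hypothetical. Two queues with different priorities point at the same compute environment, as described above:

```python
import json


def build_job_queue_request(name, priority, compute_environments):
    """Build a dict shaped like the JSON input of `aws batch create-job-queue`."""
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        # Queues with a higher integer priority are evaluated first.
        "priority": priority,
        # Compute environments are tried in ascending `order`.
        "computeEnvironmentOrder": [
            {"order": order, "computeEnvironment": env}
            for order, env in enumerate(compute_environments, start=1)
        ],
    }


# Hypothetical names: both queues share one compute environment.
high = build_job_queue_request("high-priority", 10, ["example-spot-env"])
low = build_job_queue_request("low-priority", 1, ["example-spot-env"])

job_queue_json = json.dumps(high, indent=2)
print(job_queue_json)
```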
Job Scheduler
The scheduler evaluates when, where, and how to run jobs
that have been submitted to a job queue.
Jobs run in approximately the order in which they are
submitted, as long as all dependencies on other jobs have
been met.
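The ordering rule above can be illustrated with a toy model: take jobs in submission order and always run the earliest-submitted job whose dependencies have all finished. This is a simplified sketch, not the actual scheduler, which also weighs queue priority and available resources. The function name run_order and the job IDs are hypothetical:

```python
def run_order(jobs):
    """Return the order in which jobs would run.

    `jobs` is a list of (job_id, depends_on) tuples in submission order,
    where `depends_on` is a list of job IDs that must finish first.
    """
    pending = list(jobs)
    finished = []
    while pending:
        for index, (job_id, depends_on) in enumerate(pending):
            if all(dep in finished for dep in depends_on):
                finished.append(job_id)
                del pending[index]
                break
        else:
            # No pending job can start: a dependency is missing or cyclic.
            raise ValueError("unsatisfiable dependencies")
    return finished


# "A" was submitted first but depends on "C", so it runs last.
submitted = [("A", ["C"]), ("B", []), ("C", [])]
print(run_order(submitted))
```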
Compute Environment
Job queues are mapped to one or more compute environments.
Managed compute environments enable you to describe your business
requirements (instance types, min/max/desired vCPUs, and Spot
Instance bid as a % of the On-Demand price) and we launch and scale
resources on your behalf.
You can choose specific instance types or choose “optimal” and AWS
Batch launches appropriately sized instances.
Create Compute Environment
aws batch create-compute-environment --cli-input-json file://job_env.json --region us-east-1
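The slides do not show job_env.json. A minimal sketch of a managed Spot compute environment, assuming the CreateComputeEnvironment request fields shown below. Every name, ARN, subnet ID, and security group ID is a hypothetical placeholder, and the numbers are arbitrary examples:

```python
import json

compute_environment = {
    "computeEnvironmentName": "example-spot-env",   # hypothetical name
    "type": "MANAGED",                              # AWS launches and scales resources
    "state": "ENABLED",
    "computeResources": {
        "type": "SPOT",
        "minvCpus": 0,
        "maxvCpus": 64,
        "desiredvCpus": 0,
        # "optimal" lets AWS Batch pick appropriately sized instances.
        "instanceTypes": ["optimal"],
        # Spot bid as a percentage of the On-Demand price.
        "bidPercentage": 60,
        "subnets": ["subnet-00000000"],
        "securityGroupIds": ["sg-00000000"],
        "instanceRole": "ecsInstanceRole",
        "spotIamFleetRole": "arn:aws:iam::123456789012:role/example-spot-fleet-role",
    },
    "serviceRole": "arn:aws:iam::123456789012:role/example-batch-service-role",
}

job_env_json = json.dumps(compute_environment, indent=2)
print(job_env_json)
```

Saving the printed JSON as job_env.json would give the command above its input, once the placeholders are replaced with real resources from your account.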
Customer Provided AMIs
Customer Provided AMIs let you set the AMI that is
launched as part of a managed compute environment.
This makes it possible to configure Docker settings, mount
EBS/EFS volumes, and configure drivers for GPU jobs.
AMIs must be Linux-based, use HVM virtualization, and have a working ECS
agent installation.
Resource Limits
Deployment
Pricing
AWS Batch: Demo
Fetch & Run Demo
[Diagram: Fetch & Run demo. Labeled elements: Developer, Job definition, AWS Batch Queue, AWS Batch Scheduler, AWS Batch Compute Env., AWS Batch execution Container, IAM Role, Amazon S3, Amazon DynamoDB. Arrow labels: Submit job, Fetch Script, Read/Write.]
Show me the code!
AWS Batch: Typical Use cases
AWS Batch Use Cases
High Performance Computing
Post-Trade Analytics
Fraud Surveillance
Drug Screening
DNA Sequencing
Rendering
Transcoding
Media Supply Chain
Financial Services: Automate the analysis of the day’s transactions for fraud surveillance.
Life Sciences: Drug Screening for Biopharma. Rapidly search libraries of small molecules for drug discovery.
Digital Media: Visual Effects Rendering. Automate content rendering workloads and reduce the need for human intervention due to execution dependencies or resource scheduling.