A File System for Serverless Computing Presented at HPTS November 5, 2019 Johann Schleier-Smith and Joseph M. Hellerstein

Page 1: A File System for Serverless Computing - HPTS - 2019 Home · AWS announced Lambda Function as a Service (FaaS) in 2014, others clouds followed quickly Write code, upload it to the

A File System for Serverless Computing

Presented at HPTS, November 5, 2019

Johann Schleier-Smith and Joseph M. Hellerstein

Page 2

2©2019 RISELab

Outline
• Serverless computing background
• Limitations
• The Cloud Function File System (CFFS)
• Evaluation and learnings

Page 3

Serverless computing in 2014
• AWS announced Lambda Function as a Service (FaaS) in 2014; other clouds followed quickly
• Write code, upload it to the cloud, run at any scale
• Successful for web APIs and event processing, limited in stateful applications

Page 5

Defining characteristics of serverless abstractions

Hiding the servers and the complexity of programming them

Consumption-based costs and no charge for idle resources

Excellent autoscaling so resources match demand closely

Page 6

Understanding serverless computing’s impact

Job                            Serverful Cloud   Serverless Cloud
Infrastructure Administration  Outsourced job    Outsourced job
System Administration          Simplified job    Outsourced job
Software Development           Little change     Simplified job

Figure: Resources allocated & charged vs. time, for on-premise, serverful cloud, and serverless cloud (legend: Resources, People)

Page 7

Serverless as next phase of cloud computing

Serverless abstractions offer:
• Simplified programming
• Outsourced operations
• Improved resource utilization

Page 8

Serverless is much more than FaaS

• Object Storage: AWS S3, Azure Blobs, Google Cloud Storage
• Queue Service: AWS SQS, Google Cloud Pub/Sub
• Function as a Service: AWS Lambda, Google Cloud Functions, Google Cloud Run, Azure Functions
• Big Data Processing: Google Cloud Dataflow, AWS Glue, AWS Athena, AWS Redshift
• Key-Value Store: AWS DynamoDB, Azure CosmosDB, Google Cloud Datastore
• Mobile back end: Google Firebase, AWS AppSync, Google App Engine, AWS Aurora Serverless

Page 9

Fertile ground for research

Figure: Number of publications per year, 2012 to 2019 (actual and extrapolated)

Source: dblp computer science bibliography.

Page 10

Outline
• Serverless computing background
• Limitations
• The Cloud Function File System (CFFS)
• Evaluation and learnings

Page 11

Limitations of FaaS

Limited runtime

Ephemeral state

No specialized hardware, e.g., GPU

No inbound network connections

Page 12

Plenty of complementary stateful serverless services

Object Storage
• AWS S3
• Azure Blobs
• Google Cloud Storage

Key-Value Store
• AWS DynamoDB
• Azure CosmosDB
• Google Cloud Datastore
• Anna KVS (Berkeley)

Others
• AWS Aurora Serverless
• Google Firebase

Page 13

Combine with FaaS to build applications

Diagram: functions (λ) combined with Object Storage, Key-Value Store, etc.

Compute: isolated & ephemeral state

Storage: shared & durable state

Page 14

Allows independent scaling

Diagram: more function instances (λ) and more storage services (Object Storage, Key-Value Store, etc.), each scaled independently

Page 15

How happy are we?

Page 18

Two main problems

• Latency
• API

Page 19

Can I please have something like local disk, but for the cloud?

Page 20

File systems let us run so much software

• Data analysis with Pandas
• Machine learning with TensorFlow
• Software builds with Make
• Search indexes with Sphinx
• Image rendering with Radiance
• Databases with SQLite
• Web serving with Nginx
• Email with Postfix

Page 21

Objections

It won’t scale
• Need a simpler data model (key-value store, immutable objects)
• Need a weaker consistency model (eventual consistency)
• Performance and reliability will suffer otherwise

You don’t need it
• You should be rewriting your software for the cloud anyhow
• Why not just use a key-value store?

Page 22

How does this make me feel?

Page 23

Outline
• Serverless computing background
• Limitations
• The Cloud Function File System (CFFS)
• Evaluation and learnings

Page 24

Introducing the Cloud Function File System (CFFS)

POSIX semantics, including strong consistency

Local caches for local disk performance

Works under autoscaling, extreme elasticity, and FaaS limitations

Implemented as a transaction system

Page 25

What is special about the FaaS environment?

• Function invocations have a well-defined beginning and end
• At-least-once execution: code is expected to be idempotent
• Constrained execution model:
  - Clients frozen in between invocations
  - No inbound network connections
  - No root access
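The platform may run the same invocation more than once, so a handler's effects must be safe to repeat. The sketch below contrasts two hypothetical handlers. The event fields `request_id` and `amount` are assumptions and do not come from the slides. One handler appends to a log, so a retry duplicates its effect. The other writes output keyed by request ID, so a second delivery leaves the same state as the first.

```python
import os
import tempfile

# Hypothetical event shape: {"request_id": str, "amount": int}.


def handle_append(event: dict, log_path: str) -> None:
    """Not idempotent: every delivery appends another line."""
    with open(log_path, "a") as f:
        f.write(f"{event['request_id']},{event['amount']}\n")


def handle_keyed(event: dict, out_dir: str) -> None:
    """Idempotent: output is keyed by request ID, so a retry rewrites
    the same file with the same content."""
    path = os.path.join(out_dir, f"{event['request_id']}.txt")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(f"{event['amount']}\n")
    os.replace(tmp, path)  # rename over the target; atomic on POSIX file systems


def simulate_at_least_once(handler, event: dict, target: str, deliveries: int = 2) -> None:
    """Deliver the same event several times, as a retrying platform might."""
    for _ in range(deliveries):
        handler(event, target)


if __name__ == "__main__":
    event = {"request_id": "req-123", "amount": 42}
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "ledger.csv")
        simulate_at_least_once(handle_append, event, log)
        with open(log) as f:
            print("append lines:", len(f.readlines()))  # 2: effect duplicated
        out = os.path.join(d, "out")
        os.mkdir(out)
        simulate_at_least_once(handle_keyed, event, out)
        print("keyed files:", os.listdir(out))  # ['req-123.txt']: effect applied once
```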

Page 26

CFFS Architecture

Diagram: function instances (λ), each running its own CFFS client, all connected to a shared back end transactional system; together these form CFFS

Page 27

FaaS instance caching
• “Function as a Service” suggests statelessness, but most implementations reuse instances and preserve their state
• Setup of a sandboxed environment takes time:
  - Loads selected runtime (e.g., JavaScript, Python, C#)
  - Configures network endpoint, IAM privileges
  - Loads user code
  - Runs user initialization

Caching is key to amortizing instance setup
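Instance reuse is what makes client-side caching pay off. The sketch below assumes a Lambda-style `handler(event, context)` entry point. Work done at module scope runs once per instance, at the cold start, and later invocations on the same warm instance see the cached state. The names `_expensive_setup`, `_CACHE`, and the `key` event field are hypothetical.

```python
import time

_INIT_COUNT = 0


def _expensive_setup() -> dict:
    """Stand-in for user initialization (loading libraries, opening clients)."""
    global _INIT_COUNT
    _INIT_COUNT += 1
    time.sleep(0.05)  # pretend setup is slow
    return {"initialized_at": time.time()}


# Module scope runs once per instance, at cold start.
# Warm invocations on the same instance reuse these objects.
_STATE = _expensive_setup()
_CACHE: dict = {}


def handler(event: dict, context=None) -> dict:
    key = event["key"]
    hit = key in _CACHE
    if not hit:
        _CACHE[key] = key.upper()  # stand-in for a slow remote fetch
    return {"value": _CACHE[key], "cache_hit": hit, "inits": _INIT_COUNT}


if __name__ == "__main__":
    # Three invocations routed to the same warm instance.
    for k in ["a", "a", "b"]:
        print(handler({"key": k}))
```

Platforms do not guarantee reuse, so the code must stay correct when every invocation is a cold start. A cache that survives between invocations can also go stale while the instance is frozen, so its contents need to be revalidated when a function starts.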

Page 28

Cloud Function File System (CFFS)

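Diagram: the application calls CFFS through the POSIX API; the CFFS client keeps a local cache and a transaction (txn) buffer, and exchanges cache logs and transaction commits with the back end transactional system. A callout reads “Our emphasis.”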
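Here is a rough sketch of the client-side data path in the diagram, using an in-memory stand-in for the back end. The classes `Backend` and `Client` and their methods are hypothetical, not the CFFS code. Reads are served from the transaction buffer first, then the local cache, and only then from the back end. Writes stay in the buffer until commit, and abort simply discards them. Concurrency control and cache invalidation are deliberately left out.

```python
from typing import Dict, Optional


class Backend:
    """In-memory stand-in for the back end transactional system (hypothetical)."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.fetches = 0  # counts round trips, to show cache hits avoid them

    def fetch(self, path: str) -> Optional[bytes]:
        self.fetches += 1
        return self.files.get(path)

    def apply(self, writes: Dict[str, bytes]) -> None:
        # A real back end would validate the transaction before applying it.
        self.files.update(writes)


class Client:
    """Per-instance client: a local cache plus a per-transaction write buffer."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend
        self.cache: Dict[str, bytes] = {}
        self.buffer: Dict[str, bytes] = {}

    def begin(self) -> None:
        self.buffer = {}

    def read(self, path: str) -> Optional[bytes]:
        if path in self.buffer:          # read your own uncommitted writes
            return self.buffer[path]
        if path in self.cache:           # local hit: no network round trip
            return self.cache[path]
        data = self.backend.fetch(path)  # miss: go to the back end
        if data is not None:
            self.cache[path] = data
        return data

    def write(self, path: str, data: bytes) -> None:
        self.buffer[path] = data         # invisible to others until commit

    def commit(self) -> None:
        self.backend.apply(self.buffer)
        self.cache.update(self.buffer)   # keep the cache in step with own writes
        self.buffer = {}

    def abort(self) -> None:
        self.buffer = {}                 # discard buffered writes


if __name__ == "__main__":
    be = Backend()
    be.files["/data/a.txt"] = b"v1"
    c = Client(be)
    c.begin()
    print(c.read("/data/a.txt"), c.read("/data/a.txt"), be.fetches)
    c.write("/data/a.txt", b"v2")
    print(c.read("/data/a.txt"), be.files["/data/a.txt"])
    c.commit()
    print(be.files["/data/a.txt"])
```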

Page 29

Core API implemented in CFFS

open            New descriptor (handle) for file or directory
close           Close descriptor
write / pwrite  Write / positioned write
read / pread    Read / positioned read
stat            Get size, ownership, access, permissions, last modified
seek            Set descriptor position
dup / dup2      Copy descriptor
truncate        Set file size
flock           Byte range lock and unlock
mkdir           Create directory
rename          Rename file / directory
unlink          Delete file / directory
chmod           Set access permissions
chown           Set ownership
utimes          Update modified / accessed timestamps
clock_gettime   Get current time
chdir           Set working directory
getcwd          Get working directory
begin           Start transaction
commit / abort  End transaction

Page 30

POSIX guarantees - language from the spec

• Writes can be serialized with respect to other reads and writes.

• If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write()…

• A similar requirement applies to multiple write operations to the same file position…

• This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics.

https://pubs.opengroup.org/onlinepubs/9699919799/
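On a single machine, the kernel provides the read-after-write guarantee quoted above. The sketch below uses only the standard `os.pwrite`/`os.pread` calls, which are available on Unix. It writes through one descriptor and reads through another. Each `pread` provably happens after the preceding `pwrite` returns, so it must see the new bytes, including an overwrite at the same position. A networked file system with client caches has to preserve this across machines, which this local example does not exercise.

```python
import os
import tempfile


def read_after_write_demo() -> bytes:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        writer = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        reader = os.open(path, os.O_RDONLY)  # separate descriptor, same file
        try:
            os.pwrite(writer, b"hello world", 0)
            # pwrite has returned, so this pread provably happens after it
            # and must reflect the write (no fsync needed for visibility).
            first = os.pread(reader, 5, 0)
            os.pwrite(writer, b"HELLO", 0)  # second write to the same position
            second = os.pread(reader, 11, 0)
        finally:
            os.close(writer)
            os.close(reader)
    return first + b"|" + second


if __name__ == "__main__":
    print(read_after_write_demo())  # b'hello|HELLO world'
```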


Page 31

POSIX guarantees in database terms

Atomic operations
• Each operation (at the API level) is observed entirely or not at all
• Some violations in practice

Consistency model
• Spec references time, technically requires strict consistency (shared global clock)
• Actually implemented as sequential consistency (global order exists, consistent with order at each processor)
• We use serializability to provide isolation and atomicity at function granularity

Open question: what guarantees do applications actually rely on?

Page 32

Implementation highlights

Choice of transaction mechanism is not fundamental
• Implemented serializable timestamp ordering
• Optimistic protocols can be a good fit: FaaS side effects must already be idempotent
• State-of-the-art protocols promise lower abort rates and more effective local caches (e.g., Yu, et al., VLDB 2018)
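For readers unfamiliar with timestamp ordering, the basic textbook rules can be sketched as follows. This is a generic illustration, not the CFFS implementation. Each transaction gets a unique, increasing timestamp, and each item records the largest read and write timestamps it has seen. An operation that arrives "too late" aborts its transaction. A complete system would also keep uncommitted writes invisible, for example in a per-transaction buffer, and undo the writes of aborted transactions; both are omitted here.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


class Abort(Exception):
    """The operation would violate timestamp order; the transaction must abort."""


@dataclass
class Item:
    value: bytes = b""
    read_ts: int = 0   # largest timestamp of any transaction that read the item
    write_ts: int = 0  # timestamp of the transaction that wrote the current value


class TOStore:
    """Basic timestamp ordering. Simplification: writes are installed
    immediately and never rolled back."""

    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}
        self._clock = itertools.count(1)

    def begin(self) -> int:
        return next(self._clock)  # unique, increasing transaction timestamp

    def read(self, ts: int, key: str) -> bytes:
        item = self.items.setdefault(key, Item())
        if ts < item.write_ts:  # a younger transaction already overwrote it
            raise Abort(f"txn {ts} reads {key} too late")
        item.read_ts = max(item.read_ts, ts)
        return item.value

    def write(self, ts: int, key: str, value: bytes) -> None:
        item = self.items.setdefault(key, Item())
        if ts < item.read_ts or ts < item.write_ts:  # a younger txn saw or wrote it
            raise Abort(f"txn {ts} writes {key} too late")
        item.value, item.write_ts = value, ts


if __name__ == "__main__":
    store = TOStore()
    t1, t2 = store.begin(), store.begin()
    store.write(t2, "/f", b"new")  # the younger transaction writes first
    try:
        store.read(t1, "/f")       # the older one would see a "future" write
    except Abort as e:
        print("abort:", e)
```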

Cache updates through on-demand filtered log shipping
• Check for updates when function starts execution
• Eviction messages help back end track client cache content
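FaaS instances accept no inbound connections and are frozen between invocations, so the back end cannot push invalidations; the client pulls them instead. The sketch below illustrates that idea, and all names in it are hypothetical. The back end keeps an append-only commit log and, for each client, the set of paths that client caches. At invocation start, the client asks for log entries newer than the last sequence number it has seen, filtered to its cached paths. Eviction messages shrink that set.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class LogEntry:
    seq: int
    path: str
    data: bytes


class LogServer:
    """Hypothetical back end: an append-only commit log plus, per client,
    the set of paths that client is believed to cache."""

    def __init__(self) -> None:
        self.log: List[LogEntry] = []
        self.cached_by: Dict[str, Set[str]] = {}

    def commit(self, path: str, data: bytes) -> None:
        self.log.append(LogEntry(len(self.log) + 1, path, data))

    def note_cached(self, client: str, path: str) -> None:
        self.cached_by.setdefault(client, set()).add(path)

    def evicted(self, client: str, path: str) -> None:
        # Eviction message: stop shipping updates for this path to this client.
        self.cached_by.get(client, set()).discard(path)

    def updates_since(self, client: str, last_seq: int) -> Tuple[int, List[LogEntry]]:
        wanted = self.cached_by.get(client, set())
        # A real server would index the log instead of scanning it.
        entries = [e for e in self.log if e.seq > last_seq and e.path in wanted]
        return len(self.log), entries


class CacheClient:
    def __init__(self, name: str, server: LogServer) -> None:
        self.name, self.server = name, server
        self.cache: Dict[str, bytes] = {}
        self.last_seq = 0

    def load(self, path: str, data: bytes) -> None:
        self.cache[path] = data
        self.server.note_cached(self.name, path)

    def evict(self, path: str) -> None:
        self.cache.pop(path, None)
        self.server.evicted(self.name, path)

    def on_invocation_start(self) -> int:
        """Pull only the log entries relevant to this cache; return how many."""
        self.last_seq, entries = self.server.updates_since(self.name, self.last_seq)
        for e in entries:
            self.cache[e.path] = e.data
        return len(entries)


if __name__ == "__main__":
    srv = LogServer()
    c = CacheClient("fn-1", srv)
    c.load("/a", b"a0")
    c.load("/b", b"b0")
    c.evict("/b")
    srv.commit("/a", b"a1"); srv.commit("/b", b"b1"); srv.commit("/z", b"z1")
    print(c.on_invocation_start(), c.cache)  # 1 {'/a': b'a1'}
```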

Page 33

CFFS in context: Transactional file systems

QuickSilver distributed system – IBM, 1991
• Very close in spirit
• No FaaS
• No caching

Inversion File System – Berkeley, 1993
• Built on top of PostgreSQL
• Access through custom library

Transactional NTFS (TxF) – Microsoft, 2006
• Shipping in Windows
• Deprecated on account of complexity

There are many non-transactional shared & distributed file systems

Page 34

CFFS in context: Shared file systems

Must choose between consistency and latency:
• Eventual consistency
• Delegation/lock-based caching
• No caching

• HPC: Lustre, GPFS (IBM), GlusterFS (RedHat / IBM), GFS (Google File System), MooseFS, LizardFS, BeeGFS
• Big data: HDFS, GFS (Google), Ceph (IBM), MapR-FS, Alluxio
• Client-server: NFS, SMB
• Mainframe: zFS

Page 35

Outline
• Serverless computing background
• Limitations
• The Cloud Function File System (CFFS)
• Evaluation and learnings

Page 36

Sample workload call frequencies

Figure: Number of calls per file system operation (chdir, chmod, close, dup, getdents, open, read, rename, seek, stat, unlink, write), y-axis 0 to 120,000, for two workloads: make OpenSSH and TPC-C on SQLite

Page 37

Caching benefits (TPC-C / SQLite)

Throughput (tpmC):
• Local: 8,105
• NFS: 142
• CFFS: 817
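The relative numbers implied by the chart can be computed directly. This assumes the three values map to the bars in label order (Local, NFS, CFFS). The script only restates the chart and adds no new measurements.

```python
# Throughput read off the chart, in tpmC; value-to-bar mapping assumed
# to follow the label order Local, NFS, CFFS.
tpmc = {"Local": 8105, "NFS": 142, "CFFS": 817}

speedup_vs_nfs = tpmc["CFFS"] / tpmc["NFS"]       # CFFS relative to NFS
fraction_of_local = tpmc["CFFS"] / tpmc["Local"]  # CFFS relative to local disk

print(f"CFFS vs NFS: {speedup_vs_nfs:.2f}x")         # CFFS vs NFS: 5.75x
print(f"CFFS vs local disk: {fraction_of_local:.1%}")  # CFFS vs local disk: 10.1%
```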

Page 38

Scaling in AWS Lambda

Figure: 4k random reads, uncached vs. cached; x-axis 10 to 100, y-axis 0 to 180,000

Page 39

Returning to objections

It won’t scale: contention risks
• File length: must be checked on every read
• stat: mainly used to check the file length or permissions, but also returns modification time and access time

You don’t need it
• I think it will be pretty useful

Challenges here, optimistic they will be overcome

Page 40

How happy are we?

Page 42

CFFS Summary

Transactions are a natural fit for FaaS: BEGIN and END come from the function context

At-least-once execution goes well with optimistic transactions

On-demand filtered log shipping allows asynchronous cache updates

Overcomes limitations of FaaS & traditional shared file systems

Allows caching for lower latency, preserving consistency

Highly scalable, especially with snapshot reads

POSIX API enables vast range of tools and libraries
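The first two summary points can be illustrated with a hypothetical wrapper. It begins a transaction when a function starts, commits when it returns, and re-runs the body if commit-time validation fails. For brevity, the store validates read versions optimistically, whereas the slides say CFFS implemented timestamp ordering. `VersionedStore`, `Txn`, and `transactional` are illustrative names, not CFFS APIs.

```python
import functools
from typing import Callable, Dict, Tuple


class Conflict(Exception):
    """Commit-time validation failed; the transaction must be retried."""


class VersionedStore:
    """Hypothetical back end with optimistic (validate-at-commit) transactions."""

    def __init__(self) -> None:
        self.data: Dict[str, Tuple[int, int]] = {}  # key -> (version, value)

    def commit(self, reads: Dict[str, int], writes: Dict[str, int]) -> None:
        # A real back end would validate and apply atomically.
        for key, seen in reads.items():
            if self.data.get(key, (0, 0))[0] != seen:
                raise Conflict(key)
        for key, value in writes.items():
            version = self.data.get(key, (0, 0))[0]
            self.data[key] = (version + 1, value)


class Txn:
    def __init__(self, store: VersionedStore) -> None:
        self.store = store
        self.reads: Dict[str, int] = {}   # key -> version observed
        self.writes: Dict[str, int] = {}  # buffered until commit

    def get(self, key: str) -> int:
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.data.get(key, (0, 0))
        self.reads.setdefault(key, version)
        return value

    def put(self, key: str, value: int) -> None:
        self.writes[key] = value


def transactional(store: VersionedStore, max_attempts: int = 5):
    """BEGIN when the function starts, COMMIT when it returns, retry on conflict."""
    def wrap(fn: Callable):
        @functools.wraps(fn)
        def run(event: dict):
            for _ in range(max_attempts):
                txn = Txn(store)                        # BEGIN
                result = fn(txn, event)
                try:
                    store.commit(txn.reads, txn.writes)  # END
                    return result
                except Conflict:
                    continue                            # abort: drop buffers, rerun
            raise RuntimeError("too many conflicts")
        return run
    return wrap


if __name__ == "__main__":
    store = VersionedStore()
    attempts = {"n": 0}

    @transactional(store)
    def increment(txn: Txn, event: dict) -> int:
        attempts["n"] += 1
        value = txn.get("counter")
        if attempts["n"] == 1:
            # Simulate another function committing between our read and commit.
            store.commit({}, {"counter": 100})
        txn.put("counter", value + event["by"])
        return value + event["by"]

    print(increment({"by": 1}), store.data["counter"], attempts["n"])  # 101 (2, 101) 2
```

Re-running on conflict is safe only because the body's effects go through the transaction. Any external side effect would still have to be idempotent, which matches the at-least-once contract FaaS code already assumes.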