a comparison of compiler strategies for serverless...

IN DEGREE PROJECT COMPUTER SCIENCE AND ENGINEERING,SECOND CYCLE, 30 CREDITS

, STOCKHOLM SWEDEN 2020

A comparison of compiler strategies for serverless functions written in Kotlin

KIM BJÖRK

KTH ROYAL INSTITUTE OF TECHNOLOGYSCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

A comparison of compiler strategies for serverless

functions written in Kotlin

-

En jamforelse av kompilatorstrategier for

serverless-funktioner skrivna i Kotlin

Kim Bjork - [email protected] Intitute of Technology

Stockholm, Sweden

Supervisor: Cyrille Artho - [email protected]: Pontus Johnson - [email protected]

January 2020

Abstract

Hosting options for software have become more modifiable with time, from re-quiring on-premises hardware to now being able to tailor a flexible hosting solu-tion in a public cloud. One of the latest hosting solution option is the serverlessarchitecture, entailing running software only when invoked.

Public cloud providers such as Amazon, Google and IBM provide serverlesssolutions, yet none of them provide an official support for the popular languageKotlin. This may be one of the reasons why the performance of Kotlin in aserverless environment is, to our knowledge, relatively undocumented. This the-sis investigates the performance of serverless functions written in Kotlin whenrun with different compiler strategies, with the purpose of contributing knowl-edge within this subject. One Just-In-Time compiler, the Hotspot Java VirtualMachine (JVM), is set against an Ahead-Of-Time compiler, GraalVM.

A benchmark suite was constructed and two serverless functions were createdfor each benchmark, one run with the JVM and one run as a native image,created by GraalVM. The benchmark tests are divided in two categories. Oneconsisting of cold starts, an occurrence that arises the first time a serverlessfunction is invoked or has not been invoked for a longer period of time, causingthe need for certain start-up actions. The other category is warm starts, a runwhen the function has recently been invoked and the cold starts start-up actionsare not needed.

The result showed a faster total runtimes and less memory requirementsfor GraalVM-enabled functions during cold starts. During warm starts theGraalVM-enabled functions still required less memory but the JVM functionsshowed large improvements over time, making the total runtimes more similarto their GraalVM-enabled counterparts.

Sammanfattning

Mojligheterna att hysa (engelska: host) mjukvara har blivit fler och mer modi-fierbara, fran att behova aga all hardvara sjalv till att man nu kan skraddarsyen flexibel losning i molnet. Serverless ar en av de senaste losningarna.

Olika leverantorer av publika molntjanster sa som Amazon, Google och IBMtillhandahaller serverless-losningar. Dock har ingen av dessa leverantorer ettofficiellt stod for det populara programmeringsspraket Kotlin. Detta kan varaen av anledningarna till att sprakets prestanda i en serverless-miljo ar, sa vitt vivet, relativt okand. Denna rapport har som syfte att bidra med kunskap inomjust detta omrade.

Tva olika kompilatorstrategier kommer att jamforas, en JIT (Just-In-Time)-kompilator och en AOT (Ahead-Of-Time) -kompilator. Den JIT-kompilatorsom anvands ar Hotspot Java Virtual Machine (JVM). Den AOT-kompilatorsom anvands ar GraalVM.

For detta arbete har en benchmark svit skapats och for varje test i dennasvit skapades tva serverless-funktioner. En som kompileras med JVM och ensom kors som en fardig binar skapad av GraalVM. Testerna har delats upp itva kategorier. En dar alla tester genomgatt kallstarter, nagot som sker da detar forsta gangen funktionen kallas eller da det har gatt en langre tid sedanfunktionen kallades senast. Den andra kategorien ar da testet inte behover gaigenom en kallstart, da har funktionen blivit kallad nyligen. Korningen kan daundvika att genomga vissa steg som kravs vid en kallstart.

Resultatet visade att de tester som genomfordes inom kategorin kallstartervisade pa att kortiden var snabbare och att minnesanvandningen var mindre forde funktioner som kompilerats av GraalVM. I den andra kategorin, da testernainte behovde genomga en kallstart, kravde GraalVM-funktionerna fortfarandemindre minne men JVM-funktionerna visade pa en stor forbattring nar det komtill exekveringstid. De totala kortiderna av de tva olika kompilatorstrategiernavar da mer lika.

Contents

1 Introduction 31.1 Problem and Research Question . . . . . . . . . . . . . . . . . . . 41.2 Contributions and Scope . . . . . . . . . . . . . . . . . . . . . . . 51.3 Ethics and Sustainability . . . . . . . . . . . . . . . . . . . . . . 51.4 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Background 72.1 Serverless . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.1.1 The Attributes of Serverless . . . . . . . . . . . . . . . . . 72.1.2 Use Cases for Serverless Functions . . . . . . . . . . . . . 9

2.2 Kotlin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.3 Types of Compilers . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3.1 Ahead-of-Time Compiler (AOT) . . . . . . . . . . . . . . 122.3.2 Just-In-Time Compiler (JIT) . . . . . . . . . . . . . . . . 12

2.4 The JVM Compiler . . . . . . . . . . . . . . . . . . . . . . . . . . 122.5 The GraalVM Compilation Infrastructure . . . . . . . . . . . . . 142.6 Performing Benchmark Tests . . . . . . . . . . . . . . . . . . . . 152.7 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.7.1 Solutions Silimar to Serverless . . . . . . . . . . . . . . . 172.7.2 GraalVM at Twitter . . . . . . . . . . . . . . . . . . . . . 182.7.3 Benchmark Environment and the Cloud . . . . . . . . . . 18

2.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3 Method 203.1 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.1.1 Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.1.2 Response time . . . . . . . . . . . . . . . . . . . . . . . . 223.1.3 Memory consumption . . . . . . . . . . . . . . . . . . . . 223.1.4 Execution time . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.2.1 Real benchmarks . . . . . . . . . . . . . . . . . . . . . . . 233.2.2 Complementary benchmarks . . . . . . . . . . . . . . . . 24

3.3 Environment and Setup . . . . . . . . . . . . . . . . . . . . . . . 253.4 Sampling Strategy and Calculations . . . . . . . . . . . . . . . . 26

1

3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4 Result 284.1 Static metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.2 Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294.3 Application Runtime . . . . . . . . . . . . . . . . . . . . . . . . . 324.4 Response Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334.5 Memory Consumption . . . . . . . . . . . . . . . . . . . . . . . . 34

5 Discussion 365.1 Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

5.1.1 Cold start . . . . . . . . . . . . . . . . . . . . . . . . . . . 365.1.2 Warm start . . . . . . . . . . . . . . . . . . . . . . . . . . 37

5.2 Application Runtime . . . . . . . . . . . . . . . . . . . . . . . . . 385.2.1 Cold start . . . . . . . . . . . . . . . . . . . . . . . . . . . 395.2.2 Warm start . . . . . . . . . . . . . . . . . . . . . . . . . . 40

5.3 Response Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465.3.1 Cold start . . . . . . . . . . . . . . . . . . . . . . . . . . . 465.3.2 Warm start . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5.4 Memory Consumption . . . . . . . . . . . . . . . . . . . . . . . . 505.4.1 Cold start . . . . . . . . . . . . . . . . . . . . . . . . . . . 515.4.2 Warm start . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5.5 Threats to validity . . . . . . . . . . . . . . . . . . . . . . . . . . 53

6 Conclusion 556.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

6.1.1 Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556.1.2 Application Runtime . . . . . . . . . . . . . . . . . . . . . 566.1.3 Response Time . . . . . . . . . . . . . . . . . . . . . . . . 576.1.4 Memory Consumtion . . . . . . . . . . . . . . . . . . . . . 57

6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

2

Chapter 1

Introduction

Companies are constantly looking to digitize and are conceiving new use casesthey want to explore every day. This is preferably done in an agile and mod-ular way. The key factors to making this possible are a reasonable cost, fastrealization time and flexibility.

Hosting is an area that has followed this trend. As a response, to this needfor a more agile way of working, companies have moved from bare metal on-premises hosting to cloud hosting. In a cloud adaptation survey by IDG done in2018, 73 % of companies stated that they already had adopted cloud technologyand 17 % said they intended to do so within a year [1]. Another survey predictsthat 83 % of enterprise workload will be in the cloud [2].

By using cloud computing, companies can allocate just the amount of com-putation power they need to host their solutions. A cloud solution can alsoeasily be scaled up or down when the need changes. Cloud computing alsomakes it possible for small-scale solutions to be hosted with great flexibility andbe economically defensible.

The next step in this development toward more agile and modular hostingoptions could be claimed to be the serverless architecture. A serverless archi-tecture lets customers run code without having to buy, rent or provision serversor virtual machines. In fact a serverless architecture also relieves a client ofeverything that is connected to servers and more traditional hosting, such asmaintenance, monitoring and everything infrastructure related. All that theclients need to concern themselves with is the actual code. These attributesenables a more fine-grained billing method, where clients gets charge solely onthe resources used. These resources are the time it takes, as well as the memoryneeded, to execute the serverless function. The vendors providing the serverlesssolution, such as Amazon (AWS Lambda [3]) and Google (Google Cloud Func-tions [4]), also provide automatic scaling for their serverless solutions enablinga steady high availability. These are presumably among the top reasons whyserverless is rapidly increasing in usage. According to Serverless the usage ofserverless functions has almost doubled among their respondents from 45 % in2017 to 82 % in 2018 [5]. Notable is also that 53.2 % stated serverless technology

3

is critical for their job.During the growth of serverless architecture, cloud providers have added

support for more languages. AWS for example have gone from only supportingNode.js to now also supporting Python, Ruby, Java, Go and C# [6]. But onelanguage that is still lacking official support, from any cloud provider that offersa serverless solution, is Kotlin.

Kotlin is a programming language developed by JetBrains and was first re-leased in February 2016. Kotlin is mostly run on JVM but can also be compiledinto JavaScript or native code (utilizing LLVM) [7]. Despite being a newer lan-guage it has already gained a lot of traction by being adopted by large companiesand is currently used in production by Pinterest [8] and Uber [9] among others.Kotlin is also as of 7th May 2019 Google’s preferred language for Android appdevelopment [10] and has been among the top in “most loved” languages ac-cording to Stackoverflow developer survey reports the last years [11, 12]. One ofthe reasons for its popularity is Kotlins interoperability with Java, meaning itis possible to continue to work on an already existing Java project using Kotlin.Other popular attributes are the readability of Kotlin as well as its null safety,facilitated by the language’s ability to distinguish between non-null types andnullable types.

Seeing as Kotlin is such a widely used and favored language it would beof interest to developers and companies to continue utilizing their knowledgewithin the language in more parts of their work, such as in a serverless context.

The rest of this chapter contains an introduction to this thesis. It explainsthe problem that brought about the subject of this thesis and specifies whatresearch question that is aimed to be answered. Moreover, this chapter also in-cludes the intended contribution as well as a section covering ethics and sustain-ability connected to this thesis. Concluding this chapter is a section describingthe outline of this report.

1.1 Problem and Research Question

Kotlin is a popular language that is increasing in usage, however, it is notyet officially supported in a serverless solution provided by any public cloudprovider. Since the serverless architecture is also being utilized more, companiesmight be looking to apply their knowledge of a language that they already know.It could also be that a company already has an application written in Kotlinthat they would like to convert into a serverless function.

Since Kotlin is able to run on the JVM, it is possible to package a Kotlinapplication as a jar file and run it as a serverless function. However, is that theoptimal option? An application written in Kotlin could likewise be convertedinto a native image and run as a serverless function.

Since the payment plan of serverless solutions are based on resource usage,where every millisecond is counted and billed for, there is a possible cost savingfactor to be had from optimizing the serverless function´s execution.

The aim of this thesis is to find out how Kotlin performs in a serverless

4

environment and what is the best way to run a serverless function written inKotlin. From this statement two research questions can be extracted:

• What is the difference, if any, of running a serverless function writtenin Kotlin with a Just-in-Time compiler compared to running the samefunction as a binary?

• How does cold starts affect the performance of a serverless function writtenin Kotlin? Does it matter if the function is run with a JIT compiler or asa binary?

1.2 Contributions and Scope

Kotlin is not officially supported by any public cloud provider that offers server-less solutions. To the best of our knowledge, there exists scant knowledge onhow Kotlin performs in a serverless environment. This thesis aims to contributemore knowledge on this subject. The expanded knowledge could serve as a foun-dation, should a company be looking into utilizing Kotlin for writing serverlessfunctions. The work done for this thesis could both give information about theperformance of serverless functions written in Kotlin in general but also pro-vide an understanding about what is the better way to run a serverless functionwritten in Kotlin.

Only one public cloud provider will be tested. The reason being that thereis only one public cloud provider, Amazon, that offers the possibility of customruntimes.

The runtimes that will be compared for this thesis is the JVM and GraalVM.The JVM will represent JIT compilers and GraalVM will represent AOT com-pilers.

1.3 Ethics and Sustainability

From a sustainability standpoint the cloud and the serverless architecture areboth environmentally defensible. To begin with, users of the cloud do not haveto buy their own hardware. This means that they also do not have to estimatehow much computation power they need and therefore the risk of buying morethan what they actually need is eliminated. Since computer power in shared inthe cloud, the usage of the cloud´s resources can be optimized where the samehardware used by one client one day can be used by another client another day.This entails power savings and less impact on the environment.

Due to serverless being more lightweight than other more traditional hostingoptions it is also more attainable. Clients can host their application at a lowerprice which means more have the opportunity to host applications.

There is a given ethics perspective to this thesis, as with any investigatingreport. It is of great importance that the work being performed is unbiased.One way to create a conviction of this is to only use open source code and

5

tools available to the general public to ensure repeatability. Results will also bereported in their raw form to ensure readers have the opportunity to performtheir own calculations or verify the ones presented in this thesis.

1.4 Outline

Chapter 2 contains necessary background information, such as explanations re-garding the different compilers and an ingoing clarification about what a server-less architecture is and what it entails. This chapter also contains researchconcerning related work.

Chapter 3 incorporates a description of the methodology used to perform thework done for this thesis. It includes how and what benchmarks where chosen.It also gives an explanation to what metrics where used and why there wherechosen.

The result of the work is presented in Chapter 4 and a discussion regardingthe result can be found in Chapter 5. Finally Chapter 6 contains the conclusionsdrawn from the result and the discussion, it also contains a section for possiblefuture work.

6

Chapter 2

Background

This chapter contains useful background information about this thesis mainsubjects: serverless, Kotlin, compilers and benchmarks. It also incorporates asection that presents and discusses related work. A summary of the chapterskey points concludes the chapter.

2.1 Serverless

Serverless is a concept that was first commercialized by Amazons service AWSLambda in 2014 [13], the company was the first public cloud provider to offerserverless computing in the way it is known today. Since then serverless hasgained a great deal of traction. Google [14], IBM [15] and Microsoft [16] nowalso provide their own serverless services.

Serverless refers to a programming model and an architecture, aimed toexecute a modest amount of code in a cloud environment were the users do nothave any control over the hardware or software that runs the code. Despite thename, there are still servers executing the code, however, the servers have beenabstracted away from the developers, to the point where the developers do notneed to concern themselves with operational tasks associated with the server,e. g., maintenance, scalability and monitoring.

The provided function is executed in response to a trigger. A trigger is anevents that can arise from various sources, e. g., a database change, a sensorcaption, an API call or a scheduled job. After the trigger has been received,a container is instantiated and the code, provided from the developer, is thenexecuted inside that container.

2.1.1 The Attributes of Serverless

A serverless architecture does not imply the same infrastructural-related con-cerns and dilemmas, such as capacity, scaling and setup, as more traditionalarchitectures do. This enables developers to acquire a much shorter time to

7

market. A very important factor in the software development industry, wherechanges are happening at a rapid pace and where market windows can openand close fast. The ability to launch code quickly also enables prototypes to becreated and tested at a lower cost and therefore at a lower risk. Furthermore,this benefit implies that there can be a larger focus on the product itself. Giv-ing developers the opportunity to concentrate on application design and newfeatures instead of spending time on the infrastructure.

Cloud providers that offers a serverless solution charge only for what thefunction utilizes, in terms of execution time and memory. The owner of thefunction is therefore only billed when the function is invoked. This entails, giventhe right use case, that the infrastructural cost can be reduced, when comparedto a more traditional hosting option. Since there is no need to maintain ahosting solution, it can also lead to developers being able to take over the entiredeployment chain, rendering the operations role more obsolete and in extentenable an additional cost saving factor.

A serverless solution can bring many benefits to a project, however, it isnot an appropriate solution for all projects. A function is executed only whentriggered by an event, nothing is running when the function is not needed.The result of this is that when a function is invoked for the first time, or aftera long time of no invocations, the cloud provider needs to go through moresteps in order to start executing the function. An invocation of this type iscalled a cold start. During a cold start at Amazon AWS Lambda the additionalphases that needs to be executed, before the invoked function starts executing,are: downloading the function, starting a new container and bootstrapping theruntime [17]. The outcome of this is that during a cold start the execution time,and in extent the response time, will be noticeably longer. A longer executiontime also entails a greater cost.

To prevent cold starts, and spare end users of a long response time, one op-tion is to trigger the function periodically to keep it ”warm”. Amazon providesone such solution, Cloud Watch [18], were it is possible to schedule triggers witha certain interval. There are also third party tools serving as warmers [19] [20].Some tools also analyzes the usage of a serverless function and claims to predictwhen a trigger is needed, making the function warm upon real invocations [21].Keeping a function warm may be an option for functions that are being triggeredfairly often or if its being triggered with some predictable consistency. Other-wise there is a possibility that the warm up triggers deplete the cost savings thata serverless solution otherwise would provide. In that case a more traditionalhosting solution might be a better option, if response time is a decisive factor.

Another approach, to reducing the impact of a cold start, is to reduce theresponse time during a cold start. In this case there are two parameters that areconfigurable. The first parameter is the code. One way to optimize the code is tocarefully choose programming language, different languages have varying startup times [22]. Furthermore, Amazon recommends a few design approaches thatcould help optimise the performance of a function. Amazon suggest avoidinglarge monolithic functions and instead divide the code into smaller more special-ized functions. Only loading necessary dependencies rather then entire libraries

8

is also good practice. Amazon also recommends using language optimizationtools such as Browserfy and Minify [17]. If an AWS Lambda function is readingfrom another service Amazon emphasises the importance of only fetching whatis actually needed. That way both runtime and memory usage can be reduced.The second parameter, that is configurable, is the runtime, which will be thefocus of this thesis.

Resource limitations are, like cold starts, a restraint to serverless solutions.Cloud providers limits the resources a function can allocate. However, AmazonAWS Lambda has been continuously increased these limits. In November 2017they doubled the memory capacity, from 1.5 GB to 3 GB, that a lambda functioncan allocate [23]. In October 2018 they tripled the time limitation from 5 to 15minutes per execution [24]. There is a possibility this trend will continue in thefuture and facilitate additional use cases for serverless functions.

In a serverless architecture a third party, the public cloud provider, has takenover a great deal of the responsibility related to hosting compared to a moretraditional architecture. This entails that a great deal of trust has to be placedupon the provider. Especially since a serverless solution implies a vendor-lock-in, where a migration can be problematic and require multiple adjustments, dueto the code not only being tied to specific hardware but also to a specific datacenter.

Further trust also has to be put onto the cloud provider on account of se-curity. In a public cloud, where many users’ arbitrary functions are running atthe same time, security has to be a high priority in order to prevent intersectionof remote procedure calls and insuring container security.

To fully take advantage of the benefits, as well as avoid consequential im-plications from various drawbacks, a serverless solution can bring, it can beconcluded that not just any use cases can be applied in a favorable way.

2.1.2 Use Cases for Serverless Functions

For many applications, from a functionality perspective, a serverless architec-ture and more traditional architectures could be used interchangeably. Otherfactors, such as the solutions need for control over the infrastructure, cost andthe applications expected workload, are determining when considering using aserverless architecture.

From a cost perspective, serverless performs well when invocations occursin bursts. This since a burst implies many invocations happening close to eachother, time wise, and therefore entails that only the first execution will haveto go though a, more expensive, cold start. The other calls in the burst willthereafter use the same container and will therefore execute faster. When theburst has ended a serverless architecture will let the infrastructure scale downto zero, during which time there is no charge.

Computation heavy applications could, under the right circumstances, alsobe a good fit since the cost of other infrastructure solutions grow in proportionto computer power needed. However, to keep in consideration is that if a publiccloud provider is used, limitations on computing exist, such as memory and

9

time limits. This could mean that, from a performance perspective, a compu-tation heavy application might not be an appropriate use case for a serverlessarchitecture.

From a developer perspective serverless would be a good option in the caseswhen the drawbacks of lacking control over the infrastructure is outweighed bythe fact that there is no need for maintaining the infrastructure or worry aboutscaling.

Based on the characteristics and limitations of a serverless architecture, suchas the basis for the cost and resource limitations, the general usage for a server-less solution has a few common characteristics: lightweight, scalable and single-purposed.

IoT and mobile device backend

When it comes to IoT and backend solutions for mobile devices, a serverlessapproach could be advantageous. It could offload burdens from a device withlimited resources, such as computer power and battery time. Internet connectionis also a limited resource on an IoT and mobile device. By using a serverlesssolution as an API aggregator, required connection time could be reduced dueto a reduced number of API calls.

There could also be a benefit from a developer perspective since mobile ap-plications are developed by mostly front-end skilled people, some may thereforelack the experience and knowledge of developing back-end components. Creat-ing a serverless back end simplifies both its creation and set up, as well as elim-inates the need for maintenance. All this enables mobile apps and IoT devicesthat are fast and consistent in their performance independent of unpredictablepeak usage.

iRobot, the developers of the internet-connected Roomba vacuums, is one ofthe companies that are using a serverless architecture as an IoT backend [25].

Event triggered computing

Event driven applications are ideal for a serverless architecture. AWS Lambdahas many ways a user can trigger its functions. One of them is events thathappens in there storage solution S3.

One company that is taking advantage of this solution is Netflix. Before avideo can be streamed by end users, Netflix needs to encode and sort the videofiles. The process begins with a publisher uploading their video files to Netflix’sS3 database. That triggers Lambda functions that handles splitting up the filesand processes them all in parallel. Thereafter Lambda aggregates, validates andtags the video files before the files are ultimately published [26].

Another company that is also utilizing the same type of solution is AugerLabs, that focuses on custom-branding apps for artist. Auger Labs founder andCEO’s intent has been to remain NoOps, where no configuration or managingof back-end infrastructure was needed. Among other use cases, Auger Labsare using their serverless architecture of choice, Google’s Cloud Functions, in

10

combination with Googles Firebase Storage. When an image is uploaded totheir storage a function is triggered to create thumbnails in order to enhancemobile app responsiveness. They also use Cloud Functions to send notificationsvia Slack to handle monitoring [27].

Scaling solutions

Since scaling is handled automatically, developers do not have to worry abouthow the infrastructure is going to perform in case an expected, or unexpected,burst of requests occurs. The service provider will make sure to start enoughcontainers that can support all the heavy traffic being generated.

Hosting an application with a lower everyday usage, where heavy spikesoccurs very rarely, could lead to a high hosting cost, where the clients will payfor computer power that is unused most of the time, in order to maintain highavailability even at the spikes.

One such use case is presented by Amazon, regarding Alameda County inCalifornia. Their problem was a huge spike in usage during the elections. Theirprevious solution included on-premises servers that did not measure up. Bymoving to the cloud and utilizing Lambda AWS, the application could easilyscale at a satisfactory rate. Alameda County could avoid buying more expensivehardware that would not be used the rest of the year, at the same time as theycould serve all their users during their peak [28].

2.2 Kotlin

Kotlin is a statically typed programming language developed by JetBrains andwas first released in February 2016. Kotlin is most commonly run on the JVMbut can also be compiled into JavaScript or native code (utilizing LLVM) [7].

Despite being a newer language it has already gained a lot of traction. Largecompanies such as Pinterest [8] and Uber [9] are currently using Kotlin in pro-duction. Kotlin is also as of May 2019 Google’s preferred language for Androidapp development [10] and has been among the top in “most loved” languagesaccording to Stack Overflow developer survey reports the last years [11, 12].

The reasons behind Kotlins success may be many. One contributor could beits interoperability with Java. It is possible to continue to work on an alreadyexisting Java project using Kotlin. Other praised features are Kotlins readabilityas well as its null safety, facilitated by the languages ability to distinguishingbetween non-null types and nullable types.

2.3 Types of Compilers

A compiler is a program that translates code written in a higher level languageto a lower level language in order to make the code readable and executable bya computer. A compilers type is defined by when this translation is made. An

11

Ahead-of-Time compiler performs the conversion before the code is run while aJust-in-Time compiler translated the high level code at runtime.

2.3.1 Ahead-of-Time Compiler (AOT)

An Ahead-of-Time compiler does precisely what the name suggests, it compilescode ahead of time, i.e, before runtime. When an application is compiled withan AOT compiler no more optimizations are done after the compilation phase.

There are both benefits and drawbacks to an AOT compiler. One benefitis that the runtime overhead is smaller since there is no optimizations duringruntime. It is therefore also possible that an AOT compiled application is lessdemanding when it comes to computer resources such as RAM. The drawbackto this is that the compiler knows nothing about the workload of this applicationor how it will be used. There is therefore a risk that the compiler spends timeon optimization of, for example, methods that are rarely used.

2.3.2 Just-In-Time Compiler (JIT)

A Just-in-Time compiler offers a dynamic compilation process. Meaning blocksof code are translated into native code during runtime rather than prior toexecution like an AOT compiler [29].

A JIT compiler optimizes code during runtime using profiling. Meaning thatthe program is analysed to determine what optimizations would be profitableto carry out. A JIT compiler will therefore perform well informed optimizationsand will not waste time on compiling parts of an application that wont lead toan increase of performance. Examples of metrics that a JIT profiler is based onis method invocation count and loop detection [30]. A high method invocationcount means that method is a good candidate for compilation into native codeto speed up execution. Loops can be optimized in many ways, one favorableway is to unroll a loop. An unrolling of a loop entails an increase of operationsperformed by each iteration of the loop. Steps that would be performed insubsequent iterations are merged into earlier iterations.

The drawback to these specialized optimisations is the fact that executiontime during the first runs will be longer. Performance will however improve overtime as more parts of the code gets translated into native code and the compilergets more execution history to base its optimizations on.

2.4 The JVM Compiler

Even though all CPU’s are very similar, e. g., have the same functionalities suchas perform calculations and control memory access, programs that are designedfor one CPU can not be executed on another. The developers of the Javaprogramming language wanted a solution to this problem. They decided todesign an abstraction of a CPU, a virtual computer that could run all programswritten for it on any system, the result was the Just-in-Time compiler: Java

12

Virtual Machine (JVM). This idea was the basis for the slogan created for Javaby the developer of Java, Sun Microsystems: write once, run anywhere.

Another benefit facilitated by the JVM’s abstraction of a CPU is the abstractview of the memory that JVM has. Since the JVM treats the memory as acollection of objects it has more control over which programs that are allowedto access which parts of the memory. That way the JVM can prevent harmfulprograms accessing sensitive memory.

The JVM also include an algorithm called verification that contains rulesevery program has to follow and aims to detect malicious code and prevent itfrom running [31]. This algorithm is one of the three cornerstones of the JVM,stated in the Java Virtual Machine Specification [32]:

• An algorithm for identifying programs that cannot compromise the in-tegrity of the JVM. This algorithm is called verification.

• A set of instructions and a definition of the meanings of those instructions.These instructions are called bytecodes.

• A binary format called the class file format (CFF), which is used to con-vey bytecodes and related class infrastructure in a platform-independentmanner.

The JVM was developed primarily for the Java programming language butthe JVM has the possibility to execute any language that can be compiled intobytecode. The JVM, in fact, knows nothing of the Java programming language,only of the binary format CFF, that is the result of compiled Java code.

Some of the more well-known languages that can be executed by the JVMaside from Java, are Kotlin, Scala and Groovy [33]. These languages, and allothers that can be executed on the JVM, also gets the JVM’s benefits, such asits debugging features and garbage collection, that prevents memory leaks.

The most used JVM is the Java Hotspot Performance Engine that is main-tained and distributed by Oracle and is included in their JDK and JRE. TheHotspot JVM continuously analyses the program for code that is executed re-peatedly, so called hot spots, and aims to optimize these blocks, aspiring tofacilitate a high-performance execution.

The Hotspot JVM has two different flavors, the Client and Server VM. Thetwo modes run different compilers that are individually tuned to benefit thedifferent use cases and characteristics of a server and a client application. Com-pilation inlining policy and heap default are examples of these differences.

Since characteristics of a server include a long run time, the Server VMaims to optimize running speed. This comes at the cost of slower start-uptime and larger runtime memory footprint. On the opposite side, the ClientVM does not try to execute some of the more complex optimizations that theServer VM performs. This enables a faster start-up time and is not as memorydemanding [34].

13

2.5 The GraalVM Compilation Infrastructure

GraalVM is a compilation infrastructure that started out as a research projectfrom Oracle and was released as a production ready beta in May 2019 [35].

GraalVM contains the Graal compiler that is a dynamic JIT, just-in-time,compiler that is utilizing novel code analysis and optimizations. The compilertransforms byte code into machine code. GraalVM is then dependent on a JVMto install the machine code in. The JVM, that is used, also needs to supportthe JVM Compiler Interface in order for the Graal compiler to interact withthe JVM. One that does this is the Java Hotspot VM that is included in theGraalVM Enterprise Edition.

Before the Graal compiler translates the bytecode into machine code it isconverted into an intermediate representation, Graal IR [36]. In this represen-tation optimizations are made.

One goal of the GraalVM is to enable performance advantages for JVM-based languages, such as minimizing memory footprint through its ability toavoid costly object allocations. This is done by a new type of Escape Analysisthat instead of using an all-or-nothing approach uses Partial Escape Analy-sis [37]. A more traditional Escape Analysis would check for all objects thatare accessible outside its allocating method or thread and move these objectsto the heap in order to make them accessible in other contexts. Partial EscapeAnalysis, however, is a flow-sensitive Escape Analysis taking into account if theobject only escapes rarely, for example in one single unlikely branch. PartialEscape Analysis can therefore facilitate optimizations in cases a traditional Es-cape Analysis can not, enabling memory savings. During an evaluation done ina collaboration between Oracle Labs and the Johannes Kepler University theysaw a memory allocation reduction of up to 58.5 % and a performance increaseof 33 % [37]. Notably they also saw a performance decrease of 2.1 % on one par-ticular benchmark, indicating, not surprisingly, that Partial Escape Analysis isnot the best solution in every case. But overall all other benchmarks had anincrease in performance and a decrease in memory allocation.

Another goal of GraalVM is to reduce start up time of JVM-based applica-tions. This through a GraalVM feature that creates native images, that per-forms a full ahead-of-time (AOT) compilation. The result being a native binarythat contains the whole program and is ready for immediate execution. By thisGraal states that the program will not only have a faster startup time, but alsohave a lower runtime memory overhead when compared to a Java VM [38].

With the help from the language implementation framework Truffle, GraalVMis not only able to execute JVM-based languages. JavaScript, Python and Rubycan also be run with the GraalVM compilation infrastructure [39]. LLVM-basedlanguages such as C and C++ can also be executed by GraalVM thanks to Su-long [40]. Since the GraalVM ecosystem is language-agnostic, developers cancreate cross-language implementations where they have the ablility to chooselanguages based on what is suitable for each component.

14

2.6 Performing Benchmark Tests

In the field of benchmarks, much research has been done and several opensource benchmark suites have been constructed. There are multiple suites tar-geting the Java Virtual Machine, e .g., SPECjvm2008 [41], DaCapo [42] andRenaissance [43]. Dacapo was developed to expand the SPECjvm2008 suiteby targeting more modern functions [44] and the Renaissance suite focused onbenchmarks using parallel programming and concurrent primitives [45].

Looking at the thought processes behind building these suites, certain com-mon requirements can be identified. Only open source benchmarks and librarieshave been selected. One of the benefits of this is that it enables inspection of thecode and the workload. Diversity is also a common attribute these benchmarksuites are striving for, a good feature in principle but one that is harder to putinto practice. The Renaissance suites interpretation and approach to achievediversity is to include different concurrency related features of the JVM. Object-orientation is also mentioned as an important factor in the Renaissance suite,since that will lead to an exploit of the JVM parts that are responsible for ef-ficient executions of code patterns commonly associated with object orientedfeatures, e.g., frequent object allocation and virtual dispatch. The developersof the DaCapo suite strived to achieve diversity through maximizing coverageof application domains and application behavior.

Another type of benchmark suite is The Computer Language BenchmarksGame [46]. The aim of the suite is to provide a number of algorithms written indifferent languages. Kotlin, however, is not one of them. It is used, for example,in an evaluation of various JVM languages made by Li et al. [47]. From thissuite the authors categorized the benchmarks, depending on if the programmostly manipulated integers, floating-point numbers, pointers or strings. TheComputer Language Benchmarks Game has also been used by Schwermer [48].In his paper a subset of benchmarks was chosen. One benchmark for each type ofmanipulation focus, e.i, integers, floating-point numbers, pointers and strings.The chosen benchmarks where translated to Kotlin to be compared with theJava implementation provided by The Computer Language Benchmarks Game.The Kotlin translated suite will serve as a complementary part of the benchmarksuite used in this thesis.

When creating a benchmark suite, preferably there would exist a tool in thelikes of the one described by Dyer et al. [49], that is under construction. A toolis describes where it would be possible to search for open source benchmarksgiven certain requirements and where researchers could contribute with theirown benchmarks. The vision being faster and more transparent research.

Traditionally, performance tests are run in dedicated environment where asmuch as possible is done to minimize external impact on the result. Factorssuch as hardware configurations are ensured to be kept static, all backgroundservices are turned off and the machine should be single tenant. None of thiscan be found in a serverless solution hosted in a public could. Configurations areunknown and made by the cloud provider and the machines hosting the functionsare exclusively multi tenant. This entails an unpredictable environment where

15

there always will be uncertainties. The benefit, however, to perform tests inthe public cloud, is that it is easy to set up and at a low cost, where a moretraditional approach would mean a higher cost and an environment that requiresa high amount of effort to maintain.

A study by Laaber et al. [50] investigates the effect of running micro bench-marks in the cloud. The focus of their study consisted of measuring to whatextent slowdowns are detected in a public cloud environment, where the testswere run on server instances hosted by different public cloud providers.

One of the problems the authors address is that the instances might be up-graded by the provider between test executions, which can result in inexplicabledifferences in the results. However, if tests are done during a short period oftime, to avoid such changes by the provider, the results will only represent aspecific snapshot of the public cloud. It can then be argued that tests run overa longer period, e.g, a year, would result in a better representation. However,this large amount of time is, in many cases, an unobtainable asset.

The authors also mention the difference between private and public cloudtesting and emphasizes that the two can not be compared. This due to thepossibility of noisy neighbours in a public cloud but also due to hardware het-erogeneity [51], where different hardware configurations are used for instanceswith the same type.

Furthermore, the authors acknowledge that even though it is possible tomake reasonable model assumptions about the underlying software and hard-ware in a public cloud, based on literature and information published by theproviders, when experiments are done in the public cloud the cloud providershould always be considered a black-box that cannot be controlled.

The paper concludes that slowdowns of below 10 % can be reliably detected77–83 % of the time and the authors therefore considers micro benchmark ex-periments possible in instances hosted in a public cloud. They also concludedthat there was no big differences between instance types for the same provider.

According to Alexandrov et al. [52] there are four key factors to buildinga good benchmark suite and running a the benchmarks in the cloud is (1)meaningful metrics, (2) workload design, (3) workload implementation and (4)creating trust.

When considering meaningful metrics, the example of runtime is given as anatural and undebatable metric. Furthermore, cost is discussed as an interestingfactor but mostly relevant in research that is meant as support for businessdecisions. Although the cloud can be seen as infinitely scalable, it is only anillusion and therefore throughput can be seen as a valuable metric.

The workload has to be designed with the metrics in mind. Where theapplication should be modeled as a real world scenario with a plausible workload.

One of the important factors mentioned, when it comes to workload im-plementation, is the workload generation. The recommendation is that this isdone by pseudo-random number generators to ensure repeatability. A pseudo-random number generator also has the benefit of being much more accessiblethan it would be to gather the same amount of real data.

Creating trust is considered especially important when it comes to running

16

benchmark tests in the public cloud. The reason being the public clouds black-box property. As a client of a public cloud one can never be certain about theunderlying software or hardware. To create trust, the authors recommend exe-cuting previously mentioned aspects well, along with choosing a representativebenchmark scenario.

2.7 Related Work

In this section, previous work that relates to the work that will be done for thisthesis will be discussed. It starts with a discussion about solutions that havesimilar attributed as the serverless architecture. Followed by a section abouthow GraalVM is used at Twitter. To conclude this topic, about related work,there is a section covering how benchmark environments affect the result andwhat is thought of running benchmarks in the cloud.

2.7.1 Solutions Silimar to Serverless

The idea to start a processes only once called upon, is not unique for the server-less architecture. Super-servers, or a service dispatchers, is based on the sameprinciple. A super-server is a type of daemon which job is to start other serviceswhen needed. Examples of super-servers are: launchd, systemd and inetd.

inetd is an internet service deamon in Unix systems that was first introducedin 4.3BSD, 1986 [53]. The inetd super-server listens to certain predefined portsand when a connection is made upon one of them inetd starts the correspondingservice what will handle the request. These ports support the protocols TCPand UDP and examples of services that inetd can call are FTP and telnet. Forservices that do not expect high loads, this solution is a favorable option. Thisis due to the fact that such services do not have to run continuously, resulting ina reduced system load. Another benefit is that the services connected to inetddoes not have to provide any network code since inetd links the socket directlyto the service’s standard input, standard output and standard error.

To create an inetd service, developers only need to provide the code andspecify where the file, containing the code, will be located and which port shouldtrigger the service.

In similarity to the serverless architecture, not needing to care about serversis also a principle of agent-based application mobility where an application iswrapped by a mobile agent that has full control over the application. Themobility of the agent lets the application migrate from one host to another,where the application can resume its execution [54]. Instead of abstractingaway the server from the developer, like in the serverless solution, this approachlets the developer implement services and avoid servers all together.

Although this approach can bring many benefits, such as reduced networkload and latency due to local execution, agent-based application mobility alsohas its drawbacks. One of the drawbacks is the high complexity of developingthe application. The application needs to be delicately designed in order to be

17

device-independent and have the ability to be migrated between devices [55].The solution many applies to this problem is to use an underlying infrastructureor middleware [56, 57].

2.7.2 GraalVM at Twitter

Despite GraalVM only having a beta release, Twitter is already using it inproduction. Their purpose to adopting GraalVM is to save money from thedecrease in computer power needed. Another motivation was that the HotspotServer VM is old and complex while GraalVM is easier to understand [58].

By switching to GraalVM the VM-team at Twitter saw a decrease of 11 % ofused CPU-time in their tweet service, compared to running the Hotspot ServerVM. Twitter also discovered that they could decrease CPU-time further by tun-ing some of GraalVM’s parameters. One of these parameters where TrivialInlin-ingSize. Graphs with less nodes than the number represented by this parameterwould always be inlined. With their machine learning based tuner, Autotuner,that automatically adjusts these parameters, CPU-time was dropped another6 % [59].

To take into consideration is that the Hotspot JVM is tuned to the Javalanguage and Twitter is mainly using Scala in their services. The same codebase written in Java might not have produced the same dramatic improvements.

2.7.3 Benchmark Environment and the Cloud

When analysing the result of this thesis, it is important to take into consid-eration impacting error sources. One such error source is the hardware thefunctions will be running on. Since there will be no indication of what CPU isused for any execution, nothing could be said about its impact on performance.In a runtime comparison made by Hitoshi Oi three different processors whereused [60]. All three was made by Intel and based on the netburst microarchi-tecture but had different clock speed and cache hierarchies. Despite being fromthe same manufacturer and based on the same architecture, varied performancecould still be seen in almost all use cases. In AWS, no guarantee is given thatany feature of the different processors used will be the same. This fact and thestudy made by Hitoshi Oi gives an indication on the possible impact this factorcan have on the results.

This is further emphasised in a conference where John Chapin shares hisinvestigation into AWS performance [61]. Among other topics he speaks aboutthe difference in performance in relation to how much memory the user spec-ifies as maximum. Since AWS Lambda allows CPU allocation in proportionto maximum memory usage specified, it would be logical that lower amount ofmemory allocated always would lead to an inferior performance. However, inChapin’s experiments he found that this is not always the case. In some in-stances he got almost the same performance independent of available memoryallocation. He draws the conclusion that this is connected to the randomness ofthe container distribution. Some containers may be placed on less busy servers

18

and can therefore deliver better performance. This emphasises the importanceof rigorous performance testing; where the testing is well, time wise, distributedto get the best possible representation of the overall performance of the givenfunction in the public cloud.

A comparison of public could providers by Hyungro Lee et al. can give anindication as to how AWS will perform when testing its throughput [62].

Martin Maas et al. suggests that runtimes used in the serverless contextshould be rethought [63]. This based on the fact that most runtimes today isnot optimized for the modern cloud related use cases. They envision a genericmanaged runtime framework that supports different languages, front ends andback ends, for various CPU instruction sets, FPGAs, GPUs and other acceler-ators. Graal/Truffle is mentioned as a good example of a framework that cancreate high performance and maintainability by its ability to execute severaldifferent languages.

2.8 Summary

Serverless refers to a programming model and an architecture, aimed to executea modest amount of code in a cloud environment were the users do not haveany control over the hardware or software that runs the code. The servers areabstracted away from the developer and the only thing the developer needs tobe concerned about is the code. Every task related to maintaining servers aretaken care of by the cloud provider. Therefore solutions that requires scaling,for example, are a good fit for the serverless architecture.

The provided code only runs when the serverless function is invoked. Mean-ing that there is nothing running connected to the function when it is notinvoked. This also entails that for the first time, and every time the functionhas not been invoked for a while, start-up actions, such as starting a container,needs to be performed. An execution containing these start-up actions is saidto have gone through a cold start, otherwise it is a, so called, warm start.

There are two types of compilers compared in this thesis a Just-in-Time (JIT)compiler and an Ahead-Of-Time (AOT) compiler. An AOT compiler compilescode before it is run and creates an executable file. The AOT compiler used inthis thesis is GraalVM that started out as a research project from Oracle. Itwas released as a production ready beta May 2019. A JIT compiler compilesthe code during runtime. The JIT compiler used in this thesis is the HotspotJVM that is maintained and distributed by Oracle.

When running benchmarks, dedicated and isolated environments are usuallyused to minimize external impact on the results. The public cloud is eminentlyunlike such an environment. One reason being that the hardware and its con-figurations are hidden from the user. The fact that the public could is sharedalso enables the possibility of a neighbour having an effect on the performanceof one’s function. These factors have to be taken into account when analysingthe result.

19

Chapter 3

Method

A benchmark suite was created for this thesis. For every benchmark two cor-responding serverless functions were implemented in Amazon Web Service’sserverless solution Lambda. One that runs with the hotspot JVM providedby Amazon and one that runs as a native image created with the tool GraalVMCE. These functions were then invoked trough the AWS’s command line inter-face. The commands were run locally to simulate a more real world scenariowhere network latency can impact the result. All programs returns a JSONcontaining information about the execution.

We grouped the test into two categories, one that contains the executionsthat went through a cold start and one that contains the executions that reusealready started containers, warm starts.

The arithmetic mean of the different metrics were calculated along with atwo sided confidence interval to be able to analyse the results fairly.

This chapter describes, more in detail, how the work for this thesis wascarried out and the motivations behind the choices made. The last section ofthis chapter contains a summary containing the chapters key points.

3.1 Metrics

The metrics focused on in this thesis are mainly dynamic metrics [64]. Meaningmetrics that are to a higher degree based on the execution of code rather thanthe code itself [65]. This is due to the fact that the interest of this thesis lieswithin the performance of code given different runtimes. What applicationsare used and what techniques that were used developing them, factors that areconnected to static metrics, are secondary. Some static metrics will, however,be collected.

The static metrics used in this thesis is chosen with the purpose to givethe reader an indication of the overall size of the different benchmarks. Fourdifferent static metrics will be documented. Two of them are the sizes of theJVM and the GraalVM function, collected from Amazon Console. The other

20

two are lines of code and the number of Kotlin files.The dynamic metrics chosen for this thesis is based on what would be of

interest to a developer that is considering using Kotlin in a serverless context.We hypothesised that the factors a developer would be most interested in are acomparison of performance as well as cost.

When performance of software is measured, one of the most interesting ele-ments to attain is knowledge about how much resources are being used. Sincecost, in this case, is exclusively based on resources used, there is no need to addspecific cost-related metrics. The second element of interest is what is causingthese resource allocations. An example of a factor affecting the performance ofa program is garbage collection.

In this thesis the public cloud is used, that can be seen as a black box sinceusers can not be certain what environment their code is executed in. This entailsthat there are a large amount of factors that can affect the performance of thefunctions, such as hardware configuration. A choice have therefore been madeto only focus on measuring the resources that are being used and not measurefactors that are believed to cause these performance changes.

The resources that will be measured are latency, application runtime, re-sponse time as well as memory consumption. In Figure 3.1 we can see anillustration of the dynamic metrics which are measured in time.

Figure 3.1: An illustration of the metrics measured in time

3.1.1 Latency

Latency is measured by subtracting the start time, recorded by the locallyexecuted script invoking the Lambda function, from the start time recorded bythe function that is returned in the response JSON.

Latency can be important in cases were data becomes stale fast, it is there-fore important that the data gets processed quickly. One example of this is a

21

navigation system that gets location data from a car and needs to update itsdirection accordingly.

3.1.2 Response time

Response time is measured by subtracting the start time recorded by the invo-cation script from the end time recorded at the time the response is returnedfrom Amazon. Response time is a meaningful metric in multiple use cases.

One example is user interfaces. In one study from 1968 [66] and a compli-mentary study from 1991 [67], three types of limits for human and computerinteractions are summarized. For a user to experience that a system is react-ing instantaneously the requested result should be delivered within 0.1 second.To insure a user’s continuous, uninterrupted thought process the response timeshould not exceed 1.0 second. If the response time surpass a limit of 10 secondsusers will want to switch to another task during the execution.

Even thought these studies are written several decades ago there is no indi-cation that users should have raised there tolerance. With faster internet speedsand more powerful computers the opposite are presumably more truthful.

3.1.3 Memory consumption

Memory consumption is another essential factor. As always in software develop-ment, developers and operators are looking to optimize execution. One simplereason is that the more memory an application uses the more expensive it isto run. If a developer is running an application on an on-premises system theeffect might not be as palpable, until the need to buy more RAM arises. In aserverless context, however, optimization of memory usage can easily lead to avisible cost reduction.

The memory consumption of a function execution is recorded by AWS Lambdaand will be retrieved from its logs.

3.1.4 Execution time

The response time might be the most interesting time metric in this work.However, it is also of interest to see how much of the total time that compriseof actual application execution time and how that time changes given differentcircumstances. Execution time is also unaffected by external factors, such asinternet connection, and is only a result of the characteristics of AWS Lambda.This makes it a good measurement of the performance of AWS Lambda.

3.2 Benchmarks

Every benchmarks have two different Lambda functions. One that is run withthe JIT compiler hotspot JVM and one that is run as a native image, createdwith GraalVM.

22

All benchmarks are open source and have a separate repository on github [68].Each benchmark has main class that contains a main-function and a functioncalled handler. The handler -function is used as entry point for the serverlessfunctions using the JVM and the main-function is used for the serverless func-tions compiled with GraalVM.

3.2.1 Real benchmarks

A real test is to be preferred when performing benchmarks. These test are realin the sense that they are real repositories acquired from GitHub. They arenot originally intended as serverless applications and a discussion could be hadweather any of them would fit in a serverless context. Nevertheless, they stillrepresent real workloads of real applications and will therefor presumably be abetter indicator of the performance, of Graal and the hotspot JVM respectivelyin a serverless environment, than artificial applications.

Each of these real benchmarks contains test written using JUnit. To simulateworkload, some, or all, of the tests in the repository are invoked when runningthe benchmark. After invoking an AWS Lambda function a response from thefunction is sent back in the form of a JSON containing these fields:

• coldStart : Boolean - Indicates if the run has gone through a cold start ornot.

• startTime : Long - The time when the functions code starts to execute rep-resented in milliseconds since the UNIX epoch (January 1, 1970 00:00:00UTC).

• runTime : Long - Runtime of the application in milliseconds.

• wasSuccess : Boolean - Indicating if the tests where a success, used fordebugging purposes.

• failures : List - Containing reasons why tests failed, if there where failuresotherwise the list is empty. Used only for debugging purposes.

To determine if a run was a cold start or not, a search is made of a specificfile in the /tmp folder (where Amazon lets users write files with a combined sizeof 512 MB). If the file is not there the file is created. Since the file is removedwhen that container is, the applications will know if the container is new (thefile does not exist), meaning a cold start, or of it has been used before (the fileexist), meaning a warm start.

Kakomu

Kakomu is a repository that contains a go simulator [69]. The repository enablesa user to play a game of Go with a bot, but it can also simulate a game betweentwo bots. There are 18 tests used for this thesis and focuses on the game model,ensuring a game is evaluated correctly.

23

State machine

The state machine benchmark is taken from a repository containing a KotlinDSL for finite state machine developed by Tinder [70]. This benchmark contains13 tests.

Math functionalities

This repository provides discrete math functionalities as extension functions [71].Some examples of its capabilities are permutations and combinations of sets,factorial function and iterable multiplication. The benchmark implementationbased on this repository runs 55 individual tests that ensures all equations aredone correctly, i.e., most are mathematical equations and set-operations.

3.2.2 Complementary benchmarks

Finding suitable real benchmarks proved to be challenging, therefore the bench-mark suite is supplemented with additional artificial benchmarks. One of themis a simple ”Hello world”-example, its only purpose is to return the basic in-formation the other benchmarks does: start time of the function and if it wentthrough a cold start or not.

The other complementary benchmarks are algorithms from the benchmarksuite The Computer Language Benchmarks Game [46] implemented in Kotlinfor the purpose of the paper ”Performance Evaluation of Kotlin and Java onAndroid Runtime” [48]. These benchmarks all were categorized by Li et. al[47] to mainly manipulate different data types. These categorizations can beseen in Table 3.1.

Benchmark Data type

Fasta PointerN-body Floating-pointFannkuch-Redux IntegerReverse-Complement String

Table 3.1: Mainly manipulated data types

The benchmarks which originates from The Computer Language Bench-marks Game also returns a JSON, but since there are no JUnit tests run thefield wasSuccess and failures are omitted, otherwise the fields are the same asin the real test, i.e., coldStart, startTime and runtime.

Fasta

The Fasta benchmark is categorised as a pointer intensive algorithm that alsohas a large amount of output. Running the algorithm results in three generated

24

DNA sequences, the length of the sequences are decided by an input parameterrepresented as an integer. The length used in this thesis is 5× 106.

The generated output is written to a file and consists of three parts. The firstpart of the DNA sequence is a repetition of a predefined sequence and the lasttwo are generated in a pseudo-random way using a seeded random generator.After the file has been generated it will be removed to not affect the followingtests, since some are running in sequences.

Reverse-Compliment

The Reverse Compliment benchmark takes the input from a file containing theoutput from a run of the Fasta application, that in turn had an input of 106.

The aim is for the Reverse-Complement program to calculate the comple-menting DNA strands to fit the three DNA sequences that the input file con-tains. It is calculated with the help of a pre-defined translation table. Due tothe fact that the input file that is processed consist of strings, this benchmarkis categorized as mainly handling strings. Another attribute of this benchmarkto keep in mind is that it is also both input and output heavy.

N-body

The N-body benchmark simulates planets movements and manipulates for themost part floating points. It requires an integer as input that represent thenumber of simulation steps to be taken. The input used for this benchmark inthis thesis is 106.

Fannkuch-Redux

The Fannkuch-Redux benchmark permutes a set of numbers S = {1, ..., n},where n is the input value, in this case 10. In a permutation P of the set Sthe first k elements of P is reversed, where k is the first element in P . This isrepeated until the first element of the permuted list is a 1. This is done for alln-length permutations P of S.

Since all the elements in the list are integers this benchmark classifies as anapplication that mostly handles integers.

3.3 Environment and Setup

For this thesis Amazons Web Services is chosen as the public cloud provider,on account of Amazon being the only provider that offers the possibility forcustomers to provide a custom runtime. AWS’s serverless solution is calledLambda. A user of Lambda has the possibility to create and manipulate Lambdafunctions by using a CLI provided by Amazon, which we used for both creationand invocations in this thesis.

A Lambda function that should run on the JVM require a so called uber JAR,a JAR file that not only contains the program, but also includes its dependencies.

25

That way the JAR file only requires a JVM to run. The JAR files used in thisthesis are created with the help of a Gradle plug-in called Shadow [72] and theopen JDK version 1.8.0 222. The Lambda functions that executes these JARfiles uses the runtime Amazon calls java8 that is based on the JDK java-1.8.0-openjdk. When using the java8 runtime, Amazon utilizes their own operatingsystem, Amazon Linux, on the containers executing that function.

Using GraalVM CE a native image is created from the JAR generated byGradle. The latest release of GraalVM CE is 19.3 but contains a known bugwhere it is unable to create native images [73]. Therefore, the previous version,19.2.1, is used. In this thesis the community edition is used due to its availability.

To create a Lambda function with a custom runtime a bootstrap file isneeded in addition to the executable file. This bootstrap file needs to invoke theexecutable as well as report its result. The bootstrap file and the executable arethen decompressed in a zip file format and pushed Lambda to create a function.

All the Lambda functions that was created runs with a maximum memorysize of 256 MB and a timeout of 100 seconds. Meaning a program can notuse more than 256 MB memory, otherwise the invocation fails, and it will beinterrupted if it runs for more than 100 seconds.

3.4 Sampling Strategy and Calculations

Since the benchmarks are executed in a public cloud where the results can beaffected by factors such as noisy neighbours, it is reasonable to be mindful ofthe selection of execution times in order to achieve a representative result.

Two different aspects about time was taken into consideration, day versusnight and weekday versus weekend. Although the region chosen for hosting theAWS Lambda functions where us-east-1, there is no guarantee that the users ofthat region should all have a timezone used in the eastern parts of the UnitesStates. These test for example are made from the Central European Time zone(GMT+1). It was therefore concluded that since no distinction can be made,between day and night of the users of the same region, an interval was chosen.

The tests done in sequence where performed with 8 hours apart: 12PM, 8PMand 4AM (CET). The benchmarks ran 6 times over the span of three weekdays,from Tuesday 10/12 12:00 PM to Thursday 12/12 4:00 AM. In order to cover theweekend as well, three tests where run, with an 8 hour interval, from Saturday14/12 12:00 PM to Sunday 15/12 4:00 AM. Since these test where not meantto go through a cold start they could be done in sequence.

When deciding how many invocation each sequence should contain, previouswork were consulted. When a JVM is used for running benchmarks, a warmup sequence is commonly defined and used in order to ensure that the JVMhas achieved the so called steady state when samples are acquired [74] [75] [76].The optimal would be to get both warm up instances as well as instances wherethe JVM has reached a steady state in order to get a fair representation. Theamount of invocations required for each benchmark to achieve steady state couldbe examined, however, is out of the scope for this thesis. Therefore a report

26

by Lengauer et al. was used in order to determine a reasonable sample size.In the report, three different benchmark suites were used and the amount ofwarm up instances was base-lined at 20 due to the built-in mechanism in theDaCapo suite that requires a maximum of 20 warm-ups to automatically detecta steady state [75]. The suite used for this thesis is undoubtedly different inmany ways but this still gives an indication of how many invocations are requiredbefore a steady state is reached. We hypothesize that a steady state is reachedafter 20 invocations, but we also want some samples capturing the steady state.The number was therefore doubled and it was reasoned that 40 invocationspresumably would suffice. The first invocation, however, will inevitably includea cold start and will be excluded, entailing 39 usable executions per sequence.

To get measurements of executions including cold starts, invocations has tobe made with a large enough gap. After some trials, 20 minutes was found to bean adequate gap. Benchmarks was executed with a 20 minute interval betweenTuesday 10/12 16:40 and Wednesday 11/12 09:00 as well as between Sunday15/12 10:40 and Monday 16/12 10:20.

When the results have been gathered the raw data has to be compressedin some way in order to make it presentable and comprehensible. For this thearithmetic mean is chosen as a first step. To be able to argue for the accuracyof the result the confidence interval is also calculated. The confidence level usedin this work is 95 %, meaning that the level of confidence one can have thatthe actual value is within the given interval will be 95 %. The confidence levelwas chosen on account of it being one of the most commonly used [77] and itcontributes to a high credibility.

3.5 Summary

For this thesis we create a benchmark suite. The goal of the benchmarks isto simulate a real workload. The suite consists of three benchmarks that arereal applications, one simple hello world benchmark as well as four smallercomplementary benchmarks. Each of the complementary benchmarks focuseson manipulating different data types.

Each benchmark has two AWS Lambda functions, one that runs on the JVMand one that is run as a native image created with GraalVM. The times whenthese benchmarks are run is selected with the intent to create a fair representa-tion of the clouds performance. The metrics that are collected form each run islatency, execution time, response time and memory consumption.

27

Chapter 4

Result

Below follows the result of the performed benchmarks. To be noted is thatnot all benchmark executions where used. The first execution of each runningsequence of 40 invocations were removed since these are supposed to representwarm start and the first will always be a cold start. A few executions from theintended cold starts category was also removed since some were overlapping thesequential batch invocation, resulting in warm starts.

The raw data that was aggregated in this chapter can be found at the general-purpose open-access repository Zenodo [78].

To begin the chapter there is a section describing the static metrics. Thepurpose of these metrics are, as previously mentioned, to supply an overall viewof the benchmarks to the reader.

For each of the dynamic metrics there is a separate section. Every sectioncontains a table where the average value of each set can be viewed with anaccompanying two-sided confidence interval as well as the maximum and theminimum value. Each benchmark has four categories, where all combinationsof warm/cold start and JVM/GraalVM function are represented.

4.1 Static metrics

In the Table 4.1 found below the result of the gathered static metrics can beviewed. Lines of code (LOC) and number of Kotlin files were gathered fromthe IDE IntelliJ. The sizes of the functions presented here were gathered fromAmazon console and are the sizes of the zip files that was uploaded to Amazon.

We can see that the real benchmarks are unsurprisingly larger than most ofthe complementary benchmarks. The exception is reverse-comp, the explanationis the large input file that comes along with that benchmark. The uncompressedversion of the input file is 10 MB.

Since the hello-world benchmark only contains code for creating the responsewe can see that the amount of lines required for this is 52 that are divided overtwo files. It can therefore be concluded that 52 lines and 2 Kotlin files of all

28

benchmarks are dedicated to creating the response and that the rest is the actualapplication.

Benchmark LOC Kotlinfiles

JVM functionsize

GraalVM functionsize

discrete-math 911 26 5.9 MB 6.5 MBgo-simulator 2672 40 9.5 MB 6.7 MB

state-machine 1405 6 13.5 MB 6.5 MBreverse-comp 140 3 8.6 MB 3.7 MB

nbody 229 3 5.7 MB 0.8 MBfannkuch 133 3 5.6 MB 0.8 MB

fasta 205 3 5.7 MB 0.8 MBhello-world 52 2 5.7 MB 0.8 MB

Table 4.1: Static metrics for each benchmark in the suite

4.2 Latency

From Table 4.2 we can see that most of the confidence intervals are narrow,below 33 milliseconds. Giving an indication that the results are trustworthy.There is however an anomaly, and that is the latency results from both thecold and warm start category of the the GraalVM function of the reverse-compbenchmark. The two-sided confidence interval is ± 375.3 for the cold start and± 114 for the warm start. The uncertainties of the result can also be seen inthe min and max result where the span between them is large when comparedto the other benchmarks, 771-37,780 and 605-21,152.

29

Benchmark Category Compiler Latency (ms) Max Min

hello-world Cold JVM 1035± 12.3 1428 726GraalVM 967± 32.8 4036 807

Warm JVM 629± 5.9 1631 593GraalVM 640± 2.8 952 603

discrete-math Cold JVM 1010± 10.5 1437 676GraalVM 1325± 24.1 2450 658

Warm JVM 635± 2.7 848 592GraalVM 643± 2.3 827 605

go-simulator Cold JVM 1021± 9.0 1280 927GraalVM 1186± 14.3 1951 997

Warm JVM 633± 7.3 1851 594GraalVM 641± 2.1 721 602

state-machine Cold JVM 1037± 13.6 1703 921GraalVM 1296± 15.7 1981 1062

Warm JVM 626± 3.1 1065 594GraalVM 642± 3.0 1067 600

reverse-comp Cold JVM 1091± 18.1 1922 732GraalVM 1220± 375.3 37780 771

Warm JVM 632± 2.3 728 595GraalVM 707± 114.0 21152 605

nbody Cold JVM 1038± 11.4 1606 920GraalVM 945± 8.5 1203 786

Warm JVM 629± 1.9 693 595GraalVM 641± 2.0 708 605

fannkuch Cold JVM 1033± 14.2 1951 717GraalVM 955± 12.4 1931 791

Warm JVM 634± 2.2 707 596GraalVM 652± 3.2 982 606

fasta Cold JVM 1017± 9.1 1415 832GraalVM 957± 12.9 1948 802

Warm JVM 631± 2.1 720 594GraalVM 649± 2.2 727 603

Table 4.2: Latency result from benchmark execution in milliseconds

Looking closer at the raw data, illustrated in Figure 4.1 and 4.2, it can beseen that the second largest values for reverse-comp benchmark, when runningthe GraalVM implementation and cold start, are numerically far from the largestvalues. These are 1263 ms for cold start and 1665 ms for warm start. If thelargest values where removed from the resulting set of the GraalVM functionduring cold start the average value would be 1029 ms and during warm start itwould be 649 ms. That is a numerical decrease of 191 and 58 respectively anda the percentage reduction of 15.7 % and 8.2 %.

30

Figure 4.1: Latency of the GraalVM benchmark reverse-comp during warm-starts

Figure 4.2: Latency of the GraalVM benchmark reverse-comp during cold-start

The abnormally large values can have many causes, where the real one isimpossible to determine. One possible reason can be noisy neighbours. However,it is interesting that the GraalVM function for both the warm and the coldcategory got one such anomaly each, while the other tests where spared ofsimilar deviations.

The arithmetic mean of a set containing an anomaly such as this becomesless significant and no great weight can be put on this result.

31

4.3 Application Runtime

For this metric benchmark hello-world is excluded. That is because it wouldresult in zero milliseconds every time, since the benchmark does not includeany more code than the creation of the response. The end timestamp would beretrieved right after the start timestamp retrieval.

For most benchmarks in Table 4.3 the confidence interval is relatively nar-row. For all real benchmarks, for example, the largest interval is ± 22.2 msand belongs to the JVM function of the discrete-math benchmark during warmstarts.

The overall largest numerical confidence interval is for the JVM functionof the fannkuch benchmark during cold start, ± 226.5 ms. However, since theaverage value, 25597 ms, is so large it does not have a great impact. If thecorrect value would prove to be at the very edge of the interval it would meana the percentage difference of barely 0.9 %.

32

Benchmark Category Compiler Applicationruntime (ms)

Max Min


Warm JVM 315± 22.2 848 592GraalVM 95± 1.1 137 77


Warm JVM 46± 5.1 289 2GraalVM 16± 0.6 38 1


Warm JVM 13± 2.1 177 1GraalVM 2± 0.5 17 0


Warm JVM 3779± 66.2 6708 3028GraalVM 10591± 14.2 11203 10304


Warm JVM 748± 2.6 829 691GraalVM 840± 2.7 941 737


Warm JVM 19846± 62.3 23667 18117GraalVM 46714± 33.9 49620 45359


Warm JVM 23309± 39.1 24780 22134GraalVM 30990± 50.5 32806 29681

Table 4.3: Runtime result from benchmark execution in milliseconds

4.4 Response Time

The confidence intervals that can be seen in Table 4.4 are relatively limitedand does not indicate any untrustworthy result. It can, however, be noted thatthe GraalVM function of the reverse-comp benchmark during cold start hasthe largest interval, ± 373.1 ms. Since we saw the same pattern in the latencyresult it comes as no surprise. However, since latency is only one part of thetotal response time and there are other time intervals required to make up thewhole, the confidence interval does not have as great of an impact as it hasfor the latency result. Where the latency is 1220± 375.3 compared to the totalruntime average and confidence interval 13060± 373.1.

33

Benchmark Category Compiler Responsetime (ms)

Max Min


Warm JVM 725± 6.1 1716 678GraalVM 836± 3.5 1141 789


Warm JVM 1181± 32.4 2807 847GraalVM 957± 3.2 1161 903


Warm JVM 806± 10.2 1998 691GraalVM 863± 3.0 967 807


Warm JVM 757± 4.9 1240 688GraalVM 855± 70.1 1287 792


Warm JVM 4863± 70.1 8291 4141GraalVM 11516± 114.1 31802 11168


Warm JVM 1484± 3.8 1610 1388GraalVM 1687± 3.9 1822 1586


Warm JVM 20582± 62.3 24411 18891GraalVM 47592± 33.9 50430 46234


Warm JVM 24054± 39.3 25501 22911GraalVM 31861± 50.7 33698 30514

Table 4.4: Response time result from benchmark execution in milliseconds

4.5 Memory Consumption

From the result Table 4.5 we can see that the confidence intervals are quitenarrow for all benchmarks, most intervals have the value ± 0.1 MB. The largestinterval is± 2.3 MB and belong to the cold start version of the GraalVM functionrunning the reverse-comp benchmark.

From this we can conclude that the documented average values can be

34

trusted as representative values of the average memory consumption.

Benchmark Category Compiler Memory (MB) Max Min


Warm JVM 113± 0.1 114 112GraalVM 51± 0.1 52 49


Warm JVM 126± 0.6 141 114GraalVM 77± 0.1 79 76

go-simulator Cold JVM 114± 0.1 117 113GraalVM 65± 0 66 64

Warm JVM 115± 0.1 117 112GraalVM 67± 0.1 68 65

state-machine Cold JVM 115± 0.1 116 113GraalVM 62± 0 63 62

Warm JVM 116± 0.1 117 114GraalVM 64± 0.1 65 62


Warm JVM 214± 0.1 216 212GraalVM 207± 0.5 209 177


Warm JVM 109± 0.1 110 108GraalVM 51± 0.1 52 50

fannkuch Cold JVM 111± 0.1 113 108GraalVM 76± 0 77 76

Warm JVM 111± 0.1 113 110GraalVM 78± 0.1 79 76

fasta Cold JVM 136± 0.1 137 135GraalVM 98± 0 99 97

Warm JVM 158± 0.1 159 156GraalVM 99± 0.1 100 98

Table 4.5: Memory usage result from benchmark execution in mega bytes

35

Chapter 5

Discussion

This chapter contains discussions regarding the result seen in the previous chap-ter. Each section covers one type of metric and contains one subsection for coldstarts and one for covering warm starts.

To conclude the chapter there is a section about internal and external threatsto the validity of the result.

5.1 Latency

Latency is the time between invocation and the time the invoked function codestarts to execute.

JVM functions perform better during cold starts for real benchmarks, whereasGraalVM functions are in general faster for the artificial benchmarks. Duringwarm starts, however, the result for every benchmark for both types of functionsare very similar.

5.1.1 Cold start

From the Figure 5.1 we can see that the JVM functions have a lower averagelatency for all real benchmarks (discrete-math, go-simulator and state-machine)as well as for reverse-comp, although the confidence interval for reverse-comp iswide and is overlapping the GraalVM functions interval. For the other bench-marks the GraalVM functions have a lower average latency. The largest dif-ference can be seen for the discrete-math benchmark, the JVM function has a31.2 % lower latency than the GraalVM function.

36

disc

rete

-mat

h

go-sim

ulat

or

stat

e-m

achi

ne

hello

-wor

ld

reve

rse-co

mp

nbod

y

fann

kuch

fasta

0

200

400

600

800

1,000

1,200

1,400

1,0

10

1,0

21

1,03

7

1,035 1,091

1,038

1,033

1,01

7

1,325

1,18

6 1,29

6

967

1,22

0

945 955 957

Late

ncy

(ms)

JVM GraalVM

Figure 5.1: Average latency during cold start

5.1.2 Warm start

In Figure 5.2 we can see that during warm starts for every benchmark, realand artificial, the function running on the JVM consistently has a lower averagelatency, compared with the functions that has utilized GraalVM. However, thedifference is small, and some of the confidence intervals are overlapping, such asfor the go-simulator benchmark as well as for the reverse-comp benchmark. Thebiggest difference is for reverse-comp where the average latency of the functionrunning on the JVM is 11.9 % faster that the GraalVM function. However, asthe confidence intervals of these two versions collide not much weight can be puton this observation. The second largest difference is with the fasta benchmark,

37

where the JVM function is 2.9 % faster.

disc

rete

-mat

h

go-sim

ulat

or

stat

e-m

achi

ne

hello

-wor

ld

reve

rse-co

mp

nbod

y

fann

kuch

fasta

0

100

200

300

400

500

600

700

800

635

633

626

629 632

629 634

631640 64

3

642

640

707

641

652

649

Late

ncy

(ms)

JVM GraalVM

Figure 5.2: Average latency during warm start

5.2 Application Runtime

As previously mentioned, the hello-world benchmark is missing from Table 4.3,describing the application runtime result, due to redundancy.

From the table we can see a clear difference between JVM functions andGraalVM functions. There is not one type of function that performs betterfor all benchmarks, however, among the real test the GraalVM functions areindisputably faster. For both warm and cold starts.

38

5.2.1 Cold start

In the Figure 5.3 we can see a dramatic difference where the GraalVM functionsexecutes much faster than their JVM counterparts. The largest difference is inthe state-machine benchmark where the GraalVM function executes 62 timesfaster and only takes an average of 21 ms whereas the JVM function requiresan average of 1305 milliseconds.

discrete-math go-simulator state-machine

0

500

1,000

1,5001,

010

1,34

0

1,305

223

134

21

Ap

pli

cati

onru

nti

me

(ms)

JVM GraalVM

Figure 5.3: Average application runtime of real benchmarks during cold start

If we look at Figure 5.4 that outline the average application runtime of theartificial benchmarks, the differences are not as dramatic and not as one sidedas for the real benchmarks. The execution time of reverse-comp and fasta arequite similar. What stands out most is that in the fannkuch benchmark theJVM function was almost twice as fast as the GraalVM function.

39

reverse-comp nbody fannkuch fasta

0

2

4

6

·104

11,2

49

4,0

55

25,5

97 32,

551

11,6

18

835

46,

774

31,

455

Ap

pli

cati

on

runti

me

(ms)

JVM GraalVM

Figure 5.4: Average application runtime of artificial benchmarks during coldstart

The results implies that during a cold start the application runtime is shorterfor a GraalVM function than a JVM function. This is probably due to the factthat the code has been compiled beforehand and all the optimizations havealready been done by GraalVM. Whereas the JVM has to compile during run-time, resulting in a longer execution time. There might be a risk, however, tocompiling before hand since nothing of the application usage is known the com-piler might prioritize incorrectly in its optimisation. This may be the cause ofwhy the JVM-function is almost twice as fast as the GraalVM function in thefannkuch benchmark.

5.2.2 Warm start

In Figure 5.5 we can observe the average runtime results for the warm startcategory. The GraalVM functions still have faster execution times than theJVM function for the real benchmarks. However, we can note how small thedifference now is for the go-simulation and the state-machine benchmark whencompared to the result from the cold start category.

40


0

100

200

300

400

315

46

13

95

16

2Ap

pli

cati

onru

nti

me

(ms)

JVM GraalVM

Figure 5.5: Average application runtime of real benchmarks during warm start

For the go-simulation benchmark, during cold start, the difference betweenthe two types of functions is 1204 ms, where the GraalVM function is 10 timesfaster. The same benchmark during warm start differs only 30 ms, where theGraalVM function now only is 3 times as fast. The execution time for the JVMfunction has been reduced by 1294 ms, which equals a speed up of 281 %.

The same pattern can be seen for the state-machine benchmark. Duringcold start the difference is 1284 ms, where the GraalVM function requires 21ms and the JVM function 1305 ms to execute. When the JVM function doesnot have to go through a cold start however, the time gap is reduced to only 11ms, where the JVM function have an average execution time of 13 ms and theGraalVM function 2 ms.

Similar patterns can be viewed for the artificial benchmarks is Figure 5.6,where the average execution time of the JVM functions has been reduced forall benchmarks.

41

reverse-comp nbody fannkuch fasta

0

2

4

6

·104

3,77

9

748

19,8

46

23,3

09

10,5

91

840

46,7

14

30,9

90

Ap

pli

cati

on

runti

me

(ms)

JVM GraalVM

Figure 5.6: Average application runtime of artificial benchmarks during warmstart

For the benchmarks nbody and fasta benchmark, where the GraalVM func-tion has a lower average execution time during cold starts, the JVM functionis now faster. For the reverse-comp benchmark, where the JVM function isslightly faster during cold starts, the time gap is increased during warm starts.The same goes for the fannkuch benchmark.

We can establish that the JVM functions perform much better during warmstarts than cold starts. The difference is illustrated in Figure 5.7. As a compar-ison we can also view the difference in execution time of the GraalVM functionsduring warm and cold start in Figure 5.8. Some small improvements can benoted on every benchmark, except for the nbody benchmark where there is aslight deterioration, but it is nowhere near the improvements of the JVM func-tions.

42

disc

rete

-mat

h

go-sim

ulat

or

stat

e-m

achi

ne

reve

rse-co

mp

nbod

y

fann

kuch

fasta

0

0.5

1

1.5

2

2.5

3

3.5

·104

1,010

1,34

0

1,3

05

11,2

49

4,0

55

25,5

97

32,

551

315

46 13

3,77

9

748

19,8

47 23

,309

Ap

pli

cati

onru

nti

me

(ms)

JVM-cold start JVM-warm start

Figure 5.7: Comparison of average runtime of JVM functions during warm andcold start

43

disc

rete

-mat

h

go-sim

ulat

or

stat

e-m

achi

ne

reve

rse-co

mp

nbod

y

fann

kuch

fasta

0

1

2

3

4

5

6

·104

223

134

21

11,6

18

835

46,7

74

31,4

55

95 16 2

10,5

91

840

46,7

14

30,

990

Ap

pli

cati

on

runti

me

(ms)

GraalVM-cold start GraalVM-warm start

Figure 5.8: Comparison of average runtime of GraalVM functions during warmand cold start

The reason the JVM functions are performing much better during warmstarts is due to the JVMs JIT attribute. It has the possibility to, during runtime,change its optimizations based on the way the code is used and since the code inthis case always is used in the same way, optimizations are made easier. The JITattribute also means that the more times the code is run the more opportunitiesthe JVM has to make these optimizations. The GraalVM functions on theother hand are already compiled and the optimizations that have been madeare constant.

In the Figure 5.9 this can be viewed more clearly. It illustrates every warmstart execution time of the benchmark go-simulator in the order they whereacquired. We can see a distinct pattern that is repeated nine times, the sameamount of times as the sequential warm start tests where run. The start of eachnew sequential test is marked with a grey vertical line. We can see that duringthese marked points there is a peak. We can also see that the execution time isdecreasing, with some irregular spikes, until the end of the sequence and peaksagain when the next sequence starts.

44

0

50

100

150

200

250

300

Order of collection

Ap

pli

cati

on

runti

me

(ms)

Figure 5.9: Execution time of the JVM-function of the go-simulator benchmarkduring warm starts.

As a comparison, we can observe the same graph for the correspondingGraalVM-function in Figure 5.10. A pattern over the whole course of time canbe seen, where 18 is a reoccurring value, but no repeating pattern with regardsto the plotted intervals can be viewed.

45

0

5

10

15

20

25

30

35

40

Order of collection

Ap

pli

cati

on

runti

me

(ms)

Figure 5.10: Execution time of the GraalVM-function of the go-simulator bench-mark during warm starts.

5.3 Response Time

The total response time is the time it takes from the request is sent until aresponse is received. Which means that it is a product of both latency togetherwith the application runtime as well as creating and delivering the response.

In general we can see that during cold starts the GraalVM-functions performsbetter than the JVM-functions. For average total runtime during warm startsa decrease can be seen for the JVM functions when compared to the valuesfrom the cold start category, the JVM and GraalVM functions then have moresimilar values.

5.3.1 Cold start

For the real benchmarks it is unmistakable that the GraalVM-functions have amuch faster average total runtime. Every GraalVM-function is at least 7 timesfaster than its JVM counter part. An illustration of this can be seen in Figure5.11.

46


0

0.5

1

1.5

·104

14,7

09

13,

480

13,

596

2,047

1,79

6

1,808

Res

pon

seti

me

(ms)

JVM GraalVM

Figure 5.11: Average total runtime of real benchmarks during cold start

In Figure 5.12 a comparison of the average total runtime of the artificialbenchmarks during cold starts can be seen. The GraalVM-functions have thelowest average total runtime for all benchmarks except for the fannkuch bench-mark.

47

hello

-wor

ld

reve

rse-co

mp

nbod

y

fann

kuch

fasta

0

2

4

6

·104

12,0

42 19,6

65

12,7

13

33,9

22 40,6

96

1,1

76

13,0

60

1,99

3

47,9

52

32,6

30

Res

pon

seti

me

(ms)

JVM GraalVM

Figure 5.12: Average total runtime of artificial benchmarks during cold start

5.3.2 Warm start

In Figures 5.13 and 5.14 the GraalVM as well as JVM-functions of all bench-marks can be seen to have a reduced total average runtime during warm startscompared to their cold start counterparts. The largest reductions in averagetotal runtime consists of JVM functions.

The benchmark with the largest numerical difference is the fasta benchmark,where the JVM-function during cold start has an average total runtime of 40,696ms and during warm starts 24,054 ms, a difference of 16,642 ms.

The benchmark that has the largest relative difference is the state-machinebenchmark, where the warm start is 17 times faster than the cold start, from13,596 ms to 757 ms.

48


0

500

1,000

1,181

806

757

957

836 855

Res

pon

seti

me

(ms)

JVM GraalVM

Figure 5.13: Average total runtime of real world benchmarks during warm start

The average total runtime of the two types of functions during warm startshave equalized for almost all benchmarks. For the discrete-math benchmark theGraalVM-function is still faster than the JVM function. However, for the go-simulator and the state-machine benchmark the JVM function performs slightlybetter.

If we look at the artificial benchmarks, in Figure 5.14, we can see that,although the difference is smaller in some cases, the JVM-functions are generallyfaster when compared to its GraalVM counterpart.

49

hello

-wor

ld

reve

rse-co

mp

nbod

y

fann

kuch

fasta

0

2

4

6

·104

725 4,

863

1,48

4

20,5

82

24,

054

836

11,5

16

1,68

7

47,5

92

31,8

61

Res

pon

seti

me

(ms)

JVM GraalVM

Figure 5.14: Average total runtime of artificial benchmarks during warm start

That the average total runtime of the JVM-functions are significantly lowerduring warm starts than during cold starts is not unanticipated, consideringthe result discussed in the previous sections. This since latency and applicationruntime is a part of the total runtime. If one or both of them are reduced thenso should the total runtime value also be. In Section 5.1 we saw that latencyis decreased for all functions during warm starts when compared to cold starts.Then, in Section 5.2, we saw the same pattern for application runtime, wherethe JVM-functions showed the greatest improvements.

5.4 Memory Consumption

Memory is measured by Amazon, where maximum memory used during thefunctions run are stated in the log-reports for each function invocation.

In general it can be said that GraalVM-functions uses significantly less mem-ory than the JVM-functions. This since the JVM needs to compile the code dur-ing runtime and the compilation requires memory, while the GraalVM-functionsare already compiled. It can also be stated that the memory consumption re-mains relatively unchanged when comparing the same functions during cold andwarm start

50

5.4.1 Cold start

In Figure 5.15 the average memory usage result for all functions during cold startare illustrated. It is clear that the GraalVM-functions are using consequentlyless memory than the JVM functions for all benchmarks except the reverse-compbenchmark, which is also using the most memory out of all the benchmarks.Since what is characteristic of reverse-comp is its input-feature, this might bethe reason of the large memory usage. Reading a large input-file might resultin an uncharacteristically large memory usage for GraalVM enabled functions.

disc

rete

-mat

h

go-sim

ulat

or

stat

e-m

achi

ne

hello

-wor

ld

reve

rse-co

mp

nbod

y

fann

kuch

fasta

40

60

80

100

120

140

160

180

200

113

114

115

113

175

109

111

136

76

65

62

50

183

50

76

98

Mem

ory

con

sum

pti

on

(MB

)

JVM GraalVM

Figure 5.15: Average memory consumption during cold start

51

5.4.2 Warm start

Figure 5.16 illustrated the average memory consumption of all benchmarks dur-ing warm starts. A similarity to Figure 5.15 can be seen. One difference is thatthere is now no exceptions where the GraalVM-function does not require lessmemory that its JVM counterpart. Most values are the same or very similar,with only 1 or 2 MB difference. One benchmark that stands out on the otherhand is the reverse-comp benchmark. For the JVM-function the average mem-ory consumption has increased with 39 MB and for the GraalVM-function 24MB.

Another benchmark that stands out is the discrete-math benchmark. TheJVM function still requires the same amount of memory, the GraalVM-function,however, shows a decrease of 25 MB.

52

disc

rete

-mat

h

go-sim

ulat

or

stat

e-m

achi

ne

hello

-wor

ld

reve

rse-co

mp

nbod

y

fann

kuch

fasta

40

60

80

100

120

140

160

180

200

220

113

115

116

113

214

109

111

158

51

64 64

51

207

51

78

99

Mem

ory

con

sum

pti

on

(MB

)

JVM GraalVM

Figure 5.16: Average memory consumption during warm start

5.5 Threats to validity

For the work done in this thesis, a benchmark suite of real as well as complemen-tary benchmarks was created. The goal when selecting them was to simulatereal workloads. It is possible that this is not the case, that they don’t reflectreal workloads. It is also possible that the benchmark suite represents a realworkload but only a certain type of workload. Both of these threats would entailit unfitting to use the result as a basis for the general case.

Since the collection of the metrics was a mix between collecting data fromthe script written for this work as well as log data from AWS Console, it may

53

be possible errors where made when merging these different metrics. There isalso a possibility there have been issues when collecting the metrics, by AWSbut also by the script designed to automatically record the results.

This work relies heavily on AWS, where the goal is to present an as repre-sentative result as possible. It is possible that the benchmarks are run in anabnormal environment for AWS. This would entail that the result would onlybe valid for that abnormal environment.

Since the information about where and how the functions are hosted in AWSis highly limited it can cause all sorts of unknown issues that could threatensthe validity of this work. There may also be other unknown issues to this workthat does not involve AWS.

54

Chapter 6

Conclusion

This chapter concludes this thesis by gathering all results and summarizingthem. It also contains a section about how this work could be elaborated andimproved.

6.1 Performance

The results from the work done in this thesis, give the indication that differentrecommendations on how to run a function written in Kotlin, should be givendepending on how the function is intended to be used.

The result from this work indicates a sizable performance difference depend-ing on if the function undergoes a cold start or not. Latency and memoryconsumption are, however, not as affected as the two other parameters. Ap-plication runtime and total runtime are more dependent on if the function hasbeen invoked recently or not.

If a function written in Kotlin is expected to go through many cold starts,creating a native image with GraalVM is a good option to lower the total run-time. A GraalVM function also requires less memory. The combination entailsa cheaper hosting solution than a JVM function.

On the other hand, if a function is expected to be invoked often and thereforenot undergo many cold starts, the difference in total runtime is smaller for thetwo compilation approaches. Memory consumption is, however, still in the favorof the GraalVM functions. The overall advantages are therefore not as obviousin this case.

A more detailed conclusion for each metric can be found in the subsectionsbelow.

6.1.1 Latency

When latency is regarded, the type of function, JVM-dependent or GraalVM-created, does not seem to matter much. The JVM-functions showed better

55

results for the real benchmarks as well as for the reverse-comp benchmark duringcold start. The result for reverse-comp for the GraalVM-function during coldstarts, however, can not be trusted since its confidence interval is so wide. Forthe other artificial benchmarks the GraalVM-function proved to be faster.

If we put more weight on the real benchmarks, as we should, it could besaid that the JVM-functions perform better than its GraalVM counterparts.However, since the difference is not very clear it can not be said for certain thata JVM-function would have a lower latency in a general case during cold starts.

During warm starts latency is almost equal for all benchmarks and functions.The JVM-functions can be seen to have a slightly lower latency in general, butthe difference is so small that it can be said that the latency during warm startsis equal for JVM and GraalVM-functions.

6.1.2 Application Runtime

From the results we saw that there are big differences between the JVM andthe GraalVM-functions. During cold starts of the real benchmarks it is clearthat the GraalVM-functions perform a great deal better. When it comes to theartificial benchmarks, however, the results are more even, where the fannkuchbenchmark was executed much faster by the JVM-function than the GraalVM-function.

However, if we for application runtime, as for latency, would put more weightonto the real benchmarks it could be stated that GraalVM-functions performmuch better during cold starts than JVM-functions. This since optimizationsand compilations have already been made by GraalVM when the native im-age was created. This has to be done during runtime for the JVM functions,entailing a longer execution time the first times it runs.

For application runtime during warm starts all functions had improved itsexecution time. The JVM-functions, however, had improved drastically. Forthe real benchmarks the GraalVM-functions where still faster but the JVM-functions could be seen to perform better than its GraalVM counterpart for theartificial benchmarks.

The drastic improvement for the JVM-functions can be traced back to theJVMs ability to keep changing optimizations during runtime. More executionsleads to more data the JVM can create more well based optimizations on. Forthe GraalVM-functions the code is already compiled, therefore no great im-provement in application runtime can be seen.

If a function written in Kotlin is expected to undergo many cold starts, e.i,to be invoked rarely, it would be a good option to precompile the function withthe help of GraalVM to lower the application runtime. If a Kotlin function isexpected to have a lot of sequential batch invocations, e.i, many warm startsconsecutively, the difference in execution time is lower and the choice betweena GraalVM precompiled function and a JVM-function becomes less clear.

56

6.1.3 Response Time

We have seen that for cold starts the GraalVM-functions perform better for allbenchmarks, with the exception of the fannkuch benchmark. The GraalVM-functions for the real benchmarks all run at least 7 times faster.

During warm starts the JVM-functions improvements in application runtimecan be seen and the result is a much lower total runtime for those functions. Thedifference between the GraalVM-functions and the JVM-functions are small forall real benchmarks and for the artificial benchmarks the JVM-function performbetter or equal to its GraalVM counterpart.

If a serverless function written in Kotlin is expected to undergo many coldstarts, GraalVM is a good option to lower the total runtime. If many warmstarts are expected the choice between a JVM-function and a GraalVM functionmatters less.

6.1.4 Memory Consumtion

For memory consumption the results are clear. Both for warm and cold startsGraalVM-functions require a much lower amount of memory, in some caseseven half. The results are hardly surprising since the JVM-compiles its functionduring runtime and to perform the compilation the JVM requires memory.

6.2 Future work

There are many components of this work that can be improved and/or expanded.More work could be done to develop a more substantial benchmark suite,

where only real benchmarks are used. A closer look could also be had into thedifferent optimization techniques of the two compilation options. Benchmarkscould then be designed or picked in order to target the compilation strategiesstrength and weaknesses. The impact of the different approaches could then beanalysed and more specific recommendation could then be made as when to useone or the other compilation option.

Another improvement would be to test more cloud providers. As of now onlyAWS supports custom runtimes but it is possible more public cloud providerswill follow. A comparison could at least be done for only the JVM-functions.

To get a more accurate view of the cloud provider, benchmarks could berun for a longer period of time. The result from this work only represents theperformance of a snapshot of the AWS cloud.

There is also the possibly to explore the different regions that AWS has.Since different hardware as well as different configurations could be used fordifferent data centers it might be the case that some regions perform betterthan others.

Furthermore, it would be interesting to perform the same tests for similarfunctions written in other JVM based languages such as Scala and Java to seeif they result in the same outcome as the tests done for Kotlin in this thesis.

57

Another angle to expand the work done in this thesis, could be to explorethe patterns seen in figure 5.9. To look into how a JVM-function performs overlonger time to see when the functions reach a steady state to explore when aJVM-function becomes faster that an already compiled native image producedby GraalVM.

58

Bibliography

[1] IDG. “2018 Cloud Computing Survey”. In: (Aug. 14, 2018).

[2] LogicMonitor. “Cloud Vision 2020: The Future of the Cloud - A survey of in-dustry Influencers”. In: (Dec. 2017).

[3] Amazon. AWS Lambda. url: https://aws.amazon.com/lambda/ (visited on09/06/2019).

[4] Google. Google Cloud Function. url: https://cloud.google.com/functions/(visited on 09/06/2019).

[5] Passwater, Andrea. 2018 Serverless Community Survey: huge growth in server-less usage. 2018. url: https://serverless.com/blog/2018- serverless-

community-survey-huge-growth-usage/ (visited on 09/06/2019).

[6] Amazon. AWS Lambda Releases. url: https://docs.aws.amazon.com/lambda/latest/dg/lambda-releases.html (visited on 09/06/2019).

[7] Kotlin. FAQ. url: https : / / kotlinlang . org / docs / reference / faq . html

(visited on 10/08/2019).

[8] Kotlin, Talking. Kotlin at Pinterest with Christina Lee. May 15, 2017. url:http://talkingkotlin.com/kotlin-at-pinterest-with-christina-lee/

(visited on 10/08/2019).

[9] Kotlin, Talking. Kotlin at Uber. Apr. 30, 2019. url: http://talkingkotlin.com/kotlin-at-uber/ (visited on 10/08/2019).

[10] Lardinois, Frederic. Kotlin is now Google’s Preferred language for Android appdevelopment. May 30, 2019. url: https://techcrunch.com/2019/05/07/

kotlin-is-now-googles-preferred-language-for-android-app-development/

(visited on 10/08/2019).

[11] Overflow, Stack. Developer Survey Results - 2018. 2018. url: https://insights.stackoverflow.com/survey/2018/#most-loved-dreaded-and-wanted (visitedon 10/08/2019).

[12] Overflow, Stack. Developer Survey Results - 2019. 2019. url: https://insights.stackoverflow.com/survey/2019/#most-loved-dreaded-and-wanted (visitedon 10/08/2019).

[13] Miller, Ron. Amazon Launches Lambda, An Event-Driven Compute Service.Nov. 13, 2014. url: https://techcrunch.com/2014/11/13/amazon-launches-lambda-an-event-driven-compute-service/ (visited on 09/30/2019).

59

[14] Novet, Jordan. Google has quietly launched its answer to AWS Lambda. Feb. 9,2016. url: https://venturebeat.com/2016/02/09/google-has-quietly-launched-its-answer-to-aws-lambda/ (visited on 09/30/2019).

[15] IBM. IBM Unveils Fast, Open Alternative to Event-Driven Programming. Feb. 23,2016. url: https://www-03.ibm.com/press/us/en/pressrelease/49158.wss(visited on 09/30/2019).

[16] Miller, Ron. Microsoft answers AWS Lambda’s event-triggered serverless appswith Azure Functions. Mar. 31, 2016. url: https://techcrunch.com/2016/03/31/microsoft-answers-aws-lambdas-event-triggered-serverless-apps-

with-azure-functions/ (visited on 09/30/2019).

[17] Services, Amazon Web. AWS re:Invent 2017: Become a Serverless Black Belt:Optimizing Your Serverless Appli (SRV401). Dec. 1, 2017. url: https://youtu.be/oQFORsso2go (visited on 09/30/2019).

[18] Amazon. Tutorial: Schedule AWS Lambda Functions Using CloudWatch Events.url: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/

RunLambdaSchedule.html (visited on 09/30/2019).

[19] Jeremydaly. Lambda Warmer. url: https://github.com/jeremydaly/lambda-warmer (visited on 09/30/2019).

[20] FidelLimited. Serverless WarmUp Plugin. url: https://github.com/FidelLimited/serverless-plugin-warmup (visited on 09/30/2019).

[21] Dashbird. X-Lambda. url: https://github.com/dashbird/xlambda/ (visitedon 09/30/2019).

[22] Shilkov, Mikhail. Cold Starts in AWS Lamdba. Sept. 26, 2019. url: https :

//mikhail.io/serverless/coldstarts/aws/ (visited on 10/10/2019).

[23] Amazon. AWS Lambda Doubles Maximum Memory Capacity for Lambda Func-tions. Nov. 17, 2017. url: https://aws.amazon.com/about- aws/whats-

new/2017/11/aws-lambda-doubles-maximum-memory-capacity-for-lambda-

functions/ (visited on 10/01/2019).

[24] Amazon. AWS Lambda enables functions that can run up to 15 minutes. Oct. 10,2018. url: https://aws.amazon.com/about-aws/whats-new/2018/10/aws-lambda-supports-functions-that-can-run-up-to-15-minutes/ (visited on09/30/2019).

[25] Amazon. iRobot Ready to Unlock the Next Generation of Smart Homes Usingthe AWS Cloud. url: https://aws.amazon.com/solutions/case-studies/irobot/ (visited on 10/03/2019).

[26] Amazon. Netflix AWS Lambda Case Study. url: https://aws.amazon.com/solutions/case-studies/netflix-and-aws-lambda/ (visited on 10/02/2019).

[27] Google. NCloud Functions for Firebase Auger Labs. url: https://firebase.google.com/docs/functions/case-studies/augerlabs.pdf?hl=zh-CN (vis-ited on 10/02/2019).

[28] Amazon. Alameda County Serves Election Maps at High Speed, Low Cost UsingAWS. url: https://aws.amazon.com/solutions/case-studies/alameda-

county/ (visited on 10/03/2019).

60

[29] Perez, Guillermo A. et al. “A Hybrid Just-in-time Compiler for Android: Com-paring JIT Types and the Result of Cooperation”. In: Proceedings of the 2012International Conference on Compilers, Architectures and Synthesis for Em-bedded Systems. CASES ’12. Tampere, Finland: ACM, 2012, pp. 41–50. isbn:978-1-4503-1424-4. doi: 10.1145/2380403.2380418. url: http://doi.acm.

org/10.1145/2380403.2380418.

[30] Croce, Louis. Lecture notes in Just in Time Compilation. 2014. (Visited on12/17/2019).

[31] Engels, Joshua. Programming for the JavaTM Virtual Machine. Addison Wesley,1999. isbn: 0-201-30972-6.

[32] Lindholm, Tim et al. The Java Virtual Machine Specification, Java SE 8 Edition.1st. Addison-Wesley Professional, 2014. isbn: 013390590X, 9780133905908.

[33] Urma, Raoul-Gabriel. Alternative Languages for the JVM. July 2014. url: https://www.oracle.com/technetwork/articles/java/architect- languages-

2266279.html (visited on 10/04/2019).

[34] Oracle. The Java HotSpot Performance Engine Architecture. url: https://

www.oracle.com/technetwork/java/whitepaper- 135217.html (visited on10/09/2019).

[35] Lynn, Scott. For Building Programs That Run Faster Anywhere: Oracle GraalVMEnterprise Edition. May 8, 2019. url: https://blogs.oracle.com/graalvm/announcement (visited on 10/07/2019).

[36] Duboscq, Gilles et al. Graal IR: An Extensible Declarative Intermediate Repre-sentation. Feb. 2013.

[37] Stadler, Lukas & Wurthinger, Thomas, and Mossenbock, Hanspeter. “PartialEscape Analysis and Scalar Replacement for Java”. In: Proceedings of AnnualIEEE/ACM International Symposium on Code Generation and Optimization.CGO ’14. Orlando, FL, USA: ACM, 2014, 165:165–165:174. isbn: 978-1-4503-2670-4. doi: 10.1145/2581122.2544157. url: http://doi.acm.org/10.1145/2581122.2544157.

[38] GraalVM Native Image. url: https://www.graalvm.org/docs/reference-manual/aot-compilation/ (visited on 10/04/2019).

[39] Wurthinger, Thomas et al. “One VM to Rule Them All”. In: Proceedings ofthe 2013 ACM International Symposium on New Ideas, New Paradigms, andReflections on Programming & Software. Onward! 2013. Indianapolis, Indiana,USA: ACM, 2013, pp. 187–204. isbn: 978-1-4503-2472-4. doi: 10.1145/2509578.2509581. url: http://doi.acm.org/10.1145/2509578.2509581.

[40] Rigger, Manuel et al. “Sulong, and Thanks for All the Fish”. In: ConferenceCompanion of the 2Nd International Conference on Art, Science, and Engineer-ing of Programming. Programming'18 Companion. Nice, France: ACM,2018, pp. 58–60. isbn: 978-1-4503-5513-1. doi: 10.1145/3191697.3191726. url:http://doi.acm.org.focus.lib.kth.se/10.1145/3191697.3191726.

[41] Corporation, Standard Performance Evaluation. SPECjcm2008. Sept. 26, 2019.url: https://www.spec.org/jvm2008/ (visited on 11/12/2019).

[42] DaCapo. DaCapo Benchmarking Suite. url: http://dacapobench.org/ (visitedon 11/12/2019).

61

[43] Renaissance Suite. url: https://renaissance.dev/ (visited on 11/12/2019).

[44] Blackburn, S. M. et al. “The DaCapo Benchmarks: Java Benchmarking Devel-opment and Analysis”. In: OOPSLA ’06: Proceedings of the 21st annual ACMSIGPLAN conference on Object-Oriented Programing, Systems, Languages, andApplications. Portland, OR, USA: ACM Press, Oct. 2006, pp. 169–190. doi:http://doi.acm.org/10.1145/1167473.1167488.

[45] Prokopec, Aleksandar et al. “Renaissance: Benchmarking Suite for Parallel Ap-plications on the JVM”. In: Proceedings of the 40th ACM SIGPLAN Conferenceon Programming Language Design and Implementation. PLDI 2019. Phoenix,AZ, USA: ACM, 2019, pp. 31–47. isbn: 978-1-4503-6712-7. doi: 10 . 1145 /

3314221.3314637. url: http://doi.acm.org/10.1145/3314221.3314637.

[46] The Computer Language Benchmarks Game. url: https://benchmarksgame-team.pages.debian.net/benchmarksgame/ (visited on 11/12/2019).

[47] Li, Wing Hang & White, David R., and Singer, Jeremy. “JVM-hosted Languages:They Talk the Talk, but Do They Walk the Walk?” In: Proceedings of the 2013International Conference on Principles and Practices of Programming on theJava Platform: Virtual Machines, Languages, and Tools. PPPJ ’13. Stuttgart,Germany: ACM, 2013, pp. 101–112. isbn: 978-1-4503-2111-2. doi: 10.1145/

2500828.2500838.

[48] Schwermer, Patrik. “Performance Evaluation of Kotlin and Java on AndroidRuntime”. (MA thesis). KTH, School of Electrical Engineering and ComputerScience (EECS), 2018.

[49] Sherman, Elena and Dyer, Robert. “Software Engineering Collaboratories (SE-Clabs) and Collaboratories As a Service (CaaS)”. In: Proceedings of the 201826th ACM Joint Meeting on European Software Engineering Conference andSymposium on the Foundations of Software Engineering. ESEC/FSE 2018. LakeBuena Vista, FL, USA: ACM, 2018, pp. 760–764. isbn: 978-1-4503-5573-5. doi:10.1145/3236024.3264839. url: http://doi.acm.org.focus.lib.kth.se/10.1145/3236024.3264839.

[50] Laaber, Christoph & Scheuner, Joel, and Leitner, Philipp. “Software Microbench-marking in the Cloud. How Bad is It Really?” In: Empirical Softw. Engg. 24.4(Aug. 2019), pp. 2469–2508. issn: 1382-3256. doi: 10.1007/s10664-019-09681-1. url: https://doi.org/10.1007/s10664-019-09681-1.

[51] Ou, Zhonghong et al. “Exploiting Hardware Heterogeneity within the Same In-stance Type of Amazon EC2”. In: Presented as part of the. USENIX, Submit-ted. url: https://www.usenix.org/conference/hotcloud12/exploiting-

hardware-heterogeneity-within-same-instance-type-amazon-ec2.

[52] Folkerts, Enno et al. “Benchmarking in the Cloud: What It Should, Can, andCannot Be”. In: Selected Topics in Performance Evaluation and Benchmarking.Ed. by Nambiar, Raghunath and Poess, Meikel. Berlin, Heidelberg: SpringerBerlin Heidelberg, 2013, pp. 173–188. isbn: 978-3-642-36727-4.

[53] FreeBSD. BSD System Manager’s Manual - INETD(8). Jan. 12, 2008. url:https://www.freebsd.org/cgi/man.cgi?query=inetd&sektion=8 (visited on10/22/2019).

[54] Belghiat, Aissam et al. “Mobile Agent-Based Software Systems Modeling Ap-proaches: A Comparative Study”. In: CIT 24 (2016), pp. 149–163.

62

[55] Yu, Ping et al. “Mobile Agent Enabled Application Mobility for Pervasive Com-puting”. In: Proceedings of the Third International Conference on Ubiquitous In-telligence and Computing. UIC’06. Wuhan, China: Springer-Verlag, 2006, pp. 648–657. isbn: 3-540-38091-4, 978-3-540-38091-7. doi: 10.1007/11833529_66. url:http://dx.doi.org/10.1007/11833529_66.

[56] Qin, Weijun & Suo, Yue, and Shi, Yuanchun. “CAMPS: A Middleware for Pro-viding Context-Aware Services for Smart Space”. In: Advances in Grid and Per-vasive Computing. Ed. by Chung, Yeh-Ching and Moreira, Jose E. Berlin, Hei-delberg: Springer Berlin Heidelberg, 2006, pp. 644–653. isbn: 978-3-540-33810-9.

[57] Zhou, Y. et al. “A Middleware Support for Agent-Based Application Mobilityin Pervasive Environments”. In: 27th International Conference on DistributedComputing Systems Workshops (ICDCSW’07). June 2007, pp. 9–9. doi: 10.

1109/ICDCSW.2007.12.

[58] Thalinger, Chris. GeeCON Prague 2017: Chris Thalinger - Twitter’s quest fora wholly Graal runtime. Jan. 24, 2018. url: https://www.youtube.com/watch?v=pR5NDkIZBOA (visited on 10/11/2019).

[59] Thalinger, Chris. Performance tuning Twitter services with Graal and MachineLearning - Chris Thalinger. July 31, 2019. url: https://www.youtube.com/watch?v=3fSKcLM5nGw (visited on 10/11/2019).

[60] Oi, H. “A Comparative Study of JVM Implementations with SPECjvm2008”.In: 2010 Second International Conference on Computer Engineering and Appli-cations. Vol. 1. Mar. 2010, pp. 351–357. doi: 10.1109/ICCEA.2010.75.

[61] Chapin, John. Fearless JVM Lamdbas - John Chapin. Mar. 30, 2017. url: https://www.youtube.com/watch?v=GINI0T8FPD4 (visited on 10/17/2019).

[62] Lee, H. & Satyam, K., and Fox, G. “Evaluation of Production Serverless Com-puting Environments”. In: 2018 IEEE 11th International Conference on CloudComputing (CLOUD). July 2018, pp. 442–450. doi: 10.1109/CLOUD.2018.

00062.

[63] Maas, Martin & Asanovic, Krste, and Kubiatowicz, John. “Return of the Run-times: Rethinking the Language Runtime System for the Cloud 3.0 Era”. In:Proceedings of the 16th Workshop on Hot Topics in Operating Systems. HotOS’17. Whistler, BC, Canada: ACM, 2017, pp. 138–143. isbn: 978-1-4503-5068-6.doi: 10.1145/3102980.3103003. url: http://doi.acm.org/10.1145/3102980.3103003.

[64] Singh, Gurdev & Singh, Dilbag, and Singh, Vikram. “A Study of Software Met-rics”. In: International Journal of Computational Engineering and Management11 (Jan. 2011), pp. 22–27. issn: 2230-7893.

[65] Sharma, Manik and Singh, Dr. Gurdev. “Article: Analysis of Static and DynamicMetrics for Productivity and Time Complexity”. In: International Journal ofComputer Applications 30.1 (Sept. 2011), pp. 7–13.

[66] Miller, Robert B. “Response Time in Man-computer Conversational Transac-tions”. In: Proceedings of the December 9-11, 1968, Fall Joint Computer Con-ference, Part I. AFIPS ’68 (Fall, part I). San Francisco, California: ACM, 1968,pp. 267–277. isbn: 978-1-4503-7899-4. doi: 10.1145/1476589.1476628. url:http://doi.acm.org/10.1145/1476589.1476628.

63

[67] Card, Stuart K. & Robertson, George G., and Mackinlay, Jock D. “The Infor-mation Visualizer, an Information Workspace”. In: Proceedings of the SIGCHIConference on Human Factors in Computing Systems. CHI ’91. New Orleans,Louisiana, USA: ACM, 1991, pp. 181–186. isbn: 0-89791-383-3. doi: 10.1145/108844.108874. url: http://doi.acm.org/10.1145/108844.108874.

[68] Kim Bjork - Github Repositories. url: https://github.com/KimBjork?tab=repositories (visited on 01/11/2020).

[69] uberto. Kakomu. url: https://github.com/uberto/kakomu (visited on 12/12/2019).

[70] Tinder. State Machine. url: https://github.com/Tinder/StateMachine (vis-ited on 12/12/2019).

[71] MarcinMoskala. Kotlin Disctete Math Toolkit. url: https : / / github . com /

MarcinMoskala/KotlinDiscreteMathToolkit (visited on 12/12/2019).

[72] johnrengelman. Gradle Shadow. url: https://github.com/johnrengelman/

shadow (visited on 12/12/2019).

[73] [native-image] cannot build an image with the 19.3.0 version of GraalVM CEfor java8 and java11. url: https://github.com/oracle/graal/issues/1863(visited on 12/12/2019).

[74] Georges, Andy & Buytaert, Dries, and Eeckhout, Lieven. “Statistically RigorousJava Performance Evaluation”. In: Proceedings of the 22nd Annual ACM SIG-PLAN Conference on Object-Oriented Programming Systems and Applications.OOPSLA ’07. Montreal, Quebec, Canada: Association for Computing Machin-ery, 2007, pp. 57–76. isbn: 9781595937865. doi: 10.1145/1297027.1297033.url: https://doi.org/10.1145/1297027.1297033.

[75] Lengauer, Philipp et al. “A Comprehensive Java Benchmark Study on Memoryand Garbage Collection Behavior of DaCapo, DaCapo Scala, and SPECjvm2008”.In: Proceedings of the 8th ACM/SPEC on International Conference on Perfor-mance Engineering. ICPE ’17. Lapos;Aquila, Italy: Association for Comput-ing Machinery, 2017, pp. 3–14. isbn: 9781450344043. doi: 10.1145/3030207.3030211. url: https://doi.org/10.1145/3030207.3030211.

[76] Blackburn, Stephen M. et al. “The DaCapo Benchmarks: Java BenchmarkingDevelopment and Analysis”. In: Proceedings of the 21st Annual ACM SIGPLANConference on Object-Oriented Programming Systems, Languages, and Applica-tions. OOPSLA ’06. Portland, Oregon, USA: Association for Computing Ma-chinery, 2006, pp. 169–190. isbn: 1595933484. doi: 10.1145/1167473.1167488.url: https://doi-org.focus.lib.kth.se/10.1145/1167473.1167488.

[77] Zar, Jerrold H. Biostatistical Analysis (4th Edition). USA: Prentice-Hall, Inc.,2007, pp. 43–45. isbn: 978-0130815422.

[78] Bjork, Kim. A comparison of compiler strategies for serverless functions writtenin Kotlin - dataset. url: https://zenodo.org/record/3648281#.XjwYe2hKiUl.

64

TRITA -EECS-EX-2020:83

www.kth.se

a comparison of compiler strategies for serverless...

Documents