
VROOM: Accelerating the Mobile Web with Server-Aided Dependency Resolution

Vaspol Ruamviboonsuk, University of Michigan
Ravi Netravali, MIT
Muhammed Uluyol, University of Michigan
Harsha V. Madhyastha, University of Michigan

Abstract

The existing slowness of the web on mobile devices frustrates users and hurts the revenue of website providers. Prior studies have attributed high page load times to dependencies within the page load process: network latency in fetching a resource delays its processing, which in turn delays when dependent resources can be discovered and fetched.

To securely address the impact that these dependencies have on page load times, we present VROOM, a rethink of how clients and servers interact to facilitate web page loads. Unlike existing solutions, which require clients to either trust proxy servers or discover all the resources on any page themselves, VROOM's key characteristics are that clients fetch every resource directly from the domain that hosts it but web servers aid clients in discovering resources. Input from web servers decouples a client's processing of resources from its fetching of resources, thereby enabling independent use of both the CPU and the network. As a result, VROOM reduces the median page load time by more than 5 seconds across popular News and Sports sites. To enable these benefits, our contributions lie in making web servers capable of accurately aiding clients in resource discovery and judiciously scheduling a client's receipt of resources.

CCS Concepts

• Information systems → Mobile information processing systems; Browsers; • Networks → Network performance evaluation; Network measurement;

Keywords

Web performance, Mobile web, Page load times

ACM Reference format:
Vaspol Ruamviboonsuk, Ravi Netravali, Muhammed Uluyol, and Harsha V. Madhyastha. 2017. VROOM: Accelerating the Mobile Web with Server-Aided Dependency Resolution. In Proceedings of SIGCOMM '17, Los Angeles, CA, USA, August 21–25, 2017, 14 pages.
https://doi.org/10.1145/3098822.3098851


1 Introduction

Despite the rapid increase in mobile web traffic [11], page loads on mobile devices remain disappointingly slow. Recent industry studies [9, 13], as well as our own measurements (§2), show that a large fraction of mobile-optimized websites load much slower than user tolerance levels, even on state-of-the-art mobile devices and cellular networks. For example, the average web page takes 14 seconds to load even on a 4G network [12].

Since the speed of page loads critically impacts user experience and thus provider revenue [19], much effort has been expended by both industry and academia to identify the root causes for poor web performance. Recent studies [35, 41, 42] have found that dependencies between the resources on any web page are a key reason for slow page loads. Today, even mobile-optimized web pages include roughly one hundred resources [7] on average, and client browsers can discover each of these resources only after they have fetched, parsed, and executed other resources that appear earlier in the page. For instance, a browser may learn that it needs to fetch an image after executing a script which it discovers after downloading and parsing the page's HTML.

Prior work has taken one of two approaches to address the impact of these dependencies on web performance, and both approaches suffer from fundamental drawbacks.
• Offloading to proxies. In one class of solutions, when a client loads a page, discovery of resources on the page is offloaded to a proxy [36, 40, 43]. Solutions that take this approach attempt to reduce page load times by leveraging the faster CPUs and network connectivity of proxy servers. However, clients must trust that proxies preserve the integrity of HTTPS content; proxies that disregard HTTPS traffic are limited in the benefits they can provide given the increased usage of HTTPS [33]. Moreover, to preserve the ability of web providers to personalize content, a client must share with the proxy its cookies for all domains from which resources must be fetched to load the page.
• Reprioritizing requests at the client. An alternative class of solutions [22, 35] lets the client itself discover all resources on a page. Instead of fetching resources in the order that they are discovered, these systems preferentially fetch certain resources (e.g., those that lead to longer dependency chains [35]) based on precomputed, high-level characterizations of the page's dependency structure. The problem with these approaches is that, once the client browser discovers the need for a resource, the client must necessarily wait for that resource to be fetched over the network before it can start processing the resource, resulting in under-utilization of the CPU. Since the client CPU is the primary bottleneck when loading web pages on mobile devices ([34] and §2), this class of solutions has limited ability to speed up the mobile web.



These limitations of prior approaches motivate the need for a new solution that both preserves the end-to-end nature of the web and aids clients in discovering the resources on any page. Specifically, a client must receive every resource directly from the domain hosting that resource, thereby enabling the client to verify the integrity of HTTPS content and requiring the client to share its cookies for a domain only with servers in that domain. Yet, the new solution must also preserve the primary benefit of proxy-based dependency resolution, which is to decouple the client's downloading of resources on a page from the processing of those resources. Decoupling these functions maximizes resource utilization during page loads because fetching of resources is constrained by the network, whereas the CPU limits the parsing and execution of each fetched resource.

We argue that the way to realize this desired end-to-end solution is to redesign page loads such that web servers securely aid clients in resource discovery. In addition to returning a requested resource, a web server should inform the client of other dependent resources that the client will need in order to load the page. Though there are resource overheads associated with identifying these dependent resources, content providers have a strong incentive to incur this burden in order to decrease load times for their clients, thereby increasing revenue [6, 46]. We make three contributions in designing VROOM to realize this approach.

First, to aid clients in resource discovery, VROOM-compliant web servers not only push the content of dependent resources (leveraging the server push capability in HTTP/2 [20]) but also return dependency hints in the form of URLs for resources that the client should fetch. The use of HTTP/2 push alone is insufficient because content on modern web pages is often served by multiple domains [21], each of which can only securely push the content that it owns. In contrast, dependency hints enable a server to inform a client of the dependent resources that it should fetch from other domains, without providing the content of those resources. This additional input from servers ensures that a client's ability to discover and start downloading required resources is not constrained by the speed with which it can process fetched content. In fact, by the time the client discovers the need for a resource during its execution of a page load, that resource will likely already be in its cache.

Second, we develop the mechanisms that VROOM-compliant web servers must employ to identify the resources they should push and the dependency hints they should include with their responses. In contrast to prior efforts, which have relied exclusively on either online [36, 40] or offline [22, 35] dependency resolution, we show how to combine the two approaches to accurately identify the set of resources that a client will need to fetch within a specific page load. Critically, our design maximizes the number of dependent resources that the client is made aware of while avoiding sending dependency hints for intrinsically unpredictable resources—ones which vary even across back-to-back loads of the page—so that the client does not incur the overhead of fetching resources that are unnecessary for its page load.

Lastly, while the additional input from VROOM-compliant web servers reduces the latency for clients to discover all resources on a web page, fetching all resources as soon as they are discovered increases contention for the access link's bandwidth, delaying the receipt of some resources. To maximize CPU utilization, we leverage the property that resources that need to be parsed/executed (HTML, CSS, and JS objects) constitute only a quarter of the bytes on the average mobile web page [7]. We coordinate server-side pushes and client-side fetches such that resources that need to be parsed/executed are received earlier than other resources; the client can fetch the latter set of resources while processing the former set. In doing so, we ensure that resources arrive at the client in the order in which they will be processed.

[Figure 1: Page load times on today's mobile web.]

Our implementation of VROOM enables the use of Google Chrome to load pages from the Mahimahi [36] page load replay environment. On a corpus of web pages from popular News and Sports sites, the median page load time reduces from the status quo of 10.5 seconds to 5.1 seconds with VROOM. These improvements stem from VROOM's ability to enable server-side identification of dependent resources with a median false negative rate of less than 5%, which in turn results in a 22% median decrease in client-side latency to discover all resources on a page.

2 Motivation

We begin by presenting a range of measurements that illustrate the poor web performance today on mobile devices, estimate the potential to reduce page load times, and show that existing solutions are insufficient.

Problem: Poor load times. We demonstrate the slowness of the mobile web using two sets of websites: the Alexa US top 100 websites and the top 50 sites each in the News and Sports categories; these popular sites apply known best practices such as minifying JavaScript content and eliminating HTTP redirects [4]. We load the landing page for each site five times on a Nexus 6 smartphone that is connected to Verizon's LTE network with excellent signal strength; we report median page load times.¹

¹We compute page load time as the time between when a page load begins and when the onload event fires.

Figure 1 shows that the median site among the top 100 takes roughly 5 seconds to load, which is higher than the 2–3 second period that a typical user is willing to wait [27]. When considering News and Sports sites, which are more complex than the average site [21], the median load time is even higher, exceeding 10 seconds. Since the need for faster loads is particularly acute on News and Sports sites, we focus on these sites in the rest of this section.

Cause: Poor CPU/network utilization. We now consider how low page load times can be reduced without web pages being rewritten [1]. For a client not to have to trust proxies to execute page loads on its behalf, the client must fetch all resources on a page directly from origin web servers and locally process all of these resources. This places two constraints on web performance: the client's network connection and its CPU. To compute a lower bound on page load times, we estimate the potential gains from a redesign of the page load process that would fully utilize at least one of these two resources.

[Figure 2: Potential for reducing page load times by making better use of the client's CPU and network.]

To mimic a setting where the network bandwidth is the bottleneck, we replay each page load after modifying the root HTML to list all resources required to load the page in a manner that instructs the browser to fetch these resources but not evaluate them. To emulate a setting where the client's CPU is the bottleneck, we load every page with the client phone connected via USB to a desktop which hosts all of the web servers. In both cases, we use Mahimahi [36] to record page content and replay page loads, and we use HTTP/2 between the client and all web servers in order to make efficient use of the network. We use the same mobile device and cellular network as above, and we further describe our replay setup in Section 6.
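One way to instruct a browser to fetch resources without evaluating them is with preload links; the sketch below illustrates such a rewritten root HTML under that assumption (the URLs are made up, and this is our guess at the mechanism, which the text does not spell out):

    <!-- Hypothetical root HTML for the network-bottleneck emulation:
         each preload link causes the browser to download the resource
         without parsing or executing it. -->
    <html>
      <head>
        <link rel="preload" href="https://a.com/app.js" as="script">
        <link rel="preload" href="https://b.com/style.css" as="style">
        <link rel="preload" href="https://b.com/hero.jpg" as="image">
      </head>
      <body></body>
    </html>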

Figure 2 shows that, when exactly one of the network or the CPU is the bottleneck, rather than both limiting each other as is the case today, page load times on popular News and Sports websites are significantly lower than the status quo (the median load time drops from 10.5 seconds to 5 seconds). Our results also show that the CPU is typically the bottleneck in mobile page loads, corroborating the findings of recent studies [34]. Furthermore, page load times in the case where the CPU is the bottleneck remain largely the same even if we disable 1 of the 4 cores on the Nexus 6 smartphone, indicating that adding more cores will not help improve mobile web performance.

Existing solutions are insufficient. Since we seek a solution that improves mobile web performance while preserving the end-to-end nature of the web, we consider two existing solutions that satisfy this property.

First, we consider a setting where all domains on the web have adopted the latest version of the HTTP protocol, HTTP/2. HTTP/2 reduces inefficiency in the use of the network by enabling requests to be multiplexed on the same TCP connection. To estimate the potential impact of HTTP/2, we replay page loads in an environment where HTTP/2 is universally used ("HTTP/2 Baseline"). The results in Figure 3 portend that, though the global adoption of HTTP/2 will reduce the median page load time across popular News and Sports websites to roughly 8 seconds, mobile web performance will remain significantly short of optimal; the lower bound we saw in Figure 2 was more than 2 seconds lower, a substantial gap given that web providers have found that even a few hundred milliseconds of additional delay significantly reduces their revenue [10]. Configuring the first-party domain of every page to push (leveraging HTTP/2's server push capability) all static resources that it hosts offers little additional benefit, for reasons discussed later.

[Figure 3: Estimation of page load time improvements that would be enabled once HTTP/2 is globally adopted. Given that the adoption of HTTP/2 is still in its nascency today, the fidelity of our replay setup is confirmed by the close match between load times measured when loading pages on the web and in our HTTP/1.1 replay environment.]

[Figure 4: Fraction of the critical path spent waiting for the network, when the client uses HTTP/2 to communicate with all domains.]

Performance with HTTP/2 falls significantly short of the lower bound because the root cause for high page load times remains: since both CPU-bound and network-bound activities are typically on the critical path of a page load [41], neither the client's CPU nor its access link is utilized to capacity. The client browser cannot parse an HTML/CSS object or execute a JavaScript file until it has incurred the latency to fetch that resource, which in turn it can begin to do only after discovering the need to fetch that resource by parsing/executing another resource. Indeed, Figure 4 shows that a significant fraction of time on the page load's critical path—over 30% on the median page—is spent waiting to receive data over the network, leading to under-utilization of the CPU, the bottleneck resource (Figure 2).

An alternative end-to-end solution for improving web performance is to use an approach like Polaris [35]. With Polaris, the client receives a characterization of the page's dependency structure at the start of the page load and uses this knowledge to prioritize requests for more critical resources. However, such an approach can do little to reduce the network delays encountered on the critical path. The fundamental constraint with this approach is that the client must discover all resources on the page on its own (i.e., fetching a resource and then evaluating it to identify new resources to fetch). As a result, once the browser discovers a resource by parsing/executing other resources that appear earlier in the page load, the latency of fetching that resource must be incurred at that time before the browser can begin processing the resource. Since HTTP/2 eliminates head-of-line blocking and network bandwidth is not the bottleneck in mobile page loads (Figure 2), reordering requests for discovered resources offers little benefit. We provide further evaluation and discussion of Polaris in Section 6.

Summary. Together, the measurements in this section lead to the following takeaways:
• Page loads on mobile devices remain significantly slow even for popular websites, particularly for sites in categories that have more complex web pages than others.
• The reduction in load times that we can expect from the adoption of existing end-to-end solutions will not suffice.
• However, we could potentially halve the median load time if the page load process were redesigned to more efficiently use the client's CPU and network.

3 Approach

Our results in the previous section indicate that the key to optimizing mobile web performance is to maximize the utilization of the client's CPU. To do this, we need to ensure that the browser's processing of any resource is not delayed waiting to receive the resource over the network. To achieve this decoupling of the browser's use of the CPU and network, web page loads would ideally work as follows. When a client browser issues a request for a page, it would receive back all the resources needed to render the page, rather than just the HTML for the page. This would optimize page load performance for two reasons. On the one hand, in contrast to the status quo wherein the client incrementally fetches resources as it discovers them during the page load, receiving all of the objects on the page at once would maximize the utilization of the client's access link and eliminate the need for repeated latency-onerous interactions between the client and web servers. On the other hand, if the resources on the page are delivered in the order they need to be processed, the client can make full use of its CPU, processing resources while fetching other resources in parallel.

3.1 Limitations of HTTP/2 PUSH

It may appear that such an ideal design of the page load process is feasible today because HTTP/2-compliant web servers can speculatively push resources to clients [20]. However, today, 1) the resources on a page are often spread across multiple domains [21], e.g., a web page from one provider often includes advertising, analytics, JavaScript libraries, and social networking widgets from other providers; 2) HTTPS adoption is rapidly growing [8, 33]; and 3) page content is increasingly personalized. These typical characteristics of modern web pages make the use of HTTP/2 PUSH inefficient for the following reasons.²

²Another commonly cited limitation of PUSH is the potential for bandwidth wastage when a resource cached at the client is pushed. However, this problem could be addressed by having the client send a summary of its cache contents to web servers, e.g., in a cookie [5].

[Figure 5: Comparison of critical path across different approaches for loading web pages, in all of which the client receives every resource from the domain from which it is served, so as to preserve personalization and the client's ability to verify the integrity of secure content: (a) the client discovers all resources on its own, giving 5 stages on the critical path, with CPU use blocking use of the network in steps 2 and 4 and vice-versa in steps 3 and 5; (b) servers aid the client's resource discovery by pushing some resources, giving 4 stages, with CPU use blocking use of the network in steps 2 and 3 and vice-versa in step 4; (c) servers push some resources and return dependency hints for others, giving 3 stages, with CPU and network utilized throughout.]

• When a domain receives a request for the HTML of a page that it hosts, it can only return resources that it hosts and not resources served by other domains. In the example in Figure 5, in response to the request for the HTML, a.com's servers can only push the contents of foo.js, which is served from the same domain, but not the third-party resource img.jpg. If web servers were to fetch resources from external domains and push them to clients [22, 36, 43], clients would be unable to verify the integrity of secure page content. Moreover, since any client's request to a web server will only include the client's cookie for that domain, resources fetched from other domains by that web server will not reflect any personalization of content by those domains.

• If web servers only push locally hosted resources, the client will discover resources that it needs to fetch from other domains only after processing previously fetched resources; e.g., in Figure 5(b), the client can discover the need to fetch img.jpg only after executing foo.js. This makes the CPU a potential bottleneck in the client's fetching of resources.



• During a page load, if every domain independently pushes its resources to the client, the client's receipt from one domain of a resource that must be processed (i.e., HTML, CSS, or JavaScript) can be delayed due to bandwidth contention on the client's access link with other resources (such as bulky images) from other domains. This makes the network a potential bottleneck in the client's processing of resources.

    3.2 Combining PUSH with dependency hints

Given these limitations associated with relying solely on HTTP/2 PUSH, we leverage an additional server-side capability. When a web server receives a request for a resource, in addition to pushing the content for some dependent resources that it owns, the server can also return a list of URLs for other dependent resources—we refer to this list as dependency hints. Web servers can include such a list of URLs as an additional header in HTTP responses.

When a client receives dependency hints, it can fetch every resource whose URL is included in the list, without having to first process other resources on the page to discover the URLs in the list. For example, in Figure 5(c), based on the hint received from a.com, the client can fetch img.jpg from b.com without having to wait to receive and execute foo.js.

Legacy browsers already support dependency hints in the form of HTTP Link headers which have the preload attribute set [16]; resources listed in these headers are immediately fetched by browsers but are not evaluated until they are referenced by the page (i.e., Link preload headers are primarily used to prewarm browser caches during page loads).
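As a concrete illustration, in the scenario of Figure 5(c), a.com's response carrying the page's HTML could hint the third-party image with a standard Link preload header (the exact header value here is our illustration, not taken from the paper):

    Link: <http://b.com/img.jpg>; rel=preload; as=image

On seeing this header, the browser immediately fetches img.jpg directly from b.com, in parallel with downloading and executing foo.js.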

Using dependency hints in addition to HTTP/2 PUSH offers several advantages:
• Any web server can safely send clients dependency hints for third-party resources. For any URL received via dependency hints, a client will fetch it directly from the respective domain, enabling the client to confirm the integrity of resources served over HTTPS and preserving that domain's ability to personalize content.
• The client need not fetch resources that are already cached locally, making it easier to minimize bandwidth waste compared to the use of HTTP/2 PUSH.
• The client maintains control over its concurrent fetches of resources from multiple domains; it can coordinate its downloads such that high priority resources (those that must be processed) are not delayed.

4 Design

Using the approach described in the previous section requires us to answer three questions:
• In response to a request from a client, how can a web server accurately identify the list of dependent resources that it should inform the client about, including ones hosted in other domains, without having clients fetch resources that are irrelevant to the ongoing page load (thereby wasting bandwidth)?
• How can a server provide a sufficient number of dependency hints to clients, without knowledge of how content is personalized by other domains that serve resources for a given page?
• At any point in time during a page load, a client's access link bandwidth is shared by resources that are explicitly fetched by the client or proactively pushed by servers. Given this contention, when should clients schedule resource fetches? What resources should servers push?

In designing VROOM to address these questions, we respect two primary constraints: 1) we do not rely on input from developers to characterize the dependencies on web pages because the resources on a page are typically spread across several domains [21], and no single developer is likely to have complete knowledge about all dependencies on a page; and 2) to preserve the integrity of content and to protect user privacy, any client will accept a resource only from the domain that serves that resource, and it will share its cookie for a domain only with web servers in that domain. Figure 6 illustrates the server-side and client-side components of VROOM, which we describe next.

[Figure 6: Illustration of the components in VROOM and the interactions between them. A domain's front end, drawing on offline dependency resolution and online parsing of HTML, returns to the client (1) a list of URLs of external dependencies and low-priority local dependencies, (2) the requested resource, and (3) a content push of high-priority local dependencies. On the client's page load timeline, all HTML, JS, and CSS are fetched first, with other resources fetched in parallel with HTML/CSS parsing and JS execution until onload; requests carry the cookie only for the corresponding domain.]

4.1 Server-side dependency resolution

Generating an accurate list of dependent resources for a web page is challenging due to the constant flux in resources on modern pages. Moreover, unlike recent work [22, 35] focused on generating a page's stable dependency structure, here we need to identify the precise URLs of resources that a server must either push or include in its dependency hints. To appreciate the challenge in doing so, we first consider two strawman approaches before describing our solution.

    4.1.1 Strawmans for resource discovery

Strawman 1 (Online). When a server receives a request for the HTML of a page, the server could load the entire page locally, mimicking a client browser, to identify the other resources on the page. However, many of the URLs fetched by the server during its page load will not be requested during the client's page load.


[Figure 7: Fraction of resources, per page in the Alexa Top 100, that persist over different time scales (one hour, one day, one week).]

[Figure 8: Summary of techniques used in VROOM for server-side dependency resolution to account for the different types of resources on any web page: relatively stable resources (offline loads accounting for device type customization); dynamic page content (online analysis of HTML); resources that vary across back-to-back loads (onus on client to discover); and user-specific personalization (dependencies stemming from an HTML object resolved by the domain that serves that HTML).]

On the one hand, back-to-back loads of a page often differ in the exact set of URLs fetched (e.g., ads typically insert a randomly selected identifier into the URLs they fetch). For example, 22% of the URLs fetched to load the median page in the Alexa Top 100 list change across back-to-back loads. On the other hand, loading a page at one server cannot account for the personalization performed by other domains, since the server only has the user's cookie for its domain. If servers fail to account for these discrepancies and either push to the client or ask the client to fetch unnecessary objects, the user is likely to experience higher load times.

Strawman 2 (Offline). Alternatively, a server can periodically load each page that it serves. This enables the server to account for the variation in resources over time; when a client requests the HTML for a page, the server can return the set of resources that it has repeatedly observed on recent loads of that page. With this approach, we risk missing a large fraction of URLs that a client will need to fetch when loading the page. For example, the set of stories or set of products on the landing page of a News or Shopping site changes often. Figure 7 confirms this; for the median site in the Alexa Top 100, only 70% of the resources on the landing page remain stable over one hour, and this number drops to 50% over one week.

    4.1.2 Our solution: offline + online discovery

The two strawman approaches for server-side resource discovery illustrate the following trade-off: servers must ensure that a client does not end up fetching unnecessary resources, but if they are too conservative (and thus let clients discover many resources on their own), the utility of input from servers will be minimal. To address this trade-off, we observe that both offline and online dependency resolution are necessary at the server. Figure 8 summarizes the techniques we employ.

[Figure 9: Comparison of the stable set of resources on each page when the user device is a Nexus 6 compared with when the user device is a Nexus 10 or a OnePlus 3.]

Periodic offline resolution of a page helps confirm which URLs are consistently fetched when loading the page, whereas online resolution helps account for flux in page content.

Offline dependency resolution. Offline server-side determination of the resources on a page works as described previously: the page is loaded periodically (once every hour in our implementation), and at any point in time, URLs fetched in all recent loads are considered likely to also be fetched when a client loads the page. However, even the largely stable subset of resources on a page can vary across different types of client devices (Figure 9). For example, it is common for website providers to use CSS stylesheets and JavaScript objects that cause different clients to fetch images of different sizes on the same page depending on the client's display resolution or pixel density.
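A minimal sketch of this offline filtering step, assuming each recent load has been recorded as a set of fetched URLs (the function and variable names here are ours, not from the paper):

    // Offline dependency resolution (sketch): keep only URLs seen in
    // every one of the last few recorded loads of a page.
    // `recentLoads` is an array of Sets of URL strings, one per load.
    function stableResources(recentLoads) {
      if (recentLoads.length === 0) return new Set();
      let stable = new Set(recentLoads[0]);
      for (const load of recentLoads.slice(1)) {
        stable = new Set([...stable].filter((url) => load.has(url)));
      }
      return stable;
    }

    // Example: URLs fetched in three hourly loads of a page.
    const loads = [
      new Set(["a.com/app.js", "b.com/style.css", "c.com/ad?id=123"]),
      new Set(["a.com/app.js", "b.com/style.css", "c.com/ad?id=987"]),
      new Set(["a.com/app.js", "b.com/style.css", "c.com/ad?id=456"]),
    ];
    // The ad URL varies across loads, so only app.js and style.css
    // survive as stable resources.
    console.log(stableResources(loads));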

VROOM's offline server-side dependency resolution efficiently accounts for device-specific customization of resources in two ways. First, the server need not load each page on every type of device; this would be onerous given the large variety of smartphones and tablets on the market. Instead, after a few loads of a page, the server can bin all device types into a few equivalence classes. The equivalence classes can vary across pages because different pages may be customized based on different device characteristics. For example, in Figure 9, the stable set of URLs fetched when loading a page on a Nexus 6 smartphone matches the stable set of resources for a OnePlus 3 phone much more closely than for a Nexus 10 tablet. Second, after device type equivalence classes for a page are identified, the server need not load the page on a real device in each class. Instead, the server can leverage existing device emulation tools [3].

Online HTML analysis. In addition to offline dependency resolution, when a VROOM-compliant web server responds to a request with an HTML object, it not only informs the client of dependencies discovered from loading this object offline, but also includes all URLs seen in the HTML object by parsing it on the fly. While there can be other sources of dynamism on a page (e.g., a script on the landing page of a shopping site may fetch products currently on sale), we show later in Section 6 that accounting for the URLs in HTML objects suffices on most pages to capture the flux in page content. Importantly, we find that server-side parsing of HTML objects as they are being served adds a median delay of only roughly 100 ms across the landing pages of the top 1000 websites. This overhead is offset by the multi-second reduction in page load times made possible by server-aided resource discovery.
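As a sketch of this online step, a server could scan an HTML response for statically referenced URLs while serving it. The regex-based scan below is only an illustration (a real implementation would use a proper HTML tokenizer, and the function name is ours):

    // Online HTML analysis (sketch): extract URLs statically referenced
    // in an HTML object, to be returned as dependency hints.
    function urlsInHtml(html) {
      const urls = new Set();
      // Match src/href attributes on tags such as <script>, <link>, <img>.
      const attrPattern = /\b(?:src|href)\s*=\s*["']([^"']+)["']/gi;
      let match;
      while ((match = attrPattern.exec(html)) !== null) {
        urls.add(match[1]);
      }
      return urls;
    }

    // Example:
    const html = '<script src="a.com/foo.js"></script>' +
                 '<link rel="stylesheet" href="b.com/style.css">';
    console.log(urlsInHtml(html)); // Set { "a.com/foo.js", "b.com/style.css" }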


[Figure 10: Illustration of how VROOM-compliant servers account for personalization, shown on a dependency tree rooted at a.com/index.html with resources a.com/banner.jpg, b.com/style.css, b.com/logo_lo_res.png, c.com/ad.php, c.com/ad_inject.js, c.com/mouseover.js, and d.com/car.gif. A client that requests index.html from a.com only discovers the resources within the solid blue envelope, because a request for ad.php returns an HTML object whose content could be personalized to a specific user. The client discovers the resources in the dashed red envelope in response to its request for ad.php sent to c.com. The client will have to itself discover the need to fetch d.com/car.gif.]

    4.2 Accounting for personalization

Unlike content push, dependency hints allow a server to inform clients about resources served by other domains. But content served by one domain may be personalized in ways that other domains are unaware of (e.g., based on information stored in cookies).

A naive solution for handling personalization would be for any web server to never return dependencies derived from external content. In other words, any resource that is discovered during the page load by parsing or executing an external resource is deemed as one that could differ due to personalization. For example, in Figure 10, in response to a request for index.html, a.com can inform a client about b.com/style.css, but let the client discover the need to fetch b.com/logo_lo_res.png only when it requests style.css from b.com. Accounting for personalization in this manner would, however, inflate the latency incurred by the client in discovering all resources on the page, thereby limiting its ability to fully utilize the network.

To handle content personalization while enabling low-latency resource discovery, we observe that websites are personalized primarily in two ways: by customizing the content of HTML responses,³ and by adapting the execution of scripts. Server-side resource discovery in VROOM accounts for these two types of personalization as follows. First, web servers omit hints for dependencies derived from an external resource only when that resource is an HTML object (i.e., an embedded iframe); servers do include dependencies derived from other types of external resources (e.g., style.css in Figure 10). In comparison with the naive solution described above, our approach reduces the latency for the client to discover resources. Second, of the resources determined based on JavaScript execution, those affected by user-specific state (e.g., local time) are left to clients to discover on their own. JavaScript-based personalization will typically vary over time, and hence, such unstable resources will get filtered out via offline dependency resolution.

³Content of CSS stylesheets and scripts is seldom user-specific. Personalization of images and videos (i.e., returning different versions when the same URL is fetched) is typically device-specific, which we have previously accounted for.
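The two rules above can be read as a filter over the resolved dependency graph; a rough sketch of that filter follows, with field names of our own invention (the paper does not define a concrete data model):

    // Sketch: decide which discovered dependencies a server may safely
    // hint, per VROOM's two personalization rules.
    // `dep.parent` is the resource whose processing revealed `dep.url`;
    // `dep.stableAcrossLoads` comes from offline dependency resolution.
    function mayHint(dep) {
      // Rule 1: omit dependencies derived from an external HTML object
      // (e.g., an embedded iframe), whose content may be personalized.
      if (dep.parent.isExternal && dep.parent.type === "html") return false;
      // Rule 2: dependencies produced by JavaScript execution that vary
      // across back-to-back loads (e.g., due to user-specific state) are
      // left for the client to discover; offline resolution filters them.
      if (dep.parent.type === "js" && !dep.stableAcrossLoads) return false;
      return true;
    }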

[Figure 11: Need for and utility of careful scheduling of server-side push and client-side fetch of the resources that need to be processed (HTML, CSS, and JS) on http://eurosport.com; the plot shows, for the first 10 such resources, the increase in receipt time relative to baseline HTTP/2 under "Push All, Fetch ASAP" and under VROOM. Resources are ordered based on the order in which they are fetched with baseline HTTP/2.]

4.3 Cooperative request scheduling

Finally, we turn our attention to questions that must be answered for clients to benefit from server-side resource discovery. How should web servers combine the use of HTTP/2 PUSH and dependency hints to aid clients? How should clients utilize the dependency hints from servers?

Strawman: Push whenever possible. Fetch upon discovery. We first consider the most straightforward answers to these questions. When a web server receives a client's request for an HTML object, it can push to the client all dependent resources that it owns. The server can inform the client of all other dependencies via dependency hints. As soon as the client receives these hints, it can initiate downloads for all specified URLs.

Problem: Bandwidth contention. Though this simple strategy significantly improves utilization of the client's access link, we do not see a commensurate increase in CPU utilization. This is because, though the client receives the complete set of resources on a page well before it would without server push and dependency hints, contention on the client's access link slows down fetches of some of the resources that need to be processed. In the example in Figure 11, though the time to fetch the first 10 resources that need to be processed reduces by 2 seconds with the "Push All, Fetch ASAP" strawman, simultaneous use of the client's access link to transfer all of these resources delays the first few resources, causing the browser's processing to stall. Due to the resulting under-utilization of the CPU, we show later in Section 6 that applying this strawman solution yields no improvements in page load times.

Solution: Prioritization via selective push and staged downloads. Our solution to this problem is to prioritize the fetches of resources that need to be processed (HTML, CSS, and JS) over those that need not be processed (e.g., images and videos).⁴ Classifying resources into high and low priority groups helps because, once the client has finished fetching all resources that need to be processed, utilization of the CPU and the network are largely decoupled over the rest of the page load. Moreover, the types of resources that need to be processed typically constitute a small fraction of the bytes on a page [7].

⁴We, however, consider all resources that are descendants of third-party HTML objects (i.e., HTMLs within a page's iframes) as low priority, because web browsers process iframes only after the root HTML for the page has been completely downloaded and parsed. This helps minimize network contention for high priority resources referenced in the root HTML, thus reducing the fetch times for those resources.



Table 1: HTTP headers used by VROOM-compliant servers to provide dependency hints to client browsers. Headers are listed in decreasing order of priority. Resources in each header are listed in the order they need to be processed.

    Header            Description
    Link preload      Resources to be processed (e.g., JavaScript and HTML objects), fetched at the highest priority
    x-semi-important  Resources to be processed that are lazily fetched, e.g., "async" JavaScript or CSS objects
    x-unimportant     Resources that do not need to be parsed or executed (cannot have derived children), e.g., images
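For illustration, a VROOM-compliant response to a request for a page's HTML might then carry hints such as the following. The URLs are made up, the Link syntax follows the standard preload form [16], and the value format of the two custom headers is our guess (the paper specifies their names and priorities, not their wire format):

    Link: <https://a.com/app.js>; rel=preload; as=script
    Link: <https://b.com/style.css>; rel=preload; as=style
    x-semi-important: https://a.com/analytics.js
    x-unimportant: https://b.com/hero.jpg, https://b.com/logo.png
    Access-Control-Expose-Headers: Link, x-semi-important, x-unimportant

The final header is needed so that VROOM's client-side scheduler can read these hints from cross-origin responses (see Section 5.2).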

This prioritization of resources that need to be processed is achieved in VROOM via two means. First, when a web server responds to a client's request for an HTML object, out of all the dependencies the server identifies, it pushes the content of only the high priority resources served from the local domain. All other dependencies are returned to the client via dependency hints. Second, when the client receives a list of URLs, it immediately fetches only high priority resources; the server's hints specify resources in the order in which the client will need to process them, so client requests simply mimic that order. Once resource discovery from servers is complete and the client has finished fetching all high priority resources that have been discovered, it issues requests for all other resources at once. In Figure 11, the time by when receipt of the 10 resources shown completes is the same with VROOM's scheduling as with the strawman strategy, but without significantly delaying the receipt of any individual resource.
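A compact sketch of the server-side half of this policy, assuming dependencies have already been resolved and classified; the function and field names are ours:

    // Sketch: when serving an HTML object, push only high-priority local
    // dependencies; everything else is conveyed via dependency hints.
    function planResponse(dependencies, localDomain) {
      const push = [];                       // content pushed via HTTP/2 PUSH
      const hints = { preload: [], semiImportant: [], unimportant: [] };
      for (const dep of dependencies) {
        const local = dep.domain === localDomain;
        if (dep.highPriority && local) {
          push.push(dep.url);                // HTML/CSS/JS served by this domain
        } else if (dep.highPriority) {
          hints.preload.push(dep.url);       // processed resources on other domains
        } else if (dep.lazy) {
          hints.semiImportant.push(dep.url); // e.g., "async" scripts
        } else {
          hints.unimportant.push(dep.url);   // e.g., images and videos
        }
      }
      return { push, hints };
    }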

Note that VROOM's scheduler is tailored for the setting where web pages are loaded on a state-of-the-art mobile device connected to an LTE network; as we saw earlier in Section 2, the client CPU is the bottleneck in this case. Alternate scheduling strategies will likely be necessary in settings where either network bandwidth (e.g., because of many users simultaneously accessing the cellular network [23]) or latency (e.g., on 2G or 3G networks [25]) is the bottleneck.

5 Implementation

The realization of VROOM requires both server-side (offline and online dependency resolution; pushing resources to clients and including dependency hints in responses) and client-side (scheduling fetches of hinted objects) changes. Since this precludes evaluation in the wild, we have implemented VROOM to accelerate web page loads when Google Chrome is used to load pages from the Mahimahi [36] replay environment.

5.1 Server-aided resource discovery

VROOM-compliant servers inform clients of dependent resources via content pushes and dependency hints. To push resources, we leverage HTTP/2's PUSH capability; since Mahimahi uses Apache web servers, which do not yet support HTTP/2, we run an HTTP/2 nghttpx reverse proxy [14] in front of each web server. For dependency hints, we rely on embedding additional headers in HTTP responses. For example, when a browser encounters a Link preload header [16] in an HTTP response, the browser will immediately issue a request for the URL embedded in that header.⁵ Table 1 summarizes the headers used by VROOM to provide dependency hints to clients.

To minimize stalls in the client browser's processing of resources, dependency hints from any server list resources in the order the client will need to process them, and the client requests hinted resources in this order; the server discovers this order during its offline and online dependency resolution. However, since the client issues these requests back-to-back, a web server may receive multiple requests near-simultaneously, causing it to respond to all requests in parallel. The resulting contention for the client's access link bandwidth leads to the slowdown of some resources over others, as seen with the "Push All, Fetch ASAP" strategy in Figure 11. Therefore, we modify Mahimahi so that any web server returns the content for requested resources in the same order in which it receives requests.

⁵While Link preload headers are currently supported by Chrome, other browsers such as Firefox are actively in the process of adding support for these headers: https://bugzilla.mozilla.org/show_bug.cgi?id=1222633.
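The ordering discipline behind that modification is simple; the following is a sketch of a per-connection FIFO in JavaScript (our own illustration; Mahimahi's web servers are not implemented in JavaScript):

    // Sketch: send responses in request-arrival order, even when
    // later responses become ready first.
    class OrderedResponder {
      constructor() { this.queue = []; }     // requests in arrival order

      onRequest(url) {
        const entry = { url, body: null };
        this.queue.push(entry);
        return entry;
      }

      onResponseReady(entry, body) {
        entry.body = body;
        // Flush from the head: a response is sent only once every
        // earlier-arriving response has been sent.
        while (this.queue.length > 0 && this.queue[0].body !== null) {
          const head = this.queue.shift();
          this.send(head.url, head.body);
        }
      }

      send(url, body) { /* write the response to the connection */ }
    }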

5.2 Scheduling requests with JavaScript

To schedule client-side downloads of URLs learned via dependency hints (Section 4.3), VROOM uses a JavaScript-based request scheduler. For each recorded page in Mahimahi, we modify the page's top-level HTML using Beautiful Soup [2]. VROOM's scheduler script is added as the first script tag in the HTML, thereby ensuring that the browser executes this script as soon as it begins parsing the HTML.

VROOM's scheduler script begins its execution with two steps. First, it defines an onload handler, response_handler, which it attaches to each request that it makes. This handler maintains a list of dependency hints that it has seen and fires every time the browser receives a response for a request made by VROOM's scheduler. Second, the script issues an XHR (XMLHttpRequest) for the page's HTML, whose URL we embed as an attribute in the script tag.⁶

The scheduler examines Link preload headers in the response for the page's HTML to discover dependency hints.⁷ It then issues requests for high priority dependencies by adding link tags (with the preload attribute set) to the DOM. Importantly, the scheduler's requests for high priority resources are served from the browser's local cache, because the browser itself immediately requests URLs included in Link preload headers. Moreover, modern browsers permit only a single outstanding request for any given URL.

Thereafter, VROOM's scheduler runs in an event-driven loop. Whenever the browser invokes response_handler upon receiving a resource, the scheduler marks that resource as fetched. The scheduler then examines the set of outstanding requests to determine whether it should start fetching dependencies in the next level of priority. Specifically, once all high priority resources learned via dependency hints have been received, VROOM's scheduler issues requests for all semi-important resources that it has discovered until that point. This process then repeats for low priority resources.
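A condensed sketch of the scheduler's structure follows; the names, the header-parsing details, and the use of preload link tags for lower-priority fetches are our own simplifications, and error handling is elided:

    // Sketch of VROOM's JavaScript request scheduler.
    var hints = { semi: [], low: [] };
    var outstandingHigh = 0;

    // Record hints carried in a response's headers; returns high-priority URLs.
    function collectHints(xhr) {
      var high = [];
      (xhr.getResponseHeader("Link") || "").split(",").forEach(function (part) {
        var m = part.match(/<([^>]+)>\s*;\s*rel=preload/);
        if (m) high.push(m[1]);
      });
      (xhr.getResponseHeader("x-semi-important") || "").split(",")
        .filter(Boolean).forEach(function (u) { hints.semi.push(u.trim()); });
      (xhr.getResponseHeader("x-unimportant") || "").split(",")
        .filter(Boolean).forEach(function (u) { hints.low.push(u.trim()); });
      return high;
    }

    // Fetch a hinted URL by adding a <link rel="preload"> tag to the DOM;
    // URLs the browser already requested are served from its local cache.
    function preload(url, onDone) {
      var tag = document.createElement("link");
      tag.rel = "preload";
      tag.as = "fetch";
      tag.href = url;
      tag.onload = tag.onerror = onDone;
      document.head.appendChild(tag);
    }

    // response_handler: fires per completed high-priority fetch; once all
    // complete, move on to semi-important and then unimportant resources
    // (the real scheduler waits for all semi-important fetches before
    // issuing requests for unimportant ones).
    function response_handler() {
      if (--outstandingHigh === 0) {
        hints.semi.forEach(function (u) { preload(u, function () {}); });
        hints.low.forEach(function (u) { preload(u, function () {}); });
      }
    }

    // Entry point: XHR for the page's own HTML to read its hint headers
    // (the real scheduler reads this URL from an attribute on its script tag).
    var xhr = new XMLHttpRequest();
    xhr.onload = function () {
      var high = collectHints(xhr);
      outstandingHigh = high.length;
      high.forEach(function (u) { preload(u, response_handler); });
    };
    xhr.open("GET", document.location.href);
    xhr.send();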

By using a JavaScript-based request scheduler, our implementation of VROOM can accelerate page loads on unmodified commodity browsers. However, due to the single-threaded nature of JavaScript, if the browser is executing another script on the page when a response arrives, response_handler will not fire immediately. This delays the fetches of lower priority resources. Therefore, in the future, incorporation of the scheduling logic into the browser may enable greater performance gains than what we report.

⁶Note that VROOM's scheduler removes this attribute and its own DOM node from the page once the XHR for the top-level HTML is issued. Thus, subsequent accesses to the DOM are not affected.
⁷To ensure that the request scheduler can securely access headers in HTTP responses served from third-party domains, responses must include the "Access-Control-Expose-Headers" header with the values "Link" and our custom headers "x-semi-important" and "x-unimportant".



[Figure 12: Setup to evaluate page load performance enabled by our implementation of VROOM: the client connects over a VPN over the cellular network to a local desktop, which hosts an nghttpx proxy in front of an Apache server.]

[Figure 13: With respect to three different metrics, (a) page load time, (b) above-the-fold time, and (c) Speed Index, VROOM yields significant benefits compared to simply upgrading from HTTP/1.1 to HTTP/2, and comes close to matching the achievable lower bound.]


6 Evaluation

We evaluate VROOM from two perspectives: 1) performance benefits for users, and 2) the accuracy with which servers can aid resource discovery for clients. The key highlights from our evaluation are:
• Across 100 popular News and Sports websites, we see that the adoption of VROOM would yield near-optimal performance on the median site with respect to two different metrics: page load time (PLT) and above-the-fold time (AFT). In comparison to the adoption of only HTTP/2, VROOM's use would reduce the median PLT and AFT values by 30% and 20%, respectively.
• Simple alternatives to VROOM (e.g., relying only on prior loads of the page for dependency discovery, using only HTTP/2 PUSH but not dependency hints, or not scheduling pushes and fetches) can increase the median page load time by over 2 seconds.
• On the median site, VROOM's server-side discovery of dependent resources has a false negative rate below 5%, which in turn results in a 22% decrease in client-side latency to discover all resources on the page.

    6.1 Impact on client performance

Methodology. We use the setup shown in Figure 12 to experimentally evaluate our implementation of VROOM. We load pages in Chrome for Android on a Nexus 6 smartphone connected to a Verizon LTE hotspot. The phone is also connected via USB to a desktop, which subscribes to events exported by Chrome via the Remote Debugging Protocol (RDP). The phone has a VPN tunnel set up to the desktop, on which we host Mahimahi [36]. For every web page on which we test VROOM, we initially load the page with the desktop as a proxy to have Mahimahi record all page contents; page load times with this setup match those measured when we load pages with the phone directly communicating with web servers. Thereafter, when replaying page loads, we configure Mahimahi such that traffic between the phone and any of the web servers is subjected to not only the delay over the cellular network but also the median RTT observed between the desktop and the corresponding web server when recording page contents. The desktop on which we deploy Mahimahi is sufficiently well-provisioned so that neither its CPU nor its network is a bottleneck, and, as we mentioned earlier in Section 2, load times when we replay page loads using HTTP/1.1 between the client and all servers closely match load times measured when loading pages directly from the web.

We evaluate the utility of VROOM on the landing pages of the top 50 News and top 50 Sports websites as well as 100 randomly chosen sites from Alexa's top 400 sites;8 in most of our experiments, we focus on the News and Sports sites because, as seen earlier in Section 2, the need for performance improvements on these sites is particularly acute, with a median page load time of 10.5 seconds. We load each page 3 times in our replay setup and consider the load with the median page load time. During these loads, web servers identify the dependencies to return to clients by drawing upon three prior loads of each page gathered 1, 2, and 3 hours prior to when we recorded the page content in Mahimahi.

In addition to page load times, we also evaluate VROOM's benefits with respect to additional metrics which capture the speed with which page content is rendered, as this impacts users' perception of page load performance [29]. To measure these metrics, we use screenrecord, a utility program which captures videos of page loads on Android devices. We pass the recorded videos to the visualmetrics tool [18], which outputs the metrics of interest.

Improvement in page load times. First, we compare page load performance when using VROOM with that of an HTTP/2-based replay (i.e., the same setup as Figure 12, except that servers simply return requested resources), which we refer to as the HTTP/2 baseline. On the top 50 News and top 50 Sports sites, Figure 13(a) shows that VROOM significantly reduces the page load time on the median site—from 7.3s with the HTTP/2 baseline to 5.1s with VROOM—closely matching the lower bound (median of 5s); the lower bound corresponds to the maximum of the CPU-bound and network-bound loads described earlier in Section 2.

8 The high page load times for these pages, and the need to load each page multiple times to account for the variability of the cellular network, prevent us from running our evaluation on more web pages.


[Figure 14: CDFs across websites of page load times when pages are loaded with VROOM and with Polaris.]

On the 100 sites from the top 400, where the median page load time is significantly lower than on the News and Sports sites even with the HTTP/2 baseline (median of 4.8s), VROOM still reduces the median page load time to 4s.

The improvements in load times described above are feasible when all domains adopt the changes prescribed by VROOM. To understand VROOM's benefits as it is incrementally adopted, we evaluate VROOM in a scenario where the first party domain for every web page (i.e., the domain that serves the root HTML for the page), along with all other domains controlled by the same organization, is VROOM-compliant. All other domains contacted in each page load only conform to HTTP/2, without pushing content or providing dependency hints. While the median page load time across the News and Sports sites increases to 5.6s in this case, as compared to 5.1s with universal adoption of VROOM, this is still significantly lower than the 7.3s median page load time with the HTTP/2 baseline.

We also compared VROOM to Polaris [35], a state-of-the-art web accelerator which uses information about a page's dependency structure to prioritize requests for critical resources. Figure 14 shows that, compared to Polaris, VROOM reduces the median page load time for the popular News and Sports sites from 6.4s to 5.1s. As noted in Section 2, the primary reason for this performance gain is that Polaris still leaves clients to discover resources on their own (i.e., the client can discover the need to fetch a resource only after fetching and evaluating another resource). Further, Polaris does not specify policies for servers to proactively push resources to clients in anticipation of future requests.

Though Figures 13(a) and 14 show that VROOM significantly improves performance for the median page, these benefits are marginal at the tail; in fact, Polaris outperforms VROOM in the tail of the load time distribution. The reasons for this are two-fold. First, certain sites include dynamic content that VROOM is unable to detect simply by online analysis of HTMLs; VROOM defers the discovery of such unpredictable resources to clients and is unable to provide hints for these resources. Second, when clients prefetch objects (specified by dependency hints) and servers in multiple domains concurrently push resources, bandwidth contention can result in high priority resources being delayed. These results illustrate that combining the complementary approaches used in VROOM and Polaris is a promising direction for future work.

Improvement in visual performance metrics. In addition to reducing page load times, VROOM also improves metrics such as Above-the-fold time and Speed Index, which grade performance based on visual completeness of page loads.

[Figure 15: For the Fox News mobile site, (a) rendering of the above-the-fold content completes at 9.26s with VROOM; (b) with only HTTP/2 enabled, rendering is incomplete at that time and completes only later, at 13.87s.]

Above-the-fold time measures the time until all content that is "above the fold" (i.e., what a user sees prior to scrolling) is rendered to its final state. Speed Index extends this metric to capture the rate at which all the content that is above the fold gets rendered. Figures 13(b) and (c) show that, across the popular News and Sports sites, VROOM improves the Above-the-fold time and Speed Index for the median site by 400ms and 380ms, respectively.
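Speed Index, in particular, is the area above the visual-completeness curve: the integral over the page load of (1 - VC(t)), where VC(t) is the fraction of above-the-fold content rendered at time t. The sketch below approximates it from sampled frames; it is our own illustration, not the visualmetrics implementation.

    // samples: [{t: time in ms, vc: visual completeness in [0,1]}, ...]
    function speedIndex(samples) {
      let si = 0;
      for (let i = 1; i < samples.length; i++) {
        const dt = samples[i].t - samples[i - 1].t;
        si += (1 - samples[i - 1].vc) * dt;  // completeness holds until the next frame
      }
      return si;  // in milliseconds; lower is better
    }

    // Blank until 1s, 80% complete until 2s, fully rendered at 2s:
    speedIndex([{t: 0, vc: 0}, {t: 1000, vc: 0.8}, {t: 2000, vc: 1}]);  // 1200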

Figure 15 shows these benefits on an example site (http://m.foxnews.com/). With VROOM, rendering of all above-the-fold content completes at 9.26s. At that same time in the page load with the HTTP/2 baseline configuration, the main images are still missing and the rendering of the page is yet to converge. With HTTP/2, rendering of all above-the-fold content takes 4.6s longer, completing at 13.87s into the page load.

The above-described improvements in page load performance with VROOM are due to several reasons; we dig into each of these next. For brevity, we show results only for the popular News and Sports sites.

Latency in discovering and fetching resources. A key benefit of server-aided resource discovery is that it enables clients to discover resources and complete fetching them much sooner than in normal page loads. Figure 16 depicts these improvements both when considering all resources identified as dependencies by VROOM-compliant web servers, and also when considering only those dependencies which are high priority resources (i.e., HTML, CSS, and JS objects, which are the ones that need to be parsed or executed).

Reducing the time by when the client completes fetching all dependencies (a 22% reduction on the median page) is crucial because any further network activity necessary to complete the page load is only to fetch the small subset of unpredictable resources that servers fail to identify as dependencies. But, in addition, the drop in the time by when all high priority dependencies finish downloading—a 12% median reduction compared to baseline HTTP/2—is also critical since use of the CPU and the network are largely decoupled thereafter. Both of these improvements are made possible because of the speedup in the client discovering dependencies: in the median, 22% and 16% faster discovery of all dependencies and of all high priority dependencies, respectively.



[Figure 16: CDFs across websites of the percentage improvement over HTTP/2 in (a) discovery time and (b) fetch time, shown for all resources and for high priority resources only. In comparison to HTTP/2, VROOM reduces the latency in both (a) discovering resources and (b) completing their downloads.]

[Figure 17: In contrast to VROOM, if servers return dependencies as all the resources seen on a prior load of the page, page load times increase on many pages. 25th percentile, median, and 75th percentile are shown in each case.]


Faster discovery and preemptive receipt of resources also help reduce the time spent on the critical path waiting to receive data over the network. While the simple use of HTTP/2 led to network delays accounting for over 30% of the critical path on the median site (Figure 4), VROOM's use reduces the network wait time on the critical path by 24% on the median site.

Utility of accurate dependency inference. While input from servers enables clients to discover and fetch dependent resources sooner, the benefit of this input strongly relies upon the accuracy of the dependencies returned. To show this, we consider servers identifying the set of dependencies to return to a client simply based on a prior load of the page, i.e., all resources seen in a load within the past hour are assumed to be relevant to the new load.

[Figure 18: VROOM improves page load times compared to using only server-side push to inform clients of dependent resources. 25th percentile, median, and 75th percentile are shown in each case.]

[Figure 19: VROOM's judicious scheduling of server push and client fetch is key to enabling improvements in page load time. 25th percentile, median, and 75th percentile are shown in each case.]

Figure 17 shows that, though the median page load time does reduce with this approach, the extraneous inaccurate dependencies returned to the client degrade performance on many sites; the 75th percentile increases by over 1.5 seconds.

Need for combining HTTP/2 PUSH and dependency hints. Beyond accuracy in server-side dependency discovery, VROOM's benefits draw upon the combined use of HTTP/2 PUSH and dependency hints. Figure 18 shows that simply relying on server push is insufficient. Irrespective of whether we push all static resources or only the subset of static resources that need to be processed, the median page load time remains more than 2 seconds higher than with VROOM. This stems from the preponderance of third-party resources on modern web pages; servers can inform clients of such dependencies only via dependency hints, as any server can securely push only the content that it hosts (a sketch of this division of labor follows the next paragraph).

Utility of scheduling. While HTTP/2 PUSH and dependency hints enable faster resource discovery, performance improvements with VROOM also hinge upon judicious coordinated scheduling of pushes and downloads of discovered dependencies. Figure 19 illustrates the utility of the cooperative scheduling in VROOM compared to the strawman "Push All, Fetch ASAP" approach discussed in Section 4.3 (where servers push any resource they can and clients fetch any resource immediately upon discovery). Because only high priority resources are pushed by VROOM servers and are preferentially fetched by clients, contention for access link bandwidth has minimal impact on the processing of resources. The resultant increased utilization of the CPU enables VROOM to improve performance over baseline HTTP/2. In contrast, use of the strawman approach for servers to inform clients of dependencies offers minimal benefits; in fact, the median load time increases as a result of increased network contention.
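The division of labor referenced above can be sketched as follows: for each inferred dependency of a requested HTML, the server pushes only high priority content that it itself hosts, and falls back to hints for everything else. This JavaScript sketch is our own; pushResource stands in for an HTTP/2 server push primitive (e.g., http2's stream.pushStream in Node), and the header formats are assumptions.

    // Stand-in for an HTTP/2 PUSH primitive; stubbed for this sketch.
    function pushResource(res, url) { /* initiate HTTP/2 PUSH of url */ }

    function adviseClient(res, origin, deps) {
      const link = [], semi = [], low = [];
      for (const d of deps) {
        if (d.priority === 'high' && new URL(d.url).origin === origin) {
          pushResource(res, d.url);              // PUSH: only content we host
        } else if (d.priority === 'high') {
          link.push(`<${d.url}>; rel=preload`);  // third-party: hint only
        } else if (d.priority === 'semi') {
          semi.push(d.url);                      // deferred by the client scheduler
        } else {
          low.push(d.url);
        }
      }
      if (link.length) res.setHeader('Link', link.join(', '));
      if (semi.length) res.setHeader('x-semi-important', semi.join(', '));
      if (low.length)  res.setHeader('x-unimportant', low.join(', '));
    }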


[Figure 21: (a) Among the resources derived from a page's root HTML, except for those derived from embedded HTMLs, the contribution of the predictable subset to the number of resources and bytes. As a fraction of the size of this predictable subset, the resources that are either (b) missed or (c) extraneous when using VROOM's server-side dependency resolution as compared to offline-only and online-only analyses.]

[Figure 20: For three different delays between the load that warms the browser's cache and the load on which we evaluate page load performance, VROOM reduces page load times. 25th percentile, median, and 75th percentile are shown in each case.]

VROOM accelerates page loads with warm caches. All of our experiments thus far have considered the browser's cache to be empty. To evaluate VROOM when the browser's cache is not empty, we first identify cacheable objects by examining the headers in HTTP responses. We mimic three different scenarios, wherein once a page is loaded by a client, it loads the page again immediately thereafter (i.e., back-to-back loads), a day later, or a week later. Importantly, to prevent wasted bandwidth, resources that were already cached at the client were not pushed by servers.
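A rough way to classify a recorded response as still fresh after a given delay, in line with standard HTTP caching rules, is sketched below; this is our own illustration and ignores many corner cases of RFC 7234.

    // headers: response header names (lower-cased) mapped to values
    // ageSeconds: 0 for back-to-back, 86400 for a day, 604800 for a week
    function stillFresh(headers, ageSeconds) {
      const cc = (headers['cache-control'] || '').toLowerCase();
      if (cc.includes('no-store') || cc.includes('no-cache')) return false;
      const m = cc.match(/max-age=(\d+)/);
      if (m) return Number(m[1]) >= ageSeconds;
      if (headers['expires'] && headers['date']) {
        const ttl = (Date.parse(headers['expires']) -
                     Date.parse(headers['date'])) / 1000;
        return ttl >= ageSeconds;
      }
      return false;  // no explicit freshness information
    }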

Figure 20 shows that VROOM significantly improves page load times in all three warm browser cache settings. When considering back-to-back loads, VROOM reduces the page load time of the median site by 1.6s. This improvement increases to 2.2s and 2.1s for the loads separated by one day and one week, respectively.

    6.2 Accuracy of server-side dependency resolution

Setup. To evaluate the accuracy of the resource dependencies that VROOM-compliant servers return to clients, we consider 265 web pages drawn from popular News and Sports websites; these pages span a variety of page types such as landing pages, individual articles, results for specific games, etc. We load these pages once every hour for a week from the perspective of four users, whose cookies are seeded by visiting the landing pages of the top 50 pages in the Business, Health, Computers, and Shopping/Vehicles Alexa categories, respectively. Every hour, we load each page twice back-to-back from every user's perspective, for reasons described shortly.

As described earlier in Section 4.1, server-side dependency resolution in VROOM relies upon both offline and online analysis. The dependencies identified via offline dependency resolution include the resources seen in each of the loads in the past 3 hours. For online analysis, we model the server which serves any HTML object—either the root HTML on a page or one embedded in an iframe—returning all links in that HTML. Recall that, in order to account for personalization, VROOM-compliant servers return dependencies—either via push or via dependency hints—only in response to requests for HTML objects (Section 4.2).

Strategies for server-side resource discovery. We compare dependency resolution in VROOM with both the strawman approaches described earlier in Section 4.1: offline-only returns URLs seen in the intersection of loads over the past 3 hours, and online-only loads the page on the fly at the server and returns the URLs fetched.

Definition of accuracy. To evaluate the accuracy with which each of these approaches can identify the set of URLs that a client must fetch during a page load, we partition the set of URLs we see in any page load into a predictable and an unpredictable subset. We identify the subset of unpredictable URLs as ones that differ between back-to-back loads; these are URLs that VROOM leaves it up to the client to discover. As seen in Figure 21(a), out of the subset of resources on a page that a server can potentially return as dependencies in response to a request for an HTML (i.e., all the resources derived from HTML minus the ones derived from embedded iframes), the predictable subset accounts for over 80% and over 95%, respectively, in terms of the number of resources and the number of bytes. We evaluate the accuracy of each approach for server-side resource discovery with two metrics—the number of resources identified as dependencies by the server which do not appear in the predictable subset of the client's load (false positives), and the number of resources in the predictable subset that the server fails to identify (false negatives)—both computed as fractions of the predictable subset's size (a sketch of this computation follows the results below).

Results. First, Figure 21(b) shows that, out of the resources in the predictable subset, the fraction that VROOM-compliant servers would fail to identify is less than 5% for the median page. In contrast, offline-only dependency resolution ends up missing as many as 40% of the predictable subset of resources seen on any particular page load because of its inability to cope with changes from hour to hour. The online-only approach is perfect with respect to this metric, which validates our design decision to account for personalization by limiting the set of dependencies returned to exclude resources recursively derived from embedded HTMLs.



Second, with respect to the overhead imposed on clients by returning dependencies not in the predictable subset, Figure 21(c) shows that VROOM matches the offline-only approach. In both of these cases, identifying the stable set of resources seen consistently on a page helps in ignoring resources that happen to show up on a single load. Due to its inability to cope with such nondeterminism, the online-only approach identifies many extra resources, which inflate the predictable subset by as much as 20% in the median case.
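To make the accuracy definition above concrete, both rates can be computed per load as in the sketch below (our own illustration; returned is the set of URLs the server identified, and loadA/loadB are the URL sets of two back-to-back loads).

    function accuracy(returned, loadA, loadB) {
      // Predictable subset: URLs common to both back-to-back loads.
      const predictable = new Set([...loadA].filter(u => loadB.has(u)));
      let fp = 0, fn = 0;
      for (const u of returned) if (!predictable.has(u)) fp++;   // extraneous
      for (const u of predictable) if (!returned.has(u)) fn++;   // missed
      return {falsePositiveRate: fp / predictable.size,
              falseNegativeRate: fn / predictable.size};
    }

    // e.g., one extraneous and one missed resource out of two predictable:
    accuracy(new Set(['a.js', 'x.png']),
             new Set(['a.js', 'b.css']),
             new Set(['a.js', 'b.css']));
    // → {falsePositiveRate: 0.5, falseNegativeRate: 0.5}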

    7 Discussion

Deployability. Unlike attempts at clean-slate redesigns of the Internet's architecture, new designs for client-server interactions on the web are more amenable to deployment, as evidenced by the recent incorporation of SPDY into HTTP/2. This is possible because of several differences between the web and general communication over the Internet: 1) clients directly interact with web servers, unlike an ISP having to rely on other ISPs to forward traffic, 2) a few popular browsers and web servers are dominant, and 3) since some browsers (Chrome and IE) are controlled by popular content providers (Google and Bing), these providers can unilaterally test performance improvements enabled by a new proposal such as ours without depending on adoption by others.

Server-side overhead. VROOM-enabled servers identify dependent resources using both online and offline analyses. For popular websites that host thousands of web pages, loading each page every hour to facilitate offline dependency resolution will likely be onerous. We observe that there are typically only a few types of pages on each site and that the stable set of resources (e.g., CSS stylesheets, fonts, logo images, etc.) is likely to be common across pages of the same type. For example, on a news site, landing pages for different news categories are likely to share similarities, as will news articles about different individual stories. We defer for future work the task of leveraging the similarity across pages of the same type to improve the scalability of VROOM's server-side resource discovery.

Note that the overhead of resolving dependencies will be incurred only by the domains which serve HTML objects, i.e., the root HTML for an entire page or for an individual frame in the page. Other servers involved in a page load need only serve individual resources as they are requested. Moreover, for over 80% of sites, the landing page's HTML is served directly from origin web servers [7] and not from third-party CDNs. Thus, the overhead imposed by VROOM's resource discovery will largely be incurred by top-level domains, which are only responsible for the websites they own.

Security. At first glance, it may appear that having servers push resources and having clients fetch hinted resources present new security concerns for page loads, whereby compromised web servers could push or provide hints for malware. However, of the resources that are pushed or fetched based on dependency hints, client browsers will only process those resources referenced by the page being loaded, e.g., as HTML tags. Further, page loads that include unnecessary pushes or hints will still load correctly to completion, albeit with higher load times due to the downloads of unnecessary resources. Thus, page loads with VROOM encounter the same (but not worse) security concerns as page loads do today.

    8 Related work

Mobile web performance. Prior measurement studies [34, 44] have analyzed the performance of mobile web browsers. Like us, these studies find that CPU and network delays are bottlenecks when loading pages on mobile devices and high-latency cellular links.

Some new proposals aim to alleviate the effects of these bottlenecks by altering how pages are written and served. For example, Google's AMP project [1] asynchronously fetches many resources required to load a page, and uses Google's CDN to serve AMP-enabled content with HTTP/2. In contrast to AMP, VROOM can speed up the loads of legacy web pages. VROOM can also improve the performance of AMP-based pages by enabling asynchronous fetches earlier using server-provided hints.

Dependencies in page loads. Many recent systems [22, 30, 35] use offline analysis to discover dependencies inherent to web pages. Due to the dynamic nature of web content, previously generated dependency graphs can only capture the structure of a page, but not the exact set of URLs that must be fetched in any particular load of the page. These systems that leverage a priori knowledge of inter-object dependencies can therefore only reorder requests for resources, but leave the onus of discovering resources on the client. VROOM overcomes this limitation by combining offline dependency resolution with online analysis and by carefully spreading resource discovery across domains.

WProf [41] identifies dependencies between different browser components (e.g., HTML parser, JavaScript engine) that arise during page loads and degrade performance. By leveraging HTTP/2 PUSH and browser support for fetching dependency hints, VROOM reduces the coupling between the CPU and the network; processing a resource is rarely blocked by fetching, and vice versa.

Proxy-based acceleration. Cloud browsers improve mobile web performance by dividing the load process between the client's device and a remote proxy server. By resolving dependencies using a proxy's wired connections to origin servers (in place of the client's slow access link), such systems can significantly reduce page load times [15, 22, 36, 40, 43]. However, the reliance on web proxies raises security and privacy concerns, as clients must share their cookies with the proxy in order to preserve personalization and must trust that the proxy preserves the integrity of content served over HTTPS.

Network optimizations for faster page loads. HTTP/2 [20] (formerly SPDY [17]) reduces load times by allowing client browsers to multiplex all requests to an origin on a single TCP connection. HTTP/2 also allows servers to speculatively "push" objects they own before the user requests them (saving RTTs) [39]. VROOM demonstrates the need for HTTP/2's PUSH feature to be combined with dependency hints in order to securely speed up dependency resolution on the client.

Recent work has isolated two distinct factors that limit performance improvements with HTTP/2: dependencies inherent in web pages and browsers restrict HTTP/2's ability to reduce load times [42], and the use of a single TCP connection can be detrimental in the presence of high packet loss [24]. VROOM would benefit from multiplexing requests on the same connection, but it can be used with HTTP/1.1 in the face of high packet loss.


Client-side optimizations. Content prefetching and speculative loading systems reduce the effect that high network latencies have on web performance [26, 28, 37, 45]. These systems predict user browsing behavior and speculatively fetch content in hopes that users will soon do the same. However, accurately predicting user browsing behavior remains a challenge. Thus, prefetching often leads to wasted device energy and data usage [38].

Other client-side improvements reduce energy usage and computational delays using parallel web browsers [31, 32] and improved hardware [47]. By increasing the amount of parallelization for necessary page load tasks (e.g., rendering), these systems reduce energy usage and have positive impacts on page load times.

Each of these optimizations is complementary to our work. However, VROOM tackles a fundamental source of inefficiency in page loads that client-only solutions cannot address alone: VROOM decouples resource discovery from object evaluation (and thus, network delays from computational delays). By using server-provided hints, VROOM shifts resource discovery from being a client-only task to one in which the client and server cooperate.

9 Conclusions

The recognition that dependencies within the page load process are the dominant cause for slow page loads has recently led to a slew of solutions. However, all of these solutions either compromise security and privacy by relying on proxies to resolve dependencies, or have limited ability to improve mobile web performance since they require clients to themselves discover the resources on any page. VROOM offers the best of both worlds: by having servers aid a client's discovery of resources (both via HTTP/2 PUSH and dependency hints), we decouple the client's processing and downloads of resources, but do so while preserving the end-to-end nature of the web. By improving CPU utilization, VROOM significantly decreases page load times compared to baseline HTTP/2.

Acknowledgments

We thank the anonymous reviewers, our shepherd Mark Crovella, and Simon Pelchat for their valuable feedback on earlier drafts of this paper. This work was supported in part by a Google Faculty Research Award. Ravi Netravali's participation in the project was supported by NSF grant CNS-1407470 (PI: Hari Balakrishnan) and by the MIT Center for Wireless Networks and Mobile Computing.

References

[1] Accelerated Mobile Pages Project. https://www.ampproject.org/.
[2] Beautiful Soup. http://www.crummy.com/software/BeautifulSoup/.
[3] Google Developers - Simulate Mobile Devices with Device Mode. https://developers.google.com/web/tools/chrome-devtools/iterate/device-mode/.
[4] Google Page Speed. https://developers.google.com/speed/pagespeed/.
[5] H2O - The optimized HTTP/2 server. https://h2o.examp1e.net/configure/http2_directives.html#http2-casper.
[6] How One Second Could Cost Amazon $1.6 Billion In Sales. https://www.fastcompany.com/1825005/how-one-second-could-cost-amazon-16-billion-sales.
[7] HTTP Archive. http://httparchive.org/.
[8] HTTPS adoption *doubled* this year. https://snyk.io/blog/https-breaking-through/.
[9] Keynote: Mobile Commerce Performance Index. http://www.keynote.com/performance-indexes/mobile-retail-us.
[10] Latency Is Everywhere And It Costs You Sales - How To Crush It. http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it.
[11] Mobile Devices Now Driving 56 Percent Of Traffic To Top Sites. http://marketingland.com/mobile-top-sites-165725.
[12] The need for mobile speed: How mobile latency impacts publisher revenue. https://www.doubleclickbygoogle.com/articles/mobile-speed-matters/.
[13] New findings: For top ecommerce sites, mobile web performance is wildly inconsistent. http://www.webperformancetoday.com/2014/10/22/2014-mobile-ecommerce-page-speed-web-performance/.
[14] nghttpx - HTTP/2 proxy. https://nghttp2.org/documentation/nghttpx-howto.html.
[15] Opera Mini & Opera Mobile browsers. http://www.opera.com/mobile/.
[16] Preload. https://www.w3.org/TR/preload/.
[17] SPDY. https://developers.google.com/speed/spdy/.
[18] visualmetrics. https://github.com/WPO-Foundation/visualmetrics.
[19] WPO Stats. https://wpostats.com/.
[20] M. Belshe, R. Peon, and M. Thomson. 2015. Hypertext Transfer Protocol Version 2. http://httpwg.org/specs/rfc7540.html.
[21] Michael Butkiewicz, Harsha V. Madhyastha, and Vyas Sekar. 2011. Understanding Website Complexity: Measurements, Metrics, and Implications. In IMC.
[22] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas Sekar. 2015. Klotski: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In NSDI.
[23] Abhijnan Chakraborty, Vishnu Navda, Venkata N. Padmanabhan, and Ramachandran Ramjee. 2013. Coordinating Cellular Background Transfers using LoadSense. In MOBICOM.
[24] Jeff Erman, Vijay Gopalakrishnan, Rittwik Jana, and K. K. Ramakrishnan. 2013. Towards a SPDY'ier Mobile Web. In CoNEXT.
[25] Tammy Everts. 2013. Rules for Mobile Performance Optimization. ACM Queue 11, 6 (2013).
[26] Li Fan, Pei Cao, Wei Lin, and Quinn Jacobson. 1999. Web Prefetching Between Low-Bandwidth Clients and Proxies: Potential and Performance. In SIGMETRICS.
[27] David F. Galletta, Raymond Henry, Scott McCoy, and Peter Polak. 2004. Web Site Delays: How Tolerant are Users? Journal of the Association for Information Systems 5, 1 (2004), 1–28.
[28] Zhimei Jiang and Leonard Kleinrock. 1998. Web Prefetching in a Mobile Environment. IEEE Personal Communications 5, 5 (1998), 25–34.
[29] Conor Kelton, Jihoon Ryoo, Aruna Balasubramanian, and Samir R. Das. 2017. Improving User Perceived Page Load Times Using Gaze. In NSDI.
[30] Zhichun Li, Ming Zhang, Zhaosheng Zhu, Yan Chen, Albert Greenberg, and Yi-Min Wang. 2010. WebProphet: Automating Performance Prediction for Web Services. In NSDI.
[31] Haohui Mai, Shuo Tang, Samuel T. King, Calin Cascaval, and Pablo Montesinos. 2012. A Case for Parallelizing Web Pages. In HotPar.
[32] L. Meyerovich and R. Bodik. 2010. Fast and Parallel Web Page Layout. In WWW.
[33] David Naylor, Alessandro Finamore, Ilias Leontiadis, Yan Grunenberger, Marco Mellia, Konstantina Papagiannaki, and Peter Steenkiste. 2014. The Cost of the "S" in HTTPS. In CoNEXT.
[34] Javad Nejati and Aruna Balasubramanian. 2016. An In-Depth Study of Mobile Browser Performance. In WWW.
[35] Ravi Netravali, Ameesh Goyal, James Mickens, and Hari Balakrishnan. 2016. Polaris: Faster Page Loads Using Fine-grained Dependency Tracking. In NSDI.
[36] Ravi Netravali, Anirudh Sivaraman, Somak Das, Ameesh Goyal, Keith Winstein, James Mickens, and Hari Balakrishnan. 2015. Mahimahi: Accurate Record-and-Replay for HTTP. In ATC.
[37] Venkata N. Padmanabhan and Jeffrey C. Mogul. 1996. Using Predictive Prefetching to Improve World Wide Web Latency. In SIGCOMM.
[38] Lenin Ravindranath, Sharad Agarwal, Jitendra Padhye, and Christopher Riederer. 2013. Give in to Procrastination and Stop Prefetching. In HotNets.
[39] Sanae Rosen, Bo Han, Shuai Hao, Z. Morley Mao, and Feng Qian. 2017. Push or Request: An Investigation of HTTP/2 Server Push for Improving Mobile Performance. In WWW.
[40] Ashiwan Sivakumar, Shankaranarayanan Puzhavakath Narayanan, Vijay Gopalakrishnan, Seungjoon Lee, Sanjay Rao, and Subhabrata Sen. 2014. PARCEL: Proxy Assisted BRowsing in Cellular Networks for Energy and Latency Reduction. In CoNEXT.
[41] Xiao Sophia Wang, Aruna Balasubramanian, Arvind Krishnamurthy, and David Wetherall. 2013. Demystifying Page Load Performance with WProf. In NSDI.
[42] Xiao Sophia Wang, Aruna Balasubramanian, Arvind Krishnamurthy, and David Wetherall. 2014. How Speedy is SPDY? In NSDI.
[43] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. 2016. Speeding Up Web Page Loads with Shandian. In NSDI.
[44] Zhen Wang, Felix Xiaozhu Lin, Lin Zhong, and Mansoor Chishtie. 2011. Why are Web Browsers Slow on Smartphones? In HotMobile.
[45] Zhen Wang, Felix Xiaozhu Lin, Lin Zhong, and Mansoor Chishtie. 2012. How Far Can Client-Only Solutions Go for Mobile Browser Speed? In WWW.
[46] Zizhuang Yang. 2009. Every Millisecond Counts. https://www.facebook.com/note.php?note_id=122869103919.
[47] Yuhao Zhu and Vijay Janapa Reddi. 2013. High-Performance and Energy-Efficient Mobile Web Browsing on Big/Little Systems. In HPCA.

