cse 127: computer security - cseweb.ucsd.edu

Post on 08-Feb-2022

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

CSE 127: Computer Security Modern client-side defenses

Deian Stefan

Today

How can we build flexible and secure client-side web applications (from vulnerable/untrusted components)

Modern web sites are complicated

• Page developer

• Library developers

• Service providers

• Data provides

• Ad providers

• CDNs

• Network provider

Many acting parties on a site

• How do we protect page from ads/services?

• How to share data with a cross-origin page?

• How to protect one user from another’s content?

• How do we protect the page from a library?

• How do we protect the page from the CDN?

• How do we protect the page from network provider?

Recall: Same origin policy

Idea: isolate content from different origins ➤ E.g., can’t access document of cross-origin page

➤ E.g., can’t inspect responses from cross-origin

c.com b.coma.com

postMessage

✓JSON

DOM access✓

Why is the SOP not good enough?

The SOP is not strict enough

• Third-party libs run with privilege of the page

• Code within page can arbitrarily leak/corrupt data ➤ How? So, how should we isolate untrusted code?

• iframes isolation is limited ➤ Can’t isolate user-provided content from page (why?)

➤ Can’t isolate third-party ad placed in iframe (why?)

The SOP is not strict enough

• Third-party libs run with privilege of the page

• Code within page can arbitrarily leak/corrupt data ➤ How? So, how should we isolate untrusted code?

• iframes isolation is limited ➤ Can’t isolate user-provided content from page (why?)

➤ Can’t isolate third-party ad placed in iframe (why?)

The SOP is not flexible enough

• Can’t read cross-origin responses ➤ What if we want to fetch data from provider.com?

➤ JSON with padding (JSONP)

- To fetch data, insert new script tag: <script src=“https://provider.com/getData?cb=f”></script>

- To share data, reply back with script wrapping data f({ ...data...})

http://example.com

provider.com

Why is JSONP is a terrible thing?

• Provider data can easily be leaked (CSRF)

• Page is not protected from provider (XSS)

Why is JSONP is a terrible thing?

• Provider data can easily be leaked (CSRF)

• Page is not protected from provider (XSS)

The SOP is also not enough security-wise

Outline: modern mechanisms

• iframe sandbox

• Content security policy (CSP)

• HTTP strict transport security (HSTS)

• Subresource integrity (SRI)

• Cross-origin resource sharing (CORS)

iframe sandbox

Idea: restrict actions iframe can perform

Approach: set sandbox attribute, by default: ➤ disallows JavaScript and triggers (autofocus,

autoplay videos etc.)

➤ disallows form submission

➤ disallows popups

➤ disallows navigating embedding page

➤ runs page in unique origin: no storage/cookies

Grant privileges explicitly

Can enable dangerous features with: ➤ allow-scripts: allows JS + triggers (e.g., autofocus)

➤ allow-forms: allow form submission

➤ allow-popups: allow iframe to create popups

➤ allow-top-navigation: allow breaking out of frame

➤ allow-same-origin: retain original origin

Grant privileges explicitly

What can you do with iframe sandbox?

• Run content in iframe with least privilege ➤ Only grant content privileges it needs

• Privilege separate page into multiple iframes ➤ Split different parts of page into sandboxed iframes

Least privilege: twitter button

Least privilege: twitter button

• Let’s embed button on our site ➤ How do they suggest doing it?

➤ What’s the problem with this embedding approach?

<a href="https://twitter.com/share?ref_src=twsrc%5Etfw" class="twitter-share-button" data-show-count="false">Tweet</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

Least privilege: twitter button

• Let’s embed button on our site ➤ How do they suggest doing it?

➤ What’s the problem with this embedding approach?

<a href="https://twitter.com/share?ref_src=twsrc%5Etfw" class="twitter-share-button" data-show-count="false">Tweet</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

Least privilege: twitter button

• Better approach: put it in an iframe iframe

➤ Is this good enough?

➤ What can a malicious/compromised twitter still do?

<iframe src="https://platform.twitter.com/widgets/tweet_button.html" style="border: 0; width:130px; height:20px;"></iframe>

• Use iframe sandbox: remove all permissions and then enable JavaScript, popups, etc.

➤ Is the allow-same-origin unsafe?

➤ Why do you think we need all the other bits?

<iframe src=“https://platform.twitter.com/widgets/tweet_button.html" sandbox=“allow-same-origin allow-scripts allow-popups allow-forms” style="border: 0; width:130px; height:20px;"></iframe>

Least privilege: twitter button

Privilege separation: blog feed• Typically include user content inline:

➤ Problem with this?

• With iframe sandbox:

➤ May need allow-scripts - why?

➤ Is allow-same-origin safe too?

<div class=“post”> <div class=“author”>{{post.author}}</div> <div class=“body”>{{post.body}}</div></div>

<iframe sandbox srcdoc=“...<div class=“post”> <div class=“author”>{{post.author}}</div> <div class=“body”>{{post.body}}</div></div>...”></iframe>

Privilege separation: blog feed• Typically include user content inline:

➤ Problem with this?

• With iframe sandbox:

➤ May need allow-scripts - why?

➤ Is allow-same-origin safe too?

<div class=“post”> <div class=“author”>{{post.author}}</div> <div class=“body”>{{post.body}}</div></div>

<iframe sandbox srcdoc=“...<div class=“post”> <div class=“author”>{{post.author}}</div> <div class=“body”>{{post.body}}</div></div>...”></iframe>

What are some limitations of iframe sandbox?

Too strict vs. not strict enough

• Consider running library in sandboxed iframes ➤ E.g., password strength checker

➤ Desired guarantee: checker cannot leak password

• Problem: sandbox does not restrict exfiltration ➤ Can use XHR to write password to b.ru

b.ru/chk.htmla.com

• Can we limit the origins that the page (iframe or otherwise) can talk talk to? ➤ Can only leak to a trusted set of origins

➤ Gives us a more fine-grained notion of least privilege

• This can also prevent or limit damages due to XSS

Too strict vs. not strict enough

Outline: modern mechanisms

• iframe sandbox (quick refresher)

• Content security policy (CSP)

• HTTP strict transport security (HSTS)

• Subresource integrity (SRI)

• Cross-origin resource sharing (CORS)

Content Security Policy (CSP)

• Idea: restrict resource loading to a whitelist ➤ By restricting to whom page can talk to: restrict

where data is leaked!

• Approach: send page with CSP header that contains fine-grained directives ➤ E.g., allow loads from CDN, no frames, no plugins

Content-Security-Policy: default-src https://cdn.example.net; child-src 'none'; object-src 'none'

script-src: where you can load scripts from

connect-src: limits the origins you can XHR to

font-src: where to fetch web fonts form

form-action: where forms can be submitted

child-src: where to load frames/workers from

img-src: where to load images from

default-src: default fallback

Special keywords• ‘none’ - match nothing

• ‘self’ - match this origin

• ‘unsafe-inline’ - allow unsafe JS & CSS

• ‘unsafe-eval’ - allow unsafe eval (and the like)

• http: - match anything with http scheme

• https: - match anything with https scheme

• * - match anything

How can CSP help with XSS?

• If you list all places you can load scripts from: ➤ Only execute code from trusted origins

➤ Remaining vector for attack: inline scripts

• CSP by default disallows inline scripts ➤ If scripts are enabled at least it disallows eval

Adoption challenge

• Problem: inline scripts are widely-used ➤ Page authors use the ‘unsafe-inline' directive

➤ Is this a problem?

• Solution: script nonce and script hash ➤ Allow scripts that have a particular hash

➤ Allow scripts that have a specific nonce

Adoption challenge

• Problem: inline scripts are widely-used ➤ Page authors use the ‘unsafe-inline' directive

➤ Is this a problem?

• Solution: script nonce and script hash ➤ Allow scripts that have a particular hash

➤ Allow scripts that have a specific nonce

Other adoption challenges

• Goal: set most restricting CSP that is permissive enough to not break existing app

• How can you figure this out for a large app? ➤ CSP has a report-only header and report-uri directive

➤ Report violations to server; don’t enforce

• In practice: devs hardly ever get the policy right

How is CSP really used in practice?

in this case is the resource from c.com. In other words, strict-dynamic allows you to vet a “root” script using a nonce or a hash.That “root” script then has the ability to load other resourcesunmolested.2

• unsafe-inline: Any inline script is allowed. This is very inse-cure.

• unsafe-eval: Allows the use of eval() in allowed resources.This is also unsafe as it allows scripts to inject any JavaScriptonto the page, bypassing any CSPs currently in place.

3 METHODOLOGYIn order to obtain our data set, we performed a simple crawl ofthe Alexa top 1 million sites [2] using a headless browser in or-der to collect the Content-Security-Policy headers present ineach request. With this data, we normalized the policies remov-ing unnecessary white space and sorting both the directives andtheir associated values in lexicographical order. Additionally, wenormalized the ‘nonce’ and ‘sha256’ values by removing theunique content of the nonce and hash values. After identifying5,240 sites from our initial corpus that returned CSP headers withthe ‘script-src’ directive and the ‘unsafe-inline’ keywordwe revisited those sites, this time using a proxy to capture the con-tent of all scripts loaded by each site. We constructed a graph whosevertices are the origins of all loaded scripts and whose edges repre-sent the loading of a script by another script. Finally, we visited thesame 5240 sites again the following day and in order to determinewhether each script changed across visits providing and estimateof the percentage of scripts which are dynamically generated.

3.1 LimitationsThe Internet is constantly growing and evolving and as a result ourdata represents only a snapshot of a small portion of the greaterInternet. Increasingly websites serve di�erent content based upon avisitors browser cookies, device type, and other factors. Our freshlyinstantiated, i.e. without any cookies, headless browsing crawlermay receive di�erent content than the average visitor to the samewebsites. Additionally, for any given website, we browse only thelanding page, not exhaustively exploring all links on each page.

4 RESULTSOut of the 1 million sites that we visit 39,022 or approximately 3.9%return the Content-Security-Policy header and in total we �nd6,670 unique normalized policies. CSP usage is trending upwardincreasing from 1.6% of sites in 2016 [14] to 1.97% in 2017 [5] to3.9% in 2018. The 2016 number is based upon a Google searchindex of approximately 1 billion sites while the 2017 and 2018numbers are both based upon the Alexa top 1 million sites at thetime. The usage frequencies of the various directives are displayedin �gure 1. Our data indicates that the most common usage of CSPtoday is to require the usage of HTTPS. The ‘upgrade-insecure-requests’ directive is themost commonly used directive appearingin 56.74% of policies. This represents a dramatic change from the2It’s important to note that since eval is blocked by default when using CSP, in orderfor the script from b.com to inject new JavaScript resources onto the page it willneed to use some other API method such as document.createElement() which willthen need to be added to the document through document.body.appendChild() orsimilar.

2016 data from Weichselbaum et al. [14] where it appeared in only1.87% of policies. The adoption of forced HTTPS is certainly apositive security trend. The ‘frame-ancestors’ directive has alsogained popularity appearing in only 8.11% of policies in 2016 whileappearing in 54.07% of policies in 2018.

Figure 1: Distribution of CSP directives.

The directive that primarily deals with XSS mitigation is the‘script-src’ directive. In our data this directive appears in 7,317or 18.75% of policies. Figure 2 shows the frequencies of various typesof values included in policies with the ’script-src’ directive.The ‘unsafe-inline’ keyword which renders the policy entirelyuseless for XSS protection appears in 90.69% of policies. The mostsecure keyword ‘sha256’ appears in 7.95% of policies in our dataset, up from just 1.65% in 2016. The next best alternative ‘nonce’appears in 6.57% of policies up from 0.92% in 2016. The use ofwildcards in sources also presents security risks. Only 18.38% ofpolicies that specify at least one host include exclusively hostswith fully speci�ed paths. Of all policies with the ‘srcipt-src’directive 38.28% include at least one host with a wildcard as and18.03% include the general wildcard. Whitelisting entire protocols,i.e. https:// or http://, is also extremely insecure and 59.87% ofpolicies include at least one protocol among their sources.

Figure 3 shows the most commonly whitelisted hosts. Google’spervasiveness on the Internet is apparent as 13 of the top 15 mostwhite-listed hosts are Google domains with the remaining 2 comingfrom Facebook.

Figure 4 shows the origins of the scripts most commonly re-quested by other third party scripts. The most common use casesfor scripts loading other scripts are advertisements and tracking.These account for 59.11% of scripts loaded by other scripts. Wede�ne ads and tracking scripts based upon the uBlock Origin [6]default �lter settings.

The loading of dynamically generated scripts prevents the useof the most secure keyword, ‘sha256’. According to our data ofthe 37,087 scripts loaded by sites that include the ‘script-src’directive and the ‘unsafe-inline’ keyword in their CSPs 13.06%are dynamically generated and 43.03% of sites load at least onedynamically generated script.

3

How is CSP really used in practice?

in this case is the resource from c.com. In other words, strict-dynamic allows you to vet a “root” script using a nonce or a hash.That “root” script then has the ability to load other resourcesunmolested.2

• unsafe-inline: Any inline script is allowed. This is very inse-cure.

• unsafe-eval: Allows the use of eval() in allowed resources.This is also unsafe as it allows scripts to inject any JavaScriptonto the page, bypassing any CSPs currently in place.

3 METHODOLOGYIn order to obtain our data set, we performed a simple crawl ofthe Alexa top 1 million sites [2] using a headless browser in or-der to collect the Content-Security-Policy headers present ineach request. With this data, we normalized the policies remov-ing unnecessary white space and sorting both the directives andtheir associated values in lexicographical order. Additionally, wenormalized the ‘nonce’ and ‘sha256’ values by removing theunique content of the nonce and hash values. After identifying5,240 sites from our initial corpus that returned CSP headers withthe ‘script-src’ directive and the ‘unsafe-inline’ keywordwe revisited those sites, this time using a proxy to capture the con-tent of all scripts loaded by each site. We constructed a graph whosevertices are the origins of all loaded scripts and whose edges repre-sent the loading of a script by another script. Finally, we visited thesame 5240 sites again the following day and in order to determinewhether each script changed across visits providing and estimateof the percentage of scripts which are dynamically generated.

3.1 LimitationsThe Internet is constantly growing and evolving and as a result ourdata represents only a snapshot of a small portion of the greaterInternet. Increasingly websites serve di�erent content based upon avisitors browser cookies, device type, and other factors. Our freshlyinstantiated, i.e. without any cookies, headless browsing crawlermay receive di�erent content than the average visitor to the samewebsites. Additionally, for any given website, we browse only thelanding page, not exhaustively exploring all links on each page.

4 RESULTSOut of the 1 million sites that we visit 39,022 or approximately 3.9%return the Content-Security-Policy header and in total we �nd6,670 unique normalized policies. CSP usage is trending upwardincreasing from 1.6% of sites in 2016 [14] to 1.97% in 2017 [5] to3.9% in 2018. The 2016 number is based upon a Google searchindex of approximately 1 billion sites while the 2017 and 2018numbers are both based upon the Alexa top 1 million sites at thetime. The usage frequencies of the various directives are displayedin �gure 1. Our data indicates that the most common usage of CSPtoday is to require the usage of HTTPS. The ‘upgrade-insecure-requests’ directive is themost commonly used directive appearingin 56.74% of policies. This represents a dramatic change from the2It’s important to note that since eval is blocked by default when using CSP, in orderfor the script from b.com to inject new JavaScript resources onto the page it willneed to use some other API method such as document.createElement() which willthen need to be added to the document through document.body.appendChild() orsimilar.

2016 data from Weichselbaum et al. [14] where it appeared in only1.87% of policies. The adoption of forced HTTPS is certainly apositive security trend. The ‘frame-ancestors’ directive has alsogained popularity appearing in only 8.11% of policies in 2016 whileappearing in 54.07% of policies in 2018.

Figure 1: Distribution of CSP directives.

The directive that primarily deals with XSS mitigation is the‘script-src’ directive. In our data this directive appears in 7,317or 18.75% of policies. Figure 2 shows the frequencies of various typesof values included in policies with the ’script-src’ directive.The ‘unsafe-inline’ keyword which renders the policy entirelyuseless for XSS protection appears in 90.69% of policies. The mostsecure keyword ‘sha256’ appears in 7.95% of policies in our dataset, up from just 1.65% in 2016. The next best alternative ‘nonce’appears in 6.57% of policies up from 0.92% in 2016. The use ofwildcards in sources also presents security risks. Only 18.38% ofpolicies that specify at least one host include exclusively hostswith fully speci�ed paths. Of all policies with the ‘srcipt-src’directive 38.28% include at least one host with a wildcard as and18.03% include the general wildcard. Whitelisting entire protocols,i.e. https:// or http://, is also extremely insecure and 59.87% ofpolicies include at least one protocol among their sources.

Figure 3 shows the most commonly whitelisted hosts. Google’spervasiveness on the Internet is apparent as 13 of the top 15 mostwhite-listed hosts are Google domains with the remaining 2 comingfrom Facebook.

Figure 4 shows the origins of the scripts most commonly re-quested by other third party scripts. The most common use casesfor scripts loading other scripts are advertisements and tracking.These account for 59.11% of scripts loaded by other scripts. Wede�ne ads and tracking scripts based upon the uBlock Origin [6]default �lter settings.

The loading of dynamically generated scripts prevents the useof the most secure keyword, ‘sha256’. According to our data ofthe 37,087 scripts loaded by sites that include the ‘script-src’directive and the ‘unsafe-inline’ keyword in their CSPs 13.06%are dynamically generated and 43.03% of sites load at least onedynamically generated script.

3

Warning: This graph is a few years old

How is CSP really used in practice?

in this case is the resource from c.com. In other words, strict-dynamic allows you to vet a “root” script using a nonce or a hash.That “root” script then has the ability to load other resourcesunmolested.2

• unsafe-inline: Any inline script is allowed. This is very inse-cure.

• unsafe-eval: Allows the use of eval() in allowed resources.This is also unsafe as it allows scripts to inject any JavaScriptonto the page, bypassing any CSPs currently in place.

3 METHODOLOGYIn order to obtain our data set, we performed a simple crawl ofthe Alexa top 1 million sites [2] using a headless browser in or-der to collect the Content-Security-Policy headers present ineach request. With this data, we normalized the policies remov-ing unnecessary white space and sorting both the directives andtheir associated values in lexicographical order. Additionally, wenormalized the ‘nonce’ and ‘sha256’ values by removing theunique content of the nonce and hash values. After identifying5,240 sites from our initial corpus that returned CSP headers withthe ‘script-src’ directive and the ‘unsafe-inline’ keywordwe revisited those sites, this time using a proxy to capture the con-tent of all scripts loaded by each site. We constructed a graph whosevertices are the origins of all loaded scripts and whose edges repre-sent the loading of a script by another script. Finally, we visited thesame 5240 sites again the following day and in order to determinewhether each script changed across visits providing and estimateof the percentage of scripts which are dynamically generated.

3.1 LimitationsThe Internet is constantly growing and evolving and as a result ourdata represents only a snapshot of a small portion of the greaterInternet. Increasingly websites serve di�erent content based upon avisitors browser cookies, device type, and other factors. Our freshlyinstantiated, i.e. without any cookies, headless browsing crawlermay receive di�erent content than the average visitor to the samewebsites. Additionally, for any given website, we browse only thelanding page, not exhaustively exploring all links on each page.

4 RESULTSOut of the 1 million sites that we visit 39,022 or approximately 3.9%return the Content-Security-Policy header and in total we �nd6,670 unique normalized policies. CSP usage is trending upwardincreasing from 1.6% of sites in 2016 [14] to 1.97% in 2017 [5] to3.9% in 2018. The 2016 number is based upon a Google searchindex of approximately 1 billion sites while the 2017 and 2018numbers are both based upon the Alexa top 1 million sites at thetime. The usage frequencies of the various directives are displayedin �gure 1. Our data indicates that the most common usage of CSPtoday is to require the usage of HTTPS. The ‘upgrade-insecure-requests’ directive is themost commonly used directive appearingin 56.74% of policies. This represents a dramatic change from the2It’s important to note that since eval is blocked by default when using CSP, in orderfor the script from b.com to inject new JavaScript resources onto the page it willneed to use some other API method such as document.createElement() which willthen need to be added to the document through document.body.appendChild() orsimilar.

2016 data from Weichselbaum et al. [14] where it appeared in only1.87% of policies. The adoption of forced HTTPS is certainly apositive security trend. The ‘frame-ancestors’ directive has alsogained popularity appearing in only 8.11% of policies in 2016 whileappearing in 54.07% of policies in 2018.

Figure 1: Distribution of CSP directives.

The directive that primarily deals with XSS mitigation is the‘script-src’ directive. In our data this directive appears in 7,317or 18.75% of policies. Figure 2 shows the frequencies of various typesof values included in policies with the ’script-src’ directive.The ‘unsafe-inline’ keyword which renders the policy entirelyuseless for XSS protection appears in 90.69% of policies. The mostsecure keyword ‘sha256’ appears in 7.95% of policies in our dataset, up from just 1.65% in 2016. The next best alternative ‘nonce’appears in 6.57% of policies up from 0.92% in 2016. The use ofwildcards in sources also presents security risks. Only 18.38% ofpolicies that specify at least one host include exclusively hostswith fully speci�ed paths. Of all policies with the ‘srcipt-src’directive 38.28% include at least one host with a wildcard as and18.03% include the general wildcard. Whitelisting entire protocols,i.e. https:// or http://, is also extremely insecure and 59.87% ofpolicies include at least one protocol among their sources.

Figure 3 shows the most commonly whitelisted hosts. Google’spervasiveness on the Internet is apparent as 13 of the top 15 mostwhite-listed hosts are Google domains with the remaining 2 comingfrom Facebook.

Figure 4 shows the origins of the scripts most commonly re-quested by other third party scripts. The most common use casesfor scripts loading other scripts are advertisements and tracking.These account for 59.11% of scripts loaded by other scripts. Wede�ne ads and tracking scripts based upon the uBlock Origin [6]default �lter settings.

The loading of dynamically generated scripts prevents the useof the most secure keyword, ‘sha256’. According to our data ofthe 37,087 scripts loaded by sites that include the ‘script-src’directive and the ‘unsafe-inline’ keyword in their CSPs 13.06%are dynamically generated and 43.03% of sites load at least onedynamically generated script.

3

What’s frame-ancestors?

What problem is this addressing?

Clickjacking!

Busting Frame Busting:a Study of Clickjacking Vulnerabilities on Popular Sites

Gustav Rydstedt, Elie Bursztein, Dan BonehStanford University

{rydstedt,elie,dabo}@stanford.edu

Collin JacksonCarnegie Mellon Universitycollin.jackson@sv.cmu.edu

June 7, 2010

Abstract

Web framing attacks such as clickjacking useiframes to hijack a user’s web session. The mostcommon defense, called frame busting, preventsa site from functioning when loaded inside aframe. We study frame busting practices for theAlexa Top-500 sites and show that all can be cir-cumvented in one way or another. Some circum-ventions are browser-specific while others workacross browsers. We conclude with recommen-dations for proper frame busting.

1 Introduction

Frame busting refers to code or annotationprovided by a web page intended to preventthe web page from being loaded in a sub-frame.Frame busting is the recommended defenseagainst clickjacking [10] and is also requiredto secure image-based authentication such asthe Sign-in Seal used by Yahoo. Sign-in Sealdisplays a user-selected image that authenticatesthe Yahoo! login page to the user. Withoutframe busting, the login page could be openedin a sub-frame so that the correct image isdisplayed to the user, even though the toppage is not the real Yahoo login page. Newadvancements in clickjacking techniques [22]using drag-and-drop to extract and inject datainto frames further demonstrate the importanceof secure frame busting.

Figure 1: Visualization of a clickjacking attackon Twitter’s account deletion page.

Figure 1 illustrates a clickjacking attack: thevictim site is framed in a transparent iframe thatis put on top of what appears to be a normalpage. When users interact with the normal page,they are unwittingly interacting with the victimsite. To defend against clickjacking attacks, thefollowing simple frame busting code is a com-monly used by web sites:

i f ( top . l o c a t i o n != l o c a t i o n )top . l o c a t i o n = s e l f . l o c a t i o n ;

Frame busting code typically consists of aconditional statement and a counter-action thatnavigates the top page to the correct place.As we will see, this basic code is fairly easyto bypass. We discuss far more sophisticatedframe busting code (and circumvention tech-niques) later in the paper.

1

• How does frame-ancestor help? ➤ Don’t allow non twitter origins to frame delete page!

http://best.game.ever

twitter.com

What about the other two?

in this case is the resource from c.com. In other words, strict-dynamic allows you to vet a “root” script using a nonce or a hash.That “root” script then has the ability to load other resourcesunmolested.2

• unsafe-inline: Any inline script is allowed. This is very inse-cure.

• unsafe-eval: Allows the use of eval() in allowed resources.This is also unsafe as it allows scripts to inject any JavaScriptonto the page, bypassing any CSPs currently in place.

3 METHODOLOGYIn order to obtain our data set, we performed a simple crawl ofthe Alexa top 1 million sites [2] using a headless browser in or-der to collect the Content-Security-Policy headers present ineach request. With this data, we normalized the policies remov-ing unnecessary white space and sorting both the directives andtheir associated values in lexicographical order. Additionally, wenormalized the ‘nonce’ and ‘sha256’ values by removing theunique content of the nonce and hash values. After identifying5,240 sites from our initial corpus that returned CSP headers withthe ‘script-src’ directive and the ‘unsafe-inline’ keywordwe revisited those sites, this time using a proxy to capture the con-tent of all scripts loaded by each site. We constructed a graph whosevertices are the origins of all loaded scripts and whose edges repre-sent the loading of a script by another script. Finally, we visited thesame 5240 sites again the following day and in order to determinewhether each script changed across visits providing and estimateof the percentage of scripts which are dynamically generated.

3.1 LimitationsThe Internet is constantly growing and evolving and as a result ourdata represents only a snapshot of a small portion of the greaterInternet. Increasingly websites serve di�erent content based upon avisitors browser cookies, device type, and other factors. Our freshlyinstantiated, i.e. without any cookies, headless browsing crawlermay receive di�erent content than the average visitor to the samewebsites. Additionally, for any given website, we browse only thelanding page, not exhaustively exploring all links on each page.

4 RESULTSOut of the 1 million sites that we visit 39,022 or approximately 3.9%return the Content-Security-Policy header and in total we �nd6,670 unique normalized policies. CSP usage is trending upwardincreasing from 1.6% of sites in 2016 [14] to 1.97% in 2017 [5] to3.9% in 2018. The 2016 number is based upon a Google searchindex of approximately 1 billion sites while the 2017 and 2018numbers are both based upon the Alexa top 1 million sites at thetime. The usage frequencies of the various directives are displayedin �gure 1. Our data indicates that the most common usage of CSPtoday is to require the usage of HTTPS. The ‘upgrade-insecure-requests’ directive is themost commonly used directive appearingin 56.74% of policies. This represents a dramatic change from the2It’s important to note that since eval is blocked by default when using CSP, in orderfor the script from b.com to inject new JavaScript resources onto the page it willneed to use some other API method such as document.createElement() which willthen need to be added to the document through document.body.appendChild() orsimilar.

2016 data from Weichselbaum et al. [14] where it appeared in only1.87% of policies. The adoption of forced HTTPS is certainly apositive security trend. The ‘frame-ancestors’ directive has alsogained popularity appearing in only 8.11% of policies in 2016 whileappearing in 54.07% of policies in 2018.

Figure 1: Distribution of CSP directives.

The directive that primarily deals with XSS mitigation is the‘script-src’ directive. In our data this directive appears in 7,317or 18.75% of policies. Figure 2 shows the frequencies of various typesof values included in policies with the ’script-src’ directive.The ‘unsafe-inline’ keyword which renders the policy entirelyuseless for XSS protection appears in 90.69% of policies. The mostsecure keyword ‘sha256’ appears in 7.95% of policies in our dataset, up from just 1.65% in 2016. The next best alternative ‘nonce’appears in 6.57% of policies up from 0.92% in 2016. The use ofwildcards in sources also presents security risks. Only 18.38% ofpolicies that specify at least one host include exclusively hostswith fully speci�ed paths. Of all policies with the ‘srcipt-src’directive 38.28% include at least one host with a wildcard as and18.03% include the general wildcard. Whitelisting entire protocols,i.e. https:// or http://, is also extremely insecure and 59.87% ofpolicies include at least one protocol among their sources.

Figure 3 shows the most commonly whitelisted hosts. Google’spervasiveness on the Internet is apparent as 13 of the top 15 mostwhite-listed hosts are Google domains with the remaining 2 comingfrom Facebook.

Figure 4 shows the origins of the scripts most commonly re-quested by other third party scripts. The most common use casesfor scripts loading other scripts are advertisements and tracking.These account for 59.11% of scripts loaded by other scripts. Wede�ne ads and tracking scripts based upon the uBlock Origin [6]default �lter settings.

The loading of dynamically generated scripts prevents the useof the most secure keyword, ‘sha256’. According to our data ofthe 37,087 scripts loaded by sites that include the ‘script-src’directive and the ‘unsafe-inline’ keyword in their CSPs 13.06%are dynamically generated and 43.03% of sites load at least onedynamically generated script.

3

What is MIXed content?

• Why is this bad? ➤ Network attacker can inject their own scripts,

images, etc.!

https://…

http://…

What is MIXed content?

• Why is this bad? ➤ Network attacker can inject their own scripts,

images, etc.!

https://…

http://…

How does CSP help?

• upgrade-insecure-requests ➤ Rewrite HTTP URL to HTTPS before making request

• block-all-mixed-content ➤ Don’t load any content over HTTP

Outline: modern mechanisms

• iframe sandbox (quick refresher)

• Content security policy (CSP)

• HTTP strict transport security (HSTS)

• Subresource integrity (SRI)

• Cross-origin resource sharing (CORS)

Motivation for HSTS

• Attacker can force you to go to HTTP vs. HTTPS ➤ SSL Stripping attack (Moxie)

- They can rewrite all HTTPS URLs to HTTP

- If server serves content over HTTP: doom!

• HSTS header: never visit site over HTTP again ➤ Strict-Transport-Security: max-age=31536000

Outline: modern mechanisms

• iframe sandbox (quick refresher)

• Content security policy (CSP)

• HTTP strict transport security (HSTS)

• Subresource integrity (SRI)

• Cross-origin resource sharing (CORS)

Motivation for SRI• CSP+HSTS can be used to limit damages, but

can’t really defend against malicious code

• How do you know that the library you’re loading is the correct one?Won’t using HTTPS address this problem?

Motivation for SRI• CSP+HSTS can be used to limit damages, but

can’t really defend against malicious code

• How do you know that the library you’re loading is the correct one?Won’t using HTTPS address this problem?

Mikko Ohtamaa

Subresource integrity

• Idea: page author specifies hash of (sub)resource they are loading; browser checks integrity ➤ E.g., integrity for scripts

➤ E.g., integrity for link elements

<link rel="stylesheet" href="https://site53.cdn.net/style.css" integrity="sha256-SDfwewFAE...wefjijfE">

<script src="https://code.jquery.com/jquery-1.10.2.min.js" integrity="sha256-C6CB9UYIS9UJeqinPHWTHVqh/E1uhG5Tw+Y5qFQmYg=">

What happens when check fails?

• Case 1 (default): ➤ Browser reports violation and does not render/

execute resource

• Case 2: CSP directive with integrity-policy directive set to report ➤ Browser reports violation, but may render/execute

resource

Multiple hash algorithms

• Authors may specify multiple hashes ➤ E.g.,

• Browser uses strongest algorithm

• Why support multiple algorithms? ➤ Don’t break page on old browser

<script src="hello_world.js" integrity=“sha256-... sha512-... "></script>

Outline: modern mechanisms

• iframe sandbox (quick refresher)

• Content security policy (CSP)

• HTTP strict transport security (HSTS)

• Subresource integrity (SRI)

• Cross-origin resource sharing (CORS)

Recall: SOP is also inflexible

• Problem: Can’t fetch cross-origin data ➤ Leads to building insecure sites/services: JSONP

• Solution: Cross-origin resource sharing (CORS) ➤ Data provider explicitly whitelists origins that can

inspect responses

➤ Browser allows page to inspect response if its origin is listed in the header

E.g., CORS usage: amazon

• Amazon has multiple domains ➤ E.g., amazon.com and aws.com

• Problem: amazon.com can’t read cross-origin aws.com data

• With CORS aws.com can allow amazon.com to readHTTP responses amazon.com evil.biz

aws.com

How CORS works• Browser sends Origin header with XHR request

➤ E.g., Origin: https://amazon.com

• Server can inspect Origin header and respond with Access-Control-Allow-Origin header ➤ E.g., Access-Control-Allow-Origin: https://amazon.com

➤ E.g., Access-Control-Allow-Origin: *

• CORS XHR may send cookies + custom headers ➤ Need “preflight” request to authorize this

New: Permissions Policy

• Web platform has access to many things ➤ accelerometer, ambient-light-sensor, battery,

camera, display-capture, geolocation, …

• Permissions policy tries to limit access to certain resources ➤ Header:

Permissions-Policy: fullscreen=(), geolocation=() ➤ Iframe attribute:

<iframe src="https://other.com/map" allow="geolocation"></iframe>

✓ How do we protect page from ads/services?

✓ How to share data with cross-origin page?

✓ How to protect one user from another’s content?

✓ How do we protect the page from a library?

✓ How do we protect the page from the CDN?

✓ How do we protect the page from network provider?

top related