Internet Jones and the Raiders of the Lost Trackers: An Archaeological Study of Web Tracking from 1996 to 2016. Adam Lerner*, Anna Kornfeld Simpson*, Tadayoshi Kohno, Franziska Roesner (*Joint First Authors in Alphabetical Order)

Post on 20-Aug-2020

TRANSCRIPT

Page 1: Internet Jones and the Raiders of the Lost Trackers...Validation of Archival Data vs. Ground Truth UW NSDI 2012 paper* (and follow-up measurements) provided a taxonomy of web tracking

Internet Jones and the Raiders of the Lost Trackers: An Archaeological Study of Web Tracking from 1996 to 2016

Adam Lerner*, Anna Kornfeld Simpson*, Tadayoshi Kohno, Franziska Roesner

*Joint First Authors in Alphabetical Order

Third Party Web Tracking

1. Third party web tracking is important
2. We should be studying web tracking over time

What is Third Party Web Tracking?

tracker.com: I know that user "d04874f3" has visited these sites:

● news.google.com
● weather.com
● twitch.tv
● hulu.com
● imdb.com
● foxnews.com
● zappos.com
● honda.com
● jetblue.com

Being watched has chilling effects

Views and edits of privacy-related Wikipedia articles declined after leaks of NSA surveillance activities*

People donate more for communal coffee when a picture of watching eyes is posted near the coffee**

*Penney, Jon. "Chilling Effects: Online Surveillance and Wikipedia Use." Berkeley Technology Law Journal (2016).

**Bateson, M., D. Nettle, and G. Roberts. "Cues of being watched enhance cooperation in a real-world setting." Biology Letters 2.3 (2006): 412-414. doi:10.1098/rsbl.2006.0509.

Web Tracking

1. Third party web tracking is important
2. We should be studying web tracking over time

We should study web tracking over time

Timeline figure (axis ticks: 1996, 2005, 2009, 2016) placing earlier measurement studies:

● Krishnamurthy and Wills '09
● Roesner, Kohno, and Wetherall '12
● Mayer and Mitchell '12
● Eubank et al. '13
● Gomer et al. '13
● Englehardt et al. '15

?

Time Travel


The Wayback Machine


~11 petabytes of web sites archived!


Includes JavaScript! HTTP headers! CSS! Images! Hosted on web servers!


The Wayback Machine enables us to perform retrospective web measurements


This Talk

A 20 year longitudinal study of third party web tracking, performed using archival web data


Abstract

Insight: Web archives exist (e.g., Wayback Machine)

Challenges: Web archives are sometimes 1. incomplete & 2. inconsistent

Tool: TrackingExcavator

Validation: Comparison of archival measurements to ground truth

Measurement Results: A picture of web tracking over the last 20 years


Background

How does Third Party Web Tracking Work?

A first party is a domain to which a user goes intentionally, by typing a URL or clicking a link.

A third party is a domain whose content is embedded in a first-party web page.

If a third party is able to link together a subset of a person's browsing history, we call this ability third-party web tracking.
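The first-party and third-party definitions above can be illustrated with a small classifier. This is a minimal sketch, not the method used in the talk: the function names `site_of` and `is_third_party` are hypothetical, and the "last two hostname labels" rule is a simplifying assumption that breaks for suffixes such as "co.uk" (a real measurement would likely consult the Public Suffix List).

```python
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Approximate a URL's site as the last two labels of its hostname.

    Assumption: this naive rule is enough for illustration only. It is
    wrong for multi-label suffixes such as "co.uk".
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:])


def is_third_party(page_url: str, request_url: str) -> bool:
    """True when an embedded request goes to a different site than the page."""
    return site_of(page_url) != site_of(request_url)
```

Under these assumptions, a request from a page on www.theonion.com to ads.tracker.com counts as third-party, while a request to static.theonion.com does not.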

[Diagram, built up over several slides]

1. The user visits theonion.com, which embeds an ad and a logo served by tracker.com.
2. tracker.com replies "Set this cookie: id=789" and records "Browsing profile for user 789: theonion.com".
3. The browser stores the cookie: tracker.com:id=789.
4. The user visits cnn.com, which also embeds an ad and a logo served by tracker.com.
5. The browser tells tracker.com: "My cookie is: id=789".
6. tracker.com updates its record: "Browsing profile for user 789: theonion.com, cnn.com".
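The cookie exchange in the diagram above can be replayed as a toy simulation. This is only a sketch under stated assumptions: the `Tracker` and `Browser` classes and their methods are hypothetical names, and real browsers and trackers are far more involved. The ids start at 789 purely to mirror the slides.

```python
import itertools


class Tracker:
    """Toy third party that identifies browsers with a cookie."""

    def __init__(self):
        self._next_id = itertools.count(789)  # start at 789 to mirror the slides
        self.profiles = {}  # cookie id -> list of first-party sites seen

    def handle_request(self, cookie_id, embedding_site):
        """Serve an ad; return the cookie id the browser should store."""
        if cookie_id is None:
            cookie_id = next(self._next_id)  # "Set this cookie: id=789"
        self.profiles.setdefault(cookie_id, []).append(embedding_site)
        return cookie_id


class Browser:
    """Toy browser with a cookie jar keyed by the domain that set each cookie."""

    def __init__(self):
        self.cookie_jar = {}

    def visit(self, first_party, tracker_domain, tracker):
        # Loading the page fetches the embedded ad, so the tracker's own
        # cookie (if any) is sent along: "My cookie is: id=789".
        sent = self.cookie_jar.get(tracker_domain)
        self.cookie_jar[tracker_domain] = tracker.handle_request(sent, first_party)


tracker = Tracker()
browser = Browser()
browser.visit("theonion.com", "tracker.com", tracker)
browser.visit("cnn.com", "tracker.com", tracker)
```

After the two visits, the tracker's profile for the single cookie id lists both first-party sites, which is the linking ability that the definition of third-party web tracking describes.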

Contributions

● TrackingExcavator, our Tool
● Challenges of Retrospective Measurement
● Validation of Archival Data vs. Ground Truth
● Measurement Results: A 20 Year Picture of Web Tracking

Our Tool, TrackingExcavator, and the Challenges of Retrospective Measurements

TrackingExcavator

[Pipeline diagram; stages in the order they are introduced: Input Munging, Automatic Browsing, Data Analysis & Visualization]

What Makes this Hard?

● Cookies don't work in the archive
● Incomplete archives
● Inconsistent archives ("anachronisms")

Cookies don't work in the archive

Set-cookie: id=789

becomes...

X-Archive-Orig-Set-cookie: id=789

TrackingExcavator: Cookie simulation
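Since the archive renames `Set-cookie` to `X-Archive-Orig-Set-cookie`, one way to picture cookie simulation is to read the renamed header back and rebuild the cookies a live browser would have stored. The sketch below is an assumption about how that could look, not TrackingExcavator's actual implementation; the function names and the header-pair input format are hypothetical.

```python
ARCHIVE_PREFIX = "x-archive-orig-"


def original_set_cookies(archived_headers):
    """Return the original Set-Cookie values from an archived response.

    `archived_headers` is a list of (name, value) pairs, since a response
    may carry several Set-Cookie headers.
    """
    target = ARCHIVE_PREFIX + "set-cookie"
    return [value for name, value in archived_headers if name.lower() == target]


def simulate_cookie_jar(archived_headers):
    """Build a {cookie name: cookie value} jar as if the cookies had been set."""
    jar = {}
    for header_value in original_set_cookies(archived_headers):
        # The first "name=value" pair is the cookie itself; attributes such
        # as Path or Expires follow after ';' and are ignored in this sketch.
        pair = header_value.split(";", 1)[0]
        name, sep, value = pair.partition("=")
        if sep:
            jar[name.strip()] = value.strip()
    return jar
```

Header names are compared case-insensitively because HTTP header names are not case-sensitive.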

What Makes this Hard?

● Cookies are weird in archival environments
● Incomplete archives
● Inconsistent archives ("anachronisms")

Incomplete Archives

Set-cookie: id=789
Set-cookie: oatmeal=raisin
Set-cookie: chocolate=chip

Fortunately, we can still count requests to missing resources!
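The point about counting requests to missing resources can be sketched as follows: the request itself reveals which host the page tried to contact, even if the archive returns nothing for it. The log format and the function name `requested_hosts` are assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def requested_hosts(request_log):
    """Count requested hostnames, whether or not the archive had the resource.

    `request_log` is a list of (url, status) pairs; a status of 404 stands
    in for "not archived". The status is deliberately ignored when counting.
    """
    counts = Counter()
    for url, _status in request_log:
        host = urlsplit(url).hostname
        if host:
            counts[host] += 1
    return counts
```

What is lost for a missing resource is its response (for example, any cookies it would have set), not the evidence that the page embedded it.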


Robots.txt files

http://doubleclick.net/robots.txt:

User-agent: *
Disallow: /search
Allow: /search/about
Disallow: /sdch
Disallow: /groups
Disallow: /index.html?
Disallow: /?
Allow: /?hl=
Disallow: /?hl=*&
Allow: /?hl=*&gws_rd=ssl$
Disallow: /?hl=*&*&gws_rd=ssl
Allow: /?gws_rd=ssl$
Allow: /?pt1=true$
Disallow: /imgres
...

The Wayback Machine respects these preferences retroactively!
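To make the retroactive effect concrete, the sketch below applies a site's current robots.txt rules to a list of older captures, using Python's standard `urllib.robotparser`. It is a toy model of the behavior described on the slide, not the Wayback Machine's implementation; the helper names and the example rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, url, user_agent="*"):
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def visible_snapshots(snapshot_urls, current_robots_lines):
    """Keep only snapshots whose original URL today's robots.txt still allows.

    Applying the *current* rules to *old* captures is what makes the
    exclusion retroactive in this toy model.
    """
    return [u for u in snapshot_urls if allowed_by_robots(current_robots_lines, u)]
```

In this model, a tracker that adds a `Disallow` rule today also hides captures made before the rule existed, which is one source of incomplete archives.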


Inconsistent Archives

[Figure, with labels "2008" and "2012"]

From https://web.archive.org/web/20151205072445/http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html

TrackingExcavator: Block non-archival web requests

TrackingExcavator: Timestamp filtering
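Both mitigations can be sketched against the archive URL shape visible in the citation above (`https://web.archive.org/web/<timestamp>/<original URL>`). This is an illustrative sketch, not TrackingExcavator's code: the function names, the optional suffix after the timestamp digits, and the `max_days` window are all assumptions.

```python
import re
from datetime import datetime, timedelta
from urllib.parse import urlsplit

ARCHIVE_HOST = "web.archive.org"
# Path shape: /web/<14-digit timestamp>/<original URL>.
# Assumption: an optional lowercase/underscore suffix may follow the digits.
_SNAPSHOT_PATH = re.compile(r"^/web/(\d{14})[a-z_]*/")


def is_archival_request(url):
    """True if the request targets the archive rather than the live web."""
    return urlsplit(url).hostname == ARCHIVE_HOST


def snapshot_time(url):
    """Return the capture time encoded in an archive URL, or None."""
    if not is_archival_request(url):
        return None
    match = _SNAPSHOT_PATH.match(urlsplit(url).path)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def within_window(url, target, max_days):
    """True if the snapshot was captured within max_days of the target date."""
    captured = snapshot_time(url)
    return captured is not None and abs(captured - target) <= timedelta(days=max_days)
```

A crawler could drop any request for which `is_archival_request` is false, and an analysis step could discard requests that fall outside the window, so that a 2012 resource embedded in a 2008 page is not counted as 2008 behavior.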

Contributions

✓ TrackingExcavator, our Tool
✓ Challenges of Retrospective Measurement
● Validation of Archival Data vs. Ground Truth
● Measurement Results: A 20 Year Picture of Web Tracking

Validation of Archival Data

Validation of Archival Data vs. Ground Truth

The UW NSDI 2012 paper* (and follow-up measurements) provided a taxonomy of web tracking and ground truth data over the last 5 years

*Roesner, Franziska, Tadayoshi Kohno, and David Wetherall. "Detecting and defending against third-party tracking on the web." Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2012.

Archival trends reflect ground truth trends

Contributions

✓ TrackingExcavator, our Tool
✓ Challenges of Retrospective Measurement
✓ Validation of Archival Data vs. Ground Truth
● Measurement Results: A 20 Year Picture of Web Tracking

Measurement Results

"Web Tracking has Increased Over Time"

1. Number of trackers
2. Type of tracking employed
3. Coverage of top trackers

The power of individual trackers has grown over time. Individual companies know more about your browsing history than they did before.
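One way to quantify how much a single tracker knows is its coverage: the share of measured sites on which it appears. The transcript does not give the talk's exact definition, so the sketch below uses that simple share as an assumption; `tracker_coverage` and its input format are hypothetical.

```python
def tracker_coverage(observations):
    """Map each tracker to the fraction of measured sites that embed it.

    `observations` maps a first-party site to the set of tracker domains
    seen on it.
    """
    total_sites = len(observations)
    if total_sites == 0:
        return {}
    sites_per_tracker = {}
    for site, trackers in observations.items():
        for tracker in trackers:
            sites_per_tracker.setdefault(tracker, set()).add(site)
    return {t: len(sites) / total_sites for t, sites in sites_per_tracker.items()}
```

Computing this per year over the same set of sites would show whether the top trackers' reach grows over time.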


google-analytics.com


Individual companies' choices are important to your privacy

Transparency is important


The "Worst" Days of the Web

https://www.engadget.com/2014/08/14/the-creator-of-the-pop-up-ad-says-sorry/, accessed Aug 3 2016

Popups enable stronger tracking


2004: Internet Explorer starts popup-blocking by default


2003 - 2004

30 third-party popups (top 500 sites)

The choices of browser manufacturers, even those unrelated to tracking, significantly influence how trackers behave


We should study web tracking over time

Timeline figure (axis ticks: 1996, 2005, 2009, 2016), annotated with "This Work" and "The Future", placing earlier measurement studies:

● Krishnamurthy and Wills '09
● Roesner, Kohno, and Wetherall '12
● Mayer and Mitchell '12
● Eubank et al. '13
● Gomer et al. '13
● Englehardt et al. '15

Adam, Anna, Yoshi, & Franzi say: "Thanks!"

Insight: Web archives exist (Wayback Machine)

Challenges: Web Archives are sometimes 1. incomplete & 2. inconsistent

Tool: TrackingExcavator

Validation: Comparison of archival measurements to ground truth

Measurement Results: A 20 year picture of third party web tracking

https://trackingexcavator.cs.washington.edu