the long story of short urls

52
The Long Story of Short URLs Federico Maggi Politecnico di Milano http://flic.kr/phretor

Upload: federico-maggi

Post on 28-Nov-2014

4.663 views

Category:

Technology


7 download

DESCRIPTION

I gave a talk based on these slides for the first time at Royal Holloway University of London, in April 2012. This talk discusses the results of a research that we have conducted about the impact on users of short URLs. I describe a system that we designed, implemented, and deployed that observes and collects the short URLs that more than 7,000 real web users have encountered while browsing the Web between March 2010 and April 2011 (and counting). On this dataset, which comprises 16,075,693 distinct short URLs, we first precisely characterized the usage habits observed during our collection process, and the content typically referred by short URLs: Users exhibit different usage habits depending on the type of content they are using short URLs for. We then analyzed the abuse of short URLs to hide the true URL of malicious pages: This practice is not widespread, although we noticed that the miscreants tend to post the same malicious short URL on multiple pages. Finally, we analyzed the countermeasures of shortening services against abuses of short URLs, and found that they are trivially bypassed by shortening a benign URL that turns malicious only a few moments after submitting it to the shortening service.

TRANSCRIPT

Page 1: The Long Story of Short URLs

The Long Storyof Short URLs Federico Maggi

Politecnico di Milano

http://flic.kr/phretor

Page 2: The Long Story of Short URLs

European System Security Researchers

• The research leading to the results presented in this talk has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no 257007.

• Builds on the FORWARD initiative, SysSec aims at:

• creating a virtual center of excellence, to consolidate the systems security research community in Europe,

• promoting cybersecurity education,

• engaging a think-tank in discovering the threats and vulnerabilities,

• creating an active research roadmap in the area, and

• developing a joint working plan to conduct collaborative research.

http://www.syssec-project.eu

Page 3: The Long Story of Short URLs

Brief history of short URLs

US Patent2000 2001

TinyURL2002

Twitter startsusing TinyURL

2005 2009 2011

Twitter switchesto Bit.ly

Bit.ly wp.metr.im

Rush to theshortest domains

Page 4: The Long Story of Short URLs

Today is it just bit.ly and t.co?

• We observed up to 622 shortening services

• Companies and famous bloggers have started using their own custom domains (e.g., pep.si, ti.me, flic.kr)

Short URLs have become a sort of "trendy gadget"

Page 5: The Long Story of Short URLs

How short URLs work

URL shorteningservice

http://example.com/very/long/?url=to&the=landing-pagelong URL

http://ab.cd/d73fYfzshort URL RANDOM SUFFIXIS GENERATED

"make me shorter"

Page 6: The Long Story of Short URLs

How short URLs work cont'd

REDIRECTIONMECHANISM

http://ab.cd/d73fYfz http://example.com/very/long/...HTTP 302

HTML META Refresh

JavaScript

ActionScript

executes on the browser's side

Page 8: The Long Story of Short URLs

Why short URLs could be misused

• Users have grown accustomed to see short URLs

• Users typically trust short URLs

• They look harmless

Page 9: The Long Story of Short URLs

http://srv153.example.com/very/long/?url=to&the=landing-page&p=121&id=20&par=value&very=suspicious&long=url&that=would&probably=not&fi

t=into&your=IM&chat=window&or=may&be=broken&into=severla=lines

http://i.am/so-tiny

VS

Page 10: The Long Story of Short URLs

From the bad guys' perspective

Perfect mean for masquerading suspicious URLs

• Trivially evade naïve checks

• Trendy effect (e.g., Twitter, Facebook)

• Robust to those clients that break long URLs into multiple lines

• Dynamic redirection mechanisms (e.g., JavaScript, timeout, "Click to continue") make the landing page unaccessible to automated scanners

Page 11: The Long Story of Short URLs

State of the art and related work

• Spam, phishing and other malicious activity on social networks use short URLs

• [Stringhini et al., ACSAC 2010], [Grier et al., CCS 2010], [Gao et al., IMC 2010]

• "Quality" of the content aliased via short URLs is either very high or very low

• [Kandylas et al., WWW 2010]

• Crawling existing short URLs and use APIs to expand and analyze them

• [Antoniades et al., WWW 2011]

• Common nodes of the redirection chains are distinctive of bad short URLs

• [Lee, S. and Kim, J., NDSS 2012]

Page 12: The Long Story of Short URLs

These work consider existing short URLs found on websites

None of them take the end users into account

Page 13: The Long Story of Short URLs

A different perspective what is the impact on users?

• What kind of short URLs users typically encounter?

• Do users stumble upon malicious short URLs that often?

• Do users perceive the maliciousness of a short URL?

• Do shortening services take enough countermeasures to protect the users?

User-centered measurement

Page 14: The Long Story of Short URLs

Data collection infrastructure

JS

Resolver

Collectors (users)

http://ab.cd/4jaYashttp://ab.cd/4jaYas

http://ab.cd/4jaYas

http://ab.cd/4jaYas

LANDINGPAGE

Page 16: The Long Story of Short URLs

How to avoid biased measurements?

• We do not ask a user to become a collector

• We provide a useful service that users may need

• Users spontaneously subscribe as collectors

What kind of service do we offer?

Page 17: The Long Story of Short URLs
Page 18: The Long Story of Short URLs

Data that we collect

Raw data

Timestamp

Short URL

Client's IP

Referrer

Extracted data

Next hop

Redirection chain

Landing pageTitleSizeContent. . .

Page 19: The Long Story of Short URLs

Collected data

• Total 7,000 distinct users (estimate from 1,370,277 distinct IPs)

• about 500 to 1,000 active users per day

• about 20,000 to 50,000 short URLs sent each day (100,000 peaks)

• 24,953,881 distinct short URLs encountered by users while browsing

0

50000

100000

150000

200000

250000

300000

Apr’10

May’10

Jun’10

Jul’10

Aug’10

Sep’10Sep’10

Oct’10

Nov’10

Dec’10

Jan’11

Feb’11

Mar’11

Apr’11

May’11

Jun’11

Jul’11

Aug’11

Sep’11Sep’11

Oct’11

Nov’11

Dec’11

Jan’12

Feb’12

Mar’12

Apr’12

Page 20: The Long Story of Short URLs

Geographical distribution of the collectors (GeoIP)

Page 21: The Long Story of Short URLs

Top services encountered by users while browsing

Distinct URLs Log entries

8,179,229 bit.ly 13,407,588 bit.ly1,047,790 tinyurl.com 2,056,857 tinyurl.com

922,682 t.co 1,658,808 t.co651,074 ow.ly 1,154,522 ow.ly607,939 goo.gl 1,045,336 goo.gl508,969 fb.me 709,444 j.mp481,398 4sq.com 648,435 is.gd435,418 tl.gd 618,033 4sq.com369,960 j.mp 576,815 fb.me332,118 is.gd 485,221 durl.me

(as of April 2011)

Page 22: The Long Story of Short URLs

Type of content aliased via short URLs

• We categorize landing pages and container pages

• We use a human-maintained list of categories (DMOZ Open Directory Project)

Table 2. Most- and least-popular landing page categories for the top 5 shortening services. Notethat there is an overlap between categories, so percentages do not necessarily add up to 100%.

Service Most frequent % Least frequent %

bit.lyNews 35.01 Naturism 14 ·10�6

Audio-video 17.47 Contraception 9.0 ·10�6

File-hosting 12.23 Astrology 2.0 ·10�6

tinyurlNews 31.41 Astrology 11 ·10�6

Audio-video 18.18 Childcare 2.95 ·10�6

File-hosting 16.31 Naturism 2.95 ·10�6

t.coAudio-video 44.37 Naturism 8.31 ·10�6

News 29.47 AntiSpyware 5.54 ·10�6

Blog 6.60 Childcare 2.77 ·10�6

ow.lyNews 35.55 Naturism 6.96 ·10�6

Socialnet 18.04 Astrology 6.96 ·10�6

Audio-video 17.94 Childcare 3.48 ·10�6

goo.glNews 50.35 Contraception 3.39 ·10�6

Blog 45.36 Weapons 3.39 ·10�6

Audio-video 6.43 Childcare 3.39 ·10�6

Thus, it appears that there is an incentive to using short URLs to interconnect a newwebsite to the existing sites and, in some sense, help drive more traffic toward it.3.5 The Short URLs Ecosystem

As part of their research, Antoniades and colleagues in [4] have analyzed the category ofthe pages to which bit.ly and ow.ly short URLs typically point, along with the categoryof the container page, that they had available for bit.ly URLs only. They assignedcategories to a selection of URLs. We did a similar yet more comprehensive analysis bycharacterizing all the short URLs that we collected by means of the categories describedin the following.

We classified the container pages (about 11,500,000 distinct URLs) and landingpages (about 500,000 distinct URLs) based on well-known and well-maintained pub-licly available directories: the DMOZ Open Directory Project (http://www.dmoz.org)and URLBlacklist.com. The former, which comprises 3,883,992 URLs, is a human-maintained directory organized in a tree structure: URLs are associated to nodes, eachwith localized, regional mirrors. We expanded these nodes by recursively merging

mail

chat

weather

searchengines

do-it-yourself

news

gambling

sportnews

adult

abortion

onlineauctions

games

instantmessaging

sports

0 5 10 15 20 25 30 35 40

Average number of distinct short URLs/page per container page category

54380

1102294

626

66063

4615

384242

18706

8753

74943

86

34

193888

110

17993

Figure 6. Categories of container page ranked by the average number of short URLs/page theyheld. The total number of distinct short URLs is also shown.

Distinct short URLs per container page category

Total short URLs per container page category

Page 23: The Long Story of Short URLs

What happens when users click on a short URL?

Page 24: The Long Story of Short URLs
Page 25: The Long Story of Short URLs

r Category

0.00 naturism 0.18 artnudes 0.36 weapons 0.75 shopping0.01 personalfinance 0.21 antispyware 0.36 cleaning 0.78 games0.01 do-it-yourself 0.23 drinks 0.37 dating 0.80 news0.03 pets 0.25 medical 0.39 vacation 0.82 government0.04 gardening 0.25 weather 0.40 religion 0.88 chat0.07 clothing 0.30 onlinegames 0.42 culinary 0.90 blog0.07 mail 0.32 jobsearch 0.45 filehosting 0.91 socialnetworking0.09 banking 0.33 sportnews 0.52 kidstimewasting 1.00 contraception0.12 abortion 0.33 gambling 0.55 ecommerce 1.00 childcare0.12 instantmessaging 0.36 drugs 0.67 adult 1.00 astrology0.13 jewelry 0.36 searchengines 0.68 audio-video 1.00 cellphones0.18 hacking 0.36 weapons 0.69 sports 1.00 onlineauctions

1.00 onlinepayment

⇢ =In(cat)

In(cat) +Out(cat)

⇢ ! 0⇢ ! 1

Many outbound short URLs (aggregators, e.g., Twitter)Many inbound short URLs (landing pages, e.g., news, blogs)

Page 26: The Long Story of Short URLs
Page 27: The Long Story of Short URLs
Page 28: The Long Story of Short URLs
Page 29: The Long Story of Short URLs

Content-specific vs. general-purpose services

Table 3. Ranking of categories by the ratio of incoming and outgoing connections via short URLs.

r Category

0.00 naturism 0.18 artnudes 0.36 weapons 0.75 shopping0.01 personalfinance 0.21 antispyware 0.36 cleaning 0.78 games0.01 do-it-yourself 0.23 drinks 0.37 dating 0.80 news0.03 pets 0.25 medical 0.39 vacation 0.82 government0.04 gardening 0.25 weather 0.40 religion 0.88 chat0.07 clothing 0.30 onlinegames 0.42 culinary 0.90 blog0.07 mail 0.32 jobsearch 0.45 filehosting 0.91 socialnetworking0.09 banking 0.33 sportnews 0.52 kidstimewasting 1.00 contraception0.12 abortion 0.33 gambling 0.55 ecommerce 1.00 childcare0.12 instantmessaging 0.36 drugs 0.67 adult 1.00 astrology0.13 jewelry 0.36 searchengines 0.68 audio-video 1.00 cellphones0.18 hacking 0.36 weapons 0.69 sports 1.00 onlineauctions

1.00 onlinepayment

landing pages. This is the case, for example, of 4sq.com (about 100%), whose shortURLs always bring from online social-networking sites to pages categorized as “other”.The most popular shortening services (e.g., bit.ly, goo.gl, ow.ly) fall into the second tier(i.e., 32–48%), together with those services that cover a wide variety of categories, andtypically interconnect pages of different categories.

Non-obvious Uses of Short URLs We also analyzed how short URLs interconnecttogether pages of different categories, to understand whether some categories have amajority of container or landing pages. To this end, we calculated the average frequencyof category change from the perspective of the container page and landing page. With thisdata we created a weighted digraph with 48 nodes, each corresponding to a category. Theweights are the frequencies of change, calculated between each pair of categories—andaveraged over all the short URLs and pages within each category; weights are between10.19 and 39.41% and distributed as shown in Fig. 8. We then calculate the averageweight of incoming, In(cat), and outgoing, Out(cat), edges for each category cat, andfinally derive the ratio r(cat) = In(cat)

In(cat)+Out(cat) . When r ! 0, the category has a majorityof outgoing short URLs (i.e., many container pages of such category), whereas r ! 1

tru

nc.

it

om

.ly

slid

esh

a.re

mo

by

.to

mig

re.m

e

wp

.me

tum

blr

.co

m

lnk

.ms

cot.

ag

flic

.kr

icio

.us

tin

y.l

y

amzn

.to

po

st.l

y

yo

utu

.be

p.t

l

tcrn

.ch

tl.g

d

ny

ti.m

s

ht.

ly

altu

rl.c

om

tin

yso

ng

.co

m

dld

.bz

su.p

r

ust

re.a

m

t.co

j.m

p

dlv

r.it

fb.m

e

twu

rl.n

l

go

o.g

l

ow

.ly

ff.i

m

pin

g.f

m

is.g

d

tin

y.c

c

bit

.ly

tin

yu

rl.c

om

mas

h.t

o

ur1

.ca

sqze

.it

cort

.as

shar

.es

4sq

.co

m

0 2

0 4

0 6

0 8

0 1

00

Med

ian

% c

ateg

ory

dri

ft

Most popular shorteners are also general-purpose and cover a wide variety of categories

#categories covered (min. 0, max. 48)

Figure 7. Frequency of change of category (median with 25- and 75-percent quantiles) and numberof categories covered (size of black dot) of the top 50 services. The most popular, general-purposeshortening services highlighted are characterized by an ample set of categories (close to 48, whichis the maximum) and short URLs that, in 32–48% of the cases, are published on pages havingcategories different from the landing page category.

Page 30: The Long Story of Short URLs

Table 3. Ranking of categories by the ratio of incoming and outgoing connections via short URLs.

r Category

0.00 naturism 0.18 artnudes 0.36 weapons 0.75 shopping0.01 personalfinance 0.21 antispyware 0.36 cleaning 0.78 games0.01 do-it-yourself 0.23 drinks 0.37 dating 0.80 news0.03 pets 0.25 medical 0.39 vacation 0.82 government0.04 gardening 0.25 weather 0.40 religion 0.88 chat0.07 clothing 0.30 onlinegames 0.42 culinary 0.90 blog0.07 mail 0.32 jobsearch 0.45 filehosting 0.91 socialnetworking0.09 banking 0.33 sportnews 0.52 kidstimewasting 1.00 contraception0.12 abortion 0.33 gambling 0.55 ecommerce 1.00 childcare0.12 instantmessaging 0.36 drugs 0.67 adult 1.00 astrology0.13 jewelry 0.36 searchengines 0.68 audio-video 1.00 cellphones0.18 hacking 0.36 weapons 0.69 sports 1.00 onlineauctions

1.00 onlinepayment

landing pages. This is the case, for example, of 4sq.com (about 100%), whose shortURLs always bring from online social-networking sites to pages categorized as “other”.The most popular shortening services (e.g., bit.ly, goo.gl, ow.ly) fall into the second tier(i.e., 32–48%), together with those services that cover a wide variety of categories, andtypically interconnect pages of different categories.

Non-obvious Uses of Short URLs We also analyzed how short URLs interconnecttogether pages of different categories, to understand whether some categories have amajority of container or landing pages. To this end, we calculated the average frequencyof category change from the perspective of the container page and landing page. With thisdata we created a weighted digraph with 48 nodes, each corresponding to a category. Theweights are the frequencies of change, calculated between each pair of categories—andaveraged over all the short URLs and pages within each category; weights are between10.19 and 39.41% and distributed as shown in Fig. 8. We then calculate the averageweight of incoming, In(cat), and outgoing, Out(cat), edges for each category cat, andfinally derive the ratio r(cat) = In(cat)

In(cat)+Out(cat) . When r ! 0, the category has a majorityof outgoing short URLs (i.e., many container pages of such category), whereas r ! 1

tru

nc.

it

om

.ly

slid

esh

a.re

mo

by

.to

mig

re.m

e

wp

.me

tum

blr

.co

m

lnk

.ms

cot.

ag

flic

.kr

icio

.us

tin

y.l

y

amzn

.to

po

st.l

y

yo

utu

.be

p.t

l

tcrn

.ch

tl.g

d

ny

ti.m

s

ht.

ly

altu

rl.c

om

tin

yso

ng

.co

m

dld

.bz

su.p

r

ust

re.a

m

t.co

j.m

p

dlv

r.it

fb.m

e

twu

rl.n

l

go

o.g

l

ow

.ly

ff.i

m

pin

g.f

m

is.g

d

tin

y.c

c

bit

.ly

tin

yu

rl.c

om

mas

h.t

o

ur1

.ca

sqze

.it

cort

.as

shar

.es

4sq

.co

m

0 2

0 4

0 6

0 8

0 1

00

Med

ian

% c

ateg

ory

dri

ft

Most popular shorteners are also general-purpose and cover a wide variety of categories

#categories covered (min. 0, max. 48)

Figure 7. Frequency of change of category (median with 25- and 75-percent quantiles) and numberof categories covered (size of black dot) of the top 50 services. The most popular, general-purposeshortening services highlighted are characterized by an ample set of categories (close to 48, whichis the maximum) and short URLs that, in 32–48% of the cases, are published on pages havingcategories different from the landing page category.

Page 31: The Long Story of Short URLs

Table 3. Ranking of categories by the ratio of incoming and outgoing connections via short URLs.

r Category

0.00 naturism 0.18 artnudes 0.36 weapons 0.75 shopping0.01 personalfinance 0.21 antispyware 0.36 cleaning 0.78 games0.01 do-it-yourself 0.23 drinks 0.37 dating 0.80 news0.03 pets 0.25 medical 0.39 vacation 0.82 government0.04 gardening 0.25 weather 0.40 religion 0.88 chat0.07 clothing 0.30 onlinegames 0.42 culinary 0.90 blog0.07 mail 0.32 jobsearch 0.45 filehosting 0.91 socialnetworking0.09 banking 0.33 sportnews 0.52 kidstimewasting 1.00 contraception0.12 abortion 0.33 gambling 0.55 ecommerce 1.00 childcare0.12 instantmessaging 0.36 drugs 0.67 adult 1.00 astrology0.13 jewelry 0.36 searchengines 0.68 audio-video 1.00 cellphones0.18 hacking 0.36 weapons 0.69 sports 1.00 onlineauctions

1.00 onlinepayment

landing pages. This is the case, for example, of 4sq.com (about 100%), whose shortURLs always bring from online social-networking sites to pages categorized as “other”.The most popular shortening services (e.g., bit.ly, goo.gl, ow.ly) fall into the second tier(i.e., 32–48%), together with those services that cover a wide variety of categories, andtypically interconnect pages of different categories.

Non-obvious Uses of Short URLs We also analyzed how short URLs interconnecttogether pages of different categories, to understand whether some categories have amajority of container or landing pages. To this end, we calculated the average frequencyof category change from the perspective of the container page and landing page. With thisdata we created a weighted digraph with 48 nodes, each corresponding to a category. Theweights are the frequencies of change, calculated between each pair of categories—andaveraged over all the short URLs and pages within each category; weights are between10.19 and 39.41% and distributed as shown in Fig. 8. We then calculate the averageweight of incoming, In(cat), and outgoing, Out(cat), edges for each category cat, andfinally derive the ratio r(cat) = In(cat)

In(cat)+Out(cat) . When r ! 0, the category has a majorityof outgoing short URLs (i.e., many container pages of such category), whereas r ! 1

tru

nc.

it

om

.ly

slid

esh

a.re

mo

by

.to

mig

re.m

e

wp

.me

tum

blr

.co

m

lnk

.ms

cot.

ag

flic

.kr

icio

.us

tin

y.l

y

amzn

.to

po

st.l

y

yo

utu

.be

p.t

l

tcrn

.ch

tl.g

d

ny

ti.m

s

ht.

ly

altu

rl.c

om

tin

yso

ng

.co

m

dld

.bz

su.p

r

ust

re.a

m

t.co

j.m

p

dlv

r.it

fb.m

e

twu

rl.n

l

go

o.g

l

ow

.ly

ff.i

m

pin

g.f

m

is.g

d

tin

y.c

c

bit

.ly

tin

yu

rl.c

om

mas

h.t

o

ur1

.ca

sqze

.it

cort

.as

shar

.es

4sq

.co

m

0 2

0 4

0 6

0 8

0 1

00

Med

ian

% c

ateg

ory

dri

ft

Most popular shorteners are also general-purpose and cover a wide variety of categories

#categories covered (min. 0, max. 48)

Figure 7. Frequency of change of category (median with 25- and 75-percent quantiles) and numberof categories covered (size of black dot) of the top 50 services. The most popular, general-purposeshortening services highlighted are characterized by an ample set of categories (close to 48, whichis the maximum) and short URLs that, in 32–48% of the cases, are published on pages havingcategories different from the landing page category.

Page 32: The Long Story of Short URLs

Security aspects related to short URLs

Page 33: The Long Story of Short URLs

Malicious short URLs encountered by users

Category Short URLs Long URLs Ratio

Phishing 88 79 1.11Malware 1,161 1,083 1.07Spam 731 694 1.05

Blacklist Phishing Malware Spam

Spamhaus - - 694Phishtank 61 - -Wepawet - 266 -Safe Browsing 18 817 -

Page 34: The Long Story of Short URLs

What type of sites contain malicious short URLs?

Table 4. Number of short and long URLs, respectively, classified as Phishing, Malware, and Spam.The dash ‘-’ indicates that the blacklist in question provides no data about the specific threat.

Category Short URLs Long URLs Ratio

Phishing 88 79 1.11Malware 1,161 1,083 1.07Spam 731 694 1.05

Blacklist Phishing Malware Spam

Spamhaus - - 694Phishtank 61 - -Wepawet - 266 -Safe Browsing 18 817 -

indicates that the category has a majority of incoming short URLs (i.e., many landingpages of such categories). The digraph is on Fig. 8.

As summarized in Tab. 3, there are clearly categories that exhibit a container-likeusage, that is, they typically contain more outgoing short URLs than incoming shortURLs. Besides a few extreme cases, which are mostly due to the scarcity of short URLs,container-like categories include, for instance, “do-it-yourself,” “mail” (web based emailscontain outgoing short URLs more often than being referred to by short URLs), and“hacking.”

Summary: Categories that we would anecdotally consider as aggregators of shortURLs are actually more often used as landing pages. The most notable example is“socialnetworking” (r = 0.91), which we would expect to have many outgoing links aspeople share lots of resources through them. Instead, it turns out that, from a globalviewpoint, this is no longer true. We calculated r by means of the number of incomingand outgoing edges, instead of using the average weights, and the ranking is the same.

3.6 Abuses of Short URLs

We wanted to understand if usage patterns of malicious short URLs differ from usagepatterns of legitimate short URLs. We first concentrate on the categories of containerpages where short URLs often appear. Then, we analyze the use of multiple short URLsto point to the same landing page. Finally, we focus on their lifespan.

Malicious Short URLs We wanted to understand to what extent the average userencounters malicious short URLs. To do this, we leveraged four datasets: the Spamhaus

socialnetworkingblog

otherchat

newsaudio-video

mailfilehosting

searchenginesgames

adultkidstimewasting

vacationcleaning

0 5 10 15 20 25 30 35

% of malicious short URLs

Figure 9. Categories of container page ranked by the percentage of malicious short URLs theyheld. Differently from Fig. 6, the average number of short URLs/page is not significant here as itis between 1 and 2 as detailed in § 3.6.

Cate

gory

of c

onta

iner p

age

Page 35: The Long Story of Short URLs

Malicious page

http://ab.cd/sfb4Ac

http://ab.cd/asd31A

http://ab.cd/5aD3B9

http://ab.cd/419E9s

http://ab.cd/sfb4Ac

. . .

http://ab.cd/5aD3B9

http://ab.cd/419E9s

Page 36: The Long Story of Short URLs

Aliasing of malicious pages using short URLs

0.82

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

Number of unique short URL per distinct landing URL (log-scale)

BenignSpam URLs

Malware URLs

(a) Distinct short URLs per distinct maliciousor benign landing URL.

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

(log) Number of unique container page per distinct short URL

Benign URLsSpam URLs

Malware URLs

(b) Distinct containers per distinct malicious orbenign short URL.

Figure 10. Number of distinct short URLs per unique landing page and distinct container page perunique short URL. Notably, benign short URLs exhibit different characteristics if compared withmalicious ones: More precisely, (b) spam short URLs are generally spread on a larger number ofcontainer pages than benign ones. Surprisingly, (a) more unique short URLs are created for thesame malicious landing page, whereas benign long URLs are typically less aliased. Phishing shortURLs are a minority with respect to the others, thus are not accounted.

We derived the maximum lifespan of each collected URL based on historical accesslogs to their container pages. We calculated the maximum lifespan (or simply lifespan)as the delta time between first and latest (i.e., more recent) occurrence of each shortURL in our database. More specifically, our definition of lifespan accounts for thefact that short URLs may disappear from some container pages and reappear after awhile on the same or other container pages. Fig. 11 shows the empirical cumulativedistribution frequency of the lifespan of malicious versus benign short URLs. About95% of the benign short URLs have a lifetime around 20 days, whereas 95% of themalicious short URLs lasted about 4 months. For example, we observed a spam campaignspanning between April 1st and June 30th 2010 that involved 1,806 malicious short URLsredirecting to junk landing pages; this campaign lasted about three months before beingremoved by tinyurl.com administrators. The latest MessageLabs Intelligence Annual

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

CD

F

Lifespan (hours between first and latest occurrence of a short URLs in our dataset)

3-months spam campaign disseminatedvia 1,806 tinyurl.com short URLs(April 1st-June 30, 2010)

BenignMalicious

Malicious (excluding 3-months Apr-Jun spam campaign)

Figure 11. Delta time between first and latest occurrence of malicious versus benign short URLs.The “peak” indicates a high, about 50 %, amount of spam short URLs that lasted about threemonths. Malicious URLs are usually found on multiple different container pages even for extendedperiods of time, whereas benign short URLs follow the “one-day-of-fame effect” pattern.

x = Distinct short URL per distinct landing page

Drive-by and spam landing pages are more aliased than benign ones.

Page 37: The Long Story of Short URLs

http://ab.cd/asd31A

http://ab.cd/5aD3B9

http://ab.cd/sfb4Ac

http://ab.cd/419E9s

Container page 1

http://ab.cd/asd31A

http://ab.cd/5aD3B9

http://ab.cd/419E9s

http://ab.cd/sfb4Ac

Container page 2

. . .

http://ab.cd/sfb4Ac

http://ab.cd/asd31A

http://ab.cd/5aD3B9

http://ab.cd/419E9s

Container page N

Page 38: The Long Story of Short URLs

Dissemination of malicious short URLs

x = Distinct container page per distinct short URL

0.82

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

Number of unique short URL per distinct landing URL (log-scale)

BenignSpam URLs

Malware URLs

(a) Distinct short URLs per distinct maliciousor benign landing URL.

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

(log) Number of unique container page per distinct short URL

Benign URLsSpam URLs

Malware URLs

(b) Distinct containers per distinct malicious orbenign short URL.

Figure 10. Number of distinct short URLs per unique landing page and distinct container page perunique short URL. Notably, benign short URLs exhibit different characteristics if compared withmalicious ones: More precisely, (b) spam short URLs are generally spread on a larger number ofcontainer pages than benign ones. Surprisingly, (a) more unique short URLs are created for thesame malicious landing page, whereas benign long URLs are typically less aliased. Phishing shortURLs are a minority with respect to the others, thus are not accounted.

We derived the maximum lifespan of each collected URL based on historical accesslogs to their container pages. We calculated the maximum lifespan (or simply lifespan)as the delta time between first and latest (i.e., more recent) occurrence of each shortURL in our database. More specifically, our definition of lifespan accounts for thefact that short URLs may disappear from some container pages and reappear after awhile on the same or other container pages. Fig. 11 shows the empirical cumulativedistribution frequency of the lifespan of malicious versus benign short URLs. About95% of the benign short URLs have a lifetime around 20 days, whereas 95% of themalicious short URLs lasted about 4 months. For example, we observed a spam campaignspanning between April 1st and June 30th 2010 that involved 1,806 malicious short URLsredirecting to junk landing pages; this campaign lasted about three months before beingremoved by tinyurl.com administrators. The latest MessageLabs Intelligence Annual

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

CD

F

Lifespan (hours between first and latest occurrence of a short URLs in our dataset)

3-months spam campaign disseminatedvia 1,806 tinyurl.com short URLs(April 1st-June 30, 2010)

BenignMalicious

Malicious (excluding 3-months Apr-Jun spam campaign)

Figure 11. Delta time between first and latest occurrence of malicious versus benign short URLs.The “peak” indicates a high, about 50 %, amount of spam short URLs that lasted about threemonths. Malicious URLs are usually found on multiple different container pages even for extendedperiods of time, whereas benign short URLs follow the “one-day-of-fame effect” pattern.

Spam short URLs are disseminated on a larger number of container pages.

Page 39: The Long Story of Short URLs

Lifespan of malicious short URLs

0.82

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

Number of unique short URL per distinct landing URL (log-scale)

BenignSpam URLs

Malware URLs

(a) Distinct short URLs per distinct maliciousor benign landing URL.

0.84

0.86

0.88

0.9

0.92

0.94

0.96

0.98

1

1 10 100 1000

CD

F

(log) Number of unique container page per distinct short URL

Benign URLsSpam URLs

Malware URLs

(b) Distinct containers per distinct malicious orbenign short URL.

Figure 10. Number of distinct short URLs per unique landing page and distinct container page perunique short URL. Notably, benign short URLs exhibit different characteristics if compared withmalicious ones: More precisely, (b) spam short URLs are generally spread on a larger number ofcontainer pages than benign ones. Surprisingly, (a) more unique short URLs are created for thesame malicious landing page, whereas benign long URLs are typically less aliased. Phishing shortURLs are a minority with respect to the others, thus are not accounted.

We derived the maximum lifespan of each collected URL based on historical accesslogs to their container pages. We calculated the maximum lifespan (or simply lifespan)as the delta time between first and latest (i.e., more recent) occurrence of each shortURL in our database. More specifically, our definition of lifespan accounts for thefact that short URLs may disappear from some container pages and reappear after awhile on the same or other container pages. Fig. 11 shows the empirical cumulativedistribution frequency of the lifespan of malicious versus benign short URLs. About95% of the benign short URLs have a lifetime around 20 days, whereas 95% of themalicious short URLs lasted about 4 months. For example, we observed a spam campaignspanning between April 1st and June 30th 2010 that involved 1,806 malicious short URLsredirecting to junk landing pages; this campaign lasted about three months before beingremoved by tinyurl.com administrators. The latest MessageLabs Intelligence Annual

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000

CD

F

Lifespan (hours between first and latest occurrence of a short URLs in our dataset)

3-months spam campaign disseminatedvia 1,806 tinyurl.com short URLs(April 1st-June 30, 2010)

BenignMalicious

Malicious (excluding 3-months Apr-Jun spam campaign)

Figure 11. Delta time between first and latest occurrence of malicious versus benign short URLs.The “peak” indicates a high, about 50 %, amount of spam short URLs that lasted about threemonths. Malicious URLs are usually found on multiple different container pages even for extendedperiods of time, whereas benign short URLs follow the “one-day-of-fame effect” pattern.

Malicious short URLs typically survive longer than benign ones.

Exception: a spam campaign (Storm botnet?) with1,806 short URLs deleted by tinyurl.com's administrators.

Page 40: The Long Story of Short URLs

Are shortening services taking countermeasures?

1. Prepare a list of benign and malicious long URLs

2. Shorten them via the top 6 shortening services (e.g., bit.ly, is.gd, tinyurl.com)

2.1.Do they accept malicious URLs (spam, phishing, drive-by download)?

3. Try to access the malicious shortened URLs

3.1.Do they warn the users when they resolve the short URLs?

4. Modify the benign long URLs (under our control) and make them malicious

4.1.Do they periodically check their databases of existing short URLs?

Page 41: The Long Story of Short URLs

URL shorteningservice

http://example.com/very/long/?url=to&the=landing-pagelong URL

"make me shorter"

Blacklist

"is this long URL malicious?"Yes!

"sorry, can't shorten this URL"

Page 42: The Long Story of Short URLs

Malicious long URLs accepted by top services

Service Malware Phishing Spam

# % # % # %

bit.ly 997 99.7 1,000 100.0 1,000 100.0durl.me 898 89.8 937 93.7 216 21.6

goo.gl 999 99.9 994 99.4 1,000 100.0is.gd 640 64.0 358 35.8 143 14.3

migre.me 201 20.1 402 40.2 235 23.5tinyurl.com 997 99.7 996 99.6 998 99.8

Overall 4,732 78.9 4,687 78.1 3,592 59.9

Page 43: The Long Story of Short URLs

URL shorteningservice

http://i.am/so-niceshort URL

"please resolve this"

Aliases DB

"is the long URL malicious?"Yes!

"sorry, can't resolve this URL"

Page 44: The Long Story of Short URLs

Alerting users when accessing bad short URLs

Service Malware Phishing Spam

bit.ly 100.0 97.5 99.9durl.me 100.0 100.0 100.0

goo.gl 66.4 96.9 78.7is.gd 43.3 42.9 78.7

migre.me 46.8 40.6 95.7tinyurl.com 43.5 43.2 77.1

Overall 66.6 70.2 88.4

Shortened malicious URLs detected when accessed: 9,987 shortened malicious URLs

Page 45: The Long Story of Short URLs

URL shorteningservice

http://our.server/dynamic-page.phpdynamic long URL

"make me shorter"

Blacklist

"is this long URL malicious?"No!

http://ab.cd/good-today

Page 46: The Long Story of Short URLs

http://our.server/dynamic-page.php?redirect=http://evil.comafter 24 hours

Page 47: The Long Story of Short URLs

URL shorteningservice

http://ab.cd/good-todayshort URL

"please resolve this"

Aliases DB

"is the long URL malicious?"No!

http://our.server/dynamic-page.php?redirect=http://evil.com

Page 48: The Long Story of Short URLs

Deferred maliciousness

Threat Shortened Blocked Not Blocked

Malware 162 0% 100%Phishing 180 0% 100%

Spam 150 0% 100%

Overall 492 0% 100%

Number of deferred malicious short URLs submitted and percentage of blocked versus

Page 49: The Long Story of Short URLs

Limitations & future work or what we still need to do

• We collect short URLs only when container pages are visited.

• We track clicks on short URLs, but we collected 42,147 clicks (too early to draw conclusions).

• We have not tracked whether existing, benign short URLs turn into malicious short URLs.

Page 50: The Long Story of Short URLs

Conclusions: What is the impact on users?

• What do users use short URLs for?

• Share ephemeral resources to user-generated content (e.g., social nets)

• Do users stumble upon short URLs that often?

• Not very often: ~1K over 16M

• Do users perceive the maliciousness of a short URL?

• Not much: almost no one clicked on our "flag as malicious" link. Also confirmed by [Onarlioglu et al., NDSS 2012]

• Do URL shortening services take enough countermeasures to protect the users?

• Some of them use blacklists but do not proactively check existing aliases

Page 51: The Long Story of Short URLs

We're still collecting short URLs

• 16,075,693 over 24,953,881 analyzed thoroughly

• No big changes in the new portion of the dataset

0

50000

100000

150000

200000

250000

300000

Apr’10

May’10

Jun’10

Jul’10

Aug’10

Sep’10Sep’10

Oct’10

Nov’10

Dec’10

Jan’11

Feb’11

Mar’11

Apr’11

May’11

Jun’11

Jul’11

Aug’11

Sep’11Sep’11

Oct’11

Nov’11

Dec’11

Jan’12

Feb’12

Mar’12

Apr’12

Page 52: The Long Story of Short URLs

The Long Storyof Short URLs Federico Maggi

Politecnico di Milano

Questions? [email protected]@phretor

Alessandro FrossiGianluca StringhiniBrett Stone-Gross

Stefano ZaneroChristopher Kruegel

Giovanni Vigna

Politecnico di MilanoUC Santa BarbaraUC Santa Barbara

Politecnico di MilanoUC Santa BarbaraUC Santa Barbara

Co-authors