PhishScore: Hacking Phishers’ Minds

CNSM 2014 – Fault Tolerance and Security Track, November 18, 2014

Samuel Marchal, Jérôme François, Radu State and Thomas Engel

{samuel.marchal,radu.state,thomas.engel}@uni.lu



PhishScore at a glance

PhishScore: Hacking Phishers‘ Minds – Samuel Marchal 1 / 16

What is Phishing?

• Use of technical subterfuge and social engineering to steal any kind of valuable consumer data:
  • Identity information
  • Website credentials: login, password, etc.
  • Credit card information
  • Etc.

• Causes billions of dollars in losses every year

Phishing techniques and statistics


• Web based delivery

• Trojan hosts

• Content Injection (website)

• Phishing emails

• Instant messaging

• Fake websites

• etc.

Phishing website example


Phishing URLs characteristics

URL characteristics:

• Long URLs (many domain levels, long path, etc.)
• Composed of many labels
• Embed the targeted brand at different URL levels, e.g. Yahoo, Wells Fargo
• Embed specific keywords

www.paypal.creasconsultores.com/www.paypal.com/Resolutioncenter.php

shevkun.org/css/paypal.com/cgi-bin/cmd%3D_login-submit/css/websc.php

us-mg6.mail.yahoo.com.dwarkamaigroup.com/Yahoo.html

emailoans.hostingventure.com.au/bankofamerica.com

nitkowski.pl/components/wellsfargo/questions.php
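The characteristics above can be measured directly on the example URLs. The following is a minimal sketch, not the authors' code: the brand list `BRANDS` and the helper `url_characteristics` are hypothetical names introduced for illustration only.

```python
import re

# Hypothetical brand list for illustration; a real system would need a far larger one.
BRANDS = ("paypal", "yahoo", "bankofamerica", "wellsfargo")

def url_characteristics(url: str) -> dict:
    """Measure simple lexical traits of a scheme-less URL of the form 'host/path'."""
    host, _, path = url.partition("/")
    # Labels are the alphanumeric runs found anywhere in the URL.
    labels = re.findall(r"[A-Za-z0-9]+", url)
    lowered_host, lowered_path = host.lower(), path.lower()
    return {
        "length": len(url),
        "domain_levels": len(host.split(".")),
        "labels": len(labels),
        "brands_in_host": [b for b in BRANDS if b in lowered_host],
        "brands_in_path": [b for b in BRANDS if b in lowered_path],
    }

examples = [
    "www.paypal.creasconsultores.com/www.paypal.com/Resolutioncenter.php",
    "us-mg6.mail.yahoo.com.dwarkamaigroup.com/Yahoo.html",
    "nitkowski.pl/components/wellsfargo/questions.php",
]
for u in examples:
    print(url_characteristics(u))
```

In these examples the brand shows up in the host, in the path, or in both, while the registered domain itself (creasconsultores.com, dwarkamaigroup.com, nitkowski.pl) never matches the brand.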

Prior Work


URL lexical analysis

• Garera et al. [WORM '07]
  Logistic regression with word-based features

• Ma et al. [SIGKDD '09]
  Batch classification method with lexical and host-based features

• Blum et al. [AISec '10]
  Refined technique with a binary feature for each word/level

• Le et al. [Infocom '11]
  Batch and online learning with lexical features and URL features

Phishing URLs characteristics


www.paypal.creasconsultores.com/www.paypal.com/Resolutioncenter.php

shevkun.org/css/paypal.com/cgi-bin/cmd%3D_login-submit/css/websc.php

us-mg6.mail.yahoo.com.dwarkamaigroup.com/Yahoo.html

emailoans.hostingventure.com.au/bankofamerica.com

nitkowski.pl/components/wellsfargo/questions.php

The registered domain has no relationship with the rest of the URL

• Most parts of a URL can be freely defined
• Except the registered domain: main level domain (mld) + public suffix (ps)

http://4ld.3ld.mld.ps/path1/path2?key1=value1&key2=value2

Proposition for Phishing URL Detection


Hypothesis:

• Components of legitimate URLs are all related
• Registered domains (mld.ps) of phishing URLs are not related to the rest of the URL

Analyse the relatedness between mld.ps and the remaining part of a URL: intra-URL relatedness

Intra-URL relatedness


URL label extraction:

http://4ld.3ld.mld.ps/path1/path2?key1=value1&key2=value2

• RDurl: “mld” & “mld.ps”
• REMurl: basic splitting of the remaining part of the URL

Example: login.paypal.com/securepayment

• RDurl = {paypal; paypal.com}
• REMurl = {login; secure; payment}
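The label extraction step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `PUBLIC_SUFFIXES` is a tiny stand-in for a full public suffix list, `VOCAB` is a toy vocabulary, and the greedy word segmentation in `segment` is a guess at what "basic splitting" involves. All function names are hypothetical.

```python
import re

# Assumptions for illustration only: a tiny public-suffix set and a tiny vocabulary.
# A real system would use a complete public suffix list and a proper word segmenter.
PUBLIC_SUFFIXES = {"com", "org", "pl", "com.au"}
VOCAB = {"login", "secure", "payment"}

def split_host(host: str):
    """Return (subdomain labels, mld, ps) using the longest matching public suffix."""
    labels = host.lower().split(".")
    for i in range(1, len(labels)):       # smaller i = longer candidate suffix
        candidate = ".".join(labels[i:])
        if candidate in PUBLIC_SUFFIXES:
            return labels[:i - 1], labels[i - 1], candidate
    raise ValueError(f"no known public suffix in {host!r}")

def segment(token: str):
    """Greedy longest-match split into vocabulary words; unknown tokens stay whole."""
    words, start = [], 0
    while start < len(token):
        for end in range(len(token), start, -1):
            if token[start:end] in VOCAB:
                words.append(token[start:end])
                start = end
                break
        else:
            return [token]                # no vocabulary word fits: keep the token as-is
    return words

def extract_label_sets(url: str):
    """Build RDurl = {mld, mld.ps} and REMurl = words found in the rest of the URL."""
    url = re.sub(r"^[a-z]+://", "", url.lower())
    host, _, rest = url.partition("/")
    subdomains, mld, ps = split_host(host)
    rd = {mld, f"{mld}.{ps}"}
    rem = set()
    for token in re.findall(r"[a-z0-9]+", " ".join(subdomains) + " " + rest):
        rem.update(segment(token))
    return rd, rem

rd, rem = extract_label_sets("login.paypal.com/securepayment")
print(sorted(rd), sorted(rem))
```

On the slide's example this reproduces RDurl = {paypal; paypal.com} and REMurl = {login; secure; payment}. Any token outside the toy vocabulary is kept unsplit.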


How to evaluate intra-URL relatedness?

• Compare the two sets RDurl and REMurl

• Existing word relatedness techniques: WordNet [Miller90], NGD [Cilibrasi07], DISCO [Kolb08], etc.

Problem: all are dictionary-based, and “Internet” vocabulary is not necessarily contained in a dictionary

• Idea: use search engine query data
  • Web searches reflect the cognitive behaviour of users looking for services on the Internet (what phishers try to identify and mimic)
  • Query well-known services: Google Trends & Yahoo Clues
  • See which words are searched together in search engines to infer word relatedness
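The idea of inferring word relatedness from words that are searched together can be illustrated with a toy example. This sketch assumes a hypothetical in-memory query log `QUERY_LOG`; the talk relies on Google Trends and Yahoo Clues, and no real service or API is called here.

```python
from collections import defaultdict

# Hypothetical query log standing in for search engine query data.
QUERY_LOG = [
    "paypal login",
    "paypal secure payment",
    "paypal account login",
    "yahoo mail login",
    "cheap flights paris",
]

def build_related_words(queries):
    """Map each word to the set of words that appear with it in at least one query."""
    related = defaultdict(set)
    for query in queries:
        words = set(query.lower().split())
        for word in words:
            related[word] |= words - {word}
    return related

related = build_related_words(QUERY_LOG)
print(sorted(related["paypal"]))
print("paypal" in related["flights"])
```

Words such as "login" and "payment" end up related to "paypal" because users search for them together, whereas unrelated words never share a query. This needs no dictionary, so brand names and other Internet vocabulary are covered.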

Intra-URL relatedness evaluation


Features set

12 features representing intra-URL relatedness:

JRR, JRA, JAA, JAR, JARrd, JARrem, cardrem, ratioArem, ratioRrem, mldres, mld.psres, ranking

Feature categories:

• Word set relatedness (Jaccard index)
• Words embedded in URL
• Popularity of words in URL
• Popularity of registered domain
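The Jaccard index used for the word set relatedness features is the size of the intersection of two sets divided by the size of their union. The slides do not define the individual features (JRR, JRA, etc.), so the sketch below only shows the index itself on hypothetical word sets; the set names and contents are assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; taken as 0.0 here when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical word sets: words related to the registered domain vs. the rest of the URL.
rd_related = {"paypal", "payment", "login", "account"}
rem_legit = {"login", "payment", "secure"}
rem_phish = {"css", "websc", "cmd"}

print(jaccard(rd_related, rem_legit))  # 2 shared words out of 5 distinct words
print(jaccard(rd_related, rem_phish))  # no shared word
```

A high index means the words around the registered domain overlap with the words in the rest of the URL. Under the talk's hypothesis this is expected for legitimate URLs and not for phishing ones.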


Feature analysis

• Datasets:
  • 48,009 phishing URLs (source: PhishTank)
  • 48,009 legitimate URLs (source: DMOZ)
• Feature extraction for the whole dataset


URL classification

• Machine learning approach:
  • Determine the best classifier to identify phishing URLs
  • 7 classifiers tested: Random Forest, C4.5, JRip, SVM, etc.
  • 10-fold cross-validation on the presented feature set (96,016 URLs)

• Random Forest:
  • 94.91% accuracy
  • 1.44% FP rate


URL rating

• Random Forest based rating system:
  • Use the soft prediction score in [0;1] as the URL score:
    • 1: phishing URL
    • 0: legitimate URL

• Scores of exactly 0 or 1:
  • 0: 22,863 legitimate // 40 phishing
  • 1: 26 legitimate // 34,790 phishing
  • 99.89% correctness on 60.11% of the dataset

• Scores in [0;0.1] and [0.9;1]:
  • 99.22% correctness on 83.97% of the dataset
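The first pair of percentages can be checked from the counts reported for scores of exactly 0 and 1. In the sketch below the dataset size is assumed to be 2 × 48,009 URLs; the cross-validation bullet quotes 96,016, and either value gives the same rounded coverage. The `rate` helper is hypothetical and only illustrates how a confidence band such as [0;0.1] and [0.9;1] could be applied to a score.

```python
# Counts reported on the slide for URLs scored exactly 0 or exactly 1.
score0 = {"legitimate": 22_863, "phishing": 40}
score1 = {"legitimate": 26, "phishing": 34_790}
TOTAL_URLS = 2 * 48_009  # assumption: both datasets combined

rated = sum(score0.values()) + sum(score1.values())
correct = score0["legitimate"] + score1["phishing"]

correctness = 100 * correct / rated
coverage = 100 * rated / TOTAL_URLS
print(f"{correctness:.2f}% correctness on {coverage:.2f}% of the dataset")

def rate(score: float, margin: float = 0.1) -> str:
    """Hypothetical helper: give a verdict only when the score is within `margin` of 0 or 1."""
    if score <= margin:
        return "legitimate"
    if score >= 1 - margin:
        return "phishing"
    return "undecided"
```

Widening the margin trades correctness for coverage, which is the pattern the two reported operating points show: 99.89% on 60.11% of the dataset versus 99.22% on 83.97%.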

Conclusion


Lexical analysis to detect phishing URLs:

• Intra-URL relatedness

• Word relatedness inferred with search engine query data

• Phishing URL detection: 95% accuracy (FP rate = 1.44%)

• URL rating system: >99% correctness for > 80% URLs

Future Work:

• Use distributed on-line processing (Big Data) to reduce delay

• Implementation as phishing email filtering and browser add-on


Phishing summary


• Phishing:
  • seeks to steal different kinds of data
  • targets several industry sectors
  • uses various techniques

Is there a global characteristic of phishing?

No, but most phishing attacks rely on fake websites reached through redirecting links

Phishing detection technique with wide scope: phishing URL identification