on designing and evaluating phishing webpage detection ... · phishing activity trends report 1st...

On Designing and Evaluating Phishing Webpage Detection Techniques for the Real WorldSamuel Marchal, N. AsokanAalto University, Finland

[email protected]

Phishing webpage

2

Phishing webpage (phish) Legitimate webpage

State of research on phishing detection

• Threat known since late 1990s• First protection technique[1] early 2000s

• > 4,000 articles on “phishing”- Half as popular as "malware”

• Many solutions report high accuracy- Cantina[2] (2007): 97%- Whittaker et al.[3] (2010): 99.9%- Off-the-Hook[4] (2017): 99.9%

3

[1] Herzberg and Gbara, ”Trustbar: Protecting (even naive) web users from spoofing and phishing attacks” in Cryptology ePrint Archive, 2004.[2] Zhang et al., ”CANTINA: A content-based approach to detecting phishing web sites” in WWW, 2007.[3] Whittaker et al., ”Large-scale automatic classification of phishing pages” in NDSS Symposium, 2010.[4] Marchal et al., ”Off-the-hook: An efficient and usable client-side phishing prevention application” in IEEE Transactions on Computers 66, 10, 2017.

State of phishing threat

Monetary damage:• 2013-2016: $1.6 billion loss for businesses (US only)• Most expensive attack (2015): $100 million cost (US defense department)

4

Global Phishing Survey 2016: Trends and Domain Name Use

An APWG Industry Advisory http://www.apwg.org ● [email protected] ● 25 April 2017

PMB 246, 405 Waltham Street, Lexington MA USA 02421

6

deceptive messages in the translated domains names. Please see page 25 for more detail.

Basic Statistics

2012 2013 2014 2015 2016

Domain names used for phishing 153,952 135,848 183,222 160,155 195,475

Attacks 216,938 188,323 247,713 227,471 255,065 TLDs used 204 223 288 355 454 new gTLDs used 0 0 72 120 228 IP-based phish (unique IPs) 4,899 4,366 6,472 2,245 5,378

Maliciously registered domains 13,545 35,004 49,932 34,102 95,424

0

50,000

100,000

150,000

200,000

250,000

300,000

2012 2013 2014 2015 2016

Phishing Attacks and Domains Used, 2012-2016

Attacks

Domainsused forPhishingMaliciouslyRegisteredDomains

Phishing Activity Trends Report

1st Quarter 2018 w w w . a p w g . o r g • i n f o @ a p w g . o r g

4

Phishing Activity Trends Report, 1st Quarter 2018

�

The total number of phish detected in 1Q 2018 was 263,538. This was up 46 percent from the 180,577 observed in 4Q

2017. It was also significantly more than the 190,942 seen in 3Q 2017.

0

20,000

40,000

60,000

80,000

100,000

120,000

Oct 2017 Nov 2017 Dec 2017 Jan 2018 Feb 2018 Mar 2018

Unique Phishing Sites Detected,4Q2017-1Q2018

The number of unique phishing reports submitted to APWG during 1Q 2018 was 262,704, compared to 233,613 in 4Q

2017 and 296,208 in 3Q 2017.

Phishing Site and Phishing E-mail Trends – 1st Quarter 2018

Phishing attacks 2012-2016 Phishing websites 2017-2018

Source: Anti Phishing Worgin Group (APWG).

Detection of phishing webpages

Gap between • High accuracy reported in literature• Low effectiveness when applied to the real-world

What goes wrong during design & evaluation?• Design choices only driven by high detection accuracy level• Evaluation not representative of the real-world

5

Effective phishing detection

Requirements for effectiveness• Detection performance• Temporal resilience• Deployability• Usability

Recommendations• Design of detection method• Evaluation- Ground truth selection- Assessment methodology

6

ML-based phishing webpage detection

Machine learning based phishing detection

8

Webpage Feature extraction Model Prediction

Phish

Legit

Phishing detector training

9

Ground truth collection Feature extraction Model learning

Phishing detectionmodel

Phish

Legit

Design of detection method

System design

11

Centralized solution currently favored by industry….….but increasing privacy concerns may change the game.

Centralized Client-side

Pros • High computational power• Easy model updates • Confidentiality of detection model

• User privacy• Fast decision• Website data availability

Cons • Delay in decision• Impacts user privacy (browsing history)

• Degrades client device performance• Lack of model confidentiality

Evaluation

Evaluation setup

13

Trainingset

Leg Phish

Test set

Model learning Phishing detectionmodel

Prediction

Phish

Legit

Ground truth collection

Accuracymetrics

Ground truth selection

Improve relevance of accuracy results • Validity• Generalizability• Reproducibility

Ground truth • Validity of labels• Representativeness• Availability

14

Phishing

Benign

Webpage selection

Generic guidelines• Multi-lingual + different alphabet

• Publicly available sources (≠ static dataset)

Legitimate webpage• Diverse popularity

• Real URLs: as browsed

- www.amazon.com ≠ https://www.amazon.com/gp/cart/view.html?ref=nav_cart

Phishing webpage• Targeting different brands

• Fresh and up-to-date

- PhishTank (https://www.phishtank.com/)

- OpenPhish (https://openphish.com) 15

https://www.phishtank.com/

https://openphish.com/

Phishing webpage validity

Analysis of 23,118 phishing pages (source Phishtank)• 59% valid (13,646)

• 41% invalid (9,472) - Content unavailable

- Domain parking

- Legitimate webpage

Phishing data requires sanitization• Scrape and save webpages of fresh phishes

• Sanitization- Screenshot analysis

- Google search with keywords

- Later visit of URL

16

Dataset usage

Follow realistic use cases• Train model with oldest data & test with newest data - No cross-validation to get accuracy metrics

• Larger testing set than training set → scalability• Use real-world distribution: 1 phish / 100 legitimate pages → relevant accuracy metrics

17

Accuracy metrics

• Phishing detection capability True positive rate !"# = %&%&'()

• Erroneous phishing warnings False positive rate *"# = (&%)'(&

• Correctness of phishing warnings Precision "+,-./.01 = %&%&'(&

18

Positive (P) = identified as phish Negative (N) = identified as benign

True positive (TP) = detected phish False positive (FP) = benign detected as phish

False negative (FN) = missed phish True negative (TN) = benign identified benign

Temporal resilience

Ensure steadiness of effectiveness over time

Longitudinal study: readiness for deployment• Data collection over extended period of time• Recompute accuracy metrics- Steady accuracy without retraining → ready for deployment / low maintenance cost- Steady accuracy with retraining → ready for deployment / maintenance cost depends on retraining period- Decrease in accuracy with retraining → not ready for deployment

Resilience to adversaries• Security assessment using adversarial machine learning attacks• Evaluate manipulability of features

19

Conclusion

Recommendations• Design of detection method• Evaluation- Ground truth selection- Assessment methodology

Goals for research in phishing detection• Relevant accuracy results + easy comparison• More impactful research → technology transfer

20

On Designing and Evaluating Phishing Webpage Detection Techniques for the Real WorldSamuel Marchal, N. AsokanAalto University, Finland

[email protected]

on designing and evaluating phishing webpage detection ... · phishing activity trends report 1st...

Documents