on designing and evaluating phishing webpage detection ... · phishing activity trends report 1st...
TRANSCRIPT
On Designing and Evaluating Phishing Webpage Detection Techniques for the Real WorldSamuel Marchal, N. AsokanAalto University, Finland
Phishing webpage
2
Phishing webpage (phish) Legitimate webpage
State of research on phishing detection
• Threat known since late 1990s• First protection technique[1] early 2000s
• > 4,000 articles on “phishing”- Half as popular as "malware”
• Many solutions report high accuracy- Cantina[2] (2007): 97%- Whittaker et al.[3] (2010): 99.9%- Off-the-Hook[4] (2017): 99.9%
3
[1] Herzberg and Gbara, ”Trustbar: Protecting (even naive) web users from spoofing and phishing attacks” in Cryptology ePrint Archive, 2004.[2] Zhang et al., ”CANTINA: A content-based approach to detecting phishing web sites” in WWW, 2007.[3] Whittaker et al., ”Large-scale automatic classification of phishing pages” in NDSS Symposium, 2010.[4] Marchal et al., ”Off-the-hook: An efficient and usable client-side phishing prevention application” in IEEE Transactions on Computers 66, 10, 2017.
State of phishing threat
Monetary damage:• 2013-2016: $1.6 billion loss for businesses (US only)• Most expensive attack (2015): $100 million cost (US defense department)
4
Global Phishing Survey 2016: Trends and Domain Name Use
An APWG Industry Advisory http://www.apwg.org ● [email protected] ● 25 April 2017
PMB 246, 405 Waltham Street, Lexington MA USA 02421
6
deceptive messages in the translated domains names. Please see page 25 for more detail.
Basic Statistics
2012 2013 2014 2015 2016
Domain names used for phishing 153,952 135,848 183,222 160,155 195,475
Attacks 216,938 188,323 247,713 227,471 255,065 TLDs used 204 223 288 355 454 new gTLDs used 0 0 72 120 228 IP-based phish (unique IPs) 4,899 4,366 6,472 2,245 5,378
Maliciously registered domains 13,545 35,004 49,932 34,102 95,424
0
50,000
100,000
150,000
200,000
250,000
300,000
2012 2013 2014 2015 2016
Phishing Attacks and Domains Used, 2012-2016
Attacks
Domainsused forPhishingMaliciouslyRegisteredDomains
Phishing Activity Trends Report
1st Quarter 2018 w w w . a p w g . o r g • i n f o @ a p w g . o r g
4
Phishing Activity Trends Report, 1st Quarter 2018
�
The total number of phish detected in 1Q 2018 was 263,538. This was up 46 percent from the 180,577 observed in 4Q
2017. It was also significantly more than the 190,942 seen in 3Q 2017.
0
20,000
40,000
60,000
80,000
100,000
120,000
Oct 2017 Nov 2017 Dec 2017 Jan 2018 Feb 2018 Mar 2018
Unique Phishing Sites Detected,4Q2017-1Q2018
The number of unique phishing reports submitted to APWG during 1Q 2018 was 262,704, compared to 233,613 in 4Q
2017 and 296,208 in 3Q 2017.
Phishing Site and Phishing E-mail Trends – 1st Quarter 2018
Phishing attacks 2012-2016 Phishing websites 2017-2018
Source: Anti Phishing Worgin Group (APWG).
Detection of phishing webpages
Gap between • High accuracy reported in literature• Low effectiveness when applied to the real-world
What goes wrong during design & evaluation?• Design choices only driven by high detection accuracy level• Evaluation not representative of the real-world
5
Effective phishing detection
Requirements for effectiveness• Detection performance• Temporal resilience• Deployability• Usability
Recommendations• Design of detection method• Evaluation- Ground truth selection- Assessment methodology
6
ML-based phishing webpage detection
Machine learning based phishing detection
8
Webpage Feature extraction Model Prediction
Phish
Legit
Phishing detector training
9
Ground truth collection Feature extraction Model learning
Phishing detectionmodel
Phish
Legit
Design of detection method
System design
11
Centralized solution currently favored by industry….….but increasing privacy concerns may change the game.
Centralized Client-side
Pros • High computational power• Easy model updates • Confidentiality of detection model
• User privacy• Fast decision• Website data availability
Cons • Delay in decision• Impacts user privacy (browsing history)
• Degrades client device performance• Lack of model confidentiality
Evaluation
Evaluation setup
13
Trainingset
Leg Phish
Test set
Model learning Phishing detectionmodel
Prediction
Phish
Legit
Ground truth collection
Accuracymetrics
Ground truth selection
Improve relevance of accuracy results • Validity• Generalizability• Reproducibility
Ground truth • Validity of labels• Representativeness• Availability
14
Phishing
Benign
Webpage selection
Generic guidelines• Multi-lingual + different alphabet
• Publicly available sources (≠ static dataset)
Legitimate webpage• Diverse popularity
• Real URLs: as browsed
- www.amazon.com ≠ https://www.amazon.com/gp/cart/view.html?ref=nav_cart
Phishing webpage• Targeting different brands
• Fresh and up-to-date
- PhishTank (https://www.phishtank.com/)
- OpenPhish (https://openphish.com) 15
Phishing webpage validity
Analysis of 23,118 phishing pages (source Phishtank)• 59% valid (13,646)
• 41% invalid (9,472) - Content unavailable
- Domain parking
- Legitimate webpage
Phishing data requires sanitization• Scrape and save webpages of fresh phishes
• Sanitization- Screenshot analysis
- Google search with keywords
- Later visit of URL
16
Dataset usage
Follow realistic use cases• Train model with oldest data & test with newest data - No cross-validation to get accuracy metrics
• Larger testing set than training set → scalability• Use real-world distribution: 1 phish / 100 legitimate pages → relevant accuracy metrics
17
Accuracy metrics
• Phishing detection capability True positive rate !"# = %&%&'()
• Erroneous phishing warnings False positive rate *"# = (&%)'(&
• Correctness of phishing warnings Precision "+,-./.01 = %&%&'(&
18
Positive (P) = identified as phish Negative (N) = identified as benign
True positive (TP) = detected phish False positive (FP) = benign detected as phish
False negative (FN) = missed phish True negative (TN) = benign identified benign
Temporal resilience
Ensure steadiness of effectiveness over time
Longitudinal study: readiness for deployment• Data collection over extended period of time• Recompute accuracy metrics- Steady accuracy without retraining → ready for deployment / low maintenance cost- Steady accuracy with retraining → ready for deployment / maintenance cost depends on retraining period- Decrease in accuracy with retraining → not ready for deployment
Resilience to adversaries• Security assessment using adversarial machine learning attacks• Evaluate manipulability of features
19
Conclusion
Recommendations• Design of detection method• Evaluation- Ground truth selection- Assessment methodology
Goals for research in phishing detection• Relevant accuracy results + easy comparison• More impactful research → technology transfer
20
On Designing and Evaluating Phishing Webpage Detection Techniques for the Real WorldSamuel Marchal, N. AsokanAalto University, Finland