Phishing Webpage Detection

Jau-Yuan Chen

COMS E6125 WHIM
March 24, 2009

What is “Phishing”?

• Source: "Phishing Activity Trends Report," APWG, December 2008
• APWG: Anti-Phishing Working Group
• Definition
– Phishing is a criminal mechanism employing both social engineering and technical subterfuge to steal consumers’ personal identity data and financial account credentials.
– Social engineering schemes use spoofed e-mails purporting to be from legitimate businesses and agencies to lead consumers to counterfeit websites designed to trick recipients into divulging financial data such as usernames and passwords.
– Technical subterfuge schemes plant crimeware onto PCs to steal credentials directly, often using systems to intercept consumers’ online account user names and passwords, and to corrupt local navigational infrastructures to misdirect consumers to counterfeit websites (or authentic websites through phisher-controlled proxies used to monitor and intercept consumers’ keystrokes).

Severity of the “Phishing” Problem

• The number of crimeware-spreading sites infecting PCs with password-stealing crimeware reached an all-time high of 31,173 in December 2008.
• Unique phishing reports submitted to APWG reached a yearly high of 34,758 in December 2008.
• In 2007 (a survey by Gartner, Inc.)
– more than $3.2 billion was lost to phishing attacks in the US
– 3.6 million adults lost money in phishing attacks

WHY PHISHING PAGE DETECTION?

eBay?

It’s difficult to distinguish these pages!

Most Targeted Industry

Current Anti-phishing Solutions

• Text-based page analysis
– URL analysis
– HTML parsing
– keyword extraction
• However, phishers can easily avoid detection by using non-HTML components, such as
– images
– Flash
– ActiveX, etc.
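The weakness of text-based analysis can be illustrated with a minimal sketch built on Python's standard `html.parser`. The keyword list, the two sample pages, and the function name `visible_keywords` are assumptions made up for this example and do not come from any real detector. The point is only that a page rendering its login form as an image exposes no text to inspect.

```python
from html.parser import HTMLParser

# Hypothetical keyword list; a real detector would use a much richer one.
SUSPICIOUS = {"password", "account", "verify", "login"}

class TextCollector(HTMLParser):
    """Collects the words found in the text nodes of an HTML document."""
    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.lower().split())

def visible_keywords(html):
    """Return the suspicious keywords found in the page text."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    cleaned = {word.strip(".,:;!?") for word in collector.words}
    return cleaned & SUSPICIOUS

# Two made-up pages: one with text, one that shows the same form as an image.
text_page = "<html><body><p>Please verify your account password.</p></body></html>"
image_page = '<html><body><img src="login_form.png"></body></html>'

print(sorted(visible_keywords(text_page)))   # ['account', 'password', 'verify']
print(sorted(visible_keywords(image_page)))  # []
```

The second page would look the same to a user but yields nothing for a keyword-based check, which is the motivation for comparing rendered pages instead.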

Image-based Anti-phishing Scheme

Focus on "what you see", not "how the page is composed"!

J.-Y. Chen and K.-T. Chen, “A Robust Local Feature-based Scheme for Phishing Page Detection and Discrimination,” Web 2.0 Trust 2008.

K.-T. Chen, J.-Y. Chen, C.-R. Huang, and C.-S. Chen, “Fighting Phishing with Discriminative Keypoint Features of Webpages,” IEEE Internet Computing, to appear.

Page Matching

Image-based: Page Matching → Page Scoring → Page Classification

Page Scoring

Figure labels: effective grids; a successful match

Page Classification

• naïve Bayesian classifier with 10-fold cross-validation
• training data
– a pre-stored phishing page set & a legitimate page set
– phishing page set (positive data set)
  • comparisons between phishing pages and their target pages
– legitimate page set (negative data set)
  • comparisons between legitimate pages of different sites
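The classification step can be sketched as follows, using only the standard library. This is an illustration under stated assumptions, not the authors' implementation: the slides do not say which likelihood model the naïve Bayesian classifier uses, so a Gaussian one is assumed here, each comparison is reduced to a single similarity score, and the training data is synthetic. The names `fit_gaussian_nb`, `predict`, and `cross_validate` are hypothetical.

```python
import math
import random

def fit_gaussian_nb(samples):
    """samples: list of (features, label). Returns {label: (prior, means, variances)}."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    model = {}
    for label, rows in by_label.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-9  # small floor avoids division by zero
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(samples), means, variances)
    return model

def predict(model, features):
    """Return the label with the highest log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)

def cross_validate(samples, folds=10, seed=0):
    """Mean accuracy over `folds` disjoint held-out folds."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    accuracies = []
    for i in range(folds):
        held_out = shuffled[i::folds]
        train = [s for j, s in enumerate(shuffled) if j % folds != i]
        model = fit_gaussian_nb(train)
        correct = sum(predict(model, f) == y for f, y in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / folds

# Synthetic data (assumption): one similarity score per page comparison.
rng = random.Random(42)
positives = [([rng.gauss(0.8, 0.05)], 1) for _ in range(100)]  # phishing page vs. its target
negatives = [([rng.gauss(0.2, 0.05)], 0) for _ in range(100)]  # legitimate pages of different sites
accuracy = cross_validate(positives + negatives)
print(f"10-fold accuracy: {accuracy:.3f}")
```

With scores this well separated the accuracy should be close to 1.0; real comparison scores would overlap more, which is where the held-out folds become informative.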

PERFORMANCE EVALUATION

Data description

• phishing pages: 2,058 pages on 74 sites
– source: http://www.phishtank.com, http://www.antiphishing.org
– records of the top 5 phishing target sites make up more than half of our records
• potential target pages: 300 vulnerable pages
– source: http://www.ciphertrust.com/resources/statistics/
• pre-stored data set
– positive: 2,058 comparisons
– negative: 44,000 comparisons

Domain              Number of Records
eBay                701
PayPal              632
Marshall & Ilsley   138
Charter One         116
Bank of America     51
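As a quick arithmetic check of the "more than half" statement, the record counts listed above can be summed and compared with the total number of phishing pages. The variable names are chosen for this example only; all figures are copied from the slide.

```python
# Record counts copied from the data description above.
top5_records = {
    "eBay": 701,
    "PayPal": 632,
    "Marshall & Ilsley": 138,
    "Charter One": 116,
    "Bank of America": 51,
}
total_phishing_pages = 2058
negative_comparisons = 44000

top5_total = sum(top5_records.values())
top5_share = top5_total / total_phishing_pages
print(top5_total, f"{top5_share:.1%}")  # 1638 79.6%

# Ratio of negative to positive comparisons in the pre-stored data set.
imbalance = negative_comparisons / total_phishing_pages
print(f"{imbalance:.1f}")  # 21.4
```

The five listed domains account for roughly four fifths of the records, and the negative set is about 21 times larger than the positive set.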

Earth Mover’s Distance (EMD) based Scheme

• Fu et al., IEEE Trans. on Dependable & Secure Computing, 2006
• the 1st image-based phishing detection approach
• evaluates the distance between two signatures
• Signature (S)
– the frequency and the centroid of each color used
• Weight (p, q)
– a linear combination of the Euclidean color distance and the color centroid distance
• Visual similarity degree (VSD)
– VSD = 1 − (EMD)^α
• pros: simple and fast
• cons: only suitable for basic phishing cases
– it tends to fail if phishing pages and the official ones are only partially similar
– however, phishing pages are usually partially different from their targets!
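The signature and the VSD formula can be sketched as below. This is a rough illustration, not Fu et al.'s implementation: the way colors are degraded (integer division of each channel), the function names, and the tiny sample image are assumptions, and the EMD itself, which requires solving a transportation problem, is not computed here. The default values follow the settings quoted elsewhere in these slides (color degrading factor 32, 20 signature colors, α = 0.5).

```python
from collections import defaultdict

def color_signature(pixels, degrade=32, max_colors=20):
    """pixels: 2-D list of (r, g, b) tuples.
    Returns [(color, frequency, centroid)] for the most frequent degraded colors."""
    counts = defaultdict(int)
    sums = defaultdict(lambda: [0.0, 0.0])
    total = 0
    for y, row in enumerate(pixels):
        for x, (r, g, b) in enumerate(row):
            # Assumed degrading rule: coarse bins per channel.
            color = (r // degrade, g // degrade, b // degrade)
            counts[color] += 1
            sums[color][0] += x
            sums[color][1] += y
            total += 1
    signature = []
    for color, n in counts.items():
        centroid = (sums[color][0] / n, sums[color][1] / n)
        signature.append((color, n / total, centroid))
    signature.sort(key=lambda entry: entry[1], reverse=True)
    return signature[:max_colors]

def visual_similarity_degree(emd, alpha=0.5):
    """VSD = 1 - EMD^alpha, assuming the EMD is already normalized to [0, 1]."""
    if not 0.0 <= emd <= 1.0:
        raise ValueError("expected a normalized EMD in [0, 1]")
    return 1.0 - emd ** alpha

# Tiny 2x2 "page": three white pixels and one black pixel.
page = [[(255, 255, 255), (255, 255, 255)],
        [(255, 255, 255), (0, 0, 0)]]
for color, frequency, centroid in color_signature(page):
    print(color, frequency, centroid)

print(visual_similarity_degree(0.04))  # approximately 0.8
```

Because the signature summarizes the whole image by its dominant colors and where they sit, two pages that differ in only one region can still produce noticeably different signatures, which is consistent with the limitation listed above.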

Parameter Settings

• CCH settings
– levels to describe salient points (L) = 4
– Euclidean distance between two salient points (Dist) = 7 pixels
– input image size: original webpage resolution (mostly 800 × 600)
– k-means parameter (k) = 4
– naïve Bayesian classifier
• EMD settings
– we follow the suggestions in Fu et al.'s previous work
– input image size: 100 × 100 (Lanczos3 resampling algorithm)
– color degrading factor (CDF): 32
– amplifier for the EMD value (α): 0.5
– number of colors used for the signature (|Ss|): 20
– weight for the color distance (p): 0.5
– weight for the color centroid distance (q): 0.5
– a naïve Bayesian classifier is used instead of a per-page threshold

• Top 5 Phishing Target Sites
– AUC
  • CCH: 0.998
  • EMD: 0.956
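For readers unfamiliar with the metric, AUC (area under the ROC curve) equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the scores are made up for illustration and are unrelated to the CCH and EMD results reported here.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Made-up similarity scores, for illustration only.
phishing_scores = [0.9, 0.8, 0.7, 0.4]
legitimate_scores = [0.5, 0.3, 0.2, 0.1]
print(auc(phishing_scores, legitimate_scores))  # 0.9375
```

An AUC of 1.0 means every phishing comparison outranks every legitimate one, so a value of 0.998 leaves very few misordered pairs.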

• Impact of Image Size on Computation Time

Conclusions

• We proposed an image-based phishing detection technique with local features.
• Our experimental results show
– a phishing recognition success rate of over 96%, and
– less than 0.30 second per phishing identification on average.
• Our experiments show that local features are more suitable than global information for phishing page detection.

THANK YOU!