PhishDef: URL Names Say It All
Anh Le, Athina Markopoulou
University of California, Irvine, USA
Michalis Faloutsos
University of California, Riverside, USA
What is Phishing?
Anh Le - UC Irvine - PhishDef 2
• Social engineering and technical means to steal consumers’ personal identity, data, etc.
• Causes billions of dollars in losses annually
Most Targeted Industry Sectors, 2nd Quarter 2010 (source: Antiphishing.org)
• Financial: 33.1%
• Payment Services: 37.9%
• Classifieds: 6.6%
• Auction: 5.5%
• Gaming: 4.6%
• Retail/Service: 3.6%
• Social Networking: 2.8%
• Government: 1.3%
• ISP: 1.2%
• Other: 3.4%
Example of a Phishing Site
Current Protection
• Google Safe Browsing
• Microsoft SmartScreen
• Third-Party
Current Protection Model
Motivation: blacklist-based protection (e.g., Google Safe Browsing) is reactive, so it cannot protect against zero-day phishing.
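To make the "reactive" point concrete, here is a toy local blacklist, loosely modeled on the hash-prefix lists that Safe Browsing clients keep. It is a simplified sketch, not the Safe Browsing protocol: the 4-byte SHA-256 prefixes, the naive lowercase normalization, and the names `LocalBlacklist` and `url_prefix` are assumptions for illustration. A URL is flagged only if someone reported it earlier, so a freshly created zero-day phishing URL passes unflagged.

```python
import hashlib

PREFIX_BYTES = 4  # assumption: keep short hash prefixes so the local list stays small


def url_prefix(url: str) -> bytes:
    """SHA-256 prefix of a naively normalized URL (hypothetical normalization;
    real services canonicalize URLs far more carefully)."""
    normalized = url.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).digest()[:PREFIX_BYTES]


class LocalBlacklist:
    """Toy client-side blacklist: it can only flag URLs that were reported before."""

    def __init__(self, reported_urls):
        self.prefixes = {url_prefix(u) for u in reported_urls}

    def is_listed(self, url: str) -> bool:
        return url_prefix(url) in self.prefixes


blacklist = LocalBlacklist(["http://known-phish.example/login.htm"])
print(blacklist.is_listed("HTTP://known-phish.example/login.htm"))  # reported earlier
print(blacklist.is_listed("http://zero-day.example/login.htm"))     # new, never reported
```

The first lookup matches because the URL was on the list; the second misses, which is exactly the gap a classifier is meant to close.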
Outline
o Phishing Background
o Motivation
o Our Proposal
  o New Protection Model
  o Learning Algorithms
  o Dataset
  o Feature Selection
  o Evaluation Results
o Concluding Remarks
Our Proposed Protection Model
• Main challenges: accuracy and classification latency
• Which classification algorithm works best?
• Which set of features works best?
Prior Work
o Whittaker et al. [NDSS ’10]
  o Google Safe Browsing
o Ma et al. [SIGKDD ’09]
  o Batch-based classification
o Ma et al. [ICML ’09]
  o Batch-based vs. online learning
All of the above perform server-side classification.
Main Contributions
o New protection model:
  o Client-side classification
o Propose using Adaptive Regularization of Weights (AROW)
  o High accuracy
  o Resilient to noise
o Set of lexical features
  o Fast to extract at the client side
  o Obfuscation resistant
Machine Learning Algorithms
• Batch-based Support Vector Machine (SVM)
• Online Perceptron
• Confidence-Weighted (CW) [Dredze et al., ICML 2008]
• Adaptive Regularization of Weights (AROW) [Crammer et al., NIPS 2009]
Online Classification
• Maintain a weight vector and use it for classification
• Online Perceptron
Server side: weight vector trained beforehand
Client side: URL features extracted in real time
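A minimal sketch of this split, assuming sparse dict features, labels +1 (phishing) and -1 (benign), and the classic perceptron update; the feature names and the helpers `score`, `predict`, and `perceptron_update` are hypothetical. The weight vector is learned once on the server and only read on the client, so classifying a URL costs a single sparse dot product.

```python
from typing import Dict

SparseVec = Dict[str, float]  # feature name -> value (e.g., lexical URL tokens)


def score(w: SparseVec, x: SparseVec) -> float:
    """Dot product of the weight vector with a sparse feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def predict(w: SparseVec, x: SparseVec) -> int:
    """+1 = phishing, -1 = benign (label convention assumed here)."""
    return 1 if score(w, x) > 0 else -1


def perceptron_update(w: SparseVec, x: SparseVec, y: int) -> None:
    """Perceptron rule: on a mistake (or zero margin), move w toward y * x."""
    if y * score(w, x) <= 0:
        for f, v in x.items():
            w[f] = w.get(f, 0.0) + y * v


# Server side: train the weight vector beforehand on labeled URLs.
w: SparseVec = {}
training = [
    ({"tld=ru": 1.0, "tok=taxrefund": 1.0}, 1),
    ({"tld=uk": 1.0, "tok=income": 1.0}, -1),
]
for _ in range(3):  # a few passes over the toy data
    for x, y in training:
        perceptron_update(w, x, y)

# Client side: extract features in real time and apply the fixed weights.
print(predict(w, {"tld=ru": 1.0, "tok=refund": 1.0}))  # prints 1
```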
Online Classification
• Confidence-Weighted (CW): minimum change, just enough to correct the last mistake
• Adaptive Regularization of Weights (AROW): minimum change, plus a penalty for mistakes and a term for increasing confidence
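These labels correspond to the AROW objective of Crammer et al. (NIPS 2009), which, as we understand it, chooses a new Gaussian $\mathcal{N}(\mu,\Sigma)$ over weight vectors by minimizing

$$\mathcal{C}(\mu,\Sigma) = D_{\mathrm{KL}}\big(\mathcal{N}(\mu,\Sigma)\,\|\,\mathcal{N}(\mu_{t-1},\Sigma_{t-1})\big) + \lambda_1\,\ell_{h^2}(y_t,\mu\cdot x_t) + \lambda_2\,x_t^\top\Sigma\,x_t,$$

where the KL term is the "minimum change", the squared hinge loss $\ell_{h^2}$ is the "penalty for mistake", and $x_t^\top\Sigma x_t$ rewards shrinking variance ("increasing confidence"). The sketch below implements the resulting closed-form update under stated assumptions: a diagonal covariance (a common approximation for high-dimensional sparse features), dict features, labels +1 (phishing) / -1 (benign), and a hypothetical default `r = 1.0`. The class name `DiagonalAROW` is ours, not PhishDef's code.

```python
from typing import Dict

SparseVec = Dict[str, float]


class DiagonalAROW:
    """AROW with a diagonal covariance; r > 0 is the regularization parameter."""

    def __init__(self, r: float = 1.0):
        self.r = r
        self.mu: SparseVec = {}     # mean weight vector (defaults to 0.0)
        self.sigma: SparseVec = {}  # per-feature variance (defaults to 1.0)

    def margin(self, x: SparseVec) -> float:
        return sum(self.mu.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x: SparseVec) -> int:
        return 1 if self.margin(x) > 0 else -1

    def update(self, x: SparseVec, y: int) -> None:
        m = self.margin(x)
        if y * m >= 1.0:
            return  # correct with margin >= 1: hinge loss is zero, no change
        # Confidence term v = x^T Sigma x, computed before any variance changes.
        v = sum(self.sigma.get(f, 1.0) * val * val for f, val in x.items())
        beta = 1.0 / (v + self.r)
        alpha = (1.0 - y * m) * beta  # hinge loss is positive here, so no max() needed
        for f, val in x.items():
            s = self.sigma.get(f, 1.0)
            # Mean moves toward y * x, scaled by the feature's current variance.
            self.mu[f] = self.mu.get(f, 0.0) + alpha * y * s * val
            # Variance of observed features shrinks: confidence increases.
            self.sigma[f] = s - beta * s * s * val * val


model = DiagonalAROW(r=1.0)
model.update({"tld=ru": 1.0}, +1)  # phishing example
model.update({"tld=uk": 1.0}, -1)  # benign example
print(model.predict({"tld=ru": 1.0}), model.sigma["tld=ru"])  # prints: 1 0.5
```

Because each update is scaled by the current variance, features seen many times move less, which is one intuition for why AROW tolerates noisy labels better than an aggressive rule that always fixes the last mistake.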
Dataset
o Phishing URLs
  o PhishTank (4,082)
  o MalwarePatrol (2,001)
o Benign URLs
  o Open Directory (4,012)
  o Yahoo directory (4,143)
o Time period: June 2010
Feature Selection
o Lexical features
o External features
  o Country, AS number, registration date, registrant, registrar, etc.
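The slides name lexical features without listing them. As a hedged illustration only, the sketch below derives bag-of-tokens features from the URL string alone, in the spirit of the lexical features used in the Ma et al. work cited earlier; the delimiter set, the feature names, and the two count features are assumptions, not PhishDef's exact feature set. Nothing here needs a network round trip, unlike the external (WHOIS, AS number) features.

```python
import re
from urllib.parse import urlsplit

# Hypothetical delimiter set for splitting URL parts into tokens.
DELIMS = re.compile(r"[/?.=&_:@-]+")


def lexical_features(url: str) -> dict:
    """Sparse lexical features computed from the URL string only
    (no DNS, WHOIS, or other remote lookups)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    rest = parts.path + ("?" + parts.query if parts.query else "")
    feats = {}
    for tok in DELIMS.split(host):
        if tok:
            feats["host_tok=" + tok] = 1.0
    for tok in DELIMS.split(rest):
        if tok:
            feats["path_tok=" + tok] = 1.0
    # Simple counts; a real system would likely bin or normalize these.
    feats["url_len"] = float(len(url))
    feats["host_dots"] = float(host.count("."))
    return feats


print(sorted(lexical_features("http://pilety.ru/image/taxrefund.htm?id=7")))
```

Keeping hostname and path tokens in separate namespaces lets a linear model learn, for example, that "paypal" in the path of an unrelated host is suspicious while the same token in the hostname is not.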
Evaluation Results: Lexical vs. Full Features
Lexical features alone are better suited than full features for client-side phishing classification. Compared with lexical features alone, full features give:
(+) ~1% higher accuracy
(-) dependency on a remote server
(-) average latency of 1.64 s
Evaluation Results: CW vs. AROW
AROW is more resilient to noise than CW
Conclusion: PhishDef
o Client-side phishing classification system
  o Proactive, on-the-fly classification of zero-day phishing URLs
o Low client-side delay (milliseconds), high accuracy (97%)
o Resilient to noisy data
o Future work:
  o Develop an add-on for Firefox
Questions
Example of a Phishing Site
Legitimate: http://www.hmrc.gov.uk/intro-income-tax.htm
Phishing: http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm
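The two URLs differ in ways visible from the string alone. The sketch below computes a few such cues; the cue names, the tokenizing regex, and the 32-character threshold for a "hex run" (e.g., an embedded hash) are hypothetical choices for illustration, not features confirmed by the slides. On these inputs the phishing URL has a foreign TLD, a deeper path, and a 40-character hexadecimal token, while the legitimate URL's longest token is six characters.

```python
import re
from urllib.parse import urlsplit

HEX_RUN = re.compile(r"[0-9a-f]{32,}")  # assumed threshold for long hex strings
TOKEN_SPLIT = re.compile(r"[/?.=&_-]+")


def url_cues(url: str) -> dict:
    """A few hand-crafted lexical cues computed from the URL string only."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    tokens = [t for t in TOKEN_SPLIT.split(parts.path) if t]
    segments = [s for s in parts.path.split("/") if s]
    return {
        "tld": host.rsplit(".", 1)[-1],
        "path_depth": len(segments),
        "longest_token": max((len(t) for t in tokens), default=0),
        "has_hex_run": bool(HEX_RUN.search(parts.path.lower())),
    }


legit = url_cues("http://www.hmrc.gov.uk/intro-income-tax.htm")
phish = url_cues(
    "http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm"
)
print(legit)
print(phish)
```

Note that the phishing path contains "taxrefund" to look plausible, which is why bag-of-token features alone are usually combined with structural cues like these.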
Evaluation Results: Batch-Based vs. Online Learning
Online learning outperforms batch-based learning for phishing classification
Chrome 11 > Firefox 4