malware bek slides 20131023 final

MALWARE 2013, October 22-24, 2013

It's you on photo? Automatic Detection of Twitter Accounts Infected With the Blackhole Exploit Kit

Josh White and Jeanna Matthews, Clarkson University

Objective

This work identifies some indicators of possible BEK infectious messages on Twitter.

These indicators are used to build a filter for our collection system. The filter identifies Twitter user accounts that have reached specific thresholds and can be considered compromised or purposefully infectious.

Overview

• BEK Details
• Data Collection
• Analysis Framework
• Metrics
• Results
– Infectious Message Variations
– API Usage
– Infectious Indicators

Blackhole Exploit Kit (BEK)

Web-based application that manages the installation and C2 of malware.

Utilizes a compromised server for malware and web-page hosting.

Links luring victims to a compromised server are distributed mainly through spam, spear-phishing, and links in social network posts.

Infection

The exploit server hosts an innocuous-looking web page.

The page hosts a tool for scanning the visiting system.

Once a vulnerability is identified, it loads the necessary exploit tools and compromises the visiting system.

A wide variety of malware may be loaded at this point, depending on the exact mission of the attacker.

Other BEK features

Contains modular capabilities for new exploits to be added rapidly and in many languages.

Employs typical countermeasures: packing, binary obfuscation, and antivirus avoidance.

Proliferation of BEK

In 2012, the BEK creators released version 2.0; since then it has become one of the most well-known and commonly deployed exploit kits.

BEK enabled the majority of malware infections in 2012.

One study found that BEK accounted for 29% of all malicious URLs in a dataset of 77,000 URLs marked harmful by the Google Safe Browsing API.

Data Collection Overview

Over the course of 2012 we collected 165 TB of Twitter Data (Uncompressed)

175 Days Collected, 147 Full Days

Estimated 45 Billion Tweets

Recently released estimates place total Twitter traffic at 175 million tweets per day in 2012.

Our daily collection rates varied between 50% and 80% of total Twitter traffic.

We captured complete tweet data in JSON format using Twitter's REST API.

Key Examples of Attributes in JSON format

profile link color/background color/title/default image/image url (http and https)/text color/default description, background image url (http and https), in reply to screen name/status id and str/user id, follow request sent, friends count, screen name, show all inline media, utc offset, url, created at, favorite, retweet count, favorites count, is translator, truncated, contributors enabled, contributors, time zone, verified, coordinates, geo, text, entities, id, id str, following, application, retweeted, place, sidebar border color/fill color, followers count, geo enabled, listed count, notifications, name, lang, location, protected, statuses count

Data Collection System

Distributed Data Collection Infrastructure

Geographically dissimilar IPs to simulate multiple systems

Registered Application with Non-authenticated API access (1 billion+ / week)

Data Storage

Collection in Streaming Gzip Python Dictionary Format (10:1 Average Compression Ratio)

Storing 1.5 TB a Week

Converted to JSON on the fly when needed

Initially Stored in HDFS (Had Issues Scaling); Now Use
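As a rough illustration of the storage format described on this slide, the sketch below reads a gzip-compressed file of Python dictionary literals and converts records to JSON only when asked. The one-record-per-line layout and the function names `iter_tweets` and `to_json` are assumptions made for this example; the slides do not describe the actual file layout or tooling.

```python
import ast
import gzip
import json
from typing import Iterator


def iter_tweets(path: str) -> Iterator[dict]:
    """Yield one tweet dict per non-empty line of a gzip-compressed file.

    Assumes (hypothetically) that each line holds the repr() of one dict.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            # literal_eval parses Python literals without executing code.
            record = ast.literal_eval(line)
            if isinstance(record, dict):
                yield record


def to_json(record: dict) -> str:
    """Convert a parsed record to a JSON string on demand."""
    return json.dumps(record, ensure_ascii=False)
```

Keeping the records compressed and converting lazily means only the tweets an analysis actually touches pay the conversion cost.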

High Level Patterns

• Basic observable patterns
– Twitter has a lot of outages
– Posting rates follow predictable patterns

Analysis Framework

Filter Analysis (on live stream)

Experimental Analysis (after the fact)

Some Key Metrics

Entropy

Pearson’s Correlation

Initial Identification

Searched for two well-publicized strings seen in the wild: “It's you on Photo?” and “It's about you?”

Other message types found

Others found using REGEX, but not mentioned in any articles or blogs at the time:
– “You were nude at party) cool photo)”
– “Wow! Your photo is cool.”
– “At party you was drunken) cool photo)”
– “Your photo is amazing”
– “It's photo of you?”
– “It's all about you”
– “It's about you?”
– “Wow! You look good)”

Because BEK allows message customization, the permutations are virtually limitless.
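A regular-expression search over the message variants listed on these slides might look like the sketch below. The patterns in `LURE_PATTERNS` are hypothetical: they were written for this example to cover the quoted strings and are not the expressions used in the original work.

```python
import re

# Hypothetical patterns covering the lure strings quoted on the slides.
LURE_PATTERNS = [
    r"it's\s+(?:you\s+on\s+photo|photo\s+of\s+you|(?:all\s+)?about\s+you)",
    r"your\s+photo\s+is\s+(?:cool|amazing)",
    r"you\s+(?:were\s+nude|was\s+drunken)\b.*\bphoto\)",
    r"wow!\s+you\s+look\s+good\)",
]
LURE_RE = re.compile("|".join(f"(?:{p})" for p in LURE_PATTERNS), re.IGNORECASE)


def looks_like_lure(text: str) -> bool:
    """Return True if the tweet text contains one of the known lure phrasings."""
    # Normalize typographic apostrophes so "It’s" and "It's" match alike.
    normalized = text.replace("\u2019", "'")
    return LURE_RE.search(normalized) is not None
```

Because the kit lets operators customize the message, a fixed pattern list like this can only catch variants that have already been observed.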

Results: Entropy

Normal Tweets = 4.6-7.5

197,237 manually verified messages from 100 sample accounts

Infectious = 4.3 & lower

[Figure: entropy plot with series labeled Normal and Infectious]
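For readers who want to reproduce the entropy metric, a minimal sketch follows. It assumes the values on this slide are Shannon entropy in bits, computed over the characters of the tweet text; the slides do not state the unit of analysis, so treat that choice, and the names `shannon_entropy` and `low_entropy`, as assumptions.

```python
import math
from collections import Counter

INFECTIOUS_MAX = 4.3  # upper bound for infectious tweets reported on the slide


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_entropy(text: str) -> bool:
    """True if the text falls at or below the reported infectious threshold."""
    return shannon_entropy(text) <= INFECTIOUS_MAX
```

Short, repetitive lure messages use few distinct characters, which is one plausible reason they would score lower than ordinary tweets.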

Results: Pearson’s Correlation Coefficient

Two Infectious Accounts Compared have an average PCC value of 0.927581013955

Positive value near 1 indicates strong correlation (“similarity”) between accounts

One Infectious Account and One Non-Infectious Account Compared have an average PCC value of -0.0847935420003

Value near 0 indicates little or no linear correlation (“difference”) between accounts
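The Pearson correlation coefficient itself is standard and can be sketched as below. What is not stated on the slides is which per-account vectors were compared, so the `char_profile` helper, which turns an account's messages into character counts over a fixed alphabet, is purely a hypothetical feature choice for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def char_profile(messages: Iterable[str], alphabet: str) -> List[int]:
    """Hypothetical feature vector: character counts over a fixed alphabet."""
    counts = Counter(ch for m in messages for ch in m.lower())
    return [counts[ch] for ch in alphabet]
```

With this helper, two accounts could be compared with `pearson(char_profile(a_msgs, alphabet), char_profile(b_msgs, alphabet))`; values near 1 indicate similar profiles and values near 0 indicate no linear relationship.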

Results: Use of API/Application

Applications must be registered for specific Twitter API function usage

There are hundreds of registered applications, e.g. “iPhone”, “Android”, “Official Twitter Client”

“Mobile Web” is legacy

Requires no registration of the application using it

Requires no OAuth

Less CPU/Memory utilization

Results: Graph Clustering

Visualize 729,609 suspicious accounts

Utilized Gephi and the OpenOrd clustering algorithm

Shows obvious clusters based on: a transmitting (Tx) infectious account directs a message to a victim; the victim is considered infectious when it starts transmitting infectious messages.

Non-connected accounts are assumed to have clicked on an infectious message without it being directed at them. We cannot currently trace what messages they clicked on.

Dense areas are the most successfully spread infection chains.

The cores are considered Infection Hubs
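The edge rule described on this slide can be sketched as follows: an edge runs from a sender to a victim when the sender directed a flagged message at the victim and the victim later began sending flagged messages. The `FlaggedTweet` record, the requirement that the victim's first transmission come after the directed message, and the function names are assumptions made for this example; the resulting edges would then be handed to a layout tool such as Gephi.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Set, Tuple


class FlaggedTweet(NamedTuple):
    timestamp: int               # hypothetical: seconds since epoch
    sender: str                  # account that sent the flagged message
    directed_at: Optional[str]   # account the message was directed at, if any


def infection_edges(tweets: List[FlaggedTweet]) -> Set[Tuple[str, str]]:
    """Return (infector, victim) pairs under the assumed edge rule."""
    # Time at which each account first transmitted a flagged message.
    first_tx = {}
    for t in tweets:
        first_tx[t.sender] = min(t.timestamp, first_tx.get(t.sender, t.timestamp))
    edges = set()
    for t in tweets:
        victim = t.directed_at
        if victim is None or victim == t.sender:
            continue
        # The victim counts only if it starts transmitting after being targeted.
        if victim in first_tx and first_tx[victim] > t.timestamp:
            edges.add((t.sender, victim))
    return edges


def hubs(edges: Set[Tuple[str, str]]) -> Counter:
    """Out-degree per infector; high counts suggest candidate infection hubs."""
    return Counter(src for src, _ in edges)


def unconnected(tweets: List[FlaggedTweet], edges: Set[Tuple[str, str]]) -> Set[str]:
    """Flagged senders that appear in no edge."""
    linked = {name for edge in edges for name in edge}
    return {t.sender for t in tweets} - linked
```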

Results: Some Summary Stats

Total Number of Tweets Processed: 6,531,319,202

Total Number of Unique Accounts Processed: 265,163,290

Total Number of Suspicious Accounts Found: 729,609

Total Number of Suspicious Tweets Found: 8,286,480

Calculated Percentage of BEK Infectious Accounts: 0.275%

Calculated Percentage of BEK Infectious Tweets: 0.127%
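Both percentages follow directly from the four counts on this slide. A quick check, using only those counts (the variable names are chosen for this example):

```python
tweets_processed = 6_531_319_202
accounts_processed = 265_163_290
suspicious_accounts = 729_609
suspicious_tweets = 8_286_480


def percentage(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole


account_pct = percentage(suspicious_accounts, accounts_processed)
tweet_pct = percentage(suspicious_tweets, tweets_processed)

print(f"Infectious accounts: {account_pct:.3f}%")  # Infectious accounts: 0.275%
print(f"Infectious tweets:   {tweet_pct:.3f}%")    # Infectious tweets:   0.127%
```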

Related Work

A lot of research has been done into social network analysis using sites such as Twitter. [21,22,24]

Including research that uses social network trends to track real world contagion spread [25]

A few studies exist that examine BEK's malware dropping capability [2]

The URL identification method they used, “w.php?f=(.*?)&e=(.*?)”, does not pick up all of the URL patterns that we witnessed
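To see why a single URL pattern misses variants, the sketch below applies the quoted expression to two URLs. Two assumptions are made: the `.` and `?` in the pattern are meant literally and are therefore escaped here, and both example URLs are invented placeholders on `example.com`, not URLs observed in the data.

```python
import re

# The quoted pattern with "." and "?" escaped so they match literally
# (an assumption about the intended meaning of the expression).
BEK_URL_RE = re.compile(r"w\.php\?f=(.*?)&e=(.*?)")


def matches_known_pattern(url: str) -> bool:
    """True if the URL contains the w.php?f=...&e=... structure."""
    return BEK_URL_RE.search(url) is not None


# Hypothetical URLs, for illustration only.
covered = "http://example.com/w.php?f=1a2b&e=3"
missed = "http://example.com/links/column.php?page=1a2b"

print(matches_known_pattern(covered))  # True
print(matches_known_pattern(missed))   # False
```

Any landing page served from a different script name or parameter layout falls outside this pattern, which is the gap the slide points out.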

Related Work (continued)

Various works on determining if a message is automated or not exist, most notably “Human, Bot or Cyborg” [26]

Unfortunately they relied heavily on Google Safe Browsing API, which is only updated after someone has verified the link is dangerous. [27]

One work showed that up to 16% of all Twitter accounts show signs of automation. [28]

However, they point out that only a small number of tweets use the Mobile Web API

Future Work (with edits from Malware 2013 audience!)

• Analysis of use of specific strings over time
• Studying spread of ideas in Twitter in addition to spread of malware
• Case study of top infectors
• Carrier vs virology model of spread
• Compare to all vs just benign
• Testing Twitter (measure how well they do in disabling infected accounts/help them get better); Work with Twitter to integrate
• Include follower to following ratio on infected accounts
• More like antispam than antimalware
• Geographic analysis of the infected accounts

Conclusion

• We completed a large-scale analysis of the characteristics of BEK infectious Twitter Accounts
– Some accounts showed signs of being solely for malware distribution
• We found substantial variation in infectious message structure
– We identified a large set of message types not previously published
• We identified the characteristics most strongly associated with BEK infectious messages
– Tweets using the Mobile API, with a Text Entropy lower than 4.3, and showing a strong PCC with known infectious messages, and those that additionally have URLs embedded in them
• We presented our measurement techniques and how they integrate into our larger platform
• Without manual investigation of all messages that we flagged as infectious, we cannot be certain of our results
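The indicators named in the conclusion (Mobile API, text entropy below 4.3, strong PCC with known infectious messages, embedded URL) could be combined along the lines of the sketch below. Requiring all four at once, the `PCC_MIN` cutoff, and the tweet field names `source`, `text`, and `entities` are assumptions made for this example; the slides give no PCC threshold and do not say how the indicators were weighted.

```python
import math
from collections import Counter
from typing import Dict

ENTROPY_MAX = 4.3   # from the slides
PCC_MIN = 0.9       # hypothetical cutoff; the slides give no threshold


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits (assumed unit of analysis)."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def indicators(tweet: dict, pcc_with_known: float) -> Dict[str, bool]:
    """Evaluate each indicator separately so the result can be inspected."""
    text = tweet.get("text", "")
    urls = (tweet.get("entities") or {}).get("urls")
    return {
        "mobile_web": "mobile web" in (tweet.get("source") or "").lower(),
        "low_entropy": shannon_entropy(text) < ENTROPY_MAX,
        "strong_pcc": pcc_with_known >= PCC_MIN,
        "has_url": bool(urls) or "http://" in text or "https://" in text,
    }


def is_suspicious(tweet: dict, pcc_with_known: float) -> bool:
    """Flag a tweet only when every indicator fires (an assumed combination)."""
    return all(indicators(tweet, pcc_with_known).values())
```

Returning the individual indicators alongside the combined verdict makes it easier to sample flagged tweets for the manual verification the conclusion calls for.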

Citations

1. J. Oliver, S. Cheng, L. Manly, J. Zhu, R. Paz, S. Sioting, J. Leopando. “Blackhole Exploit Kit: A Spam Campaign, Not a Series of Individual Spam Runs, An In-Depth Analysis,” Trend Micro Incorporated Research Paper, 2012.
2. Chris Grier, Lucas Ballard, Juan Caballero, et al. 2012. Manufacturing compromise: the emergence of exploit-as-a-service. In Proceedings of the 2012 ACM conference on Computer and communications security (CCS ’12). ACM, New York, NY, USA, 821-832.
3. Gabor Szappanos. “Inside The Blackhole,” SophosLabs, 2012.
4. Jason Jones. “The State of Web Exploit Kits,” HP DVLabs, 2012.
5. Howard, Fraser. 2013. Technical paper: Journey inside the Blackhole exploit kit. Naked Security from Sophos. November 30 2012.
6. Chris Grier, Lucas Ballard, Juan Caballero, et al. 2012. Manufacturing compromise: the emergence of exploit-as-a-service. In Proceedings of the 2012 ACM conference on Computer and communications security (CCS ’12). ACM, New York, NY, USA, 821-832.
7. Fraser Howard. “Exploring the Blackhole Exploit Kit,” Sophos Technical Paper, March 2012.
8. Ziv Mador. “Exploiting Kits: The Underground’s Weapon of Choice,” Infosecurity Europe 2012, SpiderLabs at Trustwave, 2012.
9. Zhou Li, Kehuan Zhang, Yinglian Xie, Fang Yu, and XiaoFeng Wang. 2012. Knowing your enemy: understanding and detecting malicious web advertising. In Proceedings of the 2012 ACM conference on Computer and communications security (CCS ’12). ACM, New York, NY, USA, 674-686.
10. Shea Bennett. “Just How Big Is Twitter In 2012 [INFOGRAPHIC],” All Twitter - The Unofficial Twitter Resource, February 2013.

Citations

11. Mike Melanson, Twitter Kills the API Whitelist: What it Means for Developers and Innovation, February 11 2011, URL = http://www.readwriteweb.com/archives/
12. Joab Jackson, Twitter Now Using Oauth authentication for Third Party Apps, Computer World UK, September 1, 2010, URL = http://www.computerworlduk.com/news/security/3237659/twitter-now-using-oauth-authentication-for-third-party-apps/
13. Arne Roomann-Kurrik, Announcing gzip Compression for Streaming API’s, Twitter Developers Feed, Jan 20, 2012, URL = https://dev.twitter.com/blog/announcing-gzip-compression-streaming-apis
14. Prashanth Mundkur, Ville Tuulos, and Jared Flatow. 2011. Disco: a computing platform for large-scale data analytics. In Proceedings of the 10th ACM SIGPLAN workshop on Erlang (Erlang ’11). ACM, New York, NY, USA, 84-89.
15. C. E. Shannon. A Mathematical Theory of Communication, Reprinted with corrections from The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July, October, 1948.
16. Graham Cluley. “Outbreak: Blackhole malware attack spreading on Twitter using ‘It’s you on photo?’ disguise,” Sophos Naked Security Blog, July 27, 2012.
17. Rob Waugh. “It’s you! Blackhole virus spreading rapidly via Twitter fools users with fake photo link,” MailOnline Science and Tech New, July 2012.
18. Bastian M., Heymann S., Jacomy M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media.
19. S. Martin, W. M. Brown, R. Klavans, and K. Boyack (to appear, 2011), OpenOrd: An Open-Source Toolbox for Large Graph Layout, SPIE Conference on Visualization and Data Analysis (VDA).
20. Aditya Mogadala and Vasudeva Varma. 2012. Twitter user behavior understanding with mood transition prediction. In Proceedings of the 2012 workshop on Data-driven user behavioral modelling and mining from social media (DUBMMSM ’12). ACM, New York, NY, USA, 31-34.

Citations

21. Johan Bollen and Huina Mao. 2011. Twitter Mood as a Stock Market Predictor. Computer 44, 10 (October 2011), 91-94. DOI=10.1109/MC.2011.323 http://dx.doi.org/10.1109/MC.2011.323
22. Johan Bollen, Bruno Gonçalves, Guangchen Ruan, and Huina Mao. 2011. Happiness is assortative in online social networks. Artif. Life 17, 3 (August 2011), 237-251.
23. Manuel Cebrian. 2012. Using friends as sensors to detect planetary-scale contagious outbreaks. In Proceedings of the 1st international workshop on Multimodal crowd sensing (CrowdSens ’12). ACM, New York, NY, USA, 15-16.
24. Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia. 2010. Who is tweeting on Twitter: human, bot, or cyborg?. In Proceedings of the 26th Annual Computer Security Applications Conference (ACSAC ’10). ACM, New York, NY, USA, 21-30.
25. Google. Google safe browsing API. http://code.google.com/apis/safebrowsing/, Accessed: Feb 5, 2010.
26. Chao Michael Zhang and Vern Paxson. 2011. Detecting and analyzing automated activity on twitter. In Proceedings of the 12th international conference on Passive and active measurement (PAM’11), Neil Spring and George F. Riley (Eds.). Springer-Verlag, Berlin, Heidelberg, 102-111.
