tracking internet hosts using unreliable ids€¦ · tracking internet hosts using unreliable ids...

4
Tracking Internet Hosts Using Unreliable IDs Yinglian Xie, Fang Yu, and Martín Abadi Microsoft Research, Silicon Valley First 100.0.0.1 Then 100.0.0.2 100.0.0.1 Blacklist 100.0.0.1 X Open and anonymous weak accountability IP addresses are not unique, fixed identifiers (because of dynamic IP addresses, proxies, and NATs). It is hard to identify who is responsible for traffic. Accountability is weak on the Internet We should block attack traffic by host rather than by IP address Track hosts more reliably in spite of dynamic IP addresses Explore applications that require identifying hosts over time E.g., security and data-mining Our approach: Use application IDs and events to track hosts Goal: Inferring host-IP bindings Alice Alice Alice Bob Bob IP 1 IP 2 IP 3 IP 4 IP 5 IP 6 t 1 t 2 t 3 t 4 t 5 t 6 Alice’s host IP_1: [t1, t2] IP_2: [t3, t4] IP_5: [t5, t6] Bob’s host IP_4: [t3, t4] IP_3: [t5, t6] Challenges: No 1-1 mappings between hosts and IDs Dynamic IP addresses, proxies and NATs Malicious IDs Example application: host-based blacklisting

Upload: others

Post on 21-May-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tracking Internet Hosts Using Unreliable IDs€¦ · Tracking Internet Hosts Using Unreliable IDs Yinglian Xie, Fang Yu, and Martín Abadi Microsoft Research, Silicon Valley First

Tracking Internet Hosts Using Unreliable IDs Yinglian Xie, Fang Yu, and Martín Abadi

Microsoft Research, Silicon Valley

First 100.0.0.1Then 100.0.0.2

100.0.0.1Blacklist 100.0.0.1 X

Open and anonymous weak accountability IP addresses are not unique, fixed identifiers (because of dynamic IP addresses, proxies, and NATs).

It is hard to identify who is responsible for traffic.

Accountability is weak on the Internet

We should block attack traffic by host rather than by IP address

Track hosts more reliably in spite of dynamic IP addresses

Explore applications that require identifying hosts over time

E.g., security and data-mining

Our approach: Use application IDs and events to track hosts

Goal: Inferring host-IP bindings

AliceAlice

AliceBob

Bob

IP1IP2

IP3IP4IP5IP6

t1 t2 t3 t4 t5 t6Alice’s host

IP_1: [t1, t2]IP_2: [t3, t4]IP_5: [t5, t6]

Bob’s hostIP_4: [t3, t4]IP_3: [t5, t6]

Challenges: No 1-1 mappings between hosts and IDs Dynamic IP addresses, proxies and NATs Malicious IDs

Example application: host-based blacklisting

Page 2: Tracking Internet Hosts Using Unreliable IDs€¦ · Tracking Internet Hosts Using Unreliable IDs Yinglian Xie, Fang Yu, and Martín Abadi Microsoft Research, Silicon Valley First

Problem formulation

A

A

A

B

B

C

C

C

B

IP1

IP2

IP3

IP4

IP5

IP6

t1 t2 t3 t4 t5 t6

time

Input events

e1: <u1, IP1, t1>e2: <u2, IP1, t2>e3: <u3, IP4, t3>

… …en: <u4, IP6, t5>

Identity mapping

A u1, u2

B u3

C u4

… …

Host-tracking graph

timeIPi

t1 t3t2

u1

u2

u1 u1 u1 u1

GroupUser IDs

Construct tracking

graph

Initial ID-groups

Resolve inconsistency

Updated ID-groupsInput events

Tracking graph with

inconsistent bindings

Update ID groups

Pruned inconsistent

bindings

Host tracking

graph

Initial estimation: u1 : h1u2 : h2

Methodology overview

Our goal : maximize tracked events and tracked IDs

Grouping IPs

Consider the probability of two random IDs appearing together

Resolve inconsistencies

timeIPi

t1

t3

t2U1

t4U2

timeIPi

t1 t2

U1

U1

timeIPj

t3 t4

Conflict bindings

Concurrent bindings

Guest removal

Proxy identification

Group splitting

Host-tracking graph

Update estimation iteratively

Page 3: Tracking Internet Hosts Using Unreliable IDs€¦ · Tracking Internet Hosts Using Unreliable IDs Yinglian Xie, Fang Yu, and Martín Abadi Microsoft Research, Silicon Valley First

Calibrate cookie churns

Estimate host population more accurately

Build normal-user profiles

Host-aware blacklists – block by host rather than IP

Post-mortem forensics

Real-time blacklists (Tracklist)

Coverage and Accuracy

Applications

Evaluate accuracy using Windows update data 92% - 96%

Coverage: tracked events / total events 75% - 80%

Implementation with Hotmail data

Query planLINQ query Dryad

select

where

logs

Automatic query plan generation

Distributed query execution by Dryad

var logentries =from line in logswhere !line.StartsWith("#")select new LogEntry(line);

On Dryad and DryadLINQ

# of blocked users Falsepositives

IP blacklist / infinitely 44.70 million 52.8%

IP blacklist / one hour 27.94 million 34.1%

Tracklist / one hour 16.01 million 4.9%

Tracklist with profile/one hour 14.27 million 0.1%

Seed data: 5.6 million bot-accounts detected by BotGraph in one month

Page 4: Tracking Internet Hosts Using Unreliable IDs€¦ · Tracking Internet Hosts Using Unreliable IDs Yinglian Xie, Fang Yu, and Martín Abadi Microsoft Research, Silicon Valley First

Network security with IP intelligence Our work

IP property

inference engine

Static or dynamic

Proxy, NAT

Residential vs. enterprise

DSL, wireless, dialup

User population

Spam history

Known attack history

… …

Applications

Spammer detection

DDoSprevention

Click frauddetection

Targeted ads

Phishing sitedetection

Windows update

IP properties IP history

•190M dynamic IPs

•40-50% spam

UDmap:

identify

dynamic IPs

Hotmail login data

AutoRE:

derive spam

signatures

•340K botnet IPs

•16-18% spam not

detected today

Sampled emails

BotGraph:

detect bot-

accounts

•26M bot-accounts

•4.5M botnet IPs

Hotmail login data

HostTracker:

track host-IP bindings

Input

Collaboration with WLSP (Jason Atlas, Geoff Hulten, Ivan Osipkov), Hotmail (Hersh Dangayach, Eliot Gillum, Krish Vitaldevara), Bing (Fritz Behr, David Soukal, Roger Yu, Zijian Zheng), and Messenger (Steve Miale)

Routing tables

Windows update log

Web search log

Spam emailsand history

User login records

Messengerlog

SBotMiner:

detect

search bots

•3% of overall

search traffic

Search data

Spammer

detection

Search user

tracking

Your

application?…