Posted on 19-Dec-2015
Internet Intrusions: Global Characteristics and Prevalence
Presented By: Elliot Parsons
Using slides from Vinod Yegneswaran’s presentation at SIGMETRICS 2003
Overview
• Data Sources
• Intrusion Characteristics
  – Port and source distribution
• Projection to the global address space
• Implications of Shared Information
  – Does information sharing help?
  – How much information is needed?
Goals
This paper aims to:
• Show the volume of intrusion attempts
• Show the distribution of intrusions
  – In terms of both source and victim
• Show the impact of various scan types
• Extend the findings to the global scope
Data Sources
To extend the findings to the global scope, the data must:
• Come from many ASes
• Be spread both geographically and over the IP address space
DSHIELD
• http://www.dshield.org (part of the SANS Institute)
• Firewall / NIDS logs from ~1600 networks
  – BlackIce Defender, Cisco PIX Firewall, ipchains
  – Snort, ZoneAlarm Pro, PortSentry
• 4 months of data (Aug 2001, May-July 2002)
  – 60 million scans, 375K destination IPs per month
  – 5 Class B, 45 Class C, many others
DSHIELD Data
• Lowest common denominator approach
  – Simplicity, diversity, unbiased
• Pitfalls
  – Packet headers, active connection info
  – Flooding: intentional, misconfiguration (broadcast, half-life)
  – Spoofed sources
Timestamp   Subm. Hash    Count  Source IP     Source Port  Target IP   Target Port  Protocol  Flags
104032322   provider_31   1      104.21.34.32  3211         10.10.1.3   21           6         S
104032323   provider_32   3      128.22.32.32  3321         10.10.1.3   80           6         S
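The sample records above can be read with a small parser. This is a minimal sketch that assumes whitespace-separated fields in exactly the column order shown, with all nine fields present on every line; `ScanRecord` and `parse_record` are hypothetical names, not part of any DSHIELD tooling. Protocol 6 is TCP in the IANA protocol-number registry, and the flag `S` presumably marks a SYN packet.

```python
from dataclasses import dataclass


@dataclass
class ScanRecord:
    timestamp: int
    submitter_hash: str
    count: int        # number of packets this record summarizes
    source_ip: str
    source_port: int
    target_ip: str
    target_port: int
    protocol: int     # IP protocol number, e.g. 6 = TCP
    flags: str        # TCP flags as logged, e.g. "S"


def parse_record(line: str) -> ScanRecord:
    """Parse one whitespace-separated log line in the column order shown above."""
    ts, subm, count, src, sport, dst, dport, proto, flags = line.split()
    return ScanRecord(int(ts), subm, int(count), src, int(sport),
                      dst, int(dport), int(proto), flags)


record = parse_record("104032323 provider_32 3 128.22.32.32 3321 10.10.1.3 80 6 S")
print(record.source_ip, record.target_port, record.count)  # 128.22.32.32 80 3
```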
DSHIELD
• Red dots represent participating ASes
• Grey lines demonstrate connectivity between ASes
• Dots closer to the center indicate ASes closer to the internet backbone
Worms
• Code-Red I
  – July 12, 2001, 2-phase attack, random propagation
• Code-Red II
  – Aug 4, 2001, “local-random propagation”
• Nimda
  – Sep 18, 2001, “local-random propagation”
• SQL-snake
  – May 2002, port 1433, random propagation
  – Emails passwords and sysinfo to a postone.com address
Scan Types
• Vertical scan
  – Multiple ports on 1 victim by 1 source
• Horizontal scan
  – 1 port on multiple victims by 1 source
• Coordinated scans
  – Multiple sources aimed at a /24 space
• Stealth scans
  – Horizontal or vertical
  – Characterized by a very low frequency
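The scan-type definitions above can be sketched as simple counting rules. This is an illustration under stated assumptions: the thresholds `MIN_PORTS` and `MIN_VICTIMS` are hypothetical (the slides give no cut-offs), activity is assumed to be already split into one observation window, and stealth detection, which depends on timing, is left out.

```python
import ipaddress
from collections import defaultdict

# Hypothetical thresholds: the slides do not state the exact cut-offs.
MIN_PORTS = 5    # distinct ports on one victim before a source counts as vertical
MIN_VICTIMS = 5  # distinct victims on one port before a source counts as horizontal


def classify_source(probes):
    """Label one source's activity in a single observation window.

    probes: iterable of (victim_ip, port) pairs sent by that one source.
    """
    ports_per_victim = defaultdict(set)
    victims_per_port = defaultdict(set)
    for victim, port in probes:
        ports_per_victim[victim].add(port)
        victims_per_port[port].add(victim)
    labels = set()
    if any(len(ports) >= MIN_PORTS for ports in ports_per_victim.values()):
        labels.add("vertical")
    if any(len(victims) >= MIN_VICTIMS for victims in victims_per_port.values()):
        labels.add("horizontal")
    return labels


def coordinated_prefixes(probes_by_source, min_sources=2):
    """Return the /24 prefixes probed by at least min_sources distinct sources."""
    sources_per_prefix = defaultdict(set)
    for source, probes in probes_by_source.items():
        for victim, _port in probes:
            prefix = ipaddress.ip_network(f"{victim}/24", strict=False)
            sources_per_prefix[str(prefix)].add(source)
    return {p for p, srcs in sources_per_prefix.items() if len(srcs) >= min_sources}
```

For example, six ports probed on `10.0.0.1` by one source yields `{"vertical"}`, while port 80 probed on six different hosts yields `{"horizontal"}`.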
Intrusion Characteristics
• Port distribution
  – Monitor the destination port of intrusion attempts
• Source distribution
  – Look for trends in the source addresses associated with intrusions
  – Group intrusions into port 80, port 1433, and non-worm scans
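A minimal sketch of the two measurements above: counting attempts per destination port and assigning each scan to one of the three groups. Treating every scan that is not to port 80 or 1433 as "non-worm" is a simplification assumed here; `scan_group` and `port_distribution` are hypothetical helper names.

```python
from collections import Counter


def scan_group(target_port: int) -> str:
    """Assign a scan to one of the three groups (assumed rule: by port only)."""
    if target_port == 80:
        return "port 80"      # Code-Red / Nimda traffic
    if target_port == 1433:
        return "port 1433"    # SQL-snake traffic
    return "non-worm"


def port_distribution(target_ports):
    """Count intrusion attempts per destination port, most common first."""
    return Counter(target_ports).most_common()


ports = [80, 80, 1433, 21, 80, 137]
print(port_distribution(ports))  # [(80, 3), (1433, 1), (21, 1), (137, 1)]
print(Counter(scan_group(p) for p in ports))
```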
Port Distribution
[Figure: Service Distribution of Scans. x-axis: 1-May to 24-Jul; y-axis: 0 to 3,000,000; series: ports 80, 1433, ICMP (0), 137, 21, 53, 22, p2p, 111, 27374]
Source Distribution
[Figure panels: port 80, port 1433, non-worm (each June 2002)]
Persistence of Worm Activity
[Figure: Persistence of Port 80 sources. x-axis: Number of Days (0.125 to 58); y-axis: Number of Sources (0 to 1,000,000); series: /32, /24]
• 3 months of data: May-July 2002 (CDF)
• Half-life ~18 days (/24), 6 hours (/32)
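One way to compute such half-lives, sketched under two assumptions not stated on the slides: a source's persistence is the time from its first to its last sighting, and the half-life is the median persistence (the time by which half of the sources have stopped appearing). Aggregating by /24 lengthens persistence because a prefix stays "alive" as long as any address in it keeps scanning.

```python
import ipaddress
import statistics


def persistence(sightings, prefix_len=32):
    """Persistence per aggregate: last sighting minus first sighting, in days.

    sightings: iterable of (source_ip, day) pairs, where day may be fractional.
    prefix_len=32 treats each address separately; 24 groups sources by /24.
    """
    first, last = {}, {}
    for ip, day in sightings:
        key = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        first[key] = min(first.get(key, day), day)
        last[key] = max(last.get(key, day), day)
    return [last[k] - first[k] for k in first]


def half_life(durations):
    """Time by which half of the aggregates have stopped appearing (the median)."""
    return statistics.median(durations)


sightings = [("1.2.3.4", 0.0), ("1.2.3.4", 0.25), ("1.2.3.9", 10.0), ("1.2.3.9", 18.0)]
print(half_life(persistence(sightings)))      # 4.125 (per /32)
print(half_life(persistence(sightings, 24)))  # 18.0  (per /24)
```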
Date Characteristics
[Figure: x-axis: Day of Month (1 to 31); y-axis: 0 to 45,000; series: May, June, July]
Code-Red I was still very much alive!
Top Sources
• Mainly applies to non-worm scans
• Results will show that only a few sources are responsible for a significant amount of the scans
  – Zipf distribution
• Argument for a blacklist
Top Sources
• Zipf distribution (power law)
• CDF (source IP rank vs. number of scans; log-log scale)
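A Zipf distribution shows up as a straight line when scan count is plotted against source rank on log-log axes. The sketch below estimates the exponent with an ordinary least-squares fit on that log-log plot; this is a quick diagnostic assumed for illustration, not the method from the paper, and more rigorous power-law estimators exist. `zipf_exponent` is a hypothetical name.

```python
import math


def zipf_exponent(scans_per_source):
    """Estimate alpha in count ~ rank**(-alpha) from a {source: scan count} map.

    Fits a least-squares line to log(count) vs. log(rank); needs >= 2 sources.
    """
    counts = sorted(scans_per_source.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope


# Synthetic, perfectly Zipfian counts: the 1st source 1000 scans, the r-th 1000/r.
ideal = {f"src{r}": 1000 / r for r in range(1, 51)}
print(round(zipf_exponent(ideal), 6))  # 1.0
```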
Top Sources
[Figure: x-axis: Day of the Month (May 2002); y-axis: 0 to 1,400,000; series: 2002-05, 2002-05.Top 100]
• May 2002 scan volume: overall vs. top 100 sources
• Top 100 sources account for 50% of all scans in any month
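The "top 100 sources" share and the blacklist argument reduce to ranking sources by scan count. A minimal sketch, where `top_k_share` and `blacklist` are hypothetical helpers and the input is assumed to be a mapping from source IP to its monthly scan count:

```python
import heapq


def top_k_share(scans_per_source, k=100):
    """Fraction of all scans contributed by the k most active sources."""
    counts = sorted(scans_per_source.values(), reverse=True)
    total = sum(counts)
    return sum(counts[:k]) / total if total else 0.0


def blacklist(scans_per_source, k=100):
    """Hypothetical blacklist: the k sources with the most scans."""
    return set(heapq.nlargest(k, scans_per_source, key=scans_per_source.get))


month = {"a": 50, "b": 30, "c": 10, "d": 10}
print(top_k_share(month, k=1))  # 0.5
print(sorted(blacklist(month, k=2)))  # ['a', 'b']
```

If the slide's 50% figure holds, blocking only the top 100 sources would remove about half of the logged scans, which is the essence of the blacklist argument.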
Source Coordination
[Figure: x-axis: Day of Month (Aug 2001); y-axis: Number of Scans (0 to 20,000); series: 165.193.248.34, 172.27.12.1, 172.27.12.2, 166.48.53.250, 171.70.168.141, 207.189.64.32, 207.189.65.62, 167.216.180.165]
• Aug 2001: 8 of the top 20 sources display identical ON/OFF behavior
• Such clusters are common among the top 20 sources in all 4 months!
• All sources scan more than 5 distinct /16s.
Source Coordination
[Figure: x-axis: Day of the Month (May 2002); y-axis: Number of Scans (0 to 100,000); series: 202.28.120.17, 66.66.201.171, 202.128.131.183, 141.109.222.64]
• May 2002: ON/OFF pattern (4 out of top 20 sources)
• Staggering behavior (identical attack or attack tool)
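Identical ON/OFF behavior can be found by reducing each source to the set of days on which it scanned at all and grouping sources with the same pattern. This is a sketch under the assumption that "identical" means an exact day-by-day match; `on_off_clusters` is a hypothetical name, and staggered patterns would need shifted comparisons that this does not attempt.

```python
from collections import defaultdict


def on_off_clusters(daily_counts, min_size=2):
    """Group sources whose ON/OFF pattern (days with any scans) is identical.

    daily_counts maps source IP -> list of scan counts, one entry per day.
    Returns clusters of at least min_size sources, each sorted by IP string.
    """
    clusters = defaultdict(list)
    for source, counts in daily_counts.items():
        pattern = tuple(c > 0 for c in counts)  # True = ON that day
        clusters[pattern].append(source)
    return [sorted(srcs) for srcs in clusters.values() if len(srcs) >= min_size]


days = {
    "10.0.0.1": [5, 0, 7, 0],
    "10.0.0.2": [100, 0, 3, 0],   # same ON/OFF days, different volumes
    "10.0.0.3": [1, 1, 1, 1],
}
print(on_off_clusters(days))  # [['10.0.0.1', '10.0.0.2']]
```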
Identification of Scan Types
• Still look at only non-worm scans
• Horizontal scans make up the majority of the scans
• More vertical scan episodes
• Surprisingly high number of coordinated scans
• Stealth scans occur much less frequently, but are usually vertical scans
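The distinction between scans and scan episodes can be illustrated by splitting one source's scan times into bursts. The slides do not define an episode; this sketch assumes an episode is a run of scans separated from the next run by an idle gap longer than a hypothetical `max_gap`.

```python
def split_episodes(timestamps, max_gap):
    """Split one source's scan timestamps into episodes.

    A new episode starts whenever the gap since the previous scan exceeds
    max_gap (hypothetical parameter, same time unit as the timestamps).
    """
    episodes = []
    for t in sorted(timestamps):
        if episodes and t - episodes[-1][-1] <= max_gap:
            episodes[-1].append(t)   # continue the current burst
        else:
            episodes.append([t])     # idle gap too long: start a new episode
    return episodes


# Timestamps in seconds; a one-hour gap threshold gives two episodes.
print(split_episodes([20, 0, 10, 5000, 5010], max_gap=3600))
```

Under this reading, one long horizontal sweep counts as many scans but a single episode, whereas a vertical scanner that returns repeatedly produces many short episodes, which is consistent with horizontal scans dominating the scan count while vertical scans dominate the episode count.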
Scan Types: Number of Scans
[Figure: x-axis: 1 June to 26 June 2002; y-axis: Number of Scans (log scale, 10,000 to 10,000,000); series: Coord Scans, Vert Scans, Horz Scans]
Scan Types: Number of Episodes
[Figure: x-axis: 1 June to 26 June 2002; y-axis: Number of Episodes (log scale, 100 to 10,000); series: Coord Scan-Episodes, Vert Scan-Episodes, Horz Scan-Episodes]
Global Projections
• Question: How has the scanning trend changed over the past year?
  – Must extend the data to the entire Internet
• Simply average the data and multiply by 2^32
  – Possible because data comes from a broad range of sources
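The projection rule (average per monitored address, multiplied by 2^32) generalizes to /24 and /16 aggregates by averaging per monitored prefix and multiplying by the number of such prefixes in the IPv4 space (2^24 and 2^16). A minimal sketch; `project_global` is a hypothetical name and the input is assumed to be one scan count per monitored unit for the same time period.

```python
def project_global(scans_per_unit, prefix_len=32):
    """Project monitored scan counts onto the whole IPv4 space.

    scans_per_unit: scan counts, one per monitored unit of size /prefix_len
    (single addresses for 32, /24 networks for 24, /16 networks for 16).
    """
    units_in_space = 2 ** prefix_len   # number of /prefix_len blocks in IPv4
    average = sum(scans_per_unit) / len(scans_per_unit)
    return average * units_in_space


print(project_global([4, 6]))                  # 5 scans/IP -> 5 * 2**32
print(project_global([256 * 5], prefix_len=24))  # same rate seen per /24
```

When every address in a monitored /24 sees the same rate, the per-IP and per-/24 projections agree, which is one reason similar projections at the three granularities are a reasonable sanity check.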
Projection of Port 80 Scans
[Figure: x-axis: 6/8/2001 to 9/1/2002; y-axis: log scale, 10^6 to 10^11; series: projection(ip), projection(/24), projection(/16), each with a linear trend line]
• Port 80 scans show a decreasing trend
  – Biased by the release of CR I/II
• May-July 2002 relatively steady, with a small upward slope
Projection of Non-worm Scans
[Figure: x-axis: 6/8/2001 to 9/1/2002; y-axis: log scale, 10^6 to 10^11; series: projection(ip), projection(/24), projection(/16), each with a linear trend line]
• Projection: (avg. scans per IP) × (number of IPs)
  – Similar projections for /24 and /16 aggregates
• 25B scans / day
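Inverting the projection gives the average per-address rate implied by the headline figure. This assumes the 25B scans/day number is a per-IP projection over the full 2^32 IPv4 space:

```latex
\[
\frac{25 \times 10^{9}\ \text{scans/day}}{2^{32}\ \text{addresses}}
  = \frac{25 \times 10^{9}}{4\,294\,967\,296}
  \approx 5.8\ \text{scans per address per day}
\]
```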
Implications of Shared Information
• Many have looked to pool resources
• Do not identify speed of attacks
• Can gain a view of trends in attacks, though
Information Theoretic Approach
• Relative entropy – a measure of the (dis)similarity between two probability distributions
• Marginal utility – the amount of information gained by adding more samples
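A sketch of how the two definitions can combine, applied to per-source scan counts. Relative entropy is D(P || Q) = Σ p(x) log2(p(x) / q(x)). The marginal utility of each added log is assumed here to be the relative entropy between the pooled source distribution before and after adding it; the slides do not spell out the exact formulation. The direction D(before || after) is chosen because adding counts never removes a source, so every term stays finite. `marginal_utilities` and the other names are hypothetical.

```python
import math
from collections import Counter


def distribution(counts):
    """Normalize a {source: scan count} Counter into probabilities."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def relative_entropy(p, q):
    """D(p || q) in bits; requires q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)


def marginal_utilities(logs):
    """Information gained by each log after the first (assumed definition).

    logs: list of Counters mapping source IP -> scans seen by one provider.
    """
    pooled = Counter(logs[0])
    utilities = []
    for log in logs[1:]:
        before = distribution(pooled)
        pooled.update(log)            # add this provider's counts
        after = distribution(pooled)
        utilities.append(relative_entropy(before, after))
    return utilities


logs = [Counter({"a": 2, "b": 2}), Counter({"a": 4}), Counter()]
print(marginal_utilities(logs))  # second value is 0.0: an empty log adds nothing
```

A random evaluation order, as in the experiment with randomly chosen networks, can be obtained by passing `random.sample(all_logs, 100)` to `marginal_utilities`.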
Information Theoretic Approach
• Goal – how much does adding intrusion logs improve the resolution of identifying “worst offenders”?
• Can be measured using marginal utility
  – Number of experiments is the number of logs identified
Evaluation of Marginal Utility Approach
• Use 100 /16s and 100 /24s from the total data sets
  – Chosen at random
• Received promising results about the amount gained from adding more data sets
Marginal Utility for Worst Offenders
[Figure: x-axis: 1 to 91; y-axis: Marginal Utility (0 to 0.4); series: /16 Networks, /24 Networks]
• Random day, 100 random /16s and /24s
• Diminishing returns after 40 /16s and 50 /24s
Marginal Utility for Detecting Target Ports
[Figure: x-axis: 1 to 91; y-axis: Marginal Utility (0 to 0.06); series: /16 Networks, /24 Networks]
• Random day, 100 random /16s and /24s
• Diminishing returns after 40 nodes.
Conclusion
• A lot of scanning is directed away from port 80
  – 25B scans per day, 25% non-port-80
• A set of worst offenders does exist, responsible for a lot of the scanning
• Combining data from multiple sites gives more information
  – Data from larger sites is more useful