

Joint EUROGRAPHICS - IEEE TCVG Symposium on Visualization (2002), pp. 1–8
D. Ebert, P. Brunet, I. Navazo (Editors)

A Visual Technique for Internet Anomaly Detection

Soon Tee Teoh

Kwan-Liu Ma

Xiaoliang Zhao

S. Felix Wu

Department of Computer Science, University of California, Davis
Department of Computer Science, North Carolina State University

Abstract

The Internet can be made more secure and efficient with effective anomaly detection. In this paper, we describe a visual method for anomaly detection using archived Border Gateway Protocol (BGP) data. A special encoding of IP addresses built into an interactive visual interface design allows a user to quickly detect Origin AS changes by browsing through a 2D visual representation of selected aspects of the BGP data. We demonstrate that each visually spotted anomaly agrees with an actual anomaly on record. These results suggest that this visual approach can play a major role in an anomaly detection system.

1. Introduction

The Internet has become indispensable to the functioning of individuals and organizations, including government, businesses, schools, and even emergency services. However, the very nature of the Internet, which relies on interconnectedness and autonomy, makes it prone to unintentional machine or human errors as well as malicious attacks. It is therefore of utmost importance to learn about and understand these harmful events. Monitoring the Internet to recognize anomalies gives us valuable understanding of the Internet so that we can take appropriate action in a timely manner.

In computer network security, anomaly detection is the process of searching for behavior that deviates from normal network use. Most existing anomaly detection methods are based on statistical analysis, where normal user profiles are expressed as sets of statistical measures [8, 11, 12]. That is, a set of “normal” data is first analyzed to derive representative characteristics of normal use, which are then compared against the characteristics of unknown data to disclose abnormal behaviors. This comparative analysis forms the basis of anomaly detection.

In this paper, we describe a visual-based approach to the anomaly detection problem. Our approach does not need a “normal” data set and relies mainly on the superior visual processing capability of the human brain to detect patterns and draw inferences. Starting with no prior knowledge of what shape or form the anomalies take, we use visualization as the key tool for discovering the intrinsic properties of normal and abnormal data.

We have developed a visual representation along with a set of interaction techniques for the user to visually browse through archived Border Gateway Protocol (BGP) [13] data to quickly detect anomalies in Origin AS changes [17]. These changes can indicate either configuration errors or intentional attacks on the Internet.

Section 2 introduces BGP, Origin AS changes, and their implications for Internet security. Section 3 describes in detail the visual-based anomaly detection method. Finally, we report our findings and the lessons learned from the visual analysis of archived BGP data over 480 days.

2. BGP Data and Origin AS Changes

The Internet is a network of networks. Each network within the Internet is identified by its IP address prefix. For example, the University of California’s (UC) Davis campus network is identified as 128.120.0.0/16, which means every host in the UC Davis campus network shares the same first 16 bits. One or more networks within a single administrative domain are referred to as an Autonomous System, or AS for short. Each AS is assigned a unique AS number. For example, the AS number for the UC Davis campus network is 6192. Informally, we could say AS 6192 owns the IP prefix 128.120.0.0/16.

Each AS connects with one or more other ASes. Between two ASes, inter-AS routing protocols are used to exchange network reachability information so that eventually routers know how to forward data packets to the correct destination. Border Gateway Protocol (BGP) [13] is the current standard inter-AS routing protocol. BGP routers exchange the network reachability information in the format of BGP routes. A BGP route lists a particular IP prefix (destination) and the path of ASes used to reach that prefix. The last AS in an AS path is referred to as the Origin AS of that prefix. For example, the BGP route “128.120.0.0/16: (6079, 11423, 6192)” means that the IP prefix 128.120.0.0/16 could be reached by first going to AS 6079, then to AS 11423, and finally to AS 6192. AS 6192 is the Origin AS of the IP prefix 128.120.0.0/16.

In principle, the Origin AS should be the owner of the IP prefix. Thus, the Origin AS for a particular prefix should remain the same at all times unless the ownership changes. However, due to faults such as router misconfiguration, or to intentional attacks, we may observe abnormal Origin AS changes in the BGP routing table, which contains all the recent BGP routes. For example, AS 6192 originates the IP prefix 128.120.0.0/16 all the time, except that on one day we observe that a different AS has started to originate the same IP prefix too. We could ask whether this is due to valid network operation or to an attack. In the latter case, the routing system could be adversely affected and data packets could be delivered to the wrong place.

We obtained the archived daily BGP routing data over 480 days from the Oregon Route Views server [1]. We then collected all the changes to the Origin AS of each IP prefix. We believe that examining these Origin AS changes exposes router errors and attacks.
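As an illustration only, the collection step could be sketched in Python as follows. This is not the code used for the study: the snapshot format (a dictionary from prefix to the AS paths seen on one day), the function names, and the second Origin AS (64500, a documentation-reserved number) are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical daily snapshot: IP prefix -> AS paths observed for it that day.
Snapshot = Dict[str, List[Tuple[int, ...]]]


def origin_ases(paths: List[Tuple[int, ...]]) -> Set[int]:
    """Return the set of Origin ASes, i.e. the last AS on each path."""
    return {path[-1] for path in paths if path}


def origin_changes(prev: Snapshot, curr: Snapshot):
    """List (prefix, old_origins, new_origins) where the origin set differs."""
    changes = []
    for prefix in sorted(set(prev) | set(curr)):
        old = origin_ases(prev.get(prefix, []))
        new = origin_ases(curr.get(prefix, []))
        if old != new:
            changes.append((prefix, old, new))
    return changes


# Day 1: only AS 6192 originates the prefix (the route quoted in Section 2).
day1 = {"128.120.0.0/16": [(6079, 11423, 6192)]}
# Day 2: a second, made-up path ends in a different Origin AS.
day2 = {"128.120.0.0/16": [(6079, 11423, 6192), (64496, 64500)]}

for prefix, old, new in origin_changes(day1, day2):
    print(prefix, sorted(old), "->", sorted(new))
```

Comparing sets of origins rather than single values keeps prefixes with more than one Origin AS visible, which matters for the change types introduced in Section 4.1.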

3. A Visual-Based Approach

Traditional statistical anomaly detection methods search for patterns using primarily automatic mechanisms. In contrast, a visual anomaly detection method is based on interactive data exploration. Goldstein et al. [6] describe data exploration as an iterative and interactive process initiated and directed by people. Previous efforts in visual techniques to aid data mining [7] include [4, 9] and a method based on clustering [14]. Girardin [5] uses self-organizing maps to help analyze network activity. Atkison et al. [3] propose detecting network intrusion by running data through an information retrieval system and visualizing the result.

There are three goals of our visualization system. The most important one is for the user to be able to quickly identify anomalies in the data. However, it is not enough merely to discover that an anomaly has occurred. Therefore, two additional goals are to enable the user to quickly understand the nature of the anomaly and to identify its source, so that the user knows where to focus further investigation and take corrective action. With appropriate visual metaphors, these two additional goals can be more easily achieved than with automatic, non-visual techniques. This key advantage of data exploration over data mining is mentioned in [6].

Ahlberg and Shneiderman [2] promote visual-based methods as a viable approach to information seeking due to the ability of humans to recognize features in visual displays and recall related images to identify anomalies. Girardin [5] states that human perception can notice even features which are not expected. This is especially important when the user has no idea in advance about the characteristics of normal and abnormal behavior.

Lee [10] states that a shortcoming of statistical anomaly detection methods is that normal behavior changes over time, and the detection system has difficulty adapting to the change. In the visual method, the human user is more able to recognize gradual, normal changes in behavior and distinguish them from genuine anomalies.

In traditional statistical methods, it is a challenge to set threshold values such that false positives are minimized while true positives are not missed. With the visual approach, we delegate the responsibility of making fuzzy judgments of what is normal or abnormal to the user [5]. Furthermore, the user can judge whether a detected anomaly is important or is just an isolated case, whereas an automatic method would just raise flags based on a rigid set of criteria.

3.1. An interactive visualization process

Anomaly detection by visual data exploration consists of 3 steps.

1. data are collected and filtered.
2. data are mapped to appropriate visual properties.
3. the user interacts with the data, possibly going back to 2.

The visual anomaly detection method is an iterative process. The method has to be performed with different parameters in order to achieve success. Interactive visualization provides an efficient means of trying out different combinations of variables to watch, as well as different mappings from data to visual properties. With interactive visualization, the human user can very easily guide the iterative process in the most promising direction.

It is crucial to provide the user with the tools to interactively change parameters, focus on certain details, and animate the data over time. Interactivity allows consecutive image frames to give the user a coherent mental picture. Our design of the user interface adheres to two main principles given in [15]:

1. rapid, incremental and reversible actions, and
2. immediate and continuous display of results.

These guidelines facilitate intelligent and productive human interaction for anomaly detection. In order to achieve interactive display rates despite the large size of the data, we need to use efficient data structures. We also have to provide the means for viewing at different levels of detail.

4. Visualizing Origin AS Changes

In this section, we describe in detail the design of our visual anomaly detection system for analyzing Origin AS changes. An Origin AS Change is an entry of the form (Prefix, AS, Date, Type). Prefix is the IP prefix whose Origin AS has changed. AS is a list of the AS(es) associated with the change. Date is the date on which the change occurred. Type is the type of the change.
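To make the entry format concrete, here is a minimal Python sketch of such a record, together with a per-day, per-type tally of the kind a time-line plot could be drawn from. The field names, the `daily_counts` helper, and the sample rows are assumptions for illustration, not data or code from the system.

```python
from collections import Counter
from datetime import date
from typing import Iterable, NamedTuple, Tuple


class OriginASChange(NamedTuple):
    """One (Prefix, AS, Date, Type) entry; field names are illustrative."""
    prefix: str             # IP prefix whose Origin AS changed
    ases: Tuple[int, ...]   # AS(es) associated with the change
    day: date               # date on which the change occurred
    kind: str               # type label of the change, e.g. "H" or "OS"


def daily_counts(changes: Iterable[OriginASChange]) -> Counter:
    """Count changes per (day, type) pair."""
    return Counter((c.day, c.kind) for c in changes)


# Made-up sample rows using documentation-reserved prefixes and AS numbers.
sample = [
    OriginASChange("192.0.2.0/24", (64496,), date(2000, 8, 14), "H"),
    OriginASChange("198.51.100.0/24", (64497,), date(2000, 8, 14), "H"),
    OriginASChange("203.0.113.7/32", (64498,), date(2000, 8, 14), "OS"),
]

for (day, kind), n in sorted(daily_counts(sample).items()):
    print(day.isoformat(), kind, n)
```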

4.1. Types of Origin AS changes

Origin AS changes are classified into 4 main types and then further classified into 8 types in total. The 4 main types are:

1. B-type: An AS announces a more specific prefix out of a larger block it already owns

2. H-type: An AS announces a more specific prefix out of a larger block belonging to another AS

3. C-type: An AS announces a prefix previously owned by another AS

4. O-type: An AS announces a prefix previously not owned (and therefore owned by ICANN by default)

A Multiple Origin AS (MOAS) conflict occurs when it appears as though an IP prefix originates from more than one AS. MOAS conflicts could be a symptom of a fault or an attack [17]. The C-type and O-type changes are further classified by whether they involve Single Origin AS (SOAS) or MOAS:

1. CSM: C-type change from SOAS to MOAS
2. CSS: C-type change from SOAS to SOAS
3. CMS: C-type change from MOAS to SOAS
4. CMM: C-type change from MOAS to MOAS
5. OS: O-type change involving SOAS
6. OM: O-type change involving MOAS

The 8 types are thus these six and the B-type and H-type changes.
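The C-type labels can be read as "C", then the state before the change, then the state after it (S for a single Origin AS, M for multiple). A small Python sketch under that reading is given below; the function name and the use of origin sets as input are assumptions, and the B, H and O types are left out because they also depend on prefix ownership.

```python
from typing import Set


def classify_c_type(old_origins: Set[int], new_origins: Set[int]) -> str:
    """Label a C-type change as CSS, CSM, CMS or CMM.

    Assumes the change is already known to be C-type, so the prefix has
    at least one Origin AS both before and after the change.
    """
    if not old_origins or not new_origins:
        raise ValueError("a C-type change needs origins before and after")
    before = "S" if len(old_origins) == 1 else "M"
    after = "S" if len(new_origins) == 1 else "M"
    return "C" + before + after


# AS 64500 and 64501 are documentation-reserved numbers used as stand-ins.
print(classify_c_type({6192}, {6192, 64500}))   # one origin, then two
print(classify_c_type({6192, 64500}, {6192}))   # two origins, then one
```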

4.2. Mapping IP prefixes

Each IP prefix maps to one pixel on a square. The mapping is done in a traditional quad-tree manner, as shown in Figure 1. In a quad-tree, a square is repeatedly subdivided into 4 equal squares. In mapping a 32-bit prefix to a square, we start with the two most significant bits of the address to place the IP prefix in one of the 4 squares in the second level of the quad-tree. We then use the next two most significant bits to place the IP prefix in the appropriate third-level square within this square. We do this repeatedly until we can place the prefix in a square the size of a single pixel. The prefix is mapped to that pixel.

Figure 1: Quadtree coding of IP prefixes, showing the top few levels of the tree and the most significant bits of the IP prefixes represented by each sub-tree (sub-square). [Sub-square labels in the figure: 0000, 0001, 0010, 001100, 001101, 001110, 001111, 01, 10, 11.]

Due to the limitations of computer screen space, we use a 512 X 512 pixel square to represent the entire 32-bit IP prefix space. With only 512 X 512 pixels, many IP prefixes map to the same pixel. Despite that, a 512 X 512 square is sufficient to spread out the IP addresses in our data. With an additional level of zooming into a portion of the data, we can view individual IP prefixes. Figure 2 shows additional windows zooming into the main window, which shows the entire IP prefix space. In the main window, a pixel is colored yellow if an Origin AS Change occurred on the current day, and colored brown if a change occurred on a previous day. In the detail windows, a colored square is shown for each Origin AS change. The position is determined by the IP prefix, the size by the mask, and the hue by the type of the change. Each of the 8 different possible types of Origin AS change is mapped to one unique hue. The brightness of each square depends on the day the change occurred, with the current day’s data being the brightest. This example shows the data over a 416-day window from January 1, 2000 till February 19, 2001. To show only one day’s data, the user can set the window to one day.

This is a sensible mapping from IP prefix to screen space because IP prefixes sharing the same most significant bits are in close proximity on the screen. In the detail windows, each IP prefix is shown as a square or a rectangle. The size of the rectangle indicates the size of the block of IP addresses; prefixes with a shorter mask are mapped to larger rectangles.
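The quad-tree descent can be sketched in a few lines of Python. The sketch below consumes two address bits per level until a 512 X 512 grid is resolved (9 levels, so the top 18 bits). Which quadrant each bit pair selects is an assumption here (first bit chooses top or bottom half, second bit chooses left or right half); the exact quadrant layout of the original tool is not recoverable from the text, and `prefix_to_pixel` is a hypothetical name.

```python
import ipaddress
from typing import Tuple


def prefix_to_pixel(prefix: str, size: int = 512) -> Tuple[int, int]:
    """Map an IPv4 prefix to (x, y) on a size x size square by quad-tree descent."""
    levels = size.bit_length() - 1            # 512 -> 9 subdivision levels
    if size < 2 or 1 << levels != size or levels > 16:
        raise ValueError("size must be a power of two between 2 and 65536")
    network = ipaddress.ip_network(prefix, strict=False)
    addr = int(network.network_address)       # 32-bit integer for IPv4
    x = y = 0
    for level in range(levels):
        shift = 30 - 2 * level                # next two most significant bits
        pair = (addr >> shift) & 0b11
        y = (y << 1) | (pair >> 1)            # assumed: first bit -> lower half
        x = (x << 1) | (pair & 1)             # assumed: second bit -> right half
    return x, y


print(prefix_to_pixel("128.120.0.0/16"))
print(prefix_to_pixel("128.121.0.0/16"))     # shares many leading bits
```

Because the leading bits decide the coarse position, prefixes that share those bits land in the same sub-square, which is the proximity property described above. Prefixes that differ only below the 18th bit fall on the same pixel, matching the remark that many prefixes share a pixel.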


Figure 2: Visualization of data for 416 days up till February 19, 2001. The main window shows the quadtree mapping of the entire space of 32-bit IP addresses. A pixel is colored yellow if an Origin AS Change occurred on the current day (February 19, 2001), and colored brown to green if a change occurred on a previous day (January 1, 2000 through February 18, 2001). In the windows showing detail, a square is used to depict each change, with hue determined by the type of the change, brightness determined by how long ago the change occurred (present day data shown the brightest), and size determined by the mask of the prefix. The background of the main window is shaded according to the IP prefix the pixel represents. The brighter the pixel, the larger the IP prefix represented.

4.3. Relationship between prefix and AS

Next, the relationship between a prefix and its associated AS number needs to be represented. To achieve this, we draw 4 lines surrounding the IP square. An AS number is mapped to a pixel on one of the 4 lines. We draw a line from an IP address to an AS number if there is an Origin AS change involving that IP address and that AS number. This mapping takes advantage of the user’s acute ability to recognize position, orientation and length. Figure 3 shows the visualization of the IP-AS relationship of Origin AS Changes on a typical day. Once again, the color of each line is based on the type of change it represents.

Since there are more AS numbers than pixels, more than one AS number maps to a pixel. Again, we provide zooming features for the user to differentiate between AS numbers which map to the same pixel in the main display. The lines representing changes for the AS in focus are shown with brighter and more saturated colors than other changes. This effectively highlights the AS, fading the other changes into the background. This is shown in Figure 4, where the pink (OS-type) lines emanating from one AS are highlighted among thousands of lines.
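The text does not say how AS numbers are laid out along the 4 lines, so the following Python sketch is purely hypothetical: it spreads the 16-bit AS number range evenly over the border of a 512 X 512 square, walking clockwise from the top-left corner. It is meant only to show why several AS numbers end up sharing one border pixel.

```python
from typing import Tuple


def as_to_border_pixel(asn: int, size: int = 512,
                       max_asn: int = 65535) -> Tuple[int, int]:
    """Place an AS number on the border around a size x size square.

    The square covers x, y in 0..size-1; the border sits one pixel outside.
    The linear, clockwise layout is an assumption made for this example.
    """
    if not 0 <= asn <= max_asn:
        raise ValueError("AS number out of range")
    perimeter = 4 * size
    pos = asn * perimeter // (max_asn + 1)   # several ASes share a position
    side, offset = divmod(pos, size)
    if side == 0:                            # top line, left to right
        return offset, -1
    if side == 1:                            # right line, top to bottom
        return size, offset
    if side == 2:                            # bottom line, right to left
        return size - 1 - offset, size
    return -1, size - 1 - offset             # left line, bottom to top


print(as_to_border_pixel(6192))
print(as_to_border_pixel(6200))   # close AS numbers can share a pixel
```

With 65536 possible AS numbers and 2048 border positions in this layout, 32 consecutive AS numbers share each pixel, which is why a zoom or focus feature is needed to tell them apart.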

4.4. Animation and other features

For the time dimension, we show one day’s data at a time, and allow the user to animate the visualization, each frame showing the next day’s data. With this “movie” display, the user can detect temporal patterns. To assist our memory of patterns from previous days, we allow a user-defined window of a certain number of days prior to the currently shown date. Data from these previous days are displayed, but with darker, less saturated colors, so that the current day’s data stands out.

Figure 3: Data on a typical day (September 24, 2000). For each change, a line is drawn between the IP prefix and the AS involved. Each line is colored according to the type of the change. On this day, there are many H-type (blue) and B-type (green) changes originating from a single AS to a few blocks of IP addresses.

For the convenience of the user, we also provide a textual display of the IP address or AS number represented by the pixel clicked by the user. Other convenience features include a slider bar that indicates the date of the data currently shown and also allows the user to choose the date to show. With the time line is a simple plot of the total number of changes of each type on each day. The plot is in the lower left of Figure 2. The current date is also displayed in text. The user can also change the date shown by typing the desired date.

By choosing parameters such as which IP prefixes to zoom in on, which AS numbers to focus on, and which types of changes to view, the user can view vastly different information. Depending on the combination of chosen parameters, the user can see the overall pattern of the data or focus attention on very specific parts of the data. Different choices reveal different anomalies and information.

Figure 2 shows the Origin AS Changes accumulated over 416 days (from January 1, 2000 to February 19, 2001). We observe that the Changes occurred in localized areas. An area on the square corresponds to a block of IP addresses sharing the same prefix. We also observe that different areas have different characteristics. For example, Changes on the lower right (128.0.0.0/8) tend to occur in larger blocks (16-bit masks).

4.5. Anomalies detected

To validate the visualization approach for anomaly detection, we had a couple of BGP experts use our tool to detect potential problems (faults and attacks) since January 1, 2000. Both of them agree that our visualization tool provides a much improved interface over the tool they were using previously, and is helpful in debugging the network.

We classify the detected anomalies into three different categories: measure intensity (the number of MOAS conflicts we observed, regardless of MOAS types), AS anomaly (unusual behavior per AS), and animation correlation (special correlation relations across the time domain).

4.5.1. Measure Intensity

Normally, the number of MOAS conflicts in the Internet is limited. When serious faults or problems happen, the number of dots (in the 3D figure) or colored lines (in the AS view) increases significantly. While it is possible that some ISPs had dramatic network topology (or configuration) changes in one single day, it is very valuable to monitor the health of the network through this measure.

For instance, on September 18, 2001, while the Nimda/CodeRed-II worms were spreading around the Internet, we can clearly observe a surge in the intensity measure for MOAS conflicts. Furthermore, since the attack was widespread around the whole Internet, we can observe that many ASes simultaneously contributed to the problems.

In another instance, on June 14, 2000, many MOAS conflicts appear in the picture. After careful analysis, 40% of the CMS conflicts are caused by AS 1591 and 35% involve the prefixes 204.208.x.x.

4.5.2. AS Anomaly

One very useful feature of our visualization tool is the capability to identify a small number of problematic ASes, because most of the practical BGP problems today involve only one or two ASes.

In Figure 4, the entire square is covered with blue lines (H-type changes). In addition, some pink lines (OS-type changes) emanating from a single AS are very noticeable. This is in contrast with the more common observation of H-type changes involving close IP addresses and a single AS, for example in Figure 3. From the picture, we easily discover this anomaly since it is highly unusual that so many H-type changes occurred on one day involving so many different ASes. It turns out that AS 7777 misconfigured their routers, announcing many prefixes, including many with 32-bit masks, which is not supposed to happen. In Figure 4, H-type lines are only drawn from the IP prefix to their previous Origin ASes and not to their new Origin AS, which is AS 7777. However, OS-type lines are drawn to their new Origin AS, which is AS 7777, since their previous Origin AS is null (see Section 4.1). This example shows that although the picture may have many lines crossing and obscuring each other, anomalies can still be detected. To overcome the clutter and get specific information regarding an individual or a group of IP prefixes and ASes, the user can select those prefixes or ASes to focus on, as mentioned in Sections 4.2 and 4.3. Other ways to avoid visual clutter are discussed in Section 4.6.

In Figure 5, yet another example appears on January 20, 2001, where we observed, through the tool, that AS-8708 falsely (most likely through misconfiguration) announced 29 /32 prefixes. On the same day, AS-6463 injected 41 CSS conflicts against AS-15290. The latter might be normal, though, because AS-6463 belongs to AT&T Canada Telecom Services, while AS-15290 belongs to AT&T Canada IES. Still, it is interesting that our tool shows a potential topology change within the same service provider.

4.5.3. Animation Correlation

The most interesting aspect of our tool is the discovery of “correlation” relations via the animation of the BGP data sets. Figure 6 shows a large number of changes due to AS 15412 erroneously announcing prefixes belonging to many different ASes on April 18, 2001. The next day, changes were made to correct the error, as shown in Figure 7. Although Figures 6 and 7 look disorderly, an identical pattern is easily observed because the changes involved the exact same prefixes and ASes, once again demonstrating the effectiveness of human pattern recognition. In fact, this storm of on-and-off CSM and CMS problems has occurred since April 6, 2001. The animation helps system administrators discover not only that a problem has occurred but also how one type of MOAS conflict affects another type.

Other anomalies observed include private AS number leakage on September 18, 2000, and many days with high O-type activity. We have not found explanations for many of these observations. With more investigation and further exploration with the visualization tool, we expect to be able to find out why these changes occurred.

4.6. Alternative representation

Another way to overcome visual clutter is presented in Figure 8. It shows an alternative representation of the data for August 14, 2000 (original representation in Figure 4). In this representation, each Origin AS change is mapped to a point on a horizontal plane in the same quad-tree manner we described. The vertical position of the horizontal plane is based on its associated AS number. Each change is shown as a cube at that position. The cube is colored according to the type of change, as before. Once again, the anomaly is revealed because many different ASes are involved. In this mapping, there are no line crossings. However, from our experience, the original 2D representation is still better at showing certain features, for example the same AS originating many far-apart IP prefixes. The user can navigate through this 3D representation by operations such as rotation, translation and zoom/pan.

Figure 4: Data on August 14, 2000. An anomaly is observed despite visual clutter. Many H-type changes involving different ASes and IP prefixes occurred. Some OS-type (pink) changes are highlighted. These OS-type changes all involve AS 7777 and far-apart IP prefixes. This also indicates a fault.

Projecting the cubes onto two perpendicular vertical planes gives us yet another alternative visual representation of the data. The projected images of each day’s data can reveal patterns of anomaly. In Figure 8, the cubes are projected as grayscale values onto two planes in the background. Figure 9 shows the result of projecting the cubes in Figure 8 onto squares colored by change type. The figure shows only the projected images and not the cubes themselves. The anomalous pattern of regularly-spaced pink (OS-type) squares is especially obvious.

5. Conclusions and future work

We have demonstrated the principles and effectiveness of using visualization as a tool for anomaly detection, and for revealing the source and nature of the detected anomalies. We believe that the visual-based approach will be widely adopted, improving the security and efficiency of the Internet.


One limitation of the current approach is that it is not easy for the user to quickly find out which ASes cause frequent changes over non-adjacent days. It is also not easy to quickly notice which AS-IP pairs occur frequently, or occur in a periodic manner. More data preprocessing incorporating statistical methods could help identify these phenomena and highlight these ASes, IP prefixes or AS-IP pairs to draw the user’s attention during interactive visualization.

Figure 5: OS-type changes on January 20, 2001. Many involve AS 8708 and IP prefixes with 32-bit masks.

Figure 6: CSM-type changes on April 18, 2001.

Figure 7: CMS-type changes on April 19, 2001. Pattern identical to CSM-type changes on the previous day (see Figure 6).

Acknowledgments

This work has been sponsored in part by NSF PECASE, NSF LSSDSV, and DOE SciDAC. We thank them for their support.

References

1. University of Oregon Route Views Project. http://www.antc.uoregon.edu/route-views/

2. C. Ahlberg and B. Shneiderman. Visual Information Seeking: Tight Coupling of Dynamic Query Filters with Starfield Displays. Proceedings CHI’94: Human Factors in Computing Systems, pp. 313–317, Boston, Massachusetts, 1994.

3. T. Atkison, K. Pensy, C. Nicholas, D. Ebert, R. Atkison, and C. Morris. Case Study: Visualization and Information Retrieval Techniques for Network Intrusion Detection. Joint Eurographics-IEEE TCVG Symposium on Visualization (VisSym01), Ascona, Switzerland, 28-30 May 2001.

4. R.J. Brachman, F. Halper, P.G. Selfridge, T. Kirk, L.G. Terveen, A. Lazar, B. Altman, D.L. McGuinness, A. Borgida, and L.A. Resnick. Integrated Support for Data Archaeology. International Journal of Intelligent and Cooperative Information Systems, 1993.

5. L. Girardin. An Eye on Network Intruder-Administrator Shootouts. Proceedings of the Workshop on Intrusion Detection and Network Monitoring (ID’99), USENIX Assoc, Berkeley, CA, USA, 1999.


Figure 8: 3D representation for August 14, 2000 (original representation for the same day’s data in Figure 4). Each AS change is represented by a cube with coordinates determined by IP prefix and AS number. Each cube is colored by its change type. Two planes in the background show a grayscale projected image of each cube. This picture shows an extraordinary number of H-type (blue) changes involving different ASes. In addition, there are also some OS-type (pink) changes arranged in a regular pattern on one horizontal plane. This corresponds to AS 7777 announcing those prefixes. This is clearly a fault.

6. J. Goldstein, S.F. Roth, and J. Mattis. A Framework for Knowledge-Based, Interactive Data Exploration. Journal of Visual Languages and Computing, pp. 339–363, December 1994.

7. M. Holsheimer and A. Siebes. Data Mining: The Search for Knowledge in Databases. Report CS-R9406, ISSN 0169-118X, Amsterdam, The Netherlands, 1991.

8. T. Lane. Hidden Markov Models for Human/Computer Interface Modeling. Proceedings of the IJCAI-99 Workshop on Learning about Users, pp. 35–44, 1999.

9. H.Y. Lee, H.L. Ong, and L.H. Quek. Exploiting Visualization in Knowledge Discovery. Proceedings of the First International Conference on Knowledge Discovery and Data Mining, pp. 198–203, Montreal, Quebec, 1995.

10. W. Lee. A Data Mining Framework for Constructing Features and Models for Intrusion Detection Systems. PhD Thesis, Columbia University, June 1999.

Figure 9: Another way of looking at the 3D representation of the data on August 14, 2000, showing the projection of the cubes (of Figure 8) onto two perpendicular planes and coloring the projected image according to change type. The cubes are not shown. On this particular day, the OS-type (pink) changes and H-type (blue) changes obviously show a pattern, in agreement with the other visual representation.

11. T. Lunt, A. Tamaru, F. Gilham, R. Jagannathan, P. Neumann, H. Javitz, A. Valdes, and T. Garvey. A real-time intrusion detection expert system (IDES) - final technical report. Technical report, Computer Science Laboratory, SRI International, Menlo Park, California, Feb 1992.

12. T. Lunt. Detecting intruders in computer systems. Proceedings of the 1993 Conference on Auditing and Computer Technology, 1993.

13. Y. Rekhter and T. Li. A Border Gateway Protocol 4 (BGP-4). RFC 1771, 1995.

14. W. Ribarsky, J. Katz, T.Y. Jiang, and A. Holland. Discovery Visualization Using Fast Clustering. Report GIT-GVU-99-14, IEEE Computer Graphics and Applications, 19(5), 32–39, 1999.

15. B. Shneiderman. Designing the User Interface: Strategies for Effective Human-Computer Interaction: Second Edition. Addison-Wesley Publ. Co., Reading, Massachusetts, 1992.

16. R. Spence. Information Visualization. ACM Press, 2000.

17. X. Zhao, D. Pei, L. Wang, D. Massey, A. Mankin, S.F. Wu, and L. Zhang. An Analysis of BGP Multiple Origin AS (MOAS) Conflicts. SIGCOMM Internet Measurement Workshop 2001.

submitted to Joint EUROGRAPHICS - IEEE TCVG Symposium on Visualization (2002)