Post on 20-Dec-2015
Internet Measurement Masterclass 2006
10:00 Session 1: Kick off, problem space, thinking ahead, you and the law
Andrew Moore - Queen Mary, University of London
11:00 Morning tea
11:15 Session 2: Monitoring with Windows and how not to be deluged with data
Dinan Gunawardena - Microsoft Research Cambridge
12:15 Hardware selection for monitoring
Fabian Schneider - TU Berlin
12:45 Lunch + concurrently with Endace hardware demonstration
13:45 Session 3: Netflow, and routing data as a source of measurement
Steve Uhlig - Delft University of Technology
14:45 Afternoon tea
15:00 Session 4: Statistics for the measurement community
Steven Gilmour - Queen Mary, University of London
15:45 Wrap-up
16:00 beer / NGN ProgNet06 workshop starts
What we won’t cover
• Active measurement (AMP, ping, traceroute, RTT, PlanetLab)
• Exhaustive survey of current measurement research
• I’m happy to provide opinion on these things in a break, but I am not an active-measurement expert; I don’t even play one on television.
WHY Measure?
• Measuring something helps you understand it
Few would argue that the Internet is not important enough to understand
- “Good data outlives bad theory” - Jeff Dozier
- “Measure what is measurable, make measurable what is not.” - after Galileo
Why? A non-exhaustive list
• Measurements are inputs to
– validate a model
– drive a simulation
– test a new approach
• Measurements help understanding (fault-finding)
• Measurements are often part of the accounting process
Why so hard?
Pick your (Endace¹) Dag board, plug it in and go. Right?
Wrong.
- Law
- Layer 2 is not always
  - accessible
  - monitor-able
- Operations staff hate you
- Data on the wire is not the only first-class measurement object
- Hardware doesn’t work
- Wrong Measurements
- Wrong Interpretation
- Wrong Problem
¹ Other monitoring boards are available
Where should I start?
• Ask WHY are you measuring?
“Measure twice & cut once”
great for carpenters but
“Think (at least) twice and measure once”
is better for us.
Pick the right tool for the right job
• Measurement of packets on a wire in your lab
– Great for observing one specific use of one set of applications in one place in the Internet
– Terrible for telling you how many mobile devices are used for IPTV in China, or the connectivity among world ISPs, or ….
Uh-Oh
• Who are you going to measure? 1 user? 1000 users?
• When? (what time of the day?)
• Where? (your personal machine, a campus? a country?)
• How?
– How long? a day? week? month?
– What method are you going to use?
Law (I am Not a Lawyer and this is UK Law)
• If in doubt, seek out advice
• Everything is illegal
• Don’t ask a question you don’t want to know the answer to.
• We care about
– RIPA (Interception)
– DPA (personal-data storage)
Many Thanks to Richard Clayton and Andrew Cormack
Data Protection Act 1998
• Overriding aim is to protect the interests of (and avoid risks to) the Data Subject
• Data processing must comply with the eight principles (as interpreted by the regulator)
• All data controllers must “notify” (£35) the Information Commissioner (unless exempt)
– Exceptions for “private use”, “basic business purpose”: see the website
Data Protection Act 1998
• Principle 7 is especially relevant
– Appropriate technical and organizational measures shall be taken against unauthorized or unlawful processing of personal data and against accidental loss or destruction of, or damage to, personal data
• The Information Commissioner advises that a risk-based approach should be taken in determining what measures are appropriate
– Management and organizational measures are as important as technical ones
– Pay attention to data over its entire lifetime
RIP Act 2000
• Part I, Chapter I interception
• Part I, Chapter II communications data
• Part II surveillance & informers
• Part III encryption
– not as relevant for this
• Part IV oversight
– sets up tribunal and interception commissioner
RIP Act 2000 - Interception
• Tapping a telephone (or copying an email) is “interception”. It must be authorized by a warrant signed by the secretary of state.
– SoS means the home secretary (or similar). Power delegation is temporary. Product is not admissible in court
• Some sensible exceptions exist
– Delivered data
– Stored data that can be accessed by the production of an order
– Techies running a network
– “Lawful business practice”
Lawful Business Practice
• Regulations prescribe how not to commit an offence under the RIP act. They do not specify how to avoid problems with DPA (or other legislation)
• Must make all reasonable efforts to tell all users of system that interception may occur
Law One-slider
• If in doubt - ask someone!
• Why do you want to do this?
– bare minimum, no “data for data’s sake”
– the onus is on you at all times to justify what you are doing
• Unless you want to have to keep the DPA happy, don’t keep any personal identifiers
• Use your University ethics committee
I am NOT a Lawyer!
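One way to act on "don't keep any personal identifiers" is to replace IP addresses with keyed pseudonyms before anything is stored. The sketch below is only an illustration, not something from the talk: the helper name `pseudonymise_ip`, the choice of HMAC-SHA256, and the 16-character truncation are all assumptions. Whether a keyed pseudonym still counts as personal data is a legal question that this code does not answer.

```python
import hashlib
import hmac
import ipaddress


def pseudonymise_ip(address: str, key: bytes) -> str:
    """Map an IP address to a stable keyed pseudonym (hypothetical helper)."""
    # ip_address() validates the text and gives a canonical byte form.
    packed = ipaddress.ip_address(address).packed
    digest = hmac.new(key, packed, hashlib.sha256).hexdigest()
    # Truncation is an arbitrary readability choice, not a security parameter.
    return digest[:16]


# The key is an assumption: generate it randomly and keep it away from the data.
key = b"replace-with-a-secret-random-key"
first = pseudonymise_ip("192.0.2.1", key)
again = pseudonymise_ip("192.0.2.1", key)
other = pseudonymise_ip("192.0.2.2", key)
```

The same address always maps to the same pseudonym under one key, so per-host analysis still works; discarding the key afterwards makes the mapping much harder to reverse.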
(Good) Measurement Principles
• Check your methodology
• Keep all Meta-data
• Calibrate your experiments
• Automate all processing
– it’s a documentation trail
– cache those intermediate results; they tell you where you went wrong
• Visualize your data at every stage
– this helps ensure you didn’t goof
Check your Methodology
• Talk to people around you, find a mentor and even an antagonist
• Better they find something wrong than the external examiner or the reviewers of the paper
• Consider the scope of a reasonable measurement and the claims you can make
Meta-Data
• the filter you used on tcpdump is meta-data.
• your methodology is meta-data
• the day/time of the week is meta-data
• the hardware you used is meta-data
• (possibly) how much alcohol in your blood-stream is meta-data
Keep it all
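A minimal sketch of "keep it all": write a small sidecar file next to every trace at capture time. The function names, the field names, and the `.meta.json` suffix are hypothetical choices for illustration; record whatever your own setup needs.

```python
import json
import platform
from datetime import datetime, timezone


def capture_metadata(capture_filter: str, interface: str, notes: str = "") -> dict:
    """Collect the meta-data listed above for one capture (hypothetical fields)."""
    now = datetime.now(timezone.utc)
    return {
        "capture_filter": capture_filter,  # e.g. the tcpdump filter expression
        "interface": interface,
        "start_time_utc": now.isoformat(),
        "weekday": now.strftime("%A"),
        "hostname": platform.node(),
        "machine": platform.machine(),
        "os": platform.platform(),
        "notes": notes,  # methodology, known problems, anything else
    }


def write_sidecar(trace_path: str, metadata: dict) -> str:
    """Store the meta-data beside the trace and return the sidecar path."""
    sidecar_path = trace_path + ".meta.json"
    with open(sidecar_path, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2, sort_keys=True)
    return sidecar_path
```

Calling `write_sidecar("trace.pcap", capture_metadata("tcp port 80", "eth0"))` would leave a `trace.pcap.meta.json` file that travels with the trace.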
Calibrate your experiments
• Test your assumptions
• (been assuming the network is busiest at midday - okay this is the moment you find that 3:30 is the busy time)
• “bench-test” your setup; this is just good science
– test your processing scripts many (many) times
• Most departments do not have good test equipment, this is no excuse
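The "busiest at midday" assumption is cheap to test once you have packet timestamps. The sketch below assumes the timestamps are Unix epoch seconds and bins them by hour in UTC; both the function names and the choice of UTC are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def packets_per_hour(timestamps):
    """Count packets per hour of day (0-23, UTC) from epoch-second timestamps."""
    counts = Counter()
    for ts in timestamps:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        counts[hour] += 1
    return counts


def busiest_hour(timestamps):
    """Return the hour of day with the most packets; raises ValueError if empty."""
    counts = packets_per_hour(timestamps)
    return max(counts, key=counts.get)


# Two packets at 12:00 UTC and three at 15:30 UTC on 1 January 1970.
sample = [12 * 3600, 12 * 3600 + 1, 15 * 3600 + 1800,
          15 * 3600 + 1801, 15 * 3600 + 1802]
peak = busiest_hour(sample)
```

If the measured peak disagrees with the hour you had planned around, better to find out before the real measurement run.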
Automate your processing
• Make is your friend
• intermediate processing (and the scripts/code that did it) are more meta-data
• critical when you want to reproduce your results (and have others reproduce your results)
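The rule Make applies is simple: rebuild a target only when it is missing or older than its inputs. The Python sketch below imitates that rule for one processing stage so that intermediate results are cached. It is an illustration under assumed names (`is_stale`, `build`), not a replacement for Make.

```python
import os


def is_stale(target, sources):
    """True if target is missing or older than any source (the rule Make applies)."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def build(target, sources, action):
    """Run action(sources, target) only when the target is stale.

    Returns True if the stage ran and False if the cached result was reused.
    """
    if is_stale(target, sources):
        action(sources, target)
        return True
    return False


def count_lines(sources, target):
    """Example stage: write the total line count of the sources to the target."""
    total = 0
    for src in sources:
        with open(src, encoding="utf-8") as handle:
            total += sum(1 for _ in handle)
    with open(target, "w", encoding="utf-8") as handle:
        handle.write(str(total) + "\n")
```

Each stage's script plus its cached output then forms the documentation trail; keeping the stage functions in version control alongside the results records how every intermediate file was produced.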
Visualize your data
• visualize your data early and often
• scatter plots are always useful
• identify/understand those outliers now
– problem? or expected result?
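A scatter plot is the first tool for spotting outliers, but a quick numeric check can list candidates to look at. The sketch below uses Tukey's fences (1.5 times the interquartile range), which is an assumption of this example and not a method named in the talk. It only flags points; deciding "problem or expected result" remains a job for the person reading the data.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return (index, value) pairs lying outside Tukey's fences.

    Needs at least two values; k=1.5 is the conventional multiplier.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


# Hypothetical round-trip times in milliseconds; one value stands apart.
samples = [10, 11, 12, 11, 10, 12, 11, 95]
suspects = flag_outliers(samples)
```

Heavy-tailed measurements will trip a rule like this constantly, so treat the flagged list as a prompt for inspection and not as a filter.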
My first network monitor
• configurations– monitor and method
• gotcha
• backhaul network
• storage, archive, index
Configuration
• Hardware selection
– How are you going to remote-admin this machine?
• OS / Software selection
– Much work in unix domain; that doesn’t make it good work (cf. Dinan, Session 2)
– tcpdump/pcap is standard and lots of tools
  • Not fast, loss/error prone, timestamps are junk
– divorce the data representation from the method
  • tcpdump is a useful offline tool but dagtools, CoMo and others (nprobe, etc) are simply better online
– consider the right tool for the task
Hardware (getting the traffic)
• Passive taps
– invasive installation
– no impact in operation
– “stealing photons”
• Port Mirrors (e.g. Cisco SPAN)
– be vewy vewy careful.
  • jitter, loss, reordering
– fantastic for multiple/redundant links
  • multiple copies of packets
Hardware 2
• Remember about physical layers?
• Observing traffic at end systems is pretty easy (but imposes an overhead)
• intermediate networks may not be trivial to monitor:
– Packet over Ethernet, Packet over Sonet are not the only possibilities
• Aside from weird layer-2s, maybe encrypted,
Getting the data to somewhere useful
• Out of Band backhaul
– Co-schedule Measurements
– FedEx the disks (realistically - postgrad-u-haul)
– Co-locate storage/processing
  • storage & processing = heat/power
– Dedicated backhaul, e.g. using (a piece of) the dedicated research net
Tools
• tcpdump (libpcap) - but know the limitations
a) no records of loss
b) microsecond accuracy only - and RARELY that
c) simultaneous arrival times are possible
d) no record of precision or accuracy or filter or conditions or monitor-circumstance or equipment failure or …
• gnuplot (or any plotting package)
scatter plots are always useful (combined with eye-squared)
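Limitations (b) and (c) can be checked directly on a trace. The sketch below reads only the headers of a classic pcap file (24-byte global header, 16-byte per-packet record headers) and counts consecutive packets that carry identical timestamps. It is a sketch under assumptions: it does not handle the pcapng format, and the function names are made up for this example.

```python
import struct

PCAP_MAGIC_USEC = 0xA1B2C3D4  # classic pcap, microsecond timestamps
PCAP_MAGIC_NSEC = 0xA1B23C4D  # nanosecond-resolution variant


def read_pcap_timestamps(path):
    """Return (resolution, [(seconds, fraction), ...]) from a classic pcap file."""
    with open(path, "rb") as handle:
        header = handle.read(24)
        if len(header) < 24:
            raise ValueError("file too short for a pcap global header")
        # The magic number tells us the byte order the writer used.
        for endian in ("<", ">"):
            magic = struct.unpack(endian + "I", header[:4])[0]
            if magic in (PCAP_MAGIC_USEC, PCAP_MAGIC_NSEC):
                break
        else:
            raise ValueError("not a classic pcap file")
        resolution = "us" if magic == PCAP_MAGIC_USEC else "ns"
        stamps = []
        while True:
            record = handle.read(16)
            if len(record) < 16:
                break  # end of file or truncated final record
            ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(
                endian + "IIII", record)
            stamps.append((ts_sec, ts_frac))
            handle.seek(incl_len, 1)  # skip the captured packet bytes
        return resolution, stamps


def count_identical_neighbours(stamps):
    """Count consecutive packets that share exactly the same timestamp."""
    return sum(1 for a, b in zip(stamps, stamps[1:]) if a == b)
```

A non-zero count is not proof of a fault, but it is a reminder that the recorded timestamp resolution and the true arrival times are different things.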
Sharing
Providing Access to the data
• Law may prevent access
• Either need to control who gets data
OR
• Ship code to monitor (Mogul et al, MineNet 2005/6)
• One Platform: CoMo http://como.sourceforge.net
These guys do run the Internet (or why I should be nice to my ops guys)
• Looking for a real problem?
• Wondering about actual impact?
• Talk to your front line
• Sysadmins and Operators are front-line
• They are rarely stupid
• Don’t have the time to “think outside the box”
• they will be honest with you (brutally honest in most cases)
• www.nanog.org
• www.ripe.org
Next….
• Let’s examine hardware and Operating Systems issues, specifically:
– Windows: the other operating-system
– Data-management: how to prevent success-disaster
– So you want to monitor 10Gbps?
UK specific resources
• Janet’s NDA and AUP: http://www.ja.net/development/traffic-data/
• Data Protection Act: http://www.hmso.gov.uk/acts/acts1998/19980029.htm
• RIPA: http://www.legislation.hmso.gov.uk/acts/acts2000/20000023.htm
Specific references
• Mark Crovella & Bala Krishnamurthy, Internet Measurement, Wiley 2006
• Walter Willinger, Pragmatic Approach to Dealing with High Variability, IMC 2004
• Vern Paxson, Sound Internet Measurement, IMC 2004
Very early “what I did with my measurements” papers; these papers are the grandparents of much Internet measurement work
• kc claffy, et al., A parameterizable methodology for Internet traffic flow profiling, IEEE JSAC, 1995
• V. Paxson, End-to-End Routing Behavior in the Internet. IEEE/ACM Transactions on Networking, Vol.5, No.5, pp. 601-615, October 1997