e-insights, llc © 2000 all rights reserved. understanding web traffic michael whelan part 1 of 2
TRANSCRIPT
E-insights, LLC © 2000 All rights reserved. www.e-insights.com
Understanding Web Traffic
Michael Whelan
Part 1 of 2
Why Do You Analyze Traffic?
• Management wants to track performance.
• Need to know inventory & usage information to support sales efforts.
• Audit requirements.
• Reconciliation with contracts/vendors.
• May be used for performance bonus targets.
Goals
• To understand the capabilities and limitations of web traffic analysis
• Identify the major pitfalls & workarounds
• Be able to identify erroneous data quickly
• Be able to track down inconsistencies
• Be able to extract marketing/customer support benefits from traffic analysis
DNS
• Domain Name
– xxx.yyy => ‘yyy’ is the top-level domain.
– ‘xxx.yyy’ is a domain name.
– abc.xxx.yyy is a machine name, as are a.b.c.d.e.xxx.yyy and aspen.xxx.yyy.
• Domain Name Service (DNS) maps from a machine name to an Internet address
– www.e-insights.com => 209.10.106.30
‘telephone book’
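The naming scheme can be sketched in Python. This is only an illustration of the simplified model on this slide, where the last label is the top-level domain and the last two labels are the domain name. Suffixes with more than one label (for example co.uk) do not fit that model. The function name split_machine_name is hypothetical.

```python
def split_machine_name(name):
    """Split a machine name using the slide's simplified two-label model."""
    labels = name.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least 'xxx.yyy'")
    return {
        "top_level_domain": labels[-1],          # 'yyy'
        "domain_name": ".".join(labels[-2:]),    # 'xxx.yyy'
        "machine_name": ".".join(labels),        # the full name as given
    }


print(split_machine_name("aspen.xxx.yyy"))
```

The forward lookup itself (name to address) can be tried with socket.gethostbyname("www.e-insights.com"). It needs network access, and the address returned today may differ from the one shown on the slide.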
Inverse DNS
• Map from IP address to machine name.
– Was not part of the original DNS spec.
– Does not have to be supported (may be required for security in certain situations).
– Frequently (>40%) simply does not exist.
‘unlisted numbers’
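A minimal sketch of an inverse lookup that tolerates the ‘unlisted number’ case. It uses Python's standard socket.gethostbyaddr. The wrapper name machine_name_for and its resolver parameter are hypothetical and exist only so the function can be tried without a network.

```python
import socket


def machine_name_for(ip, resolver=socket.gethostbyaddr):
    """Return the machine name for an IP address, or None if no inverse record exists."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None
```

A log analyzer built this way has to be prepared to report the bare IP address whenever None comes back.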
Start with the Browser
HTTP://www.e-insights.com/index.html
Browser => DNS Server: Whois www.e-insights.com
DNS Server => Browser: 209.10.106.30
Browser => E-insights Server: Connect to 209.10.106.30
Browser => E-insights Server: GET /index.html HTTP/1.0…..
E-insights Server => Browser: <html><BODY BGCOLOR=red>Hi There.<IMG SRC=/images/E-logoA.gif></body></html>
Browser: Render the page
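The browser's side of this exchange can be sketched as follows: split the URL into the machine to connect to and the request to send. The function name build_get_request is hypothetical. The Host header is an assumption (the slide elides the remaining request lines), and nothing is actually sent over the network here.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request_bytes) for a plain HTTP/1.0 GET of the URL."""
    parts = urlsplit(url)
    host = parts.hostname          # the name that DNS will translate
    port = parts.port or 80        # default HTTP port when none is given
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Assumed header: Host is optional in HTTP/1.0 but commonly sent.
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    return host, port, request.encode("ascii")


host, port, request = build_get_request("HTTP://www.e-insights.com/index.html")
print(host, port, request)
```

Only the path part of the URL reaches the server in the request line, which is why the logs record ‘/index.html’ rather than the full URL.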
HTML - a little more
• Colors/font sizes & styles
• Actual text (and links).
• Javascript code.
• Frame set definitions.
• References to Images, style sheets, and possibly frames.
• References to java or shockwave, etc.
• References to javascript files.
Each ‘referenced’ element involves a separate transaction with the server.
HTML Example
<html>
<head><title>Yahoo! Shopping</title></head>
<body bgcolor="#ffffff">
<center><table cellpadding=2 cellspacing=0 width=675>
<tr><td valign=middle width="1%"><a href="http://shopping.yahoo.com">
<img border=0 height=35 width=314 src="http://us.i1.yimg.com/us.yimg.com/i/sh/sh41.gif" alt="Yahoo! Shopping"></a></td>
<td align=right nowrap valign=bottom><font face=arial size="-1">
<a href="http://shopping.yahoo.com">Shopping Home</a> -
<a href="http://www.yahoo.com">Yahoo!</a> -
<a href="http://help.yahoo.com/help/shop/">Help</a></font>
<hr size=1 noshade></td></tr>
</table></center>
References => Transactions
• Each ‘referenced’ element is separately requested and transferred to the browser. Each transfer is recorded in the server logs.
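To get a feel for how many transactions one page generates, the references can be pulled out of the HTML with Python's standard html.parser. This is a rough sketch: the table of tags and attributes is an illustrative assumption and not an exhaustive list, and the names FETCHING_ATTRIBUTES, ReferenceCollector and referenced_elements are hypothetical.

```python
from html.parser import HTMLParser

# Tag -> attribute that makes the browser fetch another resource (assumed, incomplete).
FETCHING_ATTRIBUTES = {
    "img": "src", "script": "src", "frame": "src",
    "iframe": "src", "embed": "src", "link": "href",
}


class ReferenceCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        wanted = FETCHING_ATTRIBUTES.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.references.append(value)


def referenced_elements(html):
    collector = ReferenceCollector()
    collector.feed(html)
    collector.close()
    return collector.references


page = "<html><BODY BGCOLOR=red>Hi There.<IMG SRC=/images/E-logoA.gif></body></html>"
print(referenced_elements(page))
```

Ordinary links (<a href=...>) are deliberately not counted: they are only requested if the visitor clicks them.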
What’s in a Log ?
• The actual contents, format and ordering can be customized on the servers.
• Browser identification (e.g. IE5.1 or NS4).
• Date and time of request
• Requesting IP address
• Request & item
• Status
• Number of bytes sent
Sample Log - Apache - simple
#Fields: date time c-ip cs-method cs-uri-stem cs-protocol sc-status sc-bytes
    cs(User-Agent) cs(Referer)
2000-02-21 00:01:26 192.168.2.100 GET / HTTP/1.0 200 5199 - -
2000-02-21 11:07:32 192.168.2.108 GET /buzzco/logs/byminplot.php3 HTTP/1.0 200 14093
    Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt)
    http://www.e-insights.com/buzzco/logs/logsummary.php3
2000-05-31 13:22:15 216.206.70.134 GET /meyers/ HTTP/1.1 401 483
    Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt)
    http://monitor2/
2000-05-31 14:58:34 206.189.239.171 GET /images/ei-logoC.gif HTTP/1.0 200 496
    Mozilla/3.01+(compatible;)
    -
Note - I have added line breaks and tabs to make this more readable; each log entry is actually recorded as a single line.
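Because the ‘#Fields:’ line names the columns, a reader for this kind of log can be sketched in a few lines of Python. This assumes each entry is on one line with space-separated fields (spaces inside the browser string appear as ‘+’ in this sample). The function name parse_fields_log is hypothetical.

```python
def parse_fields_log(lines):
    """Yield one dict per entry, keyed by the most recent '#Fields:' directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split()))


sample = [
    "#Fields: date time c-ip cs-method cs-uri-stem cs-protocol sc-status sc-bytes cs(User-Agent) cs(Referer)",
    "2000-05-31 14:58:34 206.189.239.171 GET /images/ei-logoC.gif HTTP/1.0 200 496 Mozilla/3.01+(compatible;) -",
]
entries = list(parse_fields_log(sample))
print(entries[0]["cs-uri-stem"], entries[0]["sc-status"])
```

Reading the field list from the log itself, rather than hard-coding positions, keeps the reader working when the server's log configuration is changed.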
Extended Log Options
• Cookies
• Query Strings
• Referrer Location
• Time to complete request
• Bytes received
Sample Log2 - NS/BV
format=%Ses->client.ip% - %Req->vars.auth-user% [%SYSDATE%] "%Req->reqpb.clf-request%"
    %Req->srvhdrs.clf-status% %Req->srvhdrs.content-length%
    "%Req->headers.referer%" "%Req->headers.user-agent%"
205.217.100.73 - - [27/Aug/2000:00:00:37 -0400]
    "GET /cgi-bin/pm/international/community.jsp?channel=International&community=Dragracing HTTP/1.1"
    200 -
    "-"
    "Mozilla/4.0 (compatible; MSIE 5.01; Windows 98; TDSNET71)"
205.188.197.51 - - [27/Aug/2000:00:00:35 -0400]
    "GET /cgi-bin/pm/showroom/showroomview.jsp?channel=Truckin&community=Ford&oid=12958 HTTP/1.0"
    200 -
    "http://www.grstgv.com/cgi-bin/pm/search/showroom_searchresult.jsp?ResultStart=20&ResultCount=10"
    "Mozilla/4.0 (compatible; MSIE 5.0; AOL 5.0; Windows 95; DigExt)"
207.115.63.13 - - [27/Aug/2000:00:00:34 -0400]
    "GET /articles/013680af/013680afp07s05.jpg HTTP/1.0"
    200 15240
    "http://www.grstgv.com/cgi-bin/pm/common/morePhotos.jsp?channel=Electronics&community=DIY&oid=24952&contentType=Feature"
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
Note - I have added line breaks and tabs to make this more readable; each log entry is actually recorded as a single line.
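Entries in this layout can be pulled apart with a regular expression. The sketch below assumes each entry is on a single line with single spaces between fields, in the order given by the format line above. The names LOG_PATTERN and parse_entry are hypothetical.

```python
import re

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


line = (
    '207.115.63.13 - - [27/Aug/2000:00:00:34 -0400] '
    '"GET /articles/013680af/013680afp07s05.jpg HTTP/1.0" 200 15240 '
    '"http://www.grstgv.com/cgi-bin/pm/common/morePhotos.jsp'
    '?channel=Electronics&community=DIY&oid=24952&contentType=Feature" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"'
)
entry = parse_entry(line)
print(entry["ip"], entry["status"], entry["bytes"])
```

Returning None for lines that do not match lets the caller count and inspect them instead of silently dropping them. The byte count is kept as text because this server writes ‘-’ when no length is known.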
Sample Log3 - MS/IIS
#Version: 1.0
#Fields: date time c-ip cs-authname s-ip s-sitename cs-method cs-uri-stem cs-uri-query c-version
    sc-status sc-bytes cs-bytes cs(User-Agent) cs(Cookie) sc(Referer)
2000-09-02 05:00:00 205.188.197.34 - 192.168.2.2 host.whobei.com GET /jetson/Detailed_Quote.html
    Symbol=MAYS&nocache=278099 HTTP/1.0 200 29396 368
    "Mozilla/4.0 (compatible; MSIE 5.0; AOL 5.0; Windows 98; DigExt)"
    "TUSER=1649012.60578.rt0; AccipiterId=00000000*Def; PortfDispPrefs=0" ""
2000-09-02 05:00:00 24.177.29.24 - 192.168.2.2 host.whobei.com GET /jetson/Real_Time_Quote.html
    Type=Real&Symbol=MTIC&nocache=943545020770 HTTP/1.1 302 288 473
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
    "TUSER=256182.292920.rt0; AccipiterId=00000000*Def; PortfDispPrefs=0" ""
2000-09-02 05:00:00 205.188.198.154 - 192.168.2.2 host.whobei.com GET /jetson/Detailed_Quote.html
    Symbol=BBY&nocache=282499 HTTP/1.0 200 29368 461
    "Mozilla/4.0 (compatible; MSIE 5.0; AOL 4.0; Windows 95; DigExt)"
    "GUID=000E71DE876609B052ABE9630A001608; AccipiterId=00000000*Def" ""
Note - I have added line breaks and tabs to make this more readable; each log entry is actually recorded as a single line. IPs and names have been modified.
Server defaults
• When a request does not name a specific resource, the server looks through an ordered list of ‘defaults’ until a match is found and returns that resource.
• However, the log entry records what was asked for, not what was returned.
Example: http://ww.acme.com/ returns ‘default.htm’, but the log shows ‘/’.
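The gap between what is logged and what is served can be sketched as below. The function serve, the set-of-files representation, and the particular default names and their order are all assumptions for illustration; real servers make the default list configurable.

```python
def serve(requested_path, files, defaults=("index.html", "default.htm")):
    """Return (logged_path, served_file); served_file is None when nothing matches."""
    if not requested_path.endswith("/"):
        served = requested_path if requested_path in files else None
        return requested_path, served
    for name in defaults:  # ordered list: the first match wins
        candidate = requested_path + name
        if candidate in files:
            return requested_path, candidate
    return requested_path, None


print(serve("/", {"/default.htm", "/images/logo.gif"}))
```

For traffic analysis the consequence is that ‘/’ and ‘/default.htm’ can be the same page under two names in the log, so their counts usually need to be combined.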
HTTP Status Codes
• 200 - OK
• 300’s Moved
– 301 permanently
– 302 temporarily
– 304 not modified
• 400’s Error
– 400 bad request
– 401 unauthorized
– 403 forbidden
– 404 not found
• 500’s Server Errors
– 500 internal error
– 503 too busy
Note - not all codes are shown. Bold are most important.
Some Definitions
• Page View - one person looking at one page of information they have asked for.
• Visitor - a distinct individual who came to the site at least once during a specified period.
• Visit, or visitor session - activity of a specific visitor such that there were no ‘pauses’ of greater than (for example) 30 minutes.
Basic Questions
1. How many page views were there ?
2. How many visitors were there ?
3. How many site ‘visits’ were there ?
4. How long did people stay on the site ?
5. Where did people come to the site from ?
6. When people left the site where did they go ?
7. Who (where) are these people anyway ?
Page Views
• Count all lines in the logfile which
– 1) were not errors
– 2) were not images or javascript or style ..
– 3) were not ‘monitors’ or other automated processes.
– Challenge is knowing what not to count - what to ‘filter out’. Frequently hard, sometimes impossible.
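The three filters can be sketched as below. Everything specific here is an assumption to be tuned per site: the entry layout (a dict with ‘ip’, ‘path’ and ‘status’), the extension list, treating status 400 and above as errors, and identifying monitors by IP address.

```python
import posixpath

NON_PAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".js", ".css"}  # assumed list


def is_page_view(entry, monitor_ips=frozenset()):
    """entry: dict with 'ip', 'path' and 'status' keys (hypothetical layout)."""
    if int(entry["status"]) >= 400:                    # 1) errors
        return False
    path = entry["path"].split("?", 1)[0]              # ignore any query string
    if posixpath.splitext(path)[1].lower() in NON_PAGE_EXTENSIONS:
        return False                                   # 2) images, scripts, styles
    if entry["ip"] in monitor_ips:                     # 3) known monitors
        return False
    return True


def count_page_views(entries, monitor_ips=frozenset()):
    return sum(1 for entry in entries if is_page_view(entry, monitor_ips))


log = [
    {"ip": "192.168.2.100", "path": "/", "status": "200"},
    {"ip": "192.168.2.108", "path": "/buzzco/logs/byminplot.php3", "status": "200"},
    {"ip": "216.206.70.134", "path": "/meyers/", "status": "401"},
    {"ip": "206.189.239.171", "path": "/images/ei-logoC.gif", "status": "200"},
]
print(count_page_views(log))
```

An exclusion list like this only catches what you already know about, which is the ‘frequently hard, sometimes impossible’ part.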
Visitors
• Count the number of distinct visitors .
• Challenge - How do you know which traffic is from one visitor and which is from another ?
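One common heuristic, sketched here under stated assumptions, is to identify a visitor by a cookie value when one was logged and otherwise fall back to the IP address plus browser string. The entry layout and the ‘cookie_id’ key are hypothetical, and the fallback can both merge different people (shared proxy) and split one person (changing IP), so the result is an estimate.

```python
def visitor_key(entry):
    """Prefer a cookie ID when one was logged; otherwise use IP plus browser."""
    cookie = entry.get("cookie_id")
    if cookie:
        return ("cookie", cookie)
    return ("ip+agent", entry["ip"], entry.get("agent", "-"))


def count_visitors(entries):
    """Count distinct visitor keys across all entries."""
    return len({visitor_key(entry) for entry in entries})


log = [
    {"ip": "205.188.197.34", "agent": "MSIE 5.0; AOL 5.0", "cookie_id": "TUSER=1"},
    {"ip": "205.188.198.154", "agent": "MSIE 5.0; AOL 5.0", "cookie_id": "TUSER=1"},
    {"ip": "24.177.29.24", "agent": "MSIE 5.5"},
    {"ip": "24.177.29.24", "agent": "MSIE 5.5"},
]
print(count_visitors(log))
```

In this made-up sample the first two entries share a cookie but arrive from different IP addresses. Counting by IP alone would report them as two visitors.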
Visits
• Start at the beginning and check log entries in time order. Keep track of when you last saw traffic from a particular visitor. If this is the first time you have seen that visitor, it’s a new visit; if the time since the last traffic from that visitor is greater than 30* minutes, it’s also a new visit.
• Challenge - the same as for visitors, plus the choice of time interval, so it is ‘harder’.
* 30 minutes is the value normally used, but it can vary.
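The procedure can be sketched directly in Python. It assumes the hits have already been reduced to (visitor, timestamp) pairs, however the visitor was identified. The function name count_visits and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def count_visits(hits, timeout=timedelta(minutes=30)):
    """hits: iterable of (visitor_id, datetime). Returns the number of visits."""
    last_seen = {}
    visits = 0
    for visitor, when in sorted(hits, key=lambda hit: hit[1]):  # time order
        previous = last_seen.get(visitor)
        if previous is None or when - previous > timeout:
            visits += 1            # first sighting, or a pause longer than the timeout
        last_seen[visitor] = when
    return visits


hits = [
    ("A", datetime(2000, 8, 27, 0, 0)),
    ("A", datetime(2000, 8, 27, 0, 10)),
    ("B", datetime(2000, 8, 27, 0, 20)),
    ("A", datetime(2000, 8, 27, 1, 0)),   # 50 minutes after A's previous hit
]
print(count_visits(hits))
```

Here visitor A accounts for two visits and B for one. Raising the timeout to an hour would merge A's two visits, which is why the chosen interval should be reported alongside the count.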
How Long did Visitors Stay ?
• Do the ‘visit’ analysis & record the times for each visit - take average.
• Challenges - as for visits, plus visits that span the beginning or end of the reporting period, and the impact of ‘automated monitors’, which may appear as a small number of VERY LONG visits.
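A sketch of the duration calculation, measuring each visit from its first hit to its last. It assumes (visitor, timestamp) pairs as input and a 30 minute timeout. The function names are hypothetical. Note that a visit consisting of a single hit gets a duration of zero, since the log says nothing about how long the last page was read.

```python
from datetime import datetime, timedelta


def visit_durations(hits, timeout=timedelta(minutes=30)):
    """hits: iterable of (visitor_id, datetime). Returns one timedelta per visit."""
    open_visits = {}   # visitor -> [first_hit, last_hit] of the current visit
    durations = []
    for visitor, when in sorted(hits, key=lambda hit: hit[1]):
        current = open_visits.get(visitor)
        if current is not None and when - current[1] <= timeout:
            current[1] = when                      # same visit continues
        else:
            if current is not None:
                durations.append(current[1] - current[0])
            open_visits[visitor] = [when, when]    # a new visit starts
    durations.extend(last - first for first, last in open_visits.values())
    return durations


def average_duration(durations):
    return sum(durations, timedelta()) / len(durations) if durations else timedelta()


# A hypothetical monitor that polls every 5 minutes for a day is one very long 'visit'.
start = datetime(2000, 8, 27)
monitor = [("monitor", start + timedelta(minutes=5 * i)) for i in range(288)]
print(visit_durations(monitor))
```

Mixing a few such monitor ‘visits’ into real traffic pulls the average up sharply, so it is worth filtering them out first or reporting a median as well.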
Where Did These Visitors Come From?
• Use the referrer field.
• Challenges - it tells you the page (no query); it is sent by the browser, so it may look different; and it is not always logged.
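A simple way to summarize the referrer field is to tally the referring hosts, keeping a separate bucket for entries where nothing was logged. This sketch uses Python's standard urllib.parse. The function name referrer_hosts and the bucket labels are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def referrer_hosts(referrers):
    """Tally referring hosts; '-' or an empty value means no referrer was logged."""
    counts = Counter()
    for value in referrers:
        if not value or value == "-":
            counts["(none)"] += 1
            continue
        try:
            host = urlsplit(value).hostname
        except ValueError:          # malformed value sent by the browser
            host = None
        counts[host or "(unparseable)"] += 1
    return counts


sample = [
    "-",
    "http://www.e-insights.com/buzzco/logs/logsummary.php3",
    "http://monitor2/",
    "-",
]
print(referrer_hosts(sample))
```

A large ‘(none)’ bucket is normal, and referrals from the site's own host are internal navigation rather than arrivals.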
Where did the People Go ?
• You cannot tell at all if they simply typed in another URL or picked a site from their history list or favorites list.
• You also cannot tell if they follow a normal link on your site to another site.
• You can track where they go if the site is coded to use ‘re-direct’ scripts.
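A ‘re-direct’ script can be sketched as two pieces: links on the site point at the site's own script instead of directly at the other site, and the script records the destination before sending the browser on with a 302. The /redirect path, the ‘to’ parameter and both function names are hypothetical, and no particular web framework is assumed.

```python
from urllib.parse import parse_qs, quote


def tracked_link(target_url):
    """Build a link to the site's own (hypothetical) /redirect script."""
    return "/redirect?to=" + quote(target_url, safe="")


def handle_redirect(query_string, exit_log):
    """Record the destination, then answer with a 302 pointing at it."""
    targets = parse_qs(query_string).get("to")
    if not targets:
        return 400, {}                       # bad request: no destination given
    exit_log.append(targets[0])
    return 302, {"Location": targets[0]}


exit_log = []
link = tracked_link("http://www.yahoo.com")
status, headers = handle_redirect(link.split("?", 1)[1], exit_log)
print(link, status, headers, exit_log)
```

Because the click first comes back to your own server, it shows up in your own logs (as a 302 entry with the destination in the query string) even though the visitor ends up elsewhere.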
Who are these people anyway ?
• You don’t know the people at all.
• You may know the ‘computers’ .
• Proxies, Firewalls, ISP’s, all ‘hide’ computers behind a single IP.
• Unless - you use cookies, possibly combined with registration.
• Continued in Part 2