Internet / Intranet
Fall 2000
Class 4
Web Server Technology
HTTP Protocol
Log Files
Brandeis University Internet/Intranet Spring 2000 2

Class 4 Agenda
  Discuss Homework
    Milestone 2 Due Week 6
    Mini-Homework Due Next Week
  Overview of Web Servers and Server Technology
  Presentations
  HTTP
    The Protocol For Communication Between Web Browser and Server
  Log Files
  Lab Work
    HTTP Log Files (Mini-Homework)

Web Servers
  A Basic Web Server is Just a File Server
    Client Requests a File via HTTP Protocol
    Server Delivers the File via HTTP Protocol
    Server Maps URL to a Subdirectory
    Web Server Needs Appropriate Permissions to Access Files/Directories
  Supports Non-HTTP Protocols
    FTP, Gopher, etc.
  A Web Server is Not HTML Specific
    Typically Identifies a Filetype by Extension
      Or Directory Where File Exists
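
The "file server" view above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document root `/var/www/html` and the function names `map_url_to_file` and `guess_content_type` are hypothetical, and a real server reads its mapping from configuration.

```python
import mimetypes
import posixpath
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Hypothetical document root; a real server takes this from its config file.
DOC_ROOT = PurePosixPath("/var/www/html")


def map_url_to_file(url, doc_root=DOC_ROOT):
    """Map the path part of a URL onto a file below the document root."""
    url_path = unquote(urlsplit(url).path)
    # Normalizing from "/" collapses "." and ".." components, so the
    # result cannot climb above the document root.
    normalized = posixpath.normpath("/" + url_path.lstrip("/"))
    return doc_root / normalized.lstrip("/")


def guess_content_type(file_path):
    """Identify the file type from its extension."""
    content_type, _encoding = mimetypes.guess_type(str(file_path))
    return content_type or "application/octet-stream"
```

For example, `map_url_to_file("http://www.example.com/docs/index.html")` yields a path below the assumed root, and `guess_content_type("index.html")` reports `text/html`.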

Additional Common Web Server Features
  Additional Security Beyond That Provided by O/S
  Scripting
    Ability to Dynamically Create a Web Page
    Run a Program Instead of Returning a File (CGI)
      Return the Program Output as the Requested File
  Administration
  Log Files
  Performance Monitoring
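
A minimal sketch of the CGI idea above: the server runs a program and returns whatever it writes to standard output as if it were the requested file. The function name `cgi_response` and the greeting are hypothetical; only the header, blank line, body layout comes from the CGI convention.

```python
import html


def cgi_response(name):
    """Return the text a CGI program would write to standard output."""
    body = "<html><body><h1>Hello, {}</h1></body></html>".format(html.escape(name))
    header = "Content-Type: text/html"
    # A blank line separates the header from the generated page.
    return header + "\r\n\r\n" + body


if __name__ == "__main__":
    print(cgi_response("Class 4"))
```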

Advanced Web Server Features
  Virtual Hosting
    Allow Multiple URLs to Map to Same Computer
  Performance Optimization
    Caching
    Reliability
    Scalability
  Proxy Servers (For Security and Performance)
    Fetch Documents That are on Other Computers
      Cache Them Locally
    Allows for Easy Scalability
      Multiple Proxy Servers Can Cache Documents From One Source Computer
  Embedded Scripting
    Server Side Includes
    Custom Scripting Languages
  Server API

Web Servers – Added Functionality
  Database Connectivity
    SQL, MySQL
  Directory Listings
    Icons, etc.
  Built-In Search Engines
  Built-In ImageMap Handling
  Multimedia Support
  Session Emulation
  Streaming Multimedia
  Advanced Security
    Encrypted HTTP
    S-HTTP (Secure HTTP) – CommerceNet
    SSL (Secure Sockets Layer) – Netscape
  Web Server “Add-Ons”
    CGI Substitutes / CGI Optimizations
      Cold Fusion

Web Server History
  All Web Servers Have a Common Root
    httpd (NCSA)
      UNIX Orientation
      Many Features are Essentially UNIX Features
  Apache
  Website (O’Reilly)
  Netscape Enterprise Server
  Microsoft Internet Information Server
  A Slew of Others

Apache
  UNIX Origins – Now Ported to NT
  Evolved From httpd
  Freeware
  Typical UNIX Application
    Public Source Code
    Many Defaults, Conventions
      BUT: All is Configurable
    No GUI Interface
    Configured via Scripts, Shell Commands, Config Files
  Various “Flavors”
  Many Optional Features
    API
    ApacheSSL

IIS / Netscape
  Microsoft IIS
    Not Strictly Derived From httpd/Apache
    Windows NT
    However: Functionally Very Similar to Apache
      Emulates Many UNIX Conventions
      E.g. Forward Slashes
    Configuration via GUI
    Personal Web Server
    Peer Web Server
  Netscape
    Multi-Platform
      UNIX is Preferred Platform
    Less “Open” Than Apache
    More Secure?

UNIX File Structure
  Forward Slashes (/) to Separate Filenames, Directories
  Case Sensitive File Names
    Windows is Not
  No Limit on Filename Size / Extensions
    Extensions are by Convention
  Root is “/”
  User Home Directory is: “~/”
  Symbolic Links / Aliases
    Directories Can Be Spread Over Multiple Drives
    Can Create Non-Hierarchical Structure
  File Permissions
    Read, Write, Execute
    Separate Permissions for Owner, Group, All
    Directories are Special Cases of Files
      Execute Permissions = Able to Browse Directory
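
The read/write/execute bits for owner, group, and all can be decoded from a numeric mode with Python's standard `stat` module. The helper name `describe_permissions` is hypothetical; the `stat` constants are standard.

```python
import stat


def describe_permissions(mode):
    """Return the rwx string for owner, group, and all (other) users."""
    classes = (
        ("owner", stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        ("group", stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        ("all", stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    )
    result = {}
    for who, read_bit, write_bit, exec_bit in classes:
        result[who] = "".join(
            letter if mode & bit else "-"
            for letter, bit in (("r", read_bit), ("w", write_bit), ("x", exec_bit))
        )
    return result
```

A typical web document with mode `0o644` decodes to `rw-` for the owner and `r--` for group and all. A directory with mode `0o755` shows the execute bit that permits browsing it.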

Web Server Configuration
  Directory Structure
    Virtual Document Tree
    Access to User Directories
      UNIX: ~user
    Symbolic Links
      Be Careful: May Link You Out of Directory Structure
    Case Sensitivity
  Ownership Access
    Server is a Process Started by a User.
      Has the Permissions of the User Who Started It.
  Default Documents
  Allow Directory Browsing
  Scripting
    Who is Allowed to Run Scripts?
    How are Scripts Identified?

Web Server File Access Control / Security
  Directory
    O/S Level Security
    IP, Domain Level Security
      Spoofing
    Directory Access
      .htaccess
      Microsoft Front-Page Extensions
  Encryption
    S-HTTP
      Web Protocols Only
    SSL
      TCP/IP Level
      V1.0 – V2.X : Security Holes Found, Fixed
      V3.0 Is Current
      Uses Port 443
    Microsoft PCT
      Response to Holes in SSL 2.0
      Now Use SSL

Server Administration
  Need Sysadmin and O/S Expertise
  Lots of “Holes”, Gotchas Whenever Scripts are Allowed
    FTP
  Who is Allowed to Change Documents?
  Who is Allowed to Change Server Configuration?
  How do They Get Access?
    Direct Access
    Remote Access (e.g. FTP)
  Log Files
    Accessibility
    Directory Structure
    Management

HTTP
  The Protocol For Requesting and Delivering Web Pages
    Not Restricted to Returning HTML Files
  Client Server Model
    Request / Response
  TCP/IP Protocol Using Port 80
    Supports Other Ports, Can Be Run Over Other Protocols
  “Replaced” FTP as the Primary Method For Internet File Transfer
  Stateless
  Uses MIME Format to Encapsulate Data
  Message Structure Similar to SMTP Mail Messages
    Message Header (metadata)
    Message Body (data)
      Separated From Header by a Blank Line
    Browser Only Displays Body, Not Header
    No Restrictions on Message Size / Format (as with SMTP)
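
The header/blank line/body structure described above can be seen by splitting a raw message by hand. This is a sketch only: `split_http_message` is a hypothetical helper, and it ignores details such as repeated or folded header lines.

```python
def split_http_message(raw):
    """Split raw HTTP bytes into (start line, header dict, body bytes)."""
    # The first blank line (CRLF CRLF) ends the header.
    head, _separator, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body
```

The browser would render only the returned `body`. The start line and the `headers` dictionary are the metadata it consumes silently.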

HTTP Versions
  HTTP 1.0 - Commonly Used Version
  HTTP 1.1
    Formalizes Many Extensions to Version 1.0
    Supports Persistent Connections
    Supports Compression/Decompression
    Supports Virtual Hosting
      Single Server With Multiple Host Names (Host Header Removes the Need for One IP Address per Site)
    Supports Multiple Languages
    Supports Byte Range Transfers
      Useful For Re-Sending Interrupted Data Transfers
      Similar to Process Used By XMODEM, etc.
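
A rough sketch of resuming an interrupted transfer with a byte range. Both function names are hypothetical, and the server side handles only the open-ended `bytes=N-` form; a real server also handles other range forms and out-of-range requests.

```python
def resume_header(bytes_received):
    """Client side: ask for everything after the bytes already received."""
    return {"Range": "bytes={}-".format(bytes_received)}


def serve_range(data, range_value):
    """Server side: answer an open-ended 'bytes=N-' range request."""
    start = int(range_value[len("bytes="):].rstrip("-"))
    chunk = data[start:]
    # Content-Range reports first byte, last byte, and total size.
    content_range = "bytes {}-{}/{}".format(start, len(data) - 1, len(data))
    return 206, content_range, chunk
```

Status 206 (Partial Content) tells the client that only the requested part of the file is being returned.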

HTTP OVERVIEW
  [Diagram; labels: Client (Browser), Web Server, File System, Server Application, HTTP Request, HTTP Response, HTML, CGI]

HTTP Commands
  Simple Structure
  Main Methods
    GET <URI> HTTP/1.0
      Request the File Specified By the URL
      URI is URL Without Protocol/Host/Port
    HEAD
      Request the HTTP Header Information Only
        Don’t Return the File Itself
    POST
      Sends Data to The Server
      Typically Data From a Form
  Defined, But Not Widely Implemented
    PUT
    DELETE
    LINK
    UNLINK
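
To make the request line concrete, here is a sketch that turns a URL into the text a client would send. `build_request` is a hypothetical helper and the example host is made up; the `Host` line is added because virtual hosts depend on it.

```python
from urllib.parse import urlsplit


def build_request(method, url, version="HTTP/1.0"):
    """Build the request text for a URL (request line plus Host header)."""
    parts = urlsplit(url)
    # The request URI is the URL without protocol, host, and port.
    uri = parts.path or "/"
    if parts.query:
        uri += "?" + parts.query
    lines = ["{} {} {}".format(method, uri, version), "Host: " + parts.netloc, "", ""]
    # The trailing empty strings produce the blank line that ends the header.
    return "\r\n".join(lines)
```

Calling `build_request("HEAD", url)` with the same URL changes only the method, which is why HEAD is handy for checking a document without downloading it.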

Common HTTP Header Fields
  Additional “Parameters” to the HTTP Commands
  Used in HTTP Requests:
    Accept
      Lists the MIME Types That Client Can Accept
        E.g. Accept: text/plain, text/html or Accept: */*
    Accept-Charset
      Lists the Character Sets That Client Can Accept
        ASCII, ISO-8859-1 Are Assumed
    Accept-Encoding
    Accept-Language
    Authorization
      Basic – UserName:Password (Base64 Encoding)
    Cookie
    From
      E-mail Address of Requesting User
      Not Typically Used For Privacy Reasons
      Primarily Used By Automated Clients (e.g. Bots)
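
The Basic scheme mentioned above can be sketched with the standard `base64` module. The helper names are hypothetical. The example also shows that Base64 is an encoding, not encryption: anyone who sees the header can recover the password.

```python
import base64


def basic_authorization(username, password):
    """Build the value of an Authorization header for the Basic scheme."""
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic(header_value):
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _space, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _colon, password = decoded.partition(":")
    return username, password
```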

Common HTTP Header Fields (2)
    Host
      Virtual Host – One Server Handles Multiple Sites
    If-Modified-Since
      Only Return Data if it Has Been Modified Since This Date
    Pragma
      General Purpose For “Additional” Headers Not in Standard
    Referer (Referrer; the Header Name is Spelled With One “r”)
      The URL That Referred One to This URL
    User-Agent
      Name/Version of the HTTP Client
  Used in HTTP Responses:
    Allow
      Lists the Available Commands Supported by Server
    Content-Encoding
      Allows for Passing Data in Compressed Formats
    Content-Language
      Describes the Natural Language of the Intended Audience

Common HTTP Header Fields (3)
    Content-Length
      Size of the Message Body
    Content-Type
      The MIME Type For the Data
    Date
    Expires
      HTTP Clients Should Not Cache Data After This Date
    Last-Modified
    Location
      Used For Redirection
    MIME-Version
    Pragma
      E.g. no-cache
    Retry-After
      When Server is Unavailable. Info On When to Try Back
    Server
      Name/Version of the HTTP Server

Common HTTP Header Fields (4)
    Title
      Descriptive Title of the File
    WWW-Authenticate
      When Authorization Denied, Tells Client Which Methods of Authentication are Supported
HTTP Status Codes
  Returned By the Server In First Line of Response
  Informational (100-199)
  Successful (200-299)
  Redirection (300-399)
    Location in HTTP Header Specifies Redirection
  Client Error (400-499)
  Server Error (500-599)
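
Because the class of a status code is determined by its first digit, the first line of a response can be classified mechanically. The names `STATUS_CLASSES` and `parse_status_line` are hypothetical, and the sketch assumes the line contains a reason phrase.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line):
    """Split 'HTTP/1.0 404 Not Found' into version, code, reason, and class."""
    version, code_text, reason = line.split(" ", 2)
    code = int(code_text)
    # Integer division by 100 leaves the leading digit, which names the class.
    return version, code, reason, STATUS_CLASSES.get(code // 100, "Unknown")
```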

Common Status Values
  200 – OK
  201 – Created (Post Request Was Fulfilled)
  204 – No Content (OK. Nothing For Client to Display)
  300 – Multiple Choices
    Requested Resource Available From Multiple Locations.
    List of Locations Returned in the Response.
  301 – Moved Permanently
  302 – Moved Temporarily
  304 – Not Modified
    Document Hasn’t Been Modified Since If-Modified-Since Date
  400 – Bad Request
  401 – Unauthorized
  403 – Forbidden
  404 – Not Found
  500 – Internal Server Error
  501 – Not Implemented (Server Does Not Support This Request)
  502 – Bad Gateway (Invalid Response From Upstream Server)
  503 – Service Unavailable
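
The 304 case can be sketched as the decision a server makes when a request carries If-Modified-Since. `conditional_get_status` is a hypothetical helper; it assumes `last_modified` is a timezone-aware datetime and simply ignores a date it cannot parse.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(last_modified, if_modified_since=None):
    """Return 200 if the document must be sent, 304 if the cached copy is current."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: behave as if the header were absent
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    return 200 if last_modified > since else 304
```

With a 304 the server sends headers only, so the client reuses its cached copy and the log records a hit with little or no transfer volume.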

Cookies
  Cookies Are Name Value Pairs Stored by the Client
  Passed in the HTTP Header
  Cookies Have Associated Expiration
    Session (Default)
    Date / Time
  Associated With a URL Path, Not a Page!
  Allows Passing Parameters Between Web Pages
    Thus Cookies are Used to Provide State Information to a Stateless Protocol
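
A short sketch of both directions using the standard `http.cookies` module: the server builds a Set-Cookie value with a path and optional expiration, and later reads the name/value pairs the client sends back. The helper names and the cookie names are hypothetical.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, path="/", expires=None):
    """Build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path  # the cookie is tied to a URL path, not a page
    if expires:
        cookie[name]["expires"] = expires  # omitted: lasts for the session only
    return cookie[name].OutputString()


def parse_cookie_header(header_value):
    """Turn the value of a Cookie request header into a dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}
```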

Web Server HTTP Functionality
  Content Negotiation
    Choose From Several Different Formats Based on Request
  Language Negotiation
    Choose From Versions of Same Document Based on Request
  Support for HTTP-Put, HTTP-Delete
  Keep-Alive
  As-Is
    Server Doesn’t Add HTTP Headers
    Allows You to Create Specific Behavior
      Redirect to Another Site
      Never Saved in Browser’s Cache

Class Exercise: HTTP
  http://www.mkat.com/brandeis/httplist.cfm
  Viewhttp.exe

Server Log Files
  Records Server Activity

Some Definitions
  Hits
    Each HTTP Request is a Hit
    Accessing a Web Page May Result in Multiple Hits
      E.g. Each Graphic is a Hit
  Page Views
    Accessing a Single Web Page is a Page View
      E.g. Typing in a URL or Clicking on a Link
  Visits
    A Single Client’s Visit to Your Entire Site (Session)
      May Include Multiple Page Views
    What Constitutes a Second Visit From the Same Client?
  Why is This Important?
    Terms are Sometimes Used Interchangeably and Improperly
      Compare Apples to Apples
    Important for Commercial Web Sites
      Advertising is Based on Site Access
      Typically Sold on Page View Basis
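
The three definitions give different numbers for the same traffic, which a small count makes visible. Everything here is an assumption chosen for illustration: a page view is taken to be a request ending in `.htm`, `.html`, or `/`, and a new visit starts after 30 minutes of silence from the same host. Analysis tools choose their own rules.

```python
from datetime import datetime, timedelta

PAGE_SUFFIXES = (".htm", ".html", "/")   # assumed definition of a "page"
VISIT_TIMEOUT = timedelta(minutes=30)    # assumed gap that starts a new visit


def summarize(requests):
    """Count hits, page views, and visits from (host, time, path) tuples."""
    hits = len(requests)
    page_views = sum(
        1 for _host, _when, path in requests if path.lower().endswith(PAGE_SUFFIXES)
    )
    visits = 0
    last_seen = {}
    for host, when, _path in sorted(requests, key=lambda request: request[1]):
        previous = last_seen.get(host)
        if previous is None or when - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[host] = when
    return hits, page_views, visits
```

Changing `VISIT_TIMEOUT` changes the visit count without any change in the log. That is one answer to "What constitutes a second visit?" and a reason to compare figures only when the definitions match.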

Server Log Files
  Many Variations to Web Server Log File Formats
  Four Log Files
    Access (Transfer) Log
      Each Hit is Recorded
      User, Date/Time, HTTP Request, etc.
    Error Log
      Date/Time, Error
    Referrer Log
      Referring Page, Destination Page
    Agent (User) Log
      Client’s Browser
  Clearly a Need for Standardization
    Linking the Four Log Files Together

Common Log Format
  Host
    IP Address (or Hostname) of Client
    Some Servers Perform Lookup of IP Address
  RFC931
    Remote Logname of the User, From an identd (RFC 931) Lookup on the Client
    Seldom Used.
  Authuser
    HTTP Request: Authorization
    UserName if Username Authorization is Required
  Time Stamp
    HTTP Response: Date
    E.g. [10/Jun/1998:14:23:34 -0700]
  Request
    The Actual HTTP Request
    E.g. GET /index.htm HTTP/1.1

Common Log Format (2)
  Status
    The HTTP Response Status Code
  Transfer Volume
    HTTP Response: Content-Length
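
The seven fields can be pulled out of a log line with one regular expression. This is a sketch: the pattern and the name `parse_clf` are hypothetical, the sample address in the usage note is made up, and real logs contain irregular lines that need more forgiving handling.

```python
import re
from datetime import datetime

CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format line into a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the last field means no body was transferred.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry
```

For example, `parse_clf('192.0.2.10 - - [10/Jun/1998:14:23:34 -0700] "GET /index.htm HTTP/1.1" 200 2326')` returns a dict whose `rfc931` and `authuser` values are both `-`, the usual placeholder for an unused field.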

Extended Log File Format
  Seven Common Log Format Fields Plus
    Referrer
      HTTP Request: Referer
    User Agent
      HTTP Request: User-Agent
      Identifies Browser
  Other Common Fields
    Cookies
      Can Help Identify Users
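
In the widely used layout, the two extra fields are appended as quoted strings at the end of the line. Assuming that layout, a sketch for reading just the tail (the name `parse_extended_tail` is hypothetical):

```python
import re

# Assumes the line ends with: "referring URL" "user agent string"
EXTENDED_TAIL = re.compile(r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"\s*$')


def parse_extended_tail(line):
    """Return the referrer and user agent from an extended log line, or None."""
    match = EXTENDED_TAIL.search(line)
    if match is None:
        return None
    return match.groupdict()
```

A line in plain Common Log Format has no such tail, so the function returns `None` for it. That makes it a quick way to tell the two formats apart.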

Issues
  Client vs. User
    Typically Don’t Have User Level Information
    Only Record IP Address of Computer Used For Access
      If Fixed IP Address For a Single User’s Machine
        This Can Identify the User
      Dynamically Assigned IP Addresses
        Identifies the Overall Domain (e.g. AOL.com)
      Proxy Servers
        All Clients Have IP Address of Proxy Server
    Multiple “Sessions” at Same Time
  Impossible to Have Truly Accurate Information
    Log File Analysis Software Has Algorithms to Identify Page Views, Visits
    Client Level Caching Affects Logs
    “ISP” Level Caching Affects Logs
      E.g. AOL Maintains a Cache
    No Requirement for Clients, ISPs to Follow Expiration Info

Log File Maintenance on Server
  Log Files Grow Rapidly
  Log Files Compress Very Nicely
  Server Configurable
    Generate Daily/Weekly/Monthly Logs
  Maintenance Scripts to Cleanup Log Files
    Compress
    Archive
    Cycle
      E.g. Maintain Current Month’s Files
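
A maintenance script of the kind described above might compress a closed log and remove the original. This is a sketch with a hypothetical function name; it assumes the server has already rotated to a new log file and is no longer writing to the one being compressed.

```python
import gzip
import shutil
from pathlib import Path


def compress_log(log_path):
    """Gzip a rotated log file, delete the original, return the archive path."""
    log_path = Path(log_path)
    archive_path = log_path.with_name(log_path.name + ".gz")
    with open(log_path, "rb") as source, gzip.open(archive_path, "wb") as target:
        shutil.copyfileobj(source, target)
    log_path.unlink()  # keep only the compressed copy
    return archive_path
```

Log lines repeat the same hosts, paths, and agent strings, which is why they compress so well.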

Log File Analysis
  Big Business
    Bread and Butter of Sites Driven By Advertising Revenue
  Evaluation Factors
    Log File Formats Supported
    Ability to Link Multiple Logs
    How Log Files are Accessed (e.g. via FTP)
    Display Methodology
      E.g. Available Via Web Pages
    Lookup Capabilities
      E.g. Map User-Agent to Browser
      E.g. Resolve IP Addresses to Domains, Regions
    Level of Analysis
      E.g. Calculating Visits, Return Visitors
    Configurability
      Drill-Down Capabilities
    Enterprise Capabilities
      Ability to Manage Multiple Sites

Log File Analysis Options
  Important to Understand the Core Log Files
  Log File Analysis Programs Make Some Assumptions
  Freeware
  Commercial
  Service Bureaus

In Class Exercise / Mini Homework
  Download http://www.mkat.com/brandeis/sample.log
  View in Text Editor
  Load Into Excel
    Delimited / Spaces
  Review the Log File in Detail (Do Not Use Analysis Tools)
  Describe What You Can Learn From the Log File
  Add it To Your Homepage Along With In Class Exercises
  Due Next Week

Resources
  HTTP
    Stein pp. 47-57
  Server Comparison
    http://webcompare.internet.com/chart.htm
  Apache Server
    www.apache.org
  Website Server
    http://website.ora.com
  Microsoft IIS
    http://www.microsoft.com/NTWorkstation/downloads/Recommended/ServicePacks/NT4OptPk/Default.asp