HTTP, Naming and Lookup Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems June 23, 2022

Upload: kristopher-carpenter

Post on 11-Jan-2016


Page 1: HTTP, Naming and Lookup Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems September 15, 2015

HTTP, Naming and Lookup

Zachary G. Ives

University of Pennsylvania

CIS 455 / 555 – Internet and Web Systems

April 21, 2023


Readings and Reminders

Readings on DNS (Wikipedia) and LDAP (Marshall’s overview) – see course schedule for links

Homework 1 Milestone 1 due Feb. 3rd


HTTP: HyperText Transfer Protocol

A very simple, stateless protocol for sessionless exchanges
  Browser creates a new connection each time it wants to make a new request (for a page, image, etc.)
  What are the benefits of this model? Drawbacks?

Exceptions:
  HTTP 1.1 added support for persistent connections and pipelining
  Clients + servers might keep state information
  Cookies provide a way of recording state


HTTP Overview

Requests:
  A small number of request types (GET, POST, PUT, DELETE)
  Request may contain additional information, e.g. client info, parameters for forms, etc.

Responses:
  Response codes: 200 (OK), 404 (not found), etc.
  Metadata: content’s MIME type, length, etc.
  The “payload” or data


A Simple HTTP Request

GET /~cis455/index.html HTTP/1.1
If-Modified-Since: Sun, 7 Jan 2007 11:12:23 GMT
Referer: http://www.cis.upenn.edu/index.html

Requests data at a path using HTTP 1.1 protocol

Example response:
HTTP/1.1 200 OK
Date: Sun, 7 Jan 2007 11:12:26 GMT
Last-Modified: Wed, 14 Jan 2004 8:30:00 GMT
Content-Type: text/html
Content-Length: 3931
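To make the request format concrete, here is a minimal Python sketch that splits the request from this slide into its request line and headers. The function name parse_request is hypothetical, and the sketch assumes lines end in CRLF and that no header is folded across lines; a real server would need to be more defensive.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, path, version, headers, body)."""
    # The header section ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line: METHOD SP PATH SP VERSION
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body


raw = (
    "GET /~cis455/index.html HTTP/1.1\r\n"
    "If-Modified-Since: Sun, 7 Jan 2007 11:12:23 GMT\r\n"
    "Referer: http://www.cis.upenn.edu/index.html\r\n"
    "\r\n"
)
method, path, version, headers, body = parse_request(raw)
```

Header names are lower-cased in the dictionary because HTTP header names are case-insensitive.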


Request Types

GET: Retrieve the resource at a URL

POST: Submit form content

PUT: Publish the specified data at a URL

DELETE: (Self-explanatory)


Forms: Returning Data to the Server

HTML forms allow assignments of values to variables

Two means of submitting forms to apps:
  GET-style – within the URL:
    GET /home/my.cgi?param=val&param2=val2
  POST-style – as the data:
    POST /home/second.cgi
    Content-Length: 34

    searchKey Penn
    where www.google.com
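The two submission styles can be sketched with Python's standard urllib.parse module. The helper names below are hypothetical, and the POST body is assumed to use the common application/x-www-form-urlencoded encoding; the slide's exact body encoding is not recoverable from the transcript, so the length computed here need not match the slide's 34.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_style_url(path, fields):
    """GET-style: the form fields travel inside the URL's query string."""
    return path + "?" + urlencode(fields)


def post_style_request(path, fields):
    """POST-style: the form fields travel as the request body."""
    body = urlencode(fields)
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",  # assumed encoding
        "Content-Length": str(len(body.encode("ascii"))),
    }
    return path, headers, body


def decode_fields(encoded):
    """Server side: turn an encoded string back into a dict of single values."""
    return {name: values[0] for name, values in parse_qs(encoded).items()}


url = get_style_url("/home/my.cgi", {"param": "val", "param2": "val2"})
post_path, post_headers, post_body = post_style_request(
    "/home/second.cgi", {"searchKey": "Penn", "where": "www.google.com"}
)
get_fields = decode_fields(urlsplit(url).query)
post_fields = decode_fields(post_body)
```

Either way the server ends up with the same name/value pairs; the difference is only where they ride in the request.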


Authentication and Authorization

Authentication
  At minimum, user ID and password – authenticates requestor
  Client may wish to authenticate the server, too!
    SSL (we’ll discuss this more later)
    Part of SSL: certificate from trusted server, validating machine
    Also: public key for encrypting client’s transmissions

Authorization
  Determine what user can access
  For files, applications: typically, access control list
  If data from database, may also have view-based security


Programming Support in Web Servers

CGI – Common Gateway Interface – the oldest:
  A CGI is a separate program, often in Perl, invoked by the server
  Certain info is passed from server to CGI via Unix-style environment variables
    QUERY_STRING; REMOTE_HOST, CONTENT_TYPE, …
  HTTP post data is read from stdin

Interface to persistent process:
  In essence, how communication with a database is done – Oracle or MySQL is running “on the side”
  Communicate via pipes, APIs like ODBC/JDBC, etc.

Server module running in the same process
  Might be custom code (e.g., Apache extension) or an interpreter/runtime system…
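As a rough illustration of the CGI contract described above, the following Python sketch reads request information from an environment mapping and POST data from a stdin-like stream. The function handle_cgi and the "name" parameter are hypothetical. CONTENT_LENGTH is a standard CGI variable, although this slide does not list it.

```python
import io
from urllib.parse import parse_qs


def handle_cgi(environ, stdin):
    """Build a CGI response from environment variables and the stdin stream."""
    # GET-style parameters arrive in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # POST data arrives on stdin; CONTENT_LENGTH says how much to read.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    post_data = stdin.read(length) if length else ""
    name = query.get("name", ["world"])[0]
    body = f"Hello, {name}! POST data: {post_data!r}\n"
    # A CGI program writes headers, a blank line, then the payload.
    return "Content-Type: text/plain\r\n\r\n" + body


# Simulated invocation; a real CGI program would pass os.environ and sys.stdin.
example_env = {"QUERY_STRING": "name=Penn", "CONTENT_LENGTH": "5"}
response = handle_cgi(example_env, io.StringIO("a=1&b=2"))
```

Because the server starts a fresh process per request, nothing in this program survives between requests. That is the cost the persistent-process and in-process module designs try to avoid.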


Server Modules

Interpreters:
  JavaScript/JScript, PHP, ASP, …
  Often a full-fledged programming language
  Code is generally embedded within HTML, not stand-alone

Custom runtimes/virtual machines:
  Most modern Perl runtimes; Java servlets; ASP.NET
  A virtual machine runs within the web server process
  Functions are invoked within that JVM to handle each request
  Code is generally written as usual, but may need to use HTML to create UI rather than standard GUI APIs
  Most of these provide (at least limited) protection mechanisms


Servlets

An interesting model for programming applications in Java

A servlet is a subclass of HttpServlet
  It overrides methods doGet() or doPost()
  It’s given a number of objects: HttpServletRequest (includes info about parameters, browser, etc.), HttpServletResponse (a means for sending info back to the browser, including data, forwarding requests, etc.)
  There’s a notion of a session that can be used to share state across doGet()/doPost() invocations – it’s generally connected with a cookie

Those of you who took CSE 330/CIS 550 should be generally familiar with servlets
  Those who didn’t should be able to catch up by looking at, e.g.,
    http://www.apl.jhu.edu/~hall/java/Servlet-Tutorial/
    http://www.novocode.com/doc/servlet-essentials/

Your homework assignment will be to build a simple servlet engine a la Tomcat
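The dispatch idea behind a servlet engine can be sketched in a few lines. This is a Python analogy and not the Java Servlet API: the classes Request, Response, Servlet, HelloServlet and Engine are all hypothetical stand-ins for HttpServletRequest, HttpServletResponse, HttpServlet, a user servlet, and the container.

```python
class Request:
    """Stand-in for HttpServletRequest: method, path, and parameters."""
    def __init__(self, method, path, params=None):
        self.method = method
        self.path = path
        self.params = params or {}


class Response:
    """Stand-in for HttpServletResponse: a status code and an output buffer."""
    def __init__(self):
        self.status = 200
        self.body = []

    def write(self, text):
        self.body.append(text)


class Servlet:
    """Base class; subclasses override do_get and/or do_post."""
    def do_get(self, request, response):
        response.status = 405  # method not allowed unless overridden

    def do_post(self, request, response):
        response.status = 405


class HelloServlet(Servlet):
    def do_get(self, request, response):
        response.write("Hello, " + request.params.get("name", "world"))


class Engine:
    """Maps URL paths to servlet instances and dispatches on the method."""
    def __init__(self):
        self.routes = {}

    def register(self, path, servlet):
        self.routes[path] = servlet

    def service(self, request):
        response = Response()
        servlet = self.routes.get(request.path)
        if servlet is None:
            response.status = 404
        elif request.method == "GET":
            servlet.do_get(request, response)
        elif request.method == "POST":
            servlet.do_post(request, response)
        else:
            response.status = 405
        return response


engine = Engine()
engine.register("/hello", HelloServlet())
ok = engine.service(Request("GET", "/hello", {"name": "Penn"}))
```

The servlet instance stays alive across requests, which is what distinguishes this model from one-process-per-request CGI.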


(Cross-)Session State: Cookies

Major problem with sessionless nature of HTTP: how do we keep info between connections?

Cookie: an opaque string associated with a web site, stored at the browser
  Create in HTTP response with “Set-Cookie: xxx”
  Passed in HTTP header as “Cookie: xxx”

Interpretation is up to the application
  Usually, object-value pairs; passed in HTTP header:
    Cookie: user=“Joe” pwd=“blob” …
  Often have an expiration
  Very common: “session cookies”
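A small sketch of both directions of the cookie exchange, using Python's standard http.cookies module. The helper names are hypothetical and the user/pwd values are the slide's toy example. In a real Cookie header the pairs are separated by "; ".

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, max_age=None):
    """Server side: build the Set-Cookie header line for a response."""
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        # Expiration, in seconds from now; omit it for a session cookie.
        cookie[name]["max-age"] = max_age
    return cookie.output(header="Set-Cookie:")


def parse_cookie_header(header_value):
    """Server side: decode the value of the Cookie header on a later request."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


set_cookie_line = make_set_cookie("user", "Joe", max_age=3600)
received = parse_cookie_header("user=Joe; pwd=blob")
```

Storing a password in a cookie, as in the slide's example, is for illustration only; an application would normally store an opaque session identifier.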


How Do We Find Things on the Internet?

Generally, using one of three means:
  Addresses or locations: specify where something is, assuming that we understand how to navigate
    Just like a physical address, we may still need a map!
    In the Internet, addresses are typically IP addresses – the routers know the map
  Names: are mapped into addresses via lookup services
    Best-known example on the Internet: DNS name
    Cell phone numbers, email addresses, etc. are becoming names
  Content-based addressing/naming
    The actual data value is used to look up its location
    The basis of certain kinds of indices, publish-subscribe systems, and peer-to-peer architectures


Pushing the Search to the Network: Flooding Requests – Gnutella

Node A wants a data item; it asks B and C
If B and C don’t have it, they ask their neighbors, etc.
What are the implications of this model?

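[Figure: a network of peer nodes labeled A through I; the links between nodes are not recoverable from the transcript]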
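One way to explore the implications of flooding is to simulate it. The sketch below is a simplified breadth-first flood with a hop limit (TTL). The function name, the topology beyond "A asks B and C", and the file name are all assumptions made for illustration, and the search stops at the first holder found, which a real flood would not necessarily do.

```python
from collections import deque


def flood_search(graph, holders, start, item, ttl):
    """Flood a query outward from `start`; return (holder or None, messages sent)."""
    visited = {start}
    frontier = deque([(start, ttl)])
    messages = 0
    while frontier:
        node, hops_left = frontier.popleft()
        if item in holders.get(node, set()):
            return node, messages
        if hops_left == 0:
            continue  # TTL exhausted: this node does not forward the query
        for neighbor in graph.get(node, []):
            messages += 1  # every forwarded copy costs a message
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return None, messages


# Hypothetical topology: only "A asks B and C" comes from the slide.
graph = {
    "A": ["B", "C"], "B": ["A", "D", "E"], "C": ["A", "F"],
    "D": ["B", "G"], "E": ["B"], "F": ["C", "H"],
    "G": ["D"], "H": ["F", "I"], "I": ["H"],
}
holders = {"G": {"song.mp3"}}
found = flood_search(graph, holders, "A", "song.mp3", ttl=3)
missed = flood_search(graph, holders, "A", "song.mp3", ttl=2)
```

Even in this nine-node network a single query costs a dozen messages, and a TTL that is too small misses data that does exist. Both effects get worse as the network grows.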


The Most Efficient Way of Going from Names or Content to Locations

Directory-based lookup protocols are very common

Examples:
  Napster 1.0 – peer-to-peer storage with central directory
  DNS – distributed hierarchical directory
  LDAP – hierarchical directory information tree
  Inverted index – used to look up keywords in information retrieval
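Of the examples above, the inverted index is the easiest to show in miniature. The sketch below maps each word to the set of documents containing it; the function names and the three sample documents are made up for illustration, and tokenization is just whitespace splitting.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lower-cased word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, *keywords):
    """Return the sorted IDs of documents containing every keyword."""
    postings = [index.get(word.lower(), set()) for word in keywords]
    return sorted(set.intersection(*postings)) if postings else []


documents = {
    "d1": "DNS is a distributed hierarchical directory",
    "d2": "LDAP is a hierarchical directory information tree",
    "d3": "Napster used a central directory",
}
index = build_inverted_index(documents)
both = lookup(index, "hierarchical", "directory")
```

The lookup never scans the documents themselves; it only touches the entries for the requested keywords, which is the property all the directory schemes on this slide share.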


Napster 1.0, ca 2002

Hybrid of peer-to-peer storage with central directory showing what’s currently available
What are the trade-offs implicit in this model? Why did it fail?

[Figure: Napster.com hosts the Directory, which lists los-del-rios-macarena and bspears-oops; Peer1, Peer2 and Peer3 hold the files los-del-rios-macarena.mp3 (on two peers) and bspears-oops.mp3]
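The hybrid design can be sketched as a central table that records who has what, while the files themselves stay on the peers. The class and method names are hypothetical, and the assignment of files to particular peers is an assumption, since the transcript does not preserve it.

```python
class CentralDirectory:
    """Central index: file name -> set of peers currently offering it."""

    def __init__(self):
        self._holders = {}

    def publish(self, peer, filename):
        """A peer announces that it can serve a file."""
        self._holders.setdefault(filename, set()).add(peer)

    def peer_offline(self, peer):
        """Drop a departing peer so the directory shows only what is available."""
        for peers in self._holders.values():
            peers.discard(peer)

    def lookup(self, filename):
        """Return the peers to contact; the transfer itself is peer-to-peer."""
        return sorted(self._holders.get(filename, set()))


directory = CentralDirectory()
directory.publish("Peer1", "los-del-rios-macarena.mp3")  # assumed placement
directory.publish("Peer2", "bspears-oops.mp3")           # assumed placement
directory.publish("Peer3", "los-del-rios-macarena.mp3")  # assumed placement
before = directory.lookup("los-del-rios-macarena.mp3")
directory.peer_offline("Peer1")
after = directory.lookup("los-del-rios-macarena.mp3")
```

Lookup is a single round trip to one place, which is efficient, but that one place is also a single point of failure and control.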


Other Services with Similar Directory + Peer Architectures

Windows Live Sync

Google Desktop Search with multiple machines

BitTorrent trackers are quite similar (we’ll discuss BitTorrent more later)


Naming People and Devices: LDAP

Lightweight Directory Access Protocol
  Hierarchical naming system that can be partitioned and replicated

See http://www.seas.upenn.edu/cets/answers/ldap.html to set up your email client to access Penn’s LDAP server


LDAP’s Schema

LDAP information has a schema with different levels of containers:
  A unique name in LDAP is called a Distinguished Name, “dn”, and consists of a sequence of attributes representing a hierarchy, from most-specific to least-specific (as in DNS names):
    o = organization; dc = domain component
    ou = organizational unit
    uid = user ID
    cn = common name
    c = country; st = state; l = locality
  Can also have objectClass – the type of entity
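To see the most-specific-to-least-specific ordering, a distinguished name can be split into its attribute/value components. The sample DN below is hypothetical (it is not taken from Penn's directory), and this naive parser splits on commas, so it does not handle escaped commas inside values.

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, most specific first."""
    pairs = []
    for component in dn.split(","):
        attr, _, value = component.partition("=")
        pairs.append((attr.strip().lower(), value.strip()))
    return pairs


def parent_dn(dn):
    """Drop the most specific component to get the enclosing container."""
    _, _, rest = dn.partition(",")
    return rest.strip()


sample_dn = "uid=zives, ou=People, dc=upenn, dc=edu"  # hypothetical entry
components = parse_dn(sample_dn)
container = parent_dn(sample_dn)
```

Reading the components right to left walks down the hierarchy, much as reading a DNS name right to left walks down from the top-level domain.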


LDAP Hierarchy

Brad Marshall LDAP Tutorial, quark.humbug.au/publications/ldap_tut.html


Querying LDAP

LDAP queries are mostly attribute-value predicates:
  uid=zives; o=upenn; c = usa
  (|(cn=Susan Davidson)(cn=Boon Thau Loo)(cn=Val Tannen))
  objectclass=posixAccount
  (!(cn=Val Tannen))

How might we process these queries?
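One simple answer, sketched below, is to evaluate the predicate against every entry. Filters are written here as nested Python tuples and not parsed from LDAP's string syntax; the function names and the three entries are hypothetical, and matching is case-insensitive as an assumption.

```python
def matches(entry, flt):
    """Evaluate a filter tree against one entry (attribute -> list of values)."""
    op = flt[0]
    if op == "=":
        _, attr, value = flt
        return value.lower() in (v.lower() for v in entry.get(attr.lower(), []))
    if op == "|":
        return any(matches(entry, sub) for sub in flt[1:])
    if op == "&":
        return all(matches(entry, sub) for sub in flt[1:])
    if op == "!":
        return not matches(entry, flt[1])
    raise ValueError(f"unknown operator: {op!r}")


def search(entries, flt):
    """Linear scan: test every entry against the filter."""
    return [entry for entry in entries if matches(entry, flt)]


entries = [  # made-up directory contents
    {"cn": ["Susan Davidson"], "objectclass": ["posixAccount"]},
    {"cn": ["Val Tannen"], "objectclass": ["posixAccount"]},
    {"cn": ["Zachary Ives"], "uid": ["zives"], "objectclass": ["posixAccount"]},
]
either = ("|", ("=", "cn", "Susan Davidson"), ("=", "cn", "Boon Thau Loo"),
          ("=", "cn", "Val Tannen"))
not_val = ("!", ("=", "cn", "Val Tannen"))
either_names = [e["cn"][0] for e in search(entries, either)]
not_val_names = [e["cn"][0] for e in search(entries, not_val)]
```

A scan like this costs time proportional to the directory size. An index from (attribute, value) to entries would answer equality predicates directly, though negation still tends to require touching most entries.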


The Backbone of Internet Naming: Domain Name System

A simple, hierarchical name system with a distributed database – each domain controls its own names

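[Figure: DNS name tree. The top-level domains edu and com sit at the top; columbia, upenn and berkeley appear under edu, and amazon under com; www, cis and sas appear under upenn, and the other domains each show a www host; ellipses indicate further entries at each level]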


Top-Level Domains (TLDs)

Mostly controlled by Network Solutions, Inc. today
  .com: commercial
  .edu: educational institution
  .gov: US government
  .mil: US military
  .net: networks and ISPs (now also a number of other things)
  .org: other organizations
  244 2-letter country suffixes, e.g., .us, .uk, .cz, .tv, …
  some variants on this for other institutions, e.g., .eu
  and a bunch of new suffixes that are not very common, e.g., .biz, .mobi, .name, .pro, …


Finding the Root

13 “root servers” store entries for all top level domains (TLDs)

DNS servers have a hard-coded mapping to root servers so they can “get started”


Excerpt from DNS Root Server Entries

; This file is made available by InterNIC registration services
; under anonymous FTP as
;   file /domain/named.root
;
; formerly NS.INTERNIC.NET
;
.                     3600000  IN  NS  A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.   3600000      A   198.41.0.4
;
; formerly NS1.ISI.EDU
;
.                     3600000      NS  B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.   3600000      A   128.9.0.107
;
; formerly C.PSI.NET
;
.                     3600000      NS  C.ROOT-SERVERS.NET.
C.ROOT-SERVERS.NET.   3600000      A   192.33.4.12

(13 servers in total, A through M)


Supposing We Were to Build DNS

How would we start? How is a lookup performed?

(Hint: what do you need to specify when you add a client to a network that doesn’t do DHCP?)
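One possible answer to the lookup question is iterative resolution: start at a root server and follow referrals down the hierarchy until some server holds the record. The toy model below is an assumption-laden sketch and not the real DNS protocol: the server names, the table layout, and the address (taken from the 192.0.2.0/24 documentation range) are all made up.

```python
# Each toy server knows some records and some delegations to other servers.
SERVERS = {
    "root": {"delegations": {"edu": "edu-tld", "com": "com-tld"}, "records": {}},
    "edu-tld": {"delegations": {"upenn.edu": "upenn-ns"}, "records": {}},
    "com-tld": {"delegations": {}, "records": {}},
    "upenn-ns": {"delegations": {},
                 "records": {"www.cis.upenn.edu": "192.0.2.10"}},
}


def resolve(name, servers=SERVERS, start="root"):
    """Follow referrals from the root; return (address or None, servers asked)."""
    server, path = start, [start]
    while True:
        zone = servers[server]
        if name in zone["records"]:
            return zone["records"][name], path
        # Otherwise look for a delegation covering a suffix of the name.
        referral = next(
            (target for suffix, target in zone["delegations"].items()
             if name == suffix or name.endswith("." + suffix)),
            None,
        )
        if referral is None:
            return None, path  # nobody further down to ask
        server = referral
        path.append(server)


address, asked = resolve("www.cis.upenn.edu")
unknown = resolve("www.amazon.com")
```

In practice a client does not walk the hierarchy itself. It is configured with the address of a local DNS server (presumably what the hint refers to), and that server performs the walk and caches the results.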


Issues in DNS

We know that everyone wants to be “my-domain”.com
  How does this mesh with the assumptions inherent in our hierarchical naming system?

What happens if things move frequently?

What happens if we want to provide different behavior to different requestors (e.g., Akamai)?


Directories Summarized

An efficient way of finding data, assuming:
  Data doesn’t change too often, hence it can be replicated and distributed
  Hierarchy is relatively “wide and flat”
  Caching is present, helping with repeated queries

Directories generally rely on names at their core

Sometimes we want to search based on other means, e.g., predicates or filters over content…
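The caching assumption above can be illustrated with a small time-to-live cache placed in front of any lookup function. The class name, the injected clock, and the sample record are hypothetical; the point is only that repeated queries stop reaching the backend until the entry expires.

```python
import time


class CachingDirectory:
    """Wrap a lookup function with a cache whose entries expire after a TTL."""

    def __init__(self, backend_lookup, ttl_seconds, clock=time.monotonic):
        self._lookup = backend_lookup
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be demonstrated
        self._cache = {}     # name -> (value, expiry time)
        self.backend_calls = 0

    def get(self, name):
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]  # still fresh: answer without asking the backend
        self.backend_calls += 1
        value = self._lookup(name)
        self._cache[name] = (value, now + self._ttl)
        return value


fake_time = [0.0]
backend = {"www.upenn.edu": "192.0.2.1"}  # made-up record
cached = CachingDirectory(backend.get, ttl_seconds=60, clock=lambda: fake_time[0])
first = cached.get("www.upenn.edu")
second = cached.get("www.upenn.edu")
calls_while_fresh = cached.backend_calls
fake_time[0] = 61.0
third = cached.get("www.upenn.edu")
calls_after_expiry = cached.backend_calls
```

The TTL is the knob that trades freshness for load: if data changes more often than the TTL allows, clients see stale answers, which is why directories work best when data doesn't change too often.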