introduction to computing and programming in python: a multimedia approach

33
Chapter 11: Advanced Text Techniques: Web and Information

Upload: catalina-amora

Post on 02-Jan-2016


DESCRIPTION

Introduction to Computing and Programming in Python: A Multimedia Approach. Chapter 11: Advanced Text Techniques: Web and Information. Chapter Objectives. Networks: Two or more computers communicating. Networks are formed when distinct computers communicate via some mechanism. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Introduction to Computing and Programming in Python:  A Multimedia Approach

Chapter 11: Advanced Text Techniques: Web and Information


Networks: Two or more computers communicating

Networks are formed when distinct computers communicate via some mechanism.

Rarely does the communication take the form of 0/1 voltages over a wire. That is too hard to make work over long distances.

More common is the use of frequencies (maybe in the sound range, but maybe not).

For example, a modem (modulator-demodulator) takes your computer’s 0’s and 1’s and translates them into sound frequencies that can pass over a telephone line and be decoded on the other side.
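To make the modem idea concrete, here is a minimal sketch of one simple scheme of this kind, frequency-shift keying: each bit becomes a short burst of one of two tones, and the receiver tells them apart by counting how often each burst crosses zero (the higher tone crosses more often). The sample rate, bit length, and tone frequencies are assumptions chosen for illustration, not the values any particular modem uses.

```python
import math

SAMPLE_RATE = 8000       # samples per second (assumed)
SAMPLES_PER_BIT = 80     # 10 milliseconds of sound per bit (assumed)
FREQ_ZERO = 1000         # tone for a 0 bit, in Hz (assumed)
FREQ_ONE = 2000          # tone for a 1 bit, in Hz (assumed)

def modulate(bits):
    """Turn a string of '0' and '1' characters into a list of sound samples."""
    samples = []
    for bit in bits:
        freq = FREQ_ONE if bit == "1" else FREQ_ZERO
        for i in range(SAMPLES_PER_BIT):
            # The half-sample offset keeps every sample away from exactly zero
            samples.append(math.sin(2 * math.pi * freq * (i + 0.5) / SAMPLE_RATE))
    return samples

def demodulate(samples):
    """Recover the bits by counting zero crossings in each bit-sized slice."""
    # Halfway between the crossing counts expected for the two tones
    threshold = SAMPLES_PER_BIT * (FREQ_ZERO + FREQ_ONE) / SAMPLE_RATE
    bits = ""
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        chunk = samples[start:start + SAMPLES_PER_BIT]
        crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if (a < 0) != (b < 0))
        bits += "1" if crossings > threshold else "0"
    return bits

print(demodulate(modulate("0110100")))   # 0110100
```

With these settings a 1000 Hz burst crosses zero 19 times and a 2000 Hz burst 39 times, so the threshold of 30 separates them cleanly; a real line adds noise that this sketch ignores.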


Networks, networks everywhere

If you’re driving a newer car, you probably have a network in there. There are lots of computers in your car (controlling air flow and gas flow, making the air bag work), and they communicate.

You can have a network in your own home, or even on an airplane. A network can use radio signals for communication (wireless), or you can string a cable between two computers.


Networks have layers

Networks have several layers to them.

At the bottom level is the physical substrate. What are the signals being passed on?

Higher levels determine how data is encoded. Do we use sound frequencies to represent 0’s and 1’s, or radio waves? Do we send a bit at a time? A byte at a time? Or in packets larger than that?

Levels even higher determine the protocol of communication. How do I address a particular computer I want to talk to? Or many computers? How do I tell a computer that I want to talk to it? That I’m starting to send it data? What it’s supposed to do with it? When are we done?
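As one illustration of the packet and addressing questions, the sketch below splits a message into small numbered packets, each carrying a destination address, and rebuilds the message even when the packets arrive out of order. The packet layout (a dictionary with hypothetical "to", "seq", and "payload" fields) and the 8-byte payload size are assumptions for illustration; real protocols define their own headers and use much larger packets.

```python
import random

PACKET_SIZE = 8   # bytes of payload per packet (assumed; real networks use far larger packets)

def makePackets(message, destination):
    """Split a message into numbered packets addressed to destination."""
    data = message.encode("utf-8")
    packets = []
    for seq, start in enumerate(range(0, len(data), PACKET_SIZE)):
        packets.append({"to": destination,   # hypothetical header field: where it goes
                        "seq": seq,           # hypothetical header field: packet order
                        "payload": data[start:start + PACKET_SIZE]})
    return packets

def reassemble(packets):
    """Sort packets by sequence number and rebuild the original message."""
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    return b"".join(packet["payload"] for packet in ordered).decode("utf-8")

message = "Hello from one computer to another"
packets = makePackets(message, "210.51.40.155")
random.shuffle(packets)                # packets may arrive in any order
print(len(packets), reassemble(packets))
```

The sequence number is what lets the receiving side answer "when are we done?" and put the pieces back in order.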


Ethernet: A common mid-level protocol

Ethernet is a common mid-level protocol. It specifies some aspects of how data is encoded and how computers are identified. For example, each computer on an Ethernet network has a deep-down, inside-the-computer address (often called its MAC address) that identifies it uniquely.

But Ethernet can work over a variety of physical substrates. For example, you can run Ethernet over wireless (radio) or over coaxial or twisted-pair cable (where you hear terms like “10baseT”, which is Ethernet over twisted-pair wiring).


Internet: A collection of networks

The Internet is a network of networks.

If you put a device in your home so that your computers can talk to one another (a wireless base station or an Ethernet router, perhaps), you have a network. You can probably reach printers on your network, or copy files between computers.

If you now connect your network (through an Internet Service Provider (ISP)) to the global Internet, your network becomes yet another part of the whole Internet.


Internet is based on agreements on encodings

The Internet is built on a set of agreements about:

How computers will be addressed: a set of four numbers (each one byte; the newer IPv6 standard uses much longer addresses) separated by periods, e.g., 210.51.40.155. Also a way of associating domain names with these numbers, like www.cnn.com (which really is a name that resolves to a set of four numbers), using domain name servers.

How computers will communicate: data will be put into packets with various pieces in them, and computers will format their data and talk to one another using TCP/IP (Transmission Control Protocol/Internet Protocol).

How packets are routed around the network to find their destination.
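Because each of the four numbers is one byte, a dotted address is really a single 32-bit number written in base 256. The sketch below does that conversion by hand (addressToNumber is our own helper name, not a library function) and checks it against Python's standard ipaddress module.

```python
import ipaddress

def addressToNumber(dotted):
    """Convert a dotted IPv4 address like '210.51.40.155' into one 32-bit number."""
    parts = [int(part) for part in dotted.split(".")]
    if len(parts) != 4 or any(part < 0 or part > 255 for part in parts):
        raise ValueError("expected four numbers from 0 to 255")
    number = 0
    for part in parts:
        number = number * 256 + part   # each part is one byte
    return number

print(addressToNumber("210.51.40.155"))              # 3526568091
print(int(ipaddress.IPv4Address("210.51.40.155")))   # the same number
```

Looking up the numbers behind a domain name needs a network connection; with one, socket.gethostbyname("www.cnn.com") asks a domain name server and returns an address in this dotted form (the exact address varies).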


The Internet is not new

The Internet agreements date back more than 40 years. It was originally set up for military applications.

One of the features of the Internet is that packets find their destination even if part of the Internet is destroyed or damaged.

The Internet originally had only a handful of computers (nodes) on it, but it has grown dramatically in recent years.


Protocols on the Internet

But all that just lets us pass data back and forth. What does the data say? What does the data do?

One of the first applications placed on the Internet was electronic mail. The mail protocols (SMTP, POP, IMAP) have evolved over time to their standard forms today.

The File Transfer Protocol (FTP) allows computers to copy files between each other. It defines what one side says to the other when copying a file over (e.g., “STOR filename”) and how the file will be encoded.


Then there’s the Web

The Web dates back only to the 1980’s, but it existed before there were graphical browsers (like Netscape Navigator, Internet Explorer, and the first popular one, NCSA Mosaic).

The Web is (again) a set of agreements, started by Tim Berners-Lee:
On how to refer to everything on the Internet: the URL (Uniform Resource Locator)
On how to transfer documents that refer to things all over the Internet: HTTP (HyperText Transfer Protocol)
On how those documents will be formatted: using HTML (HyperText Markup Language)


HyperText: Non-linear text

Hypertext is a term invented by Ted Nelson in the 1960’s. It refers to text that is non-linear, which the computer makes possible.

You’re familiar with this on the Web: read a little on a page, click, and continue reading on some other page anywhere on the Internet.


The point of the Web is Hypertext

Tim Berners-Lee wanted a way to create readable documents that could reference material anywhere on the Internet in a hypertext format.

There are technical flaws in what he did. For example, the phenomenon of “dead links” (links to resources that no longer exist) couldn’t happen in other hypertext systems before the Web.

But it worked and has become a worldwide standard.


HyperText Transfer Protocol (HTTP)

HTTP defines a very simple protocol for how to exchange information between computers. It defines the pieces of the communication:
What resource do you want?
Where is it?
Okay, here’s the type of thing it is (JPEG, HTML, whatever), and here it is.

And the words that the computers say to one another: not-complex words … like “GET”, “PUT” and “OK”.
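To see what those words look like, the sketch below builds the text of a simple HTTP/1.1 GET request and pulls a sample response apart into its status (such as 200 OK), its headers (including the Content-Type that says what type of thing it is), and its body. The response text is made up for illustration, and handling everything as Python strings is a simplification: real HTTP bodies are bytes, and a real client sends the request over a network connection.

```python
def makeGetRequest(host, path):
    """Build the text a client sends to ask a Web server for one resource."""
    return ("GET " + path + " HTTP/1.1\r\n"
            + "Host: " + host + "\r\n"
            + "Connection: close\r\n"
            + "\r\n")   # a blank line ends the request headers

def parseResponse(response):
    """Split a response into (status code, reason, headers, body)."""
    head, _, body = response.partition("\r\n\r\n")   # a blank line separates headers from body
    lines = head.split("\r\n")
    _, code, reason = lines[0].split(" ", 2)           # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), reason, headers, body

sampleResponse = ("HTTP/1.1 200 OK\r\n"
                  "Content-Type: text/html\r\n"
                  "\r\n"
                  "<html><body>Hello</body></html>")
print(makeGetRequest("www.cc.gatech.edu", "/index.html"))
print(parseResponse(sampleResponse))
```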


Uniform Resource Locators (URL)

URLs allow us to reference any material anywhere on the Internet. Strictly speaking, anything a computer offers through some protocol is accessible via a URL. Just putting your computer on the Internet does not mean that all of your files are accessible to everyone on the Internet.

URLs have four parts:
The protocol to use to reach this resource,
The domain name of the computer where the resource is,
The path on the computer to the resource,
And the name of the resource.


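Two example URLs, broken into their parts:

http://www.cc.gatech.edu/index.html
Protocol: http; Domain name: www.cc.gatech.edu; Path: (none); Filename: index.html

ftp://cleon.cc.gatech.edu/pub/guzdial/papers/sigcse2003.pdf
Protocol: ftp; Domain name: cleon.cc.gatech.edu; Path: /pub/guzdial/papers/; Filename: sigcse2003.pdf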
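Python's standard urllib.parse module can do this splitting for you. Below is a minimal sketch (urlParts is our own helper name) that uses urlsplit for the protocol and domain name and posixpath.split to separate the path from the filename. Note that it reports the server's top-level directory as "/" when there is no path, and gives the path without its trailing slash.

```python
import posixpath
from urllib.parse import urlsplit

def urlParts(url):
    """Return (protocol, domain name, path, filename) for a URL."""
    pieces = urlsplit(url)
    path, filename = posixpath.split(pieces.path)   # last piece of the path is the filename
    return pieces.scheme, pieces.hostname, path, filename

print(urlParts("http://www.cc.gatech.edu/index.html"))
# ('http', 'www.cc.gatech.edu', '/', 'index.html')
print(urlParts("ftp://cleon.cc.gatech.edu/pub/guzdial/papers/sigcse2003.pdf"))
# ('ftp', 'cleon.cc.gatech.edu', '/pub/guzdial/papers', 'sigcse2003.pdf')
```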


What if there is no path?

Web servers (programs that understand the HTTP protocol) typically have a special directory that they serve from. Files in that special directory are directly referable without specifying a path.

Sub-directories within the server directory can be accessed in terms of a path, but always starting from the server directory, so not everything on your computer is always accessible.
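Here is a minimal sketch of that rule, assuming a hypothetical server directory of /var/www: the path from the URL is attached to the server directory, and any path that would climb out of it (for example with "..") is refused. Real Web servers apply more checks than this; the point is only that every URL path is interpreted relative to the server directory.

```python
import posixpath

SERVER_DIRECTORY = "/var/www"   # hypothetical directory the server serves from

def fileForURLPath(urlPath):
    """Return the file a URL path refers to, or None if it lies outside the server directory."""
    # Interpret the URL path relative to the server directory and collapse "." and ".."
    fullPath = posixpath.normpath(posixpath.join(SERVER_DIRECTORY, urlPath.lstrip("/")))
    if fullPath == SERVER_DIRECTORY or fullPath.startswith(SERVER_DIRECTORY + "/"):
        return fullPath
    return None   # the request tried to escape the server directory

print(fileForURLPath("/index.html"))        # /var/www/index.html
print(fileForURLPath("/pub/papers/a.pdf"))  # /var/www/pub/papers/a.pdf
print(fileForURLPath("/../etc/passwd"))     # None
```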


A browser is a client

Your Web browser is called a client, accessing a Web server. Programs like Internet Explorer or Firefox or Safari understand a lot about Internet protocols.

They know how to interpret HTML and display it graphically. If the HTML references other resources, like JPEG pictures, the client fetches them and displays them where appropriate.

Your client knows the details of the HTTP and other protocols so that it can request the resources you request.
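Before a client can fetch the resources that a page references, it has to find them. A minimal sketch using Python's standard html.parser module: it collects the targets of a href links and img src pictures, and uses urljoin to turn relative references into full URLs. The page text and base URL in the example are made up, and a real browser handles many more kinds of references (stylesheets, scripts, and so on).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ResourceFinder(HTMLParser):
    """Collect the URLs of links and pictures that an HTML page refers to."""

    def __init__(self, baseURL):
        super().__init__()
        self.baseURL = baseURL   # the page's own URL, for resolving relative references
        self.links = []
        self.pictures = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(urljoin(self.baseURL, attributes["href"]))
        elif tag == "img" and attributes.get("src"):
            self.pictures.append(urljoin(self.baseURL, attributes["src"]))

def findResources(pageText, baseURL):
    """Return (links, pictures) referenced by pageText."""
    finder = ResourceFinder(baseURL)
    finder.feed(pageText)
    finder.close()
    return finder.links, finder.pictures

page = '<p>See <a href="papers/index.html">papers</a> <img src="/images/logo.jpg"></p>'
print(findResources(page, "http://www.cc.gatech.edu/index.html"))
```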


You don’t need a browser to use the Internet

Your mail program also understands some Internet protocols. JES even knows a little about one of the mail protocols, SMTP (Simple Mail Transfer Protocol), so that it can email homework to your instructor (if it’s set up).

Python (like other languages) has modules that allow you to use these protocols. In Python, we can read any URL as if it were a file.


Opening a URL and reading it

>>> import urllib
>>> connection = urllib.urlopen("http://www.ajc.com/weather")
>>> weather = connection.read()
>>> connection.close()
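The session above is JES, which runs Python 2 (Jython). In Python 3, urlopen moved to the urllib.request module and read() returns bytes, so the text has to be decoded. A minimal sketch, assuming the page is UTF-8 text (readURL is our own helper name):

```python
from urllib.request import urlopen

def readURL(url):
    """Read everything at url and return it as text, much like reading a file."""
    with urlopen(url) as connection:                # closes the connection when done
        return connection.read().decode("utf-8")   # assumes UTF-8 text

# A data: URL carries its own contents, so this line works without a network
print(readURL("data:text/plain,Currently%2057"))   # Currently 57
```

With a network connection, readURL("http://www.ajc.com/weather") would fetch the live page, if it is still available at that address.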


Getting the temperature live

def findTemperatureLive():
  # Get the weather page
  import urllib  # Could go above, too
  connection = urllib.urlopen("http://www.ajc.com/weather")
  weather = connection.read()
  connection.close()
  # Alternatively, read a saved copy of the page from a file:
  #weatherFile = getMediaPath("ajc-weather.html")
  #file = open(weatherFile,"rt")
  #weather = file.read()
  #file.close()
  # Find the temperature
  curloc = weather.find("Currently")
  if curloc != -1:
    # Now, find the "<b>&deg;" following the temp
    temploc = weather.find("<b>&deg;", curloc)
    if temploc != -1:
      # The temperature is the text between the last ">" and the degree sign
      tempstart = weather.rfind(">", 0, temploc)
      print "Current temperature:", weather[tempstart+1:temploc]
      return
  print "They must have changed the page format -- can't find the temp"


Running it

>>> findTemperatureLive()
Current temperature: 57
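One way to make this code easier to test is to separate fetching the page from searching it. The sketch below (Python 3; findTemperature is our own name) applies the same find/rfind steps to any string and returns None instead of printing when the markers are missing. The sample HTML is invented for illustration; the real page's markup is not shown in these slides.

```python
def findTemperature(weather):
    """Return the temperature text in a weather page, or None if the format changed."""
    curloc = weather.find("Currently")
    if curloc == -1:
        return None
    temploc = weather.find("<b>&deg;", curloc)   # the degree sign follows the temperature
    if temploc == -1:
        return None
    tempstart = weather.rfind(">", 0, temploc)   # the last tag before the number
    return weather[tempstart + 1:temploc]

print(findTemperature("<p>Currently: <br>57<b>&deg;F</b></p>"))   # 57
print(findTemperature("<p>No data today</p>"))                    # None
```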


The Interactive Web

The first use of HTTP was just to send around static pages and images (and sounds and…).

Later extensions allowed users to provide input to the server (such as for doing searches). Originally, this was just “CGI” (Common Gateway Interface) scripts. Later came servlets and applets and PHP and…


Interactive Web requires programs to generate HTML

Typically, a Web server will have some directory designated as “special.” Files referenced there aren’t just returned to the client. Instead, the files are executed and the result is returned to the client.

There’s even a mechanism where the client can provide input to the executed files, e.g., a search string.

Those special files would generate HTML. The generated HTML might be based on up-the-minute information like stock quotes, temperature sensors, and database queries.

Thus, to have an interactive Web, we need to write programs that write HTML.
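A minimal sketch of such a program in Python 3: it reads a search string from the query part of a URL (the part after "?", such as symbol=ACME) and writes an HTML page in response. The quote data, the "symbol" parameter name, and the page layout are hypothetical, and a real server would hand this function its input through a mechanism such as CGI.

```python
from html import escape
from urllib.parse import parse_qs

def quotePage(queryString, quotes):
    """Return an HTML page answering a search such as 'symbol=ACME'."""
    params = parse_qs(queryString)            # e.g. {'symbol': ['ACME']}
    symbol = params.get("symbol", [""])[0]
    if symbol in quotes:
        answer = escape(symbol) + " is trading at " + str(quotes[symbol])
    else:
        answer = "No quote found for " + escape(symbol)
    return ("<html><head><title>Stock quote</title></head>"
            + "<body><p>" + answer + "</p></body></html>")

print(quotePage("symbol=ACME", {"ACME": 12.5}))
```

The escape call is a deliberate design choice: input from the client is placed into the page as text, so a search string containing tags cannot change the structure of the generated HTML.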


Using text to map between any media

We can map anything to text. We can map text back to anything. This allows us to do all kinds of transformations:
Sounds into Excel, and back again
Sounds into pictures
Pictures and sounds into lists (formatted text), and back again
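For the "sounds into Excel" case, one text format that spreadsheets can open is CSV (comma-separated values). A minimal sketch with Python's standard csv module, assuming the samples are already a plain list of integers; the function names are our own.

```python
import csv
import io

def samplesToCSV(samples):
    """Write one row per sample: its index and its value."""
    text = io.StringIO()
    writer = csv.writer(text)
    for index, value in enumerate(samples):
        writer.writerow([index, value])
    return text.getvalue()

def csvToSamples(text):
    """Read the sample values back out of CSV text."""
    return [int(row[1]) for row in csv.reader(io.StringIO(text))]

samples = [6757, 6852, 6678, 6371]
text = samplesToCSV(samples)
print(text)
print(csvToSamples(text) == samples)   # True: the round trip loses nothing
```

Saving that text to a file ending in .csv is enough for a spreadsheet program to open it as two columns.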


Why care about media transformations?

Transformed digital media can be more easily transmitted. For example, transfer of binary files over email is often accomplished by converting them to text. We can encode additional information to check for, and even correct, errors in transmission.

It may allow us to use the media in new contexts, like storing it in databases.

Some transformations of media are made easier when the media are in new formats.
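Email attachments are commonly converted to text with Base64 encoding, and a checksum such as CRC-32 is one way to check for errors. The sketch below uses Python's standard base64 and zlib modules (the function names are our own). A CRC can only detect damage, not repair it; correcting errors requires an error-correcting code, which this sketch does not attempt.

```python
import base64
import zlib

def encodeForEmail(data):
    """Turn binary data into plain ASCII text, plus a CRC-32 checksum of the original bytes."""
    return base64.b64encode(data).decode("ascii"), zlib.crc32(data)

def decodeFromEmail(text, checksum):
    """Turn the text back into bytes, refusing data whose checksum no longer matches."""
    data = base64.b64decode(text)
    if zlib.crc32(data) != checksum:
        raise ValueError("data was damaged in transmission")
    return data

original = bytes([0, 255, 10, 13, 128])   # bytes that are not printable text
text, checksum = encodeForEmail(original)
print(text)                                           # AP8KDYA=
print(decodeFromEmail(text, checksum) == original)    # True
```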


Any visualization of any kind is merely an encoding

A line chart? A pie chart? A scatterplot? These are just lines and pixels set to correspond to some mapping of the data.

Sometimes data is lost. Recall the mapping to grayscale.

Sometimes data is not lost, even if it looks like a dramatic change. Recall creating a negative of an image, then taking the negative of a negative to get back to the original.
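The difference between the two cases can be checked on single colors. In this sketch a color is an (R, G, B) tuple of values from 0 to 255, and grayscale uses the simple average of the three components (an assumption; weighted averages are also common). Negating twice returns the original color, while two different colors can turn into the same gray, so grayscale cannot be undone.

```python
def negative(color):
    """Invert each component: 0 becomes 255, 255 becomes 0."""
    r, g, b = color
    return (255 - r, 255 - g, 255 - b)

def grayscale(color):
    """Replace each component with the simple average (assumed) of all three."""
    r, g, b = color
    luminance = (r + g + b) // 3
    return (luminance, luminance, luminance)

color = (168, 131, 105)
print(negative(negative(color)) == color)                 # True: nothing lost
print(grayscale(color), grayscale((134, 134, 134)))      # (134, 134, 134) twice: data lost
```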


Lists can do anything!

Going from sound to lists is easy:

def soundToList(sound):
  samples = []
  for s in getSamples(sound):
    # Add this sample's value to the end of the list
    samples = samples + [getSample(s)]
  return samples


This really does work

>>> samples = soundToList(sound)
>>> print samples[0]
6757
>>> print samples[1]
6852
>>> print samples[0:100]
[6757, 6852, 6678, 6371, 6084, 5879, 6066, 6600, 7104, 7588,
7643, 7710, 7737, 7214, 7435, 7827, 7749, 6888, 5052, 2793, 406, -346, 80, 1356, 2347, 1609, 266, -1933, -3518, -4233, -5023, -5744, -7394, -9255, -10421, -10605, -9692, -8786, -8198, -8133, -8679, -9092, -9278, -9291, -9502, -9680, -9348, -8394, -6552, -4137, -1878, -101, 866, 1540, 2459, 3340, 4343, 4821, 4676, 4211, 3731, 4359, 5653, 7176, 8411, 8569, 8131, 7167, 6150, 5204, 3951, 2482, 818, -394, -901, -784, -541, -764, -1342, -2491, -3569, -4255, -4971, -5892, -7306, -8691, -9534, -9429, -8289, -6811, -5386, -4454, -4079, -3841, -3603, -3353, -3296, -3323, -3099, -2360]


Can we go from pictures into lists?

Of course! We just have to decide on a representation. We’ll put a list as an element for each pixel. The numbers in the pixel-list will represent:
The X and Y positions
The Red, Green, and Blue component values


Pictures to Lists

def pictureToList(picture):
  pixels = []
  for p in getPixels(picture):
    # Each pixel becomes a five-element sub-list: X, Y, red, green, blue
    pixels = pixels + [[getX(p), getY(p), getRed(p), getGreen(p), getBlue(p)]]
  return pixels

Why the double brackets? Because we’re putting a sub-list in the list, not just adding a component as we were with sound.


Running pictureToList

>>> picture = makePicture(pickAFile())
>>> piclist = pictureToList(picture)
>>> print piclist[0:5]
[[1, 1, 168, 131, 105], [1, 2, 168, 131, 105], [1, 3, 169, 132, 106], [1, 4, 169, 132, 106], [1, 5, 170, 133, 107]]


Can we go back again? Sure!

def listToPicture(pixels):
  picture = makePicture(getMediaPath("640x480.jpg"))
  for p in pixels:
    # Only copy pixels whose X and Y fit within the canvas
    if p[0] <= getWidth(picture) and p[1] <= getHeight(picture):
      setColor(getPixel(picture, p[0], p[1]), makeColor(p[2], p[3], p[4]))
  return picture

We need to make sure that the X and Y fit within our canvas, but other than that, it’s pretty simple code.
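Outside JES, the same round trip can be tried on a stand-in picture. In this sketch (our own representation, not JES's), a picture is a width, a height, and a dictionary mapping 1-based (x, y) positions to (R, G, B) tuples, matching the 1-based pixel lists printed by pictureToList. The function names are hypothetical, and pixels that do not fit on the canvas are skipped.

```python
def pictureDictToList(picture):
    """Turn a stand-in picture into a list of [x, y, red, green, blue] sub-lists."""
    pixelList = []
    for (x, y), (r, g, b) in sorted(picture["pixels"].items()):
        pixelList.append([x, y, r, g, b])
    return pixelList

def listToPictureDict(pixelList, width, height):
    """Rebuild a stand-in picture, skipping any pixel that does not fit on the canvas."""
    picture = {"width": width, "height": height, "pixels": {}}
    for x, y, r, g, b in pixelList:
        if 1 <= x <= width and 1 <= y <= height:   # 1-based positions
            picture["pixels"][(x, y)] = (r, g, b)
    return picture

original = {"width": 2, "height": 1,
            "pixels": {(1, 1): (168, 131, 105), (2, 1): (169, 132, 106)}}
pixelList = pictureDictToList(original)
print(pixelList)   # [[1, 1, 168, 131, 105], [2, 1, 169, 132, 106]]
rebuilt = listToPictureDict(pixelList + [[5, 5, 0, 0, 0]], 2, 1)   # (5, 5) is off the canvas
print(rebuilt == original)   # True
```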
