LIS508 lecture 2 Thomas Krichel 2003-10-07

LIS508 lecture 2

Thomas Krichel

2003-10-07

today's lecture

• Recap on what we did last week.

• Encoding mark-up

• Databases

Recap

• Computers deal with on/off signals called bits.

• Collections of these bits are binary numbers.

• Texts are (basically) strings of characters. To represent text, we need to represent characters.

• To make a character understandable to a computer, we associate a number with each character. The result is a character set.
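
The recap can be tried out in a few lines of Python. This is only a sketch: the sample string `LIS` is made up for illustration, and ASCII is assumed as the character set.

```python
# Illustrative sketch: characters -> numbers -> bits.
text = "LIS"  # hypothetical sample string, not from the slides

# ord() returns the number that the character set associates with a character.
codes = [ord(ch) for ch in text]

# Each number can be written as a binary number, padded here to 8 bits.
bits = [format(code, "08b") for code in codes]

# Encoding the whole string gives the bytes that would actually be stored.
encoded = text.encode("ascii")

for ch, code, b in zip(text, codes, bits):
    print(ch, code, b)
```

Each printed line shows one character, the number assigned to it, and that number written as bits.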

Beyond characters

• There is more to text than a string of characters.

• There is layout
  – titles
  – abstracts
  – mathematical formula spacing

Layout

• Layout can be conveyed by additional text that has special meaning. Examples:
  – LaTeX
  – HTML
  – PostScript

• Another way is to do non-textual layout by adding some other digital signals. Examples:
  – DVI
  – MS Word
  – MS PowerPoint

These cannot be shown in these slides!

Example: LaTeX

\bigskip
\textbf{Class structure}

Classes will be held in the computer lab in the Palmer School between 18:15 and 20:45. An optional practice session will last until 21:15.

\begin{tabular}{@{}llll@{}}
0&2003--09--23&introduction to the course &\\
1&2002--09--30&bits bytes and characters &\\
2&2003--10--07&databases and markup languages&\\

Example: HTML

<p><strong>Class structure</strong>
<p>Classes will be held in the computer lab in the Palmer School between 18:15 and 20:45. An optional practice session will last until 21:15.
<p>Class details:
<p><center><table width=100% border=1>
<tr><td align=left> 0 </td><td align=left> 2003&#8211;09&#8211;23 </td><td align=left><a href="lis508w03a-00.ppt">introduction to the course</a> </td></tr>
<tr><td align=left> 1 </td><td align=left> 2002&#8211;09&#8211;30 </td><td align=left><a href="lis508w03a-01.ppt">bits bytes and characters</a> </td>
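
Because HTML mark-up is itself text, a program can separate the mark-up from the words it annotates. The sketch below uses Python's standard `html.parser` module on a shortened version of the example. The class name `TextAndTags` and the variable `sample` are made up for illustration.

```python
from html.parser import HTMLParser


class TextAndTags(HTMLParser):
    """Collects tag names and character data in two separate lists."""

    def __init__(self):
        super().__init__()
        self.tags = []  # the mark-up: names of opening tags
        self.text = []  # the content: runs of characters between tags

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # Ignore runs that contain only whitespace.
        if data.strip():
            self.text.append(data.strip())


# Shortened, hypothetical excerpt of the HTML example.
sample = "<p><strong>Class structure</strong><p>Classes will be held in the computer lab."

parser = TextAndTags()
parser.feed(sample)
parser.close()

print(parser.tags)
print(parser.text)
```

The first list holds only mark-up, the second only the characters a reader would see.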

Example: PostScript

Fc(Class)g(structur)o(e)-104 3956 y Fd(Classes)26b(will)g(be)e(held)g(in)h(the)f(computer)f(lab)i(in)f(the)h(P)o(almer)f(School)g(between)f(18:15)h(and)g(20:45.)36 b(An)25 b(optional)e(practice)h(session)-104 4055 y(will)d(last)g(until)f(21:15.)-104 4155 y(Class)i(details:)-104 4307 y(0)141 b(2003\22609\22623)94b(introduction)18 b(to)i(the)h(course)-104 4407 y(1)141 b(2002\22609\22630)94 b(bits)21 b(bytes)f(and)g(characters)-104 4507 y(2)141 b(2003\22610\22607)94 b(databases)20 b(and)g(markup)e(languages)-

DVI (rendition, "class structure")

1659: fntnum27 current font is ptmb8t
1660: setchar67 h:=-820459+473168=-347291, hh:=-22
1661: setchar108 h:=-347291+182183=-165108, hh:=-10
1662: setchar97 h:=-165108+327680=162572, hh:=11
1663: setchar115 h:=162572+254928=417500, hh:=27
1664: setchar115 h:=417500+254928=672428, hh:=43
1665: right3 163840 h:=672428+163840=836268, hh:=53
1669: setchar115 h:=836268+254928=1091196, hh:=69
1670: setchar116 h:=1091196+218232=1309428, hh:=83
1671: setchar114 h:=1309428+290976=1600404, hh:=101
1672: setchar117 h:=1600404+364376=1964780, hh:=124
1673: setchar99 h:=1964780+290976=2255756, hh:=142
1674: setchar116 h:=2255756+218232=2473988, hh:=156
1675: setchar117 h:=2473988+364376=2838364, hh:=179
1676: setchar114 h:=2838364+290976=3129340, hh:=197
1677: right2 -11792 h:=3129340-11792=3117548, hh:=196
1680: setchar101 h:=3117548+290976=3408524, hh:=214

Databases

• Databases are collections of data with some organization to them.

• The classic example is the relational database.

• But not all databases need to be relational databases.

Relational databases

• A relational database is a set of tables. There may be relations between the tables.

• Each table has a number of records. Each record has a number of fields.

• When the database is being set up, we fix
  – the size of each field
  – relationships between tables

Example: Movie database

ID | title | director | date

M1 | Gone with the wind | F. Ford Coppola | 1963

M2 | Room with a view | Coppola, F Ford | 1985

M3 | High Noon | Woody Allan | 1974

M4 | Star Wars | Steve Spielberg | 1993

M5 | Alien | Allen, Woody | 1987

M6 | Blowing in the Wind | Spielberg, Steven | 1962

• Single table

• No relations between tables, of course

Problem with this database

• All the data is wrong, but this is just for illustration.

• Names are recorded inconsistently. There is no way to find films by Woody Allan without having to go through all spelling variations.

• Mistakes are difficult to correct. We have to wade through all records, a masochist’s pleasure.
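
The naming problem can be seen in a small sketch. It holds the single table as a Python list of records. The list `movies` and the function `films_by` are names made up for illustration, and the (deliberately wrong) data is copied from the table.

```python
# The single movie table as a list of records (data as on the slide).
movies = [
    {"id": "M1", "title": "Gone with the wind", "director": "F. Ford Coppola", "date": 1963},
    {"id": "M2", "title": "Room with a view", "director": "Coppola, F Ford", "date": 1985},
    {"id": "M3", "title": "High Noon", "director": "Woody Allan", "date": 1974},
    {"id": "M4", "title": "Star Wars", "director": "Steve Spielberg", "date": 1993},
    {"id": "M5", "title": "Alien", "director": "Allen, Woody", "date": 1987},
    {"id": "M6", "title": "Blowing in the Wind", "director": "Spielberg, Steven", "date": 1962},
]


def films_by(director):
    """Return titles whose director field matches the given spelling exactly."""
    return [m["title"] for m in movies if m["director"] == director]


# Each spelling finds only the records that happen to use that spelling.
print(films_by("Woody Allan"))
print(films_by("Allen, Woody"))
```

Neither query finds both films, because the director field is just free text.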

Better movie database

ID | title | director | year

M1 | Gone with the wind | D1 | 1963

M2 | Room with a view | D1 | 1985

M3 | High Noon | D2 | 1974

M4 | Star Wars | D3 | 1993

M5 | Alien | D2 | 1987

M6 | Blowing in the Wind | D3 | 1962

ID | director name | birth year

D1 | Ford Coppola, Francis | 1942

D2 | Allan, Woody | 1957

D3 | Spielberg, Steven | 1942

Relational database

• We have a one-to-many relationship between directors and films
  – Each film has one director
  – Each director has directed many films

• Here it becomes possible for the computer
  – To know which films have been directed by Woody Allen
  – To find which films have been directed by a director born in 1942
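
As a rough sketch, the two tables and both lookups can be tried with Python's built-in `sqlite3` module. The table names, column names, and helper functions are assumptions made for illustration. The data, including its deliberate errors and spellings, is copied from the tables.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE director (
        id         TEXT PRIMARY KEY,
        name       TEXT,
        birth_year INTEGER
    );
    CREATE TABLE movie (
        id          TEXT PRIMARY KEY,
        title       TEXT,
        director_id TEXT REFERENCES director(id),  -- one director per film
        year        INTEGER
    );
""")

conn.executemany("INSERT INTO director VALUES (?, ?, ?)", [
    ("D1", "Ford Coppola, Francis", 1942),
    ("D2", "Allan, Woody", 1957),
    ("D3", "Spielberg, Steven", 1942),
])
conn.executemany("INSERT INTO movie VALUES (?, ?, ?, ?)", [
    ("M1", "Gone with the wind", "D1", 1963),
    ("M2", "Room with a view", "D1", 1985),
    ("M3", "High Noon", "D2", 1974),
    ("M4", "Star Wars", "D3", 1993),
    ("M5", "Alien", "D2", 1987),
    ("M6", "Blowing in the Wind", "D3", 1962),
])


def films_by_director(name):
    """Titles of films whose director has the given name."""
    rows = conn.execute(
        "SELECT movie.title FROM movie "
        "JOIN director ON movie.director_id = director.id "
        "WHERE director.name = ? ORDER BY movie.id",
        (name,),
    )
    return [title for (title,) in rows]


def films_by_birth_year(year):
    """Titles of films whose director was born in the given year."""
    rows = conn.execute(
        "SELECT movie.title FROM movie "
        "JOIN director ON movie.director_id = director.id "
        "WHERE director.birth_year = ? ORDER BY movie.id",
        (year,),
    )
    return [title for (title,) in rows]


print(films_by_director("Allan, Woody"))
print(films_by_birth_year(1942))
```

The director's name is now stored once, so a spelling mistake is corrected in one record rather than in every film.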

Many-to-many relationships

• Each film has one director, but many actors star in it. The relationship between actors and films is a many-to-many relationship.

• Here are a few actors

ID | sex | actor name | birth year

A1 | f | Brigitte Bardot | 1972

A2 | m | George Clooney | 1927

A3 | f | Marilyn Monroe | 1934

Actor/Movie table

actor id | movie id

A1 | M4

A2 | M3

A3 | M2

A1 | M5

A1 | M3

A2 | M6

A3 | M4

… as many lines as required

SQL

• Once we have the relational database, we can ask sophisticated questions:
  – Which director has had the most female actors working for him?
  – In which years were films shot that starred actors born between 1926 and 1935?

• Such questions can be encoded in a language known as “structured query language” or SQL. All relational database vendors implement a dialect of SQL.
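
The two questions might be written in SQL as sketched below, run here through Python's built-in `sqlite3` module. Table and column names (including `actor_movie` for the actor/movie table) are assumptions made for illustration. The data is copied from the example tables. "Most female actors" is read as the number of distinct actresses, which is one possible interpretation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE director (id TEXT PRIMARY KEY, name TEXT, birth_year INTEGER);
    CREATE TABLE movie (id TEXT PRIMARY KEY, title TEXT,
                        director_id TEXT REFERENCES director(id), year INTEGER);
    CREATE TABLE actor (id TEXT PRIMARY KEY, sex TEXT, name TEXT, birth_year INTEGER);
    -- The many-to-many relationship lives in its own table.
    CREATE TABLE actor_movie (actor_id TEXT REFERENCES actor(id),
                              movie_id TEXT REFERENCES movie(id));
""")
conn.executemany("INSERT INTO director VALUES (?, ?, ?)", [
    ("D1", "Ford Coppola, Francis", 1942),
    ("D2", "Allan, Woody", 1957),
    ("D3", "Spielberg, Steven", 1942),
])
conn.executemany("INSERT INTO movie VALUES (?, ?, ?, ?)", [
    ("M1", "Gone with the wind", "D1", 1963),
    ("M2", "Room with a view", "D1", 1985),
    ("M3", "High Noon", "D2", 1974),
    ("M4", "Star Wars", "D3", 1993),
    ("M5", "Alien", "D2", 1987),
    ("M6", "Blowing in the Wind", "D3", 1962),
])
conn.executemany("INSERT INTO actor VALUES (?, ?, ?, ?)", [
    ("A1", "f", "Brigitte Bardot", 1972),
    ("A2", "m", "George Clooney", 1927),
    ("A3", "f", "Marilyn Monroe", 1934),
])
conn.executemany("INSERT INTO actor_movie VALUES (?, ?)", [
    ("A1", "M4"), ("A2", "M3"), ("A3", "M2"), ("A1", "M5"),
    ("A1", "M3"), ("A2", "M6"), ("A3", "M4"),
])

# Question 1: which director has worked with the most (distinct) female actors?
top_director = conn.execute("""
    SELECT director.name, COUNT(DISTINCT actor.id) AS n_actresses
    FROM director
    JOIN movie       ON movie.director_id = director.id
    JOIN actor_movie ON actor_movie.movie_id = movie.id
    JOIN actor       ON actor.id = actor_movie.actor_id
    WHERE actor.sex = 'f'
    GROUP BY director.id, director.name
    ORDER BY n_actresses DESC
    LIMIT 1
""").fetchone()

# Question 2: in which years were films shot that starred actors
# born between 1926 and 1935?
years = [year for (year,) in conn.execute("""
    SELECT DISTINCT movie.year
    FROM movie
    JOIN actor_movie ON actor_movie.movie_id = movie.id
    JOIN actor       ON actor.id = actor_movie.actor_id
    WHERE actor.birth_year BETWEEN 1926 AND 1935
    ORDER BY movie.year
""")]

print(top_director)
print(years)
```

Both answers come from joining the tables through their IDs. No table has to repeat a name.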

Databases in libraries

• Relational databases dominate the world of structured data

• But they are not so popular in libraries
  – Slow on very large databases (such as catalogs)
  – Library data has nasty ad-hoc relationships, e.g.
    • Translation of the first edition of a book
    • CD supplement that comes with the print version

These are difficult to deal with in a system where all relations and fields have to be set up at the start and cannot easily be changed later.

http://openlib.org/home/krichel

Thank you for your attention!