Content - part 2 Week 4

Post on 19-Dec-2015


Page 1: Content - part 2 Week 4. Tonight More detailed look at metadata description of content No access to a network today, so not all the updating I would like

Content - part 2

Week 4


Tonight

• More detailed look at metadata description of content

• No access to a network today, so not all the updating I would like to do… Sorry.


Google Books Project

• Michael A. Keller, Closing Keynote
  – Ida M. Green University Librarian at Stanford
  – Director of Academic Information Resources
  – Publisher of HighWire Press
  – Publisher of the Stanford University Press:

• "One good turn deserves another; how the Google Book Search project is benefiting everyone".


Google Books demo

• Full text - Life of Miguel de Cervantes

• Limited Preview - The Life of Miguel de Cervantes Saavedra

• Snippet View - "Discreción" in the Works of Cervantes: A Semantic Study


What has been accomplished

• As of September 2006
• Nearly 30,000 Stanford books digitized
  – ~1M books from all partner libraries
• Over 4,000 books identified as needing preservation treatment (& so not digitized)
• A great debate about copyright has started
  – Orphan works
  – What can an archive do to provide access
  – Defense of fair use underway

• Today’s news: www.pcworld.com/article/172315/google_books_wont_hit_digital_shelves_anytime_soon.html

This slide is taken from the presentation by Michael A. Keller at ECDL 2006


Original Principles

• If legally possible, digitize every book (9M volumes) in the Stanford libraries
  – Now digitizing with imprint dates up to 1963
• Partner libraries (*added recently)
  – University of Michigan (similar to Stanford)
  – Harvard (public domain (?), maybe > 1M)
  – NYPL (public domain, unusual collections)
  – Oxford - Bodleian (earlier than 1885, ~ 1M titles)
  – University of California (similar to Stanford >6M)
  – (more to follow)

This slide is taken from the presentation by Michael A. Keller at ECDL 2006


Purposes

• Digital preservation
  – Virtual Bookshelves under construction as part of the Stanford Digital Repository
  – For Stanford use only
• Other searching and research functions
  – Subtle searching (as in Socrates & HighWire)
  – Taxonomic (LCSH & HighWire) & Associative Searching (Takano)
  – Citation linking (HighWire & “InfoTools” (Ebrary))
  – Better navigation (through visualization ?) (Grokker)
• Digitized books from all sources as test bed for new research; combine with articles, datasets, etc. for data mining & other transformative uses.

This slide is taken from the presentation by Michael A. Keller at ECDL 2006


Some Conclusions

• Google Book Search
  – Is an indexing, not a publishing project
  – Offers substantial increases in access to contents of books in library collections by keyword searching
  – Offers publishers global marketing of their publications
  – Offers several useful services to readers
• Offers participating libraries
  – Digital copies of books on their shelves for preservation
  – New possibilities for services to local readers
  – New possibilities for research for local faculty & students

• Note – recent settlement between Google and publishers. -- anyone hear about that?

This slide is taken from the presentation by Michael A. Keller at ECDL 2006


Google Books of 2007

• In May, the Cantonal and University Library of Lausanne, and Ghent University Library join the Book Search program, adding a substantial amount of books in French, German, Flemish, Latin and other languages, and bringing the total number of European libraries partners to six.

• In July, we add a "View plain text" link to all out-of-copyright books. T.V. Raman explains how this opens the book to adaptive technologies such as screen readers and Braille display, allowing visually impaired users to read these books just as easily as users with sight.

• By December, the Book Search interface is available in over 35 languages, from Japanese to Czech to Finnish. Over 10,000 publishers and authors from 100+ countries are participating in the Book Search Partner Program. The Library Project expands to 28 partners, including seven international library partners: Oxford University (UK), University of Complutense of Madrid (Spain), the National Library of Catalonia (Spain), University Library of Lausanne (Switzerland), Ghent University (Belgium) and Keio University (Japan).


Open Content Alliance

• The Open Content Alliance (OCA) is a collaborative effort of a group of cultural, technology, nonprofit, and governmental organizations from around the world that helps build a permanent archive of multilingual digitized text and multimedia material. An archive of contributed material is available on the Internet Archive website and through Yahoo! and other search engines and sites.

• The OCA encourages access to and reuse of collections in the archive, while respecting the content owners and contributors. Contributors to the OCA have agreed to the principles set forth in the Call for Participation.

• The Open Content Alliance is administered by the Internet Archive, a 501c3 non-profit library.

http://www.opencontentalliance.org/about/


European Digital Library Project

EDLproject was a Targeted Project funded by the European Commission under the eContentplus Programme and coordinated by the German National Library.

The project, started in September 2006 and completed in February 2008, worked towards the integration of the bibliographic catalogues and digital collections of the National Libraries of Belgium, Greece, Iceland, Ireland, Liechtenstein, Luxembourg, Norway, Spain and Sweden, into The European Library.

EDLproject also addressed the enhancement of multilingual capabilities of The European Library portal, took first steps towards collaboration between The European Library and other non-library cultural initiatives, and expanded the marketing and communication activities of The European Library service.

Comments? Discussion?


A DL example

• Library of Congress American Memory project
  – http://memory.loc.gov/ammem/index.html
  – “American Memory provides free and open access through the Internet to written and spoken words, sound recordings, still and moving images, prints, maps, and sheet music that document the American experience. It is a digital record of American history and creativity. These materials, from the collections of the Library of Congress and other institutions, chronicle historical events, people, places, and ideas that continue to shape America, serving the public as a resource for education and lifelong learning.”


Dublin Core for a map

• Map found in the LOC American Memory collection
  – Map at http://memory.loc.gov/ammem/gmdhtml/gmdhome.html
• Dublin Core metadata illustration found at http://webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
  – Part of a DL course at U. of Alabama


Go to web site to explore what is there -- including copyright information, title, history, etc.


Dublin Core: Title

• Name given, usually by the creator or publisher

<META
  name="DC.Title"
  content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata"
  lang="la"
>

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm


Dublin Core: Subject

• What the work is about, possibly keywords, terms from classification scheme if available.

<META
  name="DC.Subject"
  content="Middle Atlantic States - Maps - Early works to 1800 - Facsimiles"
  scheme="LCSH"
>

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

LCSH = Library of Congress Subject Headings


Dublin Core: Description

• Free text description, abstract, etc.

<META
  name="DC.Description"
  content="An (sic) historical map showing the coast of New Jersey as perceived in the seventeenth century"
>

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm


Dublin Core: Source

• Is this object derived from another? Is this map a part of a larger map? Is this text a variation or revision of another piece of text?

<META
  name="DC.Source"
  content="G3715 1685 .V5 1969"
  scheme="LCCN"
>

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

LCCN here = Library of Congress Call Number (the abbreviation more commonly stands for Library of Congress Control Number)


Dublin Core: Language

• Language of the content of the resource

• For the map, there is no language content

<META
  name="DC.Language"
  content="nl"
>

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm


Dublin Core: Relation

• To what other object(s) or collection is this object related? Does it also exist in another collection? Is it derived from another document or image? How is it related?

<META
  name="DC.Relation"
  content="isPartOf http://lcweb2.loc.gov/cgi-bin/query/r?ammem/gmd:@filreq(@field(NUMBER+@band(g3715+ct000001))+@field(COLLID+dsxpmap))"
>

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm


Dublin Core: Creator

• Person or organization responsible for the Intellectual Content of this object

<META
  name="DC.Creator"
  content="Nicolaum Visscher"
>

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm


Dublin Core: Publisher

• Entity responsible for making the resource available in its present form

• Not shown in the example, but should be something like this:

<META
  name="DC.Publisher"
  content="Library of Congress American Memory Project"
>

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm


Dublin Core: Contributor

• Any entity making a contribution to this object.

• Example: someone who added some information to the original document or image

• No entry for this map.


Dublin Core: Date

• Date on which this object was made available in its present form, possibly the date it was entered into this digital collection.

<META
  name="DC.Date"
  content="1996-04-17"
  scheme="ISO 8601"
>

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm

Specify the date format so that others can interpret it correctly
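The point about declaring the date format can be illustrated with a short Python sketch. It takes the date from the DC.Date example and shows why an unlabelled regional format is ambiguous while the ISO 8601 form is not. The variable names are illustrative only and are not part of the slide's source.

```python
from datetime import date

# The date from the slide's DC.Date example.
entered = date(1996, 4, 17)

# isoformat() yields the ISO 8601 calendar form YYYY-MM-DD.
iso_text = entered.isoformat()

# The same date in two regional conventions. Without a declared scheme,
# a reader cannot tell which field is the month and which is the day.
month_first = entered.strftime("%m/%d/%y")
day_first = entered.strftime("%d/%m/%y")

# Parsing the ISO form back recovers the original date exactly.
round_trip = date.fromisoformat(iso_text)

print(iso_text, month_first, day_first)
```

A date such as 04/05/96 would be read as April 5 under one convention and 4 May under the other, which is the kind of misreading the scheme attribute is meant to prevent.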


Dublin Core: Type or Category

• What sort of thing is this? Some examples: home page, novel, poem, working paper, technical report, essay, dictionary, …

• Type should be selected from a controlled list. For example, see the DCMI Type Vocabulary:

• http://dublincore.org/documents/2006/08/28/dcmi-type-vocabulary/

Why is this recommended as a controlled vocabulary field?


DCMI Type Vocabulary

• Collection
• Dataset
• Event
• Image
• InteractiveResource
• MovingImage
• PhysicalObject
• Service
• Software
• Sound
• StillImage
• Text

See the official page for explanations of the categories. Note that Image is a broad category and MovingImage and StillImage are more restricted subcategories.
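One way to see what a controlled list buys you is to check candidate values against it. The sketch below is a minimal illustration in Python: the set holds the twelve terms listed on this slide, and the function name is_dcmi_type is a hypothetical helper, not part of any Dublin Core tool.

```python
# The twelve terms of the DCMI Type Vocabulary as listed on the slide.
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})


def is_dcmi_type(value: str) -> bool:
    """Return True only for an exact, case-sensitive vocabulary term."""
    return value in DCMI_TYPES


# Free-text variants that a cataloguer might type are rejected,
# so every record ends up using the same spelling for the same idea.
for candidate in ["StillImage", "still image", "Photo", "Text"]:
    print(candidate, is_dcmi_type(candidate))
```

Because only exact terms pass, a search for StillImage finds every still image in the collection instead of missing records labelled "photo" or "picture".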


Dublin Core: Type

• Category of this resource

<META
  name="DC.Type"
  content="image.photograph"
>

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm


Dublin Core: Format

• The way the content is encoded. This tells what resource is needed to access this content.

<META
  name="DC.Format"
  content="image/gif"
  scheme="IMT"
>

Internet MIME Types: http://www.ltsw.se/knbase/internet/mime.htp

See also Internet Media Type: http://www.graphcomp.com/info/specs/mime.html


Dublin Core: Unique ID

• The key for this object in the collection.
• I cannot find one for the map we are looking at, but the ID for the map of which it is a part is g3715 ct000001
• The Metadata specification for that would be

<META
  name="DC.Id"
  content="g3715 ct000001"
>

Source: http://memory.loc.gov/cgi-bin/query/r?ammem/gmd:@filreq(@field(NUMBER+ @band(g3715+ct000001))+@field(COLLID+dsxpmap))


Dublin Core: Coverage

• The time, space or other measurement of the scope or completeness of the object.

• No coverage entry specified, but might be this:

<META
  name="DC.Coverage"
  content="North America, Eastern lands and coast, as viewed in late seventeenth century"
>

Example not a controlled vocabulary. Why would a controlled vocabulary be better?
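The element slides all follow one pattern: a name of the form DC.Element, a content value, and optionally a scheme or lang qualifier. As a rough sketch of that pattern, the Python below builds such a tag from its parts. The function dc_meta is a hypothetical helper written for this illustration; it is not taken from the slides or from any Dublin Core library.

```python
from html import escape


def dc_meta(element, content, scheme=None, lang=None):
    """Build one Dublin Core META tag in the style used on the slides."""
    attrs = [("name", f"DC.{element}"), ("content", content)]
    if scheme:
        attrs.append(("scheme", scheme))
    if lang:
        attrs.append(("lang", lang))
    # escape(..., quote=True) protects values containing quotes or
    # angle brackets so the attribute stays well formed.
    rendered = " ".join(
        f'{key}="{escape(value, quote=True)}"' for key, value in attrs
    )
    return f"<META {rendered}>"


print(dc_meta("Creator", "Nicolaum Visscher"))
print(dc_meta("Date", "1996-04-17", scheme="ISO 8601"))
print(dc_meta("Language", "nl"))
```

Generating the tags from structured values, instead of typing them by hand, avoids the mismatched or missing quotation marks that are easy to introduce when writing the markup directly.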


International Consensus

• Recognition of International Scope of Resource Discovery on Web
• 17 Countries Currently Involved in DC Working Groups
• 50+ Implementation Projects in 10 Countries

Source: webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm


Guide to Good Practice

• The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials

• http://www.nyu.edu/its/humanities/ninchguide/index.html


Access Control and Rights Management


Legal and Technical Issues

• Legal: When is a resource available to digitize and make available? What requirements exist for controlling access?
• Technical: How do we control access to a resource that is stored online?
  – Policies
  – Encoding
  – Distribution limitations


Date of work: Created 1-1-78 or after
Protected from: When work is fixed in tangible medium of expression
Term: Life + 70 years (or if work of corporate authorship, the shorter of 95 years from publication, or 120 years from creation)

Date of work: Published before 1923
Protected from: In public domain
Term: None

Date of work: Published 1923 - 63
Protected from: When published with notice
Term: 28 years + could be renewed for 47 years, now extended by 20 years for a total renewal of 67 years. If not so renewed, now in public domain

Date of work: Published from 1964 - 77
Protected from: When published with notice
Term: 28 years for first term; now automatic extension of 67 years for second term

Date of work: Created before 1-1-78 but not published
Protected from: 1-1-78, the effective date of the 1976 Act which eliminated common law copyright
Term: Life + 70 years or 12-31-2002, whichever is greater

Date of work: Created before 1-1-78 but published between then and 12-31-2002
Protected from: 1-1-78, the effective date of the 1976 Act which eliminated common law copyright
Term: Life + 70 years or 12-31-2047, whichever is greater

Chart created by Lolly Gasaway. Updates at http://www.unc.edu/~unclng/public-d.htm


Works for hire

• Usual case -- works created by faculty are not the property of the university.
  – Faculty surrender copyright to publishers of journals and books
  – Some publishers allow faculty to retain copyright, giving the publisher specific limited rights to reproduce and distribute the work.


Fair use

• No clear, easy answers.

• Checksheet provided in the article is a good guide to the issues.

• Link to the checksheet: http://www.copyright.iupui.edu/checklist.htm


Moral rights

• Fair to the creator
  – Keep the identity of the creator of the work
  – Do not cut the work
  – Generally, be considerate of the person (or institution) that created the work.


Getting Permission

• With the best will in the world, getting the appropriate permissions is not always easy.
  – Identify who holds the rights
  – Get in touch with the rights holder
  – Get a suitable agreement to cover the needs of your use.
• Useful links:
  – http://www.loc.gov/copyright/
  – http://www.utsystem.edu/OGC/IntellectualProperty/PERMISSN.HTM
  – Connections to various ways to discover and contact the rights holder of a work.


Checking copyright status

Source: NINCH Guide to Good Practice. Chapter 4: Rights Management


Considering people depicted in the work

Source: NINCH Guide to Good Practice. Chapter 4: Rights Management. Copyright: Lauryn G. Grant


Technical issues

• Link the resource to the copyright statements
• Maintain that link when the resource is copied or used
• Approaches:
  – Steganography
  – Encryption
  – Digital Wrappers
  – Digital Watermarks


Issues in Encryption

• General cases for protection of controlled content: Concern for passive listening, active interference.
  – Listening: intruder gains information, may not be detected. Effects indirect.
  – Active interference
    • Intruder may prevent delivery of the message to the intended recipient.
    • Intruder may substitute a fake message for the intended one
    • Effects are direct and immediate
    • Less likely in the case of digital library content


Message interception

[Diagram: Original message (plain text) → Encoding method → Ciphertext → Decoding method → Received message (plain text). An intruder may attack by eavesdropping or masquerading.]


Types of Encryption Methods

• Substitution
  – Simple adjustment, Caesar’s cipher
    • Each letter is replaced by one that is a fixed distance from it in the alphabet. A becomes D, B becomes E, etc. At the end, wrap around, so X becomes A, Y becomes B, Z becomes C.
    • May have been confusing the first time it was done, but it would not have taken long to figure it out.
    • Note the simple example at geocaching.com
      – No intention to hide or confuse. Just keep a person from seeing too much information about the hide, unless the person wants to see the help.
  – Simple substitution of other characters for letters -- numbers, dancing men, etc.
  – More complex substitution. No pattern to the replacement scheme.
    • See common cryptogram puzzles. These are usually made easier by showing the spaces between the words. (For very modern version, see http://www.cryptograms.org/)
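The fixed-distance substitution described for Caesar's cipher can be sketched in a few lines of Python. This is a minimal illustration, not a hardened implementation: the function name caesar is chosen here, the input is folded to upper case, and anything that is not a letter A to Z is passed through unchanged.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text, shift):
    """Shift each letter by a fixed distance, wrapping around at Z."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            # The modulo gives the wrap-around: X, Y, Z become A, B, C
            # when the shift is 3.
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # spaces and punctuation are left alone
    return "".join(out)


secret = caesar("ABXYZ", 3)
print(secret)              # DEABC
print(caesar(secret, -3))  # ABXYZ
```

Decoding is the same operation with the opposite shift. Since only 25 non-trivial shifts exist, trying them all is quick, which is why the scheme "would not have taken long to figure out".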


Dancing Men????

• Arthur Conan Doyle: The Adventure of the Dancing Men. A Sherlock Holmes Adventure.

Read the story online and see the images and analysis of the decoding at http://camdenhouse.ignisart.com/canon/danc.htm

“Speaking roughly, T, A, O, I, N, S, H, R, D, and L are the numerical order in which letters occur; but T, A, O, and I are very nearly abreast of each other, and it would be an endless task to try each combination until a meaning was arrived at.”


Types of encryption - 2

Hiding the text.

• The wax tablet example
  – message written on the base of the tablet and wax put over top of it with another message on the wax
• Steganography: (ste-g&n-o´gr&-fē) (n.) The art and science of hiding information by embedding messages within other, seemingly harmless messages. Steganography works by replacing bits of useless or unused data in regular computer files (such as graphics, sound, text, HTML, or even floppy disks) with bits of different, invisible information. This hidden information can be plain text, cipher text, or even images.
• Special software is needed for steganography, and there are freeware versions available at any good download site.
• Can be used to insert identification into a file to track its source.

Definition from www.webopedia.com
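The idea of "replacing bits of useless or unused data" can be illustrated with a least-significant-bit sketch in Python. This is one common teaching technique and is an assumption here, not a method named on the slide. The cover data stands in for something like raw pixel values, and the function names embed and extract are made up for the example.

```python
def embed(cover: bytes, message: bytes) -> bytes:
    """Hide message bits in the lowest bit of successive cover bytes."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover data is too small for this message")
    out = bytearray(cover)
    for pos, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        out[pos] = (out[pos] & 0xFE) | bit
    return bytes(out)


def extract(stego: bytes, length: int) -> bytes:
    """Read back `length` bytes from the lowest bits of the stego data."""
    result = bytearray()
    for i in range(length):
        value = 0
        for bit_pos in range(8):
            value = (value << 1) | (stego[i * 8 + bit_pos] & 1)
        result.append(value)
    return bytes(result)


cover = bytes(range(64))        # stand-in for image sample values
stego = embed(cover, b"LOC")    # e.g. a short source identifier
print(extract(stego, 3))        # b'LOC'
```

Each cover byte changes by at most 1, which in an image would be an imperceptible change in brightness. That is what makes the approach usable for inserting an identifier that tracks a file's source.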


Types of encryption - 3

• Key-based shuffling
  – Using a mnemonic to make the key easy to remember.
• A machine to do the shuffling

[Diagram of a shuffling machine; letter labels as extracted: A D B C / D C B A]

What shuffling is used? How would “CAB” look?


Monoalphabetic codes

• Any kind of substitution in which just one letter (or other symbol) represents one letter from the original alphabet is called monoalphabetic encoding.
  – Such codes are easy to break. That is what you do when you solve cryptograms.
  – Frequency distributions of letters in normal text for a given language are well known.
    • “The twelve most frequently-used letters in the English language are ETAOIN SHRDLU, in that order.” -- http://www.cryptograms.org/
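Breaking a monoalphabetic code starts with counting. The Python sketch below tallies letter percentages in a piece of text, which is the first step a solver would take before comparing the result with known English frequencies. The function name letter_frequencies is illustrative only.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, percent) pairs, most frequent letter first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    counts = Counter(letters)
    return [(letter, 100.0 * n / total) for letter, n in counts.most_common()]


sample = "These materials chronicle historical events, people, places, and ideas"
for letter, percent in letter_frequencies(sample)[:5]:
    print(f"{letter} {percent:.2f}%")
```

In a long enough ciphertext, the most common symbol is a strong candidate for E, the next few for T, A, O and so on. A short sample like the one above will not match the published percentages closely, so solvers combine the counts with common digrams and short words.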


Letter distributions in English (percent)

Letter  %       Letter  %       Digram  %       Digram  %       Word   %
A       7.81    N       7.28    TH      3.18    OU      0.72    THE    6.42
B       1.28    O       8.21    IN      1.54    IT      0.71    OF     4.02
C       2.93    P       2.15    ER      1.30    ES      0.69    AND    3.15
D       4.11    Q       0.14    RE      1.30    ST      0.68    TO     2.36
E       13.05   R       6.64    AN      1.08    OR      0.68    A      2.09
F       2.88    S       6.46    HE      1.08    NT      0.67    IN     1.77
G       1.39    T       9.02    AR      1.02    HI      0.68    THAT   1.25
H       5.85    U       2.77    EN      1.02    EA      0.64    IS     1.03
I       6.77    V       1.00    TI      1.02    VE      0.64    I      0.94
J       0.23    W       1.49    TE      0.98    CO      0.59    IT     0.93
K       0.42    X       0.30    AT      0.88    DE      0.55    FOR    0.77
L       3.60    Y       1.51    ON      0.84    RA      0.55    AS     0.76
M       2.62    Z       0.09    HA      0.84    RO      0.55    WITH   0.76

SOURCE: Tanenbaum, Computer Networks, Prentice Hall, 1981


Disguising frequencies

• First trick: use more than 26 symbols and use several different symbols to represent the same letter. The goal is to even out the distribution.
• Ex. Use the letters plus the digits.
  – 36 symbols
  – Assign five symbols to the letter E, two to the letter I, three to the letter N, two each to R and S.
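The allotment in the example can be tried out directly. The Python sketch below builds a table from the 36 symbols (letters plus digits) using the counts given on the slide, then picks one of a letter's symbols at random each time it is encoded. The particular symbol assignment comes from a seeded shuffle and is purely an assumption for illustration, as are the names build_table, encode and decode. It assumes the plaintext contains only letters and spaces.

```python
import random
import string

SYMBOLS = string.ascii_uppercase + string.digits   # 36 symbols
ALLOTMENT = {"E": 5, "I": 2, "N": 3, "R": 2, "S": 2}


def build_table(seed=0):
    """Give each plaintext letter its share of cipher symbols."""
    pool = list(SYMBOLS)
    random.Random(seed).shuffle(pool)   # arbitrary but repeatable
    table = {}
    for letter in string.ascii_uppercase:
        share = ALLOTMENT.get(letter, 1)
        table[letter] = [pool.pop() for _ in range(share)]
    return table


def encode(text, table, rng=random):
    """Replace each letter with a randomly chosen one of its symbols."""
    return "".join(
        rng.choice(table[ch]) if ch in table else ch for ch in text.upper()
    )


def decode(cipher, table):
    """Map every cipher symbol back to the single letter it stands for."""
    reverse = {sym: letter for letter, syms in table.items() for sym in syms}
    return "".join(reverse.get(ch, ch) for ch in cipher)


table = build_table()
cipher = encode("SEVENTEEN TREES", table)
print(cipher)
print(decode(cipher, table))
```

With these counts the scheme uses 35 of the 36 symbols (14 for the five frequent letters, one each for the other 21). Spreading E across five symbols brings each of them down to roughly the frequency of a mid-ranked letter, which blunts a simple frequency count.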


More complex

• Vigenere’s table
• Arrange all the letters of the alphabet 26 times, in parallel columns, such that each column begins with a different letter, first A, then B, etc.
• Encode each letter by using a different column for each successive letter of the message.
• How to know which column to use? Use a keyword.

Examples and breaking: http://www.trincoll.edu/depts/cpsc/cryptography/vigenere.html
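Choosing a column by keyword letter amounts to shifting each message letter by the alphabet position of the matching keyword letter. A minimal Python sketch of that idea follows. The function name vigenere is chosen here, the keyword is assumed to be a non-empty string of letters, and non-letters in the message are passed through without consuming a keyword letter.

```python
def vigenere(text, keyword, decrypt=False):
    """Shift each letter by the matching keyword letter (A=0, B=1, ...)."""
    shifts = [ord(k) - ord("A") for k in keyword.upper()]
    out = []
    used = 0  # how many keyword letters have been consumed so far
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shift = shifts[used % len(shifts)]  # the keyword repeats
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
            used += 1
        else:
            out.append(ch)
    return "".join(out)


cipher = vigenere("ATTACKATDAWN", "LEMON")
print(cipher)                                    # LXFOPVEFRNHR
print(vigenere(cipher, "LEMON", decrypt=True))   # ATTACKATDAWN
```

Note that the two T's in ATTACK encode to different letters (X and F) because they fall under different keyword letters, which is what hides the single-letter frequencies.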


Decoding

• The Vigenere cipher looks really hard, but is not secure. Since the keyword repeats, it is really just a bunch of monoalphabetic codes. If you can figure out the length of the keyword, you can do standard analysis.
• Making it harder - instead of a regular arrangement of the letter columns, scramble them in some arbitrary way.
  – Makes decoding much more difficult, but also makes it difficult to have the arrangement known to the people who are supposed to be able to read the message.


Enigma

• Suppose we take a conversion for the first letter of the message and a different mapping for the next letter and a different mapping for the next letter …
• That is what we did with Vigenere
• Add additional encodings. Rotate from a fixed starting point through 26 positions of the first set of columns, then iterate a second set of columns. Now have 676 different mappings.
• To decode, must figure out the wiring inside each phase, and the order in which they are arranged in the machine.
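The count of 676 comes from two stages with 26 positions each (26 × 26). The toy Python sketch below shows that stepping behaviour with two scrambled alphabets: the first advances after every letter, and the second advances once each time the first completes a full turn. It is a deliberate simplification and not a model of the real machine: the wirings are arbitrary seeded shuffles, and the names make_rotor, through, back and run are invented for this example. There is no reflector, so decoding is a separate inverse pass.

```python
import random
import string

LETTERS = string.ascii_uppercase


def make_rotor(seed):
    """Return an arbitrary but repeatable scrambling of 0..25."""
    wiring = list(range(26))
    random.Random(seed).shuffle(wiring)
    return wiring


ROTORS = [make_rotor(1), make_rotor(2)]
STATES = 26 ** len(ROTORS)   # 676 distinct position pairs


def through(wiring, pos, c):
    """Pass letter index c forward through a rotor turned to pos."""
    return (wiring[(c + pos) % 26] - pos) % 26


def back(wiring, pos, c):
    """Undo through() for the same rotor and position."""
    return (wiring.index((c + pos) % 26) - pos) % 26


def run(text, decode=False):
    p1 = p2 = 0                      # fixed starting point
    out = []
    for ch in text.upper():
        if ch not in LETTERS:
            out.append(ch)
            continue
        c = LETTERS.index(ch)
        if decode:
            c = back(ROTORS[0], p1, back(ROTORS[1], p2, c))
        else:
            c = through(ROTORS[1], p2, through(ROTORS[0], p1, c))
        out.append(LETTERS[c])
        p1 = (p1 + 1) % 26           # first rotor steps every letter
        if p1 == 0:
            p2 = (p2 + 1) % 26       # second steps once per full turn
    return "".join(out)


cipher = run("ENIGMA MACHINE")
print(cipher)
print(run(cipher, decode=True))      # ENIGMA MACHINE
```

Because the position pair changes with every letter, the same plaintext letter is encoded differently each time, and the pattern only repeats after 676 letters instead of after the length of a short keyword.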


Enigma

• German engineer Arthur Scherbius (1878-1929) invented a machine of this type around 1918 and also bought the patent rights to one invented in Holland. He added a reflecting cylinder, which allowed the same machine to encode and decode. He called the machine Enigma, from the Greek for riddle.

• The Enigma used by the Germans in WWII had three rotors, and later four.


Enigma - 2


Encryption/Decryption Keys

• Problem is that you have to get the key to the receiver, secretly and accurately.
  – If you can get the key there, why not use the same method to send the whole message? (Efficiency of scale)
  – If the key is compromised without the communicators knowing it, the transmissions are open.
• Exact working of the enigma machine:
  – http://www.codesandciphers.org.uk/enigma/rotorspec.htm
• How Polish mathematicians broke the enigma
  – http://www.codesandciphers.org.uk/virtualbp/poles/poles.htm


Summary of encryption goals

• High level of data protection
• Simple to understand
• Complex enough to deter intruders
• Protection based on the key, not the algorithm
• Economical to implement
• Adaptable for various applications
• Available at reasonable cost


Data Encryption Standard

• Complex sequence of transformations
– hardware implementations speed performance
– modifications have made it very secure
• Known algorithm
– security based on difficulty in discovering the key

• http://www.itl.nist.gov/fipspubs/fip46-2.htm


The Data Encryption Standard Illustrated

64-bit blocks, 64-bit key (56 key bits plus 8 parity bits)

Federal Information Processing Standards 46-2 http://www.itl.nist.gov/fipspubs/fip46-2.htm


INTERNET-LINKED COMPUTERS CHALLENGE DATA ENCRYPTION STANDARD

LOVELAND, COLORADO (June 18, 1997). Tens of thousands of computers, all across the U.S. and Canada, linked together via the Internet in an unprecedented cooperative supercomputing effort to decrypt a message encoded with the government-endorsed Data Encryption Standard (DES).

Responding to a challenge, including a prize of $10,000, offered by RSA Data Security, Inc, the DESCHALL effort successfully decoded RSADSI's secret message.

According to Rocke Verser, a contract programmer and consultant who developed the specialized software in his spare time, "Tens of thousands of computers worked cooperatively on the challenge in what is believed to be one of the largest supercomputing efforts ever undertaken outside of government."

Using a technique called "brute-force", computers participating in the challenge simply began trying every possible decryption key. There are over 72 quadrillion keys (72,057,594,037,927,936). At the time the winning key was reported to RSADSI, the DESCHALL effort had searched almost 25% of the total. At its peak over the recent weekend, the DESCHALL effort was testing 7 billion keys per second.
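The figures in the press release can be checked with a little arithmetic. The sketch below assumes the 56 effective key bits of DES (the release gives only the key count) and applies the quoted peak rate to the whole search. The peak was not sustained throughout, so the real effort took longer than these estimates.

```python
# DES has 56 effective key bits, so the key space is 2**56.
total_keys = 2 ** 56

# Figures quoted in the press release.
peak_rate = 7_000_000_000      # keys tested per second at peak
fraction_searched = 0.25       # "almost 25%" of the key space

seconds_full = total_keys / peak_rate
days_full = seconds_full / 86_400          # 86,400 seconds per day
days_quarter = days_full * fraction_searched

print(f"{total_keys:,} possible keys")
print(f"about {days_full:.0f} days to try every key at the peak rate")
print(f"about {days_quarter:.0f} days to cover 25% at the peak rate")
```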


Public Key encryption

• Eliminates the need to deliver a key
• Two keys: one for encoding, one for decoding
• Known algorithm
– security based on security of the decoding key
• Essential element:
– knowing the encoding key will not reveal the decoding key


Effective Public Key Encryption

• Encoding method E and decoding method D are inverse functions on message M:
– D(E(M)) = M
• Computational cost of E, D reasonable
• D cannot be determined from E, the algorithm, or any amount of plaintext attack with any computationally feasible technique
• E cannot be broken without D (only D will accomplish the decoding)

• Any method that meets these criteria is a valid Public Key Encryption technique


It all comes down to this:

• key used for decoding is dependent upon the key used for encoding, but the relationship cannot be determined in any feasible computation or observation of transmitted data


Rivest, Shamir, Adleman (RSA)

• Choose 2 large prime numbers, p and q, each more than 100 digits
• Compute n=p*q and z=(p-1)*(q-1)
• Choose d, relatively prime to z
• Find e, such that e*d=1 mod (z)
– or e*d mod z = 1, if you prefer.
• This produces e and d, the two keys that define the E and D methods.
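The key-generation steps above can be sketched in a few lines of Python. This is a toy illustration only: the primes are tiny rather than 100+ digits, the function name `make_keys` is made up here, and the three-argument `pow` with exponent -1 (modular inverse) needs Python 3.8 or later.

```python
from math import gcd

def make_keys(p, q, d):
    """Follow the steps: n = p*q, z = (p-1)*(q-1), then find e with e*d = 1 mod z."""
    n = p * q
    z = (p - 1) * (q - 1)
    if gcd(d, z) != 1:
        raise ValueError("d must be relatively prime to z")
    e = pow(d, -1, z)   # modular inverse of d modulo z
    return n, z, e

# Toy values: far too small to be secure, chosen so the arithmetic is easy to follow.
n, z, e = make_keys(p=7, q=11, d=13)
print(n, z, e)
```

With these inputs, n is 77, z is 60, and e is 37, since 13 * 37 = 481 = 8 * 60 + 1.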


Public Key encoding

• Convert M into a bit string
• Break the bit string into blocks, P, of size k
– k is the largest integer such that 2^k < n
– P corresponds to a binary value: 0 < P < n
• Encoding method
– E = Compute C = P^e (mod n)
• Decoding method
– D = Compute P = C^d (mod n)
• e and n are published (public key)
• d is closely guarded and never needs to be disclosed


An example:

• p=7; q=11; n=77; z=60
• d=13; e=37; k=6
• Test message = CAT
• Using A=1, etc. and 5-bit representation:
– 00011 00001 10100
• Since k=6, regroup the bits (arrange right to left so that any padding needed will put 0's on the left and not change the value):
– 000000 110000 110100 (three leading zeros added to fill the block)
• Decimal equivalent: 0 48 52
• Each of those raised to the power 37 (e) mod n: 0 27 24
• Each of those values raised to the power 13 (d) mod n (convert back to the original): 0 48 52
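The CAT example can be reproduced step by step in Python. This is a minimal sketch of the toy scheme on the slide. The helper names `text_to_bits`, `block_size`, and `to_blocks` are invented for illustration, and the built-in three-argument `pow` does the modular exponentiation.

```python
def text_to_bits(text):
    """Encode A=1 ... Z=26 using five bits per letter."""
    return "".join(format(ord(ch) - ord("A") + 1, "05b") for ch in text)

def block_size(n):
    """Return the largest k such that 2**k < n."""
    k = 0
    while 2 ** (k + 1) < n:
        k += 1
    return k

def to_blocks(bits, k):
    """Group bits right to left into k-bit blocks, padding with 0's on the left."""
    total = -(-len(bits) // k) * k          # round length up to a multiple of k
    padded = bits.zfill(total)
    return [int(padded[i:i + k], 2) for i in range(0, total, k)]

n, e, d = 77, 37, 13                        # toy keys from the example
k = block_size(n)
blocks = to_blocks(text_to_bits("CAT"), k)  # plaintext blocks P
cipher = [pow(P, e, n) for P in blocks]     # C = P^e mod n
plain = [pow(C, d, n) for C in cipher]      # P = C^d mod n

print(k, blocks, cipher, plain)
```

The run gives k = 6, blocks 0 48 52, ciphertext 0 27 24, and the original blocks back after decoding, matching the slide.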


A practical note

• There is a lot more to security than encryption.

• Encryption coding is done by a few experts
• Understanding how the common encryption algorithms work is useful in choosing the right approach for your situation.

• Our interest here is in providing assurance that access to protected resources will be limited to those with legitimate rights.


On a practical note: PGP

• You can create your own real public and private keys using PGP (Pretty Good Privacy)

• See the following Web site for full information.
• (MIT site - obsolete)
• http://www.pgpi.org/products/pgp/versions/freeware/
• http://www.freedownloadscenter.com/Utilities/Required_Files/PGP.html


Issues

• Intruder vulnerability
– If an intruder intercepts a request from A for B’s public key, the intruder can masquerade as B and receive messages from A intended for B. The intruder can send those same or different messages to B, pretending to be A.
– Prevention requires authentication of the public key to be used.
• Computational expense
– One approach is to use Public Key Encryption to send the key for use in DES, then use the faster DES to transmit messages
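The hybrid approach can be sketched as follows. Several things here are assumptions made for illustration: the receiver's key pair is a toy built from two known Mersenne primes, and the symmetric step is a simple XOR keystream derived with SHA-256, used as a stand-in because DES is not in the Python standard library. It is not DES and not a vetted cipher. The function name `keystream_xor` is hypothetical.

```python
import hashlib
import secrets

# Toy public/private key pair for receiver B (illustrative sizes only).
p, q = 2**61 - 1, 2**89 - 1
n, z = p * q, (p - 1) * (q - 1)
e = 65537                      # B's public encoding key
d = pow(e, -1, z)              # B's private decoding key (Python 3.8+)

def keystream_xor(key, data):
    """Stand-in symmetric cipher (NOT DES): XOR data with a SHA-256 keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = key.to_bytes(8, "big") + counter.to_bytes(4, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Sender: choose a random 64-bit session key and wrap it with B's public key.
session_key = secrets.randbits(64)
wrapped_key = pow(session_key, e, n)
ciphertext = keystream_xor(session_key, b"Meet at the library at noon")

# Receiver: unwrap the session key with d, then decrypt with the fast cipher.
recovered_key = pow(wrapped_key, d, n)
message = keystream_xor(recovered_key, ciphertext)
print(message)
```

Only the short session key goes through the expensive public key step. The message itself uses the cheaper symmetric step, which is the saving the bullet describes.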


Digital Signatures

• Some messages do not need to be encrypted, but they do need to be authenticated: reliably associated with the real sender
– Protect an individual against unauthorized access to resources or misrepresentation of the individual’s intentions
– Protect the receiver against repudiation of a commitment by the originator


Digital Signature basic technique

• Sender A to Receiver B: Intention to send
• Receiver B to Sender A: E(Random Number), where E is A’s public key
• Sender A to Receiver B: Message and D(E(Random Number)) = Random Number, decoded as only A could do
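The exchange above can be sketched as a challenge and response. The key pair below is a toy built from two known Mersenne primes, chosen only so the example runs quickly. The variable names are made up for illustration.

```python
import secrets

# Toy key pair for sender A (illustrative sizes only).
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537                                   # A's public encoding key
d = pow(e, -1, (p - 1) * (q - 1))           # A's private decoding key (Python 3.8+)

# Receiver B: after A announces an intention to send, issue a challenge.
random_number = 2 + secrets.randbelow(n - 2)
challenge = pow(random_number, e, n)        # E(Random Number)

# Sender A: decode the challenge with the private key and return it with the message.
response = pow(challenge, d, n)             # D(E(Random Number))

# Receiver B: accept the message only if the response matches the original number.
sender_is_authentic = (response == random_number)
print(sender_is_authentic)
```

Only the holder of d can turn the challenge back into the random number, so a matching response ties the message to A.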


Public key encryption with implied signature

• Add the requirement that E(D(M)) = M
• Sender A has encoding key E_A, decoding key D_A
• Intended receiver B has encoding (public) key E_B
• A produces E_B(D_A(M))
• Receiver calculates E_A(D_B(E_B(D_A(M))))
– Result is M, but also establishes that only A could have encoded M
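The nested calculation can be traced with two toy key pairs. The values below are assumptions picked for illustration: A uses p=7, q=11 and B uses p=11, q=13, so A's modulus is smaller than B's and D_A(M) always fits in a block that B's key can handle.

```python
# Toy key pairs (n, e, d); far too small for real use.
n_a, e_a, d_a = 77, 37, 13       # sender A:   p=7,  q=11, z=60
n_b, e_b, d_b = 143, 7, 103      # receiver B: p=11, q=13, z=120

M = 48                           # one plaintext block, 0 < M < n_a

# Sender A: apply A's private key, then B's public key.
signed = pow(M, d_a, n_a)        # D_A(M)
sent = pow(signed, e_b, n_b)     # E_B(D_A(M))

# Receiver B: apply B's private key, then A's public key.
unwrapped = pow(sent, d_b, n_b)        # D_B(E_B(D_A(M))) = D_A(M)
recovered = pow(unwrapped, e_a, n_a)   # E_A(D_A(M)) = M

print(recovered == M)
```

Only B can strip the outer layer, and the inner layer opens correctly only with A's public key, so B learns both the message and that A produced it.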


Digital Signature Standard (DSS)

• Verifies that the message came from the specified source and also that the message has not been modified

• More complexity than simple encoding of a random number, but less than encrypting the entire message

• Message is not encoded. An authentication code is appended to it.
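The idea of appending an authentication code to a readable message can be sketched as below. This does not implement the algorithm specified in the Digital Signature Standard. As a simpler stand-in, it hashes the message with SHA-256 and applies a toy RSA private key to the hash. The key sizes and the names `digest`, `sign`, and `verify` are assumptions for illustration.

```python
import hashlib

# Toy key pair for the signer (illustrative sizes only).
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537                                   # public key, used to verify
d = pow(e, -1, (p - 1) * (q - 1))           # private key, used to sign (Python 3.8+)

def digest(message):
    """Hash the message and reduce the hash to a number below n."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message):
    """Return (message, code): the message stays readable, the code is appended."""
    return message, pow(digest(message), d, n)

def verify(message, code):
    """Recompute the hash and compare it with the code opened by the public key."""
    return pow(code, e, n) == digest(message)

message, code = sign(b"Pay the library fine: $5")
print(verify(message, code))                          # genuine message
print(verify(b"Pay the library fine: $500", code))    # altered message
```

Because the code depends on a hash of the whole message, a change to the text makes verification fail, which covers both the source check and the modification check in the first bullet.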


Digital Signature - SHA

FIPS Pub 186 - Digital Signature Standard http://www.itl.nist.gov/fipspubs/fip186.htm


Encryption summary

• Problems
– intruders can obtain sensitive information
– intruder can interfere with correct information exchange
• Solution
– disguise messages so an intruder will not be able to obtain the contents or replace legitimate messages with others


Important methods

• DES
– fast, reasonably good encryption
– key distribution problem
• Public Key Encryption
– more secure
  • based on the difficulty of factoring very large numbers
– no key distribution problem
– computationally intense


Digital signatures

• Authenticate messages so the sender cannot repudiate the message later

• Protect messages from changes during transmission or at the receiver’s site

• Useful when the contents do not need encryption, but the contents must be accurate and correctly associated with the sender


Legal and ethical issues

• People who work in these fields face problems with allowable exports, and are not always allowed to talk about their work.

• Is it desirable to have government able to crack all codes?

• What is the tradeoff between the privacy of law-abiding citizens and the ability of terrorists and drug traffickers to communicate in secret?


Tonight

• Further detail of Dublin Core

• Look at another DL

• Google Books example

• Access management
– Encryption
– Digital Signatures