Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 9/10, April 13, 2010 Information life-cycle and visualization and check-in for project definitions


Peter Fox

Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01

Week 9/10, April 13, 2010

Information life-cycle and visualization and check-in for project definitions

Contents

• Review of last class, reading

• Information life-cycle

• Information visualization

• Checking in for project definitions

• Discussion of reading

• Next class


And yet only one part of the life cycle of data

Definitions

• Life-cycle elements

– Acquisition: Process of recording or generating a concrete artefact from the concept (see transduction)

– Curation: The activity of managing the use of data from its point of creation to ensure it is available for discovery and re-use in the future (http://www.dcc.ac.uk/FAQs/data-curator)

– Preservation: Process of retaining usability of data in some source form for intended and unintended use

– Stewardship: Process of maintaining integrity across acquisition, curation and preservation

Definitions ctd.

• Management: Process of arranging for discovery, access and use of data, information and all related elements. Also oversees or effects control of processes for acquisition, curation, preservation and stewardship. Involves fiscal and intellectual responsibility.

The nature of the challenge

• To architect information systems today

– You may play many roles

– You may not get all the metadata or information you need even if you get the data

– You will need skills that you were not taught

• To work with end-users today

– You may have lots of technical experience

– You will need new skills in addressing the changing use of data and information

– One ‘size’ does not fit all


Many views of the Information life-cycle

Acquisition

• Learn / read what you can about the developer of the means of acquisition

– Documents may not be easy to find

– Remember bias!!!

• Document things as you go

• Have a checklist (the Management list) and review it often

Curation (partial)

• Consider the organization and presentation of the data

• Document what has been (and has not been) done

• Consider and address the provenance to date; you are now THE next person

• Be as technology-neutral as possible

• Look to add metainformation

Preservation

• Usually refers to the full life cycle

• Archiving is a component

• Stewardship is the act of preservation

• Intent is that ‘you can open it any time in the future’ and that ‘it will be there’

• This involves steps that may not be conventionally thought of

• Think 10, 20, 50, 200 years… Looking historically gives some guide to future considerations

Remember

• The life cycle applies within and before and after your use case…

• So, let’s look in a little more detail


How the information is created

• Systemic

• Environmental

• Trial-and-error (or ad-hoc)

How the information is delivered?

• One-to-many presentation

• White paper

• Web site FAQ

• Web site informational

• Web site directed (link sent with e-mail, and so on) to a specific Web site

• Application-based delivery via managed expert system

• One-to-one presentation:

– Word of mouth

– Ad-hoc communication

How the information is managed

• Complexity of the information

• Complexity of the creation process

• Complexity of the management system

• Financial impact of IP/IC creation

Type of information created

• Tacit (created and stored informally):

– Human memory

– Local hard drive of the computer

– Expert system (moving tacit information into a formalized structure)

• Explicit (created and stored formally):

– Network share

– Network Web site/intranet

– Informal knowledge-management system

– Document-management system

– Formal KM system

Value of the source

• Age of the information

• Proximity of the information to the consumer

• Source of the information, and previous interactions with that specific source


Mostly Technical Issues

• Data Preservation

o Bit-level integrity

o Data readability

• Documentation

• Metadata

• Semantics

• Persistent Identifiers

• Virtual Data Products

• Lineage Persistence

• Required ancillary data

• Applicable standards
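Bit-level integrity is commonly checked by recording a checksum when a product is archived and recomputing it later. The following is a minimal sketch of that idea, not a method prescribed in the lecture; the choice of SHA-256, the file name `product.dat` and the helper names `sha256_of` and `is_intact` are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at archive time."""
    return sha256_of(path) == recorded_digest


# Demonstration on a throwaway file (hypothetical data product).
with tempfile.TemporaryDirectory() as workdir:
    product = Path(workdir) / "product.dat"
    product.write_bytes(b"observation 1\nobservation 2\n")
    recorded = sha256_of(product)  # would be stored alongside the product

    unchanged = is_intact(product, recorded)

    # Flip a single bit to simulate silent corruption.
    data = bytearray(product.read_bytes())
    data[0] ^= 0x01
    product.write_bytes(bytes(data))
    after_corruption = is_intact(product, recorded)
```

A matching digest only shows that the bits are unchanged; data readability (can a future tool still interpret the format?) is a separate concern that a checksum does not address.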

Mostly Non-Technical Issues

• Policy (constrained by money…)

• Front end of the lifecycle

o Long-term planning, data formats, documentation...

• Governance and policy

• Legal requirements

• Archive to archive transitions

• Money (intertwined with policy)

• Cost-benefit trades

• Long-term needs of programs

• User input

o Identifying likely users

• Levels of service

• Funding source and mechanism

Life cycle is a complex issue

• Must be managed

• Documented

• As part of the use case, but also outside it

Information Visualization

• Questions to keep in mind

– What is the improvement in the understanding as compared to the situation without visualization?

– Which visualization techniques are suitable for one's data/information?

Why visualization?

• Reducing amount of data, quantization

• Patterns

• Features

• Events

• Trends

• Irregularities

• Exit points for analysis

• Leading to presentation of data

• Recall – cognitive science and the mental representation??!!??

Types of visualization

• Color coding (including false color)

• Classification of techniques is based on

– Dimensionality

– Information being sought, i.e. purpose

• Line plots

• Contours

• Surface rendering techniques

• Volume rendering techniques

• Animation techniques

• Non-realistic, including ‘cartoon/artist’ style
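As an illustration of color coding with false color, the sketch below maps scalar values onto a simple blue-to-red ramp. The ramp, the function name `false_color` and the temperature grid are assumptions chosen for the example; real packages offer many other color tables.

```python
def false_color(value, vmin, vmax):
    """Map a scalar to an (R, G, B) tuple on a blue-to-red linear ramp."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clamp the value into [vmin, vmax], then normalize to [0, 1].
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    red = round(255 * t)
    blue = round(255 * (1 - t))
    return (red, 0, blue)


# Hypothetical 2 x 3 grid of temperatures in degrees Celsius.
field = [
    [-10.0, 0.0, 10.0],
    [20.0, 30.0, 40.0],
]
image = [[false_color(v, -10.0, 40.0) for v in row] for row in field]
```

The coldest cell comes out pure blue and the warmest pure red; values outside the chosen range are clamped to the ends of the ramp.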

Image (aka Raster) file formats

• CGM, the Computer Graphics Metafile, has been an ISO standard since 1987. It has the capability to encompass both graphical and image data.

• PostScript, or more specifically the Encapsulated PostScript Format (EPSF), is a page description language with sophisticated text facilities. For graphics, as compared to CGM, it tends to be expensive in terms of storage.

Image file formats

• TIFF, the Tagged Image File Format, encompasses a range of different formats, originally designed for interchange between electronic publishing packages.

• GIF, the Graphics Interchange Format, is quite widespread and can encode a number of separate images of different sizes and colors.

• PNG, the Portable Network Graphics format

Image file formats

• RGB, the Red Green Blue format of Silicon Graphics, is used by most visualization software packages as the internal image format. The format consists of a header containing the dimensions of the image, followed by the actual image data.

• The image data is stored as a 2D array of tuples. Each tuple is a vector with 3 components: R, G, and B. The RGB components determine the color of every pixel (picture element) in the image.
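The header-plus-array layout described above can be sketched in memory as follows. This is an illustrative model only, not the actual byte layout of the Silicon Graphics file format; the class `RasterImage` and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each component in 0..255


@dataclass
class RasterImage:
    """In-memory image: a small header plus a 2D array of RGB tuples."""
    width: int
    height: int
    pixels: List[List[Pixel]]  # indexed as pixels[row][column]

    @classmethod
    def filled(cls, width: int, height: int, color: Pixel) -> "RasterImage":
        """Create an image in which every pixel has the same color."""
        rows = [[color for _ in range(width)] for _ in range(height)]
        return cls(width, height, rows)

    def size_in_bytes(self) -> int:
        """Uncompressed size of the image data at 3 bytes per pixel."""
        return self.width * self.height * 3


img = RasterImage.filled(4, 2, (0, 0, 0))  # 4 x 2 black image
img.pixels[1][3] = (255, 0, 0)             # set the bottom-right pixel to red
```

Even this tiny model shows why image volume grows quickly: the data size scales with width times height times the bytes per pixel.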

Image file formats

• PPM, the Portable Pixmap Format (24 bits per pixel), PGM, the Portable Greyscale Format (8 bits per pixel), and PBM, the Portable Bitmap Format (1 bit per pixel), are pixel-based formats distributed with the X-Window system (version 11.4).
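To make the bit depths concrete, the sketch below writes a tiny image in the plain-text PPM variant (magic number "P3") and compares raster sizes at 24, 8 and 1 bits per pixel. The helper names `to_plain_ppm` and `raw_size_bytes` and the 640 x 480 image size are assumptions for illustration.

```python
def to_plain_ppm(pixels):
    """Serialize a 2D list of (R, G, B) tuples as plain-text PPM ("P3")."""
    height, width = len(pixels), len(pixels[0])
    lines = ["P3", f"{width} {height}", "255"]  # magic, size, max value
    for row in pixels:
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"


def raw_size_bytes(width, height, bits_per_pixel):
    """Raster size without header, with each row padded to whole bytes."""
    bytes_per_row = (width * bits_per_pixel + 7) // 8
    return bytes_per_row * height


pixels = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
ppm_text = to_plain_ppm(pixels)

# Raster sizes for a hypothetical 640 x 480 image at the three bit depths.
sizes = {name: raw_size_bytes(640, 480, bpp)
         for name, bpp in (("PPM", 24), ("PGM", 8), ("PBM", 1))}
```

For the same dimensions the color raster is 24 times larger than the 1-bit bitmap (921600 versus 38400 bytes here), which is the storage trade-off behind choosing among the three.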

Image file formats

• XBM is the X-Window one Bit image file format, which has been standardized by the MIT X-consortium.

• A major constraint on the use of images is the large data volume which has to be dealt with.

• Large sets of image data can have severe implications for storage, memory, and transmission costs.

• Therefore, compression techniques are very important.

• There are two categories based on whether or not it is possible to reconstruct the initial picture after compression.

Compression (any format)

• Lossless compression methods are methods for which the original, uncompressed data can be recovered exactly. Examples of this category are Run Length Encoding and the Lempel-Ziv-Welch algorithm.

• Lossy compression methods, in contrast, are methods for which the original data cannot be recovered exactly after compression. An example of this category is the Color Cell Compression method.

• Lossy compression techniques can reach reduction rates of 0.9, whereas lossless compression techniques normally have a maximum reduction rate of 0.5.
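Run Length Encoding is simple enough to sketch in full, which makes the "recovered exactly" property easy to verify. The byte-pair encoding below is one of several possible RLE variants, and `reduction_rate` assumes that the rate means the fraction of the original size removed; the slide does not define the term.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode bytes as (count, value) pairs; runs are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode by expanding each (count, value) pair."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes([encoded[k + 1]]) * encoded[k]
    return bytes(out)


def reduction_rate(original: bytes, compressed: bytes) -> float:
    """Fraction of the original size removed (assumed definition)."""
    return 1 - len(compressed) / len(original)


# One scanline of a hypothetical image: long runs of identical pixel values.
scanline = bytes([0] * 60 + [255] * 30 + [0] * 10)
packed = rle_encode(scanline)
restored = rle_decode(packed)
rate = reduction_rate(scanline, packed)
```

The high rate here reflects a deliberately run-heavy input. On data without runs, such as `b"abc"`, this encoding doubles the size, so the achievable rate depends strongly on the content being compressed.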

Vector formats

• PostScript

• PDF

• SVG

• ‘Shape files’

• CGM (also)

• …

Animation formats

• MPEG

• AVI

• QT (QuickTime)

• WMV

• Animated GIF

Remember - metadata

• Many of these formats already contain metadata or fields for metadata; use them!
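As one concrete case, PNG files can carry keyword/text pairs in tEXt chunks. The sketch below builds a tiny PNG by hand with the standard library, embeds two such pairs, and reads them back. The function names and the metadata values are hypothetical; in practice an imaging library would normally do this work.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_png(width, height, rgb, metadata):
    """Return PNG bytes for a solid-color image with tEXt metadata chunks."""
    # 8 bits per sample, color type 2 (truecolor), no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with filter byte 0 (no filtering).
    scanline = b"\x00" + bytes(rgb) * width
    idat = zlib.compress(scanline * height)
    text_chunks = b"".join(
        chunk(b"tEXt", key.encode("latin-1") + b"\x00" + value.encode("latin-1"))
        for key, value in metadata.items()
    )
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr) + text_chunks
            + chunk(b"IDAT", idat) + chunk(b"IEND", b""))


def read_text_chunks(png: bytes) -> dict:
    """Walk the chunk list and collect keyword/text pairs from tEXt chunks."""
    found, pos = {}, len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found


# Hypothetical provenance fields for a visualization product.
png_bytes = make_png(2, 2, (255, 0, 0), {
    "Title": "Sea surface temperature, false color",
    "Source": "example-dataset-v1",
})
recovered = read_text_chunks(png_bytes)
```

Because the metadata travels inside the file, it survives copying and renaming, which a separate notes file does not guarantee.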

Tools

• Conversion

– Imtools

– GraphicConverter

– Gnu convert

– Many more

• Combination/Visualization

– IDV

– Gnuplot

– http://disc.sci.gsfc.nasa.gov/giovanni


Visualization


Managing visualization products

• The importance of a ‘self-describing’ product

• Visualization products are not just consumed by people

• How many images, graphics files do you have on your computer for which the origin, purpose, use is still known?

• How are these logically organized?

Discovery of visualizations

• When represented as images:

– Image-based type free text search?

– Referred to in publications (articles, books, web pages)

• Vector graphics:

– Postscript or PDF

– SVG

– Others?

• What makes this easy or hard or impossible?
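One reason vector graphics can be easier to discover is that formats such as SVG keep titles, descriptions and labels as plain text. The sketch below, under that assumption, pulls the text out of a small hypothetical SVG and runs a free-text match against it; the sample document and the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical SVG plot; the title, description and labels are plain text.
svg_source = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">
  <title>Monthly mean temperature</title>
  <desc>Line plot derived from example-dataset-v1</desc>
  <polyline points="0,80 100,40 200,60" fill="none" stroke="black"/>
  <text x="10" y="95">Month</text>
</svg>"""


def searchable_text(svg: str) -> list:
    """Collect the text of title, desc and text elements, in document order."""
    root = ET.fromstring(svg)
    wanted = {SVG_NS + "title", SVG_NS + "desc", SVG_NS + "text"}
    return [el.text.strip() for el in root.iter()
            if el.tag in wanted and el.text and el.text.strip()]


def matches(svg: str, query: str) -> bool:
    """Case-insensitive free-text match against the extracted strings."""
    return any(query.lower() in s.lower() for s in searchable_text(svg))


index_terms = searchable_text(svg_source)
hit = matches(svg_source, "temperature")
miss = matches(svg_source, "salinity")
```

A raster image of the same plot offers no such text to index, so discovery then has to rely on embedded metadata or on the publications that refer to the image.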

Discussion

• About life-cycle in general?

• Visualization?

Reading for this week

• Is retrospective


Check in for Project Assignment

• Analysis of existing information system content and architecture, critique, redesign and prototype redeployment

What is next

• Week 11 – Information and Workflow Management

• Week 12 – Information Discovery, Information Integration
