Principles of Information Systems Session 06 Systematisation and Construction

Upload: fauna

Post on 23-Feb-2016


Page 1: Principles of Information Systems

Principles of Information Systems

Session 06

Systematisation and Construction


Systematisation and Construction

Chapter 5


Overview

Learning objectives
1. Introduction
2. Repositories for data
3. Describing things and collections
4. Data modelling
5. Data structures
6. Data organisation for the real world
7. Systematics: another way of organising the world
8. Summary


Learning objectives

• Explain why recorded information needs to store information about both things and types of things

• Describe some different types of information repository

• Describe how information repositories can be built from simple propositions

• Describe how data modelling is used to design an information repository


Learning objectives

• Explain how measurement and scaling affect information in systems

• Explain some of the issues to do with modelling data about space and time

• Describe the main features of several different data structures

• Explain the principles of classification

• Describe four types of classification structures


Introduction

• Informatics is primarily concerned with collecting details about the world to use for a variety of practical purposes.

• So far we have looked at:

- How we identify and name things,

- Which then become represented and recorded,

- And how language, perception and memory help build maps of the world that order the details that will remain of interest over time.


Introduction

• In this chapter we show how the recorded information itself can be organised for specific practical purposes, by lasting arrangements and structures that humans can understand and find useful.

• To do this we must formalise what we know of both things in the world, and types of things in the world.


Things… and types of things

Spaniels, beagles, pointers, terriers, setters …

My dog Patch


Things and types of things

• To organise things, we have ideas of data repositories made up of various data structures

• To order types of things, we have principles of classification and systematisation

• We need both of these to organise and record the world effectively.


Things and types of things

• Information about things enables individual occurrences to be stored and used effectively

• Information about types of things allows these occurrences to be categorised within larger schemes of understanding.


Recorded information needs to be organised for specific practical purposes, by lasting arrangements and structures that humans can understand and find useful.

To do this we need to consider both things, and types of things.

Recap


Repositories for data

• We can identify three major types of data repository:

- Databases

- Spreadsheets

- Knowledge repositories


Databases – the classic data repository

• Databases are a modern incarnation of a system of recording details about the world that goes back as far as recorded history.

• Databases have also been a major idea throughout the history of computing.

• The word database refers to a particular set of facts, to the physical presence of those facts on a computer, and to its logical location in an organisation’s hierarchy.

• The term database system refers to the database itself and the software application used to manipulate it.


Entities and tables

• Data is essentially a set of observations: when we create a database, we begin by working out what entities (that is, things) we are describing.

• For example, we might have made some observations about Australian football teams that we wish to record…

• We can store this information in the form of a table.


Table: we represent a set of observations about an entity in a table.

Record: each row is an individual record, corresponding to one club’s details.

Field: the value in the column for a particular record is called a field or attribute.


Schema

• Since each footy club has the same sort of attributes, the types in the record for each one will be congruent:

-Column 2 is always a name (of a city);

-Column 5 is always a URL (of a website)

• Databases aim to have a consistent structure for similar objects with congruent types.

-The specification describing this is called a schema.

• Deciding on the correct schema for a database involves a process called data modelling
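A schema can be sketched as a typed record definition that every row must match. This minimal Python sketch assumes a five-column club table. Only column 2 (a city) and column 5 (a website URL) come from the slide; the other field names and the sample row are hypothetical.

```python
from dataclasses import dataclass, fields
from urllib.parse import urlparse

# Hypothetical schema for the football-club table. Only `city` (column 2)
# and `website` (column 5) are described on the slide; the rest are invented.
@dataclass
class Club:
    name: str
    city: str
    ground: str
    founded: int
    website: str

    def __post_init__(self) -> None:
        # Enforce congruent types: every record must match the schema.
        for f in fields(self):
            if not isinstance(getattr(self, f.name), f.type):
                raise TypeError(f"{f.name} must be {f.type.__name__}")
        if urlparse(self.website).scheme not in ("http", "https"):
            raise ValueError("website must be a URL")

club = Club("Example FC", "Perth", "Example Oval", 1995, "https://example.org")
```

A row such as `Club("Bad FC", "Perth", "Oval", "1995", ...)` is rejected because `founded` is not an integer. A database enforces its schema in a similar way.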


Multiple tables

• Complex situations may need more than one entity to describe them (and hence more than one table to store them)

• For example, we might also store information about matches:


Database queries

• Once the data is in a suitable structure, queries can be asked of it

• The result of the query is another set of records, with the required information.


Database queries

“Who won on their home ground, and by how much?”

Who won on their home ground?

- Clubs that did not win on their home ground are not included in the result

And by how much?
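The query above can be expressed in SQL over two tables, one for clubs and one for matches. This sketch uses SQLite through Python's standard `sqlite3` module. The table names, column names, clubs, grounds and scores are all hypothetical, because the slides' tables are not reproduced here.

```python
import sqlite3

# Hypothetical clubs, grounds and scores, invented to illustrate the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE club (name TEXT PRIMARY KEY, home_ground TEXT);
    CREATE TABLE game (home TEXT, away TEXT, ground TEXT,
                       home_score INTEGER, away_score INTEGER);
    INSERT INTO club VALUES ('Club A', 'Ground 1'), ('Club B', 'Ground 2'),
                            ('Club C', 'Ground 3');
    INSERT INTO game VALUES ('Club A', 'Club B', 'Ground 1', 95, 80),
                            ('Club B', 'Club C', 'Ground 2', 70, 88),
                            ('Club C', 'Club A', 'Ground 3', 102, 64);
""")

# "Who won on their home ground, and by how much?"
rows = conn.execute("""
    SELECT g.home, g.home_score - g.away_score AS margin
    FROM game AS g
    JOIN club AS c ON c.name = g.home AND c.home_ground = g.ground
    WHERE g.home_score > g.away_score
    ORDER BY g.home
""").fetchall()
print(rows)  # [('Club A', 15), ('Club C', 38)]
```

Club B lost at home, so it is filtered out. The result is itself a set of records, as the slide describes.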


Spreadsheets

• Spreadsheets are arrays of values that are laid out in a grid on a computer screen

- They are conceptualised as rows and columns of values

• Spreadsheets can store and manipulate the data dynamically in various ways

- If a change is made in one part of the spreadsheet, the entire array can be recalculated automatically

• Spreadsheets are very flexible and have numerous applications in informatics


• Labels, numbers and formulae can all be entered into the spreadsheet grid

• Notice that the spreadsheet structure can be irregular, unlike a database table

Total column is calculated automatically, using formulae
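Automatic recalculation can be imitated by storing formulae as functions and evaluating them whenever a cell is read. This is a minimal sketch, not how any particular spreadsheet application works internally. The cell names `B1` to `B3` and the amounts are hypothetical, and labels are left out to keep the cells numeric.

```python
from typing import Callable, Dict, Union

# A cell holds either a number or a formula (a function of the sheet).
Cell = Union[float, Callable[["Sheet"], float]]

class Sheet:
    """Toy spreadsheet: formulae are evaluated whenever a cell is read,
    so a change to any input shows up in every dependent total."""

    def __init__(self) -> None:
        self.cells: Dict[str, Cell] = {}

    def set(self, ref: str, value: Cell) -> None:
        self.cells[ref] = value

    def get(self, ref: str) -> float:
        value = self.cells.get(ref, 0.0)
        return value(self) if callable(value) else value

sheet = Sheet()
sheet.set("B1", 120.0)  # hypothetical expense amounts
sheet.set("B2", 45.0)
sheet.set("B3", lambda s: s.get("B1") + s.get("B2"))  # the "Total" cell
print(sheet.get("B3"))  # 165.0
sheet.set("B1", 200.0)  # change one input...
print(sheet.get("B3"))  # ...and the total follows: 245.0
```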


Spreadsheets

• The term spreadsheet refers to the details being stored, the physical file they are stored in, and also the software application

• Sometimes the distinction is made between workbooks and worksheets (single sheets within a workbook)

• Most spreadsheet applications have features such as charts, different types of built-in formulae and analysis tools, and templates for common tasks


Spreadsheet template for expense claims (above) and completed for a particular trip (below)


Knowledge repositories

• Knowledge repositories are used when information ‘about’ the data is stored, rather than the data itself

- For example, manuals or procedures documents, emails, notes, multimedia, and other ‘unstructured’ information

• In this situation the data must be catalogued and indexed in some way, and the catalogue itself stored in an easily searchable form
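One common way to make a catalogue easily searchable is an inverted index, which maps each word to the items that contain it. The slide does not prescribe a technique, so treat this as one illustrative option. The document ids and texts below are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased word to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index: Dict[str, Set[str]], *words: str) -> List[str]:
    """Return the sorted ids of documents that contain every query word."""
    matches = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*matches)) if matches else []

# Hypothetical unstructured items: a manual, an email and a note.
docs = {
    "manual-01": "Procedure for submitting an expense claim",
    "email-17": "Reminder: expense claims are due Friday",
    "note-03": "Buy printer paper",
}
index = build_index(docs)
print(search(index, "expense"))  # ['email-17', 'manual-01']
```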


Structured, unstructured and semi-structured data stores

• Herb Simon introduced the idea of three different levels of structure into informatics

- (we meet this again in Chapter 6)

• We can use it to describe stores of information as:

- structured

- semi-structured (somewhere in between)

- unstructured


Unstructured data set – where there is no regular structure among the records in the data set, such as in a collection of documents or emails.

Structured data set – where there is a regular structure shared by all records in the data set which can be expressed as a schema. A conventional database is an example

Semi-structured data set – where there is an irregular structure among records in the data set. For example, bibliographic data.


Databases, spreadsheets and knowledge repositories are three of the most common types of information repository.

Recap


Describing things and collections


• When we build information repositories, we are describing collections of things that exist in the world

• A collection is a meaningful grouping, and can be represented by a list:

- Shopping lists

- Lists of football players

- Top ten music lists

- FBI most wanted lists

• Lists of terms, along with coding systems and formal descriptions, are the basis of any systematic organisation of data


Words and terms

• To have an informatic system, we need consistent and verifiable descriptions of the world:

- we need to be able to use words that have a consistent shared meaning

• Any word that is agreed to have a consistent shared meaning is called a term

• We can also use numbers, dates, colours and so on to describe the world,

- but these must also meet the criteria for terms


A term must have:

• A clearly defined context of usage

- This defines the term’s community of users, which in turn provides the framework for understanding that term

• A clearly defined range of permitted usage

- This means that once a term has been applied to a concept, the community of users must adhere to that usage

• A clearly defined and unambiguous definition of meaning within that context.


Terms

• A term has to stand for something that exists in the world, and also for an agreed meaning.

• Thus, terms are parts of lists and also parts of coding schemes and descriptions

Figure: the term ‘duck’ appears both in a list of animals I saw yesterday (dog, cat, duck, rabbit) and in a bird classification (Birds; Flighted and Flightless; Aquatic and Land).


Combining words or terms into phrases and sentences

• A phrase is a group of words that has a single semantic meaning:

- My left foot

- Green tea

- On the floor

• Phrases combine into semantically complete clusters of words, called sentences.

- Sentences convey ideas and observations in various ways as statements, questions and orders


Sentences convey ideas and observations in three different ways:

Statement: There is a sock on the floor.

Question: Who left the sock on the floor?

Order: Pick that sock up off the floor!


In informatics these are formalised as:

PROPOSITION: There is a sock on the floor.

QUERY: Who left the sock on the floor?

COMMAND: Pick that sock up off the floor!


Sentences in informatic systems

• Propositions are sentences that describe the world and can be shown to be true or false.

- These are suitable for storing in informatic systems.

• Queries are requests given to informatic systems for propositions that match certain criteria.

• Commands are instructions given to informatic systems.


The way we build up sentences of meaning in ordinary language (left) corresponds directly to equivalents in informatics (right).


Information systems as lists of propositions

• Lists of propositions make up information repositories:

The cat sat on the mat

The dog sat on the mat

• Finding information from these repositories involves matching a query to the stored propositions:

- What sat on the mat?

The ___ sat on the mat

Answer: cat, dog
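The matching of a query against stored propositions can be sketched by storing each proposition as a (subject, relation, object) triple and letting `None` play the role of the blank `___`. The triple layout is an assumption made for illustration.

```python
from typing import List, Optional, Tuple

# Each proposition is stored as a (subject, relation, object) triple.
Proposition = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

store: List[Proposition] = [
    ("cat", "sat on", "the mat"),
    ("dog", "sat on", "the mat"),
]

def query(pattern: Pattern, propositions: List[Proposition]) -> List[Proposition]:
    """Return the propositions that match the pattern; None acts as the blank ___."""
    return [p for p in propositions
            if all(q is None or q == v for q, v in zip(pattern, p))]

# "What sat on the mat?"  ->  The ___ sat on the mat
answers = [subject for subject, _, _ in query((None, "sat on", "the mat"), store)]
print(answers)  # ['cat', 'dog']
```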


Observations of things in the world can be formalised as statements, and these statements or propositions are what is collected into various types of information repository.

Recap


Data modelling

• Much of informatics involves modelling.

• Models are simplifications of a perceived reality where some things in the world get described, measured, represented, and put together.

• A data model describes the structure that will contain the actual data in the repository.



Data modelling

• A data model describes the structure that will contain the actual data in the repository

-Notice that although we are making a container before we have the data, we are already aware of the kinds of statements we want to store, and can plan the structure accordingly

• We go from a description that is semi-structured or unstructured, pick out the entities and other things of interest and prepare a structured schema that allows querying.


Steps in data modelling

• Investigate the kinds of statements that will be recorded

• Identify the terms the statements use

• Identify what kinds of questions are going to be asked

• Identify the extent and frequency of changes to the statements


Investigate the kinds of statements that are going to be recorded

• Identify or elicit statements and propositions about the area of interest.

‘A car is expensive’

‘A car is boxy’

are propositions about a car


Identify the terms the statements use

• The things of interest have properties or attributes

• In general a subject (S) will have several properties (P1, P2, …)

Car is red, boxy, expensive

Book is red, boxy, expensive

Frisbee is yellow, round, cheap

Jelly is red, shapeless, cheap

• Here we see that all our statements have properties in common:

Thing has colour, shape, price


Identify the kinds of questions that will be asked

• Matching the pattern of the required query against the stored data provides a check that the information can be retrieved:

__ is red, __, __ retrieves everything except the frisbee
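Once the common pattern "Thing has colour, shape, price" is part of the model, the query can be checked directly against it. This minimal sketch uses a Python `NamedTuple` as the record type; the `name` field is an assumption added so that results can be reported.

```python
from typing import List, NamedTuple

# The common pattern identified above: "Thing has colour, shape, price".
class Thing(NamedTuple):
    name: str
    colour: str
    shape: str
    price: str

things: List[Thing] = [
    Thing("car", "red", "boxy", "expensive"),
    Thing("book", "red", "boxy", "expensive"),
    Thing("frisbee", "yellow", "round", "cheap"),
    Thing("jelly", "red", "shapeless", "cheap"),
]

# "__ is red, __, __": fix the colour and leave the other fields blank.
red = [t.name for t in things if t.colour == "red"]
print(red)  # ['car', 'book', 'jelly']
```

If colour were not a field in the model, this question could not be answered. That is why the queries are identified before the structure is fixed.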


Identify the extent and frequency of changes to the statements

• Some information in your database will change very rarely (states of Australia, capital cities)

-Others will change more frequently (mailing addresses)

• Your model needs to take into account whether changes to the data are expected, and what you want to do about it, e.g.

- Overwrite the old information

- Keep a log of old information

- Store infrequently changing information separately, so it is easier to keep consistent

• The decision depends on the particular system and what it is used for
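The "keep a log of old information" option can be sketched as a dated history per person instead of a single overwritable field. The class, method names, person and addresses below are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

class AddressBook:
    """Keeps a dated log of every address instead of overwriting the old one."""

    def __init__(self) -> None:
        self.history: Dict[str, List[Tuple[date, str]]] = {}

    def move(self, person: str, address: str, since: date) -> None:
        self.history.setdefault(person, []).append((since, address))
        self.history[person].sort()  # keep entries in date order

    def current(self, person: str) -> str:
        return self.history[person][-1][1]

    def on(self, person: str, when: date) -> str:
        """The address that was valid on a given date (only possible with a log)."""
        valid = [addr for since, addr in self.history[person] if since <= when]
        if not valid:
            raise KeyError(f"no address recorded for {person} on {when}")
        return valid[-1]

book = AddressBook()
book.move("Alex", "1 Example St", date(2014, 1, 1))
book.move("Alex", "2 Sample Rd", date(2015, 6, 1))
print(book.current("Alex"))                # 2 Sample Rd
print(book.on("Alex", date(2014, 12, 31)))  # 1 Example St
```

Overwriting would be simpler and smaller, but it would make the second question impossible to answer. Which option is right depends on what the system is used for.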


Relationships between entities

• Once the things of interest have been modelled, the way in which they relate to one another must be considered

• One common way of doing this is through conceptual data modelling techniques such as entity-relationship modelling


Conceptual data modelling

• One common form of conceptual modelling for databases is entity-relationship diagramming

• Entity-relationship diagramming uses:

- Entities: types of thing of interest, such as customers, suppliers, invoices

- Attributes: properties of specific interest for an entity

- Relationships: how entities relate; for example, is the relationship optional or mandatory?


Entity-relationship diagram

An employer may have many employees, but each employee has only one employer.

An employer has a single premises, or none. Many employers may be located at each premises.
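One way to enforce the cardinalities in this diagram is with foreign keys in a relational database. This is a sketch using SQLite through Python's standard `sqlite3` module, and the table, column and row names are hypothetical. A `NOT NULL` foreign key gives "exactly one employer". A nullable foreign key gives "a single premises, or none".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
conn.executescript("""
    CREATE TABLE premises (id INTEGER PRIMARY KEY, address TEXT);
    -- premises_id may be NULL: an employer has a single premises, or none.
    -- Nothing stops several employers sharing the same premises.
    CREATE TABLE employer (
        id INTEGER PRIMARY KEY,
        name TEXT,
        premises_id INTEGER REFERENCES premises(id)
    );
    -- employer_id is NOT NULL: each employee has exactly one employer,
    -- while one employer can appear in many employee rows.
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        employer_id INTEGER NOT NULL REFERENCES employer(id)
    );
    INSERT INTO premises VALUES (1, 'Premises 1');
    INSERT INTO employer VALUES (1, 'Employer A', 1), (2, 'Employer B', 1),
                                (3, 'Employer C', NULL);
    INSERT INTO employee VALUES (1, 'Employee 1', 1), (2, 'Employee 2', 1);
""")

def rejected(sql: str) -> bool:
    """True if SQLite refuses the row because it would break the model."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

print(rejected("INSERT INTO employee VALUES (3, 'Employee 3', NULL)"))  # True
print(rejected("INSERT INTO employee VALUES (4, 'Employee 4', 99)"))    # True
```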


A data modelling example

• The Domesday Book records a survey of England carried out in 1086 for William the Conqueror

• The information collected, written in Latin, covers villages, landowners, tenants, stock kept, and so on


The Domesday Book


A data modelling example

• From the information contained in the Domesday Book, and what we know about how land ownership operated, we can make reasonable assumptions about:

- Entities (the things of interest)

- How the entities are related

• From this we can create a data model that enables the same information to be stored as a database

-We go from a semi-structured or unstructured document, to a structured schema that permits querying


A possible Entity-Relationship Diagram modelling the information in the Domesday Book


• The schema shows the overall structure of the database and is the container for things of interest, such as landlords and tenants

• The tables would be directly produced from the schema, and the data itself would then be put into these tables


Data modelling proceeds from a set of requirements to a formal regularised structure that can be used to build an information repository.

Recap


Data structures

• Data structures are particular forms for representing data that allow computational processes to occur.

- Compare with the representational forms in Chapter 3: these are aimed at human users and are not precise enough for a computer to work with directly.

• There are several data structures commonly used in informatics

- Sets, lists, graphs, trees …



Data structures for computation

• With many of these representations, the code required for a computer to use them can be generated automatically.

• So, if you use an appropriate representation to specify the things of interest and how they relate to one another, a computer can then take your model and build a system.

-The system can then process data, answer queries, draw new inferences, make predictions or enact any number of useful outcomes.

• All you have to do is draw the model correctly, and there are tools that help you to do this properly.


Atoms and lists

Consider a list of five fruit: (apple banana cherry durian eggplant)

This list has 5 atomic elements


Simple lists and complex lists

• Simple lists comprise atoms, while complex lists comprise atoms and, recursively, other lists.

• A more complex list might have a list inside it: (apple banana cherry durian eggplant (citrus fruits))

where citrus fruits might itself be the simple list (orange lemon lime)

• This can be written as: (apple banana cherry durian eggplant (orange lemon lime))

or: (apple banana cherry durian eggplant (citrus fruits))
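Nested lists map directly onto Python lists, which may contain other lists. This sketch treats plain strings as atoms and collects every atom by descending recursively into inner lists. The function name `atoms` is illustrative.

```python
from typing import List, Union

# An atom is a plain string; a list may contain atoms and other lists.
Item = Union[str, list]

citrus_fruits: List[Item] = ["orange", "lemon", "lime"]
fruit: List[Item] = ["apple", "banana", "cherry", "durian", "eggplant", citrus_fruits]

def atoms(items: List[Item]) -> List[str]:
    """Collect every atom, descending recursively into nested lists."""
    result: List[str] = []
    for item in items:
        if isinstance(item, list):
            result.extend(atoms(item))
        else:
            result.append(item)
    return result

print(len(fruit))         # 6: five atoms plus one nested list
print(len(atoms(fruit)))  # 8 atoms in total
```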


Atoms and lists

• Atoms and lists are ways to store individual or composite chunks of data

• An atom is the simplest abstract data type, and serves as a storage mechanism for a single value.

• It is possible to have a null value, and store this too

- This is needed because sometimes we have to store missing values.

• The list itself has no structure apart from its atoms, but is the basis for building structures


Adding behaviour to lists

• In practice, we have to consider how lists and more complex structures get built up in the first place.

• There are three ways in which a list can have a value added to it:

- head addition (value is added to the beginning of the list)

- tail addition (value is added to the end of the list)

- insertion (value is added at a designated position in the list)

• And similar ways in which a value can be removed from the list

• These behaviours distinguish different data structures


Queue

• A queue is a list in which a value is:

- added to the tail, and

- removed from the head.

• This behaviour is called First-In-First-Out or FIFO


Queue view: First In, First Out


Queue

A queue showing First-In-First-Out (FIFO) behaviour
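The FIFO queue described above can be sketched with Python's `collections.deque`. Here `append` performs tail addition and `popleft` removes from the head; `deque` also offers `appendleft` for head addition and `insert` for insertion at a position.

```python
from collections import deque

queue = deque()
for value in ["a", "b", "c"]:
    queue.append(value)        # tail addition
first_out = queue.popleft()    # removal from the head
print(first_out, list(queue))  # a ['b', 'c']
```

The first value added is the first to leave, whatever arrives later.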


Stack

• A stack is a list in which a value is:

- added to the head, and

- removed from the head

• This behaviour is called Last-In-First-Out, or LIFO


Stack view: Last In, First Out


Stack

A stack showing Last-In-First-Out (LIFO) behaviour
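A plain Python list works as a LIFO stack if its end is treated as the head. Both `append` and `pop` act on that end. This is an implementation convenience: in the slide's terms, adding and removing both happen at the same end of the list.

```python
stack: list = []
for value in ["a", "b", "c"]:
    stack.append(value)  # add at the head (the end of the Python list)
last_out = stack.pop()   # the most recently added value comes off first
print(last_out, stack)   # c ['a', 'b']
```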


Ordered lists

• In ordered lists, the elements self-sort as they are added

• For example, an ordered queue still removes values from the head, but when a new value is added it is automatically placed in sorted order rather than at the tail

A sorted queue
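A self-sorting list can be sketched with Python's standard `bisect.insort`, which places each new value in its sorted position. Values are still taken from the head. For large queues, the `heapq` module is a common, more efficient alternative with the same behaviour.

```python
import bisect

ordered: list = []
for value in [42, 7, 19, 3]:
    bisect.insort(ordered, value)  # placed in sorted position on arrival
print(ordered)                     # [3, 7, 19, 42]
smallest = ordered.pop(0)          # still removed from the head
print(smallest, ordered)           # 3 [7, 19, 42]
```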


Tables

• Lists can be built up into more complex but still conventional structures

• Adding labels to lists starts to associate them with meaningful ideas.

• Arithmetical or statistical operations applied to lists of numbers are basic in informatics, and spreadsheets are commonly used for these and other functions


Tables

• A spreadsheet’s rows and columns are lists on which meaningful operations can be defined to process that data.

• For example, a column of numbers listing quarterly expenses can be totalled, or averaged


Tables

• Another set of data (Bethany’s expenses) has now been recorded, forming a rectangular table, or array.


Matrices

• Separating the labels from the lists gives another data structure, the matrix (plural matrices).

• Matrices are useful as we can also perform operations on them as mathematical expressions:

- If our company policy is to subsidise expenses at 50 per cent, we multiply the highlighted numbers by 0.5.

• The information in a matrix is often represented pictorially in a graph
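The 50 per cent subsidy is a scalar multiplication of the matrix. This sketch uses plain nested lists with hypothetical quarterly amounts for two people, because the slide's actual figures are not reproduced here. With NumPy the same step would be `0.5 * np.array(expenses)`.

```python
# Hypothetical quarterly expenses: rows are quarters, columns are two people.
expenses = [
    [120.0, 200.0],
    [80.0, 150.0],
    [60.0, 90.0],
    [140.0, 110.0],
]

def scale(matrix, factor):
    """Multiply every element of the matrix by a scalar."""
    return [[factor * x for x in row] for row in matrix]

def column_totals(matrix):
    """Sum each column, as a spreadsheet 'Total' row would."""
    return [sum(column) for column in zip(*matrix)]

subsidy = scale(expenses, 0.5)   # company pays 50 per cent
print(column_totals(expenses))   # [400.0, 550.0]
print(column_totals(subsidy))    # [200.0, 275.0]
```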


Graphs

• Often the term ‘graph’ is used for a chart; however, note this is not the meaning used in data structures

• A graph has nodes and connections between them

• Graphs have formal properties that can be very useful in modelling and representing data.


Graphs

Figure: a graph with nodes A, B, C and D, with one node and one edge labelled.

• This graph has four nodes (A, B, C, D) and four edges

• The graph is connected: it is possible to get from any node to any of the other nodes by following edges.

• Three of the nodes form a cycle

• This graph could represent a road network, where the road from A to B is the same as from B to A
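An undirected graph can be stored as an adjacency list, with each edge recorded in both directions. The edge set below (A–B, B–C, C–D, D–B) is an assumption. It fits the description of four nodes, four edges and a three-node cycle, but the figure itself is not reproduced here.

```python
from collections import deque
from typing import Dict, Set

# Assumed edge set: four nodes, four edges, with B, C and D forming a cycle.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B")]

graph: Dict[str, Set[str]] = {}
for u, v in edges:
    # Undirected: the road from u to v is also the road from v to u.
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def is_connected(g: Dict[str, Set[str]]) -> bool:
    """Breadth-first search from one node must reach every node."""
    start = next(iter(g))
    seen, frontier = {start}, deque([start])
    while frontier:
        for neighbour in g[frontier.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append(neighbour)
    return len(seen) == len(g)

# A connected graph with n nodes needs only n - 1 distinct edges;
# any extra edge closes a cycle.
contains_cycle = is_connected(graph) and len(edges) >= len(graph)
print(len(graph), len(edges), is_connected(graph), contains_cycle)  # 4 4 True True
```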


Weighted graph

• Labels are placed on the edges showing their weights

• This graph could now represent a road network, where the weights indicate the distances between towns



Directed graph

• This graph shows directional information and is called a directed graph or digraph.

• When the order of the nodes matters, the links are called arcs. One node is the beginning and the other is the end of the arc.

• This shows an arc from A to B and three other directed edges.

• A digraph may also be labelled with weights or other information.


Cyclic and acyclic graphs

• Graphs may be cyclic or acyclic

• Because you can travel round B, C and D and back to B, the graph is cyclic

• If the arc from C to D were removed, the graph would have no cycles.

• Directed acyclic graphs (DAGs) are very common in informatics.
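Whether a directed graph is cyclic can be checked with a depth-first search that marks the nodes on the current path. Arriving back at one of those nodes means a cycle exists. The arcs used here (A→B, B→C, C→D, D→B) are an assumption consistent with the text, not a copy of the figure.

```python
from typing import Dict, List

def has_cycle(arcs: Dict[str, List[str]]) -> bool:
    """Depth-first search; meeting a node that is still on the current
    path means we have gone round a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {node: WHITE for node in arcs}
    for targets in arcs.values():
        for t in targets:
            colour.setdefault(t, WHITE)

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in arcs.get(node, []):
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(colour))

# Assumed arcs: A -> B, then round B -> C -> D -> B.
digraph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["B"]}
print(has_cycle(digraph))        # True
without_cd = {**digraph, "C": []}  # remove the arc from C to D
print(has_cycle(without_cd))     # False: now a DAG
```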


Directed graphs

• Because there is a start and an end point for each arc, directed graphs are very useful for modelling flows, sequences and causal chains

• They show whether some particular node is unreachable:

- A is unreachable from B, C or D.

- In a road network, this might represent a one-way system
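Reachability can be tested by following arcs forwards with a breadth-first search. As in the other graph sketches, the arcs A→B, B→C, C→D and D→B are an assumption consistent with the description.

```python
from collections import deque
from typing import Dict, List, Set

def reachable_from(arcs: Dict[str, List[str]], start: str) -> Set[str]:
    """Nodes that can be reached from `start` by following arcs forwards."""
    seen, frontier = {start}, deque([start])
    while frontier:
        for nxt in arcs.get(frontier.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

digraph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["B"]}  # assumed arcs
for node in "BCD":
    # A cannot be reached from B, C or D.
    print(node, "A" in reachable_from(digraph, node))
```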


Sources and sinks

• As it has no arc going into it, A is called a source: it provides whatever is flowing through the graph, such as water, traffic, or data

• If the arrow directions were reversed, A would receive the output from the system, the final destination for the water, data or whatever flows through the network.

• In this case node A would be called a sink.
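Sources and sinks can be found by counting arcs: a source has no incoming arcs and a sink has no outgoing arcs. The sketch also reverses every arc to show A changing from source to sink. It uses the same assumed arcs as the other graph examples.

```python
from typing import Dict, List, Set, Tuple

def sources_and_sinks(arcs: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """A source has no incoming arcs; a sink has no outgoing arcs."""
    has_incoming = {t for targets in arcs.values() for t in targets}
    nodes = set(arcs) | has_incoming
    has_outgoing = {n for n, targets in arcs.items() if targets}
    return nodes - has_incoming, nodes - has_outgoing

def reverse(arcs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Flip the direction of every arc."""
    flipped: Dict[str, List[str]] = {n: [] for n in arcs}
    for n, targets in arcs.items():
        for t in targets:
            flipped.setdefault(t, []).append(n)
    return flipped

digraph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["B"]}  # assumed arcs
print(sources_and_sinks(digraph))           # ({'A'}, set())
print(sources_and_sinks(reverse(digraph)))  # (set(), {'A'})
```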


Weighted digraph with non-symmetric weighting

• Links need not be symmetric. For example, between A and B the weight is 10, but between B and A it is only 5.

• This could represent:

- an unequal strength of attraction between two people

- journeys between two locations that are faster in one direction than the other, due to one-way systems or rush hour


Graphs in informatics

• There are different possible paths through this road network. Starting at C the quickest way to D is via A and B, a total distance of 30, rather than 40 by the direct route.

• This kind of information is very useful in many applied areas. For example:

- How information flows most efficiently in a network

- Which people associate with each other most, and who is the central person in a social network

- How ideas spread

- How similarity or association among elements can be modelled.
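The quickest route can be found with Dijkstra's shortest-path algorithm. This is one standard technique; the slide does not name one. The slide gives A→B = 10, B→A = 5 and a direct C→D of 40. The weights C→A = 5 and B→D = 15 are assumptions, chosen so that the route via A and B totals 30 as described.

```python
import heapq
from typing import Dict, List, Tuple

def shortest_path(graph: Dict[str, Dict[str, float]],
                  start: str, goal: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm: repeatedly settle the closest unsettled node."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    settled = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(queue, (dist + weight, nxt, path + [nxt]))
    return float("inf"), []

# A -> B (10), B -> A (5) and C -> D (40) follow the slides;
# C -> A (5) and B -> D (15) are assumed weights.
roads = {
    "A": {"B": 10},
    "B": {"A": 5, "D": 15},
    "C": {"A": 5, "D": 40},
}
print(shortest_path(roads, "C", "D"))  # (30.0, ['C', 'A', 'B', 'D'])
```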


Trees

• Trees are a common type of representation in all areas of informatics, both informally (e.g. showing levels of an organisation), and more formally as models of information structure.

• Formally, trees are a special type of graph: their essential property is that they are connected and have no cycles.

• The node at the top of the tree is called the root, and there are levels of branches, ending in leaves at the furthest level.
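A tree can be stored as a mapping from each node to its children. The checks below follow the formal definition: the structure is connected and has no cycles, so every node is reached exactly once from the root. The organisation chart is hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical organisation chart: each node maps to its direct children.
org: Dict[str, List[str]] = {
    "Director": ["Finance", "Operations"],
    "Operations": ["Warehouse", "Transport"],
}

def nodes(tree: Dict[str, List[str]]) -> Set[str]:
    return set(tree) | {c for kids in tree.values() for c in kids}

def root(tree: Dict[str, List[str]]) -> str:
    """The root is the one node that is nobody's child."""
    children = {c for kids in tree.values() for c in kids}
    (top,) = nodes(tree) - children  # ValueError unless exactly one root
    return top

def leaves(tree: Dict[str, List[str]]) -> List[str]:
    """Leaves are the nodes with no children."""
    return sorted(n for n in nodes(tree) if not tree.get(n))

def is_tree(tree: Dict[str, List[str]]) -> bool:
    """Connected and acyclic: from the root, every node is reached exactly once."""
    visited: Set[str] = set()
    stack = [root(tree)]
    while stack:
        node = stack.pop()
        if node in visited:
            return False  # reached twice: a cycle or a node with two parents
        visited.add(node)
        stack.extend(tree.get(node, []))
    return visited == nodes(tree)

print(root(org), leaves(org), is_tree(org))
# Director ['Finance', 'Transport', 'Warehouse'] True
```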


Data structures are representational forms that are suitable for direct computation. Atoms, lists (including stacks and queues), tables, graphs and trees are commonly used in informatics

Recap


Data organisation for the real world

• So far we have discussed purely formal structures for data and have not made much reference to the world in which these apply and are used.

• However, when organising data to build systems representing the real world, we must take into account issues of:

-Measurement

-Scaling

-Space

-Time



Where is the centre of Australia?


Furthest from the sea?

Median?

Centre of gravity?

Lambert gravitational centre?

Depends: what do you mean by centre? And how are you going to measure it?


Measurement

• Measurement is critical in informatics, but it is not as straightforward as it might first appear.

• When putting something into an information system the terms must refer to a concept understood by users of the system

• Terms must be interpreted in the same way

- e.g. the definition of ‘centre’

• And the data recorded needs to be based on a consistent interpretation

-e.g. how latitude and longitude are measured


Scaling

• Scaling, along with measurement, is central to the relationship between data and the real world.

• When it is impractical to use the whole territory as its own map, a scale map is used, making specific choices of what gets measured and on what scale.

• What is selectively represented on a map then refers to analogous things in the physical world.


For thousands of years the mighty starships tore across the empty wastes of space and finally dived screaming onto the planet Earth – where, due to a terrible miscalculation of scale, the entire battle fleet was accidentally swallowed by a small dog.

Douglas Adams, ‘The Hitchhiker’s Guide to the Galaxy’


How long is the coastline of Britain?

The measured length depends on the scale of measurement.

The smaller the measuring unit used, the larger the end result.

This is known as the Richardson Effect.
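The Richardson Effect can be reproduced on a synthetic coastline. This illustrative sketch is not Richardson's own procedure. It uses a Koch curve, a standard fractal stand-in for a coastline, and imitates a longer measuring unit by walking between every `step`-th point only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def koch(p: Point, q: Point, depth: int) -> List[Point]:
    """Points of a Koch curve from p towards q (q itself excluded)."""
    if depth == 0:
        return [p]
    dx, dy = (q[0] - p[0]) / 3, (q[1] - p[1]) / 3
    a = (p[0] + dx, p[1] + dy)
    b = (p[0] + 2 * dx, p[1] + 2 * dy)
    # Rotate (dx, dy) by 60 degrees to find the tip of the triangular bump.
    c, s = 0.5, math.sqrt(3) / 2
    tip = (a[0] + dx * c - dy * s, a[1] + dx * s + dy * c)
    points: List[Point] = []
    for start, end in [(p, a), (a, tip), (tip, b), (b, q)]:
        points += koch(start, end, depth - 1)
    return points

def measured_length(points: List[Point], step: int) -> float:
    """Walk the 'coastline' using only every step-th point: a longer ruler."""
    sampled = points[::step]
    if sampled[-1] != points[-1]:
        sampled.append(points[-1])
    return sum(math.dist(u, v) for u, v in zip(sampled, sampled[1:]))

coast = koch((0.0, 0.0), (1.0, 0.0), 4) + [(1.0, 0.0)]
for step in (256, 64, 16, 4, 1):
    print(step, round(measured_length(coast, step), 3))
```

As the ruler shrinks, the measured length rises from 1.0 to about 3.16. Each finer level multiplies the length by 4/3.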


Scale and informatics

• Scale also matters in organisational processes:

- For example, it is not generally worth a publisher’s effort to prepare a printing press for a run of fewer than (say) 500 books. If there is unlikely to be a market even for those 500 books, the project will be cancelled.

-On the other hand, if there is likely to be a huge market, economies of scale will apply – the press only has to be set up once, and each unit over 500 becomes marginally cheaper to produce.


Scale and informatics

• Many systems do not scale, or they alter their properties as they are scaled up or down.

- If I have two friends round for a meal, I may make them a pizza using tomato paste, olives, herbs and cheese.

- If I have six friends round, I may make three pizzas, by multiplying the quantities by three. I’d also need to start earlier, and use all three oven shelves.

- If I have 600 friends round, the oven and work surface would not be big enough, the toppings on some pizzas would have oxidised due to the delay in cooking, and most would be cold by the time they were served. And I would have to enlist help or start very early indeed.

• My previous pizza-production system would not work. A redesign caused by the scale of the task is required.

Scale and informatics

• The viability of any system depends on its scale:

- Systems have to be the right size to work in their planned environment.

• When you prototype a system, or make any scale model of a larger, operational system, some things are necessarily left out.

- But the things left out might become important in the full-scale version.

• A scale model or a small-scale pilot project may not tell you enough about the realities of the target project, or its behaviour in its environment.

- You must see how it operates in its context of practical use.

Data about points in space

• Often we need to record information about places:

- A mailing address in a list of colleagues or friends

- Observations of weed regrowth in bushland after a fire

• In deciding how to record position information, we must pay attention to the features that will matter when the information is used.

Can you tell me where the nearest garage is, please?

You can’t miss it – just drive about a mile this side of the old bridge then after you pass the house where Bill used to live take the second fork after the site for the new barn …

Data about points in space

• When representing information about locations in a system, three things must be considered:

- The conventions understood in the context

- The level of detail required

- Whether a specific physical location or a general idea is intended

Conventions for recording

• By name:

- Sydney

• By absolute reference:

- Latitude and longitude, e.g. 31 57.7705S 115 56.1506E

• By relative reference:

- Official, e.g. 47 Smith Drive Doubleview

- Colloquial, e.g. ‘third street on your right after the train station’

• Any of these will work, as long as they are used consistently and the users in the outside world share a conventional understanding of what the terms refer to.
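The absolute reference above is written in degrees and decimal minutes. Before such a value can be compared or plotted, a system usually converts it to signed decimal degrees. A minimal sketch, assuming the exact "degrees minutes hemisphere" layout of the example (other layouts would need a different pattern):

```python
import re

# Degrees, decimal minutes, hemisphere letter, e.g. "31 57.7705S"
DDM = re.compile(r"(\d{1,3})\s+(\d{1,2}(?:\.\d+)?)([NSEW])")

def to_decimal_degrees(text: str) -> tuple[float, float]:
    """Convert 'DD MM.mmmmH DDD MM.mmmmH' to signed (latitude, longitude)."""
    parts = DDM.findall(text)
    if len(parts) != 2 or parts[0][2] not in "NS" or parts[1][2] not in "EW":
        raise ValueError(f"expected a latitude then a longitude in {text!r}")
    # South and west are negative by convention.
    lat, lon = (
        (-1 if hemisphere in "SW" else 1) * (int(degrees) + float(minutes) / 60)
        for degrees, minutes, hemisphere in parts
    )
    return lat, lon

print(to_decimal_degrees("31 57.7705S 115 56.1506E"))
```

For the example this gives roughly (-31.96284, 115.93584); the latitude is negative because the point lies south of the equator. A place name such as "Sydney" is rejected, since it follows a different convention.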

How much detail

• How much detail is needed in the location is another choice.

• Again, context determines the appropriate granularity:

- Recording bushfire location for planning fuel-reduction burning

- Or for studying plant regeneration in a micro-habitat
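One common way to control granularity is to round stored coordinates to a fixed number of decimal places. The sketch below assumes that one degree of latitude is about 111.32 km, which is an approximation; longitude degrees shrink towards the poles, so the figure applies to the north-south direction only. The decimal-place choice for each purpose is illustrative.

```python
KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude

def coarsen(lat: float, lon: float, decimals: int) -> tuple[float, float]:
    """Keep only as much positional detail as the purpose needs."""
    return round(lat, decimals), round(lon, decimals)

def resolution_m(decimals: int) -> float:
    """Approximate north-south size, in metres, of one step in the last kept decimal place."""
    return KM_PER_DEGREE_LAT * 1000 * 10 ** -decimals

fire = (-31.9628416667, 115.9358433333)    # hypothetical observation point
print(coarsen(*fire, 2), resolution_m(2))  # planning a burn: steps of about 1.1 km
print(coarsen(*fire, 5), resolution_m(5))  # micro-habitat study: steps of about 1.1 m
```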

General or specific

• Some location descriptions are based on function or form rather than on specific places:

- Soap is found in bathrooms

- Fish are found in rivers

- Every 500 metres

- 30 litres per hectare

• In each case the information describes a regularity that holds not for one location but for a whole set of locations, repeating across space.
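The difference between a general and a specific location can be sketched in code: a general description is stored as a rule, and specific places or quantities are derived from it only when needed. The track length and block area below are hypothetical.

```python
def stations_every(spacing_m: float, track_length_m: float) -> list[float]:
    """Expand the rule 'every N metres' into specific positions along a track."""
    count = int(track_length_m // spacing_m) + 1
    return [i * spacing_m for i in range(count)]

def amount_needed(rate_l_per_ha: float, area_ha: float) -> float:
    """Apply a per-hectare rate to one particular area."""
    return rate_l_per_ha * area_ha

print(stations_every(500, 2200))  # positions in metres from the start of the track
print(amount_needed(30, 12.5))    # litres for a 12.5 ha block
```

The rule itself stays the same whatever the track or block; only the derived instances change.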

Data about points in time

• The way we use words to indicate time says a lot about our culture, and about the differences between cultures.

• A primary concern of societies is to locate significant events (religious, political, social) within a common framework of reference for time:

- This is the study of chronemics (see chapter 2)

• However, all cultures are thought to share two forms of temporal experience: linear and cyclical.

Linear time and cyclical time

• In linear time, gradual change in the physical universe is observed as an irreversible decay:

- The roughly 29.5 consecutive days from one full Moon to the next

- Growth and decay of organic structures (people, trees, coral reefs)

- Slow release of energy in coiled metal springs or chemical batteries

• In cyclical time, observed events seem to repeat:

- Seasonal time is a cyclical response of the living environment to repeating changes in climate

- Circadian time is a cyclical, biological response to the day/night cycle.
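In recorded data the two forms of time often sit side by side: a system stores a linear count that only moves forward, and derives cyclical values such as the day of the week by folding that count onto a repeating cycle with modular arithmetic. A minimal sketch using Python's standard `datetime` module:

```python
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def elapsed_days(start: date, end: date) -> int:
    """Linear time: a count of days that only ever increases."""
    return (end - start).days

def weekday_after(start: date, days: int) -> str:
    """Cyclical time: the same count folded onto a repeating 7-day cycle."""
    return WEEKDAYS[(start.weekday() + days) % 7]  # date.weekday(): Monday == 0

start, end = date(2024, 1, 1), date(2025, 1, 1)
n = elapsed_days(start, end)
print(n, weekday_after(start, n))
```

This prints 366 and Wed: the linear count grows without bound, while the cyclical value keeps returning to the same seven labels.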

Time and informatics

• Issues related to recording times and dates are extremely important for society:

- Inheritance rights for the first born, leases, rentals, rites of passage, loan repayments…

• All these are based on a society’s calendars:

- Gregorian, Julian, Hindu, Chinese, Arabic…

• Calendrical reforms have caused great disruption in the past; for example, when Britain adopted the Gregorian calendar in 1752, 2 September was followed directly by 14 September.

• The trend towards globalised systems, and towards their synchronisation and standardisation, means these issues are likely to be significant for informatics in the future.
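Converting dates between calendars is a routine informatics task. One standard approach goes through the Julian Day Number, a continuous day count that does not depend on either calendar. The sketch below uses the well-known integer formula for Julian-calendar dates and relies on Python's `date` type, which uses the proleptic Gregorian calendar; the offset 1721425 links Julian Day Numbers to Python's day ordinals.

```python
from datetime import date

def julian_day_number(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12   # 1 for January and February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3   # months counted from March
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Express a Julian-calendar date as a (proleptic) Gregorian date."""
    # Python's ordinal 1 is 1 January AD 1 (Gregorian), which is JDN 1721426.
    return date.fromordinal(julian_day_number(year, month, day) - 1721425)

print(julian_to_gregorian(1752, 9, 2))  # last day of the old calendar in Britain
```

The result, 1752-09-13, is the day before 14 September, which is why eleven dates were skipped when Britain switched calendars.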

Recap

Organising data to build systems representing the ‘real world’ has significant practical implications for informatics. Issues of measurement, scaling, space and time must be taken into consideration.

Systematics

• So far we have considered how we can make a stored version of our observations so as to retrieve them accurately later on.

-That approach applies to recording the presence of things in the world

• We saw in chapter 1 how categorising things in the world is fundamental to our understanding of the types of things that can exist, and how they are related.

• Now we will look at how we can build these into formal schemes for describing the world.


The Encyclopaedia of Diderot and d’Alembert, developed in the second half of the eighteenth century

Systematising a library

• This involves organising the books by preparing a list of the subjects they are about. There will be:

- some number of books on any number of subjects

- possibly more than one subject in each book

- more than one book (or none) on each subject.

• Once the books have been organised, we will have a secondary list, an index, which describes the contents of the books:

- Checking the index is more efficient than checking the books themselves
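The book-subject relationship described above is many-to-many, and an index is that relationship inverted so it can be looked up by subject. A minimal sketch with hypothetical titles and subjects:

```python
from collections import defaultdict

# Each book may cover several subjects (hypothetical catalogue data).
books = {
    "Field Guide to Bushland Weeds": {"botany", "fire ecology"},
    "Mapping the Coast": {"cartography", "geography"},
    "Fire and Regrowth": {"fire ecology"},
}

def build_index(catalogue: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert book -> subjects into subject -> books."""
    index: dict[str, set[str]] = defaultdict(set)
    for title, subjects in catalogue.items():
        for subject in subjects:
            index[subject].add(title)
    return index

index = build_index(books)
print(sorted(index["fire ecology"]))  # two books share this subject
print(index.get("astronomy", set()))  # a subject with no books
```

Looking up a subject now touches one entry in the index rather than every book in the collection.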

Systematising a library

• There are three main reasons to index:

- Efficiency: it is more efficient to consult the index than to go through every book

- Exhaustiveness: every book can be contained in the index

- Substitutability: a systematised collection also helps in finding a book on a broader or narrower topic, or the required book in another library
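Substitutability depends on the subjects themselves being arranged, so that a search can widen to a broader topic when nothing matches. A sketch, assuming a hypothetical subject hierarchy and a hypothetical list of holdings:

```python
from typing import Optional

# Hypothetical subject hierarchy: each subject points to its broader topic.
BROADER: dict[str, Optional[str]] = {"fire ecology": "ecology", "ecology": "biology", "biology": None}

# Hypothetical holdings of one library, indexed by subject.
HOLDINGS = {"ecology": ["Ecology: An Introduction"], "biology": ["General Biology"]}

def find_substitute(subject: str) -> tuple[str, list[str]]:
    """Walk up to broader topics until the library holds something."""
    current: Optional[str] = subject
    while current is not None:
        if HOLDINGS.get(current):
            return current, HOLDINGS[current]
        current = BROADER.get(current)
    return subject, []

print(find_substitute("fire ecology"))
```

No book covers fire ecology directly, so the search widens one level and offers the ecology holdings instead.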

Arrangement

• Arrangement means the systematisation of a set of objects: putting them into some order.

• Dunnell shows that there are three parts to arranging a set of objects:

- preparing the organising principles (enumeration of the classes)

- describing the things to be organised (enumeration of the groups)

- identifying the things (matching the classes with the groups)

Arrangement (Dunnell)

My dog Patch

spaniels, beagles, foxhounds, pointers, terriers…

Patch is a foxhound
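Dunnell's three steps can be sketched with the Patch example: enumerate the classes by their defining features, describe the individual, then identify it by matching the description against the classes. The defining features below are simplified and hypothetical, chosen only to keep the classes distinct; they are not breed standards.

```python
# Step 1: enumerate the classes by their (simplified, hypothetical) defining features.
CLASSES = {
    "spaniel": {"flushes game"},
    "beagle": {"pack hound", "hunts hare"},
    "foxhound": {"pack hound", "hunts fox"},
    "pointer": {"points at game"},
    "terrier": {"goes to ground"},
}

# Step 2: describe the thing to be organised.
patch = {"pack hound", "hunts fox", "tan and white coat"}

# Step 3: identify it by matching the description against each class definition.
def identify(description: set[str]) -> list[str]:
    """Return every class whose defining features all appear in the description."""
    return [name for name, features in CLASSES.items() if features <= description]

print(identify(patch))
```

Extra observed features, such as coat colour, do not prevent a match; only the defining features of a class are checked.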

Types of classification system

• A classification system can take many forms. The most commonly used are:

- Enumerative list

- Table

- Taxonomic tree

- Hybrid tree/table

The enumerative list

• The simplest classification system is an enumerative list:

- This simply but exhaustively enumerates the different things

• Enumerated list classifications are usually based on empirical evidence (what can be seen in the real world), and they work well as long as there is a category for the unclassifiable
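An enumerative list maps naturally onto an enumerated type. The sketch below uses a handful of land cover categories as illustrative labels (they are not a reproduction of the USGS table) and includes the catch-all category for anything that cannot be classified.

```python
from enum import Enum

class LandCover(Enum):
    URBAN = "urban or built-up land"
    AGRICULTURAL = "agricultural land"
    FOREST = "forest land"
    WATER = "water"
    WETLAND = "wetland"
    BARREN = "barren land"
    UNCLASSIFIED = "unclassified"  # catch-all for anything not on the list

def classify(label: str) -> LandCover:
    """Match a recorded label against the list, falling back to UNCLASSIFIED."""
    for cover in LandCover:
        if cover.value == label.strip().lower():
            return cover
    return LandCover.UNCLASSIFIED

print(classify("Forest land"), classify("lava field"))
```

Because every input lands in exactly one category, the list stays exhaustive even when the real world produces something unexpected.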

US Geological Survey’s table of land cover types

Tabular classifications

• The next level of complexity uses two or more simple lists that combine to give a descriptive table of types

• This results in a double enumeration, and the final classification is found in the table cell where the two lists meet

A list enumerating the benefits of regulation is combined with one enumerating the costs. This produces four cells, each a particular type of policy:

| BENEFITS OF REGULATION \ COSTS OF REGULATION | Concentrated | Dispersed |
| Concentrated | Interest group politics | Client politics |
| Dispersed | Entrepreneurial politics | Majority politics |
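A double enumeration can be sketched as a lookup keyed by one value from each list: taking every combination of the two lists yields exactly the four cells of the table. A minimal sketch using the categories above:

```python
from itertools import product

BENEFITS = ("concentrated", "dispersed")
COSTS = ("concentrated", "dispersed")

# One cell per (benefits, costs) combination, as in the table.
POLICY_TYPE = {
    ("concentrated", "concentrated"): "interest group politics",
    ("concentrated", "dispersed"): "client politics",
    ("dispersed", "concentrated"): "entrepreneurial politics",
    ("dispersed", "dispersed"): "majority politics",
}

# Double enumeration: every combination of the two lists is exactly one cell.
assert set(product(BENEFITS, COSTS)) == set(POLICY_TYPE)

def classify_policy(benefits: str, costs: str) -> str:
    """Locate the classification in the cell where the two lists meet."""
    return POLICY_TYPE[(benefits, costs)]

print(classify_policy("concentrated", "dispersed"))
```

Adding a third value to either list would add a whole row or column of cells, which is how tabular schemes grow.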

Tree structures and taxonomies

• Hierarchical trees are common classification structures.

• In a tree, anything true of a node is true of all its descendants:

- All mammals are warm-blooded; therefore a lion (one of the subtypes of mammal) is warm-blooded.

• The tree structure also assists with identification, as every branch you follow cuts down the choices available.
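Inheritance down a tree can be sketched by storing each node's parent and walking upwards to collect everything true of the node's ancestors. The small taxonomy and its traits below are simplified for illustration.

```python
from typing import Optional

# Simplified taxonomy: each node points to its parent (None at the root).
PARENT: dict[str, Optional[str]] = {
    "animal": None,
    "mammal": "animal",
    "cat family": "mammal",
    "lion": "cat family",
}

# Each trait is stated once, at the highest node where it holds.
TRAITS = {
    "animal": {"multicellular"},
    "mammal": {"warm-blooded", "has hair"},
    "cat family": {"carnivorous"},
    "lion": {"lives in prides"},
}

def inherited_traits(node: str) -> set[str]:
    """Everything true of a node or of any of its ancestors."""
    traits: set[str] = set()
    current: Optional[str] = node
    while current is not None:
        traits |= TRAITS.get(current, set())
        current = PARENT[current]
    return traits

print(sorted(inherited_traits("lion")))
```

"Warm-blooded" is recorded only against mammal, yet it is reported for the lion, which is the property the slide describes.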

The Linnaean tree of life

Hybrid tree/table form

• Many classification systems use a hybrid form of tree and table

• For example, a table structure at the top level to provide broad groupings, and a tree structure within each group for easier identification

• Many library classification systems are of this type:

- Dewey Decimal, Universal Decimal Classification, Bliss systems
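The hybrid form can be sketched with the Dewey Decimal Classification: the first digit selects one of ten main classes from a flat table, and each further digit moves one level down a tree. Only a tiny excerpt of the subdivisions is included, and the code structure is an illustration, not how any library system actually stores the schedules.

```python
# Top level: a flat table of the ten Dewey main classes, keyed by first digit.
MAIN_CLASSES = {
    "0": "Computer science, information & general works",
    "1": "Philosophy & psychology",
    "2": "Religion",
    "3": "Social sciences",
    "4": "Language",
    "5": "Science",
    "6": "Technology",
    "7": "Arts & recreation",
    "8": "Literature",
    "9": "History & geography",
}

# Below the top level: a tiny excerpt of the tree under 500.
SUBDIVISIONS = {"510": "Mathematics", "516": "Geometry"}

def path(code: str) -> list[str]:
    """Look up the main class in the table, then descend one digit at a time."""
    steps = [MAIN_CLASSES[code[0]]]
    # '516' -> prefixes '510' then '516'; dict.fromkeys keeps order and drops repeats
    prefixes = dict.fromkeys(code[:n].ljust(3, "0") for n in (2, 3))
    steps += [SUBDIVISIONS[p] for p in prefixes if p in SUBDIVISIONS]
    return steps

print(path("516"))
```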

The Harmonised System of trade codes from the World Customs Organization attempts to unify all previous trade coding systems

High level (top) and more detailed classifications

Taxonomic keys

• A taxonomic key is a way of identifying an individual thing from a classification

• Like the game ‘20 questions’, you answer a series of questions until identification is reached
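A taxonomic key can be stored as a set of numbered couplets, each asking one yes/no question and leading either to a name or to another couplet. The questions and outcomes below are a deliberately simplified illustration, not a usable zoological key.

```python
# Each couplet: (question, result if yes, result if no).
# A string result is an identification; an int is the next couplet to consult.
KEY = {
    1: ("Has feathers?", "bird", 2),
    2: ("Has hair or fur?", "mammal", 3),
    3: ("Has fins?", "fish", "amphibian or reptile (key continues)"),
}

def identify(answers: dict[str, bool]) -> tuple[str, list[str]]:
    """Follow the key from couplet 1, returning the name and the questions asked."""
    asked: list[str] = []
    step = 1
    while True:
        question, if_yes, if_no = KEY[step]
        asked.append(question)
        outcome = if_yes if answers[question] else if_no
        if isinstance(outcome, str):
            return outcome, asked
        step = outcome

print(identify({"Has feathers?": False, "Has hair or fur?": True}))
```

Each answer discards part of the key, so a mammal is identified after only two questions.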

Part of a taxonomic key

When good systematisation goes bad

• Classification is a human and cultural activity, and while it produces communal understanding, it is also prone to error.

• Many libraries, for example, consider the widely used Dewey decimal system to be distorted:

- It is too general for domain-specific libraries, such as law or music libraries

- It does not match today’s requirements, e.g. for newer areas such as computing

- Some classifications are biased towards European-based (mainly US) thought and tradition.

An extract from the Dewey decimal system used in libraries

What can be done?

• The Dewey custodians have suggested various strategies, including adopting the religion schema of the alternative Universal Decimal Classification, which was recently reformed for the same reasons.

• But would it necessarily stop with religion? The codes for history/geography and literature have a similar bias to the one found in religion, and computing and informatics have major problems with the current system.

• Billions of books worldwide are classified according to the existing schedules, and this near-universal use is the system’s main advantage. Once the system is fragmented, that advantage would be lost: an unmanaged or partial conversion to a new system would ruin the shared classification system.

Systematisation and the Web

• Schemes for the semantic web require very careful thought and international management to make them globally useful.

• If definitive sets of organising principles (ahistorical and essentially permanent) can be agreed for such classification schemes, the resulting schemes will have innumerable applications.

Recap

Simple enumerated lists, tabular classifications, trees and taxonomies, and hybrid tabular/tree structures are the main types of structures used to describe classifications.

Summary

• To organise and record the world effectively, we need to formalise ideas about both things and types of things

• Three common types of information repository are databases, spreadsheets and knowledge repositories

• Information repositories can be seen as a set of propositions or statements

• Data modelling proceeds from a set of requirements to a regularised structure that can be used to build an information repository.

• Data structures are forms for representing data that allow computational processes to occur.

Summary

• Measurement and scaling have implications for how systems are designed and built

• Modelling data about space and time raises significant practical issues for informatics

• Systematisation is a way of classifying occurrences into formal schemes for describing the world

• Simple enumerated lists, tabular classifications, trees and taxonomies, and hybrid tabular/tree structures are the main types of structures used to describe classifications.
