
1 Information systems

Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?

T S Eliot, poet

Overview

In this first unit we will begin our investigation of IPT by looking at:

what information is and how, by itself, information does not provide knowledge or wisdom

the different ways information systems can be viewed

some of the different forms of information systems

basic terms and concepts of relational databases

information system security and integrity.

Data, information, knowledge and wisdom

The pessimistic words of the poet T S Eliot above are very apt in the world of the Internet. Anyone who has run a Google search that has returned hundreds of thousands of hits will agree with his sentiment. Eliot warns us that we can have too much information to derive any knowledge from it. Even if we do develop knowledge, there may be little chance of wisdom emerging.

Information is not the same as knowledge, and knowledge is not the same as wisdom. To understand the distinctions we must go back one further level to data, the basic building block of comprehension.

Data is the plural of datum. A datum is a simple recognisable fact, such as Red, or 25, or True. Red might be the colour of a car or a person’s nickname, 25 might be a temperature or a score, and True could be the result of an experiment or an answer in a test.

Each datum is simple (singular) and states a fact. 25 is not 15 and is not 35. It is something that we recognise stands for the value of two tens plus five units. We do not concern ourselves with whether the datum is correct. Even though we call it a fact, truth does not come into it. We accept data as given or, in an information system, as entered.

© Kevin Savage 2011 – Licensed to St Mary's Catholic College for use in 2012


From the building blocks of data, information may be structured. From information, knowledge may be constructed and, if we are very lucky, wisdom may arise. To understand this we will look at an everyday example.

As I listen to the weather on the radio I might hear that the temperature outside is 25°C. Somewhere, someone looked at a thermometer and saw the number 25. This 25 is data (actually a datum).

The value 25 was reported to the weather bureau, which gave the datum form. The bureau structured it by adding °C. This implies that the 25 exists on a scale of values that represents air temperature. The datum has now become information. It has been organised using the Celsius scale so that it has meaning. The meaning would be very different if the weather bureau were in America and had added °F instead. That would place the datum in a different structure – on the Fahrenheit scale 25°F is below freezing.
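A minimal Python sketch of the point above follows. The bare number 25 only gains meaning once a scale is attached to it. The function name `to_celsius` and the `"C"`/`"F"` scale codes are illustrative assumptions. The conversion uses the standard formula °C = (°F − 32) × 5/9.

```python
def to_celsius(value: float, scale: str) -> float:
    """Interpret a raw reading (a datum) on a given scale and express it in degrees Celsius."""
    if scale == "C":
        return value
    if scale == "F":
        # Standard Fahrenheit-to-Celsius conversion
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown scale: {scale!r}")


datum = 25  # on its own, just a number

for scale in ("C", "F"):
    celsius = to_celsius(datum, scale)
    state = "below freezing" if celsius < 0 else "above freezing"
    print(f"{datum}°{scale} = {celsius:.1f}°C ({state})")
```

The same datum prints as 25.0°C (above freezing) on one scale and −3.9°C (below freezing) on the other. The scale is the structure that turns the number into information.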

When I hear from the radio the information that the outside temperature is 25°C I can construct some meaning from it: I know it is neither too cold nor too hot. By decoding that 25°C is a pleasant temperature, I have drawn inferences from the information and I now have knowledge of the situation. The information has been arranged within an overriding format so that it is useful. This overriding format is sometimes called a metacontext. In this case the metacontext is formed from my experience of weather conditions in the past.

Once I know that it is an agreeable day outside I can use the knowledge in ways that are compatible with my needs and with what is acceptable in society. I can make the decision not to wear a thick overcoat, or not to go outside in a swimming costume. If I have wisdom I use the knowledge effectively in a given situation. (Or not, as the case may be. Knowledge that has been constructed from information, but then not used wisely, has been described as being inert knowledge.)

To summarise, data are basic recognisable facts, information is data that has been organised so that it has meaning, knowledge is information arranged into a metacontext so that it is useable, and wisdom is knowledge being used effectively in a given situation.
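The four levels can be traced through the weather example in a short Python sketch. The 15 and 30 degree cut-offs are assumptions. They stand in for "my experience of weather conditions in the past", the metacontext. The `decide` function is only a hard-coded stand-in for a person's judgement and does not capture real wisdom.

```python
# Data: a bare, recognisable fact.
datum = 25

# Information: the datum organised on a scale so that it has meaning.
information = {"value": datum, "unit": "°C"}


def interpret(info: dict) -> str:
    """Knowledge: read the information through a metacontext.

    The 15 and 30 degree cut-offs are assumed stand-ins for past experience.
    """
    if info["unit"] != "°C":
        raise ValueError("this sketch only interprets Celsius readings")
    if info["value"] < 15:
        return "cold"
    if info["value"] > 30:
        return "hot"
    return "pleasant"


def decide(conditions: str) -> str:
    """Wisdom (crudely): a hard-coded stand-in for a person's judgement."""
    advice = {
        "cold": "Wear a thick overcoat.",
        "pleasant": "Leave the thick overcoat at home, and save the swimming costume for the beach.",
        "hot": "Dress lightly and stay in the shade.",
    }
    return advice[conditions]


knowledge = interpret(information)
print(knowledge)  # pleasant
print(decide(knowledge))
```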

From data to wisdom

Together these form a hierarchy. Wisdom comes from knowledge, which comes from information, which comes from data. At the bottom level a great deal of data is required to develop the needed information. As we move up there is only a limited amount of useable knowledge derived from this information. At the top there is usually very little wisdom displayed in the use of this knowledge.

The point of all this is that knowledge and wisdom do not come from an information system. The system, if used effectively, will combine data into information, but whether this information is developed into knowledge that is applied with wisdom depends upon the person operating the system.

Activity 1.1 – From data to wisdom

1. a What is meant by the term Information Age?

b In what ways has the availability of huge volumes of information on the internet and through other sources meant that we are not necessarily living in an age of knowledge?

2. a Explain in your own words what is meant by each of the terms data, information, knowledge, and wisdom.

b Give a simple, single example of each.

c The philosopher Immanuel Kant once said “Science is organised knowledge, wisdom is organised life”.

Explain what you think he meant by this.

3. The section above uses the analogy of today’s temperature to illustrate the difference between data, information, knowledge, and wisdom.

Create your own analogy. You may use your own situation or, if you wish, consider the scenario of a runner in a 100m race.

4. 25°C does not mean the same as 25°F. Give an example of a piece of data that can have two different meanings depending on the context it is in.

5. Give an example of a situation in which knowledge might be inert.

6. Whether information is developed into knowledge that is applied with wisdom depends upon the user of the system.

Give an example where an information system may not be used with wisdom.

Compare your answers with those of others in your class.

7. The Australian social commentator Hugh Mackay has described today’s generation as being “answer rich and question poor”.

a What do you think he meant by this? In your answer you might like to refer to the ability to reason logically, and powers of discernment.

b In what ways has the advent of search engines such as Google reduced the need to consider and contemplate before seeking answers?

8. The artist Pablo Picasso once said “Computers are useless, they can only give you answers”.

In what ways is he still correct today?



Systems

When we think of using a computer we usually imagine one person sitting at a computer working. This, however, is too simplistic a view of the situation.

The computer the user is sitting at consists of a system of hardware, software and interface. In turn the software forms a system, as do the hardware and the interface. If the computer is connected to other computers or the internet then it is linked through a networking system. The user also belongs to a system; this might be a business system or an education system or a military system.

We will be investigating many forms of system, so what is a system?

Think of systems you know about. You have a digestive system, you may use a stereo system, you belong to the education system, and the Earth is part of the Solar System.

What makes each of these a system? Let us take one example and see what it consists of.

In a music system we may have a CD, a CD player, an amplifier, an equaliser, connectors and speakers. These are all related but none of these parts can produce music by itself. The CD is bright and shiny, but alone it is useless. However, link the CD correctly with each of the other components and together the parts can fulfil a purpose – in this case playing music.

It is the same with other systems. From this it appears that a system is a group of inter-related parts that work together for a common purpose or goal. For example your digestive system consists of your teeth and tongue, your oesophagus, stomach, small and large intestines, and a variety of glands and other organs. These work together with the purpose of converting food into simple substances that can be transferred to the blood – where they become part of the circulatory system.

There are many types of system, but the ones we are interested in here are information systems.

An information system consists of interrelated parts that have the purpose of taking data as input, and then processing it so as to produce information as output.

An information system processes data into information



Within our bodies we have an information system called the central nervous system. This system takes data from our senses, uses the brain to process it, and develops information as an output, perhaps “I’m hungry” or “I’m cold”.

In computer terms an information system works the same way. A computerised information system will include facilities to input data, manipulate it as specified, and then display the results. In addition there must be facilities to keep the data and information secure.
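A minimal input-process-output sketch in Python follows. The scores and function names are hypothetical. A real information system would also need input validation and the security facilities mentioned above, which are left out here.

```python
def read_input(entries: list[str]) -> list[int]:
    """Input: take scores as entered (as text) and convert them to numbers."""
    return [int(entry) for entry in entries]


def process(scores: list[int]) -> float:
    """Process: combine the individual scores into a single average."""
    return sum(scores) / len(scores)


def display(average: float) -> str:
    """Output: present the result as readable information."""
    return f"Average score: {average:.1f}"


entered = ["70", "82", "58"]  # hypothetical test scores, as typed in
print(display(process(read_input(entered))))  # Average score: 70.0
```

Each stage is a separate function. Any one of them could be replaced (for example, reading from a file instead of a list) without changing the others.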

An example of a computerised information system is the one used by the QSA. This system takes your QCS results and SAIs as data, processes these by combining and adjusting them, and then uses the results to produce the output information of your OP.

System views

A computerised information system can be described, investigated or dealt with from a variety of points of view. The system will appear different depending on who is looking at it. The technician who installs the software and maintains the hardware will be interested in performance and reliability, the designer wants efficiency and accuracy, while the end user will be more interested in how easy and efficient the system is to use.

These different perspectives of an information system are called views. There are four broad groupings – the external, internal, conceptual, and logical views. We will look at each in turn starting with the external view.

External view

Whether a system consists of one stand-alone personal computer or hundreds of networked terminals, each user interacts with it on a one-to-one basis. In doing this they build up their own mental image of how it appears to them, and of what is going on in the system they are using. This is described as the external view of the information system – how it is visualised by the person using it.

The external view is shaped by the parts of the system that a user is allowed to see and interact with. This will vary from user to user. Although the data in the system will not change, an everyday user will not have the same view as a manager, and neither will have the access rights of the system administrator.

Worker’s view

id#    name        department
4789   Smith, J    Accounts
4127   Morris, M   PR
2578   Downs, L    Sales

Manager’s view

id#    name        department   salary    rating
4789   Smith, J    Accounts     $56 450   good
4127   Morris, M   PR           $45 895   poor
2578   Downs, L    Sales        $49 230   good

The external view is limited for three reasons. The first is confidentiality – not every user needs to see the details of other users. The second is security – only authorised persons should be able to alter data. If all users had write-access, data might be accidentally (or even deliberately) deleted or changed. Finally there is the aspect of convenience. A user does not need to see and handle all parts of a system. By limiting a view to only the sections that the user needs, operation can be less confusing.
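The Python sketch below shows how one stored table can present different external views. It uses the worker and manager example. The role names and the `VIEWS` access policy are assumptions for illustration. Real systems usually enforce this in the database (for example with SQL views and permissions), not in application code like this.

```python
# The stored data: one table, unchanged whoever is looking at it.
EMPLOYEES = [
    {"id": 4789, "name": "Smith, J", "department": "Accounts", "salary": 56450, "rating": "good"},
    {"id": 4127, "name": "Morris, M", "department": "PR", "salary": 45895, "rating": "poor"},
    {"id": 2578, "name": "Downs, L", "department": "Sales", "salary": 49230, "rating": "good"},
]

# Which fields each role may see (an assumed access policy).
VIEWS = {
    "worker": ("id", "name", "department"),
    "manager": ("id", "name", "department", "salary", "rating"),
}


def external_view(role: str) -> list[dict]:
    """Return only the fields this role may see.

    New dicts are built, so changes made to a view do not alter the stored rows.
    """
    fields = VIEWS[role]
    return [{field: row[field] for field in fields} for row in EMPLOYEES]


for row in external_view("worker"):
    print(row)
```

This covers the confidentiality and convenience reasons: the worker never receives salary or rating. Because the view is a copy, it also gives a simple form of the security reason.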



Physical view

In contrast to the external view there is an internal or physical view of an information system.

Data is stored using bits and bytes and is scattered across the storage medium. How this data is actually written to disc, recalled, sorted, summarised, printed, updated, secured and so on is at the physical level of the system.

A data table and the data as stored as bits on disc

This physical view is the reality, and consists of the hardware, software, operating systems and the database management system (DBMS), that perform the information storage and processing. This is at the heart of the information system and determines how it actually works.
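A Python sketch of the gap between a logical row and its physical bits follows. It uses the standard `struct` module. The fixed-width record layout is an assumption made only for illustration: a 4-byte integer id, a 10-byte name and a 1-byte age. Real DBMS file formats are far more complex.

```python
import struct

# Assumed layout: little-endian 4-byte int id, 10-byte name, 1-byte unsigned age.
# struct pads shorter names with zero bytes and silently truncates longer ones.
RECORD_FORMAT = "<i10sB"


def to_bytes(record: tuple[int, str, int]) -> bytes:
    """Physical level: pack one logical row into a fixed-width run of bytes."""
    id_number, name, age = record
    return struct.pack(RECORD_FORMAT, id_number, name.encode("ascii"), age)


def to_bits(raw: bytes) -> str:
    """Show the bytes as the 0s and 1s that would be written to disc."""
    return "".join(format(byte, "08b") for byte in raw)


def from_bytes(raw: bytes) -> tuple[int, str, int]:
    """Recover the logical row from its physical bytes."""
    id_number, name, age = struct.unpack(RECORD_FORMAT, raw)
    return id_number, name.rstrip(b"\x00").decode("ascii"), age


row = (12874, "Smith, J", 15)  # logical view: one row of a table
stored = to_bytes(row)          # physical view: 15 bytes
print(to_bits(stored))
print(from_bytes(stored))
```

The user only ever deals with the row. The bit string and the choice of layout are hidden at the physical level.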

Conceptual and logical views

In between the user’s external view and the physical hardware, software and data on disc are the logical and conceptual views. These are the abstractions used to define the components and the relationships that make up the information system, and then relate these in a meaningful way to designers, developers and users.

The conceptual view of an information system consists of:

a conceptual schema – the design for the objects that make up the data, the relationships between these, and the constraints or restrictions that apply to them

a conceptual information processor – the part of the system that controls and enforces the design so that changes to the data can only be made if consistent with the design

the conceptual database – the set of facts that make up the information contained in the database.

A section of a conceptual schema for a training course IS

We will look at conceptual schema in more detail in unit 8.

The logical view is how this conceptualisation is communicated or presented to users. The Windows tree structure for folders is a logical view of the files on a computer (see page 15). Representing information as tables of data in rows and columns with headings is also a logical representation.

011000010110110001101100011110010110111101110101011100100110001001100001011100110110010101100001011100100110010101100010011001010110110001101111011011100110011101110100011011110111010101110011

id# name age

12874 Smith, J 15

12895 Sullivan, A 16

15902 Taylor, P 15



Views

In summary, the four views of a computerised information system are:

physical (or internal) – the actual way the data is stored and manipulated; this view is how the system will appear to the technician who creates, installs or maintains the system

conceptual – the types of data and the relationships between these types; this is the view of the systems analyst who takes a real world situation and transforms it into an abstraction that can be represented as an information system

logical – how the data or information is represented or communicated to the user; this is the view of the system designer who brings order to the data by presenting it in an understandable way

external – the parts the user is permitted to interact with; this is the view of the information system as it appears to the end user.

Each view of an information system is closely linked to the others, and in places they overlap. For example the logical view determines the external view, and in turn the external view is very much dependent on the internal (software used, speed of operation, etc.).

Despite this connection each view should be as independent of the others as possible. Each should be able to be replaced without affecting the other parts of the system. For example the conceptual view should be flexible enough so that it can fit any suitable hardware and software configuration, and the user’s view should be able to be varied to allow for these changes. Again, if the user is isolated from the physical method of storing and retrieving information from the database, changes can be made to the underlying architecture (hardware, disk storage methods, etc.) without affecting how the user accesses it.

In IPT we look at information systems from each of these views. We use programming languages to gain a logical understanding of how computers do what they do. We use SQL to get some idea of the internal, or physical, workings of databases. We investigate ORM to develop conceptual schema, and then use these to produce a set of relational tables as a logical view of the information. We will also investigate the HCI of information systems to better understand the externals of how users see and interact with data.

Activity 1.2 – Model systems

1. a What is a system?

b List three different systems you know about.

For each of these identify the main parts of the system.

What is the purpose or goal of each of these systems?

You might present your answer as a table such as:

System Parts Purpose

c Pick one of these systems and identify what is taken as input, how it is processed, and what is output.



2. What is the purpose of a computer system?

3. a What is meant by the external or user’s view of a system?

b What are the three reasons behind establishing a view of a system for the user?

c Pick one of these reasons and explain in more detail why it can be an advantage for a user not to be aware of more of the system than they need.

4. a What is involved in the physical view of an information system?

b What is meant by the distinction between a logical and a physical representation of something? Give an example to support your answer.

5. a What is an analyst’s view of an information system described as?

b What three parts does this consist of?

6. Find out what the role of a DBMS is. Write your answer as a set of dot points outlining the functions performed.

7. Why is it important that the views of an information system be independent?

Forms of information system

Having seen the different ways of viewing an information system we will now look at different forms information systems can take.

Hierarchical

One of the earliest forms of designing an information system was to use the hierarchical model. An example you may be familiar with is the folder view in Windows.

A logical view of a computer system presented as a hierarchy



A hierarchical information system like this is usually represented as a tree diagram with the data forming the leaves at the end of the branches. It can be displayed horizontally (as above) or vertically (see below).

In a tree diagram each leaf and each point of branching is called a node. All branches originate from a single root node. A lower node is called a child, while the one above is called a parent node. In a hierarchical tree each child node is restricted to having only one parent node.

In a hierarchical system information is stored as data that are the terminal nodes in a tree structure. To find a particular piece of data it is necessary to start from the root and then travel down the branches from parent to child node until the required data is found.
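A Python sketch of this root-to-leaf search follows. It uses the folder names from the Windows example below. The file names and their contents are invented for illustration. Each nested dict is a child node with exactly one parent, so there is only one path to any piece of data.

```python
# Hypothetical folder tree: nested dicts are nodes, strings are the data at the leaves.
TREE = {
    "Local Disc (C:)": {
        "Dreamweaver": {"readme.txt": "Dreamweaver notes"},
        "Intel": {"Logs": {"install.log": "install completed"}},
    }
}


def find(tree: dict, path: list[str]):
    """Start at the root and move down one parent-to-child link per step."""
    node = tree
    for name in path:
        if not isinstance(node, dict) or name not in node:
            raise KeyError(f"no node named {name!r} on this branch")
        node = node[name]
    return node


print(find(TREE, ["Local Disc (C:)", "Intel", "Logs", "install.log"]))
```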

A hierarchical information system (diagram: parent nodes branching to child nodes)

In the Windows example above, starting from the Local Disc (C:) there are folders for Dreamweaver, Intel, etc. Each of these has subfolders contained within it. Eventually by tracing down through the folder structure you will reach the data in saved files.

Hierarchical information systems were widely used in the early days of computing, especially by IBM. As a result tree spanning algorithms were developed to search down branches to find specific data quickly and efficiently.
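Two common tree-spanning orders, depth-first and breadth-first, can be sketched in Python. The tree here is hypothetical and is not the tree used in Activity 1.3. It is stored as a mapping from each node to its list of children. The sketch assumes a true tree, in which no node can be reached twice.

```python
from collections import deque

# Hypothetical tree: each node maps to its children, left to right.
TREE = {
    "root": ["P", "Q"],
    "P": ["P1", "P2"],
    "Q": ["Q1"],
    "P1": [], "P2": [], "Q1": [],
}


def depth_first(tree: dict, start: str) -> list[str]:
    """Follow each branch to its end before backtracking (uses a stack)."""
    order, stack = [], [start]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(tree[node]))
    return order


def breadth_first(tree: dict, start: str) -> list[str]:
    """Visit every node on one level before moving down (uses a queue)."""
    order, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order


print(depth_first(TREE, "root"))    # ['root', 'P', 'P1', 'P2', 'Q', 'Q1']
print(breadth_first(TREE, "root"))  # ['root', 'P', 'Q', 'P1', 'P2', 'Q1']
```

The only difference between the two is the data structure holding the nodes still to visit: a stack gives depth-first order and a queue gives breadth-first order.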

A hierarchical database (diagram: a single root node branching down to data at the leaves)



There were advantages to using the hierarchical model for organising information (e.g. very efficient searching) but the model proved too inflexible. Sorting data or restructuring the information were major operations, and so hierarchical systems generally fell into disuse. Some are still used for specialist applications such as document retrieval or file storage.

Network

One of the problems with a hierarchical information system is that a child node can have only one parent. This means that if data needs to appear in two places (e.g. a person works for two departments, or a part is used in two different machines) then the data has to appear as a node on two different branches of the tree. Duplicating data like this can lead to problems of redundancy and update anomalies (data changed in one place, but not in another).

To overcome these problems network information systems were developed. These removed the single parent restriction on data trees.

A network information system structure

Network information systems retained the speed of searching of the hierarchical model and overcame the problems associated with duplicated data, but were very complex to program and maintain. When it was found that relational information systems gave the same advantages without the associated complexity, the network model also fell into disuse.
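The difference between duplicating a node and sharing it can be sketched in Python. The example uses the "part used in two machines" case. The machine names, the bolt and its price are hypothetical. Python dicts stand in for nodes, and sharing one dict object stands in for a node with two parents.

```python
# Hypothetical example: one part (a bolt) is used in two machines.

# Hierarchical style: a child can have only one parent, so the bolt is stored twice.
hierarchical = {
    "Machine A": {"bolt": {"price_cents": 50}},
    "Machine B": {"bolt": {"price_cents": 50}},
}
hierarchical["Machine A"]["bolt"]["price_cents"] = 60  # only one copy is updated

# Network style: the bolt is a single node linked to two parents.
bolt = {"price_cents": 50}
network = {"Machine A": [bolt], "Machine B": [bolt]}
bolt["price_cents"] = 60  # one update, visible from both parents

print(hierarchical["Machine A"]["bolt"], hierarchical["Machine B"]["bolt"])  # copies disagree
print(network["Machine A"][0], network["Machine B"][0])                      # both show 60
```

The first print shows an update anomaly. The second shows how removing the single-parent restriction avoids it.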

Relational

This brings us to the relational information system. This model is the most widely used nowadays as it is simple, consistent and, if well designed, does not cause any of the problems associated with redundant data.

The relational model consists of:

data structures called tables or relations

rules on allowable values or combinations of values in tables, and

data manipulation operators.

Relational databases are based on relational algebra introduced by IBM’s E F Codd in the 1970s. The sound mathematical basis of relational algebra gives this model its consistency and stability.

We will explore the relational model in more detail shortly.

(Diagram labels: SALESREP, CUSTOMER, PRODUCT, INVOICE, PAYMENT, PRICE, DATE)



A relational table in Access

Other types of information system

Although relational information systems have been the most widely used over the last forty years, other forms are now appearing.

One is the object oriented (OO) database model. Similarly to object oriented programming, these information systems work from a top-down model of defining classes and objects, each of which has inheritable properties. This simplifies design and programming. OO information systems contain not only data but programs to act on the data. These programs define the objects that can then be manipulated usefully. There is as yet no generally agreed model for OO information systems and so different products organise data differently.

Other models for information systems include object-relational, multi-media, spatial, temporal and textual. Information systems also include rule based (or expert) systems, prognosis systems for forecasting, distributed on-line systems, and the simulation and optimising systems used in modelling.

In this book we will look at general fact retrieval systems using a relational model through SQL in unit 7, and object-role modelling in unit 8. We will also see a rule based system in unit 10.

Activity 1.3 – Classification of information systems

1. Draw a hierarchical tree diagram and label it to show each of the following:

node, leaf, branch, root, child node, parent node

2. Hierarchical information systems are searched using tree-spanning algorithms.

A depth-first spanning algorithm would search the tree at right in this order:

F C Q W P R S M etc.

a How many nodes would the depth-first search visit before it reached the data at each of the following nodes:

i G ii E iii A.



b An alternative spanning algorithm is breadth first. In this the algorithm visits all nodes on the same level before moving to a lower level.

List the nodes that would be visited in order, using a breadth-first search.

3. a What is the principal advantage of hierarchical and network information systems?

b Suggest a disadvantage of each.

c Name three other ways of implementing an information system.

4. An update anomaly is when data is changed in one part of an information system, but not in another. For example, a person moves house but their address had been stored in two places. If one address is brought up to date and the other is not, the system has an update anomaly.

a Give an example of how an update anomaly might happen in a library database.

b Explain how an update anomaly could arise in a hierarchical information system. In your answer draw part of an imaginary information system and use this diagram to show how the anomaly could occur.

c Before the development of network information systems the problems of anomalies in hierarchical information systems were overcome by using “virtual records”. When a repeat of information had to be held in the information system, instead of creating a second record the information system would hold a pointer to the original record.

Redraw your answer to part b above showing the inclusion of a virtual record and explain why the update anomaly would now not occur.

Relational basics

Relational information systems are the most widely used nowadays. When correctly designed they offer a simple, readily programmable way of representing and accessing data.

A relational database is made up of a group of named tables. Each table is displayed as a grid of values divided into rows (across) and columns (down). The table below is named Event entered, and has five columns (id number, name, age group, house, event) and six rows:

Event entered

id number name age group house event

1456 Smith, J U/15 Red 100m

3241 Bloggs, F U/14 Red 100m

3215 Jones, L U/15 Blue 200m

5439 Adams, F U/15 Green ?

6231 Davis, H U/14 Red 200m

4376 Fange, J U/14 Green 200m

Each data entry in this table is stored in one block of the grid called a cell. A value in a cell is called an instance. For example Smith, J above is an instance of name and is stored in a single cell.



Each cell in a relational database must contain one and only one instance (value) or be empty. An empty cell is described as null – indicated above by a ?. (Note: a null cell is different to a cell containing 0.)

The cells in a vertical column are referred to as a field, e.g. the name field or the id number field.

Each set of information in a horizontal row forms a record or a tuple (rhymes with couple):

e.g. the tuple: 6231 Davis, H U/14 Red 200m has five instances.

Duplicate tuples are not permitted, i.e. the same set of information must not occur twice in the same table or database. If it did, one set might be updated and the other not, and the data would then be inconsistent and the database would lack integrity. We saw this above as an update anomaly.

In a relational database there is no inherent order in columns or rows. They can be rearranged to suit our purposes. We can put names into alphabetical order, or get ages with youngest first. If we wish we can rearrange the columns. When such resorting takes place however the data in each tuple stays together. The sequence of the records or the order of the fields within a record may change, but the instances in the record tuple still contain the same data.
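Because rows have no inherent order, a query has to ask for one explicitly. The sketch below loads the Event entered table into an in-memory SQLite database using Python's standard `sqlite3` module. Both the choice of SQLite and column names such as `id_number` are assumptions for illustration. It then uses `ORDER BY` to list the entrants alphabetically and moves name to the first column. Each tuple's values stay together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_entered ("
    " id_number INTEGER PRIMARY KEY, name TEXT, age_group TEXT, house TEXT, event TEXT)"
)
conn.executemany(
    "INSERT INTO event_entered VALUES (?, ?, ?, ?, ?)",
    [
        (1456, "Smith, J", "U/15", "Red", "100m"),
        (3241, "Bloggs, F", "U/14", "Red", "100m"),
        (3215, "Jones, L", "U/15", "Blue", "200m"),
        (5439, "Adams, F", "U/15", "Green", None),  # None is stored as NULL (the ? above)
        (6231, "Davis, H", "U/14", "Red", "200m"),
        (4376, "Fange, J", "U/14", "Green", "200m"),
    ],
)

# Without ORDER BY, SQL makes no promise about the order of rows.
by_name = conn.execute(
    "SELECT name, id_number, house FROM event_entered ORDER BY name"
).fetchall()
for row in by_name:
    print(row)
```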

While columns (fields) are named, rows (tuples) are not, and so we need some way of distinguishing one row from another. One field (or several fields) takes on the function of being a key to a tuple. The key is used to identify the tuple. In the Event entered table the field id number is the key field, e.g. 3241 is used to identify the second record, 6231 to identify the fifth record, and so on.

A key field obviously cannot contain a null, and must be unique, i.e. not repeated in the table. (Why not?)
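A DBMS can enforce both key rules automatically. In this Python/SQLite sketch, every insert that breaks a rule is rejected. The table and column names are assumed, and "Someone, X" is a hypothetical entrant. In SQLite the key column is declared `INT` rather than `INTEGER`. An `INTEGER PRIMARY KEY` becomes an alias for SQLite's internal rowid, and a NULL inserted there is replaced with a new number instead of being rejected. `NOT NULL` is also stated explicitly, because SQLite, unlike the SQL standard, does not imply it for every primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_entered ("
    " id_number INT NOT NULL PRIMARY KEY,"
    " name TEXT, age_group TEXT, house TEXT, event TEXT)"
)
conn.execute("INSERT INTO event_entered VALUES (3241, 'Bloggs, F', 'U/14', 'Red', '100m')")


def try_insert(row: tuple) -> str:
    """Attempt an insert and report whether the key constraints allowed it."""
    try:
        conn.execute("INSERT INTO event_entered VALUES (?, ?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


print(try_insert((3241, "Someone, X", "U/15", "Blue", "200m")))  # repeated key
print(try_insert((None, "Someone, X", "U/15", "Blue", "200m")))  # null key
print(try_insert((1456, "Smith, J", "U/15", "Red", "100m")))     # new, unique key
```

Because every key must be unique, a duplicate tuple can never be stored either.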

A relational database can consist of one table or several tables. In these the different tables may have fields in common. These tables are thereby related, just as two children who have the same grandmother are related. The ability to combine related tables makes this type of database a very powerful tool. The relationship enables information that is spread across a number of tables to be accessed by a single search, achieved by linking the tables through their related fields.
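A single search across two related tables can be sketched with Python's `sqlite3` module. The first table is a cut-down copy of Event entered. The second table, `event_times`, and its start times are hypothetical, invented only to give the tables a field in common (`event`). The `JOIN ... ON` clause links the two tables through that common field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE event_entered (id_number INTEGER PRIMARY KEY, name TEXT, event TEXT);
    CREATE TABLE event_times (event TEXT PRIMARY KEY, start_time TEXT);
    """
)
conn.executemany(
    "INSERT INTO event_entered VALUES (?, ?, ?)",
    [(1456, "Smith, J", "100m"), (3215, "Jones, L", "200m"), (6231, "Davis, H", "200m")],
)
# Hypothetical related table: the start times are invented.
conn.executemany(
    "INSERT INTO event_times VALUES (?, ?)",
    [("100m", "09:00"), ("200m", "09:30")],
)

# The common field `event` relates the tables, so one query draws on both.
rows = conn.execute(
    """
    SELECT e.name, e.event, t.start_time
    FROM event_entered AS e
    JOIN event_times AS t ON e.event = t.event
    ORDER BY e.name
    """
).fetchall()
for row in rows:
    print(row)
```

Each start time is stored once, in `event_times`. Changing the 200m start time therefore means updating one row, not every entrant's record.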

In designing a relational database we will be aiming to make it as problem free as possible. To do this we must get it into what is called Optimal Normal Form (ONF). We will see how to do this in unit 8.

Note: although the data in a relational database appears to the user to be in rows and columns, it is not actually stored this way by the computer. The way the data appears to behave as rows and columns of a table is a logical view of the data, rather than the way it is physically stored. There is a distinction between the logical data and the physical data.

E. F. Codd, the “father” of relational databases



Activity 1.4 – Related tables

1. The tables below are incomplete but show a part of a relational database. Use them to identify:

a a column b a row c a field

d a cell e an instance f a tuple

g a named table h a record i a null

j a key field k a common field

Student

id     name          age
1456   Harris, J     15
3241   Simons, F     14
5423   Howard, M     15
4578   Franklin, A   15

Achievement

id     subject   result
1456   IPT       70%
1456   Maths C   82%
1456   English   ?
3241   IPT       58%

2. How are these two tables related?

3. Could the tuple: 3241 Simons, F 14 be added to Student? Why, or why not?

4. What does the ? in the result column suggest?

5. In Achievement could id act as a key field by itself? Why, or why not?

6. Each of the tables below has an error. Identify what is wrong with each.

a Arrivals

date       name
15-6-10    Mike
           Alice

16-6-10    Clair
           Shirley
           George

b Code

id     name
1234   Jones, P
1245   Anderson, A
1259   Howard, J
1245   Carey, M
1256   Falls, N

c Stock

code     part       price
145789   sprocket   $14.50
148792   widget     $23.90
145876   grunckle   $16.75
148792   widget     $23.90
145796   hamsmith   $14.20

SEI 1 – System security and integrity

Once data has been transformed into information it has value and must be kept safe. In this, our first look at the social and ethical considerations of using technology, we will investigate how information can be abused or misused.

It is the responsibility of the person in charge of the information system to ensure its security and integrity. Security is ensuring the information is only used for its intended purpose; integrity is ensuring the stored values stay correct as entered.

The value of information

During the Second World War the British had a huge advantage over the Germans. They had captured several of the German Enigma code machines. Using electromechanical code-breaking machines known as bombes, the genius of Alan Turing and the work of hundreds of other mathematicians, the British were able to break the Enigma code.

Throughout the war the Allies were able to intercept and decode so many of the enemy’s messages that they were often one step ahead of the Germans. When a bombing raid was planned, when troops or supplies were to be moved, as U-boats set sail, and so on, the British and Americans frequently had prior warning and were able to act against them. The cracking of the Enigma code and the information it supplied was a major factor in defeating Hitler.

In this case the information that the Allies had helped win a war. To have access to information your competitor does not gives an advantage, whether the situation is a conflict like a war, or something less dramatic. Knowing market trends can aid a business, being aware of weather patterns can help a farmer, knowing key “cheats” can give you a win in a computer game.

Since having relevant information can give an advantage, information has value. It can be bought, sold or traded, just like any other commodity. To illustrate this consider the following examples:

1. A profile, in computer terms, is a set of characteristics or qualities that identifies a person, either as an individual, or as belonging to a category or group. For example a person might be identified as young, fitness oriented, and having a moderate disposable income. Such a profile can be compiled by analysing the on-line and web browsing habits of the person. By tracking sites visited, on-line purchases, preferences, contacts, and so on, the particular characteristics of a certain individual can be categorised. Targeted advertising, for example in pop-up ads, can then be directed at the person on the basis of the profile.

2. Credit information is also valuable. When an individual applies for a loan from, say, a finance company, the company may phone a credit bureau. Credit bureaus hold and trade financial information about individuals. The bureau will check its records and, for a fee, give a credit rating on the person. On the basis of this rating the loan may be given or withheld.

3. A mailing list is a list of addresses of a specific group of people who can be targeted for a particular purpose such as advertising or surveying. Say, for example, a company was about to promote a new range of imported cars. One very expensive option would be a mail drop of a colour pamphlet into every household in Australia. If however they had a mailing list of every home with a member whose income was over $80 000 they could target those households, greatly reducing their operating costs. Such mailing lists do exist and sell for thousands of dollars.

In each of these cases the information held (profile, financial data, mailing list) is worth paying for. There are hundreds of other examples where an organisation will pay for information.

The cost of information involves three aspects: the cost of creating or developing the information, the cost of maintaining it and keeping it up to date, and the cost of communicating or passing it on to those who will use it. Although gathering and maintaining information is expensive, in the long run it can save money and increase profits. Because information is both costly to develop and valuable to its holder, competitors may be tempted to obtain it illegally. This can lead to industrial espionage – spying, hacking or bribery to gain access to the information held by a competitor.

An Enigma machine



Whether information is used for commercial purposes or for any other reason, it has value to its holder and so it must be kept secure.

Information security

It is the responsibility of the information system manager to maintain the security of the data she or he is entrusted with. In the case of a home computer the information system manager is the owner; in an office or corporate system, however, the manager may be responsible for hundreds of computers.

Threats to the security of data include:

hackers – a hacker is someone who gains unauthorised access to a system; while most hackers are joyriders who enter systems for the challenge involved, they can still accidentally alter the data in a system; in extreme cases hackers can be malicious and may seek to vandalise or steal from the system

fraud – fraud is gaining money by deception; computer fraud is usually committed by someone with authorised access to a system who alters the information it contains to their own benefit

data diddling – altering data in a system; this is usually deliberate and can be done to hide unfavourable information or to damage another person or group; recent examples include the alteration of web pages

data theft – copying and removing information without authority, for example illegally compiling mailing lists or selling restricted information obtained from a system

data mining – the collection of information on individuals, without their knowledge, by automated programs (bots) that gather it from web sites, completed forms or other on-line sources; information collected in this way can be used for identity theft or to create a profile of the person

industrial espionage – the stealing of trade secrets and commercial information from an economic competitor

viruses – malicious computer code that can damage the data in a system

privacy invasion – obtaining information about other people without the right to do so; in some cases the information might be used for blackmail

copyright infringement – copying and using intellectual property without payment to, or permission from, the rightful owner of the information.

The information manager must put safeguards in place for each of these threats.

Basic safeguards include:

limiting access – restricting use of a system to specified people, usually by password access; each user has certain read/write privileges depending on their level of authorisation

software controls – procedures in place so that changes to data are monitored and can be reversed if necessary

threat monitoring – records are kept of unsuccessful accesses to a system, overlong sessions by users, or the overuse of certain data files; these can be taken as indicators of unauthorised use of a system



physical security – restricting physical access to the system, accounting for portable discs and flash drives, making printouts available only to authorised people, etc.

backup – maintaining an effective backup of data so that it can be reinstalled in case of deletion, loss, or system crash

anti-virus – using software or hardware to guard against the actions of computer viruses or worms

firewall – software or hardware that provides a protective layer between an intranet and the Internet; it filters incoming traffic so that the system is less vulnerable to access from outside

staff security clearances – only permitting trustworthy users to access sensitive sections of a system

copy restrictions – preventing users from copying information from a system without permission; this can be by vetting outgoing files or attachments, or by monitoring unusual duplication of data

encryption – sensitive data can be encoded so that only users with a key can view it

separation of data and identifier – where possible, keeping records in such a way that an individual cannot be recognised.
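To make two of these safeguards more concrete, here is a minimal sketch, assuming a single program that keeps its own user list. It combines limiting access (a password check plus read/write privileges) with threat monitoring (counting unsuccessful log-ins). The class `AccessControl`, the user names and the threshold of three failures are all hypothetical choices for illustration; real systems usually rely on the operating system or DBMS for this.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    # Keep only a salted hash, never the password itself.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class AccessControl:
    """Hypothetical access-control layer: passwords, privileges, failed-login log."""

    def __init__(self) -> None:
        self.users = {}          # username -> (salt, password hash, privileges)
        self.failed_logins = {}  # username -> number of unsuccessful attempts

    def add_user(self, name: str, password: str, privileges: set) -> None:
        salt = os.urandom(16)
        self.users[name] = (salt, hash_password(password, salt), set(privileges))

    def login(self, name: str, password: str) -> bool:
        record = self.users.get(name)
        ok = record is not None and hmac.compare_digest(
            record[1], hash_password(password, record[0]))
        if not ok:
            # Threat monitoring: record every unsuccessful access.
            self.failed_logins[name] = self.failed_logins.get(name, 0) + 1
        return ok

    def can(self, name: str, action: str) -> bool:
        # action is "read" or "write"; unknown users have no privileges.
        record = self.users.get(name)
        return record is not None and action in record[2]

    def suspicious(self, threshold: int = 3) -> list:
        # Accounts with repeated failed log-ins may indicate attempted intrusion.
        return [name for name, count in self.failed_logins.items() if count >= threshold]


ac = AccessControl()
ac.add_user("clerk", "s3cret", {"read"})
ac.add_user("manager", "b3tter", {"read", "write"})
print(ac.login("clerk", "s3cret"))  # True
print(ac.can("clerk", "write"))     # False
for _ in range(3):
    ac.login("intruder", "guess")
print(ac.suspicious())              # ['intruder']
```

Storing a salted hash rather than the password means that even someone who copies the user list does not learn the passwords directly.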

The maintenance of data security is a major responsibility of the system manager and is a very complex task. It is not made any simpler by the characteristics of on-line data. Computerised information can be reproduced quickly and then easily uploaded to another system. When it is copied the original owner is not deprived of it, and may not even be aware they have been deprived of its exclusive use.

The integrity of data

The integrity of a computer system refers to all data values being correct (or at least as correct as when they were originally entered). It also concerns data being mutually consistent so that there are no anomalies in the system. (One example of an anomaly might be two different ages recorded for one person in different parts of the system.) A system with no such anomalies is said to be consistent. A related, more specific term is referential integrity, which means that every link from one record to another points to a record that actually exists.
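The anomaly described above, two different ages recorded for one person, can be detected by comparing the parts of the system that hold the same data. The following is a minimal sketch, assuming both parts share a person identifier; the `payroll` and `membership` records and their field names are hypothetical.

```python
# Two hypothetical parts of one system that both record a person's age.
payroll = {"P001": {"name": "A. Nguyen", "age": 34},
           "P002": {"name": "B. Rossi", "age": 51}}
membership = {"P001": {"name": "A. Nguyen", "age": 43},
              "P002": {"name": "B. Rossi", "age": 51}}


def find_anomalies(a: dict, b: dict, field: str) -> list:
    """Return (person_id, value_in_a, value_in_b) wherever the two parts disagree."""
    anomalies = []
    for person_id in sorted(a.keys() & b.keys()):  # people recorded in both parts
        if a[person_id][field] != b[person_id][field]:
            anomalies.append((person_id, a[person_id][field], b[person_id][field]))
    return anomalies


print(find_anomalies(payroll, membership, "age"))  # [('P001', 34, 43)]
```

A check like this only finds the disagreement; it cannot say which value is correct. Storing each fact in one place only is the usual way to stop such anomalies arising in the first place.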

The integrity of a computerised information system depends on data being:

complete

accurate

relevant

up to date, and

secure.

Again, it is the system manager’s responsibility to ensure the integrity of data. Effective data capture systems, secure storage procedures and well-designed output reports all help maintain the reliability of the data.
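One part of an effective data capture system is validation: checking each record as it is entered. The sketch below assumes a hypothetical record with `name`, `age`, `postcode` and `last_updated` fields and checks three of the qualities listed above: complete (no blank fields), accurate as far as a program can tell (range and format checks) and up to date (reviewed within a year, an arbitrary limit chosen here). Relevance and security cannot be checked this way; they depend on what is collected and how it is stored.

```python
from datetime import date, timedelta

REQUIRED_FIELDS = ("name", "age", "postcode", "last_updated")


def validate(record: dict, today: date, max_days: int = 365) -> list:
    """Return a list of problems found in one record (empty list = passed)."""
    problems = []
    # Complete: every required field is present and not blank.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Accurate (as far as a program can tell): range and format checks.
    age = record.get("age")
    if isinstance(age, int) and not 0 <= age <= 120:
        problems.append("age out of range")
    postcode = record.get("postcode")
    if postcode and not (len(postcode) == 4 and postcode.isdigit()):
        problems.append("postcode must be four digits")
    # Up to date: flag records not reviewed within max_days.
    updated = record.get("last_updated")
    if updated and today - updated > timedelta(days=max_days):
        problems.append("record out of date")
    return problems


today = date(2012, 6, 1)
good = {"name": "A. Nguyen", "age": 34, "postcode": "2000",
        "last_updated": date(2012, 3, 1)}
bad = {"name": "", "age": 340, "postcode": "20A0",
       "last_updated": date(2010, 1, 1)}
print(validate(good, today))  # []
print(validate(bad, today))
```

Passing validation shows only that data is plausible, not that it is true: an age of 43 entered instead of 34 would still be accepted.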

To maintain this integrity, the system manager relies on the DBMS in place. The DBMS will handle:

the data dictionary – a list of the data items (fields) the system can hold, along with their properties such as data type and size



data transformation and presentation – operations that can occur on the data and how it can be displayed

data storage management – how the data is physically retained on storage media

security – measures to keep the data safe

backup and recovery – procedures to follow to ensure data is not lost

multi-user access control – how more than one user can access the system effectively and securely

data integrity – maintaining data as it was entered

access language and API – the language used to manipulate the data (e.g. SQL)

communication interface – the user interface.
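Several of these DBMS functions can be seen in a few lines using SQLite, a small DBMS that ships with Python as the `sqlite3` module. SQLite is simply a convenient example here, and the `customer` and `loan` tables are hypothetical. The sketch shows the DBMS reporting its data dictionary, enforcing integrity rules (a range check and a link that must point to an existing customer) and answering a query written in SQL, the access language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # a throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this check off by default

conn.execute("""CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    age  INTEGER CHECK (age BETWEEN 0 AND 120))""")
conn.execute("""CREATE TABLE loan (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount      REAL NOT NULL)""")

# Data dictionary: the DBMS can describe each field and its properties.
for _, name, col_type, notnull, _, pk in conn.execute("PRAGMA table_info(customer)"):
    print(name, col_type, "NOT NULL" if notnull else "", "KEY" if pk else "")


def try_insert(sql: str) -> str:
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected ({err})"


print(try_insert("INSERT INTO customer (id, name, age) VALUES (1, 'A. Nguyen', 34)"))
print(try_insert("INSERT INTO loan (customer_id, amount) VALUES (1, 5000)"))
print(try_insert("INSERT INTO customer (name, age) VALUES ('B. Rossi', 340)"))  # fails CHECK
print(try_insert("INSERT INTO loan (customer_id, amount) VALUES (99, 100)"))   # no customer 99
conn.commit()

# Access language: SQL retrieves and combines the stored data.
for row in conn.execute(
        "SELECT c.name, l.amount FROM loan AS l JOIN customer AS c ON c.id = l.customer_id"):
    print(row)  # ('A. Nguyen', 5000.0)
```

The rejected rows never reach the tables, so integrity is enforced by the DBMS itself rather than depending on every program that uses the data remembering to check.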

Activity 1.5 – Safe and secure

1. a Explain the difference between the security and the integrity of data.

b Whose responsibility is it to ensure that the security and the integrity of data are maintained?

2. a What do we mean when we say information has value?

b Give at least two examples (of your own) that show that information has value.

3. a What are the three areas of costs involved with information?

b What is industrial espionage?

c What forms might it take?

d Why might commercial operations be prepared to undertake industrial espionage?

4. Make up examples to show how data could lack integrity in each of the following ways:

a incomplete

b inaccurate

c irrelevant

d out of date.

5. What is meant by identity theft?

Accessing the data

Governments and businesses have always collected data on individuals and there has always been a potential threat to privacy in the use of this data. With the advent of the electronic computer this threat has been greatly magnified.

With computers, the huge amount of data stored, the ease of searching it, and the comparisons that can be made between data held in different places and forms have made the data vulnerable to abuse and misuse.

Abuse is deliberate, misuse is accidental. We will look at examples of each in turn.

One form of data abuse mentioned above is to use computerised data to build up a profile of a person. The profile will list their personal details, buying habits and preferences. A profile is assembled by tracking the electronic activities of a person. These activities can include the use of a credit card, internet accesses, use of a smartcard, videos rented, places visited and so on. Once compiled, the profile could then be sold to marketers who would use it to target the individual.

An example of developing a profile might be where it is noted that a given individual often visits the ESPN (sports) web site, uses a credit card at sports stores, and has books on sport on loan from a library. This person is now identified as a potential customer for a range of associated goods such as fitness equipment or health foods. This is valuable information to marketers, either for direct mail or to target ads when the person is on-line.
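The reasoning in this example can be sketched in a few lines. The activity log below is hypothetical, and the rule used (an interest counts only if it shows up in at least two different sources, such as the web and a credit card) is an assumption made for illustration; real profiling systems are far more elaborate.

```python
from collections import Counter

# Hypothetical tracked activities: (where it was recorded, subject of the activity)
activities = [
    ("web site", "sport"), ("credit card", "sport"), ("library", "sport"),
    ("web site", "news"), ("credit card", "groceries"), ("web site", "sport"),
]


def build_profile(events, min_sources=2):
    """Return {interest: count} for interests seen in at least min_sources different sources."""
    counts = Counter(subject for _, subject in events)
    sources = {}
    for source, subject in events:
        sources.setdefault(subject, set()).add(source)
    return {subject: n for subject, n in counts.items() if len(sources[subject]) >= min_sources}


print(build_profile(activities))  # {'sport': 4}
```

No single source reveals much on its own; it is the combination of web, credit card and library records that marks the person as a likely buyer of sports goods.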

Another form of data abuse is blackmail. By finding out secret or confidential information about an individual, it is possible to use this information to extort money from them. The data could also be used to force them to do something they do not wish to do, such as promoting the holder of the data. The blackmailed person must do what they have been told in order to prevent the release of the confidential information.

Data theft is also a form of data abuse. Compiling mailing lists from a legitimate databank, or querying a data file to sell information illegally are two examples of this.

In each of the cases above the abuse of the data has been deliberate. Data misuse usually occurs through ignorance or carelessness. Examples include:

data given to others when they have no need to know of it

deleting data accidentally

using irrelevant data

delays in updating data so that it is out of date

transmission errors where data is corrupted

poor security

using data for purposes other than those for which it was supplied.

Controls should be in place so that data is not open to either abuse or misuse.

Abuse of data files

In 1924 in America, J. Edgar Hoover was appointed head of the Bureau of Investigation, renamed the Federal Bureau of Investigation (FBI) in 1935. Over the next 48 years, until his death, he dominated the bureau, bringing it to a high level of efficiency and effectiveness.

During his years in control of the bureau, Hoover used his office to compile files on many wrongdoers in America, including gangsters, racketeers and foreign agents. Over time, as he became entrenched in his position, Hoover extended his files to cover politicians and other leading community figures.

Eventually Hoover had files on most of the important decision makers in America, many of them including highly private information that the individuals did not want revealed. There is a suggestion that this sensitive data was used to pressure the individuals into acting as Hoover wanted. One description of Hoover likened him to a giant spider sitting at the centre of a web of intrigue, twitching the strands to make the leaders of America dance to his tune.

A rare photo of Hoover, taken in his younger days

In the end Hoover’s influence was such that even President Kennedy was not able to remove him from his position.

When private information is controlled by one person, there is always the potential for it to be used for that person’s own purposes. It is important to introduce checks and balances into any institution where this might occur.

A check is any device to prevent a situation occurring, e.g. using an independent person such as a magistrate to authorise activities. A balance is any way of ensuring that activities are not carried too far in any given direction.

Data merging and sharing

The data systems of government agencies and those of the finance, insurance and health industries are very large and complex, and the data in them is particularly detailed and sensitive. The uncertainty, lack of precision and complexity that arise when data from these different systems is merged are extremely difficult to deal with.

Data merging is when information held on different computer systems is brought together to form a new file, e.g. a mailing list.

Data sharing is when information is given to users on another system to either check entries or to discover connections. An example of data sharing is where the taxation department is permitted access to bank accounts to detect if a person’s stated interest income matches their tax return.
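The tax-department example is a form of data matching, which can be sketched as below. The identifiers, dollar amounts and the one-dollar tolerance for rounding are all hypothetical; the identifiers are placeholders, not real TFNs. Each bank-reported figure is compared with the figure stated in the person’s tax return, and any disagreement, or a missing return, is flagged for follow-up.

```python
# Hypothetical interest figures for the same year, keyed by an identifier
# that both systems share. Amounts are in dollars.
stated_in_returns = {"A001": 410.00, "A002": 0.00, "A003": 1250.00}
reported_by_banks = {"A001": 410.00, "A002": 875.50, "A003": 1250.00, "A004": 60.00}


def find_mismatches(stated: dict, reported: dict, tolerance: float = 1.00) -> dict:
    """Return {id: (stated, reported)} where a return disagrees with bank figures."""
    flagged = {}
    for person, bank_amount in reported.items():
        declared = stated.get(person)  # None means no return was found
        if declared is None or abs(declared - bank_amount) > tolerance:
            flagged[person] = (declared, bank_amount)
    return flagged


print(find_mismatches(stated_in_returns, reported_by_banks))
# {'A002': (0.0, 875.5), 'A004': (None, 60.0)}
```

A flagged record is only a lead, not proof of wrongdoing: if either system holds inaccurate or out-of-date data, an honest person can be flagged.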

Problems that might arise with merging or sharing data include:

irrelevance – data may be accidentally or deliberately given to others when they have no need to know of it

inaccuracy – this may arise out of delays in updating data, data taken out of context, transmission errors, or poor standards of checking

disclosure – more people see the data and therefore there is less security, and the new users of the information may not be aware of its sensitivity

no consent – the information may be used for a purpose other than the one for which the individual supplied it

tampering – once files are on-line they are vulnerable to “hackers” or other unauthorised access.

The ability to merge data files or to simply share information has been greatly increased with the use of the Tax File Number (TFN) in Australia. This number is required by each working Australian. Originally the Tax File Number was for use only by the taxation department but its use is becoming more widespread, and the potential to abuse it as a key to query separate files is growing.

Although by law the TFN was originally to be used only for tax purposes, it is now used when opening bank accounts, applying for social security and, in some cases, applying for a job.



In itself the TFN is not a threat to privacy; however, because it is a unique identifier, it permits easy reference to specific individuals. Information held in many different places is difficult to collate (how many J. Smiths are there in Australia?). While it is possible to query many different systems and build up a profile of a target individual by matching name, address, age and so on, this is time consuming and difficult.

With a unique identifier, however, this whole process is simplified. Each TFN applies to only one individual, so if it were stored on many different systems, compiling a profile would be easy.
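The difference a unique identifier makes can be shown with two small, hypothetical record sets. The field `id` stands in for an identifier such as the TFN (the values are placeholders, not real TFNs). Matching on name alone pairs a record with every “J. Smith”, including the wrong one, while joining on the identifier links each person’s records exactly.

```python
# Two hypothetical systems. Names repeat across people; identifiers do not.
bank_records = [
    {"id": "A001", "name": "J. Smith", "postcode": "2000", "balance": 1200},
    {"id": "A002", "name": "J. Smith", "postcode": "2150", "balance": 90},
]
insurance_records = [
    {"id": "A002", "name": "J. Smith", "postcode": "2150", "claims": 3},
]


def match_by_name(a, b, name):
    """Every pairing of records that share the given name (may include wrong people)."""
    return [(x, y) for x in a for y in b if x["name"] == name and y["name"] == name]


def merge_by_id(a, b):
    """Join records on the unique identifier: at most one match per person."""
    index = {row["id"]: row for row in b}
    return [{**x, **index[x["id"]]} for x in a if x["id"] in index]


print(len(match_by_name(bank_records, insurance_records, "J. Smith")))  # 2
print(merge_by_id(bank_records, insurance_records))
```

The merge by identifier needs no guesswork at all, which is exactly why a widely stored unique identifier makes building a profile so easy.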

Security guidelines

If the information in a system can be used to invade personal privacy or to abuse power, controls have to be instituted.

The U.S. Privacy Commission suggested the following guidelines:

the existence of all personal records must be made public

the individual must have some way of finding out when information about him/her is stored and how it is used

personal information should only be used for its intended purpose

individuals should be able to correct or amend their own records

personal information must be obtained in such a way that it is complete, accurate, relevant, up to date, and secure

all uses of information should be accounted for by a responsible manager.

The holding of data carries with it the responsibility of ensuring it is only used for its intended purpose. It is part of the job of a computer system manager to make sure this happens.

Activity 1.6 – Holding data

1. a In computer access terms what is a profile of a person?

b Give an example of the type of information that might be used to build up a profile on a person.

c What is a cookie and how are cookies used to track user access to a web site?

d How might information obtained from cookies be used to help develop a profile on a person?

2. Blackmail is where confidential information is used to coerce (force) a person to do something they do not wish to do. Give examples of the types of information that might be held on computer and should have restricted access so that it is not available for blackmail.

3. a What is the difference between the abuse and the misuse of data files?

b Give examples of each.

4. a How did the FBI’s Hoover use data files to protect his interests?

b What steps should be taken to avoid a similar situation arising in Australia?



5. Why should we be concerned about data merging and data sharing between different information systems?

6. a What is a smartcard?

b Discuss the advantages and disadvantages involved in using a smartcard.

7. One of the safeguards to ensure the security of data is to have one person ultimately responsible for the data.

a Why is it more important to have one person responsible for data than to have a group of people or a committee in charge?

b What checks and balances should be put in place to make sure the information manager does not in turn abuse or misuse the data?

8. a What is a TFN? Who must have one?

b What is meant by a unique identifier?

c How does a unique identifier make the merging or sharing of data files easier?

d What implications for the right to privacy have arisen from the more widespread use of the TFN?

It is now time to see how we interface with these information systems.

