

Source: cse.csusb.edu/dick/cs372/a4.pdf


Contents

Modeling the Data in a System
    Story -- Data determines the feasibility of systems
    Story -- The Available Data Affects Reliability and Usage
    Stories of data transfer between systems
    Introduction to data models
    Story -- Go with the flow
    ERDs and DFDs
    DFDs -- Data Flow Diagrams
    Review Questions
    Online Exercises on DFDs
    Typical Exam Questions and Exercises on DFDs
    Exercise -- Context Diagram of a Possible Project
Abbreviations
Links

Modeling the Data in a System

Story -- Data determines the feasibility of systems

Recently, at this campus we rolled out a new system for handling registration etc. -- including grades. It is called the "Common Management System" or CMS. A faculty member asked in the Fall of 2009 about posting "Incomplete" grades.

Given that Faculty members now must use CMS to enter grades, why is the hardcopy multiple copy form required?

Notice that this is a classic system improvement pattern of remove paperwork.

And here is the reply:

It just so happens that CMS has developed an incomplete form that will be delivered to us soon. We plan to roll it out in the Winter Quarter as we need time to do set up and training. The form does have a dialog box to enter what the student needs to complete and a deadline for that work. It also allows for a default grade other than the "F" or "NC" if sufficient work was completed at the time of the contract.

Since some students need to sign or accept (they can do the acceptance through MyCoyote Student Grades) the contract before the grade rosters will be available, this feature will be added to the class roster, too.

Our campus requested the Incomplete form as a result of the CSU Student Records Audit that noted we had not received forms for all of the incomplete grades, they were not completed properly when received, and did not have a student signature... This is considered a contract between the faculty member and the student, so it does require both signatures.

The answer was that the originally implemented system could neither input nor store the required data for the function of filing an Incomplete grade.

Moral: find out about what data exists, what is needed, and how it can be computed or input.

By the way, also note the iterative implementation strategy: roll out only some of the functionality at each iteration. Start with a good (but incomplete) system and add to it periodically. We will compare this to some alternatives (Big Bang for example) later.

This is a continuing story; see later in these notes for an unexpected problem with this nice new feature.

Story -- The Available Data Affects Reliability and Usage

When we could only get rosters in hard-copy we would have to type the names and student IDs into a spreadsheet. This always introduced errors in the data.

But when I could copy the rosters from a terminal screen into the central "SIS+" and paste the data into a spreadsheet the errors almost disappeared. Again note the pattern -- remove paperwork.

CS372: Modeling the Data in a System file:///u/faculty/dick/cs372/a4.html

1 of 21 11/08/2011 03:07 PM


The latest system (CMS) gives a teacher the option to download his or her roster directly as a spreadsheet. This is a very useful feature. It is the fastest and most reliable system I have used. However it is less secure because the spreadsheet has to be downloaded in an unencrypted form... and pieces of unencrypted data may be left on my hard disk -- even after I have deleted the downloaded file.

Similarly, Course Management Systems on this campus (Blackboard, Moodle) also extract CMS data to populate the grading subsystem. Again, having easy access to the data makes the system an improvement over previous systems. On the other hand faculty who have uploaded materials (data) to Blackboard will not want to upload them again into Moodle -- so Moodle's success will depend on processes that download and re-upload data.

Stories of data transfer between systems

Another recent change also illustrates the importance of data. The campus has just moved students' Email handling from an internal server and database to the cloud in the form of Google Gmail. In theory it was a simple matter of extracting the data from the old system and sending it to the new one... and was scheduled to be done overnight... In fact some of the data in the old system was not as expected and so erroneous data was created on the new. It took another 24 hours to fix this.

Something similar to this occurs when I move an address book from my old Palm Pilot to my iPod. The Palm will output a text file (export) and the iPod/iMac will import it. But the Palm Pilot allows blank lines in the Note field and the iMac/iPod does not. So I have to filter the file using some simple Unix spells...

Introduction to data models

System flow charts are a popular, traditional, and simple way to picture a system. They show stuff moving through it (flows), being 'stored' in it, being 'processed' by it, entering, and leaving it. This is a classic PowerPoint slide! We call the movements 'flows'. Here are some classic flows: goods, money, data, and objects. All are shown as an arrow... which can be confusing.

Computer based systems are almost entirely about handling data. In systems, the data that exists and can be created, processed, collected, and output drives the selection of good designs. We need a way to trace and define the data in our systems. We need a way to picture and visualize the data in new systems.

To analyze and design systems that handle data we need a specialized diagram for showing the flow of data. These are called Data Flow Diagrams. We also need specialized diagrams for showing the structure and meaning of data. These are called Entity-Relationship Diagrams.

If you want to change a system it is vital to understand the data in it. The technical feasibility of a new system will often depend on what data is already available. Samples of data (printouts, forms, manual files and records) are a good starting point. So are the descriptions of data in the documentation and source code of any software in the system. But you need to make a more abstract or essential model of two things: (1) the dynamic flow of the data through the system, and (2) the static structure of the data in the system. To master the complexity of a real domain you need diagrams that just show the essentials: how the data moves, where it is stored, and how different data is related. These are best done by drawing DFDs (Data Flow Diagrams) and simple ERDs (Entity Relationship Diagrams). The details are often described in a Data Dictionary and we will cover these later.

Information Technology is all about delivering information to people. Information is data provided to the people who need it, in their preferred format, at the right time. Information needs to be computed reliably, cheaply, and securely. Tracing the flow of data from source to sink is a vital technique to achieve this aim.

Story -- Go with the flow

When I worked in the British Civil Service a colleague described the following meeting. He had been invited to visit a branch and was there for the day to consult with them about a new computer system they wanted to develop. They explained that they wanted a program to print out a 20 page report. Each page had 20 columns and 50 rows... He and they worked on the content and format of the report and half-an-hour before lunch they had the whole thing defined and ready to be programmed. The programming would be done by another team. There was, apparently, nothing to do for the next 30 minutes.

So the analyst asked -- "What do you do with the 20 page report?". And a manager replied -- "I look for the row with the largest value in column 17." So my friend asked: "Would you like the computer to do that for you?" They replied: "Can a computer do that?" He said "Yes -- and printing one line instead of 1,000 will save money!". They liked it.

So my friend asked: "What do you do with the row of data?" They told him "We multiply the 2nd column by the 4th column and subtract the 5th column". And he said: "The computer can do that too, if you want". They liked it.

So my friend -- now hot on the track of the end of the data flow -- asked "What do you do with the result you calculated?" -- they said "If it is greater than 100 we send a memo to the manager listed in column 1." At last, my friend had found an action... "Could we just send the memo for you and let you know it was sent?".

They then went to lunch at a local pub...

Moral -- always ask where the output data goes to. And contrariwise -- ask where input data comes from.
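
The chain of questions in this story ends in a very small computation. Here is a minimal sketch of it, for illustration only: the function names, the sample rows, and the choice to hold each report row as a Python list are assumptions, not details of the real system.

```python
# Hypothetical sketch of the report-to-memo data flow from the story.
# Column numbers in the story are 1-based; Python lists are 0-based.

def col(row, n):
    """Return the value in 1-based column n of a report row."""
    return row[n - 1]

def memo_recipient(rows, threshold=100):
    """Pick the row with the largest value in column 17, compute
    column 2 * column 4 - column 5, and return the manager named in
    column 1 if the result is greater than the threshold."""
    top = max(rows, key=lambda r: col(r, 17))
    result = col(top, 2) * col(top, 4) - col(top, 5)
    return col(top, 1) if result > threshold else None

def make_row(manager, c2, c4, c5, c17):
    """Build a made-up 20-column row with only the used columns set."""
    row = [0] * 20
    row[0], row[1], row[3], row[4], row[16] = manager, c2, c4, c5, c17
    return row

rows = [make_row("Smith", 10, 20, 50, 7),
        make_row("Jones", 30, 5, 20, 9)]
print(memo_recipient(rows))
```

With this made-up data the Jones row has the largest column 17, and 30 * 5 - 20 = 130 is greater than 100, so the memo would go to Jones. Nobody needs to print the other rows.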


ERDs and DFDs

We analyze and design data flows using: external entities (input/source, output/sink), processes, and stores. The data flow diagram or DFD is the central diagram used in information technology. We also analyze the data. We need to know its organization and meaning. UML Entity-Relationship Diagrams are a simple tool that does this.

Once you have a DFD it is useful for pinpointing the changes the enterprise needs to make. You can use DFDs to present the choices to management. They form an excellent start for specifying the hardware and software that will be needed. Meanwhile the static model -- the ERD -- is the starting point for designing a database and then designing objects inside software.

In summary DFDs and ERDs are a useful intermediate step between problems/opportunities and solutions/plans.

DFDs -- Data Flow Diagrams

Here is a simple example of a DFD for a project called AMP -- The Absent Minded Professor System.

A DFD is a circuit diagram of a system. When done right -- following some very specific rules -- it becomes a rigorous picture of an information processing system. Sometimes we inherit DFDs as documentation of an existing legacy software system. This can be very helpful.

They are good for

1. Making rough notes when interviewing people.
2. Mapping out existing systems to find out things to change and things to leave alone.
3. Planning new systems.
4. Planning the installation of a new system.
5. Verifying our designs: how will they work?
6. Presenting our plans to management and stakeholders.
7. Specifying a process or function as a black box -- with hidden details inside.
8. Documenting a system to help others understand it.
9. Getting a list of data stores to start an Entity-Relationship or Conceptual Business Model.

Definitions of DFDs

1. DFDs ::="Data Flow Diagrams".

2. DFD::="A diagram that shows how data moves through processes and stores, from sources to sinks, in a system". A data flow diagram has

    1. External Entities -- Where data comes from or goes to
        Sources -- Where data comes from
        Sinks -- Where data goes to
        Some External Entities are both sources for some data and sinks for other data.
    2. Processes -- Where things happen to data
    3. Stores -- Where data is held ready for future use
    4. Data Flows -- connecting processes to and from entities, processes, and stores.

Here is an example of a rough pencil and paper DFD:

Each DFD summarizes a collection of simple statements. The above diagram implies some of the following facts:

1. The Author makes changes to the document.
2. The Author reads a preview of the document.
3. The document is printed on the printer.
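
The idea that a DFD is just a set of simple statements can be made concrete by listing the arrows as data. This is a rough sketch only: the process names ("Edit document", "Print document") and the flow labels are guesses at what the pencil sketch shows, and the `Flow` tuple is a hypothetical representation, not a standard one.

```python
# Hypothetical model of a small DFD as a list of named data flows.
from collections import namedtuple

Flow = namedtuple("Flow", "source data target")

# Assumed parts: an Author (external entity), two processes,
# a Document store, and a Printer (external entity).
flows = [
    Flow("Author", "changes", "Edit document"),
    Flow("Edit document", "preview", "Author"),
    Flow("Edit document", "updated text", "Document"),
    Flow("Document", "text", "Print document"),
    Flow("Print document", "printed pages", "Printer"),
]

def statements(flows):
    """Turn each arrow into one simple sentence."""
    return [f"{f.data} flows from {f.source} to {f.target}" for f in flows]

for sentence in statements(flows):
    print(sentence)
```

Each printed sentence corresponds to one arrow, which is a handy way to read a diagram back to the people you interviewed.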

Physical and Logical Data Flow Diagrams

A DFD can be used to model the physical structure of a system. The physical model describes and names the hardware and software involved. Each process is one program, but may be a subsystem of programs. Each store is a separate file (think of a folder in a filing cabinet) or a table in a database. In other words, physical DFDs show the architecture of the system not its underlying logic. However this information is better shown using a UML Deployment Diagram.

In a logical DFD there is no mention of how something is done. No technology is mentioned. Several programs may be inside a single process. One program may even implement several processes. Avoid drawing DFDs that show the inner workings of a program -- there are better ways to picture the internal architecture of software. Stores are not described in terms of their media (database, mag tape, disk, RAM,...) but are named for the entities (outside the system) that they store information about (student, teacher, ...).

As a rule you should aim to move to logical DFDs as soon as possible. You can then solve the logical problems in the system without getting confused by the technology. This process produces a top-level design for a new system and is the start for specifying data and programs.

Notations for DFDs

There are three different icons in a DFD: external entity, process, and store.

There are several different notations for DFD icons:

1. Yourdon and/or De Marco
2. Gane & Sarson
3. SSADM (Structured Systems Analysis and Design Method)
4. Unified Modeling Language Component diagrams.


The SSADM DFD notation was developed by the British Civil Service (with LBMS Ltd.) from the Gane and Sarson notation. It is used in England and what used to be the British Commonwealth. As far as I can judge the Gane and Sarson form is the most often used notation in the USA. The Gane and Sarson notation also allows a process box to have three compartments. These are used for: (top) a unique process ID, (middle) a description of the function of the process, and (bottom) the location where the process is carried out or the actor responsible for the process.

I will use Gane and Sarson and encourage you to do so as well in this course. But different enterprises will use different notations.

Below I have some notes [ UML notations for DFDs ] that show how the UML is used and explain why you should, for now, use one of the other notations rather than the UML.

Semantics of DFDs

Many people misunderstand DFDs -- they don't know what they mean. They have the wrong semantics. This section is about the meaning of the parts of a DFD. It is vital that you study the meaning of diagrams as well as just learning the notation (syntax).

Semantics of External Entities in DFDs

External entities are outside the current system. There are sources and sinks. Sources show how data flows into the system from outside. Sinks show where data leaves the system. Some entities are both sources and sinks. We tend to think of entities as being people. But they can be parts of other systems -- hardware and/or software. The key point is that we can not redesign external entities. Our system has to fit them. They are also the main source of disturbances that the system must handle. We can not control the input from an external source unless we have a process to handle anything that can happen and sieve out the data that is needed for our system.

Semantics of Processes in DFDs

Processes are the only active part of a DFD. They are the only places where results can be computed, data processed, and decisions made. Data does not flow without there being a process to move it. A process is best thought of as a continuously running program. Processes handle whole streams of data. They may wait when the data is not available but they do not stop. They may repeat the same computation on each item of data as it arrives. They can make decisions and route input data to different outputs. Processes can also wait to be asked for data and then provide it to one of their outputs. Try not to see them as steps in an algorithm -- use an activity diagram (later) for algorithms.
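
One way to get a feel for a process as a stream handler rather than a step in an algorithm is a Python generator. The sketch below is only an analogy under stated assumptions: the process name `route_grades`, the output names, and the pass mark of 60 are all invented for illustration.

```python
# Hypothetical sketch: a DFD process modeled as a generator that
# handles a stream of items and routes each one to a named output.

def route_grades(grades):
    """Consume a stream of (student, score) items. For each item, as
    it arrives, decide which output it goes to and pass it on."""
    for student, score in grades:
        output = "pass_list" if score >= 60 else "advising"
        yield output, student

# The input is a stream: the process only sees one item at a time.
incoming = iter([("Ann", 85), ("Bob", 40), ("Cy", 60)])
routed = list(route_grades(incoming))
print(routed)
```

The generator does the same computation on every item and makes a routing decision each time. It has no START or STOP of its own: it runs for as long as its input keeps supplying data.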

Some processes are subsystems. This helps keep the diagrams of complex systems simple. They are shown as a whole process in some DFDs. Each is also defined by a DFD. This is called the refinement of the process. Such processes can contain hidden data stores and sub-processes. There is a potential tree of refinements.

Semantics of Data Stores in DFDs

Stores are places where data is placed, and where it waits to be used. Some people use the CRUD mnemonic to describe the interaction between a process and a data store:
CRUD::acronym=Create + Read + Update + Delete.

Ultimately the data flows between processes and data stores are (nowadays) programmed using the Structured Query Language (SQL).

SELECT StudentName FROM Student WHERE Student.id = '123-45-6789';

However it is a mistake to go into this level of detail in a DFD. A single data flow attached to a data store can be implemented by any number of SQL-type statements.

On the other hand you should aim to have each data store labeled with the name of a single type of real world object. The data store holds records about all entities of some type or other. The name of the data store should reflect the type of entity. Ultimately they become tables in a database or file.

Traditionally, creating data in a data store -- adding new records -- is shown by an arrow that flows from a process to a data store. Reading data is indicated by an arrow from the store to the process that needs it. Updates and deletions are shown as two way arrows since data has to be read and then rewritten.
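
To connect the arrow conventions with CRUD, here is a small sketch using Python's standard `sqlite3` module and an in-memory database. The table layout and the student name are made up for illustration. Remember that none of this detail belongs on the DFD itself.

```python
import sqlite3

# An in-memory database stands in for the "Student" data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (id TEXT PRIMARY KEY, StudentName TEXT)")

# Create: data flows from the process into the store.
conn.execute("INSERT INTO Student VALUES (?, ?)", ("123-45-6789", "Pat Doe"))

# Read: data flows from the store to the process.
name = conn.execute(
    "SELECT StudentName FROM Student WHERE id = ?", ("123-45-6789",)
).fetchone()[0]

# Update: the record is read and then rewritten (a two way arrow).
conn.execute("UPDATE Student SET StudentName = ? WHERE id = ?",
             (name.upper(), "123-45-6789"))
updated = conn.execute("SELECT StudentName FROM Student").fetchone()[0]

# Delete: also drawn as a two way arrow.
conn.execute("DELETE FROM Student WHERE id = ?", ("123-45-6789",))
remaining = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]

print(name, updated, remaining)
```

Four different SQL statements touch the store here, yet on a DFD they would collapse into one double-headed arrow between the process and the Student store.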

Notice that a data store is needed whenever data is reordered or reorganized. On the other hand if the store is a queue or buffer, so that the first item of data to arrive is the first to be output then we don't show a data store: arrows are understood to be buffered by a queue.

Another simplification: you can put the same data store in several places. Traditionally you mark stores like this with an extra stripe at the left hand end. It also helps if you give each store a unique Id.

Semantics of Arrows in DFDs

The meaning of a data flow (arrow in a DFD) is subtler than you might guess. It depends on the symbols at each end: process, entity, or store.

Notice that only a process can move data. So each data flow must either come from or go to a process. We do not permit data flows to connect entities or stores unless a process is involved.
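
This rule is simple enough to check mechanically. The following is a minimal sketch, assuming a DFD is written down as a list of (source, target) pairs plus a table giving the kind of each part. The function name and the sample parts are hypothetical.

```python
# Hypothetical checker for the rule "every data flow touches a process".

def magic_flows(flows, kinds):
    """Return the flows that have no process at either end.
    flows: list of (source, target) names.
    kinds: dict mapping each name to 'process', 'entity', or 'store'."""
    return [(s, t) for s, t in flows
            if kinds[s] != "process" and kinds[t] != "process"]

kinds = {"Student": "entity", "Register": "process",
         "Student records": "store", "Teacher": "entity"}

flows = [("Student", "Register"),
         ("Register", "Student records"),
         ("Teacher", "Student records"),   # illegal: entity to store
         ("Teacher", "Student")]           # illegal: entity to entity

print(magic_flows(flows, kinds))
```

The two flagged arrows are the ones that need a process drawn in before the diagram makes sense.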

Connections between processes and entities define the interfaces between the system and its environment. It is rarely obvious what data is communicated. Thus these data flows must be described -- at least given a name.

Similarly, it is not clear when you connect one process to another process with an unlabeled arrow what is going on. The arrow needs to be named with the data being transmitted. The name will need further definition (later) in a Data Dictionary. Occasionally you will meet a double-headed arrow -- here someone has to define the protocol that describes the conversation between the two connected processes.

Notice that in real systems (unlike computer programs) data flows between processes are buffered. One process writes the data and the data waits in a queue until the other process reads it. The writer doesn't have to wait for the data to be taken away. For example when you send me Email it is automatically stored before I read it. Similarly "Snail Mail" is put in my box. Memos, rosters, etc. are all buffered for me. So when modeling a real system you don't have to say that data in a data flow is in a queue. This buffering is implicit in the Data Flow model.
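
The implicit buffering can be pictured as a first-in, first-out queue sitting on every arrow. The class below is a toy model only: the name `DataFlow` and its two methods are invented for this sketch, and a real reader would wait for data rather than get `None`.

```python
from collections import deque

class DataFlow:
    """A data flow modeled as an unbounded first-in, first-out buffer."""

    def __init__(self):
        self._queue = deque()

    def write(self, item):
        # The writer returns at once; it never waits for the reader.
        self._queue.append(item)

    def read(self):
        # Returns None when nothing is waiting (a real reader would wait).
        return self._queue.popleft() if self._queue else None

mailbox = DataFlow()
for item in ["roster", "memo", "email"]:
    mailbox.write(item)          # three writes before anyone reads

print(mailbox.read(), mailbox.read())
```

The items come out in the order they went in, and the sender was never held up. That is why the queue does not need its own symbol on the diagram.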

A data flow out of a store can only go to a process. It indicates that the process reads the data in the store but does not change it. External entities and stores are not allowed to read data directly -- they must get the data indirectly via a process. However, you don't have to label and document these data flows if the process can read the whole store. You only have to document the data flow from a data store if the process accesses only a part of the store.

A data flow into a store must again come from a process. It indicates any combination of the three basic operations: Create, Update, or Delete. Again if the arrow is unlabeled then it is assumed that the process can (or will) change any item in the store.

A double-headed arrow between a data store and a process indicates that the process may create, read, delete, and update the data in the store. Some omit the arrow heads in this case.


Drawing DFDs

Keep DFDs simple by keeping them abstract, logical, or essential -- don't document the media and format of the information -- just give it a meaningful name. Note: you can keep a list of the current or planned media/formats in a "data dictionary". Similarly a DFD should not show the current type of a part: people, procedures, hardware, and software all tend to be implementations of processes. The type of a component should be noted in a data dictionary (see [ a5.html ] ). Neither should a DFD show steps in a user scenario like "login" and "logout". These can be analyzed later in the process using more suitable tools.

Do DFDs quickly -- pencil and paper, chalk-board. Only tidy them up when someone else needs to see them. Use a tool only to impress people. However, even when sketching roughly, follow the rules and avoid the errors listed on this page.

Some people put unique short identifiers on each part of a DFD. Avoid this if you can! But in those cases where the boxes are numbered, here are the rules: processes are numbered 1, 1.1, 1.2, ... and data stores have an id that starts with "D" plus a number. External entities can be given single lower case letters to be their unique id. These ids are good for linking the same part in different diagrams. For example, the parts numbered 1.1, 1.2, 1.3, etc. are all parts of the process numbered 1. Similarly, 1.2.1, 1.2.3, etc. are subparts of process 1.2.
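
The dotted numbers encode the refinement tree, so the parent of any process can be read off its id. Here is a small sketch of that reading. The helper names `parent_id` and `is_part_of` are made up for illustration.

```python
def parent_id(process_id):
    """Return the id of the process that this one refines, or None
    for a top-level process such as "1"."""
    head, dot, _ = process_id.rpartition(".")
    return head if dot else None

def is_part_of(part, whole):
    """True if `part` is a subpart of `whole` at any depth."""
    return part.startswith(whole + ".")

print(parent_id("1.2.3"), parent_id("1.2"), parent_id("1"))
```

Note that `is_part_of` compares against the whole id plus a dot, so process 11.2 is not mistaken for a part of process 1.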

Never use more than one piece of paper for a DFD. The trick is to have layers of detail. We do this by expanding, exploding, or refining a process into a lower level diagram. This is done by taking a process and drawing a DFD that would replace it in the original DFD. There are three levels of detail commonly needed: context, level-0, and level-1. Here is a picture of how refinement works:

The table shows the three types of DFD and is followed by definitions and examples.

Level            Content
Process Context  Shows one process with its inputs and outputs only.
System Context   One process + surrounding external entities
Level-0          Make the central process BIG and draw stores, processes, and flows inside
Level-1          Take a process on the level-0 and repeat the expansion in another DFD
Level-n+1        Take a level n process and refine it.

Note: 3 or 4 levels is usually enough. Don't get too detailed. Other techniques [ r1.html ] are better.

Examples of DFD Levels -- Context, Level 0, Level 1


A Note on level terminology

I will be following well known textbooks on the naming of the levels. Wikipedia seems to use a different form.

Definitions of DFDs

3. Context_DFD::DFD=Shows a system as a single process surrounded by external entities. This should show a single process -- your system surrounded by the external entities that send it data and get data from it. Each data flow should be named. No internal details allowed -- they come later. No data stores, no sub-processes: it just establishes the Boundary between the system and its environment.


4. Level_0_DFD::DFD=Shows the main functions in a system as processes... At this level you show up to about a dozen main functions that the system provides, plus the data stores and external entities that interact with the processes. A Level_0_DFD always expands a Context_DFD.

5. Level_1_DFD::DFD=Takes a single process in a Level_0_DFD and shows the details inside it.

6. Fish_eye_DFD::DFD=Shows a DFD inside a box representing a process in another DFD. We have a central focus where we show the details but round the edge we have higher level symbols. An excellent way to refine a Context DFD to Level 0, a Level 0 process to Level 1, and so on. It is called a fish-eye diagram because when a fish looks up out of the water it sees the whole 180 degree view compressed into a small circle. In the center of the view things look big. Further out they look small.

Refining a DFD

The process of finding out what is inside a process has many names: leveling, refinement, filling in the details, partitioning, exploding, decomposing, ... It is an important strategy for analyzing a problem. Start with the big picture -- the context -- then break it into smaller and smaller parts. Ultimately, as you decompose or refine processes, you will find yourself needing to express logical rules, algorithms, and types of data. Do not use a DFD to express complex logic, algorithms, or data structures. Instead, record these details by using techniques introduced later in this course:

Part of DFD        Technique
Processes          Activity diagrams, Use Cases, and Scenarios. Prototypes.
External Entities  Persona
Data flows         Data dictionary entries and coding techniques.
Stores             Entity Relationship Diagrams, Tables, and Normalization

Bottom-Up DFDs are chaotic

The above is a top-down procedure. You can also draw rough DFDs of parts of the organization and link them together to get an "end-to-end" model. Here is an example from the first time this course was taught.


These tend to be a little chaotic and unstructured. You may be forced to do this when interviewing people and starting design. But as soon as possible shift to top-down/refinement.

Principle -- DFDs are systems not programs

My Law: DFDs are good for recording how a system works. They are a way of choosing what parts of a system to change and which to protect. They can be used to define the inputs and outputs to a program. You can use them to plan a collection of new and old software (system design). BUT don't use them to design the internals of a program. You will make errors. There are more modern techniques for designing programs.

Rules of DFDs -- DFD Errors

Notice and learn the rules below. The key thought is that data never moves unless a process moves it.

DFD_Errors ::=following,

1. Process names must start with a verb and describe an action. Try the "Hey Mom Test." A process name should make sense when prefixed by "Hey, Mom, I'm going to .....". Some describe producing an output for each input (Calculate tax) but most do more -- Prepare monthly summary from weekly data. Stores and external entities should be named with specific noun phrases. They must not indicate any activity. They are passive. All data stores must be named after the specific data they store. Information about people would be in a data store called "Person" for example.

2. Data flows do not transfer control. An arrow is not a function call or a go to! The processes run in parallel. They can stop and wait for incoming data. It is OK for a flow to send a message, trigger, or signal without other data. However control is not transferred. The sender does not have to wait for a reply.

3. Name all data flows between processes. Unlabeled arrows between processes are often control flows and so wrong. However: arrows leaving and entering a well-named store only have to be named when they provide access to only parts of the data in the store.

4. No Flowcharts. Do not use normal flow chart symbols like decision diamonds, START, STOP etc. in a DFD. All parts of a DFD exist at the same time and operate in parallel. A process can read and store data long before and after producing an output. Processes consume streams of data and produce streams of data.

5. No magic data flows. Data does not move without a process to move it. So each arrow must have a process on at least one end. Never show arrows connecting an external entity to another entity, or to a data store. Never have an arrow that connects a data store to another store. Examples: Waiter and cook. Coordinator and secretary. Teacher and student. Student to student records. Customer to bank account.

6. No spontaneous generation. All processes have input.

7. No black holes. All processes have outputs.

8. No miracles. The input data must make it possible to compute all of the outputs.

9. Maintain balance. Each upper level matches its lower level expansions.

10. No forks or joins. When flows meet or split you must have a process to control the joining and/or splitting.

11. Not specific. A common mistake by beginners is overgeneralization. This cartoon [ http://xkcd.com/974/ ] expresses the error perfectly. In a DFD a common error is to show a data store labelled "Data Base". This merely sweeps the problem under the rug. It is an error. You may not use the word "Database" -- it is too general and conveys no information about the data in the store.
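Several of the rules above are mechanical enough to check with a few lines of code. The following is only a sketch, not part of the course tools: it assumes a DFD is recorded as a dictionary of node kinds plus a list of flows, and it checks rules 3, 5, 6, 7, and 11. The names Flow and check_dfd, and the sample diagram, are hypothetical.

```python
from dataclasses import dataclass

# Node kinds in a DFD (hypothetical names for this sketch).
PROCESS, STORE, EXTERNAL = "process", "store", "external"


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    label: str = ""


def check_dfd(nodes, flows):
    """Return (rule_number, message) pairs for rules 3, 5, 6, 7 and 11."""
    errors = []
    for f in flows:
        kinds = (nodes[f.source], nodes[f.target])
        # Rule 5: every arrow needs a process on at least one end.
        if PROCESS not in kinds:
            errors.append((5, f"magic flow {f.source} -> {f.target}"))
        # Rule 3: flows between two processes must be named.
        if kinds == (PROCESS, PROCESS) and not f.label:
            errors.append((3, f"unnamed flow {f.source} -> {f.target}"))
    for name, kind in nodes.items():
        if kind == PROCESS:
            # Rule 6: no spontaneous generation.
            if not any(f.target == name for f in flows):
                errors.append((6, f"{name} has no input"))
            # Rule 7: no black holes.
            if not any(f.source == name for f in flows):
                errors.append((7, f"{name} has no output"))
        # Rule 11: a store called "Data Base" says nothing about its data.
        if kind == STORE and name.replace(" ", "").lower() == "database":
            errors.append((11, f"store '{name}' is not specific"))
    return errors


# A deliberately faulty, made-up diagram.
nodes = {
    "Student": EXTERNAL,
    "Teacher": EXTERNAL,
    "Record grade": PROCESS,
    "Make comment": PROCESS,
    "Data Base": STORE,
}
flows = [
    Flow("Teacher", "Record grade", "grade"),
    Flow("Record grade", "Data Base", "grade"),
    Flow("Teacher", "Student", "advice"),   # no process on either end
    Flow("Record grade", "Make comment"),   # unlabeled process-to-process
]

for rule, message in check_dfd(nodes, flows):
    print(f"X{rule}: {message}")
```

The other rules (balance between levels, no miracles, good names) need human judgment or more information than this representation holds, so they are left out of the sketch.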

Example of a DFD Error leading to a better design -- AMP Level 0

First Iteration Context

Resulting Level 0 with error


Fixing the error improves the design

And the Context must change to fit the level 0


DFD Advice

1. Number your nodes only if you have to. Example: the boss says so.

2. Don't clutter the DFD with the format or media: phone calls, forms, EMail, disks, tapes, print outs, HTML, XML, ... (1) The DFD shows what exists, not what form it takes. (2) Our job always involves changing the format and/or media. (3) Describe media and formats in a separate document called a data dictionary. (4) Note content as attributes in a separate ERD (below).

3. Keep DFDs simple by omitting backup, support, and maintenance processes as long as you can. Focus on the operation of the system first.

Data Flow Analysis of System Development

In this class we will look at applying systems techniques to the systems work itself. This leads to a model of system development as three parallel processes. One is concerned with understanding the current system plus the latest plans and changes -- call this "Analysis". The second process is concerned with taking ideas from the Analysis process and designing plans that need implementing. The last process carries out the plan and changes the system.

Notice we can schedule the above DFD in many ways. We can run the analysis process until it produces an idea, then pass it to the design process, which can modify the plan that triggers implementation activity. Whether we get a traditional or an agile life cycle depends on the size of the change to the model and the plan.

DFD Smells and Patterns

Much of the expertise that helps us understand and plan systems is encapsulated in the following hints. They are classified as smells that are to be avoided and patterns that work well enough for repeated use.

Pattern -- Stores contain a model of reality

In nearly all systems the purpose of storing data is to capture a picture of some real object. So, name stores after the entities that they model. For example a file containing student records should be shown as a store named "Student".

DFD Smell -- useless storage

Be suspicious of data stores that have inputs without outputs or have outputs without inputs. Storing something that is never needed is wasteful. Having data that can not be altered or created (no input) is a problem waiting to happen. Example: When I moved office I found I had two filing cabinet drawers full of unread paperwork. I threw it out and plan to not keep it again.

Exception: there may be some law that requires you to keep some data for a number of years. Find out if this is true.
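This smell can be spotted mechanically once the flows are written down. The sketch below assumes each flow is a (source, target) pair of node names; the function name useless_stores and the sample stores are hypothetical.

```python
def useless_stores(stores, flows):
    """Map each suspicious store to the reason it smells."""
    smells = {}
    for store in stores:
        written = any(target == store for _, target in flows)
        read = any(source == store for source, _ in flows)
        if written and not read:
            smells[store] = "written but never read"
        elif read and not written:
            smells[store] = "read but never updated"
    return smells


stores = ["Student", "Old Paperwork", "Tax Table"]
flows = [
    ("Enroll student", "Student"),
    ("Student", "Print roster"),
    ("File paperwork", "Old Paperwork"),   # nothing ever reads this store
    ("Tax Table", "Calculate tax"),        # nothing ever updates this store
]

for store, reason in useless_stores(stores, flows).items():
    print(f"{store}: {reason}")
```

Treat the result as a list of questions to ask, not a verdict: a store that is written but never read may still be required by law.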

DFD Smell -- wasted motion

Take note of processes that merely move things around in a system, especially when it is data transmitted as paperwork!

Pattern -- Remove paperwork

One of the traditional improvements is to replace paperwork data flows and storage by electronic forms.

DFD Smell -- old technology

As you abstract away from the current technology to an abstract set of data flows, processes, and stores, take note of processes and storage that use old technology. These are candidates for replacement in the new system. But don't clutter the DFD! Perhaps, when you present your problem to management you could color the old technology red? Don't forget that sometimes an old technology is more reliable than brand new technology. Some old ways of doing things need to be preserved. As an example look at [ Word-Processors-One-Writers-Retreat ] (Slashdot Features Story | Word Processors: One Writer's Retreat) which argues that simple editors are more efficient for writing than high powered "word processors".

DFD Smell -- Overloaded Process

When data (and stuff) flows through an organization it can pile up in buffer zones. A person's desk can slowly disappear under the incoming paperwork, for example. Look for processes that handle their input slower than it arrives. Even if a process can only just keep up with the average arrival rate, queuing theory shows that the length of its queue grows without limit.
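To put numbers on the queuing claim, here is a small sketch. It assumes the simplest textbook model, an M/M/1 queue (random arrivals at an average rate, one server with random service times), which these notes do not name. In that model the average number of items waiting or being served is u / (1 - u), where u is the arrival rate divided by the service rate. The function name mean_in_system is hypothetical.

```python
def mean_in_system(arrival_rate, service_rate):
    """Mean number of items waiting or in service for an M/M/1 queue."""
    utilization = arrival_rate / service_rate
    if utilization >= 1:
        # No steady state: the backlog keeps growing.
        return float("inf")
    return utilization / (1 - utilization)


# A process that can handle 10 items per hour, under increasing load.
for arrival_rate in (5, 8, 9, 9.9, 10):
    print(arrival_rate, mean_in_system(arrival_rate, service_rate=10))
```

At half load about one item is in the system on average. The backlog climbs steeply as the arrival rate approaches the service rate, and it has no finite average once the two are equal.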

The ideal solution is to have multiple copies of the process running in parallel. Input is distributed to the least loaded or first available server that can run the process. The next solution is to find ways of speeding up the process: better technology, simpler logic, ... Simple examples of this strategy are upgrading the CPU or adding RAM. But a subtler variation is reorganizing the data storage to give faster access to the data. This trick includes defragmenting disk drives. A third solution is to provide multiple parallel clones of the process running on multiple processors.

As an example, high traffic web sites may have a dozen web servers and a special load balancing "switching" server front end.

Note -- multiple computers all running the same process are still a single process in the DFD!

DFD Smell -- Inefficient, Intractable, and/or Non-computable Processes

Look out for inefficient processes. These are often concerned with reorganizing data in some way or other. Many times a clever design can make them run a lot more efficiently. Sometimes you can remove the need for the process entirely by rethinking the design. Be aware that a process can often be implemented with many different algorithms and each algorithm will perform differently. You may have to specify an algorithm or give feasible limits on the efficiency of the implementation of a process.

Computer Scientists have discovered a large family of problems that can not be solved by a computer. These can not be programmed. An example is checking to see if a program will stop or not. We have also discovered problems that apparently demand very inefficient processes to solve them. A classic example is the "Traveling Salesman problem". It is worth studying computer science theory to be able to spot these.

There are also processes that are better done by a human than a machine. Ethical questions should not be handled by machines! Questions needing discretion should involve humans. Sometimes you need to design systems that support communication and cooperation so that complex (political) problems can be resolved by humans.

DFD Smell -- Under-worked Human

You will often find systems where an event occurs and triggers a message that is sent to a person, who in return does nothing with the message but pass it on to another part of the system. This smell is worst when the human has a fixed simple procedure that they use to respond to the disturbance. I recently heard of an example of a computer system that turned a light on and expected a human to hit a button to avoid a disaster. The person did nothing and the disaster occurred. This is bad design. Another classic example is any web site that expects you to type in data shown to you or even input by you on a previous page!

A version of this is sending data to a human to re-input later. This introduces errors... there must be a better way to handle the problem. Here is the smelly system and a possible improvement.

Pattern -- Automate Simple feedback

If the choice of action in the above system can be computed from the message then a better system automatically carries out the action and reports to the person. The sensing + acting system should only ask for help on the difficult decisions. The best systems allow the person to input and update the desirable actions.

Examples: EMail -- automatically deleting messages that we don't need to see. Inventory -- automatically reorder when stocks get below a certain level. Record people's browsing, let them replay and/or edit the recordings.
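The inventory example can be sketched in a few lines. This is an illustration under invented assumptions, not a real inventory system: the rule fields, the respond function, and the sample items are all hypothetical. The routine case is handled automatically and reported, the unusual case is passed to a person, and the rules are plain data that the person can edit.

```python
from dataclasses import dataclass


@dataclass
class ReorderRule:
    reorder_level: int      # reorder when stock falls below this
    reorder_quantity: int   # how much to order
    max_auto_quantity: int  # larger orders need a human decision


def respond(item, stock, rule):
    """Return (action, report) for one stock reading."""
    if stock >= rule.reorder_level:
        return ("none", f"{item}: stock {stock} is fine")
    if rule.reorder_quantity > rule.max_auto_quantity:
        # The difficult decision: ask for help instead of acting.
        return ("ask_human", f"{item}: order of {rule.reorder_quantity} needs approval")
    # The simple decision: act, then report to the person.
    return ("reorder", f"{item}: ordered {rule.reorder_quantity} automatically")


# The person in charge owns and updates these rules.
rules = {
    "paper": ReorderRule(reorder_level=10, reorder_quantity=50, max_auto_quantity=100),
    "toner": ReorderRule(reorder_level=2, reorder_quantity=200, max_auto_quantity=100),
}

for item, stock in [("paper", 30), ("paper", 4), ("toner", 1)]:
    action, report = respond(item, stock, rules[item])
    print(action, "|", report)
```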

Pattern -- Keep people in the loop

It is common for people to reject designs and sabotage resulting systems if they take control away from the people who used to be in charge. In fact the better an automated system is, the worse people will feel about being replaced by it. They will fight it.

On the other hand there is something wrong about forcing people to work as computers. You need a balance.

It is highly rational to insist on being able to undo things that can be undone. For example, when the CSUSB system automated the handling of Incomplete Contracts (2010) it became incredibly easy to create a contract -- no forms to fill in. No signatures to gather. Unfortunately it became impossible to remove an incomplete contract that was on file -- in the old days you ripped it up and put it in the trash can. At this time (2010) you can not do that. So small mistakes can not be corrected. There is no "Undo" feature.

Summary -- let people do the thinking and machines do the boring stuff.


UML notations for DFDs.


The UML is not designed to do DFDs. The designers (the OMG -- Object Management Group) are more concerned with the details and internals of software than with interactions between parts of a larger system. But in the specification of UML2.0 there is a way to document flows between components:

At this time (Fall 2009) it is still better to use a traditional notation like the Gane and Sarson in these notes.


UML Data Models

We need a simple way to describe, explore, and design data. This turns out to be a powerful technique in analyzing and designing systems.

Data is always organized in clumps called records. A record has a collection of items of mostly different data types in it. For example the CMS probably has a record that contains all the information about a student in it. Each type of record tends to reflect a real world Entity. Each type of record is given a meaningful name and this is put in the top compartment of a UML class. These entity names should be in your DFD as well.

Story -- Sharp Wizard Contacts Data

I've been using small portable computers as Personal Digital Assistants (PDAs) for a long time. And all have had problems of one kind or another. The Sharp Wizard series, for example, had a very annoying way of handling phone numbers. You couldn't enter the name and number of a person without also inputting the title, rank, department, and organization. Not a bad model for a business person, but very irritating for your mother or spouse. Of course: you had to include the address of the organization as well... People didn't have an address of their own. You had to start top-down inputting the company, the department, and then individuals.

The Palm Pilots and iPods I've been using for 6 or 7 years have a simpler model with each contact having optional data about companies and titles. But don't get me talking about the different models of events and tasks on iPods and Palm Pilots.

Modeling Entities and Relationships in the UML


Use UML class diagrams [ uml1.html ] (notes introducing UML for beginners) with no operations to describe data!

Here is an example based on a project set in a restaurant.

The boxes are logical groups of data each referring to a real entity. The lines connecting the boxes are significant relationships, for example: a Table has a single Waiter assigned to it, but a Waiter can be handling several Tables. Notice that this model does not show any attributes (the properties of the entities). It does not show the waiter's name for example. This kind of reduced model -- based on ideas about the real world -- is sometimes called a Domain Model. Domain models are very useful for planning data bases (later) and for designing object oriented code (CSCI375).

Each item of data is given a name and a type:

name : type

Examples

address : string

initial : char

age : int

Notice I used C++ data types... because my audience (you) has taken CS202 and can be expected to understand them. In general, you should use the words of your audience. With multiple audiences put different meanings in a data dictionary as aliases.

When you first draw these diagrams you can just list the attribute names and jot down more information in a prototype data dictionary. For example here is a UML diagram of the data I found in a class roster.

If an item is repeated use square brackets:

salary_each_month [12] : money

children : Person[*]

spouse : Person [0..1]
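To see what these declarations mean in code, here is one possible translation into a Python data class. It is only a sketch: the class layout and defaults are assumptions, and money is represented with Decimal for illustration. The comments repeat the UML attribute each field stands for.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class Person:
    name: str
    address: str = ""  # address : string
    age: int = 0       # age : int
    # salary_each_month [12] : money  (a fixed number of repeats)
    salary_each_month: List[Decimal] = field(
        default_factory=lambda: [Decimal("0")] * 12
    )
    # children : Person[*]  (zero or more)
    children: List["Person"] = field(default_factory=list)
    # spouse : Person [0..1]  (optional)
    spouse: Optional["Person"] = None


ann = Person("Ann", age=40)
bob = Person("Bob", age=41, spouse=ann)
ann.spouse = bob
ann.children.append(Person("Cy", age=9))
print(len(ann.salary_each_month), len(ann.children), ann.spouse.name)
```

Notice that children and spouse refer to other Person records, which is why such attributes are usually drawn as associations between boxes instead.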

When you meet attributes that are actually other entities/records you should connect the boxes with an association.

If you know of a significant relationship between records/entities then show it as a line (an association) between the boxes. In fact, in some analysis and design methods, you check every pair (and grouping) of entities looking for important relationships between them.

Mark these relations with multiplicities:

Optional: 0..1

Many: *

One: 1
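As an illustration of multiplicities, here is a sketch of the restaurant association described earlier: each Table has exactly one Waiter (One: 1) and a Waiter can have several Tables (Many: *). The class layout is an assumption made for this example, not taken from the restaurant project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Waiter:
    name: str
    # Many: * -- kept out of repr and comparison to avoid circular output.
    tables: List["Table"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Table:
    number: int
    waiter: Waiter  # One: 1 -- a Table can not be created without a Waiter

    def __post_init__(self):
        # Keep both ends of the association consistent.
        self.waiter.tables.append(self)


sam = Waiter("Sam")
t1 = Table(1, sam)
t2 = Table(2, sam)
print([t.number for t in sam.tables])
```

An Optional end (0..1) would be a field that is allowed to be None.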


Here is an ERD showing the relationships between Questions, Answers, and Comments in the DFD of my Tutoring System (above):

Keep Entity-Relation-Diagrams Simple

Note: the official database notation developed by Chen is too cumbersome for everyday data analysis and design. Use it only when you have to!

My old student edition of Rational Rose did UML ERDs well. Dia and Visio can also handle them. But the quickest way (after a field trip, say) is on a board or a piece of paper. Keep the edges of the boxes incomplete until done. Notice that you can just note the relationships without any need for attributes. Here is an example that I drew on my Palm Pilot one day.

Sometimes I even omit the boxes:

Smell -- Unreal Data


In an ideal system the data perfectly reflects the reality -- it forms a "mirror-world". In real systems, the data is often approximate, omits details, and lags behind the real world. When the data also has the wrong structure it provides a distorted mirror of the world. The system will not work as well as it could. But the people in the system may not be aware of this: the file becomes the reality, the computer is the only truth they know.

Look for lags, errors, missing data, and misfitting structures whenever you are analyzing a system.

Normalizing a UML Data Base

The following procedure improves the design of the data. It exposes logical structures that are implicit in your data.

1. Draw an ERD of the entities and relationships with attributes inside the boxes.

2. Extract all attributes marked with [*] as relations.

3. Turn all many-to-many and n-ary relations into entities.

4. Look at 1-to-1 associations: is either (or both) '1's really a '0..1'? If so add the "0.." and treat '0..1' as a many '*'. If not, then coalesce the two boxes into one.

5. All associations end up being many-to-1. Redraw with the 1's above the 'many's.
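Step 3 is the one that most often surprises beginners, so here is a small sketch of it. It assumes a many-to-many relation is given as a list of pairs and turns it into a new entity with two many-to-1 associations. The function name and the Student, Course, and Enrollment entities are invented for this illustration.

```python
def split_many_to_many(left, right, pairs, link_name):
    """Replace a many-to-many association by a link entity.

    Returns (link_records, associations). Each association is a
    (many_side, one_side) pair, i.e. it is many-to-1.
    """
    # One record of the new entity per linked pair.
    link_records = [{left: a, right: b} for a, b in pairs]
    # The link entity is on the 'many' end of both new associations.
    associations = [(link_name, left), (link_name, right)]
    return link_records, associations


# Many Students take many Courses: a many-to-many relation.
enrollments, associations = split_many_to_many(
    "Student",
    "Course",
    pairs=[("Ann", "CS372"), ("Ann", "CS375"), ("Bob", "CS372")],
    link_name="Enrollment",
)

for many_side, one_side in associations:
    print(f"{many_side} (*) ---- (1) {one_side}")
print(enrollments)
```

The new entity is also a natural home for attributes that belong to neither side alone, such as the grade a student earned in a course.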


Review Questions

1. Describe and distinguish a DFD from an ERD.

2. Distinguish physical from logical DFDs.

3. Name the three types of icon in a DFD. What do they represent?

4. If there is an arrow from one icon to another in a DFD, what does it mean?

5. What are the Gane and Sarson icons?

6. How can you show data flows in the UML2.0?

7. What is shown on a context DFD of a system? What is not shown?

8. What is shown in a Level-0 DFD? What is shown in a level-1 DFD?

9. Give an example of a simple context DFD and its matching Level-0 DFD.

10. How do you document the contents of data stores?

11. How do you document the detailed processing of data?

12. List the rules that a valid DFD must follow. Then check the list [DFD_Errors] above.

13. Below is a bad first attempt at the level 0 DFD of my automatic tutoring system. It has many errors. Mark the errors with a big "X" and the number in the list [DFD_Errors] above. For example the "Make Comment" process should be marked X7.

14. If you discover a person who does no more than reinput some previous output -- how can you improve the system?

15. Name 6 DFD smells.

16. Here is a recent example scenario that I experienced:

    1. My doctor told the computer system that I needed a certain screening test.
    2. 1 Month later...
    3. My doctor's assistant sent me snail mail asking why I hadn't had the test done.
    4. I phoned the testing center and they told me that I was not eligible.
    5. When I explained why I needed the test they said the doctor would have to re-input the request in a different form that specified the reason for the test. (Note to save money this data must come from the doctor not the patient).
    6. I phoned my doctor and left a message explaining the situation.
    7. The doctor resubmitted the request.

    What smells here? Draw a partial DFD of the situation. Redesign the system to work better.

17. What is the ultimate reason for storing data in a system?

18. Is the ERD below normalized? If not, show how to normalize it.


Online Exercises on DFDs

1. Here [ images?hl=en&q=DFD&btnG=Search+Images&gbv=2 ] is a Google search that produces thousands of DFDs! Some of them are very good and some not so good. Look at them, figure out which notation they use. What do you like and/or dislike about some of them?

2. List some strange ways that information/data is transmitted/stored in an enterprise that you know about.

3. Take this diagram [ manufacturing.gif ] and redraw it as a DFD -- note: you can treat some money and material flows as data flows.


Typical Exam Questions and Exercises on DFDs

1. Draw a context DFD of:

    1. my web site.
    2. CSUSB's current registration and student records system.
    3. CSUSB CSCI web site.

2. Draw a simple but correct DFD <TBA>

3. Given a Context DFD draw a plausible and correct level 0 fish-eye DFD.

4. Given the fish-eye DFD of a system draw its context DFD.

5. Given a process in a DFD draw a correct and plausible expansion/fish-eye DFD.

6. Correct a given DFD model.

7. Answer questions about a given DFD.


Exercise -- Context Diagram of a Possible Project

Either to be done in class and/or assigned as out-of-class project work.

UML::="Unified Modeling Language", [ samples/uml1.html ]


Abbreviations

1. TBA::="To Be Announced".

2. TBD::="To Be Done".
