Secrets of Great Architects – Don Awalt and Rick McUmber, RDA Corporation – pp 04–13
The Case for Software Factories – Jack Greenfield, Microsoft Corporation – pp 14–19
Identity and Access Management – Frederick Chong, Microsoft Corporation – pp 20–31
Business Patterns for Software Engineering Use – Part 2 – Philip Teale, Microsoft Corporation and Robert Jarvis, SA Ltd – pp 32–43
A Strategic Approach to Data Transfer Methods – E G Nadhan and Jay-Louise Weldon, EDS – pp 44–54
Messaging Patterns in Service Oriented Architecture – Part 2 – Soumen Chatterjee – pp 55–60
Dear Architect
While it is often difficult to reach agreement, one thing I think all architects can agree on is that our jobs are getting harder, not easier. We're facing an ever-increasing level of complexity with an ever-shrinking level of resources. Whether it's new challenges such as increased regulatory compliance or old-fashioned challenges like feature creep and slashed budgets, the architect's job continues to become more difficult as well as more important every day.
Abstraction is the architect's key tool for dealing with complexity. We've seen the evolution of architectural abstraction techniques such as models and patterns; however, we have yet to realize much in the way of measurable value from them. Models and patterns are useful for communicating effectively with our peers, but so far they haven't helped drastically reduce the amount of resources it takes to build and operate a system. In order to continue to deal with increasing complexity, we need to get much more pragmatic about using abstraction to solve problems.
In this, the third issue of JOURNAL, we focus on raising the abstraction level. This focus ranges from general techniques for abstraction to the introduction of an industrialized manufacturing approach for software development. Along the way, we cover the use of patterns to describe business functions as well as identity and data management.
Regardless of our disagreements, the looming crisis of complexity threatens to severely impede the progress of software construction and the businesses that rely on that software. Going forward, JOURNAL will continue to deliver information from all aspects of the software architecture industry and to provide the guidance architects need.
Harry Pierson
Architect, D&PE Architecture Strategy,
Microsoft Corporation
JOURNAL3 MICROSOFT ARCHITECTS JOURNAL JULY 2004 A NEW PUBLICATION FOR SOFTWARE ARCHITECTS
JOURNAL3 | Editorial 2
Dear Architect
Welcome to the summer issue of JOURNAL. These last few months have been really exciting for 'architecture' as a topic at Microsoft. The Microsoft® architecture center has become established as a leading portal for architectural content and a springboard for thousands of our customers and partners to dive into excellent guidance on architecting and developing solutions on the Windows® platform.

This issue has an abundance of architectural gems written by respected architects from Microsoft and valued partners, and I'm confident that it raises the quality of the content to a new level.
We start with a paper by Don Awalt and Rick McUmber of RDA Corporation, also members of the Microsoft Architecture Advisory Board, who reveal many secrets of great architects. They tackle a very hard problem faced daily by enterprise architects, namely the challenge of high complexity in systems development, which is compounded by the ever-changing needs of the business and the pressure to adopt new technologies as they emerge. The key secret of great architects, they reveal, begins with a mastery of solution conceptualization and abstraction. The way in which the authors have dissected the problem and provided an exemplary walkthrough of the solution process is itself evidence of such mastery.
Jack Greenfield from Microsoft's Enterprise Frameworks and Tools division discusses in his article important new thinking on a critical business imperative troubling many organizations today: how to scale up software development. As currently practiced, software development is slow, expensive and error prone, and results in a multitude of well-known problems. Despite these shortcomings, the 'products' of software development obviously provide significant value to consumers, as shown by a long-term trend of increasing demand. To address these shortcomings, a case is made for a 'Software Factories' methodology to industrialize the development of software, which is described in detail in a forthcoming book of the same name by Jack Greenfield and Keith Short, from John Wiley and Sons.
Feedback from customers to Microsoft on the challenges of implementing SOA systems has been very consistent; issues in managing identities, aggregating data, managing services, and integrating business processes have been cited over and over again as major roadblocks to realizing more efficient and agile organizations. Frederick Chong from the Architecture Strategy team in Microsoft writes a paper on one of these challenges, namely Identity and Access Management. He provides a succinct and comprehensive overview of what I&AM means using a simple framework consisting of three key areas: identity life cycle management, access management, and directory services.
Microsoft's Philip Teale and Robert Jarvis of SA Ltd. follow with the second part of their paper discussing business patterns, which are essentially architectural templates for business solutions. In this paper they describe how to develop business patterns based on business functions, data, and business components, and also show how these elements can be used to engineer software systems. A realistic but simplified example is used to show how to use standard techniques to develop the descriptions of these elements required for a business pattern.
Next, Easwaran Nadhan and Jay-Louise Weldon, both from EDS, examine various data transfer strategies for the enterprise which enable timely access to the right information and data sharing such that business processes can be effective across the enterprise. They describe eight options and analyze them using criteria such as data latency requirements, transformation needs, data volume considerations, and constraints regarding the level of intrusion and effort that can be tolerated by an enterprise in order to realize the expected benefits.
Editorial
By Arvindra Sehmi
Keep updated with additional information at http://msdn.microsoft.com/architecture/journal
The final paper is part two of Soumen Chatterjee's description of SOA messaging patterns. Traditionally, messaging patterns have been applied to enterprise application integration solutions. Soumen uses these patterns to explain how SOA can be implemented. His insights are derived from Hohpe and Woolf's book on Enterprise Integration Patterns. However, Soumen shows us how the same messaging patterns described in the book can be applied equally effectively at the application architecture level, especially in SOA-based solutions, because they too are fundamentally message-oriented.
Please keep up to date on the web at the Microsoft® architecture center, and specifically at the home of JOURNAL, http://msdn.microsoft.com/architecture/journal, where you'll be able to download the articles for your added convenience. And keep a lookout for announcements of a new JOURNAL subscription service coming soon.
As always, if you're interested in writing for JOURNAL, please send me a brief outline of your topic and your resume to [email protected]
Now put your feet up and get reading!
Arvindra Sehmi
Architect, D&PE, Microsoft EMEA
Secrets of Great Architects
By Don Awalt and Rick McUmber, RDA Corporation
Applying Levels of Abstraction to IT Solutions
Enterprise architects are challenged by the vast amount of complexity they face. It's one thing to develop an isolated, departmental application that automates a business task. However, it is quite another to design and assemble a worldwide network of IT labs filled with applications, servers and databases which support tens of thousands of IT users, all supporting various business activities. Compounding the complexity, the IT network must always be available and responsive, and must protect the corporation's precious information assets. In addition to all of this, the IT network must be flexible enough to support the ever-changing needs of the business and to adopt new technologies as they emerge.
Some architects clearly stand out and thrive in this complexity. We've been fortunate to work side by side with some truly great analysts and architects over our careers. Reflecting on these experiences, we've analyzed what makes an exceptional architect.
Without exception, all great architects have mastered the ability to conceptualize a solution at distinct levels of abstraction. By organizing the solution into discrete levels, architects are able to focus on a single aspect of the solution while ignoring for the moment all remaining complexities. Once they stabilize that part of the solution, they can then proceed to other aspects, continuously evolving and refining the layers into a cohesive model; one that can ultimately be implemented.
Most software developers know that they should decompose the solution into levels of abstraction. However, this is very difficult to apply in practice on actual projects. Upon encountering the first hurdle, it's easy to abandon the levels in the rush to start coding. Great architects work through the challenges and are disciplined enough to maintain the levels throughout the entire project lifecycle. They realize that if they don't, they'll eventually drown in the complexity.
This article presents techniques for applying levels of abstraction to IT solutions. We first demonstrate the approach through a simple example, and then propose a structure of system artifacts based on formal levels of abstraction.
Levels of Abstraction: A Powerful Weapon for all Engineers
Other engineering disciplines, such as civil engineering, have been coping with complexity by leveraging levels of abstraction for centuries. Let's study how other, more mature, engineering disciplines apply levels of abstraction, starting with the electrical engineers who design computer systems, which grow increasingly complex with each new generation.
Hardware Engineers
System designers model computer systems using levels of abstraction. Each level is well defined and provides a distinct perspective of the system. Many systems are designed at three primary levels – System, Subsystem and Component – as shown in Figure 1.
Layering enables the engineers to integrate an enormous amount of complexity into a single, working computer system. It's impossible to comprehend a computer strictly at the level of its atomic parts. There are ~25,000,000 transistors on a single Intel Itanium® chip alone!
This approach of breaking complexity down into layers of abstraction is of course not unique to IT-related disciplines. A similar approach is used in countless other disciplines, from aeronautical engineering to microbiology.
Core Principles When Applying Levels of Abstraction
All engineers follow this core set of principles when applying levels of abstraction. These principles also hold true when applying levels of abstraction to software.
The number and scope of the levels are well defined – In order for engineers to collaborate on a complex system, all team members must share the same understanding of the levels. So as designers make design decisions, they must file those decisions at the appropriate level of detail.
Multiple views within each level – The complexity within a single level can become too much to grasp all at once. In this case, engineers present the design within a single level through multiple views. Each view presents a particular aspect of the design, yet remains at the same level of abstraction. For example, the motherboard engineer creates a view for each layer of the board, modeling the design of the connection pathways for that layer.

Figure 1. Levels of Abstraction for a Computer System (block diagram: System Design integrates the monitor, keyboard, mouse, comm board, sound card, backplane and a RAID subsystem on a SCSI bus; Subsystem Design details the motherboard – CPUs, DRAMs, clock, memory chips – and the RAID subsystem – disks 1-16, controller card, bus; Component Design details the CPU – processor, microcode, cache, clock – the memory chip – transistors, clock – and the disk – platters, heads, controller card, bus.)

1 Many have successfully applied levels of abstraction to software. Ed Yourdon and Tom DeMarco proposed structured analysis and structured system design in 1979. Many branches of the US Government standardized on DoD's 2167A standard, which requires systems to be composed of a hierarchy of hardware and software configuration items. The DBA community frequently applies levels of detail to model relational databases. In particular, the Bachman toolset and James Martin's Information Engineering Methodology (IEM) model databases logically, then physically. A Google search of 'software levels of abstraction' returns several results; however, most are from the academic community and seem to focus on formal computer languages.
Must maintain consistency among the levels – In order for the system to function as intended, each subsequent layer must be a proper refinement of its parent layer. If the computer system designer switches from an IDE bus to a SCSI bus, then the interface specifications for all devices must also switch to SCSI. If the levels are not synchronized, the system won't perform as envisioned at the top level.
Apply Levels of Abstraction to IT Systems
Now that we've examined how other disciplines apply levels of abstraction, let's apply the technique to IT solutions1. The following sections present techniques for applying levels of abstraction to model the requirements, design and implementation of a typical IT application. The techniques are presented through a simple, instructional example of an online order system for a hypothetical retailer. In our example, we not only include the architecture, but expand the scope to include the system requirements and the business context as defined by the retail industry.
The three levels of abstraction are defined as follows:

Level Name: System
Level Scope: The computer engineer designs the system by integrating various subsystems, such as a backplane, circuit boards, a chassis, internal devices such as CD/DVD drives and disk drives, and external devices such as a display monitor, keyboard and mouse. The engineer thinks of the system in terms of its subsystems, their interfaces and their interconnections. Each subsystem interface is documented as a formal specification to the subsystem designer.

Level Name: Subsystem
Level Scope: The subsystem engineer designs the subsystem by integrating components. For example, the motherboard designer architects the motherboard by integrating components such as memory chips, DMA chips and a CPU chip. Likewise, the display monitor engineer designs the monitor by integrating components such as a video card and CRT. The subsystem engineer thinks of the subsystem in terms of its components, their interfaces and their interconnections. Each component interface is documented as a formal specification to the component engineer.

Level Name: Component
Level Scope: The component engineer designs the component by assembling and integrating subcomponents. For example, the memory chip designer architects the chip as a complex network of integrated circuits.

The four levels of abstraction for an IT solution are defined as follows:

Level Name: Domain
Level Scope: The company is the 'black box' central actor. Model the business from the perspective of the business' external actors. Model only the business interactions; omit the communication mediums.

Level Name: Business Processes
Level Scope: Model the business process workflows that are realizations of the domain level's business interactions. The system serves as the 'black box' central actor. Model the business processes from the perspective of the system's external actors. Include the communication mediums for completing the business transactions.

Level Name: Logical
Level Scope: Model the internal design of the system. The major system components function as the main actors. Model the system behavior from the perspective of inside the system 'black box'.

Level Name: Physical
Level Scope: Model the physical structure of the implementation.
Simple Framework: Four Levels of Abstraction
Our simple example defines four levels of abstraction for an IT solution, as summarized in the table above.
Within each level, we present both the dynamic view and the static view of the behavior for that particular level. Whereas the dynamic view models the messaging among the objects, the static view models the structure and relationships among the objects.
Domain Level of Abstraction
Applying the scoping rules above, the retailer serves as the black box central actor in the domain level. The customer serves as the external actor. The domain level is modeled from the perspective of the customer. Only the purchase interactions are modeled. The communication mediums used to complete the purchase are not included at this level but are introduced at the business process level.
Dynamic View
The dynamic view within the domain level models the interactions between the customer and the retailer. The following figure summarizes the domain context and contains a simple use case narration of the business interactions.
Static View
The static view of the domain level models the classes and relationships of the objects witnessed in the use case. In other words – 'What objects does the customer need to understand in order to accomplish the purchase transaction at this level of abstraction?' Figure 3 presents the class diagram for the static view of the domain level.
The customer is an instance of Person. The relationship between the customer and the retailer is embodied as an Account. All Purchases are associated with the customer's Account. The Purchase is associated with each of the Items being purchased. Each Item is of a specific Product, where Product follows the meta-class pattern. Instances of Product are in effect classes themselves. Modeling Product as a meta-class makes our model more flexible in that adding additional Products to the Catalog is purely a data-driven process and doesn't impact the class model. Rounding out the classes, each Payment is associated with its Purchase.
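The meta-class pattern described above can be sketched in code. The following Python sketch is our own illustration (class and attribute names are assumptions, not taken from Figure 3): each Product acts as a "class" whose instances are the Items sold, so adding a new Product to the Catalog is purely data-driven.

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    # A Product is a meta-class: its "instances" are Items, not subclasses.
    sku: str
    name: str
    price: float

@dataclass
class Catalog:
    products: dict = field(default_factory=dict)

    def add_product(self, product: Product) -> None:
        # Adding a product is a data operation -- no new classes are needed.
        self.products[product.sku] = product

@dataclass
class Item:
    # An Item is an instance "of" a Product.
    product: Product
    serial_number: str

catalog = Catalog()
catalog.add_product(Product("B-001", "Widget", 9.99))
item = Item(catalog.products["B-001"], "SN-12345")
```

Because new Products arrive as data, the class model stays stable as the retailer's catalog grows.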
As you can see, the model at this level is representative of most any retailer – online or brick-and-mortar, large or small. This illustrates why the [Industry] Domain Model should indeed define the company as the black box central actor. Companies in the same industry tend to support the same set of business interactions with their external actors. Moreover, domain models exclude the specific business processes of the company as they can vary widely among companies in the same industry.
The domain level focuses strictly on business interactions viewed from the perspective of the external actor. We must be careful not to include implementation mechanisms for accomplishing the interactions. These details belong at the next level of abstraction. So in this case, we only model browse, select, purchase and pay. We do not model how these interactions are accomplished – via telephone, US Mail, email, web application, in person, check, credit card or cash.
Figure 2. Domain level dynamic view for purchasing items from a retailer

Purchase Item(s) – Customer and Retailer
1. Customer browses the retailer's catalog of items
2. Customer selects one or more items
3. Customer pays the retailer for the items

Figure 3. Domain level static view for purchasing items from a retailer
Business Process Level of Abstraction
The next level of abstraction models the company's business processes for realizing the interactions that were captured at the domain level. This level 'zooms inside' the company black box and identifies all employees and systems that collaborate in order to perform the business transaction. At this level, the system to be developed serves as the black box central actor.
Applying the scoping rules for the system level, the online order system serves as the black box central actor. The customer and the employee serve as external actors. The system level is modeled from the perspective of the customer and the employee. The customer performs the purchase online. Payment is made via credit card. The order is fulfilled by shipping the items to the customer's shipping address. Notification of shipment is sent via email.
Dynamic View
The dynamic view replays the domain level purchase transaction, this time exposing the internal business processes of the retailer. Figure 4 summarizes the business process context and contains a simple use case narration of the interactions among the system and its actors.
Static View
The static view at this level refines the class model to capture the objects witnessed in the business process level use cases. In other words – 'What objects do the customer and employee need to understand in order to create an order online and fulfill the order?' Figure 5 presents the class diagram for the business process static view.
Figure 4. Business process level dynamic view for purchasing items online from a retailer

Purchase Items Online
1. Customer visits the retailer's web site
2. Customer browses the retailer's online catalog
3. Alternatively, the customer performs a keyword search against the catalog
4. Customer selects item(s)
5. Customer logs into her online account
6. Customer creates an order for the item(s)
7. Customer pays for the items with her credit card
8. Employee checks the system for unfulfilled orders
9. Employee fulfills the order by pulling items from the inventory, packaging and shipping them to the order's shipping address
10. Employee updates the system with the shipping information, which decrements the item inventory
11. System automatically sends an email notification to the customer indicating the shipping status

Variations
– Customer may login at any point prior to creating the order
– At step 5, customer can register a new user account if she does not yet have one
– At step 7, credit authorizer could reject credit card payment if authorization fails

(Actors: the Customer, Employee and Credit Authorizer interact with the System inside the Retailer; use cases include Register/Login, Browse/Search Catalog, Order Item(s), Authorize Credit Card, Fulfill Order, Ship Item(s) and Notify.)

Figure 5. Business process level static view for purchasing items online from a retailer
We refine the domain class model to capture the perspective at this level of abstraction. The Person, Account and Company abstractions remain the same, as well as Catalog and Product. However, the abstract Purchase event from the domain model is replaced with an Order. Orders contain LineItems which are associated with the Product in the Catalog. Since this level models the company's internal business process, we need to capture the inventory on hand (an attribute of a stock keeping unit (SKU), which represents an inventory of items at a particular location). We also model the customer's UserAccount which provides access to the online system. Payment is accomplished by using a CreditCardAccount. Location represents a geopoint within the US and serves as the billing address as well as the Order's shipping address. Shipment contains Items included in the Shipment.
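As a rough illustration of this refinement (the field names below are our own assumptions, not read off Figure 5), the abstract Purchase becomes an Order containing LineItems, each of which references a catalog Product:

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    sku: str
    name: str
    unit_price: float

@dataclass
class LineItem:
    # A LineItem ties an Order to a Product in the Catalog.
    product: Product
    quantity: int

    def subtotal(self) -> float:
        return self.product.unit_price * self.quantity

@dataclass
class Order:
    # The Order replaces the domain model's abstract Purchase event.
    shipping_address: str
    line_items: list = field(default_factory=list)

    def total(self) -> float:
        return sum(li.subtotal() for li in self.line_items)

order = Order("1 Main St")
order.line_items.append(LineItem(Product("B-001", "Widget", 10.0), 3))
order.line_items.append(LineItem(Product("B-002", "Gadget", 5.0), 2))
# order.total() -> 40.0
```

The refinement keeps the domain-level meaning (a customer buys products) while adding the structure the internal business process needs.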
The system level of abstraction usually requires a great deal of creativity because it is at this level that we invent ways to streamline business processes. In doing so, it is common to realize a single domain level transaction using several different mediums at the business process level. For example, a purchase can be accomplished online, over the phone, by mailing or faxing an order form, or in person at the retail store. Each of these would need to be modeled at the business process level. Notice that even though the Credit Authorizer is an external actor to the retailer, it is introduced at this level because it is only needed to implement a business process that first appears at this level.
Lastly, notice that the system is technology independent. Our online purchase system could be implemented with any web technology. Selecting technologies inside the system black box is an architecture decision.
Logical Level of Abstraction
The logical level zooms inside the system black box, exposing the high level design of the system. The architect selects the technology and defines the high level system structure. In our simple example, the system comprises an IIS/ASP.Net server that hosts the presentation, business and data access layers, and a SQL Server database server that hosts the persistent data.
Dynamic View
The dynamic view at the logical level traces the message flows through the major components of the system. As an example, Figure 6 traces the flow when submitting the ConfirmOrder web form.
Static View
The static view at this level also switches our perspective to the system internals. Whereas the business process level modeled the real world abstractions that appear in the business processes, this level models the abstractions as they are to be represented inside the system. In an actual system, the architect would design the classes for each software layer (presentation, business and data access). To keep this article brief, Figure 7 presents just the static design for the business layer to show how the system level abstractions are refined to the design.
The architect refines the system level classes to design the business layer interface.
Figure 6. Logical level dynamic view for purchasing items online from a retailer

Confirm Order
1. Customer reviews the order summary and confirms by submitting the form
2. ASP.Net posts to ConfirmOrderPage.aspx, which retrieves the current order and invokes Order.Save()
3. Order.Save() validates order business rules; if any are violated, Save() returns the list of errors
4. Otherwise, Save() invokes the credit authorizer to authorize the payment amount
5. If successful, Save() serializes all parts of the Order and invokes the Data Access Layer to execute the SaveOrderSP
6. The Data Access Layer forwards the request to the stored procedure
7. SaveOrderSP saves the order in the database

(The Post flows from the Presentation Layer (ConfirmOrderPage.aspx) through the Business Layer (Order) and the Data Access Layer (DBInterface) on the Web/App Server – ExecuteSP("SaveOrderSP", XML), Execute(XML) – to SaveOrderSP on the DB Server, with Save() calling out to the Credit Authorizer via Authorize(amt).)

All accounts and customers in the system are the retailer's, so it's not
practical to create a single instance of Company and associate it with all accounts; Company is therefore omitted at this level. Rather than creating a separate instance for each CreditCardAccount, we simply store the credit card number and billing address with the Payment. Further, it is not practical for the system to create an instance for every Item sold, so Item is removed from the model and instead the model tracks the quantity of items ordered in the LineItem and the quantity of items shipped in the new ShippedItems class.
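The Confirm Order sequence in Figure 6 can be sketched as code. This Python sketch is our own stand-in, not the authors' implementation: the collaborator interfaces (the credit authorizer and the data access layer) are hypothetical, and it only mirrors the validate, authorize, then serialize-and-save sequence.

```python
class AuthorizationError(Exception):
    """Raised when the credit authorizer rejects the payment (step 4)."""

class Order:
    def __init__(self, total, credit_authorizer, data_access):
        self.total = total
        self.credit_authorizer = credit_authorizer
        self.data_access = data_access

    def validate(self):
        # Step 3: validate order business rules; return any violations.
        errors = []
        if self.total <= 0:
            errors.append("order total must be positive")
        return errors

    def save(self):
        errors = self.validate()
        if errors:
            return errors  # business rules violated; report them
        if not self.credit_authorizer.authorize(self.total):
            raise AuthorizationError("credit card authorization failed")
        # Step 5: serialize the order and invoke the Data Access Layer,
        # which forwards the request to the SaveOrderSP stored procedure.
        self.data_access.execute_sp("SaveOrderSP", {"total": self.total})
        return []

# Minimal stand-ins for the logical-level collaborators.
class AlwaysApprove:
    def authorize(self, amount):
        return True

class RecordingDal:
    def __init__(self):
        self.calls = []
    def execute_sp(self, name, payload):
        self.calls.append(name)

dal = RecordingDal()
errors = Order(25.0, AlwaysApprove(), dal).save()
```

Keeping the credit authorizer and data access layer behind interfaces is what lets the logical level stay independent of the physical nodes they eventually run on.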
The architect also defines the granularity of the services that the business layer exposes. For this example, the business layer exports Create, Read, Update and Delete (CRUD) services for Account, UserAccount, Order, Shipment and Catalog. The ellipses indicate the CRUDing granularity.
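A minimal sketch of this coarse-grained CRUD granularity (our own construction, not the authors' interface): one generic service per aggregate, exposing only Create, Read, Update and Delete.

```python
import itertools

class CrudService:
    """Generic Create/Read/Update/Delete service for one aggregate."""

    def __init__(self):
        self._store = {}
        self._ids = itertools.count(1)

    def create(self, entity):
        entity_id = next(self._ids)
        self._store[entity_id] = entity
        return entity_id

    def read(self, entity_id):
        return self._store[entity_id]

    def update(self, entity_id, entity):
        self._store[entity_id] = entity

    def delete(self, entity_id):
        del self._store[entity_id]

# The business layer exports one such service per coarse-grained
# aggregate: Account, UserAccount, Order, Shipment and Catalog.
order_service = CrudService()
order_id = order_service.create({"status": "new"})
order_service.update(order_id, {"status": "shipped"})
```

CRUDing whole aggregates (an Order with its LineItems, a Shipment with its ShippedItems) rather than individual classes keeps the service surface small and the calls chunky.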
Note that even though the classes at this level are not a proper superset of the business process classes, the architect arrives at this design by directly refining the business process classes, changing the perspective from outside the system to inside the system.
Figure 7. Logical level static view for purchasing items online from a retailer

Physical Level of Abstraction
The physical level of abstraction captures the structure of the system implementation. The system is implemented as a network of nodes, each node configured with hardware and software. The three software layers in the logical view (presentation, business and data) are physically implemented as code and deployed onto these nodes. Persistent classes in the logical view are physically stored in relational tables in a SQL Server database.
Dynamic View
The dynamic view traces the message flows through the nodes of the physical configuration. The ConfirmOrder HTTP post flows from the customer's browser through the internet, through the retailer's firewall, to the web server, where Windows forwards it to IIS, which passes it to ASP.Net, which then dispatches ConfirmOrder.aspx. Fortunately, modern development tools insulate us from the majority of the physical network. Architects, however, need to understand the physical layer in order to avoid network bottlenecks and security exposures.
Static View
The static view (Figure 8) refines the persistent classes in the logical view to their physical representation. In our retail example, the business layer classes are stored in the following SQL Server tables.
Classes map to relational tables, and attributes are implemented as columns. One-to-one relationships and one-to-many relationships are implemented using a foreign key. Optimistic concurrency is implemented by assigning a datetime field to each parent class being CRUDed.
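The optimistic-concurrency scheme can be sketched with sqlite3 (an integer version column stands in for the datetime field, and the table and column names are illustrative, not the article's actual schema): the UPDATE only succeeds if the row's version still matches the one read earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (id INTEGER PRIMARY KEY, status TEXT, version INTEGER)"
)
conn.execute("INSERT INTO Orders VALUES (1, 'new', 1)")

def update_status(conn, order_id, new_status, expected_version):
    # The WHERE clause makes the update conditional on the version we
    # read; a concurrent writer bumps the version and makes us miss.
    cur = conn.execute(
        "UPDATE Orders SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, order_id, expected_version),
    )
    return cur.rowcount == 1  # False => concurrent update detected

ok = update_status(conn, 1, "shipped", expected_version=1)
stale = update_status(conn, 1, "cancelled", expected_version=1)  # lost the race
```

The second caller gets False instead of silently overwriting the first writer's change, which is exactly the lost-update protection the datetime field provides.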
When designing the logical level, the architect focuses mainly on implementing system functionality. Confident that the system functionality is covered, the architect can focus on optimizing the implementation at the physical level.
Evolve the Levels through Iterations
Having established this framework, the architect evolves the solution over several iterations. Each iteration incorporates additional functionality – invoices, back orders, order in person, order by phone, and so on. In each case, the architect updates the appropriate level of abstraction, and then refines the updates to the physical implementation level.
Revisit the Levels of Abstraction Core Principles
Let's test our example against the core level of abstraction principles.
The number and scope of the levels are well defined
We have four distinct levels – Company black box, System black box, Logical design inside the system, and Physical implementation.
Multiple views within each level
In this simple example, we presented a dynamic view and a static view at each level.
Must maintain consistency among the levels
If a change is made to the domain model, the impact of the changes must flow down to the lower levels. For example, if the retailer decides to offer maintenance contracts for its products, the analyst would add MaintenanceContract to the domain model and refine it to its physical representation. Synchronizing all levels is critical for maintaining large systems. As enhancement requests are submitted, the analyst performs an impact assessment to the appropriate level of detail. Some enhancements impact the domain level (and therefore all subsequent levels). Others merely impact the physical level.
Figure 8. Physical level static view for purchasing items online from a retailer
Scaling Levels to Support Enterprise Solutions
Now that we've presented a simple example with four levels of abstraction, let's scale the approach to support solutions for IT enterprises. Figure 9 presents a Rational Unified Process (RUP) configuration that organizes project artifacts into well defined levels of abstraction.

The levels in the table are described below.
Domain – The domain level capturesthe business context for a project.
Project Vision – The project visioncommunicates the business impact the system will have on the business.It quantifies this impact in a return oninvestment analysis. The project visionrepresents the highest level ofabstraction of the project.
Business Process – The system levelmodels the business processes withinthe company. For extremely complexorganizations, this levelcan be subdivided into sublevels –division, interdepartmental andintradepartmental.
UI Specification – The UI specification designs the user interface that realizes the business processes. It is comprised of a UI design document and a functional UI prototype.
Detailed Requirements – The detailed requirements specify the lowest level of abstraction of the system requirements. They include details such as data type formats and detailed business rules. They also contain the proficiency requirements, such as performance, availability, security, internationalization, configuration, scalability and flexibility requirements.
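A detailed requirement of this kind can be captured directly as an executable validation rule. The following sketch encodes the US zip code mask rule cited in Figure 9; the function name and regular expression are ours, purely for illustration, and not part of any framework described in this article.

```python
import re

# Hypothetical validator for the detailed requirement:
# "US zip codes must be masked as xxxxx or xxxxx-xxxx"
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")

def is_valid_us_zip(value: str) -> bool:
    """Return True if value matches the xxxxx or xxxxx-xxxx mask."""
    return ZIP_PATTERN.fullmatch(value) is not None
```

For example, `is_valid_us_zip("21201")` and `is_valid_us_zip("21201-1234")` pass, while `is_valid_us_zip("2120")` fails. Keeping such rules at the detailed-requirements level, rather than scattering them through the implementation, preserves the traceability the article advocates.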
“No longer do we overwhelm the business users with a single, monolithic functional specification.”
Level Name: Level Scope

Domain: The company is the ‘black box’ central actor. Model the business from the perspective of the business’ external actor. Model only the business interactions. Do not include the communication mediums.

Project Vision: Project mission, project business objectives, project return on investment.

Business Process: Model the business process workflows that are realizations of the domain level’s business interactions. The system serves as the ‘black box’ central actor. Model the business processes from the perspective of the system’s external actors. Include the communication mediums for completing the business transactions.

UI Specification: UI design and UI prototype of system functionality. Demonstrate how system users are able to realize the above business process workflows.

Detailed Requirements: Specify the lowest level details that represent the external interface of the system. For example, ‘US zip codes must be masked as xxxxx or xxxxx-xxxx’. Also specify the proficiency requirements – performance, availability, security, internationalization, configuration, scalability and flexibility.

Architecture: Inside the system ‘black box’. Organized into six views – Logical View (Logical Design), Concurrency View (Concurrency Design), Security View (Security Design), Deployment View (Network Configuration), Component View (Component Interfaces) and Data View (Logical Data Model).

Implementation: Database schemas, source code, reference data, configuration files. Within each view, the design refines to a Concurrency Implementation, Security Implementation, Box Configuration, Component Implementations and Physical Data Model.
Figure 9. RUP configuration organizing project artifacts into well defined levels of abstraction
Architecture – The system architecture is organized into six views:
– Logical: Defines the software layers and major abstractions that perform the system functionality.
– Concurrency: Captures the parallel aspects of the system, including transactions, server farms and resource contention.
– Security: Defines the approach for authentication, authorization, protecting secrets and journaling.
– Deployment: Defines the network topology and deployment configuration of the system.
– Component: Defines the system components, their interfaces and dependencies.
– Data: Defines the design structure of the persistent data.
Benefits
Organizing system artifacts into discrete levels of abstraction delivers several benefits:
– It separates the system requirements into three distinct levels of abstraction – Business Processes, UI Specification and Detailed Requirements. No longer do we overwhelm the business users with a single, monolithic functional specification. Instead, we communicate the system requirements in three refined levels of detail.
– Analysts and architects are able to harness the complexity into a single, integrated model of the system.
– Architects can focus on a single aspect of the system and integrate those decisions into the overall solution.
– The levels of abstraction form the structure of the system artifacts. For example, the software architecture document dedicates a subsection for each view.
– The levels of abstraction provide direct traceability from requirements to design to implementation. Traceability enables a team to perform an accurate impact assessment when evaluating change requests.
– After developing several systems using the same framework, patterns emerge at each level of abstraction. Organizations can catalog these patterns and other best practices within each level of abstraction. This catalog of best practices serves as the foundation of a process improvement program.
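Traceability of this kind is easy to picture as a map from higher-level artifacts to the lower-level artifacts that realize them, and impact assessment as a walk down that map. The sketch below is ours; all artifact names are invented for illustration and do not come from the article's example.

```python
# Illustrative trace links, flowing from higher to lower levels of abstraction.
# Keys and values are hypothetical "level:artifact" labels.
TRACE = {
    "Domain:MaintenanceContract": ["BusinessProcess:SellContract"],
    "BusinessProcess:SellContract": ["UI:ContractCheckoutPage"],
    "UI:ContractCheckoutPage": ["Impl:contracts_table", "Impl:CheckoutController"],
}

def impact(artifact: str) -> list[str]:
    """Return every lower-level artifact affected by a change to `artifact`."""
    affected = []
    for child in TRACE.get(artifact, []):
        affected.append(child)
        affected.extend(impact(child))
    return affected
```

A change at the domain level (e.g. adding MaintenanceContract) then reports every downstream artifact, whereas a change to a physical-level artifact reports nothing below it – matching the article's observation that some enhancements impact all subsequent levels while others merely impact the physical level.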
Summary
All engineering disciplines apply formal levels of abstraction in order to cope with complexity. Software is no exception. In order to realize the benefits of levels of abstraction, projects must:
– Formally identify the levels, each with a well-defined scope
– Split complexity within a level into multiple views
– Maintain consistency among the levels
This article demonstrated how to apply levels of abstraction through a simple example, then scaled the approach to support enterprise IT solutions. It offered a RUP configuration framework that organizes system artifacts into well defined levels of abstraction.
Self-Assessment
Does your current project apply levels of abstraction? Are the levels well-defined? Are the levels well understood by the project team? If the complexity becomes too great within a level, does the team split it into views? Does the team maintain consistency among the levels? Would your project benefit from levels of abstraction?
Great architects follow these principles instinctively. The rest of us must consciously apply levels of abstraction and exercise discipline to maintain the levels throughout the project lifecycle.
Resources
Cockburn, Alistair. Writing Effective Use Cases. New Jersey: Addison-Wesley, 2001
Kroll, Per and Kruchten, Philippe. The Rational Unified Process Made Easy: A Practitioner's Guide to the RUP. Boston, MA: Pearson Education and Addison-Wesley, 2003
DeMarco, Tom and Plauger, P J. Structured Analysis and System Specification. Prentice Hall PTR, 1979
An online copy of DoD standard 2167A can be found at http://www2.umassd.edu/SWPI/DOD/MIL-STD-2167A/DOD2167A.html
About the Authors
Don Awalt
CEO and founder of RDA Corporation
[email protected]
Don Awalt is the CEO and founder of RDA Corporation, founded in 1988 as a custom software engineering firm with offices in Washington DC, Baltimore, Atlanta, Philadelphia, and Chicago. The firm, a Microsoft Gold Certified Partner, is focused on the development of enterprise web and rich client systems using the .NET Framework. Don currently serves as the Microsoft Regional Director for Washington DC, and was formerly the founding Regional Director for Philadelphia. Don is a frequent speaker at industry events, including Tech Ed, Developer Days, MSDN events, and various SQL Server and Windows events. He has served as a contributing editor for SQL Server Magazine and PC Tech Journal Magazine, and has written for other publications as well. Don's particular areas of expertise include Web Services, SQL Server, the evolution of modern programming languages, and much of the architecture and process work seen in Microsoft's Prescriptive Architecture Group (PAG).
Rick McUmber
Director of Quality and Best Practices for RDA
[email protected]
Rick McUmber is Director of Quality and Best Practices for RDA. For 11 years, he worked for IBM and Rational Software Corporation respectively, developing systems for the Department of Transportation, Department of Defense, NASA and Canada's Department of National Defense. Since 1994, he has worked with RDA developing business solutions for its customers.
The Case for Software Factories
By Jack Greenfield, Microsoft Corporation
“A Software Factory is a development environment configured to support the rapid development of a specific type of application, and promises to change the character of the software industry by introducing patterns of industrialization.”
This article briefly presents the motivation for Software Factories, a methodology developed at Microsoft. It describes forces that are driving a transition from craftsmanship to manufacturing. In a nutshell, a Software Factory is a development environment configured to support the rapid development of a specific type of application. Software Factories are really just a logical next step in the continuing evolution of software development methods and practices. However, they promise to change the character of the software industry by introducing patterns of industrialization.
Scaling Up Software Development
Software development, as currently practiced, is slow, expensive and error prone, often yielding products with large numbers of defects, causing serious problems of usability, reliability, performance, security and other qualities of service.
According to the Standish Group [Sta94], businesses in the United States spend around $250 billion on software development on approximately 175,000 projects each year. Only 16% of these projects finish on schedule and within budget. Another 31% are canceled, mainly due to quality problems, for losses of about $81 billion. Another 53% exceed their budgets by an average of 189%, for losses of about $59 billion. Projects reaching completion deliver an average of only 42% of the originally planned features.
These numbers confirm objectively what we already know by experience, which is that software development is labor intensive, consuming more human capital per dollar of value produced than we expect from a modern industry.
Of course, despite these shortcomings, the products of software development obviously provide significant value to consumers, as demonstrated by a long term trend of increasing demand. This does not mean that consumers are perfectly satisfied, either with the software we supply, or with the way we supply it. It merely means that they value software, so much so that they are willing to suffer large risks and losses in order to reap the benefits it provides. While this state of affairs is obviously not optimal, as demonstrated by the growing popularity of outsourcing, it does not seem to be forcing any significant changes in software development methods and practices industry wide.
Only modest gains in productivity have been made over the last decade, the most important perhaps being byte coded languages, patterns and agile methods. Apart from these advances, we still develop software the way we did ten years ago. Our methods and practices have not really changed much, and neither have the associated costs and risks.
This situation is about to change, however. Total global demand for software is projected to increase by an order of magnitude over the next decade, driven by new forces in the global economy, like the emergence of China and the growing role of software in social infrastructure; by new application types, like business integration and medical informatics; and by new platform technologies, like web services, mobile devices and smart appliances.
Without comparable increases in capacity, it seems inevitable that total software development capacity is destined to fall far short of total demand by the end of the decade. Of course, if market forces have free play, this will not actually happen, since the enlightened self interest of software suppliers will provide the capacity required to satisfy the demand.
Facing the Changes Ahead, Again
What will change, then, to provide the additional capacity? It does not take much analysis to see that software development methods and practices will have to change dramatically.
Since the capacity of the industry depends on the size of the competent developer pool and the productivity of its members, increasing industry capacity requires either more developers using current methods and practices, or a comparable number of developers using different methods and practices.
While the culture of apprenticeship cultivated over the last ten years seems to have successfully increased the number of competent developers and average developer competency, apprenticeship is not likely to equip the industry to satisfy the expected level of demand for at least two reasons:
– We know from experience that there will never be more than a few extreme programmers. The best developers are up to a thousand times more productive than the worst, but the worst outnumber the best by a similar margin [Boe81].
– As noted by Brooks [Bro95], adding people to a project eventually yields diminishing marginal returns. The amount of capacity gained by recruiting and training developers will fall off asymptotically.
The solution must therefore involve changing our methods and practices. We must find ways to make developers much more productive.
Innovation Curves and Paradigm Shifts
As an industry, we have collectively been here before. The history of software development is an assault against complexity and change, with gains countered by losses, as progress creates increasing demand. While great progress has been made in a mere half century, it has not been steady. Instead, it has followed the well known pattern of innovation curves, as illustrated in Figure 1 [Chr97].
Typically, a discontinuous innovation establishes a foundation for a new generation of technologies. Progress on the new foundation is initially rapid, but then gradually slows down, as the foundation stabilizes and matures. Eventually, the foundation loses its ability to sustain innovation, and a plateau is reached. At that point, another discontinuous innovation establishes another foundation for another generation of new technologies, and the pattern repeats. Kuhn calls these foundations paradigms, and the transitions between them paradigm shifts [Kuh70]. Paradigm shifts occur at junctures where change is required to sustain forward momentum. We are now at such a juncture.
Raising the Level of Abstraction
Historically, paradigm shifts have raised the level of abstraction for developers, providing more powerful concepts for capturing and reusing knowledge in platforms and languages. On the platform side, for example, we have progressed from batch processing, through terminal/host, client/server, personal computing, multi-tier systems and enterprise application integration, to asynchronous, loosely coupled services. On the language side, we have progressed from numerical encoding, through assembly, structured and object oriented languages, to byte coded languages and patterns, which can be seen as language based abstractions. Smith and Stotts summarize this progression eloquently [SS02]:
The history of programming is an exercise in hierarchical abstraction. In each generation, language designers produce constructs for lessons learned in the previous generation, and then architects use them to build more complex and powerful abstractions.
They also point out that new abstractions tend to appear first in platforms, and then migrate to languages. We are now at a point in this progression where language based abstractions have lagged behind platform based abstractions for a long time. Or, to put it differently, we are now at a point where tools have lagged behind platforms for a long time. Using the latest generation of platform technology, for example, we can now automate processes spanning multiple
Figure 1. Innovation Curves
Figure 2. ASIC Based Design Tools1
1 This illustration featuring Virtuoso® Chip Editor and Virtuoso® XL Layout Editor has been reproduced with the permission of Cadence Design Systems, Inc. © 2003 Cadence Design Systems, Inc. All rights reserved. Cadence and Virtuoso are the registered trademarks of Cadence Design Systems, Inc.
businesses located anywhere on the planet using services composed by orchestration, but we still hand-stitch every one of these applications, as if it is the first of its kind. We build large abstract concepts like insurance claims and security trades from small concrete concepts like loops, strings and integers. We carefully and laboriously arrange millions of tiny interrelated pieces of source code and resources to form massively complex structures. If the semiconductor industry used a similar approach, they would build the massively complex processors that power these applications by hand soldering transistors. Instead, they assemble predefined components called Application Specific Integrated Circuits (ASICs) using tools like the ones shown in Figure 2, and then generate the implementations.
Can’t we automate software development in a similar way? Of course we can, and in fact we already have. Database management systems, for example, automate data access using SQL, providing benefits like data integration and independence that make data driven applications easier to build and maintain. Similarly, widget frameworks and WYSIWYG editors make it easier to build and maintain graphical user interfaces, providing benefits like device independence and visual assembly. Looking closely at how this was done, we can see a recurring pattern.
– After developing a number of systems in a given problem domain, we identify a set of reusable abstractions for that domain, and then we document a set of patterns for using those abstractions.
– We then develop a run time, such as a framework or server, to codify the abstractions and patterns. This lets us build systems in the domain by instantiating, adapting, configuring and assembling components defined by the run time.
– We then define a language and build tools that support the language, such as editors, compilers and debuggers, to automate the assembly process. This helps us respond faster to changing requirements, since part of the implementation is generated, and can be easily changed.
This is the well known Language Framework pattern described by Roberts and Johnson [RJ96]. A framework can reduce the cost of developing an application by an order of magnitude, but using one can be difficult. A framework defines an archetypical product, such as an application or subsystem, which can be completed or specialized in varying ways to satisfy variations in requirements. Mapping the requirements of each product variant onto the framework is a non-trivial problem that generally requires the expertise of an architect or senior developer. Language based tools can automate this step by capturing variations in requirements using language expressions, and generating framework completion code.
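As a toy illustration of this pattern, a declarative "language expression" can capture a requirements variation, and a generator can emit the corresponding framework completion code. Everything below – the spec format, the template, and the `FrameworkEntity` base class it names – is invented for this sketch, not part of any real framework.

```python
# A hypothetical declarative spec: the "language expression" capturing
# one product variant's requirements.
SPEC = {"entity": "Order", "fields": ["id", "customer", "total"]}

# A template standing in for the code generator: it emits a subclass that
# completes an (imaginary) FrameworkEntity base class.
TEMPLATE = """class {entity}(FrameworkEntity):
    FIELDS = {fields!r}
"""

def generate(spec: dict) -> str:
    """Generate framework completion code from the declarative spec."""
    return TEMPLATE.format(**spec)
```

Changing the spec and regenerating is the point: part of the implementation is derived mechanically from the requirements expression, so it can be changed as easily as the requirements themselves.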
Industrializing Software Development
Other industries increased their capacity by moving from craftsmanship, where whole products are created from scratch by individuals or small teams, to manufacturing, where a wide range of product variants is rapidly assembled from reusable components created by multiple suppliers, and where machines automate rote or menial tasks. They standardized processes, designs and packaging, using product lines to facilitate systematic reuse, and supply chains to distribute cost and risk. Some are now capable of mass customization, where product variants are produced rapidly and inexpensively on demand to satisfy the specific requirements of individual customers.
Can Software Be Industrialized?
Analogies between software and physical goods have been hotly debated. Can these patterns of industrialization be applied to the software industry? Aren’t we somehow special, or different from other industries because of the nature of our product? Peter Wegner sums up the similarities and contradictions this way [Weg78]:
Software products are in some respects like tangible products of conventional engineering disciplines such as bridges, buildings and computers. But there are also certain important differences that give software development a unique flavor. Because software is logical not physical, its costs are concentrated in development rather than production, and since software does not wear out, its reliability depends on logical qualities like correctness and robustness, rather than physical ones like hardness and malleability.
Some of the discussion has involved an ‘apples to oranges’ comparison between the production of physical goods, on one hand, and the development of software, on the other. The key to clearing up the confusion is to understand the differences between production and development, and between economies of scale and scope.
“If the semiconductor industry used a similar approach, they would build the massively complex processors that power these applications by hand soldering transistors.”
In order to provide a return on investment, reusable components must be reused enough to more than recover the cost of their development, either directly, through cost reductions, or indirectly, through risk reductions, time to market reductions, or quality improvements. Reusable components are financial assets from an investment perspective. Since the cost of making a component reusable is generally quite high, profitable levels of reuse are unlikely to be reached by chance. A systematic approach to reuse is therefore required. This generally involves identifying a domain in which multiple systems will be developed, identifying recurring problems in that domain, developing sets of integrated production assets that solve those problems, and then applying them as systems are developed in that domain.
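This break-even reasoning can be made concrete with a rough cost model. The figures below are invented purely for illustration; the point is the shape of the calculation, not the numbers.

```python
# Rough break-even model for a reusable component (illustrative numbers only).
build_once = 10_000       # cost to build the component for a single system
reusable_factor = 3       # multiple of that cost to generalize, document, test
cost_reusable = build_once * reusable_factor   # 30,000

saving_per_reuse = 8_000  # development cost avoided each time it is reused

# Extra investment over a one-off build, and reuses needed to recover it:
extra_cost = cost_reusable - build_once                  # 20,000
break_even_reuses = -(-extra_cost // saving_per_reuse)   # ceiling division
```

With these assumed numbers the component must be reused three times before the extra investment pays off – which is why, as the article argues, profitable levels of reuse require a systematic approach to a domain rather than chance.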
Economies of Scale and Scope
Systematic reuse can yield economies of both scale and scope. These two effects are well known in other industries. While both reduce time and cost, and improve product quality, by producing multiple products collectively, rather than individually, they differ in the way they produce these benefits.
Economies of scale arise when multiple identical instances of a single design are produced collectively, rather than individually, as illustrated in Figure 3. They arise in the production of things like machine screws, when production assets like machine tools are used to produce multiple identical product instances. A design is created, along with initial instances, called prototypes, by a resource intensive process, called development, performed by engineers. Many additional
“Other industries increased their capacity by moving from craftsmanship... to manufacturing... where machines automate rote or menial tasks. [And] the key to meeting global demand is to stop wasting the time of skilled developers on rote and menial tasks.”
Figure 3. Economies of Scale
Figure 4. Economies of Scope
instances, called copies, are then produced by another process, called production, performed by machines and/or low cost labor, in order to satisfy market demand.
Economies of scope arise when multiple similar but distinct designs and prototypes are produced collectively, rather than individually, as illustrated in Figure 4. In automobile manufacturing, for example, multiple similar but distinct automobile designs are often developed by composing existing designs for subcomponents, such as the chassis, body, interior and drive train, and variants or models are often created by varying features, such as engine and trim level, in existing designs. In other words, the same practices, processes, tools and materials are used to design and prototype multiple similar but distinct products. The same is true in commercial construction, where multiple bridges or skyscrapers rarely share a common design. However, an interesting twist in commercial construction is that usually only one or two instances are produced from every successful design, so economies of scale are rarely, if ever, realized. In automobile manufacturing, where many identical instances are usually produced from successful designs, economies of scope are complemented by economies of scale, as illustrated by the copies of each prototype shown in Figure 4.
Of course, there are important differences between software and either automobile manufacturing or commercial construction, but it resembles each of them at times.
– In markets like the consumer desktop, where copies of products like operating systems and productivity applications are mass produced, software exhibits economies of scale, like automobile manufacturing.
– In markets like the enterprise, where business applications developed for competitive advantage are seldom, if ever, mass produced, software exhibits only economies of scope, like commercial construction.
We can now see where apples have been compared with oranges. Production in physical industries has been naively compared with development in software. It makes no sense to look for economies of scale in development of any kind, whether of software or of physical goods. We can, however, expect the industrialization of software development to exploit economies of scope.
What Will Industrialization Look Like?
Assuming that industrialization can occur in the software industry, what will it look like? We cannot know with certainty until it happens, of course. We can, however, make educated guesses based on the way the software industry has evolved, and on what industrialization has looked like in other industries. Clearly, software development will never be reduced to a purely mechanical process tended by drones. On the contrary, the key to meeting global demand is to stop wasting the time of skilled developers on rote and menial tasks. We must find ways to make better use of precious resources than spending them on the manual construction of end products that will require maintenance or even replacement in only a few short months or years, when the next major platform release appears, or when changing market conditions make business requirements change, whichever comes first.
One way to do this is to give developers ways to encapsulate their knowledge as reusable assets that others can apply. Is this far fetched? Patterns already demonstrate limited but effective knowledge reuse. The next step is to move from documentation to automation, using languages, frameworks and tools to automate pattern application.
Semiconductor development offers a preview of what software development will look like when industrialization has occurred. This is not to say that software components will be as easy to assemble as ASICs any time soon; ASICs are the highly evolved products of two decades of innovation and standardization in packaging and interface technology. On the other hand, it might take less than 20 years. We have the advantage of dealing only with bits, while the semiconductor industry had the additional burden of engineering the physical materials used for component implementation. At the same time, the ephemeral nature of bits creates challenges like the protection of digital property rights, as seen in the film and music industries.
Conclusion
This article has described the inability of the software industry to meet projected demand using current methods and practices. A great many issues are discussed only briefly here, no doubt leaving the reader wanting evidence or more detailed discussion. Much more detailed discussion is provided in the book ‘Software Factories: Assembling Applications with Patterns, Models, Frameworks and Tools’, by Jack Greenfield and Keith Short, from John Wiley and Sons. More information can also be found at http://msdn.microsoft.com/architecture/overview/softwarefactories and at http://www.softwarefactories.com/, including articles that describe the chronic problems preventing a transition from craftsmanship to manufacturing, the critical innovations that will help the industry overcome those problems, and the Software Factories methodology, which integrates the critical innovations.
Copyright Declaration
Copyright © 2004 by Jack Greenfield. Portions copyright © 2003 by Jack Greenfield and Keith Short, and reproduced by permission of Wiley Publishing, Inc. All rights reserved.
References
1. [Boe81] B Boehm. Software Engineering Economics. Prentice Hall PTR, 1981
2. [Bro95] F Brooks. The Mythical Man-Month. Addison-Wesley, 1995
3. [Chr97] C Christensen. The Innovator's Dilemma. Harvard Business School Press, 1997
4. [Kuh70] T Kuhn. The Structure of Scientific Revolutions. The University of Chicago Press, 1970
5. [RJ96] D Roberts and R Johnson. Evolving Frameworks: A Pattern Language for Developing Object-Oriented Frameworks. Proceedings of Pattern Languages of Programs, Allerton Park, Illinois, September 1996
6. [SS02] J Smith and D Stotts. Elemental Design Patterns – A Link Between Architecture and Object Semantics. Proceedings of OOPSLA 2002
7. [Sta94] The Standish Group. The Chaos Report. http://www.standishgroup.com/sample_research/PDFpages/chaos1994.pdf
8. [Weg78] P Wegner. Research Directions in Software Technology. Proceedings of the 3rd International Conference on Software Engineering, 1978
Jack Greenfield
Architect, Microsoft Corporation
[email protected]
Jack Greenfield is an Architect for Enterprise Frameworks and Tools at Microsoft. He was previously Chief Architect, Practitioner Desktop Group, at Rational Software Corporation, and Founder and CTO of InLine Software Corporation. At NeXT, he developed the Enterprise Objects Framework, now called Apple WebObjects. A well known speaker and writer, he also contributed to UML, J2EE and related OMG and JSP specifications. He holds a B.S. in Physics from George Mason University.
Identity and Access Management
By Frederick Chong, Microsoft Corporation
1 Context defines the boundary within which an identity is used. The boundary could be business or application related. For example, Alice may use a work identity with the identifier [email protected] to identify herself at her Wall Street employer (Wall Street Ace) as well as to execute a stock trade in NYSE. Her Wall Street employer and the NYSE would be two different business contexts where the same identifier is used.
Abstract
To date, many technical decision makers in large IT environments have heard about the principles and benefits of Service Oriented Architecture (SOA). Despite this fact, very few IT organizations are yet able to translate the theoretical underpinnings of SOA into practical IT actions.
Over the last year, a few individual solution architects on my team have attempted to distill the practical essence of SOA into the following areas: Identity and Access Management, Service Management, Entity Aggregation and Process Integration. These four key technical areas present significant technical challenges to overcome, yet provide the critical IT foundations to help businesses realize the benefits of SOA.
Note that it is our frequent interactions with architects in the enterprises that enable us to collate, synthesize and categorize the practical challenges of SOA into these areas. Our team organizes the Strategic Architect Forums that are held multiple times worldwide annually. In these events, we conduct small discussion groups to find out the pain points and technical guidance customers are looking for. The feedback from our customers has been very consistent: issues in managing identities, aggregating data, managing services, and integrating business processes have been cited over and over again as major road blocks to realizing more efficient and agile organizations. In addition, our team also conducts proof-of-concept projects with customers to drill deeper into the real world requirements and implementation issues. It is through these combinations of broad and deep engagements with customers that we on the Architecture Strategy team derived our conclusions on the four significant areas for IT to invest in.
The key focus of this paper is to provide an overview of the technical challenges in one of those areas, namely identity and access management; and secondarily, to help the reader gain an understanding of the commonly encountered issues in this broad subject.
Introduction
Identity and access management (I&AM) is a relatively new term that means different things to different people. Frequently, IT professionals have tended to pigeonhole its meaning into certain identity and security related problems that they are currently faced with. For example, I&AM has been perceived to be a synonym for single sign-on, password synchronization, meta-directory, web single sign-on, role-based entitlements, and similar ideas.
The primary goal of this paper is to provide the reader with a succinct and comprehensive overview of what I&AM means. In order to accomplish this purpose, we have structured the information in this paper to help answer the following questions:
– What is a digital identity?
– What does identity and access management mean?
– What are the key technology components of I&AM?
– How do the components of I&AM relate to one another?
– What are the key architecture challenges in I&AM?
Anatomy of a Digital Identity

Personal identification in today's society can take many different forms: driver's licenses, travel passports, employee cardkeys, and club membership cards, for example. These forms of identification typically contain information that is somewhat unique to the holder (names, addresses and photos, for example) as well as information about the authorities that issued them (an insignia of the local department of motor vehicles, for instance).
While the notion of identity in the physical world is fairly well understood, the same cannot be said about the definition of digital identities. To help lay the groundwork for the rest of the discussion in this paper, this section describes one notion of a digital identity, as illustrated in Figure 1. Our definition of a digital identity consists of the following parts:
– Identifier: A piece of information that uniquely identifies the subject of this identity within a given context¹. Examples of identifiers are email addresses, X.500 distinguished names and Globally Unique Identifiers (GUIDs).
– Credentials: Private or public data that can be used to prove the authenticity of an identity claim. For example, Alice enters a password to prove that she is who she says she is. This mechanism works because only the authentication system and Alice should know what Alice's password is. A private key and the associated X.509 public key certificate are another example of credentials.

² From Enterprise Identity Management: It's About the Business, Jamie Lewis, The Burton Group Directory and Security Strategies Report, v1, July 2nd 2003.

³ The term 'life cycle', when used in the context of digital identities, is somewhat inappropriate since digital identities are typically not recycled. However, since the phrase 'identity life cycle management' is well entrenched in the IT community, we will stick with the 'life cycle' terminology.

JOURNAL3 | Identity and Access Management 21
– Core Attributes: Data that help describe the identity. Core attributes may be used across a number of business or application contexts. For example, addresses and phone numbers are common attributes that are used and referenced by different business applications.
– Context-specific Attributes: Data that help describe the identity, but which are only referenced and used within the specific context where the identity is used. For example, within a company, an employee's preferred health plan is a context-specific attribute that is interesting to the company's health care provider but not necessarily to the financial services provider.
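The four parts above can be sketched as a simple data structure. This is only an illustration of the anatomy described in the text; the field names and example values are our own assumptions, not part of the paper's definition.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalIdentity:
    """One possible shape for the four parts of a digital identity."""
    identifier: str                                         # e.g. an email address or GUID
    credentials: dict = field(default_factory=dict)         # e.g. a password hash or certificate
    core_attributes: dict = field(default_factory=dict)     # shared across application contexts
    context_attributes: dict = field(default_factory=dict)  # keyed by the context that uses them

# A hypothetical identity for Alice.
alice = DigitalIdentity(
    identifier="alice@example.com",
    credentials={"password_hash": "..."},
    core_attributes={"phone": "555-0100"},
    context_attributes={"healthcare": {"plan": "PPO"}},
)
```

Note how the health plan lives under a `healthcare` context key, mirroring the point that context-specific attributes are meaningful only to certain consumers of the identity.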
What is Identity and Access Management?

The Burton Group defines identity management as follows: 'Identity management is the set of business processes, and a supporting infrastructure for the creation, maintenance, and use of digital identities.'²

In this paper, we define identity and access management (I&AM) as follows: 'Identity and access management refers to the processes, technologies and policies for managing digital identities and controlling how identities can be used to access resources.'
We can make a few important observations from the above definitions:
– I&AM is about the end-to-end life cycle³ management of digital identities. An enterprise-class identity management solution should not be made up of isolated silos of security technologies, but rather should consist of a well integrated fabric of technologies that addresses the spectrum of scenarios in each stage of the identity life cycle. We will talk more about these scenarios in a later section of this paper.
– I&AM is not just about technology but rather comprises three indispensable elements: policies, processes and technologies. Policies refer to the constraints and standards that need to be followed in order to comply with regulations and business best practices; processes describe the sequences of steps that lead to the completion of business tasks or functions; technologies are the automated tools that help accomplish business goals more efficiently and accurately while meeting the constraints and guidelines specified in the policies.
– The relationships between the elements of I&AM can be represented as the triangle illustrated in Figure 2. Of significant interest is the fact that a feedback loop links all three elements together. The lengths of the edges represent the proportions of the elements relative to one another in a given I&AM system. Varying the proportion of one element will ultimately vary the proportion of one or more other elements in order to maintain the shape of a triangle with a sweet spot (shown as the intersection in the triangle).
– The triangle analogy also works well for describing the relationships and interactions of policies, processes and technologies in a healthy I&AM system. Every organization is different, and the right mix of technologies, policies and processes for one company may not be the right balance for another. Therefore, each organization needs to find its own balance, represented by the uniqueness of its triangle.
– An organization's I&AM system does not remain static over time. New technologies will be introduced and adopted; new business models and constraints will change corporate governance and processes. As we mentioned before, when one of the elements changes, it is time to find a new balance. It is consequently important to understand that I&AM is a journey, not a destination.

[Figure 1. Anatomy of a Digital Identity: an identity comprises an Identifier, Credentials, Core Attributes and Context-Specific Attributes.]

[Figure 2. Essential elements of an identity and access management system: Policy, Process and Technology form the corners of a triangle.]
Identity and Access Management Framework

As implied in the previous sections, identity and access management is a very broad topic that covers both technology and non-technology areas. We will focus the rest of this paper on the technology aspects of identity and access management.

To further contain the technical scope of a topic that is still sufficiently broad, it is useful to impose some structure on our discussion. We will use the framework shown in Figure 3, which illustrates several key logical components of I&AM, to lead the discussion of this subject.
This framework highlights three key 'buckets' of technology components:
– Identity life cycle management
– Access management
– Directory services

The components in these technology buckets are used to meet a set of recurring requirements in identity management solutions. We will describe the roles that these components play in the next few sections.
Directory Services

As mentioned previously, a digital identity consists of a few logical types of data: the identifier, credentials and attributes. This data needs to be securely stored and organized; directory services provide the infrastructure for meeting such needs. Entitlements and security policies often control the access and use of business applications and computing infrastructure within an organization. Entitlements are the rights and privileges associated with individuals or groups. Security policies refer to the standards and constraints under which IT computing resources operate. A password complexity policy is one example of a security policy; another is the trust configuration of a business application, which may describe the trusted third party that the application relies upon to help authenticate and identify application users. Like digital identities, entitlements and security policies need to be stored, properly managed and discovered. In many cases, directory services provide a good foundation for satisfying these requirements.
Access Management

Access management refers to the process of controlling and granting access to satisfy resource requests. This process is usually completed through a sequence of authentication, authorization, and auditing actions. Authentication is the process by which identity claims are proven. Authorization is the determination of whether an identity is allowed to perform an action or access a resource. Auditing is the accounting process of recording the security events that have taken place. Together, authentication, authorization, and auditing are commonly known as the 'gold standards' of security. (The name stems from the chemical symbol for gold, 'Au', the prefix of all three terms.)

[Figure 3. Logical Components of I&AM: applications draw on Access Management (authentication and SSO; trust and federation; entitlements and authorization; security auditing; identity propagation, impersonation and delegation), Identity Life-Cycle Management (user, credential and entitlement management; identity integration; provisioning and deprovisioning), and Directory Services (users, attributes, credentials, roles, groups and policies).]
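The authentication, authorization, auditing sequence can be sketched in a few lines. This is a toy illustration of the three 'Au' actions, not a real security design; the in-memory user store, grant table, and log format are all assumptions for the example.

```python
# Toy stores for the three access-management actions described above.
USERS = {"alice": "s3cret"}                 # identity -> credential (illustrative only)
GRANTS = {("alice", "read", "report")}      # (identity, action, resource) tuples
AUDIT_LOG = []                              # accounting record of security events

def authenticate(user, password):
    """Authentication: prove the identity claim."""
    ok = USERS.get(user) == password
    AUDIT_LOG.append(("authn", user, ok))   # auditing: record the event
    return ok

def authorize(user, action, resource):
    """Authorization: decide whether the identity may perform the action."""
    ok = (user, action, resource) in GRANTS
    AUDIT_LOG.append(("authz", user, ok))   # auditing again
    return ok

def access(user, password, action, resource):
    """Run the full sequence; authorization is skipped if authentication fails."""
    return authenticate(user, password) and authorize(user, action, resource)
```

Note that every decision, successful or not, lands in the audit log; auditing is a first-class step of the sequence rather than an afterthought.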
There are several technical issues that solution architects may encounter when designing and integrating authentication, authorization, and auditing mechanisms into an application architecture:
– Single Sign-On
– Trust and Federation
– User Entitlements
– Auditing

We will describe these challenges and their solutions in more detail later in this document.
Identity Life Cycle Management

The life cycle of a digital identity can be framed in stages similar to the life cycles of living things:
– Creation
– Utilization
– Termination

Every stage in an identity's life cycle has scenarios that are candidates for automated management. For example, during the creation of a digital identity, the identity data needs to be propagated and initialized in identity systems. In other scenarios, an identity's entitlements might need to be expanded when the user represented by the identity receives a job promotion. Finally, when the digital identity is no longer in active use, its status might need to be changed or the identity might need to be deleted from the data store.

All events during the life cycle of a digital identity need to be securely, efficiently, and accurately managed; that is exactly what identity life cycle management is about.
The requirements for identity life cycle management can be discussed at several levels, as represented in Figure 4. The types of data that need to be managed are shown at the identity data level. Based on our earlier definition of digital identity, the relevant data includes credentials, such as passwords and certificates, and user attributes, such as names, addresses and phone numbers. In addition to credentials and attributes, there is also user entitlement data to manage. Entitlements are described in more detail later on, but for now they should be considered the rights and privileges associated with identities.

Moving up a level in the illustration, the requirements listed reflect the kinds of operations that can be performed on identity data. Create, Read, Update, and Delete (CRUD) are the data operation primitives coined by the database community. We reuse these primitives here because they provide a very convenient way to classify the kinds of identity management operations. For example, we can classify changes to account status, entitlements, and credentials under the Update primitive.
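The classification of identity operations onto CRUD primitives can be made concrete with a small lookup table. The table mirrors the operations named in Figure 4; the exact operation strings are our own illustrative labels.

```python
# Illustrative mapping of identity life cycle operations onto CRUD primitives.
CRUD_CLASSIFICATION = {
    "create": ["new account"],
    "read":   ["user entitlements", "user attributes"],
    "update": ["user entitlements", "user attributes",
               "update credentials", "change account status"],
    "delete": ["existing account"],
}

def classify(operation):
    """Return the CRUD primitive a given identity operation falls under."""
    for primitive, operations in CRUD_CLASSIFICATION.items():
        if operation in operations:
            return primitive
    return None
```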
The next level in the illustration shows two identity life cycle administration models: self-service and delegated. In traditional IT organizations, computer administration tasks are performed by a centralized group of systems administrators. Over time, organizations have realized that there may be good economic and business reasons to enable other kinds of administration models as well. For example, it is often more cost effective and efficient for individuals to update some of their personal attributes, such as address and phone number, by themselves; the self-service administration model enables such individual empowerment. The middle ground between the self-service and centralized administration models is delegated administration. In the delegated model, the responsibilities of identity life cycle administration are shared among decentralized groups of administrators. The criteria commonly used to determine the scope of delegation are organization structure and administration roles. An example of delegated administration based on organization structure is the hierarchy of enterprise-, unit- and department-level administrators in a large organization.

[Figure 4. Levels of Identity Life Cycle Management Requirements: business scenarios (hire/fire, job change, registration and enrollment) sit above the administration models (self-service, delegated administration), which sit above the data operations (Create: new account; Read: user entitlements, user attributes; Update: user entitlements, user attributes, credentials, account status; Delete: existing account), which act on the identity data (credentials, entitlements, attributes).]
The above life cycle administration models can be used to support a variety of business scenarios, some of which are listed in Figure 4. For instance, new employees often require accounts to be created and provisioned for them. Conversely, when an employee leaves, the status of the existing account might need to be changed. Job change scenarios can also have several impacts on digital identities: when Bob receives a promotion, for example, his title might need to be changed and his entitlements might need to be extended.
Now that we have a better understanding of identity life cycle management requirements, we are ready to drill into the challenges involved in meeting them. Figure 5 speaks to the fact that a typical user in an enterprise has to deal with multiple digital identities, which might be stored and managed independently of one another. This state of affairs is due to the ongoing evolution that every business organization goes through: events such as mergers and acquisitions can introduce incompatible systems into the existing IT infrastructure, and evolving business requirements might have been met through third-party applications that are not well integrated with existing ones.
One might presume that it would be much easier for enterprises to deprecate the current systems and start over. However, this solution is seldom a viable option. We have to recognize that existing identity systems might be around for a long time, which leads us to find other solutions for resolving two key issues arising from managing data across disparate identity systems:

1. Duplication of information. Identity information is often duplicated in multiple systems. For example, attributes such as addresses and phone numbers are often stored and managed in more than one system in an environment. When identity data is duplicated, it can easily get out of sync if updates are performed in one system but not the others.
[Figure 5. Current state of affairs: multiple identity systems in the enterprise. A user's first digital identity (credentials and attributes) might live in a SQL database or Exchange Server directory, a second in an LDAP directory or Active Directory, and a third in an SAP HR database.]
2. Lack of integration. The complete view of a given user's attributes, credentials and privileges is often distributed across multiple identity systems. For example, for a given employee, the human-resource-related information might be contained in an SAP HR system, the network access account in Active Directory, and the legacy application privileges in a mainframe. Many identity life cycle management scenarios require identity information to be pulled from and pushed into several different systems.

We refer to the above issues as the identity aggregation challenge, which we describe in more detail in a later section.
Challenges in Identity and Access Management

Single Sign-On

A typical enterprise user has to log in multiple times in order to gain access to the various business applications they use in their jobs. From the user's point of view, multiple logins and the need to remember multiple passwords are among the leading causes of bad application experiences. From the management point of view, forgotten-password incidents most definitely increase management costs, and when combined with bad user password management habits (such as writing passwords down on yellow sticky notes) can often lead to increased opportunities for security breaches. Because of the seemingly intractable problems that multiple identities present, the concept of single sign-on (SSO), the ability to log in once and gain access to multiple systems, has become the 'Holy Grail' of identity management projects.
Single Sign-On Solutions

Broadly speaking, there are five classes of SSO solutions. No one type of solution is right for every application scenario. The best solution is very much dependent on factors such as where the applications requiring SSO are hosted, limitations imposed by the infrastructure (e.g. firewall restrictions), and the ability to modify the applications. These are the five categories of SSO solutions:
1. Web SSO
2. Operating System Integrated Sign-On
3. Federated Sign-On
4. Identity and Credential Mapping
5. Password Synchronization
Web SSO solutions are designed to address web application sign-on requirements. In these solutions, unauthenticated browser users are redirected to login websites to enter their user identifications and credentials. Upon successful authentication, HTTP cookies are issued and used by web applications to validate authenticated user sessions. Microsoft Passport is an example of a Web SSO solution.
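The cookie mechanics can be sketched as follows: a login site issues a tamper-evident cookie after authenticating the user, and each web application validates the cookie instead of prompting again. This is a toy sketch; the shared secret, cookie format, and HMAC signing scheme are assumptions for illustration, not how Passport or any particular product works.

```python
import hashlib
import hmac

# Secret shared between the login site and the participating web applications
# (an assumption for this sketch; real systems use richer key management).
SSO_SECRET = b"shared-login-site-secret"

def issue_cookie(user):
    """Login site: issue a signed cookie after successful authentication."""
    sig = hmac.new(SSO_SECRET, user.encode(), hashlib.sha256).hexdigest()
    return f"{user}:{sig}"

def validate_cookie(cookie):
    """Web application: accept the session only if the signature checks out."""
    user, _, sig = cookie.partition(":")
    expected = hmac.new(SSO_SECRET, user.encode(), hashlib.sha256).hexdigest()
    return user if hmac.compare_digest(sig, expected) else None
```

A tampered cookie fails validation, which is what lets applications trust the login site's authentication decision without re-prompting the user.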
Operating system integrated sign-on refers to authentication modules and interfaces built into the operating system. The Windows security subsystem provides such capability through system modules such as the Local Security Authority (LSA) and Security Support Providers (SSPs); SSPI refers to the programming interfaces into these SSPs. Desktop applications that use the SSPI APIs for user authentication can then 'piggyback' on the Windows desktop login to help achieve application SSO. GSSAPI on various UNIX implementations provides the same application SSO functionality.
Federated sign-on requires the application authentication infrastructures to understand trust relationships and to interoperate through standard protocols. Kerberos and the future Active Directory Federation Service are examples of federation technologies. Federated sign-on means that the authentication responsibility is delegated to a trusted party: application users need not be prompted to sign on again as long as they have been authenticated by a federated (that is, trusted) authentication infrastructure component.
Identity and credential mapping solutions typically use credential caches to keep track of the identities and credentials to use for accessing a corresponding list of application sites. The cache may be updated manually or automatically when a credential (for example, a password) changes. Existing applications may or may not need to be modified to use identity mapping solutions. When the application cannot be modified, a software agent may be installed to monitor application login events; when the agent detects such an event, it finds the user credential in the cache and automatically enters the credential into the application login prompt.
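A credential cache of this kind is essentially a per-site map from the application site to the identity and credential to replay at login. The sketch below shows one minimal shape for such a cache; the site names, identities, and structure are illustrative assumptions.

```python
# Each target application site maps to the identity/credential pair the agent
# should replay at that site's login prompt (illustrative data).
credential_cache = {
    "hr-portal":  {"identity": "alice.w", "credential": "pw-1"},
    "crm-system": {"identity": "AWALT",   "credential": "pw-2"},
}

def credentials_for(site):
    """Agent lookup: find the stored identity and credential for a site."""
    return credential_cache.get(site)

def update_credential(site, new_credential):
    """Keep the cache current when a password changes (manual or automatic)."""
    credential_cache[site]["credential"] = new_credential
```

Note that the same user can appear under different identities at different sites ("alice.w" versus "AWALT"), which is exactly the mapping problem these solutions address.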
The password synchronization technique is used to synchronize passwords across the application credential databases so that users and applications do not have to manage multiple password changes. Password synchronization as a silo-ed technology does not really provide single sign-on, but it does offer conveniences that applications can take advantage of. For example, with password synchronization, a middle-tier application can assume that an application user's password is the same at the various systems it needs access to, so the application does not have to look up different passwords when accessing resources on those systems.
Entitlement Management

Entitlement management refers to the set of technologies used to grant and revoke the access rights and privileges of identities. It is closely associated with authorization, which is the actual process of enforcing the access rules, policies and restrictions associated with business functions and data.

Today's enterprise applications frequently use a combination of role-based authorization and business-rule-based policies to determine what a given identity can or cannot do.
Within a distributed n-tiered application, access decisions can be made at any layer of the application's architecture. For example, the presentation tier might present only the UI choices that the user is authorized to make. At the service layer, the service might check that the user meets the authorization condition for invoking the service; for example, only users in the manager role can invoke the 'Loan Approval' service. Behind the scenes at the business logic tier, there might be fine-grained business policy decisions such as 'Is this request made during business hours?'; at the data layer, a database stored procedure might filter the returned data based on the relationship between the service invoker's identity and the requested data.

Given the usefulness, and often intersecting use, of both role- and rule-based authorization schemes, it is not always clear to application architects how to model an entitlement management framework that integrates both schemes cleanly. Many enterprises have separate custom engines to do both.
Figure 6 illustrates a representation of how both schemes might be combined and integrated. In this representation, we can envision a role definition (which typically reflects a job responsibility) with two sets of properties. One set of properties contains the identities of the people or systems that are in the given role; for example, Alice, Bob and Charlie may be assigned to the Manager role. The second set of properties contains the set of rights that the role has. Rights can represent business functions or actions on computing resources; for example, 'transfer fund' defines a business function and 'read file' refers to an operation on a computing resource. Furthermore, we can assign a set of conditional statements (business rules) to each right. For example, the 'transfer fund' right may have a conditional statement that allows the action only if the current time is within business hours. Note that a conditional statement might base its decision on dynamic input data that can only be determined at application runtime.
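The combined role-and-rule model can be sketched directly from that description: a role carries its members and its rights, and each right may carry a conditional rule that is evaluated against runtime context. The role, right, and rule definitions below are illustrative assumptions following the Manager example.

```python
def during_business_hours(context):
    """A business rule evaluated at runtime against dynamic input data."""
    return 9 <= context.get("hour", 0) < 17

# A role bundles its members with its rights; each right maps to an optional
# conditional rule (None means the right is unconditional).
ROLES = {
    "manager": {
        "members": {"alice", "bob", "charlie"},
        "rights": {
            "transfer fund": during_business_hours,
            "read file": None,
        },
    }
}

def is_allowed(user, right, context):
    """Role check first, then evaluate the right's business rule, if any."""
    for role in ROLES.values():
        if user in role["members"] and right in role["rights"]:
            rule = role["rights"][right]
            return rule is None or rule(context)
    return False
```

The rule takes a `context` argument precisely because, as the text notes, the decision may depend on data only available at application runtime.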
It is also important for most organizations to have a consolidated view of all the rights that a given identity possesses. To meet this requirement, entitlement management applications typically leverage a centralized policy store to help facilitate centralized management and reporting of users' rights.
Identity Aggregation

Enterprise IT systems evolve organically over the course of an organization's history, often due to mergers and acquisitions or to the preferences and changes of IT leadership. The consequences are often manifested as hodge-podges of disconnected IT systems with undesirable architectural artifacts. Identity-related systems are no exception to such IT evolution.

Frequently, the enterprise will have not just one identity system but several, each serving different business functions but storing duplicated and related data. Applications that need to integrate with those business functions are then forced to reconcile the differences and synchronize the duplicates.
For example, a banking customer service application might need to obtain customer information from an IBM DB2 database, an Oracle database and a homegrown CRM system. In this case, the application's concept of 'Customer' is defined by three different systems. Attributes that describe the customer, such as customer name, address, and social security number, might be stored and duplicated in multiple systems. On the other hand, non-duplicated data such as the financial products the customer has purchased and the customer's bank balance might be kept in separate systems. The application will need to aggregate this data from the different systems to get the necessary view of the customer.

[Figure 6. Integrating role- and rule-based authorization: a Role contains users and groups (User1, User2, ..., Group1, Group2, ...) and carries rights (Right1, Right2, ...), each of which may be governed by a business rule (Business Rule1, Business Rule2, ...).]
Moving the underlying identity data into one giant identity system might seem like an obvious answer to this problem. However, there are many real-world issues (for example, the risk of breaking legacy applications) that will prevent such a solution from being broadly adopted any time soon.

Identity aggregation therefore refers to the set of technologies that help applications aggregate identity information from different identity systems while reducing the complexity of data reconciliation, synchronization and integration.
There are several technical challenges that identity aggregation technologies should help address:
– Maintaining relationships for data transformations
– Optimizing data CRUD operations
– Synchronizing data

The next few sub-sections provide an overview of these design issues.
(a) Maintaining Relationships for Data Transformations

An identity aggregation solution can provide several benefits to applications with disparate views of identity information that manipulate data in different systems. The first benefit involves providing applications with a consolidated, aggregated view of the data in the individual systems. In order to transform and represent data in different views, the identity aggregation solution needs to maintain meta-data describing the schema representing the consolidated view and its relationship to the data schemas in the various identity systems.
Let's look at a specific example of an identity life cycle management application that is managing data across a few existing systems. The consolidated view of the management application is represented by a new schema made up of new and existing identity attributes.

Figure 7 illustrates an example where a new identity schema is defined for the application. The new schema refers to attributes in two existing identity schemas and also defines new ones that are not currently specified (Account Number, Cell Phone and Preferences).
In order for applications to query and update data in the existing stores, we will need to maintain data relationships between the attributes defined in the new schema and the corresponding attributes in the existing schemas. For example, the following relationships will need to be maintained:

Reference – A reference is a piece of information that unambiguously identifies an instance of data as represented by a particular data schema. For example, the 'Customer ID' attribute allows a data instance represented by the application schema in Figure 7 to find its corresponding data instance represented by existing schema 1. Different schemas may use different references to identify their own data instances.

Ownership – A data attribute may be defined in more than one existing schema. Using the example in Figure 7 again, we can see that the 'Name' and 'Address' attributes are defined in both existing schemas. In the event of a data conflict, the application needs to know which version holds the authoritative copy to keep. In addition, there may be scenarios where an attribute can get its value from a prioritized list of owners; in those scenarios, when the value for an attribute is not present in the first authoritative source, the aggregation service should query the next system in the prioritized list of owners.

[Figure 7. Reconciling identity schemas: a new application identity schema (Name, Account Number, Address, Customer ID, Home Phone, Cell Phone, Email Address, Preferences) maps attributes such as Name, Address, Customer ID and Phone Number onto two existing identity schemas, and adds new attributes (Cell Phone, Preferences) not present in either.]
Attribute Mapping – Attributes defined in multiple data stores may have the same semantic meaning but the same or different syntactic representations. When querying or updating a data attribute, the identity aggregation service needs to know which attributes are semantically equivalent to one another. For example, although the customer id is defined in all three schemas, it is named differently (as CustID) in identity schema 2. When the application performs an update of an identity's customer id, the system must also know to update the CustID attribute of the data instance represented by schema 2.
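The attribute-mapping relationship amounts to a translation table plus a fan-out on update. The sketch below loosely follows the Figure 7 example (Customer ID versus CustID); the mapping table and store layout are assumptions for illustration.

```python
# Application attribute -> the name each existing schema uses for it.
ATTRIBUTE_MAP = {
    "Customer ID": {"schema1": "Customer ID", "schema2": "CustID"},
    "Name":        {"schema1": "Name",        "schema2": "Name"},
}

def fan_out_update(stores, app_attribute, value):
    """Apply one logical update to every semantically equivalent attribute."""
    for store, local_name in ATTRIBUTE_MAP[app_attribute].items():
        stores[store][local_name] = value

# Updating the customer id once updates both representations.
stores = {"schema1": {}, "schema2": {}}
fan_out_update(stores, "Customer ID", "C-1001")
```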
(b) Optimizing Data CRUD Operations

As previously noted, CRUD is a database acronym that stands for Create, Read, Update and Delete; it defines the basic primitive operations for manipulating data. The reason CRUD is raised as a technical challenge is that the performance of an aggregation-related activity involving CRUD operations across multiple backend data systems can vary significantly depending on the data relationships.
In the best case, CRUD operations can be parallelized. This is mostly true in situations where the data instances can be resolved using the same reference and the data reference always resolves to a unique instance. For example, if the social security number is the only key used for querying and aggregating data across the data stores, that query can be issued in parallel.
In the worst case, the CRUD operations are serialized across data stores. Serialized operation is common in situations where the references used for resolving data instances have dependencies on other data instances. As a simple illustration, suppose we need to aggregate data from the HR and Benefits databases, whose instance references are the employee ID and the social security number respectively. If the only initial key we have for the query is the employee ID, and the social security number can only be obtained from the HR database, then we will need to serialize the query in the following order:
1. Query the HR database using the employee ID as the key.
2. Extract the social security number from the query results.
3. Query the benefits database using the social security number.
4. Aggregate the query results.
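The serialized order above can be sketched with two in-memory stores standing in for the databases. The keys and record contents are illustrative assumptions; the point is only that step 3 cannot start until step 2 has produced the dependent reference.

```python
# Toy stand-ins for the two backend stores.
HR_DB = {"E42": {"ssn": "123-45-6789", "name": "Alice"}}   # keyed by employee ID
BENEFITS_DB = {"123-45-6789": {"plan": "PPO"}}             # keyed by SSN

def aggregate(employee_id):
    hr_record = HR_DB[employee_id]       # 1. query HR by employee ID
    ssn = hr_record["ssn"]               # 2. extract the social security number
    benefits = BENEFITS_DB[ssn]          # 3. query benefits by SSN (depends on step 2)
    return {**hr_record, **benefits}     # 4. aggregate the results
```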
Replication is a common technique used to address the performance degradation caused by CRUD operations across data stores. A portion of the backend data attributes may be replicated to a store maintained by the identity aggregation service. In addition, the local copy of the replicated data can be further de-normalized to help improve CRUD performance.
(c) Synchronizing Data

Data synchronization is needed when one or both of the following conditions are true:
– Duplicate identity attributes exist in multiple backend stores.
– Data is replicated to an intermediate identity aggregator store.

However, the use of data synchronization may also introduce other design issues:
– Data conflict resolution. In situations where the data can be updated from more than one source, it is easy to introduce conflicting data. Some common practices to help mitigate such situations are as follows:
  – Assign data ownership priority, so that in the event of a conflict we use the assigned authority.
  – In a conflict, the last writer wins.
– Synchronization triggers. A couple of common approaches are:
  • Scheduled updates.
  • Event-notification based.
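The ownership-priority practice above can be sketched as a simple resolver: when systems disagree on an attribute's value, the highest-priority owner wins. The system names and priority order are illustrative assumptions.

```python
# Most authoritative source first (an assumed ordering for illustration).
OWNERSHIP_PRIORITY = ["hr_system", "crm", "legacy_app"]

def resolve(attribute_values):
    """attribute_values: {system_name: value}; return the authoritative value.

    Falls through to the next owner in the prioritized list when the
    higher-priority source has no value for the attribute.
    """
    for system in OWNERSHIP_PRIORITY:
        if attribute_values.get(system) is not None:
            return attribute_values[system]
    return None
```

A last-writer-wins scheme would instead compare timestamps attached to each value; the prioritized-owner scheme shown here has the advantage of being deterministic regardless of update order.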
Trust and FederationAs mentioned in the single sign-onsection, federation offers a form ofsingle sign-on solution. However,federation is more than just singlesign-on. Federation implies delegationof responsibilities honored throughtrust relationships between federatedparties. Authentication is just one form of delegated responsibility.Authorization, profile management,pseudonym services, and billing areother forms of identity-relatedfunctions that may be delegated to trusted parties.
There are three technology elements that are crucial to the concept of federation:
– A federation protocol that enables parties to communicate.
– A flexible trust infrastructure that supports a variety of trust models.
JOURNAL3 | Identity and Access Management 29
– An extensible policy management framework that supports differing governance requirements.
Federation protocols are the 'languages' that are used by federating parties to communicate with each other. Since federation implies that a responsibility is delegated to and performed by a different party, the protocol must allow individuals to obtain 'capabilities' – essentially tamper-proof claims that a given identity has successfully completed an action or is entitled to a collection of privileges. For example, in the case of federated sign-on, an authenticated identity obtains a capability that proves that the individual has successfully authenticated with an approved authentication service.
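A minimal sketch of such a tamper-evident capability is shown below. This is not any real federation protocol: a shared HMAC key is assumed purely for illustration, whereas production protocols use public-key signatures and far richer claim formats.

```python
import base64
import hashlib
import hmac
import json

# Illustrative shared key; real federations would use asymmetric signatures.
SHARED_KEY = b"demo-federation-key"

def issue_capability(subject, action):
    # The authentication service signs a claim that the subject
    # successfully completed the action.
    claim = json.dumps({"sub": subject, "act": action}, sort_keys=True).encode()
    sig = hmac.new(SHARED_KEY, claim, hashlib.sha256).digest()
    return base64.b64encode(claim).decode(), base64.b64encode(sig).decode()

def verify_capability(claim_b64, sig_b64):
    # A relying party checks that the claim has not been tampered with.
    claim = base64.b64decode(claim_b64)
    expected = hmac.new(SHARED_KEY, claim, hashlib.sha256).digest()
    return hmac.compare_digest(expected, base64.b64decode(sig_b64))

claim, sig = issue_capability("alice", "authenticated")
print(verify_capability(claim, sig))  # True

# A forged claim with the original signature fails verification.
forged = base64.b64encode(b'{"act": "authenticated", "sub": "mallory"}').decode()
print(verify_capability(forged, sig))  # False
```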
In a complex business world, it is possible to have relatively complex trust schemes involving multiple business parties. Federation technology must be able to capture the essence of those real-world trust relationships into simple-to-understand but powerful trust models that will help enable various business scenarios. Some common trust models, illustrated in Figure 8, are as follows:
– Hub-and-spoke
– Hierarchical
– Peer-to-peer Web of Trust
The hub-and-spoke model is the simplest to understand. In this model, there is a central broker that is directly trusted by the federating parties. The European Union is an example of this federation model, where the EU countries directly trust the EU body to provide common economic guidelines and trade opportunities to federated parties.
In the hierarchical model, two parties have an indirect trust relationship if they both have a trust path in their respective branches of the hierarchical tree to a common root authority. The American political system demonstrates this trust model. This political system has federal, state, county and local city political bodies, each existing at various levels of the hierarchy.
The peer-to-peer model represents a collection of ad-hoc direct trust relationships. Personal relationships between friends in the physical world are good examples of this trust model.
Note that it is also possible to extend and form new networks of federations that consist of different trust models, or hybrid models.
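The indirect trust rule of the hierarchical model – two parties trust each other if their chains of direct trust meet at a common authority – can be sketched as follows. The tree of authorities here is hypothetical.

```python
# Hypothetical hierarchy of authorities: child -> parent.
PARENT = {
    "city-A": "county-A", "county-A": "state-X", "state-X": "federal",
    "city-B": "county-B", "county-B": "state-Y", "state-Y": "federal",
}

def trust_chain(party):
    # Walk the direct-trust path from a party up to its root authority.
    chain = [party]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def indirectly_trusted(a, b):
    # Indirect trust exists if the two trust paths share any authority.
    return bool(set(trust_chain(a)) & set(trust_chain(b)))

print(indirectly_trusted("city-A", "city-B"))  # True, via the common root
```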
A basic policy management framework must allow policies to be created, deleted, modified and discovered. In order to promote federated systems that enable new business models and partnerships to be quickly integrated, the policy management framework must also be extensible to reflect the dynamic nature of the environment it aims to support.
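A minimal sketch of those four basic policy operations is below, with an open-ended attribute dictionary as one way to keep the framework extensible. All names (including the `ContosoSTS` issuer) are illustrative, not drawn from any real product.

```python
class PolicyStore:
    """Sketch of a basic policy framework: create, delete, modify, discover."""

    def __init__(self):
        self._policies = {}

    def create(self, name, **attributes):
        self._policies[name] = dict(attributes)

    def modify(self, name, **attributes):
        self._policies[name].update(attributes)

    def delete(self, name):
        del self._policies[name]

    def discover(self, **criteria):
        # Return the names of policies whose attributes match all criteria;
        # arbitrary keyword criteria keep the framework extensible.
        return [name for name, attrs in self._policies.items()
                if all(attrs.get(k) == v for k, v in criteria.items())]

store = PolicyStore()
store.create("partner-signon", issuer="ContosoSTS", capability="authenticated")
store.modify("partner-signon", capability="authorized")
print(store.discover(issuer="ContosoSTS"))  # ['partner-signon']
```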
Some examples of application policies that are relevant in the federated world are:
– Trusted issuers of identity-related capabilities.
– The types of capabilities required to invoke an application's operation.
Figure 8. Common trust models: Hub-and-Spoke, Hierarchical, and Peer-to-Peer Web of Trust.
– The kinds of identity information that the application expects in capabilities.
– The kinds of privileges that an identity must demonstrate in order to invoke a service.
Auditing
Auditing, in the context of I&AM, is about keeping records of 'who did what, when' within the IT infrastructure. Federal regulations such as the Sarbanes-Oxley Act are key drivers of identity-related auditing requirements.
(a) IT Audit Process
The IT audit process typically involves the following phases, as illustrated in Figure 9:
– Audit generation
– Data collection and storage
– Analysis and feedback
Audit trails can be generated by different infrastructure and application components for different purposes. For example, firewalls and VPN servers can generate events to help detect external intrusions; middleware components can generate instrumentation data to help detect performance anomalies; and business applications can produce audit data to aid debugging or comply with regulatory audit requirements.
After the audit data has been produced, it needs to be collected and stored. There are two main models to consider here: distributed and centralized. In the distributed model, audit data typically remains in the system where the data is generated. With the centralized approach, data is sent to a central collection and data storage facility.
Once the data is collected, it may be processed and analyzed automatically or manually. The audit analysis is designed to lead to conclusions on what corrective actions, if any, are needed to improve the IT systems and processes.
(b) Audit Systems Design Considerations
Given the typical auditing process described in the previous sections, there are several considerations that are important to the design of auditing systems:
– Locality of audit generation and storage
– Separation of the auditor's role
– Flow of audited events
Once an audit trail is generated, the audit data can be stored locally on the same system or transferred to another storage location. This consideration is important from a security perspective, as audit data can be easier to change and modify if it is stored on the same system that generates it. This point can be illustrated by a simple example: in the event that a system has been compromised, the attacker might be able to modify the audit trail on the local system to cover up the hacking attempt. Therefore, for higher assurance against such attacks, you might want the audit system to store the data separately on a remote machine.
Role separation is a common best practice to help minimize the occurrence of illegal activities resulting from actions that might circumvent accountability checks. For example, a corporate acquisition officer should not be able to approve purchasing requests. In the field of IT audit, it is common practice to separate the system administrator role from the auditor's role. Doing so prevents the system administrator, who usually has 'god status' on computers, from covering up audit trails of unauthorized activities.
In a distributed design model, where audit data is generated and stored in different systems, only allowing data to flow out of the audit generation systems can add another level of safeguard. This preventive measure reduces the chances of tampered audit data replacing actual trails.
In addition, identity auditing infrastructures are also expected to be:
– Efficient (typically means a message-based asynchronous communication channel).
– Available (typically means distributed, clustered and fault tolerant).
– Accurate (keep accurate records and traces).
– Non-repudiable (can be admitted into a court of law as evidence; typically means the records need to be digitally signed).
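One way to approach the accuracy and non-repudiation goals is sketched below: each record carries the hash of the previous record (so any tampering breaks the chain) and a signature. The HMAC key stands in for a real digital signature purely for illustration; the whole design is a hypothetical sketch, not a prescribed architecture.

```python
import hashlib
import hmac
import json

# Illustrative signing key held by the auditor; a real system would use
# asymmetric digital signatures for non-repudiation.
AUDIT_KEY = b"auditor-only-key"

class AuditLog:
    """Hash-chained, signed audit trail of 'who did what, when'."""

    def __init__(self):
        self.records, self.prev_hash = [], "0" * 64

    def append(self, who, what, when):
        body = json.dumps({"who": who, "what": what, "when": when,
                           "prev": self.prev_hash}, sort_keys=True)
        sig = hmac.new(AUDIT_KEY, body.encode(), hashlib.sha256).hexdigest()
        self.records.append({"body": body, "sig": sig})
        self.prev_hash = hashlib.sha256(body.encode()).hexdigest()

    def verify(self):
        # Check every signature and the hash chain linking the records.
        prev = "0" * 64
        for rec in self.records:
            expected = hmac.new(AUDIT_KEY, rec["body"].encode(),
                                hashlib.sha256).hexdigest()
            if not hmac.compare_digest(rec["sig"], expected):
                return False
            if json.loads(rec["body"])["prev"] != prev:
                return False
            prev = hashlib.sha256(rec["body"].encode()).hexdigest()
        return True

log = AuditLog()
log.append("alice", "viewed patient record", "2004-05-14T10:00")
log.append("bob", "approved purchase", "2004-05-14T10:05")
print(log.verify())   # True
log.records[0]["body"] = log.records[0]["body"].replace("alice", "eve")
print(log.verify())   # False – tampering is detected
```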
Figure 9. IT audit process: Audit Generation → Collection and Storage → Reporting and Analysis.
Conclusions
Organizations are made up of people and systems, represented within the IT systems as digital identities. Everything that occurs in businesses is the consequence of actions initiated by and for those identities. Without identities (even anonymous identities) there would be no activities, and businesses would be lifeless constructs.
At the highest level, SOA can be seen as a way to organize the business IT infrastructure that executes and manages business activities. Therefore, enterprises seeking to realize SOA must figure out how to resolve the identity and access management challenges they face today. We have provided an overview of a few key technology areas:
– Achieving user and application single sign-on.
– Aggregating, transforming, synchronizing and provisioning identity data from various identity systems.
– Managing access to business functions and data using roles and rules.
– Federating with business partners.
– Auditing identity-related activities.
We also hope that the technical discussions on challenges have helped the readers gain some appreciation of the products and solutions in this space.
Our final conclusion is: identity and access management is a cornerstone to realizing SOA. Show me an enterprise that claims to be 'SOAccessful' and I'll show you an enterprise that has a good handle on its identity and access management.
References
1. SOA Challenges: Entity Aggregation, Ramkumar Kothandaraman, .NET Architecture Center, May 2004. http://msdn.microsoft.com/architecture/default.aspx?pull=/library/en-us/dnbda/html/dngrfsoachallenges-entityaggregation.asp
2. Enterprise Identity Management: It's About the Business, Jamie Lewis, The Burton Group Directory and Security Strategies Report, v1, July 2nd, 2003.
3. Microsoft Identity and Access Management Series, Microsoft Corporation, May 14th, 2004. http://www.microsoft.com/downloads/details.aspx?FamilyId=794571E9-0926-4C59-BFA9-B4BFE54D8DD8&displaylang=en
4. Enterprise Identity Management: Essential SOA Prerequisite, Jason Bloomberg, ZapThink ZapFlash, June 19th, 2003. http://www.zapthink.com/report.html?id=ZapFlash-06192003
Frederick Chong
Solutions Architect, Microsoft Corporation
[email protected]

Frederick Chong is a solutions architect in the Microsoft Architecture Strategy Team, where he delivers guidance on integrating applications with identity and access management technologies. He discovered his interest in security while prototyping an electronic-cash protocol in the network security group at the IBM T. J. Watson Research Center. Since then, he has implemented security features and a licensing enforcement protocol in Microsoft product teams and collaborated with various enterprise customers to architect and implement a variety of security solutions including web single sign-on, SSL-VPN and web services security protocols.
JOURNAL3 | Business Patterns – Part 2 32
Business Patterns for Software Engineering Use – Part 2
By Philip Teale, Microsoft Corporation and Robert Jarvis, SA Ltd
1 Our definition of a 'pattern' follows the classic definition created by Christopher Alexander in A Timeless Way of Building, Oxford University Press, 1979. When writing patterns we use a Coplien-style notation.
2 For information on SAM – see Enterprise Architecture – Understanding the Bigger Picture, Bob Jarvis, a Best Practice Guideline published by the National Computing Centre, UK, May 2003, or http://www.systems-advisers.com
Introduction
This is the second article in a series of two. The purpose of these articles is to explore whether business patterns1 can be defined in a structured way, and if so, whether these business patterns would be of value for those who build software systems that support the business. Our point of view is that this value does exist.
The first article was published in JOURNAL2 and explored how to define business patterns that would be useful for software engineers. Article 1 sets the scene for this article and should be read before this one.
In the first article we used an Enterprise Architecture Framework called SAM2 to analyse the opportunity, and we concluded that such business patterns will describe:
– The business functions being supported.
– The data that is required to support the described functions.
– The business components that are the IT representations of the data and functions the business needs.
– Optionally, the infrastructure needed to support the functions, data and components. This is necessary in highly distributed enterprises or those made up of divisions or units with diverse technical or operational environments.
In addition, we defined the business patterns that describe the key relationships between these dimensions.
In this second article, we describe how to develop business patterns based on business functions, data, and business components. We also show how these can be used to engineer software systems.
Summary of Article 2
In this article we use a realistic but simplified example to show how to use standard techniques to develop descriptions of the business functions, data and business components required
Difference between patterns and systems
If you haven't had much experience with patterns, you might look at the examples in this article and see them as a system development rather than a pattern. This is because, for software engineering, a pattern is a stepping stone on the way to developing part of a system. The pattern gets you part of the way to the final objective by starting you off from a tried-and-trusted place. It looks familiar, but it is not a solution in itself – it is an abstraction of the solution that you have to embellish to make it the solution that suits your needs. The tricky bit in creating patterns is to get the level of abstraction right. Very little abstraction constrains the usefulness of the pattern, as it is then very specific. But a lot of abstraction also limits the usefulness, as it provides very little practical value.

Think of a common situation today where you might have designed a service called 'message broker' to provide message routing and transformation between a variety of IT services. A highly abstracted pattern would be exactly what we just said – 'if you want to provide message routing and transformation between a variety of IT services then use a message broker'. How useful was that? Well, it starts you thinking the right way, so it is some use, but it does not provide a lot of practical help.

On the other hand, after you've built a message broker for handling the IT services required by a bank to handle a consolidated customer view, could you take that solution and call it a pattern? Yes, but how many would it be useful for? It is too specific and is a potentially-repeatable solution rather than a pattern.

Somewhere in between is the 'sweet spot', where the key ideas at all the levels involved in building the system can be abstracted and reapplied to solve many industry problems. These are the most powerful patterns. In this article we think the Business Component specification example we show is in that sweet spot for the healthcare industry. We think this is an important aspect of business patterns that should be clearly recognised – some will not be of interest outside the industry (like this one) and some will (those that represent common functions). We speculate that the common ones that cross industries actually identify the areas most ripe for outsourcing – but that's a whole different discussion.
3 Problem Refinement Model – please see Article 1 in JOURNAL2.
for a business pattern. We do not describe the infrastructure needed, and hence will not explore the relationships to infrastructure either. Our goal is to say 'it can be done, and you already know how' rather than to provide a detailed guide to every step.
First we show a way to define the business functions, data and components we need, and then we use the PRM3 to show a roadmap for the journey. The example we use is based on the Healthcare industry, but the techniques are valid for any industry. The techniques shown are those that were actually used in a real project. It is important to stress that we are not saying that you must use these techniques. The techniques in themselves are not important – it's the results that matter. These should enable you to engineer either more refined patterns or real software systems from the deliverables.
Business Functions
To develop business patterns our first goal must be to discover, define and document relevant business functions. This has two main tasks:
1. We want to discover the atomic-level function set that describes the problem space. These are known as 'primitive functions' in functional modelling. The same things are known as 'elementary processes' in Business Process Re-engineering.
2. We want to aggregate the atomic functions into larger and larger grained functional groups.
Note that by 'atomic' we mean that the function cannot be meaningfully divided any further – once started, the function has to be completed or aborted. This can be more fine-grained, and therefore voluminous, than is strictly necessary for business pattern definition. Therefore what we actually seek for a business pattern is business functions defined at a level of decomposition at which meaningful relationships can be formed with the other spheres, particularly the data sphere. In the case of business functions we find this in practice to be at about the fourth level of decomposition from the root of the hierarchy. At this level it is usually possible to formulate CRUD (Create, Read, Update and Delete) relationships with data entities held in the Data sphere.
We can tackle the definition of business functions either bottom-up or top-down, or in a mixture of the two.
Bottom-Up Analysis
In this approach the analyst works with a representative set of users from the business domain to analyse their view of the business processes that they perform. This may be done using use case diagrams or any technique that can show the sequential and parallel tasks carried out in the process. The collection of processes is then catalogued and analysed, and it is usually noticed that many of the steps or low-level tasks carried out in these processes are repeated many times in different processes. The redundant tasks are identified and eliminated, leaving a non-redundant set of primitive functions.

This is shown in Figure 1 using a simple flowcharting technique. It can be seen that task A is redundant and so will appear only once in the primitive set.
Then, having derived the non-redundant primitive set by this analysis, we can now iteratively aggregate the functions into a provisional hierarchy, such as that shown in Figure 2.
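The synthesis step above – cataloguing the tasks from each documented process and eliminating redundancy to obtain the primitive set – can be sketched as follows. The process contents mirror the simple flowchart example, where task A is shared by both processes.

```python
# Hypothetical catalogue of processes and their low-level tasks;
# task "A" appears in both processes, as in the Figure 1 example.
processes = {
    "Process 1": ["A", "B", "C", "E"],
    "Process 2": ["X", "A", "D", "Y"],
}

def primitive_set(processes):
    # Walk every process and keep each task only the first time it is seen;
    # redundant occurrences are eliminated.
    seen, primitives = set(), []
    for tasks in processes.values():
        for task in tasks:
            if task not in seen:
                seen.add(task)
                primitives.append(task)
    return primitives

print(primitive_set(processes))  # task A appears only once
```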
From the point of view of a business pattern, there are issues with using a bottom-up approach:
Figure 1. Process discovery & synthesis – two example processes share a common task A, which therefore appears only once in the primitive set.
1. This can be a very labour-intensive process for a large business area, involving many users. (This can be mitigated by compromising on a scenario + use-case approach rather than full process analysis.)
2. The nature of the process leads to an analysis of the 'as-is' situation. This is acceptable as long as this is clear and the results are used accordingly.
3. In order to be sure that you are documenting a pattern, you would have to repeat, or at least verify, the analysis in several similar enterprises. This could present many practical problems.
The benefits of the bottom-up approach are that the fundamental analysis has now been done and can readily be applied as a foundation for a solution derived from the pattern. The point of a pattern is to document successful practice, and given that the example has been well chosen, this will naturally occur.
Top-Down Analysis
The functional decomposition in Figure 2 could be derived another way. This could involve hypothesising the upper levels of the hierarchy and 'filling in' the levels below until a satisfactory degree of detail has been obtained. Clearly it is necessary to verify that the results are accurate and reflect reality.
Thus, a top-down analysis starts by describing the business problem domain at its most abstract, and then iteratively decomposing until the primitive level is reached. This can be done by using a standard approach such as IDEF04 for functional modelling. Alternatively, use the UML extensions for business process modelling.
In fact, the 'top-down' approach need not address business processes at all. There are pros and cons to this. A benefit of using this approach in pattern definition is that the work can be done with industry consultants who have worked with many clients in the problem domain. In essence, one is performing 'knowledge mining' with experts, and this can be much faster. It reduces the number of examples you need to work with directly, and alters the analyst's role to that of reviewer rather than worker.
The Mixed Approach
In practice, it is often the case that we carry out the top-down and bottom-up approaches together, iteratively switching from process synthesis to hierarchy construction to decomposition to the next lower level. This has the benefit of verifying the hypothetical top-down decomposition with real-world functions gleaned bottom-up. In practice this can be the quickest and most reliable method.
Functional Decomposition Example
The following example, and those that follow, are based on a real-world situation in Patient Healthcare and show the deliverables for a core business pattern, using a process analysis approach.
Figure 2. Example of typical result of business function analysis – 'The Enterprise' decomposed into top-level functions (Marketing, Sales, Engineering, Production, Materials Management, Facilities Management, Administration, Finance and Human Resources Management), each broken down into subordinate functions such as Forecasting, Purchasing, Inventory Control, Budget Accounting and Personnel Planning.
4 A member of the IDEF family of FIPS standards from NIST. See http://www.itl.nist.gov/fipspubs/by-num.htm number 183.
Firstly, we carried out an analysis of the functional requirements of the problem domain. Working from provided scenarios such as that for breast cancer care, we derived over 40 processes carried out by patients, professionals, system administrators and the 'confidentiality guardian'. This latter role is charged with the stewardship of confidentiality of information.
These processes were documented using UML use cases. These proved to be highly repetitious in that the same or similar sub-activities re-occurred within many use cases. Thus we extracted and consolidated these activities into a single list as follows:

Patient Logon (Authentication)
Patient Logoff (Audit)
Apply Patient Search
Manage Own Patient's Details
Manage Own Patient's Preferences
View Personalised Area
View Own GP Details
Manage Own General Health Data
Manage Donor Details
Manage Patient Details
Manage Family Members
View Immunisations/Vaccinations
Manage Personal Preferences
View Patient Health Records
View Patient Journey
View Patient Personalised Area
Apply Clinical Override
Review Clinical Overrides
Generate Patient Events
Manage Own Patient Events
Construct Patient Journey
View Own Patient Journey
View Own Medical History
Define General Consents
Manage Own Consents
Maintain Health Subjects
Perform Health Code Translation
Capture Other Health Code
Model Other Health Code Structure
Maintain Care Pathways
Book Appointment
Change Appointment
Access Event Detail
Access System Index
Maintain Clinical Processes
Maintain Role Definitions
Maintain Group/Team Structure
Maintain Group/Team Membership
Maintain Permission Delegations
Maintain Professional Register
Professional Logon (Authentication)
Professional Logoff (Audit)
View Own Prof Permissions
Maintain Specific Permissions
Define General Permissions

Figure 3. Healthcare Example – Functional Decomposition. The activities above are grouped under 'Manage & Operate NHS Gateway' into higher-level functions: Manage Access, Manage Patient Information, Manage Patient Events, Manage Consents, Manage Patient Journeys, Maintain Care Pathways, Maintain Health Subject Classifications, Manage Appointments, Access GP & Hospital Systems, Maintain Clinical Processes, Manage Organisational Structure, Maintain Professional Register and Maintain Professional Permissions.
These processes may be represented in a hierarchy as shown in Figure 3. In so doing we have derived a higher level which summarises and contains these activities according to their subject and functional similarity.
Data Model
The Healthcare example is underpinned by a comprehensive data model.
We carried out an analysis of the data created and managed within the scope of our problem domain. This revealed 31 main entities such as Patient, Healthcare Professional, Patient Event, Care Pathway, and so on. These were defined. We identified candidate primary keys and principal attributes for these entities and mapped the relationships between the entities, including resolving any many-to-many relationships. The identification of these entities and their relationships was driven by the functional analysis being carried out in parallel. As far as we are aware, all identified functional requirements are supported by the data model and vice versa. The 31 entities have been allocated to eight 'data subjects' based on the cohesion of the entities in terms of the relative strengths of the mutual relationships.
These groups are:
– Patients
– Patient Consents
– Care Pathways
– Health Subjects
– Clinical Processes
– Roles, Teams and Organisations
– Professionals and Permissions
– Local Systems
These data subjects form the first-pass definition of the required databases and their provisional content. The Patients data subject is shown in Figure 4. This takes the form of a conventional Entity-Relationship model showing data entities named and identified by their primary keys. The entities within the coloured area belong to the Patients data subject. The entities outside the coloured area belong to other data subjects but have significant relationships with entities within the Patients data subject.
A 'Crow's Foot' relationship notation has been used, but an IDEF1X notation could have been used if preferred. All many-to-many relationships have been resolved. This is necessary because M:M relationships usually conceal further entities and relationships.
This approach allows us to form a shallow hierarchy for data (Root > Subject Area > Entity > Attribute) and then to form relationships between the members of Data and the members of Business Function. This normally takes the form of a CRUD matrix showing the actions of specific business functions upon specific data entities.
Mapping Relationships
Having defined the required functionality in the form of a functional decomposition hierarchy and also defined the data required in an entity-relationship data model, we can now derive a first-cut component architecture by comparing the identified functions and data.
We do this by forming a matrix, the rows of which are the identified functions and the columns the identified data entities. In the cells we place a value:
– 'C' meaning this function CREATES an instance of this data entity
– 'R' meaning this function READS an instance of this data entity
– 'U' meaning this function UPDATES an instance of this data entity
– 'D' meaning this function DELETES an instance of this data entity
The result is shown in Figure 5.
Clustered Matrix
We have ensured that every column (data entity) has at least one create operation and that each row (function) has some activity. Please note that the values are not absolutely precise, particularly with regard to read operations. We have not specified delete operations.
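The matrix, and the two consistency checks just described, can be sketched as a mapping from (function, entity) pairs to CRUD values. The entries below are a tiny hypothetical subset of Figure 5, purely for illustration.

```python
# A tiny, hypothetical subset of the Figure 5 CRUD matrix.
functions = ["Patient Logon (Authentication)", "Manage Own Patient's Details",
             "Generate Patient Events", "Apply Patient Search"]
entities = ["Patient", "Patient Event"]

crud = {
    ("Manage Own Patient's Details", "Patient"): "C",
    ("Patient Logon (Authentication)", "Patient"): "R",
    ("Generate Patient Events", "Patient Event"): "C",
    ("Apply Patient Search", "Patient"): "R",
}

def check_matrix(functions, entities, crud):
    # Every column (entity) needs at least one create operation...
    orphans = [e for e in entities
               if not any(crud.get((f, e)) == "C" for f in functions)]
    # ...and every row (function) needs some activity.
    idle = [f for f in functions
            if not any((f, e) in crud for e in entities)]
    return orphans, idle

print(check_matrix(functions, entities, crud))  # ([], []) – consistent
```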
Figure 4. Example Data Model – Patients' Data Subject. A conventional Entity-Relationship model covering entities such as Patient, Patient Event, Patients' ePR Event, Patients' Data Links, Appointment, Professional/Event Access History, Individual Access Consents, Patient Consent Reversal, Care Relationship, Patient Journey, Generic Care Pathway Event and System Index, shown with their primary keys and 'Crow's Foot' relationships.

We now analyse the matrix by using the affinity analysis and clustering technique. The objective is to deduce groups of functions and entities that share create and update operations. We are exploiting the 'loose coupling' and 'tight cohesion' notions used in the functional decomposition together with the inter-entity relationships (some of which are vital and others merely transient) in the data model.
We adjust the model to bring together functions and entities with strong affinity. Description of the detailed algorithm used (the 'North West' method) is beyond the scope of this paper; however, the result is the emergence of mutually exclusive groups of functions and entities formed round create and update actions. These groups are our candidate business components, as shown in Figure 6.
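The 'North West' algorithm itself is out of scope here, but the shape of the result can be illustrated with a much simpler stand-in: group each entity with the functions that create or update it, and merge groups that share an owning function. The data is a hypothetical subset of the matrix; this is not the actual clustering method used in the project.

```python
# Hypothetical subset of the CRUD matrix: (function, entity) -> operation.
crud = {
    ("Manage Own Patient's Details", "Patient"): "C",
    ("Manage Patient Details", "Patient"): "U",
    ("Generate Patient Events", "Patient Event"): "C",
    ("Manage Own Patient Events", "Patient Event"): "U",
    ("Apply Patient Search", "Patient"): "R",   # reads do not bind clusters
    ("Book Appointment", "Appointment"): "C",
}

def candidate_components(crud):
    # Each entity is owned by the functions that create or update it.
    owners = {}
    for (func, entity), op in crud.items():
        if op in ("C", "U"):
            owners.setdefault(entity, set()).add(func)
    # Merge entities that share an owning function into one component,
    # yielding mutually exclusive groups of functions and entities.
    components = []
    for entity, funcs in owners.items():
        for comp in components:
            if comp["functions"] & funcs:
                comp["entities"].add(entity)
                comp["functions"] |= funcs
                break
        else:
            components.append({"entities": {entity}, "functions": set(funcs)})
    return components

for comp in candidate_components(crud):
    print(sorted(comp["entities"]), sorted(comp["functions"]))
```

With this toy data, three disjoint candidate components emerge, one per entity, each bundling the entity with its creating and updating functions.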
Business Component Derivation
Firstly we should clarify what we mean by business component.
A business function creates, reads,updates and deletes data. Groupingtogether all the functions that createand update the same data entities,using a technique such as commutativeclustering, defines non-redundant‘building blocks’ – business components– that can be used to constructpatterns, systems or applications thatin turn support particular businessprocesses. The Business Componentssphere is an example of a ‘derivedsphere’ – one which is deduced from
[Figure 5 is the ‘NHS Gateway Component Architecture’ CRUD matrix. Its rows are the business functions: Patient Logon (Authentication), Patient Logoff, Apply Patient Search, Manage Own Patient’s Details and Preferences, View Personalised Area, View Own GP Details, Manage Own General Health Data, Manage Donor Details, Manage Patient Details, Manage Family Members, View Immunisations/Vaccinations, View Patient Health Records, View/Construct Patient Journey, Apply and Review Clinical Overrides, Generate and Manage Patient Events, View Own Medical History, Define General Consents, Manage Own Consents, Maintain Health Subjects, Perform Health Code Translation, Capture and Model Other Health Codes, Maintain Care Pathways, Manage/Change Appointments, Access Event Detail, Access System Index, Maintain Clinical Processes, Maintain Role Definitions, Maintain Group/Team Structure and Membership, Maintain Permission Delegations, Maintain Professional Register, Prof Logon/Logoff, View Own Prof Permissions, Maintain Specific Permissions, and Define General Permissions. Its columns are the data entities: Appointments, Care Relationship, Clinical Pathway (Local), Clinical Pathway Activity, General Consents Table, Generic Care Pathway and its Event, Segment and Segment Use, Group/Team Membership, Group/Team Roles, Group/Team Structure, Health Subject, Health Subject Code Translation, Individual Access Consents, NHS Groups & Teams, NHS Professional, NHS Professional’s Roles, NHS Roles, Other Health Code, Pathway Segment Event Use, Patient, Patient Consent Reversal, Patient Event, Patient Journey, Patients’ Data Links, Patients’ ePR Events, Permission Delegations, Professional Permissions, Professional/Event Access History, and System Index. Each cell holds C, R or U where the function creates, reads or updates the entity.]
Figure 5. Function vs Data ‘CRUD’ Matrix
JOURNAL3 | Business Patterns – Part 2 39
the relationships between two other spheres. This is a powerful technique that exploits hidden value in an enterprise architecture. By encapsulating functionality and data into components, software reuse and replaceability become practical. Further, components offer ‘services’ that may be ‘orchestrated’ in conjunction with the services offered by other components to create an application.
The business components resulting from our cluster analysis include service interfaces, business entity components, data access components and perhaps service agents. These artifacts align with the .NET Application Architecture5. The business component does not contain ‘agile’ elements such as UI components and UI process components, business workflows, or elements such as security, operational management and communication.
We think this coarse-grained definition is very useful. It provides the stable part of the overall solution architecture, which is then rounded out with the agile elements – like UI, UI processes and business workflows. Thus we can provide an agile solution based on the foundation of a stable pattern.
The coarse-grained business components and their services are discovered through an affinity and
[Figure 6, the ‘Patient Healthcare Component Architecture’, re-orders the rows and columns of the Figure 5 matrix so that the C and U cells cluster into mutually exclusive blocks. Each block is outlined and named as a candidate business component: Patient Component, Prof Access History Component, Patients’ Event Component, Health Subject Component, Patient Consent Component, Care Pathways Component, GP & Hospital Systems Component, Appointments Component, Clinical Process Component, Groups & Teams Component, Professionals Component, and Permissions Component.]
Figure 6. Clustered Matrix
5 Application Architecture for .NET: Designing Applications and Services – Patterns and Practices Guide – Microsoft Corporation, 2002
clustering analysis performed on the relationships between business function and data. After these spheres are defined, a CRUD matrix is created to determine the component-data relationship.
In SAM, the business component forms a hierarchy too. Thus the business component can decompose one level to specify the services, business entity and data access sub-components. At this level, business components will also specify the business services that we intend to expose. It is useful to explicitly call these out, so that decisions can be made around whether these will be internal services or exposed as web services.
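As a sketch of what such a one-level decomposition might record – the notation below is our own, not prescribed by SAM – a component description could capture its services (flagging those exposed as web services), its business entities and its data access sub-components:

```python
# Hypothetical notation for a one-level business-component decomposition.
# All names are illustrative; the article prescribes no particular format.
from dataclasses import dataclass, field

@dataclass
class BusinessService:
    name: str
    exposed_as_web_service: bool = False   # internal unless flagged

@dataclass
class BusinessComponent:
    name: str
    services: list = field(default_factory=list)
    business_entities: list = field(default_factory=list)
    data_access: list = field(default_factory=list)

    def exposed_services(self):
        """The business services this component offers as web services."""
        return [s.name for s in self.services if s.exposed_as_web_service]

patient = BusinessComponent(
    name="Patient Component",
    services=[
        BusinessService("Apply Patient Search", exposed_as_web_service=True),
        BusinessService("Manage Patient Details"),  # internal only
    ],
    business_entities=["Patient", "Patient Attributes"],
    data_access=["PatientRepository"],
)

print(patient.exposed_services())  # ['Apply Patient Search']
```

Making the exposure decision an explicit attribute, as here, is what lets the internal-versus-web-service question be reviewed deliberately rather than decided by accident during implementation.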
Components and Services
The list of components is as follows:
– Patient Component
– Professional Access History Component
– Patients’ Events Component
– Patient Consents Component
– Health Subject Component
– Care Pathways Component
– Appointments Component
– GP & Hospital Systems Access Component
– Clinical Processes Component
– Groups & Teams Component
– Professionals Component
– Permissions Component
We think that this set of components represents the essence of a Business Pattern definition.
One of these components – the Patient component – is shown in Figure 7. This indicates the functionality, data managed, and services and features offered by the component.
[Figure 7 presents the Business Component Specification for the Patient Component:
Description: The Patient component supports storage and access to data regarding a patient. Functions are provided to input, validate, maintain, store and output standing patient data such as name and address, personal details and limited medically-related data, should this be available.
Functions supported (Business Logic), examples: Patient Logon (Authentication); Patient Logoff (Audit); Apply Patient Search; Manage Own Patient’s Details; View Own GP Details; Manage Own General Health Data; Manage Donor Details; Manage Patient Data (by Professional); Manage Family Members; View Immunisations/Vaccinations; Manage Personal Preferences; etc.
Data maintained (Business Entities), examples: Patient Number (PK); Patient Name (AK); Patient Address (AK); Date of Birth (AK); Patient Attributes: Phone Number, Occupation, Sex, Special Needs, Marital Status, Weight History, etc.
Services and Features provided, examples: Patient Logon and Logoff records; Search for Patient by Patient Number or Name/Address/DoB etc.; Maintenance of Patient Details and Preferences by the Patient; Provision of GP assignment for each patient; Provision of patient medical attributes (blood group, etc.) – GP supplied; Provision of immunisation/vaccination information – GP supplied; etc.]
Figure 7. Sample Business Component.
[Figure 8 maps the five PRM layers to architecture deliverables – Business: 1. Business Process analysis, 2. IDEF0 Model; Conceptual: 3. Functional Decomposition, 4. E/R Data Model, 5. Business Component, 6. Business Services; Logical: 7. Application Model, 8. IT Services, 9. IT Contracts, 10. Process Model; Physical: 11. Topologies, 12. Placements; Implementation: 13. Product/Function mapping – and, alongside, an optional design view of UML elaboration for projects: 1. Use Case Model, 2. Analysis and Design Model, 3. Deployment Model.]
Figure 8. PRM with Architect and Design views
How the Business Patterns are useful for Software Engineering
Now we’ve reached the second part of this article, where we talk about the roadmap for how to use the above descriptions of business patterns to engineer either other types of patterns, or actual software systems.
We illustrate this using the five layers of the PRM, as shown in Figure 8. You’ll note that the PRM distinguishes between the architect’s view (a higher-level view, for example guiding a programme of projects) and a designer’s view (for example, design for a project or subset thereof). Figure 8 intends to indicate the different techniques and notations appropriate to these different views, just to give you a flavour for how the refinement is done.
What is key to note is that the PRM simply identifies the elements needed to get right through to an implementation – it makes no assumptions or judgements about how best to achieve this! You can use whatever approach you favour to fill out the set of deliverables, and in whatever order you want (after the business patterns are defined). You choose the best way for the organisation that you are working in.
When considering business patterns we need to recognise this essential difference between architecture and design, in the context of the side-bar about the right level of abstraction.
Where the Business Patterns fit
The sweetest spot for the business patterns is shown in Figure 9. Here they define those stable elements of a business – the business functions, data and business components – that are in scope for the effort. Used this way they can guide large programmes of work, which leads to greater consistency across projects, with less overall effort.
Business Patterns and Service Oriented Architecture
For those that want to provide either other IT patterns, or further guidance
[Figure 9 overlays the PRM deliverables map of Figure 8 with two regions: ‘STABLE: Base Business Pattern’ covering the business-pattern deliverables, and ‘AGILE: Base SOA Model’ covering the remainder.]
Figure 9. Business patterns sweet spot in the PRM
[Figure 10 shows the PRM deliverables map again, with the stable Base Business Pattern now extended by the agile SOA elements that form the IT architecture.]
6 See http://www.microsoft.com/resources/practices/ or Amazon.
7 http://www.martinfowler.com/books.html#eaa
8 Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. http://hillside.net/patterns/DPBook/DPBook.html
Figure 10. Adding an IT architecture to the business patterns
related to providing an IT solution for the business problem, the business patterns can be refined further with the elements of a Service Oriented Architecture (SOA is not required, but is an excellent fit; what is needed is an architectural elaboration to transform into an IT solution).
To do this we need to add the topics that we said provided the opportunity for business agility, as shown in Figure 10.
The elaboration that we are describing here and in the next part could still be described in the form of patterns. The addition of these two sets of elements changes a business pattern into a business solution pattern. Alternatively, we could build a specific business solution architecture for the business pattern. We’ll revisit this in a moment.
Note that at this stage all the work is still technology product-independent.
Business patterns, SOA and (Microsoft) Product details
Finally, we can elaborate yet further with product and practice recommendations for successfully implementing the solution on Microsoft technology, as in Figure 11.
At this point we have delivered business patterns, a service-oriented application architecture, and a set of Microsoft patterns for implementation. Examples of the Microsoft technology patterns can be found at http://www.microsoft.com/resources/practices/
[Figure 11 shows the PRM map with four overlays: the stable Base Business Pattern, the agile Base SOA Model, agile Architectural Product Guidance, and Industry-Specific Solutions.]
Figure 11. Adding Microsoft technology elements
Figure 12. Refinement for solution design
[Figure 12 shows the PRM map with the stable Base Business Pattern, agile Base SOA Model and Architectural Product Guidance overlays, now elaborated through the design view’s UML models for projects: Use Case Model, Analysis and Design Model, Deployment Model.]
Robert Jarvis, Director, SA Ltd
[email protected]
Bob Jarvis is a Director of Systems Advisers Limited, a UK consultancy specialising in the development of Strategic Systems Architectures for major international enterprises. He is also an Associate Architectural Consultant with Microsoft Ltd. Bob has over 30 years’ experience as an international systems consultant and architect, advising business and governmental organisations in the UK, Continental Europe and the Americas. He specialises in Enterprise Architecture, working particularly on the business/technology intersection. He is the author of ‘Enterprise Architecture – Understanding the Bigger Picture’, a best practice guideline published by the UK’s National Computing Centre in 2003.
At this stage we would have completed the delivery of a full set of linked patterns, or an IT solution, to solve a common business problem at the architecture level. But what if you’d like to actually build a system? You probably need more detail, which is what the design view will deliver.
Business Patterns and Industry Solutions
The above deliverables provide a guiding architecture which can be used to scope, cost and govern the IT projects that implement it. However, it is not detailed enough for solution implementation. If we want to provide a solution, we need further elaboration at the design level, which is guided by the architecture provided. It is probable that the solution will be implemented in several projects, and in this case the architecture is what keeps all the projects on the right track for cross-project consistency.
Figure 12 shows this next refinement. In the figure we show UML, as it is a common way to drive a design phase. Again, though, we want to be clear that this is an illustration and there is nothing to say that it has to be done this way. All we are saying is that this is a common way to achieve the refinement of the architecture into a design for implementation.
While we are performing this refinement, we can again be creating patterns – IT design ones this time. In fact we are now getting (finally) to the territory that most of today’s software pattern literature describes! Examples include Microsoft’s Enterprise Solution Patterns Using Microsoft .NET6, and Martin Fowler’s Patterns of Enterprise Application Architecture7.
Or, rather than patterns, we can be creating an actual solution design.
This is the end of the roadmap for this article. Clearly the last stage would be to implement the design, and that is where patterns such as those of the Gang of Four8 and the Pattern-Oriented Software Architecture set are very relevant, as well as the Microsoft and Martin Fowler patterns already mentioned.
Summary – Framework and approach for Business Patterns and Industry Solutions
What we have described in this article is a framework and approach for creating and then using business patterns through a guiding architecture that can be used to scope, cost and govern IT projects to implement it. To provide a solution, you need further elaboration at the design level, and this is guided by the architecture provided.
Disclaimer: The opinions expressed in this paper are those of the authors. These are not necessarily endorsed by their companies, and there is no implication that any of these ideas or concepts will be delivered as offerings or products by those companies.
Philip Teale, Partner Strategy Consultant, Microsoft Corporation
[email protected]
Philip Teale is a Partner Strategy Consultant working for the Enterprise & Partner Group in Microsoft UK. Previously, he worked for the Microsoft Prescriptive Architecture Group in Redmond, and for Microsoft Consulting Services before that. He has 29 years of enterprise IT experience, of which four years have been with Microsoft and 16 with IBM, in both field and software development roles. His international experience includes nine years working in the USA, three years in Canada and 17 years in the UK. Phil’s background is in architecting, designing and building large, complex, distributed commercial systems. His most recent contribution to industry thought-leadership was to drive Microsoft in the creation of patterns for enterprise systems development. He is a Fellow of the RSA.
JOURNAL3 | Data Transfer Strategies 44
A Strategic Approach to Data Transfer Methods
By E G Nadhan and Jay-Louise Weldon, EDS
“A sound, enterprise-wide data transfer strategy is necessary to guide IT practitioners and to enable consistent representation of key business entities across enterprise applications.”
Introduction
Today, business is driven by having access to the right information at the right time. Information is the lifeline for the enterprise. However, timely access to the right information is complicated by the number and complexity of business applications and the increased volumes of data maintained. Data needs to be shared in order for business processes to be effective across the enterprise. Organizations have a variety of ways in which they can share data. A fully integrated set of applications or access to common databases is ideal. However, if these alternatives are not practical, data must be moved from one application or database to another, and the designer must choose from the range of alternatives that exist for data transfer. As a result, data sharing can pose a significant challenge in the absence of an established data transfer strategy for the enterprise.
Most enterprises have acquired or built applications that support the execution of business processes specific to autonomous business units within the enterprise. While these applications serve the specific needs of the business units, there continues to be a need to share data collected or maintained by these applications with the rest of the enterprise. In cases where the applications serve as the Systems of Record, the volume of data to be shared is relatively high. Further, enterprises have accumulated huge volumes of data over the past few decades as storage costs have decreased and the amount of activity tracked in the e-Commerce environment has grown beyond that of the mainframe-centric world.
Modern IT organizations face the challenge of storing, managing, and facilitating the exchange of data at unprecedented volumes. While database management systems have evolved to address the storage and management of terabytes of data, the issue of effectively exchanging high volumes of data between and among enterprise applications remains. A sound, enterprise-wide data transfer strategy is necessary to guide IT practitioners and to enable consistent representation of key business entities across enterprise applications.
Background
The mainframe world of the 70’s consisted of punch-card-driven monolithic applications, many of which continue to be the systems of record in organizations today. The advent of the Personal Computer in the 80’s fostered the client-server world of the early 90’s, where the PC evolved into a robust client workstation. Processing power continued to grow exponentially, resulting in the introduction of mid-range servers that were employed by key business units within the organization. Client-server technology gave these autonomous business units the power to store and use data in their own world. Such autonomy gave birth to many repositories that did a good job of storing isolated pockets of data within the enterprise. N-tier distributed computing in the late 90’s resulted in the creation of additional layers that stored data specific to these business units.
While departmental business processes are not impacted by the proliferation of data across multiple repositories, there exists a critical need to leverage data at an enterprise level – as well as at the business unit level. For example, organizations need to have an enterprise-level view of the customer and serve their customers as a single logical entity. In today’s world of real-time online interaction with customers, end-to-end system response has become a critical success factor as well. The fundamental requirements to provide basic customer service have not changed over the years. However, maintenance and retrieval of the data expediently to provide such service has become a much more complex process.
In spite of this complexity, enterprises of today need to have access to all these pockets of data as well as the original systems of record. Such access is accomplished either by building connection mechanisms to the various systems or by transferring data between systems at periodic intervals. Enterprise Application Integration (EAI) tools can be applied to move transactions and messages from one application to another. Extract, Transformation and Load (ETL) tools perform the same task but usually move data in bulk.
This article describes the options available to address this problem of data sharing. While the options are not mutually exclusive, they represent logically different design and execution principles.
Target Audience
IT personnel who are faced with the challenges of sharing data between multiple applications within the enterprise would benefit from the contents of this article. Such personnel include IT Enterprise Architects, Data Architects and Integration Architects, as well as Subject Matter Experts for the key enterprise applications. Process and functional managers within the enterprise who work closely with the IT architects will develop an appreciation for the complexities of data sharing driven by business process changes.
Problem Definition
Applications often need to make their data accessible to other applications and databases for various reasons. Data may need to be moved from one platform to another or from one geographic location to another. Data may need to be moved to make it accessible to other applications that need it without impacting the performance of the source system. Changes in data may need to be moved to keep two systems in sync. Often, firms will create a shared repository, called an Operational Data Store (ODS), to collect data from source systems and make it available to other systems and databases. Data must then be moved from the originating application to the ODS.
There are many ways to accomplish data transfer and many factors to consider when choosing the alternative that best fits the situation at hand. Efficiencies become critical when the data volume is large. Bulk data transfer may not be a viable alternative due to time constraints. At the same time, identification of changed data can be a challenge.
The example below represents a realistic scenario where this situation manifests itself.
Sample Scenario
A customer-facing website allows subscribers to a service to enroll online. The processes involved in this activity will capture a number of relevant data elements about the subscriber. The captured data will be immediately housed in a Sales system (e.g. an Order Management System). In this example, the Sales system would be considered the System of Record for these data elements. Subscriber data is, no doubt, critical to the Sales department. At the same time, it is also important to several other business units. For instance, the Billing department will need it to make sure that financial transactions with the subscriber are executed. And, the Marketing department may want this data to help design campaigns to cross-sell and up-sell products and services to the subscriber. Therefore, it is crucial that subscriber data be placed, as soon as possible, in an enterprise-accessible Operational Data Store from which it can be made available to those applications that need it.
From a systems standpoint, Figure 1 represents the scenario just described. It illustrates a front-end application storing data into its proprietary System of Record and receiving an acknowledgement of a successful update. This System of Record is constantly populated with high volumes of data that need to be transferred to an Operational Data Store so that they may be shared with the rest of the enterprise. The subsequent sections illustrate the different ways of accomplishing such a transfer.
Figure 1 illustrates the following steps:
1. Front End Application updates System of Record.
2. System of Record acknowledges successful update.
3. Transfer of data to the Operational Data Store.
Depending on the context of the specific problem domain for a given enterprise, there are multiple approaches to effect the transfer of data to the ODS in this scenario. The various approaches involved are described in the sections that follow.
The approaches presented are based on the following assumptions:
1. For simplicity’s sake, we have assumed that there is only one System of Record within the enterprise for any given data element. Propagation of data to multiple Systems of Record can be accomplished using one or more of these options.
2. An update to a System of Record can mean the creation, modification, or even the deletion of a logical record.
3. The acknowledgement step is the final message to the Front End Application that indicates that all the intermediate steps involved in the propagation of data to the System of Record as well as the Operational Data Store have been successfully completed. Additional acknowledgement steps between selected pairs of nodes might be necessary depending on the
Figure 1. Sample Scenario
implementation context for specific business scenarios.
4. Metadata, while not directly addressed in this paper, is a crucial consideration for data transfer1. It is assumed that all the options discussed entail the capture, manipulation and transfer of metadata. However, the discussion in this paper is limited to the logical flow of data between the different nodes within the end-to-end process.
Business Process Review
There are various options available for engineering the transfer of data within the Sample Scenario defined in Figure 1.
However, prior to exercising any given option, it is prudent to take a step back, review and validate the business need for the transfer of data. The actual transfer of data between systems could be a physical manifestation of a different problem at a logical business process level. A review of the end-to-end processes may expose opportunities to streamline the business process flow, resulting in the rationalization of the constituent applications. Such rationalization could mitigate and, in some cases, eliminate the need for such transfer of data. Some simple questions to ask would include:
– Why does the data need to be transferred?
– Why can’t the data stay in a single system?
A viable answer to these questions could eliminate the need for such transfer of data.
If there is still a clear need for this transfer of data even after a review of the end-to-end business process, there are multiple options available that broadly conform to one or more of these approaches:
– EAI Technologies
– ETL Technologies
– Combinations
The remaining options explore these different possibilities.
Data Transfer Options
This section describes the architectural options that are available for sharing data between discrete applications. The discussion here is independent of commercial solutions; rather, it focuses on genres of technologies that are available in the market today.
The options discussed in this section are:
– Option 1: EAI Real-time Transfer
– Option 2: EAI Propagation of Incremental Records
– Option 3: Incremental Batch Transfer (Changed Data Capture)
– Option 4: Native Replication
– Option 5: Bulk Refresh using Batch File Transfer
– Option 6: ETL/ELT Transfer
– Option 7: Enterprise Information Integration
Option 1: EAI Real-time Transfer
Figure 2 illustrates the manner in which an EAI Integration Broker2 can facilitate this transfer.
This option is application-driven and most appropriate for data transfers where the updates to the System of Record and the ODS are part of the same transaction. An Integration Broker receives the transaction initiated by the Front End Application, after which it assumes responsibility for the propagation of the data to the System of Record as well as the Operational Data Store. The steps are executed in the following sequence:
1. Front End Application initiates update to the System of Record.
2. Integration Broker receives this update from the Front End Application and sends it to the System of Record.
3. System of Record acknowledges the update.
4. Integration Broker immediately initiates a corresponding update to the Operational Data Store, thereby effecting an immediate, real-time transfer of this data.
5. ODS acknowledges the receipt of this update.
Figure 2. EAI Real-time Transfer
1 Effective data sharing requires a common understanding of the meaning and structure of data for the provider and the receiver. Metadata – data about data – is the vehicle for achieving that understanding. When data is shared or physically transferred between parties, metadata also must be exchanged. It is the designer’s responsibility to ensure the appropriate metadata is captured and transferred in all data transfer situations.
2 An integration broker is a component that routes the messages exchanged between applications. It facilitates the conditional transfer of messages between applications based on predefined rules driven by business logic and data synchronization requirements.
6. Integration Broker sends the acknowledgement of these updates to the Front End Application.
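A minimal sketch of this sequence, with Python dicts standing in for the real System of Record and ODS, might look like the following; a production broker would of course wrap steps 2 to 5 in a distributed transaction or add compensating logic on failure:

```python
# Hedged sketch of Option 1: an integration broker that propagates one
# update to the System of Record and then, in the same unit of work,
# to the ODS before acknowledging the front end. Stores are plain dicts.

class IntegrationBroker:
    def __init__(self, system_of_record, ods):
        self.sor = system_of_record
        self.ods = ods

    def submit(self, key, record):
        """Steps 2-6: deliver to the SoR, then immediately to the ODS,
        and only then acknowledge the caller."""
        self.sor[key] = record          # step 2: update System of Record
        # step 3: SoR acknowledgement is implicit in the successful write
        self.ods[key] = dict(record)    # step 4: real-time transfer to ODS
        # step 5: ODS acknowledgement likewise implicit here
        return "ack"                    # step 6: acknowledge front end

sor, ods = {}, {}
broker = IntegrationBroker(sor, ods)
status = broker.submit("prospect-42", {"name": "A. Smith", "segment": "retail"})
print(status, ods["prospect-42"]["name"])   # ack A. Smith
```

The essential property of this option is visible in `submit`: the caller’s acknowledgement is withheld until both stores have accepted the update, which is what makes the ODS copy trustworthy in real time.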
Usage Scenario – Financial Institution
A front-end CRM application captures data about prospects calling into the Contact Center. The CRM application propagates the prospect data to an Operational Data Store that contains the basic customer data for enterprise-wide reference. This data needs to be propagated to the ODS immediately so that the most current data is available to all the other customer-facing applications, like the Automated Teller Machine (ATM), financial centers (branches) and online banking access.
Option 2: EAI Propagation of Incremental Records
This option is application-driven and appropriate for lower-priority data. The Front End Application updates the System of Record, after which this data is propagated to the ODS through the Integration Broker. This is characteristic of scenarios where there is a tightly coupled portal to an ERP or CRM system. There are two different mechanisms to effect this transfer of data to the ODS:
– Option 2a: Push to Integration Broker: the System of Record initiates the notification of the receipt of this data to the Integration Broker. The ‘push’ is frequently triggered by a scheduled requirement, for example, a daily update.
– Option 2b: Pull from Integration Broker: the Integration Broker continuously polls the System of Record for receipt of this data. The ‘pull’ is frequently triggered by a business event in the application using the ODS, for example, a service transaction that requires up-to-date customer data.
Usage Scenario – Manufacturing Organization
An order entry ERP application is used by Customer Service Representatives to enter orders every hour directly into the backend orders database. New orders received must be transferred to the enterprise service dashboard repository on a daily basis. The enterprise service dashboard provides management a holistic view of the order volume as of the previous business day. The first option could be a daily 'push' of new orders from the ERP application to the dashboard repository. Alternatively, the dashboard could initiate a 'pull' from the orders database through the Integration Broker to provide this data when management requires the latest view of the order volume.
Each of these options is explained in further detail below.
Option 2a: Push to Integration Broker
Figure 3 illustrates the EAI propagation of incremental records by having the System of Record push this data to the Integration Broker. The steps are executed in the following sequence:
1. Front End Application initiates update to the System of Record.
2. System of Record notifies Integration Broker about receipt of this data after completing the update.
3. Integration Broker receives this update and sends it to the ODS.
4. ODS acknowledges the update.
5. Integration Broker sends an acknowledgement of the successful propagation of this data to the Front End Application.
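The push sequence above can be sketched with a few in-memory classes. This is an illustrative sketch in Python, not the interface of any EAI product; all class and method names are invented for this example.

```python
class OperationalDataStore:
    def __init__(self):
        self.records = {}

    def apply(self, record):
        # Step 4: apply the update and acknowledge it.
        self.records[record["id"]] = record
        return "ack"


class IntegrationBroker:
    """Routes updates pushed by the System of Record on to the ODS."""

    def __init__(self, ods):
        self.ods = ods

    def on_push(self, record):
        ack = self.ods.apply(record)   # Step 3: forward the update to the ODS.
        return f"propagated:{ack}"     # Step 5: relay the acknowledgement.


class SystemOfRecord:
    def __init__(self, broker):
        self.broker = broker
        self.records = {}

    def update(self, record):
        self.records[record["id"]] = record    # Step 1: apply the update.
        return self.broker.on_push(record)     # Step 2: push to the broker.


ods = OperationalDataStore()
sor = SystemOfRecord(IntegrationBroker(ods))
status = sor.update({"id": "C100", "name": "Prospect A"})
```

The essential point of the push variant is visible in `SystemOfRecord.update`: the System of Record itself initiates the notification.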
Option 2b: Pull from Integration Broker
Figure 4 illustrates the EAI propagation of incremental records by having the Integration Broker poll the System of Record on a regular basis and propagate this data to the ODS. The steps are executed in the following sequence:
1. Front End Application initiates update to the System of Record.
2. Integration Broker polls the System of Record to check if any new data has been received.
3. System of Record responds to the poll.
4. If there is new data to be propagated, Integration Broker sends an update to the ODS.
[Figure 3. Push to Integration Broker]
[Figure 4. Pull from Integration Broker]
5. ODS acknowledges the update.
6. Integration Broker sends an acknowledgement of the successful propagation of this data to the Front End Application.
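The polling loop at the heart of Option 2b can be sketched as follows; this is a hedged in-memory sketch in Python, with all names invented for illustration (a real broker would poll on a schedule and exchange acknowledgements over a channel).

```python
class SystemOfRecord:
    def __init__(self):
        self._log = []                 # updates kept in arrival order

    def update(self, record):
        self._log.append(record)       # Step 1: apply the update.

    def changes_since(self, index):
        return self._log[index:]       # Step 3: respond to the poll.


class IntegrationBroker:
    def __init__(self, sor, ods):
        self.sor, self.ods, self._cursor = sor, ods, 0

    def poll(self):
        # Step 2: poll the System of Record for anything new.
        new = self.sor.changes_since(self._cursor)
        for record in new:
            self.ods.append(record)    # Step 4: send each update to the ODS.
        self._cursor += len(new)       # Remember where the last poll ended.
        return len(new)                # Steps 5-6: acknowledgements elided.


sor, ods = SystemOfRecord(), []
broker = IntegrationBroker(sor, ods)
sor.update({"id": 1})
sor.update({"id": 2})
picked_up = broker.poll()
```

The cursor is what distinguishes pull from push: the broker, not the System of Record, tracks what has already been propagated.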
Option 3: Incremental Batch Transfer (Changed Data Capture)
Option 3 is data-driven and is used to periodically move new or changed data from the source to the target data store. This option is applicable to scenarios where it is acceptable for the data updated in the System of Record to be provided to other applications after a finite time window (e.g. one day). In such scenarios, the data is transferred on an incremental basis from the System of Record to the ODS. This data sharing option involves capturing changed data from one or more source applications and then transporting this data to one or more target operations in batch. This is graphically depicted in Figure 5. Typical considerations in this option include identifying a batch transfer window that is conducive to both the source and target system(s) to extract and transport the data.
There are two ways to accomplish this:
1. Change Log: The System of Record maintains the changed data in dedicated record sets so that the Batch Transfer Program can directly read these record sets to obtain the delta since the last transfer. In this case, the System of Record is responsible for identifying the changed data in real time as and when the change happens.
2. Comparison to previous: The Batch Transfer Program leverages the data in the base record sets within the System of Record to identify the changed content. In this case, the Batch Transfer Program has the responsibility of comparing the current state of data with earlier states to determine what has changed in the interim.
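The "Comparison to previous" mechanism amounts to diffing two snapshots of the base record sets. A minimal sketch, assuming records are keyed by a unique ID:

```python
def changed_records(previous, current):
    """Compare two snapshots (dicts keyed by record ID) and return the delta
    a Batch Transfer Program would carry to the target: inserts, updates,
    and deletes since the previous run."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    updates = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    deletes = set(previous) - set(current)
    return inserts, updates, deletes


# Snapshot taken at the last transfer vs. the current state.
prev = {1: "Lead A", 2: "Lead B"}
curr = {1: "Lead A", 2: "Lead B (qualified)", 3: "Lead C"}
ins, upd, dels = changed_records(prev, curr)
```

The trade-off the text describes is visible here: the program must retain (or re-read) the previous state, which is the overhead the Change Log variant avoids.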
The typical sequence of events for this kind of data sharing is as follows:
1. Front End Application initiates update to the System of Record.
2. Batch Transfer Program fetches changed data from the System of Record.
3. Batch Transfer Program updates the Operational Data Store.
4. An acknowledgement is sent to the Front End Application, System of Record and/or the Batch Transfer Program after the Operational Data Store has been successfully updated.
Usage Scenario – Service Provider
The sales force uses a sales leads database that tracks all the leads that the sales representatives are pursuing. The project delivery unit tracks the resources required for sales and delivery related activities. The project delivery unit maps resource requirements to existing projects as well as leads currently in progress. To that end, the leads data is transferred on a daily basis from the sales leads database to the project delivery database through the incremental batch transfer option.
Option 4: Native Replication
Option 4 is a data-driven option that is especially relevant for high-availability situations, e.g., emergency services, where the source and target data stores need to stay in sync virtually all the time. This data sharing option involves the use of native features of database management systems (DBMS) to reflect changes in one or more source databases to one or more target databases. This could happen either in (near) real time or in batch mode. The typical sequence of events for native replication is:
1. Front End Application initiates update to the System of Record.
2. Native Replication transfers data from the System of Record to the Operational Data Store.
3. Operational Data Store sends an acknowledgement of receipt of data back to the System of Record.
4. The System of Record sends an acknowledgement of the success of the operation to the Front End Application.
[Figure 5. Incremental Batch Transfer]
[Figure 6. Native Replication]
“The enterprise-wide data model functions like a virtual database. In some respects, it is a view, in relational database terms, on tables spread across multiple physical databases.”
Usage Scenario – Health Care Payer
Claims data is entered through a two-tier client-server application to a backend RDBMS by Customer Service Representatives. Updates to the Customer Profile are also made in the System of Record while entering data about the claims. Customer Profile updates are directly replicated into the ODS, which serves as the Customer Information File for all the other enterprise applications.
Option 5: Bulk Refresh using Batch File Transfer
This option is data-driven and appropriate when a large amount of data, for example, a reference table of product data, needs to be periodically brought into sync with the System of Record. This option transfers all the data, inclusive of the latest changes, on a periodic basis. All the records are extracted from the System of Record and refreshed into the ODS. Existing records in the ODS are purged during each transfer. Such transfers are typically done in batch mode overnight.
Bulk Refresh is well suited for scenarios where there is significant overhead involved in identifying and propagating the incremental changes. The incremental approach can be more error prone and, therefore, maintenance intensive. These types of transfers can be accomplished in one of two ways:
– Option 5a: File Extract: A program in the System of Record extracts all the records into an intermediate file. This file is subsequently loaded into the ODS by another program.
– Option 5b: Program Extract: A separate program queries the System of Record and transfers each record in real time to the ODS. There is no intermediate file created.
Option 5a: File Extract with Full Refresh
Figure 7 illustrates the file-based extraction process for bulk transfer of data. The following steps are executed in this process:
[Figure 7. File Extract]
[Figure 8. Program Extract]
1. Front End Application initiates update to the System of Record.
2. System of Record acknowledges the update.
3. All records are extracted into an Extract File from the System of Record.
4. Extract File is refreshed into the ODS.
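The purge-and-reload cycle of Option 5a can be sketched as below. This is an illustrative sketch using a CSV string as the intermediate "file"; the function names and the list-based stores are invented for the example.

```python
import csv
import io


def extract_to_file(system_of_record):
    """Step 3: dump every record from the System of Record into an
    intermediate extract (here, a CSV string standing in for a file)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for rec in system_of_record:
        writer.writerow(rec)
    return buf.getvalue()


def refresh_ods(ods, extract):
    """Step 4: purge the ODS and reload it in full from the extract."""
    ods.clear()                                   # existing records are purged
    for row in csv.reader(io.StringIO(extract)):
        ods.append(tuple(row))


sor = [("P1", "Widget"), ("P2", "Gadget")]
ods = [("P1", "Widget (stale)")]                  # stale copy to be replaced
refresh_ods(ods, extract_to_file(sor))
```

Note that nothing here identifies what changed: the simplicity of transferring everything is exactly the appeal of Bulk Refresh over change capture.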
Option 5b: Program Extract with Full Refresh
Figure 8 illustrates the program-based extraction process for bulk transfer of data. The following steps are executed in this process:
1. Front End Application initiates update to the System of Record.
2. System of Record acknowledges the update.
3. Extract and Load Program retrieves and updates all the records from the System of Record into the ODS.
Unlike the File Extract, retrieval from the System of Record and updates into the ODS are part of a single transaction with no intermediate persistence of the data. The Extract and Load Program can be triggered at fixed time intervals or on the occurrence of specific events. For instance, it can run four times a day, or on updates to a critical master table. While this is architecturally similar to Option 3: Incremental Batch Transfer (see Figure 5), the scope is different: here, all data from the System of Record is transferred to the ODS, rather than just an incremental change.
Usage Scenario – Large Enterprise HR Department
Large international enterprises with thousands of employees have an organizational hierarchy that is spread wide and deep across the globe. A minor change to this hierarchy can have a ripple effect across the organizational layers. While the organizational structure is maintained in a single repository, it is used in a read-only mode by other applications from the Operational Data Store. The organizational structure, thus, must be fully refreshed on a regular basis in the Operational Data Store.
Option 6: ETL/ELT Transfer
Option 6, illustrated in Figure 9, is data-driven and most appropriate where substantial data scrubbing and transformation are required as the data are moved, e.g., for integration into a data warehouse or data mart. This option overlaps with both Option 3: Incremental Batch Transfer and Option 5: Bulk Refresh transfers. The difference is that business logic is applied to the data while it is transported from source to target systems. An ETL tool is often used for this kind of data transfer. Source data is extracted, transformed en route, and then loaded into one or more target databases. The transformations performed on the data represent the business rules of the organization. The business rules ensure that the data is standardized, cleaned and possibly enhanced through aggregation or other manipulation before it is written to the target database(s).
ETL transfer involves the following steps:
1. Front End Application initiates update to the System of Record.
2. ETL Transfer Program fetches changed or bulk data from the System of Record.
3. ETL Transfer Program updates the Operational Data Store.
4. An acknowledgement is sent to the Front End Application, System of Record and/or the ETL Transfer Program after the Operational Data Store has been successfully updated.
The same applies to ELT transfer as well. The difference between ETL and ELT lies in the environment in which the data transformations are applied. In traditional ETL, the transformation takes place while the data is en route from the source to the target system. In ELT, the data is loaded into the target system and then transformed within the target system environment. This has recently become a popular option since there are significant efficiencies that can be realized by manipulating data within database environments (for example, by using stored procedures).
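The ETL/ELT distinction can be made concrete in a few lines. This is a deliberately simplified sketch: lists stand in for the source and target stores, and a single `standardize` rule stands in for the organization's business rules.

```python
def etl(source, target, transform):
    """ETL: each record is transformed en route, then loaded."""
    for rec in source:
        target.append(transform(rec))


def elt(source, target, transform):
    """ELT: records are loaded raw first, then transformed inside
    the target environment (analogous to a stored procedure)."""
    target.extend(source)                        # load
    target[:] = [transform(r) for r in target]   # transform in place


# An illustrative standardization rule: trim and upper-case names.
standardize = lambda r: {**r, "name": r["name"].strip().upper()}

src = [{"name": " acme "}, {"name": "globex"}]
via_etl, via_elt = [], []
etl(src, via_etl, standardize)
elt(src, via_elt, standardize)
```

Both paths yield the same target data; what differs is where the transformation work runs, which is what drives the efficiency argument in the text.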
Usage Scenario – Health Care Provider
Employers send entitlement information for employees and their dependents to Health Care Insurance Payers on a weekly basis, recording all the changes that happened each week. The incoming data is in a format proprietary to the employer that needs to be converted into the Health Care Provider's backend mainframe system's format. Summary records have to be
[Figure 9. ETL Transfer]
created that list the number of dependents and children that each employee has. ETL tools can be used to perform these format and content transformations in batch mode.
Option 7: Enterprise Information Integration
This option is an emerging one and is similar to Business Process Review. It involves the creation of a logical enterprise-wide data model that represents the key business entities and their relationships in a consistent, standardized fashion. The Enterprise Information Integration layer where this model resides has the business intelligence to do the following:
– Determine the repository that has the most accurate value for each data element.
– Construct the result set by fetching the right information from the right repository.
– Propagate updated information to all the affected repositories so that they are in a synchronized state all the time.
– Provide an enterprise-wide view for all the business entities.
The enterprise-wide data model functions like a virtual database. In some respects, it is a view, in relational database terms, on tables spread across multiple physical databases.
As part of its information integration responsibilities, the Enterprise Information Integration (EII) layer can propagate the information to the ODS and the System of Record, ensuring that they are synchronized. This is illustrated in Figure 10.
The following execution steps are involved when the EII option is exercised:
1. Front End Application initiates update to the System of Record through the EII layer.
2. EII layer updates the System of Record.
3. EII layer updates the Operational Data Store.
4. Upon successful completion of both updates, the EII layer sends the acknowledgement back to the Front End Application.
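The two responsibilities of the EII layer in the steps above – fanning updates out to every repository, and answering reads from the repository judged most accurate – can be sketched as follows. The class shape and the `authority` mapping are assumptions for illustration, not a description of any EII product.

```python
class EIILayer:
    """Illustrative EII layer: synchronizes every repository it fronts
    and answers reads from the designated authoritative source."""

    def __init__(self, repositories, authority):
        self.repositories = repositories   # name -> dict acting as a store
        self.authority = authority         # field -> name of the best source

    def update(self, key, record):
        # Steps 2-3: update System of Record and ODS so they stay in sync.
        for store in self.repositories.values():
            store[key] = dict(record)
        return "ack"                       # Step 4: acknowledge the caller.

    def read(self, key, field):
        # Route the read to the repository with the most accurate value.
        source = self.repositories[self.authority[field]]
        return source[key][field]


eii = EIILayer({"sor": {}, "ods": {}}, authority={"balance": "sor"})
ack = eii.update("C1", {"balance": 250})
```

Seen this way, the "virtual database" of the text is just the pair of routing tables: one for writes (all repositories) and one for reads (the authoritative one per field).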
Compound Scenarios
Apart from the Sample Scenario described at the beginning of this paper and the usage scenarios described under each option, there are complex situations where the various options for data transfer need to be evaluated carefully and a combination of the relevant ones applied. These scenarios include, but are not limited to:
– Populating a DW or an ODS with data from operational systems
– Populating data marts from a DW or an ODS
– Back-propagating integrated data into applications
– Combinations of application-to-application and application-to-ODS data transfers
The first three of these scenarios can be handled using Business Process Review and/or Option 1: EAI Real-time Transfer through Option 7: Enterprise Information Integration described above. Application-to-application scenarios involve a mix of the above options, and two types are discussed here in detail.
[Figure 10. Enterprise Information Integration]
“The most appropriate option for an environment is based on the data transfer requirements and constraints specific to that environment.”
Option 8a: Application-to-Application Transfer with Cross-reference
Option 8a is appropriate when the EAI tool must perform a simple lookup during data transfer. For example, while transferring data from a Sales application (X) to a Finance application (Y), the current account code based on the transaction type in the Sales transaction must be looked up and added to the transaction during transfer.
The business requirement in this scenario, graphically depicted in Figure 11, is to transfer data from application X to application Y. As part of this transfer, there must be manipulations performed on the data that require cross-reference tables (like looking up codes and translating them into meaningful values in the target system). While real-time EAI transfer can effect the transfer of data from application X to application Y, ETL transfer can be used to transfer cross-reference data from these systems into a cross-reference data construct (represented as XREF in the diagram).

Note: Option 5a: File Extract with Full Refresh or Option 5b: Program Extract with Full Refresh could also be used to update the XREF table.
Option 8b: Application-to-Application Transfer with Static Data
Option 8b represents a situation where the data from application X must be augmented with data from application Z during transfer to application Y. For example, a transaction from the Sales application (X) must be augmented by product cost data from the Inventory application (Z) during transfer into the Finance application (Y).
In this scenario, depicted in Figure 12, data is transferred from application X to Y. At the same time, updating application Y also involves receiving other data from secondary applications that are static – or at least relatively static compared to the real-time nature of the transfer from X to Y. Here, EAI is used to achieve the transfer of some of the data from X to Y. ETL transfer is used to prepare and provide the additional data that application Y requires from a secondary application (Z) into an ODS. EAI then fetches the additional data from the ODS to populate application Y.

Note: Any one of Option 6: ETL/ELT Transfer through Option 7: Enterprise Information Integration could be used for the update of the ODS.
Options Analysis
The most appropriate option for an environment is based on the data transfer requirements and constraints specific to that environment. There are several procedural, architectural and financial criteria that have to be taken into account while determining the most suitable option for an environment. This section outlines the key criteria to be considered, followed by a ranking of each option in the context of these criteria. While there may very well be other applicable options or combinations of these options as discussed under Compound Scenarios, this section focuses on the basic options (1 through 6) described earlier.
Business Process Review and Enterprise Information Integration have been excluded from the analysis since they do not actually involve the transfer of data.
These criteria can be classified into Requirements and Constraints as shown in Table 1. Requirements are typically architectural in nature, driven by business needs. Constraints define the parameters within which the solution must be architected, keeping the overall implementation and maintenance effort in mind.
[Figure 11. A2A Transfer with Cross-Reference]
[Figure 12. A2A Transfer with Static Data]
Table 2 outlines the characteristics of Option 1: EAI Real-time Transfer through Option 6: ETL/ELT Transfer in the context of these criteria. Please note that Business Process Review and Option 7: Enterprise Information Integration have not been analyzed in Table 2. Business Process Review is a revision to the existing business processes that may result in the implementation of any one of the other options. Option 7: Enterprise Information Integration has to do with the logical representation of information at an enterprise level. Any one of Option 1: EAI Real-time Transfer through Option 6: ETL/ELT Transfer may be used in conjunction with the EII model.
Criterion        Category      Description
Latency          Requirement   How quickly is the data to be transferred?
Transformation   Requirement   Complexity of the transformation to be performed as part of the transfer
Volume           Requirement   Quantity of data exchanged during each transfer
Intrusion        Constraint    Degree of change to existing applications in order to effect data transfer
Effort           Constraint    Effort required to build and maintain the solution

Table 1. Evaluation Criteria
Table 2. Options Evaluation

Criterion        1 – EAI Real-    2 – EAI Propagation of   3 – Incremental   4 – Native        5 – Bulk Refresh       6 – ETL/ELT
                 Time Transfer    Incremental Records      Batch Transfer    Replication       with Batch Transfer

Latency          Near Real Time   Near Real Time           Batch             Near Real Time    Batch                  Batch
Transformation   High             High                     Medium            Not Applicable    Not Applicable         High
Volume           Low              Low                      Medium            Low               High                   High
Intrusion        High             Medium                   High              Low               Low                    Low
Effort           High             Medium                   Medium            Low               Low                    High
E G Nadhan
Principal, EDS
[email protected]

E G Nadhan is a Principal with the EDS Extended Enterprise Integration group. With over 20 years of experience in the software industry, Nadhan is responsible for delivering integrated EAI and B2B solutions to large-scale customers.
Conclusion
There are many approaches available to enterprises for effecting data transfer between and among their business applications. Enterprises should first review the business process to confirm the necessity of the transfer. Once confirmed, there are multiple options, enabled by EAI and ETL technologies, to effect the data transfer. In some cases, a combination of options might be needed to address the complete set of data transfer requirements within an enterprise. The process driving such transfers should establish the technology and the tool employed, rather than have the technology define the process. Large enterprises typically employ an optimal mixture of all three strategies: Business Process Review, EAI and ETL. Enterprise Information Integration is emerging as another viable option in this space. The right option or combination of options to be used for a given scenario depends upon several criteria, some of which are requirements-driven while others are constraints. This paper presents the most significant criteria to consider and provides an evaluation of each option based on these criteria.
Special Acknowledgement:
The authors thank Carleen Christner, Managing Consultant with the EDS Extended Enterprise Integration group, for her thorough review of the paper and the feedback she provided on its content and format.
Jay-Louise Weldon
Managing Consultant, EDS
[email protected]

Jay-Louise Weldon is a Managing Consultant with EDS' Business Intelligence Services group. Jay-Louise has over 20 years of experience with business intelligence solutions and database and system design.
JOURNAL3 | Messaging Patterns – Part 2 55
Introduction
In part one of this paper, published in issue 2 of JOURNAL, we described how messaging patterns exist at different levels of abstraction in SOA. Specifically, Message Type Patterns were used to describe different varieties of messages in SOA, Message Channel Patterns explained messaging transport systems, and finally Routing Patterns explained mechanisms to route messages between the Service Provider and Service Consumer. In this second part of the paper we will cover Contract Patterns, which illustrate the behavioral specifications required to maintain smooth communications between Service Provider and Service Consumer, and Message Construction Patterns, which describe creation of the message content that travels across the messaging system.
Contracts and Information Hiding
An interface contract is a published agreement between a service provider and a service consumer. The contract specifies not only the arguments and return values that a service supplies, but also the service's pre-conditions and post-conditions.
Parnas and Clements best describe the principles of information hiding:

"Our module structure is based on the decomposition criterion known as information hiding [IH]. According to this principle, system details that are likely to change independently should be the secrets of separate modules; the only assumptions that should appear in the interfaces between modules are those that are considered unlikely to change. Each data structure is used in only one module; one or more programs within the module may directly access it. Any other program that requires information stored in a module's data structures must obtain it by calling access programs belonging to that module." (Parnas and Clements 1984)[8]
Applying this statement to SOA, a service should never expose its internal data structures. Doing so causes unnecessary dependencies (tight coupling) between the service provider and its consumers. Internal implementation details are exposed when a parameterized interface design is mapped to the service's implementation aspects rather than to its functional aspects.
Contract Pattern
Problem:
How can behaviors be defined independent of implementations?

Solution:
The concept of an interface contract was added to programming languages like C# and Java to describe a behavior both in syntax and semantics. Internal data semantics must be mapped into the external semantics of an independent contract. The contract depends only on the interface's problem domain, not on any implementation details.

Interactions:
The methods, method types, method parameter types, and field types prescribe the interface syntax. The comments, method names, and field names describe the semantics of the interface. An object can implement multiple interfaces.
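The same contract idea can be expressed in languages without a dedicated interface keyword by using an abstract base class. An illustrative Python sketch; `QuoteService` and its method are invented for this example:

```python
from abc import ABC, abstractmethod


class QuoteService(ABC):
    """The contract: behavior is stated in the problem domain (symbols
    and prices), with no reference to how prices are stored."""

    @abstractmethod
    def quote(self, symbol: str) -> float:
        """Pre-condition: symbol is known. Post-condition: price > 0."""


class InMemoryQuoteService(QuoteService):
    """One implementation; its internal data structure stays hidden
    behind the contract, per the information-hiding principle."""

    def __init__(self):
        self._prices = {"MSFT": 27.5}   # a secret of this module

    def quote(self, symbol: str) -> float:
        return self._prices[symbol]


service: QuoteService = InMemoryQuoteService()
price = service.quote("MSFT")
```

Consumers depend only on `QuoteService`; the dictionary could be replaced by a database call without touching any consumer.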
Message Construction
The message itself is simply some sort of data structure – such as a string, a byte array, a record, or an object. It can be interpreted simply as data, as the description of a command to be invoked on the receiver, or as the description of an event that occurred in the sender. When two applications wish to exchange a piece of data, they do so by wrapping it in a message. Message construction introduces the design issues to be considered after generating the message. In this message construction patterns catalogue we present three important message construction patterns.
Correlation Identifier
Problem:
In any messaging system, a consumer might send several message requests to different service providers. As a result it receives several replies. There must be some mechanism to correlate the replies to the original requests.

Solution:
Each reply message should contain a correlation identifier: a unique ID that indicates which request message the reply is for. This correlation ID is generated based on a unique ID contained within the request message.
Interactions:
There are six parts to Correlation Identifier:
– Requestor – The consumer application.
– Replier – The Service Provider. It receives the request ID and stores it as the correlation ID in the reply.
– Request – A message sent from the consumer to the Service Provider containing a request ID.
Messaging Patterns in Service Oriented Architecture – Part 2
By Soumen Chatterjee
“A service should never expose its internal data structures.”
– Reply – A message sent from the Service Provider to the consumer containing a correlation ID.
– Request ID – A token in the request that uniquely identifies the request.
– Correlation ID – A token in the reply that has the same value as the request ID in the request.
Working Mechanism:
At creation time, a request message is assigned a request ID. When the service provider processes the request, it saves the request ID and adds that ID to the reply as a correlation ID, which makes request-reply matching possible. A correlation ID (and also the request ID) is usually placed in the message header rather than the body, and can be treated as metadata of the message.
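The mechanism can be sketched in a few functions. This is a minimal illustration, assuming dict-shaped messages with a `header`/`body` split; the function names are our own.

```python
import uuid


def make_request(body):
    # The request ID is metadata, so it lives in the header, not the body.
    return {"header": {"request_id": str(uuid.uuid4())}, "body": body}


def service_provider(request):
    # The replier copies the request ID into the reply as the correlation ID.
    return {"header": {"correlation_id": request["header"]["request_id"]},
            "body": f"processed:{request['body']}"}


def correlate(pending_requests, reply):
    # The requestor matches the reply back to its originating request.
    cid = reply["header"]["correlation_id"]
    return next(r for r in pending_requests
                if r["header"]["request_id"] == cid)


r1, r2 = make_request("order-1"), make_request("order-2")
reply = service_provider(r2)
matched = correlate([r1, r2], reply)
```

Even if replies arrive out of order or from different providers, `correlate` recovers the pairing without any help from the transport.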
Message Sequence
Problem:
Because of the inherently distributed nature of messaging, communication generally occurs over a network. You must use network bandwidth appropriately while maintaining the best performance. In certain scenarios (such as sending a list of invoices for a particular customer) you might need to send large amounts of data (100 MB or more). In such cases it is recommended to divide the data into smaller chunks and send the data as a set of messages. The problem then is how to rearrange the data chunks to form the whole set.
Solution:
Use a message sequence, and mark each message with sequence identification fields.
Interaction:
The three message sequence identification fields are:
– Sequence ID – Used to differentiate one sequence from another.
– Position ID – A unique ID identifying the relative position of a message within a particular sequence.
– End of Sequence indicator – Used to indicate the end of a sequence.

The sequences are typically designed such that each message in a sequence indicates the total size of the sequence; that is, the number of messages in the sequence (see Figure 26).
As an alternative, you can design the sequences such that each message indicates whether it is the last message in that sequence (see Figure 27).

Let's take a real-life example. Suppose we want to generate a report for all invoices from 01/01/2001 to 31/12/2003. This might return millions of records. To handle this scenario, divide the timeframe into quarters and return data for each quarter. The sender sends the quarterly data as messages, and the receiver uses the sequence number to reassemble the data and identifies the completion of the received data based on the End of Sequence indicator.
Message Expiration
Problem:
Messages are stored on disk or persistent media. With a growing number of messages, disk space is consumed. At the end of the messaging life cycle, messages should be expired and destroyed to reclaim disk space.
[Figure 25: Correlation Identifier]
[Figure 26: Message Sequence indicating size]
[Figure 27: Message Sequence with message end indicator]
“Message construction introduces the design issues to be considered after generating the message. Sender and receiver disagreements on the format of the message are reconciled by message transformation.”
Solution:
Set the message expiration to specify a time limit for the preservation of messages on persistent media.

Interaction:
A message expiration is a timestamp (date and time) that determines the lifetime of the message.

When a message expires, the messaging system might simply discard it or move it to a dead letter channel.
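An expiration sweep of the kind described can be sketched as below; the channel is modeled as a list of messages carrying an `expires_at` timestamp, and the dead-letter route is chosen over discarding. All names are illustrative.

```python
def sweep(channel, dead_letter, now):
    """Move expired messages to the dead letter channel; keep the rest.
    'now' is injected so the sweep is deterministic and testable."""
    live = [m for m in channel if m["expires_at"] > now]
    dead_letter.extend(m for m in channel if m["expires_at"] <= now)
    channel[:] = live   # reclaim space on the channel in place


channel = [{"id": 1, "expires_at": 100}, {"id": 2, "expires_at": 300}]
dead = []
sweep(channel, dead, now=200)
```

A real messaging system would run such a sweep periodically (or check on delivery); the policy decision is simply discard versus dead-letter.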
Message Transformation
Various applications might not agree on the format for the same conceptual data; the sender formats the message one way, but the receiver expects it to be formatted another way. To reconcile this, the message must go through an intermediate conversion procedure that converts the message from one format to another. Message transformation might involve data changes (data addition, data removal, or temporary data removal) in existing nodes by implementing business rules. Sometimes it might enrich an empty node as well. Here we present a few important message transformation patterns.
Envelope Wrapper
Problem:
When one message format is encapsulated inside another, the system might not be able to access node data. Most messaging systems allow components (for example, a content-based router) to access only data fields that are part of the defined message header. If one message is packaged into a data field inside another message, the component might not be able to use those fields to perform routing or business-rule-based transformation. Therefore, some data fields might have to be elevated from the original message into the message header of the new message format.

Solution:
Use an envelope wrapper to wrap data inside an envelope that is compliant with the messaging infrastructure. Unwrap the message when it arrives at the destination.
Interactions:
The process of wrapping and unwrapping a message consists of five steps:
1. The message source publishes a message in its raw format.
2. The wrapper takes the raw message and transforms it into a message format that complies with the messaging system. This may include adding message header fields, encrypting the message, adding security credentials, etc.
3. The messaging system processes the compliant messages.
Figure 28: Message Expiration (a consumer's request expires in the channel, and delivery is rerouted to the dead letter channel instead of the intended service provider)
Figure 29: Envelope Wrapper (a request and its reply pass through the wrapper, the messaging system, and the unwrapper, following the five steps described)
4. The resulting message is delivered to the unwrapper. The unwrapper reverses any modifications the wrapper made. This may include removing header fields, decrypting the message, or verifying security credentials.
5. The message consumer receives a 'clear text' message.

An envelope typically wraps both the message header and the message body, or payload. We can think of the header as the information on the outside of the envelope – it is used by the messaging system to route and track the message. The contents of the envelope are the payload, or body – the messaging infrastructure does not care about it until it arrives at the destination.
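The wrap and unwrap steps can be sketched in Python. The envelope layout, the header field names, and the use of base64 encoding here are illustrative assumptions; a real infrastructure would define its own envelope schema and might encrypt rather than merely encode the body.

```python
import base64

def wrap(raw_message: str, message_id: str) -> dict:
    """Step 2: place the raw message inside an infrastructure-compliant
    envelope, promoting routing data into the header and encoding the body."""
    return {
        "header": {"message_id": message_id, "content_type": "text/plain"},
        "body": base64.b64encode(raw_message.encode("utf-8")).decode("ascii"),
    }

def unwrap(envelope: dict) -> str:
    """Step 4: reverse the wrapper's modifications to recover the raw message."""
    return base64.b64decode(envelope["body"]).decode("utf-8")
```

Components between the wrapper and the unwrapper (a content-based router, for example) can read `envelope["header"]` without ever touching the opaque body.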
Content Enricher
Problem:
Let's consider an example. An online loan processing system receives information including a customer credit card number and an SSN. To complete the approval process, it needs to perform a complete credit history check. However, the loan processing system doesn't have the credit history data. How do we communicate with another system if the message originator does not have all the required data fields available?

Solution:
Use a specialized transformer, a content enricher, to access an external data source in order to enrich a message with missing information.

Interactions:
The content enricher uses information embedded inside the incoming message to retrieve data from an external source. After successfully retrieving the required data from the resource, it appends the data to the message.

The content enricher is used on many occasions to resolve reference IDs contained in a message. To keep messages small, manageable, and easy to transport, we very often pass simple object references or keys rather than a complete object with all its data elements. The content enricher retrieves the required data based on the object references included in the original message.
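A minimal sketch of the loan example: the enricher uses the SSN embedded in the message as the key into an external resource and appends what it finds. The `CREDIT_BUREAU` lookup table and all field names are hypothetical stand-ins for a real external service.

```python
# Hypothetical external resource, keyed by SSN.
CREDIT_BUREAU = {"123-45-6789": {"credit_score": 712}}

def enrich(message: dict, resource: dict = CREDIT_BUREAU) -> dict:
    """Content enricher: use the reference embedded in the incoming message
    to fetch the missing data and append it to a copy of the message."""
    enriched = dict(message)  # leave the original message untouched
    enriched.update(resource[message["ssn"]])
    return enriched
```

In practice the resource would be a database query or a service call, and a missing key would need explicit error handling.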
Content Filter
Problem:
The content enricher helps us in situations where a message receiver requires more (or different) data elements than are contained in the original message. There are surprisingly many situations where the reverse is desired: the removal of data elements from a message. Data is removed from the original message to simplify message handling, remove sensitive security data, and reduce network traffic. Therefore, we need to simplify incoming documents to include only the elements we are actually interested in.

Solution:
Use a content filter to remove unimportant data items from a message.

Interactions:
The content filter not only removes data elements but also simplifies the message structure. Many messages originating from external systems or packaged services contain multiple levels of nested, repeating groups because they are modeled after generic, normalized database structures. The content filter flattens this complex nested message hierarchy. Multiple content filters can be used to break one complex message into individual messages that each deal with a certain aspect of the large message.
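Both responsibilities, dropping unwanted elements and flattening nesting, can be sketched together. The dotted-path naming convention (`"customer.name"`) is an illustrative choice, not something the pattern prescribes.

```python
def content_filter(message: dict, keep: set) -> dict:
    """Keep only the elements of interest, flattening nested groups
    into dotted-path keys along the way."""
    flat = {}

    def walk(node, prefix=""):
        for key, value in node.items():
            name = prefix + key
            if isinstance(value, dict):
                walk(value, name + ".")   # descend into the nested group
            elif name in keep:
                flat[name] = value        # retain only requested elements
    walk(message)
    return flat
```

Sensitive fields such as an SSN simply never make it into the output because they are absent from the `keep` set.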
Claim Check
Problem:
A content enricher enriches message data, and a content filter removes unneeded data items from a message. Sometimes, however, the scenario might be a little different. Moving large amounts of data via messages might be inefficient due to network limitations or hard limits on message size, so we might need to temporarily remove fields for specific processing steps where they are not required, and add them back into the message at a later point.

Solution:
Store message data in a persistent store and pass a claim check to subsequent components. These components can use the claim check
Figure 30: Content Enricher (the enricher combines a simple message with data fetched from a resource to produce an enriched message)
Figure 31: Content Filter (the content filter reduces a simple message to a filtered message)
to retrieve the stored information using a content enricher.

Interactions:
The Claim Check pattern consists of the following five steps:
1. A message with data arrives.
2. The 'check luggage' component generates a unique key that is used at a later stage as the claim check.
3. The check luggage component extracts the data and stores it in the persistent store under the unique key.
4. It removes the persisted data from the message and adds the claim check.
5. The checked data is retrieved by using a content enricher to fetch the data based on the claim check.
This process is analogous to a luggage check at the airport. If you do not want to carry your luggage with you, you simply check it at the airline counter. In return you receive a sticker on your ticket with a reference number that uniquely identifies each piece of luggage you checked. Once you reach your final destination, you can retrieve your luggage.
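The five steps above can be sketched as a pair of functions. The in-memory `DATA_STORE` dictionary stands in for the persistent store, and the function and field names are illustrative; a real implementation would use a database or file store shared by the participating components.

```python
import uuid

DATA_STORE = {}  # stands in for the persistent store

def check_luggage(message: dict, payload_field: str) -> dict:
    """Steps 1-4: generate a unique key, persist the bulky field under it,
    and replace the field in the message with a claim check."""
    key = str(uuid.uuid4())
    slim = dict(message)
    DATA_STORE[key] = slim.pop(payload_field)  # extract and store the data
    slim["claim_check"] = key                  # add the claim check
    return slim

def redeem(message: dict, payload_field: str) -> dict:
    """Step 5: a content enricher retrieves the stored data via the claim check."""
    full = dict(message)
    full[payload_field] = DATA_STORE[full.pop("claim_check")]
    return full
```

Components between `check_luggage` and `redeem` handle only the slim message, which is the whole point of the pattern.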
Conclusion
SOA stresses interoperability: the ability of different platforms and languages to communicate with each other. Today's enterprise needs a technology-neutral, fabricated solution to orchestrate business processes across the verticals. SOA, then, represents a shift from the traditional paradigm of enterprise application integration (EAI), in which automating a business process required specific connectivity between applications. According to Robert Shimp, vice president of Technology Marketing at Oracle:

"EAI requires specific knowledge of what each application provided ahead of time. SOA views each application as a service provider and enables dynamic introspection of services via a common service directory, Universal Description, Discovery and Integration of Web services (UDDI)." [10]
Messaging is the backbone of SOA. Steven Cheah, director of Software Engineering and Architecture at Microsoft Singapore, states:

"We now finally have a standard vehicle for achieving SOA. We can now define the message standards for SOA using these Web services standards." [10]

Cheah considers SOA 'a refinement of EAI'. Specifically, SOA recommends some principles which actually help achieve better application integration. These principles include: the description of services by the business functions they perform; the presentation of services as loosely coupled functions, with the details of their inner workings not visible to the parties who wish to use them; the use of messages as the only way 'in' or 'out' of the services; and federated control of the SOA across organizational domains, with no single party having total control of it.

We started at the ten-thousand-foot level with a vision of the service-oriented enterprise. We then descended through a common architecture (SOA) and proceeded by outlining messaging. Now we are armed with the messaging patterns necessary to attack the complexities of SOA and to achieve the vision of a dynamic, process-oriented service bus enterprise.
“SOA is ‘a refinement of EAI’. Specifically, SOA recommends some principles, which actually help achieve better application integration.”
Figure 32: Claim Check (the check luggage component persists the data from a simple message in the data store and emits a message with a claim check; a content enricher later uses the claim check to produce the enriched message)
Copyright Declaration
G Hohpe & B Woolf, ENTERPRISE INTEGRATION PATTERNS (adapted material from pages 59-83), © 2004 Pearson Education, Inc. Reproduced by permission of Pearson Education, Inc., publishing as Pearson Addison Wesley. All rights reserved.
References
1. Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Gregor Hohpe and Bobby Woolf, Addison-Wesley, 2004
2. Service Oriented Architecture: A Primer, Michael S Pallos, EAI Journal, December 2001
3. Solving Information Integration Challenges in a Service-Oriented Enterprise, ZapThink Whitepaper, http://www.zapthink.com
4. SOA and EAI, De Gamma Website, http://www.2gamma.com/en/produit/soa/eai.asp
5. Introduction to Service-Oriented Programming, Guy Bieber and Jeff Carpenter, Project Openwings, Motorola ISD, 2002
6. Java Web Services Architecture, James McGovern, Sameer Tyagi, Michael Stevens, and Sunil Mathew, Morgan Kaufman Press, 2003
7. Using Service-Oriented Architecture and Component-Based Development to Build Web Service Applications, Alan Brown, Simon Johnston, and Kevin Kelly, IBM, June 2003
8. The Modular Structure of Complex Systems, Parnas D and Clements P, IEEE Journal, 1984
9. Design Patterns: Elements of Reusable Object-Oriented Software, Gamma E, Helm R, Johnson R, and Vlissides J, Addison-Wesley, 1994
10. Computerworld Year-End Special: 2004 Unplugged, Vol. 10, Issue No. 10, 15 December 2003 – 6 January 2004, http://www.computerworld.com.sg/pcwsg.nsf/currentfp/fp
11. Applying UML and Patterns: An Introduction to OOA/D and the Unified Process, Craig Larman, 2001
Soumen Chatterjee, [email protected]

Soumen is a Microsoft Certified Professional and Sun Certified Enterprise Architect. He has been significantly involved in enterprise application integration and distributed object-oriented system development using Java/J2EE technology, serving global giants in the finance and health care industries. With expertise in EAI design patterns, messaging patterns, and testing strategies, he designs and develops scalable, reusable, maintainable, and performance-tuned EAI architectures. Soumen is an admirer of the extreme programming methodology and has primary interests in AOP and EAI. Besides software, Soumen likes movies and music, and follows mind power technologies.
Microsoft is a registered trademark of Microsoft Corporation
Executive Editor & Program Manager
Arvindra Sehmi, Architect, Developer and Platform Evangelism Group, Microsoft EMEA, www.thearchitectexchange.com/asehmi

Managing Editor
Graeme Malcolm, Principal Technologist, Content Master Ltd

Editorial Board
Christopher Baldwin, Principal Consultant, Developer and Platform Evangelism Group, Microsoft EMEA
Gianpaolo Carraro, Architect, Developer and Platform Evangelism Group, Microsoft EMEA
Simon Guest, Program Manager, Developer and Platform Evangelism Group, Architecture Strategy, Microsoft Corporation, www.simonguest.com
Wilfried Grommen, General Manager, Business Strategy, Microsoft EMEA
Richard Hughes, Program Manager, Developer and Platform Evangelism Group, Architecture Strategy, Microsoft Corporation
Neil Hutson, Director of Windows Evangelism, Developer and Platform Evangelism Group, Microsoft Corporation
Eugenio Pace, Program Manager, Platform Architecture Group, Microsoft Corporation
Harry Pierson, Architect, Developer and Platform Evangelism Group, Architecture Strategy, Microsoft Corporation, devhawk.net
Michael Platt, Architect, Developer and Platform Evangelism Group, Microsoft Ltd, blogs.msdn.com/michael_platt
Philip Teale, Partner Strategy Manager, Enterprise Partner Group, Microsoft Ltd

Project Management
Content Master Ltd, www.contentmaster.com

Design Direction
venturethree, London, www.venturethree.com
Orb Solutions, London, www.orb-solutions.com

Orchestration
Katharine Pike, WW Architect Programs Manager, Developer and Platform Evangelism Group, Architecture Strategy, Microsoft Corporation

Foreword Contributor
Harry Pierson, Architect, Developer and Platform Evangelism Group, Architecture Strategy, Microsoft Corporation, devhawk.net
The information contained in this Microsoft® Architects Journal ('Journal') is for information purposes only. The material in the Journal does not constitute the opinion of Microsoft or Microsoft's advice and you should not rely on any material in this Journal without seeking independent advice. Microsoft does not make any warranty or representation as to the accuracy or fitness for purpose of any material in this Journal and in no event does Microsoft accept liability of any description, including liability for negligence (except for personal injury or death), for any damages or losses (including, without limitation, loss of business, revenue, profits, or consequential loss) whatsoever resulting from use of this Journal. The Journal may contain technical inaccuracies and typographical errors. The Journal may be updated from time to time and may at times be out of date. Microsoft accepts no responsibility for keeping the information in this Journal up to date or liability for any failure to do so. This Journal contains material submitted and created by third parties. To the maximum extent permitted by applicable law, Microsoft excludes all liability for any illegality arising from or error, omission or inaccuracy in this Journal and Microsoft takes no responsibility for such third party material.

All copyright, trade marks and other intellectual property rights in the material contained in the Journal belong, or are licenced, to Microsoft Corporation. Copyright © 2003. All rights reserved. You may not copy, reproduce, transmit, store, adapt or modify the layout or content of this Journal without the prior written consent of Microsoft Corporation and the individual authors. Unless otherwise specified, the authors of the literary and artistic works in this Journal have asserted their moral right pursuant to Section 77 of the Copyright Designs and Patents Act 1988 to be identified as the author of those works.