Introduction to Content Engineering
DESCRIPTION
This is an introductory tutorial that presents, in a whirlwind fashion, the core concepts underlying Content Engineering.
TRANSCRIPT
Copyright © Stilo International plc 2008
An Introduction to Content Engineering
Joe Gollner, VP e-Publishing Solutions
Introduction to Content Engineering: Topics
What is Content?
Content Engineering & the Content Processing Roadmap
The Business Context of Content Engineering
Aims:
  Establish the nature of, and need for, Content Engineering
  Define a rubric of terminology for the tools and techniques that constitute a practical working framework for discussing, designing, developing and deploying content management and processing systems
What is Content?
Content is how we Communicate
Narrative Structures
Implied Associations
Associative Memory, Acquired Perspectives, Imperfect Expression
Associative Memory, Acquired Perspectives, Imperfect Interpretation
Content is the physical form of human communication
Content is meaningful because it entails context
Content is typically serialized due to the ways we express, store and interpret information
The Document as the Popular Face of Content
The document has proven to be a powerful device for communicating and retaining content
While documents provide effective physical containers for content, they also lead to multiple modes of exchange and potential obsolescence
Content is Everywhere
This has been true since the dawn of civilization and its importance grows daily
Content populates an ecosystem where people receive, internalize, modify, create and share that content. Content connects everything.
The Truth about Content
We are faced with:
  Massively expanding content volumes
  Diversifying venues for content delivery
  Proliferating format varieties
  Rising expectations of users
  Escalating specialization of content
  Evolving interconnectedness of content
  Multiplying problems related to content security
  Continuing lifecycle challenges (obsolescence remains a risk)
  Increasing complexity of content (the reintegration of data & documents)
  Growing recognition of the central importance of content
What Lies Ahead?
What are the biggest challenges you face today in managing and using content?
What do you suspect will be the biggest challenge you will be facing in the next five years?
What are the opportunities emerging to leverage content in your business?
An Essential Response: Content Engineering
Working Definition
The application of rigorous engineering discipline to the design, development and deployment of content management and processing systems
Distinguishing Features
  Systematic approach
  Progressive use of technology
  Awareness of:
    Lifecycle considerations
    Total cost of ownership
    Solution scalability
Engineering and Content
  Organizing work
  Laying out work spaces
  Sequencing of process steps
  Optimizing tasks
  Refining tools
  Improving materials
  Transferring results between stages
  Sharing resources
  Performing maintenance
  Troubleshooting problems
Differential Analyzer – Vannevar Bush (1930s)
Content Engineering
  Content Engineering: Governing discipline, Goal-directed
  Content Management: Protect Value
  Content Processing: Enhance Value
  People: Create Value (Planning, Designing, Authoring, Editing)
Content Management Components
Content Management
  Control: Organize resources, access and lifecycle
  Change: Facilitate the evolution of content and the associated services
  Deploy: Enable the services the content makes possible
Content Management and Content Processing
A Close Relationship
  CM cannot exist without content processing services
  Expanding CM services demands more processing
  The sophistication of the processing functions increases more rapidly than that of the management functions
  Many CMS solutions are constrained by weak content processing capabilities
Content Processing Components
Content Processing: Convert, Transform, Publish
Key Focus in Content Engineering
Transformation breaks down into: Refactor, Relate, Collect, Resolve, Compile
Emphasis on leveraging efficient automation
The Content Processing Roadmap
[Diagram: the roadmap is laid out in three phases, ACQUIRE, ENRICH and DELIVER. Labeled steps: Convert, Refactor, Collect Metadata, Relate Links, Resolve, Compile, Publish. Other labels: Content Processing, Manage (Import, Select), CONTEXT, CONNECTIONS, CONTENT.]
Convert Content
[Diagram: the Content Processing Roadmap, shown again for the Convert step]
Converting Content
Conversion: changing the format of legacy content to make it increasingly suitable for efficient management, revision, reuse and publishing.
The Harsh Reality of Legacy Content
Legacy Content: all content resources that require modification in order to be useful
The Legacy Content Spectrum
  Opaque: Not directly processable (e.g., paper)
  Annoying: Aggressively proprietary; little or no predictability in usage
  Polluted: Normally processable but frequently filled with deviations & additions (HTML)
  Tolerable: Documented format that exposes format & structure in a processable form
Conversion Fundamentals
Conversion is unavoidable and always under-estimated
Conversion is fundamentally a matter of interpretation:
  Parsing the legacy format & layout
  Inferring a meaning from this information
  Correlating the format & layout to a target structure
  Addressing problems introduced by format peculiarities
  Leveraging the content itself to guide format interpretation
  Enhancing interpretive rules by matching content patterns
Automating conversion typically relies on two stages:
  Format Interpreter that can make sense of source formatting
  Rules-based Correlation Processor that maps content into structures
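The two stages named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the legacy line format (style codes such as `.H1` and `.P`), the `RULES` table, the target element names and both function names are hypothetical and do not come from the presentation.

```python
import re
from xml.etree import ElementTree as ET

# Stage 1: a format interpreter that makes sense of source formatting.
# Assumption: each legacy line starts with a style code such as ".H1" or ".P".
def interpret(lines):
    for line in lines:
        match = re.match(r"^\.(\w+)\s+(.*)$", line)
        if match:
            yield match.group(1), match.group(2)
        else:
            # Unrecognized formatting is kept and flagged for later analysis.
            yield "UNKNOWN", line

# Stage 2: a rules-based correlation processor that maps styles to structure.
# The rules are plain data, so they can be revised between conversion runs.
RULES = {"H1": "title", "P": "para", "N": "note"}

def correlate(events, rules=RULES):
    root = ET.Element("topic")
    issues = []
    for style, text in events:
        tag = rules.get(style)
        if tag is None:
            issues.append((style, text))  # no rule matched: report, do not guess
            continue
        ET.SubElement(root, tag).text = text
    return root, issues

legacy = [".H1 Pump Maintenance", ".P Close the valve.", "stray text"]
topic, issues = correlate(interpret(legacy))
xml_out = ET.tostring(topic, encoding="unicode")
```

Keeping the rules as data rather than code means a run that reports unmatched lines in `issues` can be followed by a rule change and a re-run, without touching the interpreter.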
Conversion Process Template
[Diagram: an iterative conversion process. Labels: Legacy Source Content; Source Analysis; Source to Target Mapping; Target XML Schema; Existing Conversion Rules; Modify Conversion Process; Modified Conversion Rules; Execute Conversion Process; Result Analysis; Identified Issues; Validation & Verification; Application Tests; Manual Editing Guidance; Subject Matter Experts (Interaction); Complete. Data sets: Example Set, Sample Set 10%, Complete Set 100%. Steps numbered 1, 2, 3.]
Refactor Content
[Diagram: the Content Processing Roadmap, shown again for the Refactor step]
Refactoring Content
Refactoring: restructuring content, without loss of meaning, to improve its suitability for management, maintenance and specifically reuse.
Aspects of Refactoring
Refactoring breaks down into two tasks: Bursting and Normalization
  Content Bursting: Decomposing content into components optimized for reuse
  Content Normalization: Systematic removal of redundancies to improve maintainability
Challenges
  Ensuring content components remain meaningful & manageable
  Maintaining a complete equivalence with the original
  Adapting the linking mechanisms so they remain valid and functional
    Usually entails introduction of an indirect referencing scheme
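Bursting, normalization and indirect referencing can be illustrated with a small Python sketch. It assumes a document is simply a list of section strings and uses a truncated SHA-1 digest as the component identifier; the data model and function names are hypothetical choices for the example, not a method described in the presentation.

```python
import hashlib

def burst(document):
    """Split a document (a list of section strings) into components."""
    return [section.strip() for section in document if section.strip()]

def normalize(documents):
    """Store each distinct component once and replace copies with references."""
    store = {}      # component id -> text (the single maintained copy)
    manifests = {}  # document name -> ordered list of component ids
    for name, document in documents.items():
        refs = []
        for component in burst(document):
            cid = hashlib.sha1(component.encode("utf-8")).hexdigest()[:8]
            store.setdefault(cid, component)
            refs.append(cid)
        manifests[name] = refs
    return store, manifests

def reassemble(name, store, manifests):
    """Resolve the indirect references back into the original sequence."""
    return [store[cid] for cid in manifests[name]]

docs = {
    "manual_a": ["Warning: disconnect power.", "Remove the cover."],
    "manual_b": ["Warning: disconnect power.", "Replace the filter."],
}
store, manifests = normalize(docs)
```

Comparing `reassemble(name, store, manifests)` with the original document is one simple way to check that equivalence with the original has been maintained after refactoring.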
Refactoring Strategies
Strategy needed to ensure adequate returns on investment
Refactor content that undergoes the highest rates of change first
[Diagram labels: Conversion, Compare, Outputs]
Collect Metadata
[Diagram: the Content Processing Roadmap, shown again for the Collect Metadata step]
Collecting Metadata
Metadata: a set of data that provides information about other data.
Collecting Metadata: extracting, validating, integrating, supplementing, synchronizing and storing metadata from, and about, the content.
The Function of Metadata
Metadata is used to make the context of content explicit
Used to facilitate:
  Control: Security, Limitation of rights, Orderly storage & retrieval
  Discovery: Searching, Navigating
  Exchange
Surprisingly important point
The boundary between metadata and content is never completely clear
Yale University Library
The Storage of Metadata
Useful Design Pattern: Detachable Metadata
  Key metadata clustered into a document sub-component
  Shareable amongst many uses
  Incorporated into document when important to do so & only then
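One way to picture the detachable metadata pattern is a shared metadata cluster that documents reference by key and embed only on demand. The sketch below is an assumption-laden illustration: the dictionary layout, the `metadata_ref` field and the `attach_metadata` function are hypothetical names invented for the example.

```python
import copy

# A metadata cluster kept outside any one document so many uses can share it.
SHARED_METADATA = {
    "product-x": {"owner": "Engineering", "classification": "internal"},
}

def attach_metadata(document, metadata_store=SHARED_METADATA):
    """Return a copy of the document with its referenced metadata embedded."""
    resolved = copy.deepcopy(document)
    key = resolved.pop("metadata_ref")
    resolved["metadata"] = dict(metadata_store[key])
    return resolved

doc_a = {"id": "a", "metadata_ref": "product-x", "body": "Install guide"}
doc_b = {"id": "b", "metadata_ref": "product-x", "body": "Release notes"}

# Embed the metadata only when it matters, for example at export time.
exported = attach_metadata(doc_a)
```

The stored documents keep only the reference, so a change to the shared cluster reaches every document that uses it.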
Ontologies, Taxonomies & Metadata
The Meaning of Metadata
  Metadata categories and values relate content to aspects of an Ontology
  The Ontology provides the context for metadata
Ontologies
  Describe a domain of knowledge
  Can be used as the basis of:
    Taxonomies (classification schemes)
    Link networks
    Context driven navigational aids
[Diagram labels: Ontology, Taxonomy, Link Network, Topic, metadata]
Establish Relationships
[Diagram: the Content Processing Roadmap, shown again for the Relate Links step]
Establishing Relationships
Explicit Links (Actual): columns Identifier, Source, Target, Type; rows A1, A2
Implicit Links (Potential): columns Identifier, Source, Target, Type; rows B1, B2
Reuse Links (Physical): columns Identifier, Resource, Request, Condition; rows R1, R2
Links: the connections or relationships between things that represent a significant portion of the meaning and value of content
Link Management
Increasingly important
Increasingly complex
Link Analysis
  Significant processing
  Leverages external storage of links & link metadata
Link generation becoming critical
[Diagram labels: Link Base, Outbound Link, Inbound Link, Transclusion Link, Bidirectional External Link, metadata]
Link Analysis:
  Outbound Links: Intact or broken
  Transclusions: Where used
  Inbound Links: Track-back / Where cited
  External Links: Network participation
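The link analysis questions above (intact or broken, where used, where cited) become simple queries once links are stored outside the content. The following Python sketch assumes a link base held as a list of tuples shaped like the Identifier / Source / Target / Type tables; the sample records, topic names and the `analyze` function are hypothetical.

```python
from collections import defaultdict

# A link base: links stored outside the content they connect.
# Each record is (identifier, source, target, type). All values are made up.
LINKS = [
    ("A1", "topic-1", "topic-2", "reference"),
    ("A2", "topic-1", "topic-9", "reference"),     # topic-9 does not exist
    ("R1", "topic-3", "topic-2", "transclusion"),
]
KNOWN_TOPICS = {"topic-1", "topic-2", "topic-3"}

def analyze(links, known):
    report = {
        "broken": [],                      # outbound links with no target
        "where_used": defaultdict(list),   # transclusions, keyed by target
        "where_cited": defaultdict(list),  # inbound references, keyed by target
    }
    for ident, source, target, kind in links:
        if target not in known:
            report["broken"].append(ident)
        elif kind == "transclusion":
            report["where_used"][target].append(source)
        else:
            report["where_cited"][target].append(source)
    return report

report = analyze(LINKS, KNOWN_TOPICS)
```

Because the links live in one place, a single pass answers all three questions without opening any content file.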
Deliver Content
[Diagram: the Content Processing Roadmap, shown again for the Resolve, Compile and Publish steps]
Delivering Content
Resolve: assemble content and instantiate applicable relationships
Compile: convert resolved content into a form suitable for rendition
Publish: render the content in the forms required by the context
The Goal: High Fidelity Automation
Delivery Processing
Assembling the inputs:
  Content requested
  Supporting assets
  Applicable stylesheets & rules
Resolve into a processable whole
Compile formattable content representations
Publish final formatted renditions
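The Resolve, Compile and Publish sequence can be sketched as three small functions chained together. This is a minimal illustration, not the tooling behind the slides: the `TOPICS` store, the `OUTPUT_PLAN` structure, the block representation and the two output targets are all assumptions made for the example.

```python
from html import escape

# Hypothetical content store and output plan (a map listing topics to assemble).
TOPICS = {"intro": "Welcome.", "setup": "Plug it in."}
OUTPUT_PLAN = {"title": "Quick Start", "map": ["intro", "setup"]}

def resolve(plan, topics):
    """Assemble the requested content into one processable whole."""
    return {"title": plan["title"],
            "sections": [(key, topics[key]) for key in plan["map"]]}

def compile_content(resolved):
    """Convert resolved content into a neutral, formattable representation."""
    blocks = [("heading", resolved["title"])]
    blocks += [("paragraph", text) for _, text in resolved["sections"]]
    return blocks

def publish(blocks, target):
    """Render the formattable blocks for a specific delivery context."""
    if target == "html":
        tags = {"heading": "h1", "paragraph": "p"}
        return "".join(f"<{tags[k]}>{escape(v)}</{tags[k]}>" for k, v in blocks)
    if target == "text":
        return "\n".join(v.upper() if k == "heading" else v for k, v in blocks)
    raise ValueError(f"unsupported target: {target}")

blocks = compile_content(resolve(OUTPUT_PLAN, TOPICS))
html_out = publish(blocks, "html")
text_out = publish(blocks, "text")
```

Because the compiled blocks carry no formatting decisions, the same resolved content feeds both renditions, which is the point of separating Compile from Publish.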
[Diagram labels: Deliver (Resolve, Compile, Publish); Content; Assets; Rules; XHTML; Templates; Output Plan (Map & View); Transformations; Output Variants; Render; Print Publishing (PDF); Web Publishing (Portal / Portable); Output Print Products; Output Web Products]
Content Processing & Validation
Validation
  Essential capability
  Enables consistent processing
  Streamlines processes
Validation must be: Accurate, Manageable, Informative, Actionable, Pro-active, Continuously improving
Validate & Transform: Simple
Content Validation
  DTD structural rules
  Instance conformance
Content Transformation
  Traditionally focused on arranging content for formatting
  Supporting primarily structural manipulation
Validated Outputs
  Inputs to rendition processes
  HTML outputs
  XML outputs
Validate & Transform: Complex
Content Validation & Verification
  Schema structural rules
  Rules governing content values
  Instance conformance
Content Transformation
  Continuous process of improvement
  Parse, validate, align, verify…repeat
  Manipulation of many content types
Validated Outputs
  Inputs to rendition processes
  HTML outputs
  XML outputs
  Data outputs for applications
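The difference between structure validation and content verification can be shown with a short Python sketch. It uses a simplified dictionary in place of a real DTD or schema, and a single value rule; the element names (`proc`, `step`, `action`, `torque`), the allowed range and the `check` function are all hypothetical.

```python
from xml.etree import ElementTree as ET

# Structure rules: which child elements each element must contain, in order.
# A simplified stand-in for a schema, not a real schema language.
STRUCTURE = {"step": ["action", "torque"]}

# Content rules: checks on values that structural validation cannot express.
def torque_in_range(element):
    return 0 < float(element.findtext("torque")) <= 200

CONTENT_RULES = {"step": [("torque must be in (0, 200]", torque_in_range)]}

def check(xml_text):
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        expected = STRUCTURE.get(element.tag)
        if expected is None:
            continue
        actual = [child.tag for child in element]
        if actual != expected:
            problems.append(f"structure: <{element.tag}> has {actual}")
            continue  # do not verify values of a structurally invalid element
        for message, rule in CONTENT_RULES.get(element.tag, []):
            if not rule(element):
                problems.append(f"content: {message}")
    return problems

good = "<proc><step><action>Tighten</action><torque>45</torque></step></proc>"
bad_value = "<proc><step><action>Tighten</action><torque>900</torque></step></proc>"
bad_shape = "<proc><step><torque>45</torque></step></proc>"
```

Here `bad_value` is structurally valid and would pass a purely structural check, while the content rule catches the out-of-range value; `bad_shape` fails before any value is examined.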
[Diagram labels: Content Instance, Schema Rules, Structure Validation, Content Verification, Transformation Processing, Outputs]
Complexity and the Cost of Quality
Complexity is inherent in the nature of content
Increasing content complexity increases the amount and sophistication of content processing tasks
Increases in content processing tasks result in a significant increase in the total cost of quality
Solution Architectures
  Assembles components to provide integrated services
  Technology selection & integration
  Standards selection & integration
  Multiple solution instances will exist
[Diagram labels: Solution Architectures; Content Engineering; Content Management; Content Processing (Convert, Transform, Publish); Refactor, Relate, Collect, Resolve, Compile; Validate]
Managing Solution Risk
Integration risk represents:
  The potential loss of services
  The potential loss of assets
Integration risk increases with the number of technologies used to build a solution
System complexity:
  Can be managed
  Ultimately limits solution affordability and even viability
  Addressed in design selections
Technology Selection
Key Considerations
  Solution context
  Scored against requirements
  Scoring scale: 0 – No Fit to 6 – Total Fit
  Results weighed against acquisition cost
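A scoring exercise of this kind can be sketched in Python. The 0 to 6 scale comes from the slide; everything else is an assumption for illustration, including the requirement names, the weights, the sample scores and costs, and in particular the choice to weigh results against cost as a simple fit-per-cost ratio.

```python
def fit_score(scores, weights):
    """Weighted average fit on the 0 (No Fit) to 6 (Total Fit) scale."""
    for value in scores.values():
        if not 0 <= value <= 6:
            raise ValueError("scores must lie between 0 and 6")
    total_weight = sum(weights.values())
    return sum(scores[req] * weights[req] for req in weights) / total_weight

def rank(candidates, weights):
    """Order candidates by fit per unit of acquisition cost, best first."""
    rows = []
    for name, (scores, cost) in candidates.items():
        fit = fit_score(scores, weights)
        rows.append((name, fit, fit / cost))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical requirements, weights, scores and acquisition costs.
weights = {"validation": 3, "scalability": 2, "authoring": 1}
candidates = {
    "tool_a": ({"validation": 6, "scalability": 4, "authoring": 2}, 100.0),
    "tool_b": ({"validation": 3, "scalability": 3, "authoring": 6}, 50.0),
}
ranking = rank(candidates, weights)
```

In this made-up data `tool_a` has the better fit but `tool_b` ranks first once cost is taken into account, which shows why the two numbers are worth reporting side by side rather than collapsing them too early.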
Technology Lifecycle Considerations
Solution context includes: Urgency, Complexity, Criticality, Constraints
Projected lifecycle: Expected lifespan, Rate of change, Influencing factors
[Chart: Measuring Overall Productivity over Time; axis labels: Time, Complexity (Low to High)]
Solution Component Dependencies
Because all components within a solution evolve, their inter-dependencies require explicit description and management.
[Diagram labels: Media Sources, Data Sources, Import Sources, Content Files, Schemas, Process Rules, Style Sheets, Processing Scripts, Document Templates, Configuration Files, Relationships, Structure Maps, Log Reports, Quality Reports, Analysis Reports]
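Describing dependencies explicitly can be as simple as recording them as a graph and querying it. The Python sketch below borrows component names from the slide labels, but the specific dependency edges and both helper functions are assumptions made for illustration. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each component maps to the components it depends on. The specific edges are
# assumptions; only the component names come from the slide labels.
DEPENDS_ON = {
    "content_files": {"schemas"},
    "processing_scripts": {"schemas", "process_rules"},
    "quality_reports": {"processing_scripts", "content_files"},
    "style_sheets": {"schemas"},
}

def build_order(graph):
    """Return an order in which every component follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())

def impacted_by(graph, changed):
    """Return every component that directly or indirectly depends on `changed`."""
    impacted, frontier = set(), {changed}
    while frontier:
        frontier = {node for node, deps in graph.items()
                    if deps & frontier and node not in impacted}
        impacted |= frontier
    return impacted

order = build_order(DEPENDS_ON)
schema_impact = impacted_by(DEPENDS_ON, "schemas")
```

With the graph written down, the question "what must be re-tested if the schemas change?" has a mechanical answer instead of relying on memory.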
Evaluating Standards as Potential Tools
  Independence: From parochial interests, proprietary claims, external influences
  Formality: Of creation, validation, approval & modification process
  Stability: Of standard over time & the backward compatibility of changes
  Completeness: Sufficiency for declared scope as well as availability of useful documentation & reference implementations
  Adoption: Extent of support amongst tool vendors, authorities & users
  Practicality: The extent to which all, or parts, of the standard can be deployed
Evaluating a Specialized Industry Standard
Scenario
  Industry specification
  Broad scope
  Specialized stakeholder community
  Continuously changing & expanding
Strategy
  Implement where necessary
  Address risk areas
Evaluating a Cross-Industry Standard
Scenario
  Addressing widespread issues
  Broad stakeholder community
  Mature
  Further capabilities emerging
Strategy
  Plan for adoption
  Consider for use in variety of areas
Content Solution Architecture Framework
[Diagram labels: Content Architecture; Enterprise; Programs; Domains; Inputs; Outputs; Document Sources; Ontology Sources; Data Sources; Active; External; Legacy; Users; Authors; Subject Matter Experts; Administrators; Information Architects; Developers; Tools; Content Management; Content Processing; Content Authoring; Development Tools; Web Services; Resources; Budget; Personnel; Infrastructure; Mechanisms; Controls; Specialized Models; Rules; Publishing Services; Discovery Services; Data Services; Web Application; Integrate]
Content Architecture
  Establishes governing model of the knowledge domain
    The knowledge that has informed the content
    The knowledge being encapsulated in the solutions
  Supports multiple solution instances
[Diagram labels: Content Architecture; Solution Architectures; Content Engineering; Content Management; Content Processing (Convert, Transform, Publish); Refactor, Relate, Collect, Resolve, Compile; Validate]
The Central Role of the Content Architecture
[Diagram labels: Content Architecture; Specialized Information Types; Specialized Domains; Specialized Delivery Processes; Specialized Taxonomies; Topic; Concept; Reference; Task; Procedure; Description; Data; Effectivity; Formatting; Annotation; Change; Service Requirements; Discovery Requirements]
Content Solution Design Principles
The nature of content demands an adaptable architecture
Technology components should be loosely-coupled
  Content must always be available in its simplest self-describing form
Data stores should be replaceable by stored instances
  True for content, metadata and links
Content processing events can be performed many ways
  Simple methods must be present, sophisticated methods may be
All interfaces established as the exchange of validated content
  Processing rules are, themselves, managed & processable content
Content Processing should be extensively leveraged
  Content validation, analysis and reporting at every stage
  Used to manage & optimize solution components to improve efficiency
Content Engineering Maturity Model
Modeled on the Software Engineering Institute's (SEI) Capability Maturity Model Integration (CMMI)
  “managed” used instead of “quantitatively managed” for level 4
  “repeated” used instead of “managed” for level 2
  “reactive” used instead of “performed” for level 1
Objective: Follow software engineering in emphasizing the importance of formalization & quantitative methods for continuous improvement
Level  Content Engineering Maturity Model
5      Optimized
4      Managed
3      Defined
2      Repeated
1      Reactive
0      Incomplete
CE Maturity Model: Level 0 Incomplete
Incomplete
  Often the complete absence of a documented process
  A process that is documented but not followed also qualifies
Features
  New requirements addressed using available tools
  Each solution seeks cost minimization
  No persistent infrastructure
  No improvement between projects
CE Maturity Model: Level 1 Reactive
Reactive
  A process exists for specific goals
  Sufficient for the needs of selected products
  Not institutionalized and not integrated with institutional processes
Features
  Not designed to handle new or changing requirements
  Can result in multiple solutions each created as a reaction
CE Maturity Model: Level 2 Repeated
Repeated
  A managed process exists and is supported by basic infrastructure
  Predictability can be achieved in process performance & products
  Reviews are conducted to identify & initiate improvements
Features
  A common set of tools has been selected
  Procedures exist for steps
  Solution components documented
CE Maturity Model: Level 3 Defined
Defined
  Standardization in processes established on an institutional level
  Common tools & techniques used across processes & projects
Features
  A single infrastructure used to support multiple processes & projects
  Processes defined with reference to enterprise models
  Interrelationships are known
CE Maturity Model: Level 4 Managed
Managed
  Processes are managed using quantitative measurement
  Automation is maximized in the execution of process steps
  A single integrated & managed environment supports all processes
Features
  Infrastructure components managed as content with automation used to adapt behaviour
  High levels of quality sustained
CE Maturity Model: Level 5 Optimized
Optimized
  Continuous orientation towards improvement
  Continuous refactoring of solution and content to achieve efficiencies
  Continuous identification & implementation of heightened standards
Features
  Systematic analysis & correction of variations
  Proactive identification of new products & services that can be offered
  Industry innovation
General Observations
Content is inherently complex
Current trends have moved content to the center of attention
Content Engineering is an essential response
  Provides the necessary discipline & the conceptual framework
  Content has not typically received this level of attention in the past
Effective Content Processing is central to success
  Content Management services are enabled by content processes
  Adaptive content processing is essential for addressing change
Effective Content Solutions are designed to cover the complete content lifecycle and all stakeholder perspectives
The efficient management and processing of content remains an elusive goal for most organizations
Content Engineering and Business Value
The design of Content Solutions should:
  Continuously minimize the costs of acquiring, enriching, managing and delivering content
  Continuously improve content resources through enrichment
  Continuously increase the benefits realized through the delivery of content
  Continuously reduce risks threatening content assets or the services being supported
Each of these represents an increase in value
Top Ten Secrets of Content Solution Success
  Don’t underestimate your content or your business
  Don’t underestimate the power of good automation
  Choose an appropriate tool set and validate your choices
  Don’t invest in content management technology too early
  Carefully plan and execute migration activities
  Take a “customer service” focus in delivering tangible benefits (new products / services) from your investments
  Be demanding of your suppliers (expect quality)
  Engage your stakeholders and “take control” of the solution
  Leverage standards, don’t be enslaved by them
  Be an active part of the community as a way to learn and as a way to share what you have learned
The End
Admittedly an awful lot to cover in a single go. Hopefully some of the ideas connect with some of your experiences and perhaps help in framing aspects of your next project.
Joe Gollner
VP e-Publishing Solutions
Stilo International