Data: Application requirements, data flow, and person registry
Tom Barton
University of Chicago
CAMP Directory Workshop Feb 3-6, 2004
Copyright Tom Barton 2004. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.
Outline
Three stages of managing identity information
1. Feeding the person registry: integrating identity from many authoritative sources
2. Processes & business logic at the person registry
3. Feeding consumers of identity information
Some examples sprinkled in
Selected policy & process issues (time permitting)
Core middleware for an integrated architecture
Potential sources of identity info
“Big” administrative systems: student systems, payroll/HR systems, academic records systems, financials, telecom mgmt system, alumni systems, library systems, …
“Small” sources: affiliated organizations with fairly simple administrative operations (Excel?)
Collateral operational systems: application-specific directories/databases, NOS directories, campus card systems, other metadirectory/ID Mgmt operations
People’s heads: “ad hoc” affiliations, self, proxies
UofC sources: now
Student info & campus card system by live RDBMS views
Payroll & faculty by periodic batches
Dozen or so “small feeds” by aperiodic upload
Self
Trusted Agents to make temporary and “pre-feed” accounts
370 or so departmental directory reviewers
Network security group
UofC sources: planning or earnest discussion
Feed from UC Hospitals
Alumni system
Select distributed IT support staff (mail & password resets)
Potentially anyone to manage ad hoc groups
Feed mechanics
Source system selection criteria
– Express the set of affiliation types or constituencies authoritatively represented in the source
– Affiliation indicator attributes
Format & transmission technology
– Complete selections vs. differentials vs. transactions
– Automated vs. semi-manual (e.g., maildrop) vs. manual
– scp flatfiles, live views, varieties of EAI (what are you using?)
– Actual metadirectory products (what are you using?)
– Ad hoc record structure, XML (what are you doing?)
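To make “complete selections vs. differentials” concrete: even when a source can only deliver complete selections, the registry side can derive a differential by comparing consecutive selections. The sketch below assumes each selection is a mapping from a source-specific key to an attribute dictionary; the record layout, field names, and the `diff_selections` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping

# Hypothetical record shape: source-specific key -> attribute dictionary.
Selection = Mapping[str, Mapping[str, str]]


@dataclass
class Differential:
    added: Dict[str, Mapping[str, str]] = field(default_factory=dict)
    dropped: Dict[str, Mapping[str, str]] = field(default_factory=dict)
    changed: Dict[str, Dict[str, tuple]] = field(default_factory=dict)


def diff_selections(previous: Selection, current: Selection) -> Differential:
    """Derive a differential from two consecutive complete selections."""
    result = Differential()
    for key, record in current.items():
        if key not in previous:
            result.added[key] = record
            continue
        old = previous[key]
        # Keep (old, new) pairs for every attribute whose value differs.
        delta = {
            attr: (old.get(attr), record.get(attr))
            for attr in set(old) | set(record)
            if old.get(attr) != record.get(attr)
        }
        if delta:
            result.changed[key] = delta
    for key, record in previous.items():
        if key not in current:
            result.dropped[key] = record
    return result


if __name__ == "__main__":
    yesterday = {
        "E100": {"name": "Ada Lovelace", "dept": "Mathematics"},
        "E200": {"name": "Alan Turing", "dept": "Physics"},
    }
    today = {
        "E100": {"name": "Ada Lovelace", "dept": "Computer Science"},
        "E300": {"name": "Grace Hopper", "dept": "Physics"},
    }
    d = diff_selections(yesterday, today)
    print(sorted(d.added), sorted(d.dropped), d.changed)
    # prints ['E300'] ['E200'] {'E100': {'dept': ('Mathematics', 'Computer Science')}}
```

The trade-off is storage and compute on the registry side in exchange for a simpler extract on the source side.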
Identity Matching
Matching strategies
– Match personal IDs for each source record
– Per-source shared identifier with prior matching
– Broadly used institutional identifier with prior matching
The query “is this person new?” is resolved somewhere, somehow.
– Inaccurate answers spoil the 1–1 relationship between registry objects and real-world subjects
– It’s worthwhile to think about how to improve it!
Insert “rational” ID Mgmt spiel here …
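The three matching strategies can be read as a cascade for answering “is this person new?”. The sketch below is one hypothetical arrangement, not UofC’s or any product’s logic: it tries a prior match on the per-source identifier, then on a broadly used institutional identifier, and only then compares personal data, reporting a single personal-data hit as merely “probable” and several hits as “ambiguous”. All class, field, and function names, and the use of name plus birth date as the personal-data key, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SourceRecord:
    source: str                      # e.g. "payroll" (hypothetical)
    source_id: str                   # identifier local to that source
    institutional_id: Optional[str]  # broadly used ID, if the source carries one
    name: str
    birth_date: str


@dataclass
class Registry:
    # (source, source_id) -> registry key, from prior matching
    crosswalk: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # institutional ID -> registry key, from prior matching
    by_institutional_id: Dict[str, str] = field(default_factory=dict)
    # (normalized name, birth date) -> candidate keys (assumed match data)
    by_personal_data: Dict[Tuple[str, str], List[str]] = field(default_factory=dict)


def match_person(registry: Registry, rec: SourceRecord) -> Tuple[str, Optional[str]]:
    """Return ("existing", key), ("probable", key), ("ambiguous", None) or ("new", None)."""
    # 1. Per-source shared identifier with prior matching.
    key = registry.crosswalk.get((rec.source, rec.source_id))
    if key:
        return "existing", key
    # 2. Broadly used institutional identifier with prior matching.
    if rec.institutional_id:
        key = registry.by_institutional_id.get(rec.institutional_id)
        if key:
            return "existing", key
    # 3. Fall back to comparing personal data, which is weaker evidence.
    candidates = registry.by_personal_data.get(
        (rec.name.strip().lower(), rec.birth_date), [])
    if len(candidates) == 1:
        return "probable", candidates[0]   # candidate for a secondary check
    if candidates:
        return "ambiguous", None           # more than one candidate: do not guess
    return "new", None
```

Keeping the weak personal-data result distinct from identifier-based matches gives a place to route doubtful answers to a person before they spoil the 1–1 relationship.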
Identity matching at UofC: now
SSN
StudentID (after prior match by SSN)
“CorpID” (mangling of substrings of lastname, SSN)
Several options for identifying “self” as authoritative source
Identity matching at UofC: upcoming (dose of rationality)
UCID (SSN replacement) assigned as unique key in payroll & student systems at record creation time
Person registry is authoritative source of UCID
“Is this person new?” is answered when a new record is to be created in payroll or student systems
Tightly-coupled and loosely-coupled designs are being considered
UC Hospitals feed might also use a similar design
Canonicalization
Provide simpler, consistent representation of certain data
– Name
– Phone number(s)
– Address(es)
– Department names
– Names of “major” affiliations
Transformation rules and business logic
– Which source trumps name
– Phone & address mappings
– Rules to determine expressed affiliations
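As an illustration of such transformation rules, the sketch below canonicalizes phone numbers and applies a “which source trumps name” rule. It assumes North American ten-digit numbers and a fixed source-precedence list; both the output format and the `NAME_PRECEDENCE` order are hypothetical choices, not a recommendation.

```python
import re
from typing import Dict, Optional, Sequence

# Hypothetical precedence: earlier sources trump later ones for the name.
NAME_PRECEDENCE = ("self", "payroll", "student", "small_feed")


def canonical_phone(raw: str) -> Optional[str]:
    """Normalize a North American number to +1-NNN-NNN-NNNN, else None."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # leave for review rather than guess
    return f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def canonical_name(names_by_source: Dict[str, str],
                   precedence: Sequence[str] = NAME_PRECEDENCE) -> Optional[str]:
    """Pick the name from the highest-precedence source that supplies one."""
    for source in precedence:
        name = names_by_source.get(source, "").strip()
        if name:
            # Collapse internal runs of whitespace.
            return " ".join(name.split())
    return None
```

For example, `canonical_phone("(312) 555-0100")` yields `"+1-312-555-0100"`, and a name supplied only by the student system is used when neither “self” nor payroll provides one.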
Fat or thin?
Fat = contains selected data from sources
Thin = contains only links to sources
Issues with thin:
– Source system availability
– Source system security (apps need creds)
– App complexity (feed mechanics, identity matching, canonicalization rules)
– Policy complexity (authorize N apps to access M sources)
Issues with fat:
– Data freshness
– Downstream from canonicalization (usually a pro, but can be a con)
Most campuses are fat!
Functional requirements for a registry entry
Private primary key
– Never reassigned, never revoked
– Not used for any other purpose
– GUIDs are preferable to uniqueness within a database
Publicly visible key
– Available for sources or consumers to use to refer to the person (better than, say, a username)
– Probably a numeric string of at most 9 digits, to ensure that it fits in most predefined fields
– Reduces exposure in case of disaster with primary key
Crosswalk source- and consumer-specific identifiers
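A minimal sketch of these key requirements, assuming Python’s `uuid.uuid4()` for the never-reused private GUID and a hypothetical sequential allocator for the publicly visible ID (PVID), zero-padded to at most 9 digits. The crosswalk is modeled as a dictionary from (system, identifier) pairs to the private key; the `RegistryKeys` class and its methods are illustrative, not from any product.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Tuple

PVID_DIGITS = 9  # fits in most predefined numeric ID fields


@dataclass
class RegistryKeys:
    next_pvid: int = 1
    # (system, identifier) -> private GUID, for sources and consumers alike
    crosswalk: Dict[Tuple[str, str], str] = field(default_factory=dict)
    pvid_to_guid: Dict[str, str] = field(default_factory=dict)

    def new_person(self) -> Tuple[str, str]:
        """Mint a private GUID and a public 9-digit ID for a new entry."""
        if self.next_pvid >= 10 ** PVID_DIGITS:
            raise RuntimeError("PVID space exhausted")
        guid = str(uuid.uuid4())                 # private key, never reused
        pvid = str(self.next_pvid).zfill(PVID_DIGITS)
        self.next_pvid += 1                      # counter only moves forward
        self.pvid_to_guid[pvid] = guid
        return guid, pvid

    def link(self, system: str, identifier: str, guid: str) -> None:
        """Record a source- or consumer-specific identifier for a person."""
        self.crosswalk[(system, identifier)] = guid

    def resolve(self, system: str, identifier: str) -> str:
        """Look up the private key; raises KeyError if never linked."""
        return self.crosswalk[(system, identifier)]
```

A sequential allocator is the simplest choice; a randomized one would avoid revealing the order in which entries were created.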
Functional requirements for a registry entry
Personal information
– Answer the “is this person new?” query with sufficient accuracy
– Support account claiming, initialization, or re-initialization
Storage for whatever’s authoritative in the person registry
– Examples: support for provisioning, email, username(s)
Information obtained from source systems that is valuable to authorization or entitlement algorithms and policies
The entry and its principal identifiers and personal info (at least) are never deleted from the registry (except…)
Registry record structure at UofC
RDBMS (Sybase) with tables for:
– Each major source system
– One in which to collect all “small feeds”
– Individuals, one row per person
– Tracking usernames
– Supporting service baskets and (de-)provisioning
– Supporting the security model for registry operations
DB-local primary key (not a GUID), no PVID (publicly visible ID)
Records for “temporary” affiliations are removed
Logging & reporting requirements
Audit
– Who had which identifiers when
– State changes (when using a stateful provisioning model)
– Activity, to a degree
Diagnostic views/reports for selected helpdesk and operational staff
Refer requests for reports outside of the scope of IT operational needs to the data warehouse group!
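Auditing “who had which identifiers when” implies keeping validity intervals rather than overwriting values. A minimal sketch, assuming a hypothetical in-memory history of (identifier, person key, valid-from, valid-to) rows with half-open intervals, answers “who held username X on date D?”.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class IdentifierAssignment:
    identifier: str            # e.g. a username
    person_key: str            # registry primary key
    valid_from: date
    valid_to: Optional[date]   # None = still assigned


def holder_on(history: List[IdentifierAssignment],
              identifier: str, when: date) -> Optional[str]:
    """Return the person who held `identifier` on `when`, if anyone."""
    for row in history:
        if row.identifier != identifier:
            continue
        # Half-open interval: valid_from <= when < valid_to.
        if row.valid_from <= when and (row.valid_to is None or when < row.valid_to):
            return row.person_key
    return None
```

With this shape, a username reassigned after a quiet period still resolves to the right person for any date in the log.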
Provisioning strategy
Provisioning = maintenance of electronic ephemera required to facilitate users’ access to services
Format & transmission technology
– Incremental vs. differential vs. full consumer rebuilds
– Periodic vs. asynchronous updates
– Per-consumer or standard record formats
– Transmission techniques (what do you do?)
Provisioning strategy
Service baskets
– Business logic that determines which categories of people are entitled to participate in which services, with which service levels
– One aspect of a more inclusive access control architecture
– Examples: shell accounts & quotas, mailboxes, email forwarding, dialup profiles, VPN, wireless, computer registration, calendar, …
– Issue of excessive granularization
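Service baskets can be sketched as a table from major affiliation to the services, and service levels, it entitles; a person with several affiliations receives the union, keeping the most generous level for each service. The affiliations, service names, and numeric levels below are invented for illustration.

```python
from typing import Dict, Iterable

# Hypothetical baskets: affiliation -> {service: service level}.
# Levels are numeric so that "most generous" is simply the maximum.
SERVICE_BASKETS: Dict[str, Dict[str, int]] = {
    "faculty": {"mailbox": 2000, "shell": 500, "vpn": 1, "wireless": 1},
    "staff": {"mailbox": 1000, "vpn": 1, "wireless": 1},
    "enrolled student": {"mailbox": 500, "shell": 100, "wireless": 1},
    "alum": {"email forwarding": 1},
}


def entitlements(affiliations: Iterable[str]) -> Dict[str, int]:
    """Merge the baskets of every affiliation, keeping the highest level."""
    merged: Dict[str, int] = {}
    for affiliation in affiliations:
        for service, level in SERVICE_BASKETS.get(affiliation, {}).items():
            merged[service] = max(level, merged.get(service, 0))
    return merged
```

For a staff member who is also an enrolled student, this yields the staff mailbox level plus the student shell account. Each new service or level adds rows to this table, which is where excessive granularization starts to cost.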
Stateful provisioning
(Figure: state-transition diagram. Not shown: transitions to prospective state from grace, limbo, slide, IDonly.)
Independent variables for state transitions
state
substate
date the present state was reached
date by which the present state might end (expiration date)
major affiliation (faculty, staff, enrolled student, accepted student, registered student, alum, …)
list of the identifiers of resources being managed for this account
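A state-transition function can take exactly these independent variables as input. The sketch below borrows the state names prospective, grace, limbo, and IDonly from the stateful provisioning slide and adds an assumed “active” state; the transition rules and the 90-day periods are invented for illustration and are not the UofC model. Substate and the resource list are carried along but unused by these toy rules.

```python
from dataclasses import dataclass, field, replace
from datetime import date, timedelta
from typing import List, Optional

GRACE = timedelta(days=90)  # hypothetical grace period
LIMBO = timedelta(days=90)  # hypothetical limbo period


@dataclass(frozen=True)
class AccountState:
    state: str                        # e.g. "prospective", "active", "grace", ...
    substate: Optional[str]
    reached: date                     # date the present state was reached
    expires: Optional[date]           # date by which the present state might end
    major_affiliation: Optional[str]  # None = no current major affiliation
    resources: List[str] = field(default_factory=list)


def next_state(acct: AccountState, today: date) -> AccountState:
    """Apply one step of a toy transition policy."""
    def move(state: str, expires: Optional[date]) -> AccountState:
        return replace(acct, state=state, substate=None,
                       reached=today, expires=expires)

    affiliated = acct.major_affiliation is not None
    if affiliated and acct.state != "active":
        return move("active", None)           # gained or regained an affiliation
    if not affiliated and acct.state == "active":
        return move("grace", today + GRACE)   # lost affiliation: start grace period
    expired = acct.expires is not None and today >= acct.expires
    if acct.state == "grace" and expired:
        return move("limbo", today + LIMBO)
    if acct.state == "limbo" and expired:
        return move("IDonly", None)           # only the identity entry remains
    return acct                               # no transition today
```

Because each step records when the state was reached and when it may end, the expiration dates are what buy time to correct bad source data before services are actually withdrawn.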
Fault avoidance & recovery
Bad source data arrives: what happens?
Flux high water marks
– Hold update when number of changes exceeds threshold
– Possible on the source side, more often seen in consumer provisioning techniques
“Semantical filters”
– E.g., can absence from the HR feed mean anything other than they’re gone?
– Construct source filters based on knowledge of business practices that relate to selection criteria on the source system.
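A flux high water mark is simple to sketch: count the adds and drops a feed would cause and hold the whole update for review if the count exceeds a threshold. The sketch also includes a hypothetical semantical filter that assumes the HR extract’s selection criteria omit people on unpaid leave, so their absence is not read as departure. The 5% threshold, the floor of 50, and the `status` values are all assumptions.

```python
from typing import Mapping, Set

MAX_CHANGE_FRACTION = 0.05  # hypothetical high water mark: 5% of the population
MIN_THRESHOLD = 50          # hypothetical floor so small feeds do not trip it


class UpdateHeld(Exception):
    """Raised when a feed's flux exceeds the high water mark."""


def check_flux(previous: Mapping[str, object], current: Mapping[str, object]) -> int:
    """Count adds plus drops; raise UpdateHeld if the count is too high."""
    prev_keys, cur_keys = set(previous), set(current)
    changes = len(cur_keys - prev_keys) + len(prev_keys - cur_keys)
    limit = max(MIN_THRESHOLD, int(len(prev_keys) * MAX_CHANGE_FRACTION))
    if changes > limit:
        raise UpdateHeld(f"{changes} changes exceeds high water mark {limit}")
    return changes


def departures(previous: Mapping[str, Mapping[str, str]],
               current: Mapping[str, object]) -> Set[str]:
    """Semantical filter: absence means 'gone' unless the last record showed
    a status that the (assumed) extract criteria leave out."""
    excluded_statuses = {"unpaid leave"}  # hypothetical business practice
    return {
        key for key, rec in previous.items()
        if key not in current and rec.get("status") not in excluded_statuses
    }
```

A held update then waits for someone to confirm that, say, a sudden drop of 60 employees is real rather than a truncated extract.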
Fault avoidance & recovery
Person registry change log
– Enables rollback & replay of consumer updates
– Good diagnostic info
– Supports a “hit me with the new ones” incremental provisioning strategy
Stateful provisioning model can be constructed to ensure continuity of service & buy time to fix effects of bad source data
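A change log with monotonically increasing sequence numbers supports the “hit me with the new ones” strategy: each consumer remembers the last sequence number it applied and asks for everything after it, and replay is simply asking again from an earlier number. The minimal in-memory sketch below also undoes changes on a consumer’s copy of the data; the `ChangeLog` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    seq: int
    person_key: str
    attribute: str
    old: Optional[str]   # None = attribute was absent
    new: Optional[str]


@dataclass
class ChangeLog:
    entries: List[Change] = field(default_factory=list)

    def record(self, person_key: str, attribute: str,
               old: Optional[str], new: Optional[str]) -> int:
        """Append a change and return its sequence number."""
        seq = len(self.entries) + 1
        self.entries.append(Change(seq, person_key, attribute, old, new))
        return seq

    def since(self, last_seen: int) -> List[Change]:
        """'Hit me with the new ones': every change after last_seen."""
        return [c for c in self.entries if c.seq > last_seen]

    def rollback(self, state: Dict[Tuple[str, str], Optional[str]],
                 to_seq: int) -> None:
        """Undo changes after to_seq, newest first, on a consumer's copy."""
        for c in reversed(self.since(to_seq)):
            state[(c.person_key, c.attribute)] = c.old
```

After a rollback, the same `since()` call can replay corrected updates once the bad source data has been fixed.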
Expression of rules
Hard coded or abstracted rule syntax?
Rules for
– Affiliation
– State transitions
– Inclusion in service baskets
– Memberships in selected groups (“minor” affiliations, privilege classes)
Stanford, Memphis examples
– Rules expressed in terms of registry object methods
– External configuration file eval’d by the code that manages the registry
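The external-configuration approach can be sketched with rules kept as data and interpreted by registry code. This version deliberately swaps eval for parsed JSON with a tiny vocabulary (an `all` list of attribute-equality tests), so the rules file cannot run arbitrary code; the rule format, attribute names, and group names are assumptions and not the Stanford or Memphis syntax.

```python
import json
from typing import Dict, List, Mapping

# Hypothetical contents of an external rules file.
RULES_JSON = """
[
  {"group": "faculty",
   "all": [{"attr": "hr_status", "equals": "active"},
           {"attr": "employee_class", "equals": "academic"}]},
  {"group": "enrolled student",
   "all": [{"attr": "registered_this_term", "equals": "Y"}]}
]
"""


def load_rules(text: str) -> List[Dict]:
    """Parse the rules file; data only, nothing is executed."""
    return json.loads(text)


def evaluate(rules: List[Dict], person: Mapping[str, str]) -> List[str]:
    """Return every group whose conditions all hold for this person."""
    return [
        rule["group"] for rule in rules
        if all(person.get(cond["attr"]) == cond["equals"] for cond in rule["all"])
    ]
```

The trade-off against eval is expressiveness: anything beyond equality tests has to be added to the interpreter, but rule changes no longer require touching or trusting code.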
Common consumers
Minimum set of consumers & consumer technologies needed to meet application requirements!
– Authentication, attributes, groups, coordinated identity management
Types
– Generic LDAP (maybe >1 replication networks)
– Active Directory (maybe >1 consuming domain)
– Kerberos
– eDirectory, NIS, Ph, RDBMS (show hands?, others?)
– Applications as direct consumers
– Affiliated identity management operations
UofC consumers
Consumers
– OpenLDAP (1 replication network), Kerberos, Active Directory, NIS, Ph
– uid is RDN
– uid namespace issues: regular, temporary, hospital people
– Above with periodic diffs, high water hold, async self & management updates
– Peer ID Mgmt operations (periodic full)
Service baskets & statefulness being developed
– Manual quarterly account closures suit UofC culture
– Automated stateful approach to loss of services per basket
Selected policy & process issues
How will the University operate its identity management infrastructure?
– What balance between centralized and distributed operation?
    Registry: singular, centralized function
    Consumers: high degree of distribution possible
    Registration Authorities: small number??
– Who may have which role with what authority & obligations?
– Leverages & extends existing data administration policies & processes, or begs if those are insufficient
– Highly cross-functional activity demanding organizational flexibility
Selected policy & process issues
What entitlements should attend each type of affiliation?
– “Major” affiliations: student, faculty, alum, …
    Possibly former or recent student, faculty, …?
– “Minor” affiliations: <role> in course 123, <role> in department X, <role> in degree program Y, occupant of building Z, …
– What processes should determine entitlements for each affiliation?
How should affiliations be structured?
Selected policy & process issues
Who should be issued a credential?
What assurance level should authentication for each constituency achieve?
What constraints may pertain to each?
– Applicants (student, faculty, staff)
– Admitted students, accepted faculty or staff
– Alums
– Parents
– Library patrons
– Guests: visiting academics, conference attendees, hotel guests, arbitrary “friends”, …