REGNET
Stanford University
Gloria Lau, Dr. Shawn Kerrigan, Dr. Kincho Law, Dr. Gio Wiederhold
WITS’03, Dec 13th, 2003
An Information Infrastructure for Government Regulations
2
Motivation
- Multiple sources of regulations
  - E.g. federal, state, local
  - Different formats
  - Conflicting ideas
- Need for a repository
  - Locate relevant information
  - E.g. small business
- Need for analysis tool
  - Complexity of regulations
  - Multiple sources
  - Understanding of regulations & their relationships
3
Example 1
ADAAG Appendix 4.6.3
… Such a curb ramp opening must be located within the access aisle boundaries, not within the parking space boundaries.
CBC 1129B.4.3
… Ramps shall not encroach into any parking space.
Exception: 1. Ramps located at the front of accessible parking spaces may encroach into the length of such spaces …
CBC allows curb ramps encroaching into accessible parking stall access aisles, while ADA disallows encroachment into any portion of the stall.
4
Example 2
ADAAG 4.7.2 Slope. … Transitions from ramps to walks, gutters, or streets shall be flush and free of abrupt changes …
CBC 1127B.5.5 Beveled lip. The lower end of each curb ramp shall have a ½ inch (13 mm) lip beveled at 45 degrees as a detectable way-finding edge for persons with visual impairments.
ADAAG focuses on wheelchair traversal; CBC focuses on the visually impaired when using a cane.
5
Scope
- Repository development
  - Shallow parser
  - Feature extraction
  - Ontology development
- Automated extraction of related provisions
  - Feature matching
  - Structural matching
  - Application to e-rulemaking
- Compliance assistance using a Q&A system
  - FOPC logic implementation
  - Q&A compliance check
6
Repository development
[Figure: repository development pipeline. Labels: regulations in HTML, PDF, plain text, etc.; shallow parser; XML regulations; feature extractor; Ontology; Semio; Domain Expert; generic features; domain-specific features; concepts, measurements, exceptions, definitions, author-prescribed indices, glossary terms, chemicals, effective dates; refined XML regulations.]
7
Shallow parser
- Data source
  - Accessibility standards: US, UK and Scotland
  - Drinking water standards in environmental regulations: Federal and California
- Current standard: HTML, PDF, hardcopy...
- Our system standard: XML
- Unit of extraction: section
<regElement name="ufas.4.32.1" title="minimum number" asterisk="0">
  <regText> Fixed or built-in seating, ... </regText>
  <ref name="ufas.4.5" num="1" />
  <ref name="ufas.4.32" num="1" />
</regElement>
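A section stored in this XML form can be read back with a standard XML library. The sketch below is illustrative only: it uses Python's `xml.etree.ElementTree`, the helper name `read_section` is hypothetical, and it assumes that the `num` attribute counts how many times the section cites the referenced provision (the slide does not say so explicitly).

```python
import xml.etree.ElementTree as ET

# The regElement from the slide, with plain ASCII quotes.
SECTION_XML = """
<regElement name="ufas.4.32.1" title="minimum number" asterisk="0">
  <regText>Fixed or built-in seating, ...</regText>
  <ref name="ufas.4.5" num="1" />
  <ref name="ufas.4.32" num="1" />
</regElement>
"""

def read_section(xml_text):
    """Hypothetical helper: turn one regElement into a plain dictionary."""
    elem = ET.fromstring(xml_text)
    return {
        "name": elem.get("name"),
        "title": elem.get("title"),
        "text": (elem.findtext("regText") or "").strip(),
        # Assumption: num is the number of citations of that provision.
        "refs": {ref.get("name"): int(ref.get("num")) for ref in elem.findall("ref")},
    }

section = read_section(SECTION_XML)
print(section["name"], sorted(section["refs"]))
```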
8
Automated Translation to Hierarchical Structure
PART 279—Standards For The Management Of Used Oil
Subpart B – Applicability
…
§ 279.12 Prohibitions.
(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
(b) Use as a dust suppressant. The use of used oil as a dust suppressant is prohibited, except when such activity takes place in one of the states listed in § 279.82(c).
(c) Burning in particular units. Off-specification used oil fuel may be burned for energy recovery in only the following devices:
  (1) Industrial furnaces identified in § 260.10 of this chapter;
  (2) Boilers, as defined in § 260.10 of this chapter, that are identified as follows:
    (i) Industrial boilers located on the site of a facility engaged in a manufacturing process where substances are transformed into new products, including the component parts of products, by mechanical or chemical processes; ….
[Figure: hierarchy tree. 40 CFR 279 contains Subpart A, Subpart B, … Subpart I; Subpart B contains Section 279.10, Section 279.11, Section 279.12, …; Section 279.12 contains Subsection (a), Subsection (b), Subsection (c), …]
Example: (a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
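The translation from running text to a tree of provisions can be sketched roughly as follows. This is not the parser described in the talk, only a minimal illustration under stated assumptions: provisions are introduced by markers such as "(a)" or "(1)" preceded by whitespace, lowercase letters are first-level and digits second-level, and roman numerals such as "(i)" are ignored. The names `parse_section`, `MARKER`, and `SAMPLE` are hypothetical.

```python
import re

# A marker like "(a)" or "(1)" must follow whitespace (or start the text),
# which skips citations such as "279.82(c)".
MARKER = re.compile(r"(?:^|(?<=\s))\(([a-z]|\d+)\)\s+")

def level_of(label):
    # Assumption: letters are level 1, digits are level 2.
    return 2 if label.isdigit() else 1

def parse_section(section_id, text):
    """Split section text into a nested dict of provisions."""
    matches = list(MARKER.finditer(text))
    head = text[: matches[0].start()] if matches else text
    root = {"id": section_id, "text": head.strip(), "children": []}
    stack = [(0, root)]  # (level, node) path from the root to the current node
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        label, level = m.group(1), level_of(m.group(1))
        while stack[-1][0] >= level:  # climb back up to the right parent
            stack.pop()
        parent = stack[-1][1]
        node = {"id": f"{parent['id']}.{label}",
                "text": text[m.end():end].strip(), "children": []}
        parent["children"].append(node)
        stack.append((level, node))
    return root

SAMPLE = ("Prohibitions. (a) Surface impoundment prohibition. Used oil shall not "
          "be managed in surface impoundments or waste piles ... "
          "(b) Use as a dust suppressant. ... states listed in § 279.82(c). "
          "(c) Burning in particular units. ... only the following devices: "
          "(1) Industrial furnaces ...; (2) Boilers ...")

tree = parse_section("279.12", SAMPLE)
for child in tree["children"]:
    print(child["id"], [g["id"] for g in child["children"]])
```

Keeping an explicit stack of (level, node) pairs makes it straightforward to add deeper levels later, at the cost of having to decide how ambiguous labels such as "(i)" are classified.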
9
Ontology View
10
Feature extraction
- Generic features
  - Concepts
  - Exceptions
  - Definitions
- Domain-specific features
  - Glossary terms
  - Author-prescribed indices
  - Effective dates
  - Measurements
  - Chemicals, e.g., drinking water contaminants
11
XML regulation with features added
Original section 141.11.b from the 40 CFR
§ 141.11 Maximum contaminant levels for inorganic chemicals.
(a) The maximum contaminant level for arsenic applies only to community water systems ...
(b) The maximum contaminant level for arsenic is 0.05 milligrams per liter for community water systems until January 23, 2006.
Refined section 141.11.b in XML format
<regElement id="40.cfr.141.11.b" name="">
  <dwc name="arsen" times="1" />
  <concept name="commun water system" times="1" />
  <measurement unit="ppm" size="0.05" quantifier="max" />
  <date to="January 23, 2006" />
  ...
  <regText> The maximum contaminant level for arsenic is 0.05 milligrams per liter for community water systems until January 23, 2006. </regText>
</regElement>
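As a rough illustration of how the measurement and effective-date features in the refined XML might be produced, the sketch below uses two regular expressions. It is not the feature extractor from the talk: the patterns, the helper name `refine_section`, and the rule that "maximum" in the text yields `quantifier="max"` are all assumptions, and the unit is written as `ppm` only to mirror the slide's output.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns covering only the phrasing used in section 141.11(b).
MEASUREMENT = re.compile(r"(\d+(?:\.\d+)?)\s+milligrams per liter")
UNTIL_DATE = re.compile(r"until ([A-Z][a-z]+ \d{1,2}, \d{4})")

def refine_section(section_id, text):
    """Build a regElement carrying measurement and date features, when found."""
    elem = ET.Element("regElement", {"id": section_id})
    m = MEASUREMENT.search(text)
    if m:
        # Assumption: the word "maximum" marks an upper bound.
        quantifier = "max" if "maximum" in text.lower() else "unknown"
        ET.SubElement(elem, "measurement",
                      {"unit": "ppm", "size": m.group(1), "quantifier": quantifier})
    d = UNTIL_DATE.search(text)
    if d:
        ET.SubElement(elem, "date", {"to": d.group(1)})
    ET.SubElement(elem, "regText").text = text
    return elem

TEXT = ("The maximum contaminant level for arsenic is 0.05 milligrams per liter "
        "for community water systems until January 23, 2006.")
refined = refine_section("40.cfr.141.11.b", TEXT)
print(ET.tostring(refined, encoding="unicode"))
```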
12
Similarity Analysis
[Figure: Similarity Analysis Core. Input: refined XML regulations with their features (measurements, exceptions, definitions, author-prescribed indices, glossary terms). Stages: feature matching (base score), neighbor inclusion (refined score), reference distribution (final score). Output: related pairs; pairs below the threshold are discarded (trash).]
13
Similarity Score computation
- Feature matching: $f_0 = \frac{\sum_{i \in \text{features}} f_i}{\#\,\text{features}}$
- Features
  - Concept & index match
    - tf × idf vector
    - tf = term frequency
    - idf = inverse document frequency = $\log(n / n_i)$
  - Chemical match
  - Measurement match
  - Exception match
  - Effective date match
  - Glossary/definition term match
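The base score can be sketched in a few lines. The slide gives the averaging formula and the idf definition but does not say how two tf × idf vectors are compared; the cosine similarity used below is an assumption, as is reading n as the number of sections and n_i as the number of sections containing term i. All function names and the sample term lists are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of term lists, one per section. Returns one weight dict per section."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # n_i per term
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    # Assumption: vectors are compared by cosine similarity.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def base_score(feature_scores):
    """f0: average of the per-feature scores f_i."""
    return sum(feature_scores.values()) / len(feature_scores)

docs = [["door", "hardware", "handle"],
        ["door", "furniture", "handle"],
        ["used", "oil"]]
vectors = tfidf_vectors(docs)
concept_match = cosine(vectors[0], vectors[1])
# The measurement and exception values are made-up inputs for illustration.
f0 = base_score({"concept": concept_match, "measurement": 1.0, "exception": 0.0})
print(round(concept_match, 3), round(f0, 3))
```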
14
Score refinements
- Near-tree neighbors
  - Self vs. parent-sibling-child (psc), $f_{s\text{-}psc}$
  - psc vs. psc, $f_{psc\text{-}psc}$
[Figure: section A (ADAAG) and section U (UFAS), each with its parent, sibling, and child nodes forming psc(A) and psc(U); the comparisons are labeled s-psc and psc-psc.]
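One way to picture neighbor inclusion is sketched below. The slide names the two comparisons (self vs. psc and psc vs. psc) but not how they are combined, so the averaging, the weight `alpha`, and the update rule are purely illustrative assumptions, as are the function names and the example scores.

```python
from statistics import mean

def psc(node, parent_of):
    """Parent, siblings, and children of node; the tree is a child -> parent map."""
    parent = parent_of.get(node)
    neighbors = {n for n, p in parent_of.items() if p == node}  # children
    if parent is not None:
        neighbors.add(parent)
        neighbors |= {n for n, p in parent_of.items() if p == parent and n != node}
    return neighbors

def avg_score(pairs, f0):
    return mean(f0.get(pair, 0.0) for pair in pairs) if pairs else 0.0

def refined_score(a, u, f0, parents_a, parents_u, alpha=0.5):
    psc_a, psc_u = psc(a, parents_a), psc(u, parents_u)
    f_s_psc = avg_score([(a, y) for y in psc_u] + [(x, u) for x in psc_a], f0)
    f_psc_psc = avg_score([(x, y) for x in psc_a for y in psc_u], f0)
    neighbor = (f_s_psc + f_psc_psc) / 2
    base = f0.get((a, u), 0.0)
    # Assumed update: similar neighbors move the score toward 1, never above it.
    return base + alpha * (1.0 - base) * neighbor

# Hypothetical base scores for a few section pairs.
ufas_parents = {"4.13.9": "4.13", "4.13.1": "4.13"}
bs_parents = {"12.5.4.2": "12.5.4", "12.5.4.1": "12.5.4"}
f0 = {("4.13.9", "12.5.4.2"): 0.2, ("4.13", "12.5.4"): 0.9,
      ("4.13.1", "12.5.4.1"): 0.7}
score = refined_score("4.13.9", "12.5.4.2", f0, ufas_parents, bs_parents)
print(round(score, 3))
```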
15
Score refinements
- Reference distribution, $f_{rd}$
  - Not-so-immediate neighbor effect on score
  - E.g. f(A5.3, U6.4(a)) updates f(A2.1, U3.3)
[Figure: ADAAG Section 2.1 contains a reference to Section 5.3, and UFAS Section 3.3 contains a reference to Section 6.4(a); there is no cross-reference between the two regulations; the referenced sections are similar (f0 != 0).]
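The example f(A5.3, U6.4(a)) updating f(A2.1, U3.3) can be mimicked with a small sketch. The slide does not give the update formula, so the averaging over referenced pairs and the weight `beta` are assumptions made only for illustration; the function name and the score of 0.8 are hypothetical.

```python
def reference_distribution(scores, refs_a, refs_u, beta=0.3):
    """Raise the score of (a, u) when the sections they reference are similar.

    scores: {(a, u): score}; refs_a / refs_u: {section: [referenced sections]}.
    """
    updated = dict(scores)
    for a, a_targets in refs_a.items():
        for u, u_targets in refs_u.items():
            pairs = [(x, y) for x in a_targets for y in u_targets]
            if not pairs:
                continue
            # Assumed f_rd: mean score over all pairs of referenced sections.
            f_rd = sum(scores.get(p, 0.0) for p in pairs) / len(pairs)
            base = scores.get((a, u), 0.0)
            updated[(a, u)] = base + beta * (1.0 - base) * f_rd
    return updated

scores = {("A5.3", "U6.4(a)"): 0.8}   # hypothetical score
refs_adaag = {"A2.1": ["A5.3"]}       # A2.1 references A5.3
refs_ufas = {"U3.3": ["U6.4(a)"]}     # U3.3 references U6.4(a)
final = reference_distribution(scores, refs_adaag, refs_ufas)
print(final[("A2.1", "U3.3")])
```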
16
Preliminary results: UFAS vs BS8300
- Phrasing difference between American and British regulations
  - ufas.4.13.9 Door Hardware. Handles, pulls, latches, locks, and other operating devices on accessible doors shall have a shape that is easy …
  - bs8300.12.5.4.2 Door Furniture. Door handles on hinged and sliding doors in accessible bedrooms should be easy to grip …
- Neighbor similarities imply similarity between the nodes of interest
[Figure: UFAS 4.13 Doors (parent of 4.13.9 Door Hardware, with siblings 4.13.1, 4.13.2, 4.13.3, 4.13.12) alongside BS8300 12.5.4 Doors (parent of 12.5.4.2 Door Furniture, with sibling 12.5.4.1).]
17
Preliminary results: e-rulemaking
- Application domain: e-rulemaking
  - Comparison between draft of rules and the associated public comments
- ADAAG Chapter 11, rights-of-way draft
  - Less than 15 pages
  - Over 1400 public comments received within 4 months
  - Comments ~ 10MB in size; most are several pages long
- New regulation draft can easily generate a huge amount of data that needs to be reviewed and analyzed
18
Preliminary results: e-rulemaking
[Figure: screenshot of the draft in which Section 1105.4 is marked "[6]", showing the content of Section 1105.4 and its 6 related public comments.]
19
Preliminary results: e-rulemaking
- Related draft section and public comment
  - Adaag.1105.4.1: Where signal timing is inadequate for full crossing of all traffic lanes or where the crossing is not signalized, cut-through medians …
  - Deborah Wood, October 29, 2002: … This often means walk lights that are so short in duration that by the time a person who is blind realizes …
- No identified related section
  - Donna Ring, September 6, 2002: If you become blind, no amount of electronics … will make you safe … You have to learn modern blindness skills from a good teacher. You have to practice your new skills …
  - Concern not addressed in the draft
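The two outcomes above (a comment linked to a draft section, and a comment with no identified related section) amount to thresholding the similarity scores. A minimal sketch, assuming the scores between one comment and each draft section have already been computed; the threshold value, the function name, and all scores shown are hypothetical.

```python
def related_sections(comment_scores, threshold=0.3):
    """Return (section, score) pairs at or above the threshold, best first."""
    related = [(sec, s) for sec, s in comment_scores.items() if s >= threshold]
    return sorted(related, key=lambda pair: pair[1], reverse=True)

# Made-up scores for two comments against two draft sections.
wood = {"adaag.1105.4.1": 0.62, "adaag.1105.4": 0.41}
ring = {"adaag.1105.4.1": 0.05, "adaag.1105.4": 0.02}

print(related_sections(wood))
print(related_sections(ring) or "no identified related section")
```

An empty result is itself informative: it flags comments whose concerns the draft does not address.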
20
Compliance Assistance System
21
Compliance Issues
22
Conclusions
- An infrastructure for
  - Repository development
    - Shallow parser
    - Feature extraction
    - Ontology development
  - Automated extraction of related provisions
    - Feature matching
    - Structural matching
    - Application to e-rulemaking
  - Compliance assistance using a Q&A system
    - FOPC logic implementation
    - Q&A compliance check
- Future Directions
  - Application on other semi-structured documents
  - Inconsistency identification
23
Thank You!
Questions?