Semi-Automatic Content Extraction from Specifications
Krishnaprasad Thirunarayan
Department of Computer Science & Engineering
Wright State University
Aaron Berkovich and Dan Sokol
Cohesia Corporation
Extraction: Summarize in a prescribed vocabulary
Spec: Text → Spec: SDR
Domain Library
Participants
• Sponsor: National Science Foundation
  o SBIR: Phase I and Phase II
• Industry: Cohesia Corporation
  o Developer of (B2B) content and lower-level infrastructure
• University: Wright State University
  o User-level tools: conceptualization and design
• Others: Geometric Software Solutions, …
  o Tool/Product development and integration
Outline
• Background and Goal (What?)
• Motivation (Why?)
• Details (How?)
• Conclusions
Background and Goal
Manual Content Extraction
Input:
• Paper-based specifications of a manufacturing task describing composition, processing, and testing of materials
• Additional constraints imposed by customers and vendors
• Appropriate Ontology and Domain Library defining standard vocabulary
Output: An “equivalent” formalized description of specs in Specification Definition Representation (SDR)
Observation: Specs originating from a common source (ASTM, SAE, GE) share vocabulary and structure.
An experienced extractor exploits the linguistic patterns found in specs to interpret them.
Assistance for Extraction
[Diagram labels: Paper Document; Text Document; Mark-Up Editor (Wizard); SDR Document; Document Proofer; original]
Semi-automatic Content Extraction
Starting from an electronic version of a spec, develop a strategy for semantic markup, to assist in creating an “equivalent” SDR.
Semantic Markup: The task of overlaying an abstract syntax (“the essence”) on the “free-form” text.
• Spec: Human-sensible
• Mark-up: Computer-sensible
Automate routine mechanical tasks.
Semantic Mark-up
AEROSPACE SPECIFICATION
TOLERANCES
Corrosion and Heat Resistant Steel, Iron Alloy, Titanium, and Titanium Alloy Bars and Wire
1. SCOPE: This specification covers established inch/pound manufacturing tolerances applicable to corrosion and heat resistant steel, iron alloy, titanium, and titanium alloy bars and wire ordered to inch/pound dimensions. These tolerances apply to all conditions unless otherwise noted. The term excl. is used to apply only to the higher figure of the specified range.
2. DIAMETER AND THICKNESS:
2.1 Cold Finished Bars:
2.1.1 Rounds, Squares, Hexagons, and Octagons (See 2.1.3 and 2.1.4)
TABLE I: Tolerance, Inch
Specified Diameter or Thickness, Inches | Rounds, plus and minus (See 2.1.1.1) | Squares, Hexagons, and Octagons, minus only (See 2.1.1.2)
Over 0.500 to 1.000, excl | 0.002 | 0.004
1.000 | 0.0025 | 0.004
Over 1.000 to 1.500, excl | 0.0025 | 0.006
1.500 to 2.000, incl | 0.003 | 0.006
Over 2.000 to 3.000, incl | 0.003 | 0.008
Over 3.000 to 4.000, incl | 0.003 | 0.010
2.1.1.1 Size tolerances for round bars are plus and minus as shown in Table I, unless otherwise specified. If required, however, they may be specified all plus and nothing minus, or all minus and nothing plus, or any combination of plus and minus, if the total spread in size tolerance for a specified size is not less than the total spread shown in the table.
2.1.1.2 For titanium and titanium alloys, the difference among the three measurements of the distance between opposite faces of hexagons shall be not greater than one-half the size tolerance and the difference between the measurements of the distance between opposite faces of octagons shall be not greater than the size tolerance.
AS 2241J Issued 5-1-75 Revised 1-1-83
[Mark-up callouts on the slide: Spec Name, Spec Title, Revision, Revision Date, Qualifier, Procedure, Characteristic, Value, Values]
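As a rough illustration of what overlaying mark-up on free-form text can look like, the sketch below pulls the spec identity out of the footer line of the sample spec ("AS 2241J Issued 5-1-75 Revised 1-1-83"). The regular expression, the function name `mark_up_identity`, and the reading of the trailing letter "J" as the revision are assumptions made for this example; the slide names only the labels Spec Name, Revision, and Revision Date ("Issue Date" is added here for completeness).

```python
import re

# Hypothetical pattern for an identity line such as
# "AS 2241J Issued 5-1-75 Revised 1-1-83"; real specs vary in layout.
IDENTITY = re.compile(
    r"(?P<name>[A-Z]+ \d+)(?P<revision>[A-Z]?)\s+"
    r"Issued (?P<issued>\d{1,2}-\d{1,2}-\d{2})\s+"
    r"Revised (?P<revised>\d{1,2}-\d{1,2}-\d{2})"
)


def mark_up_identity(line):
    """Return the identity fields found in `line`, or None if absent."""
    match = IDENTITY.search(line)
    if match is None:
        return None
    return {
        "Spec Name": match.group("name"),
        # Assumption: the trailing letter is the revision level.
        "Revision": match.group("revision"),
        "Issue Date": match.group("issued"),
        "Revision Date": match.group("revised"),
    }


fields = mark_up_identity("AS 2241J Issued 5-1-75 Revised 1-1-83")
print(fields)
```

A pattern like this only covers the routine, mechanical part of the task; qualifiers, procedures, and characteristic-value pairs in the body text still need the extractor's judgment.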
Ontology
(Gruber) An ontology is an explicit specification of a conceptualization, which is an abstract, simplified view of the world that we wish to represent for some purpose.
SDL Ontology
[Diagram: the entities Document, Revision, Reference, Layer, Procedure, Characteristic, Value, and Domain Library, connected by links labeled “1 or many”, “0, 1 or many”, and “Ref: 0, 1 or many”]
Extraction: Spec to SDR
Spec: Text → Spec: SDR
Fundamental Obstacles
• The relation between the spec and its SDR rendition is “not linear”.
  o Same spec information duplicated in SDR in different contexts.
  o Contiguous block of information in SDR spread out in spec.
• Equivalence of phrases hard to formalize.
• Tables and footnotes abbreviate information in irregular and complicated ways.
Linearizing through Abstraction: Introducing Specification Definition Language
[Diagram: Original Spec, SDL, and SDR, with arrows labeled “Manual (original)”, “Manual (Ph-I)”, “Compiled (Ph-I)”, and “Literal, Integrated, Semi-automatic (Ph-II)”]
Original AMS-4976 spec is 8 pages. Its SDL equivalent is 15 pages.
Original AMS-5662J spec is 11 pages. Its SDR equivalent is 30 pages.
Introducing Extraction Wizard
Motivation (Why?)
Business Background (Supply Chain)
[Diagram labels: Engine; Forger; Metal; Drawing and Spec (each shown twice)]
Diverse and large number of specs and spec users
[Diagram labels: Quality Assurance; Inspecting/Testing; Sales; Engineering; Certificate of Test (shown twice); Sales Order; Lab Routing; Production Routing]
Specs: AMS, DIN, JIS, PWA, GE, ASTM, GM, etc.
Quality Issues
• Transcription Errors: From spec to hand-written sheet to computer
• Completeness: Info in spec but missing in SDR
• Soundness: Info in SDR but not in spec
• Uniformity of Form
• Uniformity in Interpretation: Different understanding of the meaning while mapping to SDR (Ambiguity/Inconsistency)
Efficiency Issues
• Minimize time/effort required.
  o Automate routine mechanizable tasks
  o Eliminate “cut-paste-modify” cycle
• Minimize duplication of information.
  o Concise representation: Size of translation = O(Size of spec).
  o Update consistency
• Flexible rendition into various external forms.
Details (How?)
Essence of our Approach: Literal Translation
Conceptually, every piece of info in SDR owes its existence to phrases in spec.
Enable maintenance of correspondence between spec and its translation, and attempt to embed the translation into spec.
Requires compilation into SDL/SDR. Cf. XML/XSL Technology
A semi-automatic approach is feasible only if the partially generated translations (annotations) are intelligible to an extractor in the context of the original spec and are systematically extensible.
Note that current manual extractions into SDL are not literal even though SDL enables it to an extent.
SDL Studio and its Extension
SDL Studio enables creation and editing of SDL documents. It has facilities to search the domain library and compile SDL into an equivalent SDR. This can be further enriched using:
• Improved Domain Library Search
• Extraction and composition of SDL fragments
• Providing templates for commonly occurring “procedures”
• Table processor
• etc …
Domain Library Search Engine
Domain Library
• Currently, it contains technical phrases pertinent to materials and processing requirements.
• Cohesia creates and maintains DLs for in-house use and for use by its clients such as GE, Alcoa, Allvac, etc.
• Typical size: 10,000 phrases
Improving Domain Library Search
Goal: Mapping “equivalent” phrases to the same Domain Library Term (DLT)
Uses:
• Techniques for prefix removal, stemming, and dealing with other variations for root recognition
• Stop words elimination
• Abbreviation expander and alias normalization
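A minimal sketch of the phrase clean-up listed above (stop-word elimination, abbreviation expansion, alias normalization) might look like the following. The word lists are tiny illustrative samples invented for this example, not the contents of the actual Domain Library, and prefix removal and stemming are left out here.

```python
import re

# Illustrative tables only; a real deployment would load these
# from the Domain Library and its maintenance tools.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}
ABBREVIATIONS = {"excl": "exclusive", "incl": "inclusive",
                 "max": "maximum", "min": "minimum"}
ALIASES = {"columbium": "niobium"}


def normalize_phrase(phrase):
    """Lower-case, tokenize, drop stop words, expand abbreviations,
    and map aliases to one preferred word."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    normalized = []
    for word in words:
        if word in STOP_WORDS:
            continue
        word = ABBREVIATIONS.get(word, word)
        word = ALIASES.get(word, word)
        normalized.append(word)
    return normalized


print(normalize_phrase("Max. of the Columbium"))
```

Two phrases that normalize to the same word list can then be treated as candidates for the same Domain Library Term.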
Algorithm Sketch
List[Phrase] dl; Phrase ip; Int mt;
List[Word] dlwm, inwm;   % with back references
List[Phrase] dlts;
begin
  dl := readAndBuildDomainLibrary();
  dlwm := buildWordMapAndBackLinks(dl);   % delete stop words, link words to DLTs
  (ip, mt) := readInputPhraseAndMatchThreshold();
  inwm := buildWordMap(ip);
  dlts := buildDLTsListContainingMatchedWords(dlwm, inwm);
  dlts := evaluateAndFilterDLTs(dlts, mt);
end;
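The sketch above can be read as an inverted-index search. The Python version below is one possible rendering under stated assumptions: words are compared by exact equality (the slides use a fuzzier word matcher), and a term's score is the percentage of its words found in the input. Neither the scoring rule nor the function names come from the original tool.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "of", "or", "the", "to"}  # sample list


def build_word_map(phrase):
    """Set of significant (non-stop) words in a phrase."""
    return {w for w in phrase.lower().split() if w not in STOP_WORDS}


def build_word_map_and_back_links(domain_library):
    """Map each significant word to the DLTs that contain it."""
    back_links = defaultdict(set)
    for term in domain_library:
        for word in build_word_map(term):
            back_links[word].add(term)
    return back_links


def search(domain_library, input_phrase, match_threshold):
    """Return (score, DLT) pairs scoring at least match_threshold."""
    back_links = build_word_map_and_back_links(domain_library)
    input_words = build_word_map(input_phrase)
    # Collect every DLT sharing at least one word with the input.
    candidates = set()
    for word in input_words:
        candidates |= back_links.get(word, set())
    results = []
    for term in candidates:
        term_words = build_word_map(term)
        # Assumed score: percentage of the DLT's words seen in the input.
        score = 100 * len(term_words & input_words) // len(term_words)
        if score >= match_threshold:
            results.append((score, term))
    return sorted(results, key=lambda r: (-r[0], r[1]))


library = ["Tensile Test", "Heat Treatment", "Melt Method"]
print(search(library, "tensile test of the heat lot", 50))
```

Lowering the threshold trades precision for recall, which is the direction the design favors.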
Matching words
Int wordMatch(w1,w2)
begin
  % normalized = vowels deleted, i.e., only consonants present
  if caseUniformAndCleanedMatch(w1,w2)
    return 100;
  if normalizedMatch(w1,w2)
    return 90;
  if orderedNormalizedMatch(w1,w2)
    return 70;
  % analyze for differences due to prefix and suffix
  if normalizedDifferenceInPrefixSuffixTables(w1,w2)
    return 90;
end;
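A runnable reading of the word matcher is sketched below. Several details are guesses, since the slides do not define them: "ordered normalized match" is taken to mean that the sorted consonants agree (which tolerates transposed letters), the prefix and suffix tables are small made-up samples, and a score of 0 is returned when nothing matches.

```python
import re

VOWELS = set("aeiou")
PREFIXES = ("pre", "re", "un", "non")   # hypothetical sample table
SUFFIXES = ("ing", "ed", "es", "s")     # hypothetical sample table


def clean(word):
    """Lower-case and drop punctuation."""
    return re.sub(r"[^a-z0-9]", "", word.lower())


def skeleton(word):
    """Normalized form: vowels deleted, only consonants kept."""
    return "".join(c for c in clean(word) if c not in VOWELS)


def strip_affixes(word):
    """Remove at most one known prefix and one known suffix."""
    w = clean(word)
    for prefix in PREFIXES:
        if w.startswith(prefix) and len(w) > len(prefix) + 2:
            w = w[len(prefix):]
            break
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            w = w[:-len(suffix)]
            break
    return w


def word_match(w1, w2):
    """Score the similarity of two words on the slide's 0-100 scale."""
    if clean(w1) == clean(w2):
        return 100
    s1, s2 = skeleton(w1), skeleton(w2)
    if s1 and s1 == s2:
        return 90
    # Assumed meaning of "ordered": compare consonants after sorting.
    if s1 and sorted(s1) == sorted(s2):
        return 70
    a1, a2 = skeleton(strip_affixes(w1)), skeleton(strip_affixes(w2))
    if a1 and a1 == a2:
        return 90
    return 0  # assumption: the original leaves the no-match case open


for pair in [("gage", "gauge"), ("tesnile", "tensile"), ("forged", "forging")]:
    print(pair, word_match(*pair))
```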
Design Rationale
• Input phrase may contain multiple DLTs.
• DLT words may not appear contiguous in input.
• Consonants are significant, and "correct" spellings may differ in vowels.
• Robustness with respect to spelling errors such as transposition of letters or missing vowels.
• Stemmers do not work satisfactorily for words appearing in DLTs. Instead, create tables customized to deal with prefixes and suffixes that arise in practice, and normalize dynamically.
• Err on the side of recall rather than precision.
• Number of words < Number of DLTs
Extraction Tool
Overall Approach
Preprocessing: Obtain spec in plain text form (from MS Word format).
This is a practical alternative to scanning and OCR-ing a paper-based spec.
Saving it in HTML format has the benefit of isolating tables. On the downside, it retains formatting tags.
Semi-Automatic Extraction: Recognize phrases in spec text that are associated with a requirement and generate SDL fragments to assist an extractor.
Two Possible Avenues (From Document to SDL)
• Iteratively annotate the document text with XML tags reflecting the SDL structure and ontology. Generate various views of the document and SDL from this single XML Master.
• Iteratively generate a sequence of progressively detailed SDL documents from spec text.
First Avenue: Via XML
Semi-automatic extraction is accomplished in two phases:
• Initial automatic markup phase: Systematically recognize domain library terms in spec text and add suitable XML annotations. Then generate a first-cut SDL fragment.
• Subsequent manual conversion phase: Extractor organizes the information and completes the translation into an equivalent SDL.
Further steps: As the tool matures, automation can be attempted to produce more detailed extractions.
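To make the automatic markup phase concrete, here is a small sketch that wraps recognized domain library terms in XML tags. The element names `<spec>` and `<dlt>` are invented for the illustration, and exact, case-insensitive phrase matching stands in for the fuzzier Domain Library search described earlier.

```python
import re
from xml.sax.saxutils import escape


def tag_domain_library_terms(text, terms):
    """Wrap each occurrence of a domain library term in <dlt> tags."""
    if not terms:
        return "<spec>" + escape(text) + "</spec>"
    # Try longer terms first so "round bars" wins over "bars".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        "|".join(r"\b" + re.escape(term) + r"\b" for term in ordered),
        re.IGNORECASE,
    )
    pieces, last = [], 0
    for match in pattern.finditer(text):
        # Escape the untouched text so the result stays well-formed XML.
        pieces.append(escape(text[last:match.start()]))
        pieces.append("<dlt>" + escape(match.group(0)) + "</dlt>")
        last = match.end()
    pieces.append(escape(text[last:]))
    return "<spec>" + "".join(pieces) + "</spec>"


sentence = "Size tolerances for round bars < 4 in."
print(tag_domain_library_terms(sentence, ["round bars", "Size tolerances"]))
```

Because the original wording survives between the tags, the annotated file keeps the link between the spec text and whatever is later generated from it.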
First Avenue: Via XML (cont’d)
Advantages:
• Focus is on a single persistent XML Master that tries to maintain a link between the spec and the extractions.
• All the processing is orchestrated on this XML file.
• Implements various views of the XML source using XSLFO and various transformations on the XML source using XSLT.
First Avenue: Via XML (cont’d)
Disadvantages:
• There is a need to manage a separate SDL version to incorporate user inputs and corrections. This is because, even though it may be possible to represent SDL constructs using XML tags, it may not be possible to integrate user edits literally into the XML source.
Semantic-Markup Algorithm
[Diagram steps: Insert Structure Tags → Insert Ontology Tags → Infer Missing Char. → Group Char. & Values → Group C-Vs into Procedures]
Functional Components
[Diagram: Text file → Structure Tagger → XML file → DLT Tagger → XML file → Group Tagger → XML file → SDL Converter → SDL file; the Domain Library is a further labeled input]
Tagging and Transforming
flex structTagger.flex
gcc lex.yy.c -lfl
a < GE.txt > GE.xml
java org.apache.xalan.xslt.Process -in GE.xml -xsl CSDLStylesheet.xsl -out GE.sdl
…
java org.apache.xalan.xslt.Process -in GE.xml -xsl CExpSDLStylesheet.xsl -out GE.exp.sdl
java org.apache.xalan.xslt.Process -in GE.xml -xsl OriginalStylesheet.xsl -out GE.org.txt
Second Avenue: SDL all along
As there is no obvious way of incorporating SDL edits into the XML source in general, try to generate legal SDL at different levels of detail all along.
• Advantage: Yields SDL documents that can be immediately used in Spec Studio and extended by an extractor.
• Disadvantage: This form does not retain correspondence with the original document explicitly.
Extraction Tool – Prototype Operation
Views: In the context of Spec
• Plain text view
• Text view with “requirement” phrases color coded and highlighted
• View of domain library terms found in the spec
Views: In the context of SDL
• Spec identity view + Large Note: Method D Extraction
• Method C Extraction
• Procedure view
• Characteristic-value pair view
Extraction Method | Qualifiers | Requirements | Procedures | References
D | Spec Class Only | All information in notes | Not used | In notes
C | Spec Class, Product, Alloy | All information in notes | Not used | In notes
B | Many Qualifiers | Characteristic-Value pairs and notes | Used | Retrieved
A | Many Qualifiers | CV pairs, pre-conditions, permissibility, formulas, etc | Used | Retrieved
Additional Standalone Tools
• Domain Library Browser: Given a word or a phrase, display all the domain library information related to it.
• SDL Fragment Generator: Given a sentence, generate an SDL fragment that captures its essence.
These tools can assist an extractor in composing an SDL document incrementally.
Future Work
Longer-term Vision
• Marketplace continues to confirm the need for tools to capture the semantic interpretation of document content.
• Cohesia plans to productize the results of the research into a viable commercial product.
Example Engineering Tasks
• How to express and represent templates for well-known “procedures”? Alternative to cut-paste-modify cycle
  o Tensile Test
  o Heat Treatment
  o Melt Method
  o Chemistry
  o Packaging
• How to express and represent heterogeneous tables and non-trivial footnotes in a spec in a convenient and uniform way?
• How to create, manipulate, and store specs in SDR and SDL among other forms and maintain interoperability?
Example Research Questions
• What are the forms of extraction rules?
  o Phrase pattern matching
  o Theory of equivalence/subsumption
• Example: Aliases / Equivalent Phrases
  o Creep = Plastic Strain
  o Delivery Condition = Surface Finish
  o Cause for Rejection = Rejection Criteria
  o Imperfections detrimental to usage of product = Free of injurious defects
Rules for interpreting “logic words”
o Connectives: and, or, …
o Quantifiers: all, every, each, …
o Modifiers: over, under, more, less, …
o Negation: not, no, unless, except, “free of” ...
Mismatch?
• A, B, and C => {A,B,C} (union/OR-logic)
Distributive Laws?
• Lot and order number => lot number and order number
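The two rewrite patterns above can be tried out with a short sketch. It is deliberately naive: it splits on commas and the word "and", and it always shares the last word of the final item with the earlier items. That distribution is right for "lot and order number" but wrong for many other phrases, which is presumably why the slide poses it as a question. The function names are made up for the example.

```python
import re


def split_conjunction(phrase):
    """'A, B, and C' -> ['A', 'B', 'C']"""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", phrase.strip())
    return [part.strip() for part in parts if part.strip()]


def distribute_head(phrase):
    """'lot and order number' -> ['lot number', 'order number']

    Naive rule: the last word of the final item is treated as a
    shared head noun and appended to every earlier item.
    """
    items = split_conjunction(phrase)
    if len(items) < 2:
        return items
    last_words = items[-1].split()
    if len(last_words) < 2:
        return items  # nothing to distribute, e.g. "bars and wire"
    head = last_words[-1]
    return [f"{item} {head}" for item in items[:-1]] + [items[-1]]


print(split_conjunction("A, B, and C"))
print(distribute_head("lot and order number"))
```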
Another Example Scenario
Buyers’ Purchase Order:
• Melt Atmosphere = Inert Gas
• Sulphur < 2.0%
• Niobium < 0.5%
Sellers’ Inventory:
• Melt Atmosphere = Argon
• Sulphur < 1.7%
• Columbium < 0.2%
Match?
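One way to answer the question mechanically is sketched below, assuming the first set of requirements is the buyer's and the second describes the seller's stock. It relies on two pieces of background knowledge that a domain library would have to supply: Columbium is an older name for Niobium, and Argon is an inert gas. The tables, the function names, and the reading of each "<" as an upper bound in percent are assumptions of this example.

```python
ALIASES = {"columbium": "niobium"}                      # same element
IS_A = {"argon": "inert gas", "helium": "inert gas"}    # subsumption


def canonical(name):
    """Lower-case a characteristic name and resolve aliases."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def satisfies(offered, required):
    """True if every required characteristic is met by the offer.

    Values are either category strings or numeric upper bounds (%).
    """
    offered = {canonical(k): v for k, v in offered.items()}
    for name, needed in required.items():
        have = offered.get(canonical(name))
        if have is None:
            return False
        if isinstance(needed, str):
            if not isinstance(have, str):
                return False
            have_c, needed_c = have.lower(), needed.lower()
            # Accept an equal category or a more specific one.
            if have_c != needed_c and IS_A.get(have_c) != needed_c:
                return False
        elif isinstance(have, str) or have > needed:
            # The offered bound must be at least as tight.
            return False
    return True


order = {"Melt Atmosphere": "Inert Gas", "Sulphur": 2.0, "Niobium": 0.5}
stock = {"Melt Atmosphere": "Argon", "Sulphur": 1.7, "Columbium": 0.2}
print(satisfies(stock, order))
```

The check is directional: stock melted under Argon satisfies an order for Inert Gas, but not the other way round.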
• What are the strategies for searching and matching?
  o Top-down: Template-driven expectations
  o Bottom-up: Identifying requirements present
  o Closure: Manual addition / modification / disambiguation
Relevant Information Extraction Research and Technologies References
• Message Understanding Conferences.
• Work on NLP and IE at UMass, NYU, SRI, etc.
• Search and Filtering tools.
Conclusions
[Diagram labels: Spec Text on Paper; Paper Scanning; Spec Text as Electronic Image; Optical Character Recognition; Spec Text in HTML/XML; Extraction Wizard; SDL (XML); SDL Editor; SDL Compiler; SDR; Read, Interpret, & Type. Stages are marked “Before”, “NSF SBIR Phase I”, and “NSF SBIR Phase II”.]
Appendix
AMS 4928N (Ti Alloy)
Tensile Test