KDD- Service based Numerical Entity Searcher (KSNES)
Presentation 2 on March 31st, 2009
Naga Sowjanya Karumuri
CIS 895 – MSE PROJECT
OUTLINE
• Project Data Flow Diagram
• Action Items
• Architectural Design
• Test Plan
• Formal Inspection Checklist
• Project Plan
• Prototype Demonstration
• Questions / Comments
PROJECT DATA FLOW DIAGRAM:
[Figure: data flow diagram of the Numerical Entity Searcher]
MODULES IN THE PROJECT
• Webpage (JSP): For requesting and receiving information from the service.
• POS Tagger (Java): Stanford POS Tagger
• Numerical Phrase Extractor (Java): Implemented using a shallow parsing technique
• Number-Unit/Date Pattern Recognizer (C++): Implemented based on the Numerical Quantifier developed by Benjamin Sapp, UIUC.
ACTION ITEMS
• Implemented Numerical Phrase Extractor
• Detailed Description of Test Plan
• Wrote Formal Specification using USE
• UML Representation of the System
ARCHITECTURAL DESIGN
Service Oriented Architecture
PACKAGE VIEW
Overall Package View
Class descriptions, attributes, and operations are contained in the Architecture Design Document.
SEQUENCE DIAGRAM
CLASS DIAGRAM (NPE PACKAGE)
CLASS DIAGRAM (NDPR PACKAGE)
IMPLEMENTING NUMERICAL PHRASE EXTRACTOR
• Input: Tagged text
  ◦ I/PRP lost/VBD thirty-three/JJ dollars/NNS in/IN 1998/CD
• Regular expressions are used to determine the numerical patterns in the input.
  ◦ thirty-three/JJ dollars/NNS
  ◦ in/IN 1998/CD
• Output: Numerical phrases
  ◦ thirty-three dollars
  ◦ in 1998
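The input-to-output flow on this slide can be sketched in Java roughly as follows. This is only an illustration: the class name `NumericalPhraseSketch`, the method `extract`, and the two regular expressions are assumptions chosen to cover the slide's example sentence, not the project's actual code or pattern set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericalPhraseSketch {
    // Hypothetical patterns, chosen only to cover the slide's example sentence.
    private static final Pattern[] PATTERNS = {
        // Hyphenated number word tagged JJ or CD, then a plural noun: thirty-three/JJ dollars/NNS
        Pattern.compile("[a-zA-Z]+-[a-zA-Z]+/(JJ|CD) [a-zA-Z]+/NNS"),
        // Preposition followed by a four-digit year: in/IN 1998/CD
        Pattern.compile("[a-zA-Z]+/IN \\d{4}/CD")
    };

    /** Returns the phrases matched in POS-tagged text, with the tags stripped. */
    public static List<String> extract(String taggedText) {
        List<String> phrases = new ArrayList<>();
        for (Pattern pattern : PATTERNS) {
            Matcher m = pattern.matcher(taggedText);
            while (m.find()) {
                // Drop the "/TAG" suffix from every token of the match.
                phrases.add(m.group().replaceAll("/[A-Z$]+", ""));
            }
        }
        return phrases; // grouped by pattern, not by position in the sentence
    }

    public static void main(String[] args) {
        String tagged = "I/PRP lost/VBD thirty-three/JJ dollars/NNS in/IN 1998/CD";
        System.out.println(extract(tagged)); // [thirty-three dollars, in 1998]
    }
}
```

Each pattern is matched against the tagged text rather than the raw text, so the POS tags act as anchors; the tags are removed only after a match is found.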
TAGSET
SOME PATTERNS
• "\\d+-\\d+(/JJ|/CD) [a-zA-Z]+/NN" parses
  ◦ 3-2/JJ lead/NN
  ◦ 20-20/JJ match/NN
• "(between|Between|from|From|In|in|since|Since|during|During)/IN ..../CD (([a-zA-Z]+/CC|[a-z]+/TO) ..../CD)?" parses
  ◦ 'between 1987 and 1997', 'in 2007 and 2008'
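The two patterns above can be tried out with `java.util.regex`. The sketch below copies the pattern strings from the slide; the class name `SlidePatterns`, the helper `firstMatch`, and the POS tags attached to the sample inputs are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlidePatterns {
    // Digit pair such as a score, tagged JJ or CD, followed by a noun.
    static final Pattern PAIR_NOUN =
        Pattern.compile("\\d+-\\d+(/JJ|/CD) [a-zA-Z]+/NN");

    // Temporal preposition, a four-character CD token, and an optional second one
    // joined by a conjunction (CC) or "to" (TO).
    static final Pattern YEAR_RANGE = Pattern.compile(
        "(between|Between|from|From|In|in|since|Since|during|During)/IN ..../CD "
        + "(([a-zA-Z]+/CC|[a-z]+/TO) ..../CD)?");

    /** Returns the first substring of the tagged text that the pattern matches, or null. */
    static String firstMatch(Pattern pattern, String taggedText) {
        Matcher m = pattern.matcher(taggedText);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(PAIR_NOUN, "a/DT 3-2/JJ lead/NN"));
        System.out.println(firstMatch(YEAR_RANGE, "between/IN 1987/CD and/CC 1997/CD"));
        System.out.println(firstMatch(YEAR_RANGE, "in/IN 2007/CD and/CC 2008/CD"));
    }
}
```

Note that `....` accepts any four characters, not only digits, and that the second pattern as written expects a space after the first `/CD`, so a lone year is matched only when another token follows it.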
ASSIGNING BOUNDS
• Certain words are detected in order to set a bound: >, <, ~, or =
• "=" is used if no such words are present

Bound   Corresponding words
>       more than, no less than, no fewer than, at least, over
<       up to, not over, no more than, at most, less than
~       about, around, approximately, some, nearly, almost
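A minimal sketch of how such a lookup might work is shown below. The class `BoundAssigner`, the method `boundFor`, and the longest-cue-wins rule are all assumptions, not the project's implementation; the sketch also assumes that "at least" marks a lower bound (>) and "at most" an upper bound (<).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class BoundAssigner {
    // Cue phrase -> bound symbol (hypothetical grouping, see the assumptions above).
    private static final Map<String, String> CUES = new LinkedHashMap<>();

    private static void register(String bound, String... cues) {
        for (String cue : cues) {
            CUES.put(cue, bound);
        }
    }

    static {
        register(">", "more than", "no less than", "no fewer than", "at least", "over");
        register("<", "up to", "not over", "no more than", "at most", "less than");
        register("~", "about", "around", "approximately", "some", "nearly", "almost");
    }

    /** Returns ">", "<", "~", or "=" (the default when no cue phrase is present). */
    public static String boundFor(String phrase) {
        String text = phrase.toLowerCase();
        String best = null;
        for (String cue : CUES.keySet()) {
            // Whole-word match, so "over" does not fire inside "overall".
            boolean present =
                Pattern.compile("\\b" + Pattern.quote(cue) + "\\b").matcher(text).find();
            // Prefer the longest cue, so "no more than" beats the embedded "more than".
            if (present && (best == null || cue.length() > best.length())) {
                best = cue;
            }
        }
        return best == null ? "=" : CUES.get(best);
    }

    public static void main(String[] args) {
        System.out.println(boundFor("About 100 miles per hour")); // ~
        System.out.println(boundFor("No more than 10 people"));   // <
        System.out.println(boundFor("200 adults and children"));  // =
    }
}
```

Choosing the longest matching cue matters because several cues contain shorter ones with the opposite meaning, for example "not over" and "over".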
SOME PATTERNS
• [a-zA-Z0-9]+/CD( percent/NN)?( out/IN)? of/IN( the/DT)?( [a-zA-Z]+/CD)?( [a-zA-Z]+/JJ)? [a-zA-Z]+(/NN|/NNS|/NNP) parses
  ◦ one of the five people
  ◦ two of the groups
  ◦ one of the rare cases
  ◦ 89 percent of people
  ◦ five of the seven former employees
  ◦ 3 out of 5 people
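The pattern can be checked against tagged versions of these phrases as sketched below. Two assumptions are made: the space that appears between `( the/DT)?` and `( [a-zA-Z]+/CD)?` in the slide text is treated as a line-wrap artifact and omitted, and the POS tags on the sample inputs are guesses at what the tagger would produce. The names `OfPattern` and `parses` are hypothetical.

```java
import java.util.regex.Pattern;

public class OfPattern {
    // "X of Y" quantities; the pattern is taken from the slide minus the stray space noted above.
    static final Pattern X_OF_Y = Pattern.compile(
        "[a-zA-Z0-9]+/CD( percent/NN)?( out/IN)? of/IN( the/DT)?"
        + "( [a-zA-Z]+/CD)?( [a-zA-Z]+/JJ)? [a-zA-Z]+(/NN|/NNS|/NNP)");

    /** True when the whole tagged phrase is matched by the pattern. */
    static boolean parses(String taggedPhrase) {
        return X_OF_Y.matcher(taggedPhrase).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "one/CD of/IN the/DT five/CD people/NNS",
            "two/CD of/IN the/DT groups/NNS",
            "one/CD of/IN the/DT rare/JJ cases/NNS",
            "89/CD percent/NN of/IN people/NNS",
            "five/CD of/IN the/DT seven/CD former/JJ employees/NNS",
            "three/CD out/IN of/IN five/CD people/NNS"
        };
        for (String sample : samples) {
            System.out.println(parses(sample) + "  " + sample);
        }
    }
}
```

With the pattern as transcribed, the second quantity accepts letters only (`[a-zA-Z]+/CD`), so a digit form such as `3/CD out/IN of/IN 5/CD people/NNS` is not matched by this sketch; widening that group to `[a-zA-Z0-9]+` would cover it, though whether the original code did so is not known from the slide.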
PHRASES THAT CAN BE PARSED
Numerical Phrases
27 year-old boy
A 3-2 lead
9 in 10 people
About 100 miles per hour
200 adults and children
$3 million
About two-thirds of the vote
The 17-mile drive
Less than 10% support
Six-bedroom apartment
5.987 ml
From 400 to 500 miles
John, 67

Temporal Phrases
Last year
Next week
Monday – Sunday
January–December
1956-60
10:00 a.m. CST
Mid-1990s
Between 1999 and 2008
17th century
18 April 2008
Dec 21, 2009
October 10th 1984
Since 1998
PHRASES THAT ARE NOT CURRENTLY PARSED
Numerical Phrases
six-pack of drinks
$100 more
252° (the POS tagger cannot parse this)

Temporal Phrases
31st of March 1998
Since mid-November
the January-April period

Future Work:
These phrases can also be parsed by adding more patterns to the current system; for now, only the most important and commonly occurring patterns are considered.
The current goal is to demonstrate the basic idea of numerical phrase extraction.
FORMAL SPECIFICATION
• Created and validated using USE 2.3.1.
• All classes are specified
  ◦ All important attributes and methods are specified
  ◦ Constructor methods are not specified
• Contained at the end of the Architectural Design Document
TEST PLAN
• Outputs are checked at each module by the developer by matching them against manually calculated results
  ◦ Check that the POS tagger has produced the tagged text.
  ◦ Check that the numerical phrases are extracted.
  ◦ Check that each numerical phrase is broken down into Value, Unit, and Unit-Type.
• UML diagrams and the required specifications will be checked for consistency by two fellow MSE students.
• User interaction will be tested by the developer and the technical inspectors.
FORMAL INSPECTION CHECKLIST
• The following items are to be checked:
  ◦ The symbols used in the class diagram conform to UML standards
  ◦ The symbols used in the sequence diagrams conform to UML standards
  ◦ The classes in the class diagrams have corresponding descriptions provided in the Architecture Document
  ◦ The descriptions of the classes in the Architecture Document are clear and concise
  ◦ The classes in the USE model are consistent with those in the Architecture Document
  ◦ All the requirements in the Software Requirements Specification have been covered in the Architecture Document
  ◦ The multiplicities in the USE model have been depicted in the class diagram
PROJECT SCHEDULE
• Key Dates
  ◦ Presentation 1: February 24th, 2009
    ▪ Complete Numerical Sub-Chunker
  ◦ Presentation 2: March 31st, 2009
    ▪ Complete Numerical Phrase Extractor
  ◦ Presentation 3: April 10th, 2009
    ▪ Patch up the modules
    ▪ Develop a GUI
    ▪ Set them up on the server
  ◦ Submit all documents to the committee by April 13th, 2009
  ◦ Final Portfolio submitted by April 15th, 2009
PROJECT SCHEDULE
PROTOTYPE DEMONSTRATION
• POS Tagger working
  ◦ For now it works on the local machine
• Numerical Pattern Extractor
  ◦ For now it works on the local machine
PHASE 3 DELIVERABLES
• Action items
• Component Design
• Assessment Evaluation
• Project Evaluation
• User's Manual
• Formal Technical Inspection Checklists
• Presentation 3
• Executable Project
• Source Code
TO-DO LIST
• Revise the Documents
• Revise Project Schedule
• Work on the Phase 3 deliverables
• Final Demo
Questions??
Suggestions!!
THANK YOU