TANGO (RPI, June 2009)
George Nagy, Mukkai Krishnamoorthy,
Sharad Seth
Raghav Padmanabhan,
Ramana C. Jandhyala,
Sean Kelley
Max Muthalathu,
William Silversmith
June 15, 2009 TANGO PROGRESS REPORT
Completed Stuff
• WNT (Piyushee, MS May 2008)
• TAT (Raghav, MS May 2009)
Pubs:
ICPR08, WNT PJ & GN, Dec. 2008
ICPR08, QBT, RP & GN Dec. 2008
MKM09, Tessellations, RJ, RP, MK, GN, SS, WS, July 2009
GREC09, TAT results, RP, RP, MK, GN, SS, WS, July 2009
Software
• TAT (demo)
• EX2XY, XY2EX (Ramana)
• OO2XY, XY2OO (Sean, in progress)
• XY2LN (SS, MK)
• XY2WN (Bill)
• TAT stat analysis (RB & GN, in progress)
Partial grammar for X-Y trees (MK & SS)
Example table header:
Employment Status: Unemployed | Employed
Education: High School or Less | College (BS/BA, Graduate Degree) | High School or Less | College (BS/BA, Graduate Degree)
S_XY = { c [ c c ] c [ c { c [ c c ] } c { c [ c c ] } ] }
Grammar G1 for parsing all layout-equivalent tessellations of this kind is:
S := A
A := { B }
B := c [ X ] B | c [ X ]
X := c X | A X | A | c
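A minimal sketch of a recognizer for G1, assuming tokens are separated by whitespace as on the slide. The function name `accepts` and the recursive-descent approach are illustrative choices, not the parser used in TANGO. It is tried on the example string written with its closing brace.

```python
def accepts(text):
    """Return True if the token string can be derived from S in grammar G1."""
    toks = text.split()

    def expect(i, tok):
        # Consume one specific token or fail.
        if i < len(toks) and toks[i] == tok:
            return i + 1
        raise SyntaxError(f"expected {tok!r} at token {i}")

    def parse_A(i):
        # A := { B }
        i = expect(i, "{")
        i = parse_B(i)
        return expect(i, "}")

    def parse_B(i):
        # B := c [ X ] B | c [ X ]   (one or more "c [ X ]" groups)
        while True:
            i = expect(i, "c")
            i = expect(i, "[")
            i = parse_X(i)
            i = expect(i, "]")
            if i >= len(toks) or toks[i] != "c":
                return i

    def parse_X(i):
        # X := c X | A X | A | c   (one or more items, each a c or an A)
        count = 0
        while i < len(toks) and toks[i] in ("c", "{"):
            i = i + 1 if toks[i] == "c" else parse_A(i)
            count += 1
        if count == 0:
            raise SyntaxError(f"expected 'c' or '{{' at token {i}")
        return i

    try:
        # S := A, and the whole input must be consumed.
        return parse_A(0) == len(toks)
    except SyntaxError:
        return False


example = "{ c [ c c ] c [ c { c [ c c ] } c { c [ c c ] } ] }"
print(accepts(example))
```

A string with an unmatched brace or an empty bracket pair, such as `{ c [ ] }`, is rejected.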
A’ and A’’ table formats
Two different table formats
All possible combinations may exist.
[Figure: example tables in A’, A’’, and Hybrid formats, built from headers A (A1, A2), B (B1, B2), C (C1, C2), and D (D1, D2)]
Appearance-based distance (WS?)
Each table cell is described by a vector: width, type size, typeface, indent, justification, alpha/num, color, #_of_chars, …
Compute differences between horizontally and vertically adjacent cells
From resulting “gradient map” determine row header, column header, and delta cell regions.
(Show GN’s Excel example)
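A rough sketch of the gradient-map idea, not GN's Excel example. Everything here is an assumption made for illustration: the three cell features, the count-of-differing-features distance, and the rule that the strongest row boundary marks the end of the column header.

```python
def cell_distance(a, b):
    """Count the features on which two cell descriptors differ (assumed measure)."""
    return sum(1 for key in a if a[key] != b.get(key))


def gradient_map(grid):
    """Return (horizontal, vertical) differences between adjacent cells."""
    rows, cols = len(grid), len(grid[0])
    horiz = [[cell_distance(grid[r][c], grid[r][c + 1]) for c in range(cols - 1)]
             for r in range(rows)]
    vert = [[cell_distance(grid[r][c], grid[r + 1][c]) for c in range(cols)]
            for r in range(rows - 1)]
    return horiz, vert


# Hypothetical 3x3 table: one column-header row, one row-header column.
header = {"bold": True, "alpha": True, "justify": "center"}
stub = {"bold": False, "alpha": True, "justify": "left"}
data = {"bold": False, "alpha": False, "justify": "right"}
grid = [
    [header, header, header],
    [stub, data, data],
    [stub, data, data],
]

horiz, vert = gradient_map(grid)

# Total change across each row boundary and each column boundary.
row_strength = [sum(boundary) for boundary in vert]
col_strength = [sum(row[c] for row in horiz) for c in range(len(horiz[0]))]

# Assumed rule: the strongest boundary separates headers from delta cells.
header_rows = row_strength.index(max(row_strength)) + 1
header_cols = col_strength.index(max(col_strength)) + 1
print(header_rows, header_cols)
```

A weighted distance over numeric features (width, type size, #_of_chars) would slot into `cell_distance` without changing the rest.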
Prediction of TAT-time
Multiple regression of interaction time from:
• Size of table (#cols, #rows, or # cells)
• Number of aggregates
• Number of footnotes
• Number of units
• Other?
(GN has tried it with 20 tables – have Excel ‘GN_Data_Analysis’)
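A sketch of how such a regression could be set up, using ordinary least squares solved through the normal equations in plain Python. The data below are invented solely so the fit can be checked by hand; they are not the 20-table measurements, and the predictor choice (cells, aggregates, footnotes) is an assumption.

```python
def fit_linear(X, y):
    """Ordinary least squares: solve (X^T X) b = X^T y, intercept first."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    n = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coef, x):
    """Predicted interaction time for one table's predictor values."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Invented data: (#cells, #aggregates, #footnotes) -> interaction time.
# Generated exactly as time = 10 + 2*cells + 5*aggregates + 3*footnotes.
tables = [(10, 0, 0), (20, 1, 0), (30, 0, 1), (40, 2, 1), (50, 1, 2)]
times = [30, 55, 73, 103, 121]

coef = fit_linear(tables, times)
print([round(c, 3) for c in coef])
print(round(predict(coef, (25, 1, 1)), 3))
```

With real measurements the fit will not be exact, so residuals and the number of tables per predictor deserve a look before trusting the coefficients.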
Table similarity
• May be useful to determine similar edit sequences.
• Tree distance between X-Y representations (symmetry?)
• Edit distance between linear P-notation for X-Y trees
• Metric for parse sequences??
• Tree distance between Wang category forests? (new)
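One of the options above, edit distance between linear notations, can be sketched as a token-level Levenshtein distance. The tokenization on whitespace and the unit costs for insert, delete, and substitute are assumptions; the slides do not fix either.

```python
def edit_distance(a, b):
    """Levenshtein distance between two whitespace-tokenized X-Y tree strings."""
    s, t = a.split(), b.split()
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, tok_s in enumerate(s, start=1):
        curr = [i]
        for j, tok_t in enumerate(t, start=1):
            cost = 0 if tok_s == tok_t else 1
            curr.append(min(prev[j] + 1,          # delete tok_s
                            curr[j - 1] + 1,      # insert tok_t
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


flat = "{ c [ c c ] }"
wider = "{ c [ c c c ] }"
nested = "{ c [ c { c [ c c ] } ] }"

print(edit_distance(flat, wider))
print(edit_distance(flat, nested))
```

This distance is symmetric, but it ignores tree structure: a small structural change can shift many tokens, which is one reason to compare it against a true tree distance.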
Learning ???
• Retain edit sequences from TAT
• Make X-Y tree from each imported but not edited table
• Find distance of X-Y tree from new table to all previous
• Execute edit sequences of nearest neighbor(s)
• Check algorithmically if resulting X-Y tree corresponds to correct WN
• Check visually if table corresponding to resulting X-Y tree is equivalent to original table.
• If not, edit
• Concatenate further edit and associate with X-Y tree of new table, then add to reference set
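The retrieval step in this loop can be sketched as a nearest-neighbor lookup over a reference set of (X-Y tree string, edit sequence) pairs. This is only an illustration: the edit names are made up, and the distance is a stand-in built on Python's `difflib.SequenceMatcher`, not a TANGO tree distance.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Stand-in dissimilarity: 1 minus the difflib similarity ratio of the tokens."""
    return 1.0 - SequenceMatcher(None, a.split(), b.split()).ratio()


def nearest_edit_sequence(new_tree, reference_set):
    """Return (tree, edits, distance) for the stored tree closest to new_tree."""
    tree, edits = min(reference_set, key=lambda entry: distance(new_tree, entry[0]))
    return tree, edits, distance(new_tree, tree)


# Hypothetical reference set; the edit names are placeholders.
reference_set = [
    ("{ c [ c c ] }", ["merge_header"]),
    ("{ c [ c c ] c [ c c ] }", ["split_row", "delete_footnote"]),
]

new_tree = "{ c [ c c c ] }"
tree, edits, d = nearest_edit_sequence(new_tree, reference_set)
print(tree, edits, round(d, 3))

# After the user corrects the result, the new pair joins the reference set.
reference_set.append((new_tree, edits + ["hypothetical_fix"]))
```

A distance threshold would be a natural addition, so that tables far from everything stored go straight to manual editing.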
Discussion Items
• Lists & Ordering
• XML format and verification
• Augmentations (spotting and processing)
• Open Office
• Table ontology
• XY tree to WN via lexical parse (checks?)
• Use of parse trees for XY2WN
• Learning?
• Overall TANGO evaluation for final report
• Critique draft slides for GREC and MKM
• Tools: RPI: OO, VBA, Matlab, Python, BYU: ??
• Other RPI projects: PERFECT, CERVITOR, CAVIAR
Survival Plans
• NSF TANGO Final Report!
• New NSF proposal (Maria)
• Other possible sponsors?
• Confs
• Archival Journals
• Collaborators
• Demos and dissemination
• Next visit