FactMiners & PRImA's "Turning Text Soup into Smart Data" - The Goal: Smart Data
TRANSCRIPT
Goal: Smart Data
From “readable” to “computable”
FactMiners & PRImA’s Knight News Challenge Entry
“Turn Text Soup into Smart Data in
Newspaper & Magazine Archives”
A self-running video slideshow.
One slide every 15 seconds.
Pause as needed.
Q: What is Smart Data?
• A: Smart Data is self-descriptive data that can “carry on a conversation”
with Smart Programs to support access, editing, and visualization of the data itself.
FactMiners & PRImA: Knight News Challenge – “Turning Text Soup into Smart Data in Newspaper & Magazine Archives”
The “actual” data of the database
To access the “actual” data of the database,
Smart Programs “talk” to an embedded
“database about the database” (AKA a metamodel)
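As a rough illustration of the idea on this slide, here is a minimal Python sketch of data that carries its own description. Everything in it (the `smart_data` layout, the `Article` entity, the field names, and the `describe` and `validate` helpers) is hypothetical and chosen only for illustration; it is not the FactMiners design itself.

```python
# Hypothetical self-descriptive dataset: the "metamodel" section describes
# the records held in the "data" section.
smart_data = {
    "metamodel": {
        "Article": {
            "title": {"type": "str", "label": "Headline"},
            "page": {"type": "int", "label": "Page number"},
        }
    },
    "data": [
        {"kind": "Article", "title": "Example headline", "page": 12},
    ],
}

# Maps the type names used in the metamodel to Python types.
TYPES = {"str": str, "int": int}


def describe(record, metamodel):
    """Render a record using only what the metamodel says about it."""
    fields = metamodel[record["kind"]]
    return [f"{spec['label']}: {record[name]}" for name, spec in fields.items()]


def validate(record, metamodel):
    """Check that every field declared in the metamodel has the declared type."""
    fields = metamodel[record["kind"]]
    return all(
        isinstance(record.get(name), TYPES[spec["type"]])
        for name, spec in fields.items()
    )


record = smart_data["data"][0]
print(describe(record, smart_data["metamodel"]))
print(validate(record, smart_data["metamodel"]))
```

The program never hard-codes the field names: it asks the metamodel what an `Article` looks like, so a new entity type could be added to the data without changing the code.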
Q: What does Smart Data look like?
• A: Smart Data includes BOTH the
complex document structure of the source AND the underlying conceptual model of the source
content.
Q: What can Smart Data do?
• A: Turn expensive, time-consuming, labor-intensive research studies into “Just ask!” queries
• Good for things like:
• How did local reporting of race relations impact public policy in Indiana in the 1950s?
• Did advertising or editorial coverage account for the popularity of programs in the Softalk Bestseller lists?
Q: How “smart” is our Smart Data design?
• We spent a year researching
museum informatics and prototyping Smart Data designs.
• Our software architecture is based
on CIDOC-CRM (Conceptual Reference Model for Museums) microservice workflows and
PRESSoo, the ISSN.org metamodel for serial publications.
Winter, 2013
Spring, 2014
Fall, 2014
Summer, 2015
Neo4j GraphGist Challenge,
a 1st place for Metamodel
Subgraph domain model
Semi-finals Ashoka/LEGO
“Re-imagine Learning” Challenge.
#MW2014 FactMiners demo.
Introduced to #cidocCRM.
Museum Computer Network
Emerging Professional Scholarship.
#MCN2014 paper & demo.
“Massively Addressable Text” published
in peer-reviewed CODE|WORDS.
#HILT2015 Crowdsourcing Course
DPLA Community Reps.
Internet Archive Content Partner.
ICOM #cidocCRM SIG member.
Incorporate PRESSoo into design.
Begin PRImA Collaboration.
Q: How “open” is our Smart Data design?
• Using a metamodel subgraph design pattern to embed and pass info about data and its access and transformation is
technology neutral &
future-proof.
Without Smart Data
With Smart Data
Database
10 Load X
20 Print X
30 Goto 10
Domain knowledge written
into task-specific programs
Metamodel statically stored
within #TEI header section of
source documents (std. text files)
<teiHeader>
<metamodel />
<structure />
<content />
Any “smart” DB
For dynamic Linked Open Data access,
DB need only have import &
ability to represent data structures
read from metamodel header.
10 Load metamodel
20 Configure editors
30 Do stuff…
“Smart” program in
any language
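The three steps on this slide (load metamodel, configure editors, do stuff) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `<entity>` and `<field>` elements, their attributes, and the helper names `load_metamodel`, `configure_editors`, and `do_stuff` are hypothetical and are not part of TEI or of the FactMiners design; only the `<teiHeader>` and `<metamodel>` names come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical header: <entity> and <field> are invented for this sketch.
HEADER = """
<teiHeader>
  <metamodel>
    <entity name="Article">
      <field name="title" type="text"/>
      <field name="page" type="integer"/>
    </entity>
  </metamodel>
</teiHeader>
"""

# Hypothetical "editors": one converter per field type named in the metamodel.
EDITORS = {"text": str, "integer": int}


def load_metamodel(xml_text):
    """Step 10: read entity and field definitions from the header."""
    root = ET.fromstring(xml_text.strip())
    return {
        entity.get("name"): {
            field.get("name"): field.get("type") for field in entity.iter("field")
        }
        for entity in root.iter("entity")
    }


def configure_editors(model):
    """Step 20: pick a converter for every field the metamodel declares."""
    return {
        entity: {name: EDITORS[ftype] for name, ftype in fields.items()}
        for entity, fields in model.items()
    }


def do_stuff(editors, entity, raw):
    """Step 30: turn raw text values into typed values via the editors."""
    return {name: convert(raw[name]) for name, convert in editors[entity].items()}


model = load_metamodel(HEADER)
editors = configure_editors(model)
record = do_stuff(editors, "Article", {"title": "Example headline", "page": "12"})
print(record)
```

The domain knowledge lives in the header rather than in the program, which is the contrast the slide draws with the "Without Smart Data" case.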
We have a design to “tame” Text Soup and unlock “facts” in archive data.
• An innovative design combining international standards for conceptual modeling of museum collections
(cidocCRM and PRESSoo) with a “self-descriptive” software/database design pattern provides the foundation for mining Smart Data from Text Soup.
• In the next slideshow, we describe our design for the
technology to “fact-mine” Smart Data from newspaper & magazine digital archives…
FactMiners & PRImA: Our Knight News Challenge Entry
• “Turn Text Soup into Smart Data in Newspaper & Magazine Archives” https://goo.gl/99Vn5M
• Team
• Jim Salmons, FactMiners
• Timlynn Babitsky, FactMiners
• Apostolos Antonacopoulos, PRImA
• Christian Clausner, PRImA