FactMiners & PRImA's "Turning Text Soup into Smart Data" - The Goal: Smart Data

Upload: jim-salmons

Post on 10-Feb-2017

Category: Data & Analytics


TRANSCRIPT

Goal: Smart Data

From “readable” to “computable”

FactMiners & PRImA’s Knight News Challenge Entry

“Turn Text Soup into Smart Data in Newspaper & Magazine Archives”

A self-running video slideshow. One slide every 15 seconds. Pause as needed.

Q: What is Smart Data?

• A: Smart Data is self-descriptive data that can “carry on a conversation” with Smart Programs to support access, editing, and visualization of the data itself.

FactMiners & PRImA: Knight News Challenge – “Turning Text Soup into Smart Data in Newspaper & Magazine Archives”

[Diagram label: The “actual” data of the database]

To access the “actual” data of the database, Smart Programs “talk” to an embedded “database about the database” (AKA a metamodel).
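
A minimal sketch of this idea in Python, assuming a toy in-memory dataset. The `metamodel` key, the `describe` and `fetch` helpers, and the sample record are all hypothetical names invented for illustration; they are not taken from the FactMiners design.

```python
# Hypothetical Smart Data bundle: the data travels with a description of itself.
smart_data = {
    "metamodel": {  # the "database about the database"
        "Article": {"fields": {"title": "str", "page": "int"}},
    },
    "data": {  # the "actual" data
        "Article": [{"title": "Sample headline", "page": 3}],
    },
}


def describe(bundle, kind):
    """Ask the metamodel which fields a kind of record has."""
    return sorted(bundle["metamodel"][kind]["fields"])


def fetch(bundle, kind):
    """Return records of `kind`, keeping only fields the metamodel declares."""
    fields = describe(bundle, kind)
    return [{f: rec[f] for f in fields} for rec in bundle["data"][kind]]


fields = describe(smart_data, "Article")
records = fetch(smart_data, "Article")
print(fields)
print(records)
```

The program never hard-codes the field names of `Article`; it learns them by asking the metamodel first, which is the "conversation" the slide describes.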

Q: What does Smart Data look like?

• A: Smart Data includes BOTH the complex document structure of the source AND the underlying conceptual model of the source content.
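
One way to picture the two layers side by side is the sketch below. It assumes a deliberately tiny model: the `Region` and `Concept` classes, their fields, and the sample values are hypothetical placeholders, not the CIDOC-CRM or PRESSoo vocabularies.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Document structure: where a piece of content sits in the source."""
    page: int
    kind: str   # e.g. "headline" or "advertisement"
    text: str


@dataclass
class Concept:
    """Conceptual model: what the content is about."""
    label: str
    category: str                                  # e.g. "Person" or "Product"
    mentions: list = field(default_factory=list)   # Regions where it appears


# Placeholder values only; no real archive data is implied.
ad = Region(page=12, kind="advertisement", text="placeholder ad copy")
product = Concept(label="Example Product", category="Product")
product.mentions.append(ad)

# A question about the concept can be answered in terms of document structure.
pages = [region.page for region in product.mentions]
print(pages)
```

Linking each concept back to the regions that mention it is what lets a query move between "what is this about" and "where does it appear in the issue".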

Q: What can Smart Data do?

• A: Turn expensive, time-consuming, labor-intensive research studies into “Just ask!” queries.

• Good for questions like:

  • How did local reporting of race relations impact public policy in Indiana in the 1950s?

  • Did advertising or editorial coverage account for the popularity of programs in the Softalk Bestseller lists?

Q: How “smart” is our Smart Data design?

• We spent a year researching museum informatics and prototyping Smart Data designs.

• Our software architecture is based on CIDOC-CRM (Conceptual Reference Model for Museums) microservice workflows and PRESSoo, the ISSN.org metamodel for serial publications.

Timeline (Winter 2013, Spring 2014, Fall 2014, Summer 2015):

• Neo4j GraphGist Challenge: a 1st place for Metamodel Subgraph domain model.

• Semi-finals, Ashoka/LEGO “Re-imagine Learning” Challenge.

• #MW2014 FactMiners demo. Introduced to #cidocCRM.

• Museum Computer Network Emerging Professional Scholarship. #MCN2014 paper & demo.

• “Massively Addressable Text” published in peer-reviewed CODE|WORDS.

• #HILT2015 Crowdsourcing Course.

• DPLA Community Reps.

• Internet Archive Content Partner.

• ICOM #cidocCRM SIG member.

• Incorporate PRESSoo into design.

• Begin PRImA Collaboration.

Q: How “open” is our Smart Data design?

• Using a metamodel subgraph design pattern to embed and pass info about data and its access and transformation is technology-neutral & future-proof.

Without Smart Data:

Database

10 Load X
20 Print X
30 Goto 10

Domain knowledge written into task-specific programs.

With Smart Data:

Metamodel statically stored within the #TEI header section of source documents (standard text files):

<teiHeader>
<metamodel />
<structure />
<content />

Any “smart” DB: for dynamic Linked Open Data access, the DB need only have import and the ability to represent data structures read from the metamodel header.

10 Load metamodel
20 Configure editors
30 Do stuff…

“Smart” program in any language.
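
The three-step “smart” program can be sketched in Python as follows. This is only an illustration under stated assumptions: the `<doc>` root, the `<field>` and `<record>` elements, and their attributes are invented for the sketch, and a real TEI file would use the TEI namespace and schema rather than this simplified layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: everything inside <metamodel> and <content>
# is invented for this sketch.
SOURCE = """
<doc>
  <teiHeader>
    <metamodel>
      <field name="title" type="str"/>
      <field name="page" type="int"/>
    </metamodel>
  </teiHeader>
  <content>
    <record title="Sample headline" page="3"/>
  </content>
</doc>
"""

# Maps a declared type name to a Python converter.
CASTS = {"str": str, "int": int}


def load_metamodel(root):
    """Step 10: read field names and declared types from the header."""
    return {f.get("name"): f.get("type") for f in root.find("teiHeader/metamodel")}


def configure_editors(metamodel):
    """Step 20: build one converter per field from its declared type."""
    return {name: CASTS[type_name] for name, type_name in metamodel.items()}


def read_records(root, editors):
    """Step 30: "do stuff", here just typed access to the content."""
    return [
        {name: cast(rec.get(name)) for name, cast in editors.items()}
        for rec in root.find("content")
    ]


root = ET.fromstring(SOURCE)
metamodel = load_metamodel(root)
records = read_records(root, configure_editors(metamodel))
print(records)
```

Nothing in the program names `title` or `page` directly. Adding a field to the header changes what the program reads without changing the program, which is the contrast with the task-specific loop on the “Without Smart Data” side.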

We have a design to “tame” Text Soup and unlock “facts” in archive data.

• An innovative design combining international standards for conceptual modeling of museum collections (cidocCRM and PRESSoo) with a “self-descriptive” software/database design pattern provides the foundation for mining Smart Data from Text Soup.

• In the next slideshow, we describe our design for the technology to “fact-mine” Smart Data from newspaper & magazine digital archives…

FactMiners & PRImA: Our Knight News Challenge Entry

• “Turn Text Soup into Smart Data in Newspaper & Magazine Archives” https://goo.gl/99Vn5M

• Team:

  • Jim Salmons, FactMiners

  • Timlynn Babitsky, FactMiners

  • Apostolos Antonacopoulos, PRImA

  • Christian Clausner, PRImA
