IAs, Language and Lego – an Introduction to Semantic Analysis

Post on 19-Sep-2014

Category:

Technology

DESCRIPTION

This presentation will introduce Semantic Analysis – a way in which content can be analysed and classified through its linguistic basis, rather than through its overt meaning. It will achieve this by using Lego as a metaphor for language and demonstrating that by examining the building blocks of language a deeper understanding of content can be gained.

TRANSCRIPT

1

SMS Management & Technology

IAs, Language and Lego™ – an Introduction to Semantic Analysis

Matthew Hodgson

Regional-lead, Web and Information Management, Canberra, Australia

12 April 2008

2

3

4

IA Tools for understanding content

5

Content analysis…

6

We all:
• Think about information in different ways
• Write about information in different ways

Information: we all think differently …

7

… we all even write differently …

8

Jeffrey Veen on analysing content

“a mind-numbingly detailed odyssey through your web site...

…this process…is a relatively straightforward process of clicking through your web site and recording what you find.”

Source: http://www.adaptivepath.com/ideas/essays/archives/000040.php

9

When analysing content …

10

An extract of medical restrictions text

11

What is this content?!

Medical restrictions text
• Free-text built in Word and hand-crafted (*grrr*)
• Unclassified
• Varied consistency within and between texts
• Highly complex sentence structures in pseudo-legalese
• Style reflects the author rather than the meaning in the communication

Content needed for re-use
• Content output was needed for reuse by others
• Multiple audiences
• Multiple purposes for re-use

Codification
• Codification by 3rd parties (after authoring) takes too long
• Need to reduce timeframes!

12

The task… analyse and codify

Concept 1

Concept 2

Concept 3

Concept 4 Concept 5

Concept 5

13

What tools would be appropriate?

?

14

Linguistics… a whole discipline devoted to the study of language…

preposition, verb, adjective, noun, determiner, subject, object, conjunction, semantics, sentence structure

all language has structure

15

Language is like Lego™

Building blocks
• Subject (S)
• Verb (V)
• Object (O)

Order of blocks
• Differs depending on the language

16

Language is like Lego™

• SVO languages: English, French, Chinese, Bulgarian, Swahili
• SOV: Japanese, Turkish, Korean
• VSO: Classical Arabic, Celtic and Hawaiian
• VOS: Fijian, Yoda’s amusing phrases
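One way to make the Lego metaphor concrete is to treat the subject, verb and object as three labelled blocks and let the word-order label decide their sequence. The sketch below is illustrative only: the function name `arrange`, the sample blocks and the choice of one example language per order are assumptions made here, and the English words only show block order, not a translation.

```python
from typing import Dict, List

# Word-order labels from the slide, with one example language for each.
EXAMPLE_LANGUAGE: Dict[str, str] = {
    "SVO": "English",
    "SOV": "Japanese",
    "VSO": "Classical Arabic",
    "VOS": "Fijian",
}


def arrange(blocks: Dict[str, str], order: str) -> List[str]:
    """Return the S, V and O blocks in the sequence named by `order`."""
    if sorted(order) != ["O", "S", "V"]:
        raise ValueError(f"not a permutation of S, V, O: {order!r}")
    return [blocks[letter] for letter in order]


# Hypothetical sample blocks; any subject, verb and object would do.
blocks = {"S": "the child", "V": "builds", "O": "a tower"}

for order, language in EXAMPLE_LANGUAGE.items():
    print(f"{order} ({language}): {' | '.join(arrange(blocks, order))}")
```

Only the order string changes between languages; the blocks themselves stay the same.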

17

Lego bricks: subjects, verbs and objects

Subject: Those Lego bricks

Verb: are

Object: [some] red Lego bricks

Sometimes, though, the SVO structure is hidden: “The Lego is red” or “Those Lego bricks are [some] red Lego bricks”?

Uncovering the hidden structure helps to differentiate between the subject and the object and identify the who and what

19

Lego trees…

Sentence (Root)
• Noun Phrase (SUBJECT): Those [Det] Lego [Adj] bricks [Noun]
• Verb Phrase (VERB): are [Verb]
• Noun Phrase (OBJECT): [some] [Det] red [Adj] Lego [Adj] bricks [Noun]
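The sentence tree on this slide can be held in a small nested data structure, which makes it easy to read the subject, verb and object back out. This is a minimal sketch: the `Node` class, the `leaf` helper and the `roles` function are names invented here, and the part-of-speech tags are one reading of the slide's tree rather than the output of a parser.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    label: str                 # e.g. "NounPhrase", "Det", "Noun"
    word: str = ""             # set on leaf nodes only
    role: str = ""             # "SUBJECT", "VERB" or "OBJECT" on phrase nodes
    children: List["Node"] = field(default_factory=list)

    def text(self) -> str:
        """Join the words under this node, left to right."""
        if not self.children:
            return self.word
        return " ".join(child.text() for child in self.children)


def leaf(label: str, word: str) -> Node:
    return Node(label, word=word)


sentence = Node("Sentence", children=[
    Node("NounPhrase", role="SUBJECT", children=[
        leaf("Det", "Those"), leaf("Adj", "Lego"), leaf("Noun", "bricks")]),
    Node("VerbPhrase", role="VERB", children=[leaf("Verb", "are")]),
    Node("NounPhrase", role="OBJECT", children=[
        leaf("Det", "some"), leaf("Adj", "red"),
        leaf("Adj", "Lego"), leaf("Noun", "bricks")]),
])


def roles(root: Node) -> Dict[str, str]:
    """Map each top-level role to the words that fill it."""
    return {child.role: child.text() for child in root.children if child.role}


print(roles(sentence))
```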

20

Semantic analysis

Medical restrictions wording:

Restricted benefit: Gastro-oesophageal reflux disease; Scleroderma oesophagus

Authority required: Peptic ulcer

21

Semantic analysis (cont.)

Actual sentence: Peptic ulcer

Implied sentence: The prescription of medicine is restricted to the initial treatment of patients with peptic ulcer
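The step from the actual wording to the implied sentence can be sketched as a template fill. The template text is taken from the slide; the function name `implied_sentence`, the `stage` parameter and the lower-casing of the first letter are assumptions made for illustration (a condition that starts with a proper noun would need different handling).

```python
IMPLIED_TEMPLATE = (
    "The prescription of medicine is restricted to the {stage} "
    "treatment of patients with {condition}"
)


def implied_sentence(actual: str, stage: str = "initial") -> str:
    """Expand a terse restriction such as 'Peptic ulcer' into a full sentence."""
    condition = actual.strip().rstrip(";").strip()
    if not condition:
        raise ValueError("the actual wording is empty")
    # Lower-case the first letter so the condition reads naturally mid-sentence.
    condition = condition[0].lower() + condition[1:]
    return IMPLIED_TEMPLATE.format(stage=stage, condition=condition)


print(implied_sentence("Peptic ulcer"))
```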

22

Semantic structure of ‘peptic ulcer’

[Figure: sentence tree for the implied sentence “The prescription of medicine is restricted to the initial treatment of patients with peptic ulcer”. The root splits into SUBJECT, VERB and OBJECT; the tree is built from noun phrases, a verb phrase and prepositional phrases, with each word tagged DET, N, P, AUX, V or ADJ.]

23

Semantic model for restrictions text

[Figure: the implied sentence mapped onto a semantic model. The labels recovered from the diagram are listed below; their grouping is approximate.]

KEY: ADJ, NOUN, PREP, VERB

Subject and verb: the prescription of medicine is restricted to the

Object (initial or continuing): initial treatment of patients; continuing treatment of patients

WHO TREATED?
• Patient descriptors (population/group): 70 year old; mother; pregnant; female; male; other ADJ
• PBS subsidised: previously receiving PBS-subsidised bDMARD treatment; not previously receiving PBS-subsidised bDMARD treatment
• Limitation of prescribing to a specific specialist group: treated by clinical immunologist; treated by dermatologist
• Existing treatment descriptors of population: receiving cytotoxic chemotherapy; receiving a 5HT3 antagonist; receiving radiotherapy; receiving treatment 2 years; having surgery; responding; not responding to analgesics; not responding to other anti-epileptic drugs; unable take solid form of topiramate

WHAT CONDITION?
• Condition being treated: with nausea and vomiting; psoriasis; with peptic ulcer; with malignant tumor; with scleroderma oesophagus; with chronic pain; with seizures; with hormone dependent metastatic breast cancer
• Measures/descriptors of condition severity (ADJ): advanced; partial; incomplete resolution of; no indication of

ACTION REQUIRED
• Practical aspects: complete Authority action sheet; include whole body area diagrams; treat for period of time; provide previous history; prescribe number repeats; record details of doctor; record date; sign form; contact Medicare; obtain Authority number

24

Semantics describing “Who Treated”

Age

Patient Group

Documented history

[mg ...etc]

[CLINICIAN] Requiring special expertise in

Requiring no special expertise

[EXPERTISE]

[SEVERITY] [CONDITION]

Sex

PBS subsidised

PBS non-subsidised

At a dose of

Weekly

Daily

Monthly

Yearly

Fortnightly

Hourly

Hours

Days

Weeks

Months

Years

Vocation Veteran

Male

Female

All

Ethnicity [ETHNICITY]

Entitlement [?]

[LIST]

[LIST]

Pregnant

Breastfeeding

[ADJECTIVE]

Veteran

?

[MEASURED AS]?

Co-administered with

That meet a specific definition/criteria as set out in [LIST of references]

General schedule of Lipid-lowering Drugs

and

[DEFINED BY]

Treatments

Within timeframe of

Over a period of

Trials

Treatment with

Treatment of

Treatment for

Initial

Continuing

Maintenance

Effective

Ineffective

Inappropriate

Initiation

Stabilisation

In conjunction with

Not in conjunction with

Following

Preceding

Received

Has not received

Not responding

Responding

Failed to qualify for

Qualified for

Not indicated

Indicated

Has had

Has not had

Can have

Can not have

Can not receive

Disease progression

Disease regression

Treated by

Diagnosis confirmed by

=

[NUMBER]

Over

Under

Exactly

Between

At least

[DRUG]

[TREATMENT]

Diet

Exercise

Surgery [TYPE]

[THERAPY]

Evidence of

[PROCEDURE]

in

[DISORDER]

Symptoms?

Clinical findings

As measured by?

As evidenced by

Starts new prepositional-phrase in the same text-block
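Once a phrase such as an age limit is built from fixed terms, it can be checked mechanically. The sketch below assumes that the comparator labels on this slide (Over, Under, Exactly, Between, At least) apply to the Age and [NUMBER] slots; that pairing, the function name `age_matches` and the inclusive reading of "Between" are assumptions, not something the slide states.

```python
from typing import Callable, Dict

# Comparator labels from the slide, each mapped to a test on an age in years.
COMPARATORS: Dict[str, Callable[..., bool]] = {
    "Over": lambda age, n: age > n,
    "Under": lambda age, n: age < n,
    "Exactly": lambda age, n: age == n,
    "At least": lambda age, n: age >= n,
    # Assumption: "Between" includes both end points.
    "Between": lambda age, low, high: low <= age <= high,
}


def age_matches(age: float, comparator: str, *numbers: float) -> bool:
    """Check an age against a codified limit such as ('At least', 70)."""
    if comparator not in COMPARATORS:
        raise ValueError(f"unknown comparator: {comparator!r}")
    return COMPARATORS[comparator](age, *numbers)


print(age_matches(70, "At least", 70))
print(age_matches(30, "Between", 18, 65))
```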

25

Semantics describing “Authority Action”

[Figure: the phrase options that make up an Authority Action. Labels recovered from the diagram, with repeated branches listed once:]

Authority Action

(allow) Maximum / (allow) Minimum

Therapy / Supply [AMOUNT] / Repeats [AMOUNT] / Treatment

Initial / Subsequent / Ongoing / Remaining

In writing / By telephone / Electronically

To [AUTHORITY]: Medicare, ...etc...

Within timeframe of [TIME]: days / weeks / months

Where approval [TIMEFRAME]

To complete / Followed by

Starts new prepositional-phrase in the same text-block
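A controlled vocabulary like this one maps naturally onto enumerations, so that an author can only combine terms that exist in the model. The sketch below uses terms from the slide as enum values; the function `authority_action`, its parameters and the wording of the assembled phrase are hypothetical and only show the idea.

```python
from enum import Enum


class Limit(Enum):
    MAXIMUM = "Maximum"
    MINIMUM = "Minimum"


class Stage(Enum):
    INITIAL = "Initial"
    SUBSEQUENT = "Subsequent"
    ONGOING = "Ongoing"
    REMAINING = "Remaining"


class Channel(Enum):
    IN_WRITING = "In writing"
    BY_TELEPHONE = "By telephone"
    ELECTRONICALLY = "Electronically"


def authority_action(limit: Limit, repeats: int, stage: Stage,
                     channel: Channel, authority: str = "Medicare") -> str:
    """Assemble one Authority Action phrase from controlled terms."""
    if repeats < 0:
        raise ValueError("repeats must not be negative")
    # Hypothetical phrase layout; the slide does not prescribe the wording.
    return (f"{limit.value} {repeats} repeats, {stage.value.lower()} treatment, "
            f"{channel.value.lower()} to {authority}")


print(authority_action(Limit.MAXIMUM, 5, Stage.INITIAL, Channel.BY_TELEPHONE))
```

Because every slot is an enum member, a misspelt or unofficial term fails when the phrase is written rather than when a third party tries to codify it later.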

26

High-level semantic overview

[Figure: the restriction text shown as a series of blocks joined by “+” and “=”. Blocks: Foreword; Definitions; Patient Group (WHO TREATED); Condition (WHAT CONDITION); Authority Action (HOW AUTHORISED); Notes and Cautions.]

Topics shown under the blocks:
• Age limitations
• Clinical initiation or continuation criteria
• Prescribing clinicians
• Prescribing advice
• Condition
• Contact information
• Grandfathering clauses
• Patient groups
• Prior treatments
• Severity
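Treating each restriction as a record with one field per block is what makes the content reusable for different audiences. The following is a minimal sketch under stated assumptions: the `Restriction` class and `group_by_authorisation` function are invented here, the conditions come from the earlier "Medical restrictions wording" slide, and the `who_treated` values are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Restriction:
    who_treated: str      # WHO TREATED block
    what_condition: str   # WHAT CONDITION block
    how_authorised: str   # HOW AUTHORISED block


def group_by_authorisation(restrictions: List[Restriction]) -> Dict[str, List[str]]:
    """List the conditions found under each authorisation type."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for restriction in restrictions:
        groups[restriction.how_authorised].append(restriction.what_condition)
    return dict(groups)


# Placeholder records for illustration only.
restrictions = [
    Restriction("patients", "gastro-oesophageal reflux disease", "Restricted benefit"),
    Restriction("patients", "scleroderma oesophagus", "Restricted benefit"),
    Restriction("patients receiving initial treatment", "peptic ulcer", "Authority required"),
]

for authorisation, conditions in group_by_authorisation(restrictions).items():
    print(f"{authorisation}: {'; '.join(conditions)}")
```

The same records could be regrouped by condition or by patient group without touching the source text, which is the re-use the free-text version made difficult.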

27

Yes, it can be codified!

Medical restrictions:
• Did have structure
• Did have underlying logic
• Were based on repeatable business processes
• Could be codified

Could we make a ‘system’ to reinforce the structure at the point of authoring?
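Reinforcing the structure at the point of authoring could be as simple as checking each entry against the allowed terms before it is saved. This sketch is not the system demonstrated in the talk: the `ALLOWED` table, the field names and the `validate` function are assumptions, with the term lists borrowed from the "Who Treated" and "Authority Action" slides.

```python
from typing import Dict, List, Set

# Allowed terms per field; the field names are hypothetical.
ALLOWED: Dict[str, Set[str]] = {
    "stage": {"Initial", "Continuing", "Maintenance"},
    "sex": {"Male", "Female", "All"},
    "channel": {"In writing", "By telephone", "Electronically"},
}


def validate(entry: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the entry is codifiable."""
    problems: List[str] = []
    for field_name, allowed in ALLOWED.items():
        value = entry.get(field_name)
        if value is None:
            problems.append(f"{field_name}: missing")
        elif value not in allowed:
            problems.append(
                f"{field_name}: {value!r} is not one of {sorted(allowed)}"
            )
    return problems


print(validate({"stage": "Initial", "sex": "All", "channel": "By telephone"}))
print(validate({"stage": "first go", "sex": "All"}))
```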

28

Demo

Putting it together in a system:
• Supporting building of content restrictions in a codified way
• Prototyping with Axure

29

30

31

32

33

34

35

36

37

The semantic analysis advantage

Identifies:
• Themes in content

vs

Identifies:
• Themes in content
• Work processes
• Folk taxonomies used
• ‘Things’ written about

38

What else could you use it for?

When you need to understand:
• Business processes that create content

When you want to disassemble content for:
• FAQs
• A-Z indexes
• Help files

39

How can I add this to my toolbox??!

Theory is important
• An understanding of semantics - sentence trees and grammar
• Text books by authors like Fromkin and Rodman can help through the tricky bits

Need good tools
• Connexor: http://www.connexor.eu/technology/machinese/demo/
• Big sheets of paper (and an electronic whiteboard)
• Visio (not PowerPoint!)

40

Demo

Connexor: http://www.connexor.eu/technology/machinese/demo/

41

Connexor

42

Connexor – machine tagger

43

Connexor – machine syntax

44

Why should I care about this?

• Google uses semantic analysis to index content
• Translation software uses semantic analysis to identify ‘components’ for translation

Good sentence structure equals:
• Accurate indexing
• Higher rank relevance of content
• Happy people (they find what they’re looking for)

45

Why should I care about this?

46

‘Calais’ by Reuters

47

Summing up

Content is still king!

But how can you tell if your content:
• Is of good quality?
• Matches your website’s categories?
• Accurately reflects your metadata?
• Can be found by people?

Semantic analysis can:
• Make your content audits more objective
• Inform processes to improve the quality of the content
• Inform processes to improve search engine indexing
• Inform metadata creation
• Inform choice of taxonomy

48

Take-home message

Semantic analysis can help IAs:
• Infer: how people think about, and structure, their information
• Describe: business processes that produce content
• Identify: where content quality is poor so it can be improved; critical components of the sentence for codification
• Design: taxonomies and describe folk taxonomies
• Build: systems to help bring some structure to content authoring

49

Fin

50

IAs, Language and Lego™

an Introduction to Semantic Analysis

51

by

Matthew Hodgson

Regional-lead, Web and Information Management

SMS Management & Technology Canberra Australia

52

by

Matthew Hodgson

Email: mhodgson@smsmt.com

Blog: magia3e.wordpress.com

Slideshare: www.slideshare.net/magia3e

Twitter: magia3e
