
http://springuniversity.bc3research.org/

Semantic annotation
Quick guide for modelers


Observables

Each model or data source has at least one observable, possibly with constraints that imply others.

• Observable concept categories:
  • subjects: must be countable, physical, recognizable (they “are”);
  • events: time-limited, countable, recognizable as atomic units (they “happen”);
  • processes: time-explicit (not necessarily limited), recognizable as “mechanisms” that are inherent to and affect subjects;
  • qualities: can only exist inherently to subjects, are normally created by processes, and are observed by comparison (indirectly) with reference systems (they “describe”).

• Non-observable concepts that we use with observables:
  • attributes: e.g. “annual”, “average”, “vulnerable”…
  • identities: typically species (biological or chemical). Only one per observable.
  • realms: e.g. soil, atmosphere, land, … Only one per observable.
  • orderings: e.g. level (high, medium, low). Attributes and orderings may be defined to be subjective (their interpretation differs in different applications) or objective (also the default if you don’t say anything).

• Each observable must have an identity (directly or through its super-class), and may optionally have a realm and many (different) attributes and traits. “Different” means it can’t have two values of the same general type, e.g. both medium and low.
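The composition rules above can be checked mechanically. The following is a minimal Python sketch, not Thinklab code: the `Observable` and `Trait` classes, their field names, and the `general_type` grouping are hypothetical names introduced for illustration. Only the rules themselves come from this guide, and identities inherited through a super-class are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Trait:
    """An attribute or ordering value, e.g. 'medium' of general type 'level'."""
    general_type: str  # hypothetical grouping, e.g. "level"
    value: str


@dataclass
class Observable:
    concept: str                    # e.g. "im.core:Mass"
    identity: Optional[str] = None  # at most one per observable
    realm: Optional[str] = None     # at most one per observable
    traits: List[Trait] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return the rule violations found (empty list when none)."""
        found = []
        if self.identity is None:
            # Assumption: inheritance of identity from a super-class is ignored here.
            found.append("no identity")
        seen = set()
        for trait in self.traits:
            if trait.general_type in seen:
                found.append(f"two values of the same general type: {trait.general_type}")
            seen.add(trait.general_type)
        return found


ok = Observable("im.core:Volume", identity="im.chemistry:Water",
                traits=[Trait("temporal aggregation", "annual")])
bad = Observable("im.core:Mass",
                 traits=[Trait("level", "medium"), Trait("level", "low")])

print(ok.problems())
print(bad.problems())
```

The second observable is rejected twice over: it has no identity, and it carries both “medium” and “low” from the same ordering.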


Abstract vs. Concrete observables

• An abstract observable is an observable without identity. Ontology im.core contains often-used ones.
  • e.g. im.core:Mass, im.core:Volume, im.core:Individual, im.core:Group

• Observables must be made concrete before they can be used to annotate models or data. You do so by adding an identity, which is a concept that may exist in ontologies or be managed through authorities.
  • im.chemistry:Water im:Volume
  • im.ecology:Individual identified as “2133312” by GBIF

• The im.* ontologies contain many concrete observables you can use right away.


The structure of the im.* ontologies

im = integrated modeling. A first step towards providing a set of coherent ontologies for semantic annotation. Available automatically according to the groups a user belongs to.

• Foundational ontologies (DOLCE, imcore)
• im: fundamental traits
• im.core: abstract observables
• im.**: core observables and identities for ecology, chemistry, hydrology, geology, geography, conservation, economics, policy, ecosystem services, demography, ethics

Internal, distributed with Thinklab; not visible except in knowledge map.

Automatically synchronized from the server for the groups that use them. Searchable.

• Domain-specific knowledge in individual projects (org.aries.**). Shared with and visible to specific interest groups.
• Your projects here

Domain authorities, for open-ended identities (too many to build ontologies for):

• GBIF -> biological species
• IUPAC -> chemical species
• FAO/AGROVOC -> agricultural typologies
• …


The semantic annotation process

• Define the kind of observation your data source or model represents
  • subjects, measurements, rankings, classifications…

• Find or create the concrete concept(s) that reflect the identity and all attributes of the observable.
  • land cover type, elevation, average surface temperature, soil carbon concentration, road, village
  • must be recognizable independent of its observation (e.g. a “ratio of water to wine” is not an observable, as it only exists in your mind after observing the actual observables: quantities of water and wine).

• Create the appropriate observation statement for the data source or model, adding any other it may depend on or be contextualized to. Observations may require two or more observables (e.g. the ratio above, or constrained to be “within” some type of subject).


OBSERVABLE DEFINITION FLOWCHART

[Figure: flowchart. The steps and decisions are listed below; the Yes/No and Found/Not found branches connecting them are not recoverable from the text.]

• Lookup concept by keyword
• Lookup primary observable
• Can it be expressed as an abstract observable + identity?
• Is the identity managed by an authority?
• Look up identity trait
• Use authority to obtain identity (e.g., Identified “23343” by GBIF)
• Use identity to define trait for abstract observable (e.g., im.chemistry:Carbon im:Concentration; im.ecology:Individual identified “23343” by GBIF)
• Assign provisional name, issue request
• Does it have observational attributes (annual, average…)?
• Lookup attribute by keyword
• Assign attribute (e.g., im:Annual im.hydrology:RainfallAmount)
• More attributes?
• Does its meaning depend on being in the context of a particular subject that may vary?
• Define concept for inherent subject (subject type may need traits, identities, etc.)
• Triple check usage; assign primary observable
• Decide type of observation
• Annotate model


Annotation of quality models

• All quality models (what we commonly call “data”) are one of:
  • measurements (physical properties with standard units)
  • classifications (into categories from a specified set)
  • rankings (numeric, linear, monotonic, “arbitrary” units)
  • probabilities (of something happening, event or process)
  • counts (of subjects)
  • ratios (comparing two quantities)
  • percentages or proportions (comparing a specialized quantity to a more generic one)
  • uncertainties (of other qualities)
  • values (of a process, subject, or quality vs. a specific currency)

• The Thinklab language provides syntax for annotating these observations easily.

• Subjective transformations can be made to define discrete “levels” for all numeric observations.

• Numeric observations can produce either “crisp” numbers or probability distributions.


Establishing a quality observable

• First of all, look up English words in the knowledge dictionary. If something is found, ensure it really is the quality you need. Use the knowledge map to understand the concept fully before using it.

• Ensure you’re looking for a quality: a common problem is using the process name to describe qualities that are the outcome of the process.

• If not found:

  • What physical entity are you observing? A type, amount, frequency…? Find the fundamental im.core concept for it (choose “xxx” from dropdown menu to restrict the search). When found:

  • Find an identity for the concept (choose “xxx” from dropdown) or, if appropriate, use an authority to locate it.

  • Does the observable need a realm (e.g. atmosphere)? Locate it (choose “xxx” from dropdown).

  • Does the observable need objective attributes related to the way it was observed that are not already present in the scale (e.g. annual, maximum, average)? Locate (choose “xxx” from dropdown) and add all necessary objective attributes.

  • Is the observable inherent to a specific type of subject that is essential to its identity? Locate that (choose “xxx” from dropdown) and use the within specification. The subject must also be an observable.


Thinklab model syntax summary


measure

To note:
• needs a unit from unit vocabulary (being written!)
• may be discretized
• only physical properties should be measured (check the knowledge graph if in doubt)
• physical properties may be extensive or intensive, and behave correctly with aggregation (advanced topic)
• units are converted across dependencies: the elevation in m will satisfy a requirement for elevation in ft
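The unit conversion across dependencies mentioned above is plain arithmetic. As a rough illustration in Python (not Thinklab syntax; the `TO_METRES` table and the `convert_length` function are hypothetical names, and Thinklab's own unit vocabulary is not shown here), an elevation supplied in metres can satisfy a dependency that asks for feet:

```python
# Conversion factors to metres; 1 ft = 0.3048 m exactly (international foot).
TO_METRES = {"m": 1.0, "ft": 0.3048, "km": 1000.0}


def convert_length(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a length by going through metres as the common base unit."""
    return value * TO_METRES[from_unit] / TO_METRES[to_unit]


# A data source provides elevation in m; a dependent model requires ft.
elevation_m = 1200.0
elevation_ft = convert_length(elevation_m, "m", "ft")
print(round(elevation_ft, 1))
```

Because both units describe the same physical property (length), the conversion is lossless in either direction; a requirement in an incompatible unit (e.g. kg) has no entry to convert through and cannot be satisfied.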


classify

• classification is a quality that describes an attribute that can change in the subject’s scale

• produces concepts that are usually created with the keyword class: they are like a “distributed trait” that can change across extents.

• classification can mediate other observers:


rank

• ranks are ordinal numbers that are not measurements, values, or any of the other numeric observers

• if the semantics is right, they should be used sparingly

• “true” rankings usually appear with prioritizations or semi-quantitative indicators. In most cases, think of the general case first and, where possible, convert an indicator into the numbers it categorizes.

• we use ranking sometimes for well-known indicators that express measurements in very complex units that do not change in usage:


probability

• a probability ranges from 0 to 1

• the observable must be an event

• the probability observer creates a new concept behind the scenes: e.g. im.climate:RainOnSnowProbability


count

• counts have an optional unit to reflect their distribution in space and/or time

• you can only count what’s countable: subjects or events

• the count observer also creates a new concept – e.g. im.ecology:GoatIndividualCount


ratio

• ratios compare two observables with compatible physical nature and different identity

• ratios create concepts like im.ecology:SoilCarbonToNitrogenRatio and produce numbers


percentage and proportion

• some percentages/proportions already exist as concepts (check the knowledge graph)!

• others are built on the spot by indicating an observable and a trait that defines the percentage:


presence

• presence is a quality that describes the presence of a subject or process (qualities cannot be “present” by themselves).

• if we have the subjects as data (see next) we don’t need to explicitly annotate presence observers.

• used for outputs or to annotate data that already express presence/absence


uncertainty

• uncertainty is commonly used as an output from stochastic operations that produce it

• it can also be used to annotate uncertainty data if available

• it can be associated with any observable and does not mandate any specific computation method.


value

• value implies a currency – either monetary (usd@2007) or simply a concept that identifies the type of value.

• value can be assigned to all observables; if qualities, they must be values and you use ‘value’ alone, otherwise it’s written as ‘value of’ the observable – e.g. ‘value of im.infrastructure:House in EUR@2015’

• the support for value in Thinklab is currently only syntactic: you can write the models but they won’t work correctly. Full support will be added soon.



Discretization and classification of numeric observers

• maintains the numeric identity when used in a numeric context

• should be done ‘by’ an ordering subjective trait

• bridges to discrete distributions
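To make the idea concrete, here is a small Python sketch (not Thinklab syntax) of discretizing a numeric observation “by” an ordering while keeping its numeric identity. The level names follow the high/medium/low ordering used earlier in this guide; the breakpoints, the `DiscretizedValue` class, and the `discretize` function are assumptions made up for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass

LEVELS = ("low", "medium", "high")
BREAKPOINTS = (500.0, 1500.0)  # hypothetical, subjective thresholds


@dataclass(frozen=True)
class DiscretizedValue:
    value: float  # the original number is kept alongside the level
    level: str

    def __float__(self) -> float:
        # In a numeric context the observation still behaves as a number.
        return self.value


def discretize(value: float) -> DiscretizedValue:
    """Map a number to the level whose interval contains it."""
    index = bisect_right(BREAKPOINTS, value)
    return DiscretizedValue(value, LEVELS[index])


elevation = discretize(820.0)
print(elevation.level)
print(float(elevation) + 10.0)
```

Changing `BREAKPOINTS` changes the levels but not the underlying numbers, which is why the ordering is treated as a subjective trait.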


Annotation of subjects, processes and events

• ‘model each’ will produce subjects instead of indirect observation values

• for our purposes, it can be used to annotate attributes and will automatically produce a presence quality model.


One annotation may imply others

• For qualities:
  • knowing the ratio of A/B implies knowing A when B is known, and B when A is known.

• For subjects:
  • annotating subjects with an observable that is defined with the subject (e.g. the height of each tree in an annotation of tree data) also creates the corresponding quality observation “within” that subject type.

  • All subject annotations create a “presence” annotation for them.

  • If they are spatial, they also create a “length”, an “area” or a “volume” according to their spatial nature.
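The implication for ratios is simple arithmetic. A minimal Python sketch (the function name `infer_from_ratio` is hypothetical, and this is not how Thinklab resolves observations internally, only an illustration of the reasoning):

```python
from typing import Optional, Tuple


def infer_from_ratio(ratio: float,
                     a: Optional[float] = None,
                     b: Optional[float] = None) -> Tuple[float, float]:
    """Given ratio = A / B and exactly one of A or B, return the pair (A, B)."""
    if (a is None) == (b is None):
        raise ValueError("provide exactly one of a or b")
    if a is None:
        return ratio * b, b  # A = ratio * B
    if ratio == 0:
        raise ValueError("B cannot be recovered from a zero ratio")
    return a, a / ratio      # B = A / ratio


# Example: a carbon-to-nitrogen ratio of 12 with 2.0 units of nitrogen known.
carbon, nitrogen = infer_from_ratio(12.0, b=2.0)
print(carbon, nitrogen)
```

The same annotation therefore lets a model that needs A be satisfied by data for the ratio plus data for B, and vice versa.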

