http://springuniversity.bc3research.org/ | 1
Semantic annotation
Quick guide for modelers
Observables
Each model or data source has at least one observable, possibly with constraints that imply others.
• Observable concept categories:
  • subjects: must be countable, physical, recognizable (they “are”);
  • events: time-limited, countable, recognizable as atomic units (they “happen”);
  • processes: time-explicit (not necessarily limited), recognizable as “mechanisms” that are inherent to and affect subjects;
  • qualities: can only exist inherently to subjects, are normally created by processes, and are observed by comparison (indirectly) with reference systems (they “describe”).
• Non-observable concepts that we use with observables:
  • attributes: e.g. “annual”, “average”, “vulnerable”…
  • identities: typically species (biological or chemical). Only one per observable.
  • realms: e.g. soil, atmosphere, land, … Only one per observable.
  • orderings: e.g. level (high, medium, low).
  Attributes and orderings may be defined to be subjective (their interpretation differs in different applications) or objective (also the default if you don’t say anything).
• Each observable must have an identity (directly or through its super-class), and may optionally have a realm and many (different) attributes and traits (different = it can’t have two values of the same general type, e.g. both medium and low).
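The constraints on this slide can be sketched as a small data structure. This is an illustrative Python sketch, not Thinklab code: the class name `Observable`, its fields, and the `GENERAL_TYPE` table are assumptions made for the example; only the "level" group (high, medium, low) is taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed grouping of attribute/ordering values by general type. Only the
# "level" group comes from the slide; any value not listed here is treated
# as its own general type.
GENERAL_TYPE = {"high": "level", "medium": "level", "low": "level"}


@dataclass
class Observable:
    concept: str                        # e.g. "im.core:Volume"
    identity: Optional[str] = None      # only one per observable
    realm: Optional[str] = None         # only one per observable
    attributes: List[str] = field(default_factory=list)

    def is_abstract(self) -> bool:
        # An observable without identity is abstract.
        return self.identity is None

    def validate(self) -> None:
        # Reject two values of the same general type (e.g. medium and low).
        seen = {}
        for attr in self.attributes:
            general = GENERAL_TYPE.get(attr, attr)
            if general in seen:
                raise ValueError(
                    f"'{attr}' and '{seen[general]}' share the general type '{general}'"
                )
            seen[general] = attr


water_volume = Observable("im.core:Volume", identity="im.chemistry:Water",
                          attributes=["annual", "low"])
water_volume.validate()  # passes: the two attributes have different general types
```

Declaring `identity` and `realm` as single optional fields is one simple way to express "only one per observable"; the attribute check has to be explicit because many attributes are allowed as long as their general types differ.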
Abstract vs. Concrete observables
• An abstract observable is an observable without identity. Ontology im.core contains often used ones.
  • e.g. im.core:Mass, im.core:Volume, im.core:Individual, im.core:Group
• Observables must be made concrete before they can be used to annotate models or data. You do so by adding an identity, which is a concept that may exist in ontologies or be managed through authorities.
  • im.chemistry:Water im:Volume
  • im.ecology:Individual identified as “2133312” by GBIF
• The im.* ontologies contain many concrete observables you can use right away.
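The two ways of making an abstract observable concrete can be sketched as follows. This is a hypothetical Python helper, not part of Thinklab: the function name `make_concrete` and its parameters are assumptions, and the output strings simply mirror the two examples on this slide rather than a verified grammar.

```python
def make_concrete(abstract, identity=None, authority=None, key=None):
    """Attach an identity to an abstract observable.

    Either pass an ontology concept as `identity`, or an `authority`
    plus the `key` that the authority uses for the identity.
    """
    if identity is not None and authority is None and key is None:
        # Identity concept taken from an ontology.
        return f"{identity} {abstract}"
    if identity is None and authority is not None and key is not None:
        # Identity managed by an external authority.
        return f'{abstract} identified as "{key}" by {authority}'
    raise ValueError("give either an ontology identity or an authority and key")


print(make_concrete("im:Volume", identity="im.chemistry:Water"))
print(make_concrete("im.ecology:Individual", authority="GBIF", key="2133312"))
```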
The structure of the im.* ontologies
im = integrated modeling. A first step towards providing a set of coherent ontologies for semantic annotation. Available automatically according to the groups a user belongs to.
Foundational ontologies (DOLCE, imcore)
im: fundamental traits
im.core: abstract observables
im.**: core observables and identities for ecology, chemistry, hydrology, geology, geography, conservation, economics, policy, ecosystem services, demography, ethics
Internal, distributed with Thinklab; not visible except in knowledge map.
Automatically synchronized from the server for the groups that use them. Searchable.
Domain-specific knowledge in individual projects (org.aries.**)
Shared with and visible to specific interest groups.
Your projects here
Domain authorities for open-ended identities (too many to build ontologies for):
GBIF -> biological species
IUPAC -> chemical species
FAO/AGROVOC -> agricultural typologies
…
The semantic annotation process
• Define the kind of observation your data source or model represents
  • subjects, measurements, rankings, classifications…
• Find or create the concrete concept(s) that reflect the identity and all attributes of the observable.
  • land cover type, elevation, average surface temperature, soil carbon concentration, road, village
  • must be recognizable independent of its observation (e.g. a “ratio of water to wine” is not an observable, as it only exists in your mind after observing the actual observables: quantities of water and wine).
• Create the appropriate observation statement for the data source or model, adding any other observables it may depend on or be contextualized to. Observations may require two or more observables (e.g. the ratio above, or an observable constrained to be “within” some type of subject).
OBSERVABLE DEFINITION FLOWCHART
[Flowchart: box text listed below; connector labels (Found / Not found / Yes / No) omitted]
Steps:
• Lookup primary observable
• Lookup concept by keyword
• Look up identity trait
• Use authority to obtain identity (e.g., Identified “23343” by GBIF)
• Use identity to define trait for abstract observable (e.g., im.chemistry:Carbon im:Concentration; im.ecology:Individual identified “23343” by GBIF)
• Lookup attribute by keyword
• Assign attribute (e.g., im:Annual im.hydrology:RainfallAmount)
• Define concept for inherent subject (subject type may need traits, identities, etc.)
• Assign provisional name, issue request
• Triple check usage; assign primary observable
• Decide type of observation
• Annotate model
Decisions:
• Can it be expressed as an abstract observable + identity?
• Is the identity managed by an authority?
• Does it have observational attributes (annual, average…)?
• More attributes?
• Does its meaning depend on being in the context of a particular subject that may vary?
Annotation of quality models
• All quality models (what we commonly call “data”) are one of:
  • measurements (physical properties with standard units)
  • classifications (into categories from a specified set)
  • rankings (numeric, linear, monotonic, “arbitrary” units)
  • probabilities (of something happening, event or process)
  • counts (of subjects)
  • ratios (comparing two quantities)
  • percentages or proportions (comparing a specialized quantity to a more generic one)
  • uncertainties (of other qualities)
  • values (of a process, subject, or quality vs. a specific currency)
• The Thinklab language provides syntax for annotating these observations easily.
• Subjective transformations can be made to define discrete “levels” for all numeric observations.
• Numeric observations can produce either “crisp” numbers or probability distributions.
Establishing a quality observable
• First of all, look up English words in the knowledge dictionary. If something is found, ensure it really is the quality you need. Use the knowledge map to understand the concept fully before using it.
• Ensure you’re looking for a quality: a common problem is using the process name to describe qualities that are the outcome of the process.
• If not found:
• What physical entity are you observing? A type, amount, frequency…? Find the fundamental im.core concept for it (choose “xxx” from dropdown menu to restrict the search). When found:
• Find an identity for the concept (choose “xxx” from dropdown) or, if appropriate, use an authority to locate it.
• Does the observable need a realm (e.g. atmosphere)? Locate it (choose “xxx” from dropdown).
• Does the observable need objective attributes related to the way it was observed that are not already present in the scale (e.g. annual, maximum, average)? Locate (choose “xxx” from dropdown) and add all necessary objective attributes.
• Is the observable inherent to a specific type of subject that is essential to its identity? Locate that (choose “xxx” from dropdown) and use the within specification. The subject must also be an observable.
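The last three questions of this checklist can be sketched as a function that decorates an already concrete observable. This is a hypothetical Python helper, not Thinklab code: the name `annotate_quality`, the placement of the realm before the observable, and the exact form of the `within` clause are assumptions; only the pattern "attribute followed by observable" (as in im:Annual im.hydrology:RainfallAmount) appears in this guide.

```python
def annotate_quality(observable, attributes=(), realm=None, within=None):
    """Build an annotation string for an already concrete observable."""
    parts = list(attributes)       # objective attributes, e.g. "im:Annual"
    if realm is not None:
        parts.append(realm)        # assumed to precede the observable like a trait
    parts.append(observable)
    text = " ".join(parts)
    if within is not None:
        # The inherent subject type must itself be an observable.
        text += f" within {within}"
    return text


print(annotate_quality("im.hydrology:RainfallAmount", attributes=["im:Annual"]))
print(annotate_quality("im.chemistry:Carbon im:Concentration",
                       within="im.ecology:Individual"))
```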
Thinklab model syntax summary
measure
To note:
• needs a unit from unit vocabulary (being written!)
• may be discretized
• only physical properties should be measured (check the knowledge graph if in doubt)
• physical properties may be extensive or intensive, and behave correctly with aggregation (advanced topic)
• units are converted across dependencies: the elevation in m will satisfy a requirement for elevation in ft
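The unit conversion mentioned in the last point can be illustrated with a minimal Python sketch. It is not how Thinklab implements conversion; the table `TO_METRES` and the function `convert_length` are assumptions for the example. The only outside fact used is that one international foot is exactly 0.3048 m.

```python
# Length units expressed in metres (1 ft = 0.3048 m exactly).
TO_METRES = {"m": 1.0, "ft": 0.3048, "km": 1000.0}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two units listed in TO_METRES."""
    return value * TO_METRES[from_unit] / TO_METRES[to_unit]


# A model that provides elevation in m can satisfy a dependency that
# asks for elevation in ft once the value is converted.
elevation_m = 100.0
elevation_ft = convert_length(elevation_m, "m", "ft")
print(round(elevation_ft, 2))
```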
classify
• classification is a quality that describes an attribute that can change in the subject’s scale
• produces concepts that are usually created with the keyword class: they are like a “distributed trait” that can change across extents.
• classification can mediate other observers:
rank
• ranks are ordinal numbers that are not measurements, values, or any of the other numeric observers
• if the semantics is right, they should be used sparingly
• “true” rankings usually appear with prioritizations or indicators that are semi-quantitative. In most cases, think of the general case first, and if necessary convert an indicator into the numbers it categorizes if possible.
• we use ranking sometimes for well-known indicators that express measurements in very complex units that do not change in usage:
probability
• a probability is a number from 0 to 1
• the observable must be an event
• the probability observer creates a new concept behind the scenes, e.g. im.climate:RainOnSnowProbability
count
• counts have an optional unit to reflect their distribution in space and/or time
• you can only count what’s countable: subjects or events
• the count observer also creates a new concept – e.g. im.ecology:GoatIndividualCount
ratio
• ratios compare two observables with compatible physical nature and different identity
• ratios create concepts like im.ecology:SoilCarbonToNitrogenRatio and produce numbers
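The two conditions on a ratio (compatible physical nature, different identity) can be sketched as a check performed before dividing. This is an illustrative Python sketch: the `Quantity` class and the `ratio` function are assumptions, "compatible" is simplified to "equal nature", and the identifier im.chemistry:Nitrogen is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    nature: str     # physical nature, e.g. "mass"
    identity: str   # e.g. "im.chemistry:Carbon"
    value: float    # amount, assumed to be in a common unit


def ratio(numerator, denominator):
    """Return numerator / denominator if the two quantities are comparable."""
    if numerator.nature != denominator.nature:
        raise ValueError("ratio needs observables of compatible physical nature")
    if numerator.identity == denominator.identity:
        raise ValueError("ratio needs observables with different identities")
    return numerator.value / denominator.value


soil_carbon = Quantity("mass", "im.chemistry:Carbon", 24.0)
soil_nitrogen = Quantity("mass", "im.chemistry:Nitrogen", 2.0)  # hypothetical concept
print(ratio(soil_carbon, soil_nitrogen))
```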
percentage and proportion
• some percentages/proportions already exist as concepts (check the knowledge graph)!
• others are built on the spot by indicating an observable and a trait that defines the percentage:
presence
• presence is a quality that describes the presence of a subject or process (qualities cannot be “present” by themselves).
• if we have the subjects as data (see next) we don’t need to explicitly annotate presence observers.
• used for outputs or to annotate data that already express presence/absence
uncertainty
• uncertainty is commonly used as an output from stochastic operations that produce it
• it can also be used to annotate uncertainty data if available
• it can be associated to any observable and does not mandate any specific computation method.
value
• value implies a currency – either monetary (usd@2007) or simply a concept that identifies the type of value.
• value can be assigned to all observables; if qualities, they must be values and you use ‘value’ alone, otherwise it’s written as ‘value of’ the observable – e.g. ‘value of im.infrastructure:House in EUR@2015’
• the support for value in Thinklab is only syntactic: you can write the models but they won’t work correctly. Full support will be added soon.
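The two kinds of currency described above can be told apart with a small parser. This is a Python sketch for illustration only: the function `parse_currency`, the regular expression, and the returned tuple layout are assumptions, not Thinklab behaviour, and the concept name in the second call is hypothetical.

```python
import re


def parse_currency(text):
    """Classify a currency specification.

    Returns ("monetary", code, year) for forms like "usd@2007",
    otherwise ("concept", text, None) for a concept naming the type of value.
    """
    match = re.fullmatch(r"([A-Za-z]+)@(\d{4})", text.strip())
    if match:
        return ("monetary", match.group(1).upper(), int(match.group(2)))
    return ("concept", text.strip(), None)


print(parse_currency("usd@2007"))
print(parse_currency("im.policy:SocialValue"))  # hypothetical concept name
```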
Discretization and classification of numeric observers
• maintains the numeric identity when used in a numeric context
• should be done ‘by’ an ordering subjective trait
• bridges to discrete distributions
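A discretization that keeps the numeric identity can be sketched as below. This is an illustrative Python sketch: the break points in `BREAKS` are hypothetical and stand in for the subjective choice of an application, and the `Discretized` class is an assumption, not a Thinklab type. The level names come from the ordering example (high, medium, low) earlier in this guide.

```python
from bisect import bisect_right
from dataclasses import dataclass

LEVELS = ("low", "medium", "high")
BREAKS = (10.0, 50.0)  # hypothetical, application-specific thresholds


@dataclass(frozen=True)
class Discretized:
    value: float   # original number, kept for numeric contexts
    level: str     # discrete level assigned by the ordering

    def __float__(self):
        return self.value


def discretize(value, breaks=BREAKS, levels=LEVELS):
    """Assign a level: below breaks[0] is levels[0], and so on."""
    return Discretized(float(value), levels[bisect_right(breaks, value)])


observation = discretize(30)
print(observation.level, float(observation))
```

Because the thresholds are passed in as data, two applications can discretize the same numbers differently, which is what makes the ordering a subjective trait.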
Annotation of subjects, processes and events
• ‘model each’ will produce subjects instead of indirect observation values
• for our purposes, it can be used to annotate attributes and will automatically produce a presence quality model.
One annotation may imply others
• For qualities:
  • knowing the ratio of A/B implies knowing A when B is known, and B when A is known.
• For subjects:
  • annotating subjects with an observable that is defined with the subject (e.g. the height of each tree in an annotation of tree data) also creates the corresponding quality observation “within” that subject type.
• All subject annotations create a “presence” annotation for them.
• If they are spatial, they also create a “length”, an “area” or a “volume” according to their spatial nature.
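Both kinds of implication listed on this slide can be sketched in a few lines of Python. The function names are assumptions, and the mapping from spatial dimensionality to "length", "area" or "volume" is an assumed reading of "according to their spatial nature".

```python
def implied_from_ratio(ratio, a=None, b=None):
    """Given ratio = A / B and exactly one of A or B, return the other."""
    if (a is None) == (b is None):
        raise ValueError("provide exactly one of a or b")
    if a is None:
        return ratio * b   # A = ratio * B
    return a / ratio       # B = A / ratio


# Assumed mapping: 1-D subjects imply length, 2-D area, 3-D volume.
SPATIAL_QUALITY = {1: "length", 2: "area", 3: "volume"}


def implied_subject_annotations(spatial_dimensions=None):
    """List the quality annotations implied by a subject annotation."""
    implied = ["presence"]                 # always created for subjects
    extra = SPATIAL_QUALITY.get(spatial_dimensions)
    if extra is not None:
        implied.append(extra)
    return implied


print(implied_from_ratio(12.0, b=2.0))     # A from the ratio and B
print(implied_subject_annotations(2))      # e.g. polygon-like subjects
```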
Thinklab model syntax summary