Integrating Structure and Semantics into
Audio-visual Documents
Tuesday 21st of October, 2003
Raphaël Troncy
2nd International Semantic Web Conference (ISWC2003)
10/21/2003 Raphaël Troncy - ISWC'2003 2
Description of the AV content
• A three-step process:
  – identification of the content creator and the content provider: Dublin Core metadata, VRA Core categories …
  – structural decomposition into video segments corresponding to the logical structure of the program: time-code, spatial coordinates
  – semantic description of these segments: controlled vocabulary, thesaurus, free-text annotation
Description of the AV content
• Segmentation
  – locate and date some events
• Description
  – type each segment with an AV genre
  – type each segment with a general theme
  – describe the scene (who, when, where, what, …)
[Diagram labels: describe the logical structure; describe the semantics of the content]
[Diagram: a segment on the time axis t, with genre "report", theme "athletics" and the free-text annotation "Michael Johnson smashed the 200m world record, completing the 200m in 19''32 in Atlanta at the Olympic Games"]
Example
Q: Find all AV sequences of type interview with Sandy Casar and concerning the Paris-Nice cycling race
  – noisy answer: there are other sports news items in the sequence
  – incomplete answer: the interview was broadcast in two parts and began in a previous sequence
  – the query cannot be extended!
13 [Indoor Set: 6th part]
at 18:43:56:00 - 00:09:06:00. – Eurosport
In studio, the second part of the interview, from Nice, of Sandy CASAR by Jean René GODART about the Paris-Nice cycling race and a few sports news with pictures commented by Alexandre BOYON and Laurent PUYAT.
Q : Find all AV sequences of type dialog sequence with a rider and concerning any cycling race with several stages
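The second query only works if the system knows that an interview is a kind of dialog sequence, that Sandy Casar is a rider and that Paris-Nice is a race with several stages. A minimal Python sketch of that kind of subsumption-based matching follows. The taxonomy links, the sequence identifiers and the sample data are assumptions made for illustration; they are not taken from the ontologies presented in the talk.

```python
# Hypothetical toy taxonomy (child -> parent). The class names echo the
# slides, but the subclass links themselves are assumptions.
SUBCLASS_OF = {
    "Interview": "DialogSequence",
    "SeveralStagesRace": "CyclingRace",
}

# Hypothetical typing of the individuals named in the example.
INSTANCE_OF = {
    "Sandy Casar": "Rider",
    "Paris-Nice": "SeveralStagesRace",
}

# Invented sample descriptions: (sequence id, genre, person, race).
SEQUENCES = [
    ("seq12", "Interview", "Sandy Casar", "Paris-Nice"),
    ("seq13", "Interview", "Sandy Casar", "Paris-Nice"),
    ("seq14", "Report", "Sandy Casar", "Paris-Nice"),
]


def is_a(cls, ancestor):
    """Return True if cls is ancestor or a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def find(genre, person_class, race_class):
    """Return ids of sequences matching the three criteria by subsumption."""
    return [
        seq_id
        for seq_id, seq_genre, person, race in SEQUENCES
        if is_a(seq_genre, genre)
        and is_a(INSTANCE_OF.get(person), person_class)
        and is_a(INSTANCE_OF.get(race), race_class)
    ]


results = find("DialogSequence", "Rider", "SeveralStagesRace")
```

With this toy data, both parts of the interview are returned and the report is not, which is the behaviour a plain keyword search over the free-text notice cannot provide.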
Problems
• Weak use of the logical structures
• Descriptions are not made for reasoning
make the AV descriptions accessible to automated processes
• Requirements:
  – express models that constrain the logical structure
    • identify an interview inside a report of a sports magazine
  – represent the meaning contained in this structure
    • a cartoon is a fiction with no real characters
  – describe semantically the content of each sequence
    • the Prologue is always an individual time trial numbered stage 0
Which languages are the most suitable to perform all these tasks?
"Pure" documentary approaches
• General bibliographic description languages (DC, VRA)
• MPEG-7: the new multimedia description language?
  – three components: Descriptors (D), Description Schemes (DS) and the Description Definition Language (DDL)
  – structure: segment = abstract unit defined by temporal localization or masks
  – semantics: entity–attribute–relation model + thesaurus for structuring the knowledge (Classification Schemes)
  – tools: Videto (ZGDV), Vizard (EU-IST Project), MovieTool (© Ricoh)
• Extensions
  – XML Schema: adds structure without semantics
    • TV Anytime, Mdéfi [Tran Thuong, 2003]
  – Classification Schemes: very poor expressivity
    • COALA [Fatemi, 2003]
KR approaches: OWL+RDF
• Definition of concepts and relations
  StudioProgram ≡ and ( HomogeneousProgram (all hasPart StudioSequence) )
• Definition of axioms
  HomogeneousProgram ⊓ HeterogeneousProgram = ⊥
Problem: the structure of the document (i.e. the context) is lost!
Let us merge the two approaches!
General architecture
[Diagram. Labels: document schemes; document instances (valid); MPEG-7 / XML Schema; transformation; AV Ontology; domain-specific ontology; statements base; OWL / RDF; documentalists; users; query]
The Audio-visual Ontology
• Methodology of construction [Bachimont et al., EKAW’02]:
  – Conceptualization: differential principles
  – Formalization: formal definitions, axioms
  – Operationalization: export into a KR language
• AV domain:
  – Production objects (program, sequence, AV genre), Properties (theme), Persons, Technical Process (shooting, recording, post-production), Signal descriptors (audio, video), etc.
• Tools:
  – Conceptualization: DOE [Bachimont et al., EKAW’02]
  – Formalization: OilEd [Bechhofer, KI’01]
  – Languages: DAML+OIL … OWL
• DOE and ontologies are available at: http://opales.ina.fr/public/ontologies/
The Audio-visual Ontology
Generate XML Schema types
OWL                          →  XML Schema (XSLT?)
• Class                      →  Complex type
• Sub-class                  →  Extension
• Restriction on properties  →  Element of the content model
• Union of classes           →  Choice in the content model
Some concepts (program, sequence) extend the MPEG-7 Segment type, hence the descriptions are MPEG-7 valid
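The slide suggests XSLT for this step. As a stand-in, the following Python sketch illustrates three of the four mapping rules (class to complex type, sub-class to extension, property restriction to element of the content model). The function name, the "Type" suffix and the "mpeg7:SegmentType" base are assumptions chosen for illustration, and the union-to-choice rule is left out for brevity.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def class_to_complex_type(name, parent=None, properties=()):
    """Build an xs:complexType element for one ontology class.

    name       -- class name; the type is named name + "Type" (assumed convention)
    parent     -- name of the super-class type, used as xs:extension base
    properties -- restricted property names, emitted as child elements
    """
    ctype = ET.Element(f"{{{XS}}}complexType", {"name": f"{name}Type"})
    holder = ctype
    if parent is not None:
        # Sub-class -> extension of the parent complex type.
        content = ET.SubElement(ctype, f"{{{XS}}}complexContent")
        holder = ET.SubElement(content, f"{{{XS}}}extension", {"base": parent})
    if properties:
        # Restriction on properties -> elements of the content model.
        sequence = ET.SubElement(holder, f"{{{XS}}}sequence")
        for prop in properties:
            ET.SubElement(sequence, f"{{{XS}}}element", {"name": prop})
    return ctype


# Hypothetical usage: an Interview class extending an MPEG-7 segment type.
interview = class_to_complex_type(
    "Interview", parent="mpeg7:SegmentType", properties=["Themes"]
)
xsd_text = ET.tostring(interview, encoding="unicode")
```

Printing `xsd_text` gives a single `xs:complexType` fragment. A real schema would also need the enclosing `xs:schema` element and the namespace declaration for the MPEG-7 prefix.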
Build description schemes for the documents
• Let us watch some sports magazines
  – construction of a simple schema based on StudioSequence, Report and Interview
  – a Report contains some FilmClips of Broadcast Live Sports
• The schema provides the description skeleton for several sports magazines:
  – Téléfoot (soccer)
  – VéloClub (cycling)
  – 3 Partout (multisports)
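To make the idea of a schema constraining the logical structure concrete, here is a small Python sketch of a content model and a check against it. Only the rule "a Report contains FilmClips" and the type names come from the slides. The SportsMagazine root, the other nesting rules and the tuple representation are assumptions for illustration; in the architecture described here this role is played by XML Schema validation.

```python
# Hypothetical content model: which segment types may appear inside which.
CONTENT_MODEL = {
    "SportsMagazine": {"StudioSequence", "Report"},
    "StudioSequence": set(),
    "Report": {"FilmClip", "Interview"},
    "Interview": set(),
    "FilmClip": set(),
}


def violations(node, path=""):
    """Yield one message per child that the content model does not allow.

    node is a (type_name, [children]) tuple describing one segment.
    """
    kind, children = node
    here = f"{path}/{kind}"
    for child in children:
        if child[0] not in CONTENT_MODEL.get(kind, set()):
            yield f"{child[0]} not allowed in {here}"
        # Keep descending so that nested problems are reported too.
        yield from violations(child, here)


# The same skeleton can describe any of the magazines listed above.
velo_club = (
    "SportsMagazine",
    [
        ("StudioSequence", []),
        ("Report", [("FilmClip", []), ("Interview", [])]),
    ],
)
misplaced = ("SportsMagazine", [("Interview", [])])

ok_messages = list(violations(velo_club))
bad_messages = list(violations(misplaced))
```

The first description conforms to the model, while the second is rejected because an Interview appears directly under the magazine rather than inside a Report.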
SegmenTool [French project CHAPERON]
Instantiate a document content model
<ina:Report id="aa23c647c-6517-4aee-8bce-870ae52a01af">
...
<mp7:TemporalDecomposition>
<ina:Interview id="adb23ab65-f8e7-4b2a-8c98-807197da600a">
<mp7:Semantic>...</mp7:Semantic>
<mp7:MediaTime>
<mp7:MediaTimePoint>T00:24:19</mp7:MediaTimePoint>
<mp7:MediaDuration>PT00H00M07S</mp7:MediaDuration>
</mp7:MediaTime>
<ina:Themes value="Cycling"/>
</ina:Interview>
</mp7:TemporalDecomposition>
...
</ina:Report>
[Diagram: the instance is transformed into RDF triples in the KB: Interview hasStartTime 24m19s; Interview hasDuration 7s; Interview hasThemes Cycling]
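The talk performs this step with an XSLT transformation. The Python sketch below is a stand-in showing how the three triples can be read off an instance like the one above. The namespace URIs and the short ids are placeholders, and representing triples as plain tuples is an assumption made to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, assumed for this sketch only.
NS = {"ina": "http://example.org/ina", "mp7": "http://example.org/mpeg7"}

DOC = """
<ina:Report xmlns:ina="http://example.org/ina"
            xmlns:mp7="http://example.org/mpeg7" id="report3">
  <mp7:TemporalDecomposition>
    <ina:Interview id="interview4">
      <mp7:MediaTime>
        <mp7:MediaTimePoint>T00:24:19</mp7:MediaTimePoint>
        <mp7:MediaDuration>PT00H00M07S</mp7:MediaDuration>
      </mp7:MediaTime>
      <ina:Themes value="Cycling"/>
    </ina:Interview>
  </mp7:TemporalDecomposition>
</ina:Report>
"""


def interview_triples(xml_text):
    """Return (subject, predicate, object) tuples for every ina:Interview."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for interview in root.iterfind(".//ina:Interview", NS):
        subject = interview.get("id")
        start = interview.findtext("mp7:MediaTime/mp7:MediaTimePoint", namespaces=NS)
        duration = interview.findtext("mp7:MediaTime/mp7:MediaDuration", namespaces=NS)
        if start is not None:
            triples.append((subject, "hasStartTime", start))
        if duration is not None:
            triples.append((subject, "hasDuration", duration))
        for theme in interview.iterfind("ina:Themes", NS):
            triples.append((subject, "hasThemes", theme.get("value")))
    return triples


triples = interview_triples(DOC)
```

Each tuple corresponds to one edge of the diagram. A real transformation would emit full URIs for subjects and properties so that the statements can be loaded into an RDF store.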
The Cycling Ontology
Knowledge base population
[Diagram: texts about the cycling domain + base of facts]
<rdf about="{URI}/SportsMagazine/Report3/Interview4">
<!-- formal statements from a base of facts -->
</rdf>
[Graph. Nodes: Rider, SeveralStagesRace, 2, Sandy Casar, Paris-Nice. Edge labels: hasName, overallResults, position, cyclingRace, hasName]
<rdf:Description rdf:about="http://../Stade2-17_03_2002.xml#ina:Interview[@id=interview4]"> .....</rdf:Description>
Implementation of the KB
• Sesame: architecture for the storage of RDF triples [Broekstra, 2002]
  – Supports different query languages: RQL, RDQL and SeRQL
  – Implements the RDFS semantics (RDF-MT engine)
• BOR: reasoner for the DAML+OIL language [Simov & Jordanov, 2002]
• SeBOR: integration of the two systems, done in the On-To-Knowledge EU-IST Project
  – Enhanced inference services are provided
  – Close to what an OWL DL reasoner will perform
Sesame+BOR interface
Demo
Conclusion
• General architecture for reasoning on descriptions of video documents:
  – Modeling of 2 ontologies (methodology + DOE)
  – Formalization of these ontologies (OilEd, OWL)
  – Creation of document schemes (extended MPEG-7)
  – Creation of instances of these schemas: the structure of the descriptions (SegmenTool + XSLT transformation for creating a base of RDF triples)
  – Creation of a Knowledge Base of events related to cycling races and use of an adapted reasoner (Sesame + BOR, ©AIdministrator-NL & ©OntoText-BG)
Future work
• Development integration
  – provide a simple interface for querying on both the structure and the content of the video
  – watch the AV sequences corresponding to the RDF triples returned by SeBOR
• Mid-term objectives
  – scalability: test the system on a large base of videos annotated with real users
  – use the future OWL reasoners
• Long-term objectives
  – use this architecture with another domain (other than cycling)
  – will we simply have to build another ontology? what do we have to adapt?