JTS 2010, 3 May 2010
Context Sensitive Archiving of Videos on the Web
Paper authors: Thomas Drugeon, Valentine Frey, Jérôme Thièvre, Matteo Treleani
Ina collections
Current collections
60 years of TV programmes and 70 years of radio programmes
Legal deposit since 1992
4,500,000 hours of TV and radio + 1,000,000 hours captured live from 102 TV and radio channels each year
Context sensitive archiving on the web | 2 May 2010
Extension to the Web
Web legal deposit law (2006), shared between BnF and Ina, as an extension to their current collections
Ina is developing specialized tools and methods to collect, archive, preserve, and give access to this archived web collection
→ Preserve, promote, transmit
Web Legal Deposit
Archiving French audiovisual information on the web
→ Focus on audiovisual contents
Why not archive only the video and audio contents from the web?
The web is not just a way to access contents, it is a medium in its own right
→ Archiving websites related to French audiovisual media
Operational since February 2009
As of April 2010:
- 6,000 websites (3,000 at start)
- 2,500,000,000 “objects”, 260 TB
- 10,000,000 video objects, 100 TB
- 19,000,000 audio objects, 100 TB
→ 260 TB compressed to only 21 TB of storage (DAFF)
Methods
The web is not a broadcast medium: no stream to capture, no explicit path to follow
The web responds to interactions
We have to discover and recreate these interactions to archive it
→ crawling
Websites grow and change in heterogeneous ways
We have to visit a page to know it was updated
→ sampling
Accessing the archive means browsing it
We have to recreate the interactions to make the archive browsable
→ simulating
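The crawling and sampling steps above can be sketched on a toy example. This is only an illustration under stated assumptions, not the DlWeb crawler: the site is a hypothetical in-memory dictionary standing in for real HTTP fetching, and the names `fingerprint`, `crawl` and `changed_pages` are invented for this sketch. It shows why a page must be revisited to learn that it changed: only a new crawl yields a new fingerprint to compare.

```python
import hashlib
from collections import deque

def fingerprint(content: str) -> str:
    """Hash of a page's content, used to tell whether it changed between visits."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def crawl(site: dict, seed: str) -> dict:
    """Breadth-first crawl of a toy in-memory site (assumption: no real HTTP).

    `site` maps a URL to a (content, outgoing_links) pair.
    Returns {url: fingerprint} for every page reachable from `seed`.
    """
    seen = {}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in seen or url not in site:
            continue  # already visited, or a link that cannot be followed
        content, links = site[url]
        seen[url] = fingerprint(content)
        queue.extend(links)
    return seen

def changed_pages(previous: dict, current: dict) -> set:
    """URLs that are new, or whose content differs, since the previous crawl."""
    return {url for url, digest in current.items() if previous.get(url) != digest}

# Two snapshots of the same hypothetical site, sampled at different dates.
site_day1 = {
    "/": ("home", ["/news", "/video"]),
    "/news": ("news v1", ["/"]),
    "/video": ("player", []),
}
site_day2 = dict(site_day1)
site_day2["/news"] = ("news v2", ["/", "/archive"])
site_day2["/archive"] = ("old news", [])

first = crawl(site_day1, "/")
second = crawl(site_day2, "/")
print(sorted(changed_pages(first, second)))  # ['/archive', '/news']
```

A real crawler would also have to trigger the page's interactions (scripts, players, forms) to discover links, which this sketch deliberately leaves out.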
Limits
Crawling
Sampling
Simulating
Some updates will be missing
Linked pages are crawled at a different date from the original page
Some interactions cannot be crawled, and thus some contents will be missing or altered in the archive (pages or parts of pages)
Dead web (train reservation, Google search, etc.)
Some interactions are lost (crawling issues)
Temporal inconsistencies between pages (sampling issues)
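The temporal inconsistency mentioned above can be made concrete with a small sketch. It assumes the archive records one capture timestamp per page and the links between pages; the function names, the sample data and the one-hour tolerance are all hypothetical choices made for illustration, not part of the DlWeb tools.

```python
from datetime import datetime, timedelta

def capture_gaps(captures: dict, links: dict) -> dict:
    """Time between the captures of each archived (source, target) link.

    Links whose target was never captured are skipped: they are missing
    content rather than temporally inconsistent content.
    """
    gaps = {}
    for source, targets in links.items():
        for target in targets:
            if source in captures and target in captures:
                gaps[(source, target)] = abs(captures[target] - captures[source])
    return gaps

def inconsistent_links(captures: dict, links: dict, tolerance: timedelta) -> list:
    """Links whose two ends were captured further apart than `tolerance`."""
    gaps = capture_gaps(captures, links)
    return sorted(pair for pair, gap in gaps.items() if gap > tolerance)

# Hypothetical capture dates for three pages of one site.
captures = {
    "/": datetime(2010, 4, 1, 8, 0),
    "/news": datetime(2010, 4, 1, 8, 5),
    "/video": datetime(2010, 4, 3, 9, 0),
}
links = {"/": ["/news", "/video", "/missing"]}

print(inconsistent_links(captures, links, timedelta(hours=1)))
# [('/', '/video')]
```

Exposing such gaps to the researcher is one way of turning a sampling limitation into a usable clue about the archiving context.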
Web Archaeology
Authenticity: the document is what it purports to be (Duranti, 2001)
Reliability: we can trust the document and its content (Bachimont, 2009)
Non-integrity of web documents
Integrity: the document hasn’t been altered (Lynch, 1994)
The consequence of technical problems:
How to preserve authenticity and reliability without depending on material integrity?
Reconstructing the meaning of the document through traces (a sort of archaeological practice)
DlWeb archives traces
Context influences the meaning of a video posted on the web
But not all the items of the context have the same impact on interpretation.
Preserving the meaning of a video posted on the web means preserving the significant elements of its context
Meaning precedes the material form.
Web Archiving: pre-eminence of the meaning
We thus have to find the elements influencing the meaning.
Example: The relocation of The Eiffel Tower
Ina.fr posted a news programme from 1964: the Eiffel Tower was to be relocated. The video provoked a buzz on the Web.
A methodological approach: the commutation test (from linguistics)
Substituting one item of the expression can modify the meaning.
Ex.: changing a phoneme of a word (peer – beer).
How can we find which elements of the context to preserve in order to safeguard the archival value of the video (its correct interpretation)?
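The logic of the commutation test can be sketched as a procedure: substitute one context element at a time and keep those whose substitution changes the reading. In practice the reading is a human judgement; here it is replaced by a toy function so that the sketch runs. The element names, the substitutes and `toy_interpret` are hypothetical and say nothing about how any real video was analysed.

```python
def significant_elements(context: dict, alternatives: dict, interpret) -> list:
    """Commutation test over context elements.

    Each element is replaced in turn by its alternative; an element is
    significant if the interpretation of the variant differs from the
    interpretation of the original context.
    """
    baseline = interpret(context)
    significant = []
    for name, substitute in alternatives.items():
        variant = dict(context)      # copy, so only one element changes at a time
        variant[name] = substitute
        if interpret(variant) != baseline:
            significant.append(name)
    return significant

def toy_interpret(context: dict) -> str:
    """Stand-in for a human reading of the page (assumption for this sketch)."""
    if context["caption"] == "archive footage":
        return "historical document"
    return "current news"

# Hypothetical context of a video page and one substitute per element.
context = {"caption": "archive footage", "banner_ad": "car", "sidebar": "weather"}
alternatives = {"caption": "breaking", "banner_ad": "phone", "sidebar": "sport"}

print(significant_elements(context, alternatives, toy_interpret))  # ['caption']
```

Only the elements returned by such a test would need to be preserved to keep the video interpretable; the others could be lost without changing its meaning.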
How to reconstruct the meaning in complex documents?
Where is the document and where the context?
Web documents are often complex and refer to a broad spectrum of cultural elements.
Hypothesis
We can reconstruct the meaning through narrativization.
Narrativization can be based on the search for clues.
This is the critical historical approach that Ginzburg calls the “evidential paradigm” (the clues are, in this case, the significant elements found through the commutation test).
A Sherlock Holmes approach…
Example: narrativization based on clues
The Dailymotion channel of Gameblog.fr posts a France 2 news report from 21 November 2004, and explains that its content was an amalgam of fake news.
It announces a collective suicide in Japan: 147 people committed suicide because of a delay in the release of a videogame (Dead or Alive).
They swallowed some sachets of silicon…
A link in a comment allows us to better understand what happened.
France 2 cited an article that appeared in the newspaper Libération, reporting a collective suicide in March 2004.
The source of the article was a blog post.
The post was satirical: it appeared on the webzine Xbox Mag to mock the excessive interest in the release of this product by videogamers.
The editors of Xbox Mag notified France 2 and Libération of the error.
On 25 November, Libération published a correction.
On 26 November, France 2 acknowledged the error, blaming the “Anglo-Japanese press” (their only source was Libération).
The complexity of a web document
The problem of the completeness of traces
To understand the facts we need no fewer than three web pages, which are often not linked to one another:
- The video posted on Dailymotion
- The original post on Xbox Mag
- The post on Xbox Mag explaining the errors
The Web always refers to (and remediates) other media:
- The archival video of France 2 (conserved at Inathèque)
- The press: Libération
The Intrinsic Value of a Web Document
Web Archiving is the most complete way to reconstruct these events (TV and press are not sufficient)
The example reveals:
How to help reconstruct the narrative?
Give the researcher access to all available technical and methodological information (i.e. the archiving context)
→ clues
DlWeb archives traces
Develop tools to help the researcher to organise and exploit these clues
→ Methodological DlWeb workshops with audiovisual researchers, archivists and documentalists
Improve completeness