CS 430 / INFO 430 Information Retrieval
Lecture 23
Non-Textual Materials 2
Course Administration
Assignment 3
Grades and comments will be sent out tomorrow
Assignment 4 has been posted
Automatic Creation of Surrogates for Non-textual Materials
Discovery of non-textual materials usually requires surrogates
• How far can these surrogates be created automatically?
• Automatically created surrogates are much less expensive than manually created ones, but have high error rates.
• If surrogates have high rates of error, is it possible to have effective information discovery?
Example: Informedia Digital Video Library
Collections: Segments of video programs, e.g., TV and radio news and documentary broadcasts, from Cable News Network (CNN), the British Open University, and WQED television.
Segmentation: Automatically broken into short segments of video, such as the individual items in a news broadcast.
Size: More than 4,000 hours, 2 terabytes.
Objective: Research into automatic methods for organizing and retrieving information from video.
Funding: NSF, DARPA, NASA and others.
Principal investigator: Howard Wactlar (Carnegie Mellon University).
Informedia Digital Video Library
History
• Carnegie Mellon has broad research programs in speech recognition, image recognition, natural language processing.
• 1994. Basic mock-up demonstrated the general concept of a system using speech recognition to build an index from a sound track matched against spoken queries. (DARPA funded.)
• 1994-1998. Informedia developed the concept of multi-modal information discovery with a series of user interface experiments. (NSF/DARPA/NASA Digital Libraries Initiative.)
• 1998 onward. Continued research and a commercial spin-off (which failed).
The Challenge
A video sequence is awkward for information discovery:
• Textual methods of information retrieval cannot be applied
• Browsing requires the user to view the sequence. Fast skimming is difficult.
• Computing requirements are demanding (MPEG-1 requires 1.2 Mbits/sec).
Surrogates are required
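As a rough check on the figures quoted so far, the arithmetic below relates the 1.2 Mbits/sec MPEG-1 rate to the collection size given earlier (more than 4,000 hours, 2 terabytes). It is only a sketch: it assumes a constant bit rate and decimal units (1 terabyte = 10^12 bytes), neither of which is stated in the lecture.

```python
# Back-of-envelope storage estimate for MPEG-1 video.
# Assumptions (not from the lecture): constant bit rate, decimal units.

MPEG1_BITS_PER_SEC = 1.2e6   # 1.2 Mbits/sec, as quoted on the slide
HOURS = 4000                 # collection size quoted for Informedia


def storage_bytes(hours: float, bits_per_sec: float) -> float:
    """Return the bytes needed to store `hours` of video at a constant bit rate."""
    seconds = hours * 3600
    return seconds * bits_per_sec / 8


total = storage_bytes(HOURS, MPEG1_BITS_PER_SEC)
print(f"{total / 1e12:.2f} terabytes")  # prints "2.16 terabytes"
```

Under these assumptions the result is consistent with the quoted collection size of about 2 terabytes.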
Multi-Modal Information Discovery
The multi-modal approach to information retrieval
Computer programs analyze video materials for clues, e.g., changes of scene:
• methods from artificial intelligence, e.g., speech recognition, natural language processing, image recognition.
• analysis of video track, sound track, closed captioning if present, any other information.
Each mode gives imperfect information. Therefore use many approaches and combine the evidence.
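The idea of combining imperfect evidence can be sketched as follows. This is a minimal illustration, not Informedia's actual method: it assumes each mode reports a confidence between 0 and 1 that a segment matches a query, treats the modes as independent, and combines them with a noisy-OR rule. The mode names and scores are hypothetical.

```python
from math import prod


def combine_noisy_or(confidences: dict[str, float]) -> float:
    """Combine per-mode confidences, assuming the modes err independently.

    The segment is judged a miss only if every mode misses, so the combined
    confidence is 1 - product(1 - p) over all modes.
    """
    return 1.0 - prod(1.0 - p for p in confidences.values())


# Hypothetical confidences that one segment matches a query.
evidence = {
    "speech_recognition": 0.6,
    "closed_captions": 0.5,
    "video_ocr": 0.3,
}
combined = combine_noisy_or(evidence)
print(round(combined, 2))  # prints 0.86
```

No single mode is more than 60% confident here, yet the combined estimate is higher than any of them, which is the point of using many approaches.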
Informedia Library Creation
[Diagram: Video, audio, and text pass through speech recognition, image extraction, natural language interpretation, and segmentation to yield segments with derived metadata.]
Informedia: Information Discovery
[Diagram: The user browses via multimedia surrogates and queries via natural language against the segments with derived metadata, and receives the requested segments and metadata.]
Text Extraction
Source
Sound track: Automatic speech recognition using the Sphinx II and III recognition systems. (Unrestricted vocabulary, speaker independent, multi-lingual, background sounds.) Error rates of 25% and up.
Closed captions: Digitally encoded text. (Not on all video. Often inaccurate.)
Text on screen: Can be extracted by image recognition and optical character recognition. (Matches speaker with name.)
Query
Spoken query: Automatic speech recognition using the same system as is used to index the sound track.
Typed query: Entered by the user.
Image Understanding
Informedia has developed specialized tools for various aspects of image understanding
• scene break detection (segmentation, icon selection)
• image similarity matching
• camera motion and object tracking
• video-OCR (recognize text on screen)
• face detection and association
Multimodal Metadata Extraction
An Evaluation Experiment
Test corpus:
• 602 news stories from CNN, etc. Average length 672 words.
• Manually transcribed to obtain accurate text.
• Speech recognition of text using Sphinx II (50.7% error rate)
• Errors introduced artificially to give error rates from 0% to 80%.
• Relative precision and recall (using a vector ranking) were used as measures of retrieval performance.
As word error rate increased from 0% to 50%:
• Relative precision fell from 80% to 65%
• Relative recall fell from 90% to 80%
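An experiment of this shape can be sketched in two parts: corrupt a clean transcript at a chosen word error rate, then score retrieval from the corrupted text relative to retrieval from the clean text. Both the substitution-only error model and the set-based definition of relative precision and recall below are assumptions made for illustration; the lecture does not say how errors were introduced or exactly how the relative measures were computed.

```python
import random


def inject_errors(words, error_rate, vocabulary, rng):
    """Replace each word with a different vocabulary word with probability error_rate."""
    noisy = []
    for word in words:
        if rng.random() < error_rate:
            choices = [v for v in vocabulary if v != word]
            noisy.append(rng.choice(choices))
        else:
            noisy.append(word)
    return noisy


def relative_precision_recall(clean_hits, noisy_hits):
    """Score the noisy run against the clean run, treated as the reference.

    Returns (precision, recall) as fractions of the two result sets.
    """
    clean, noisy = set(clean_hits), set(noisy_hits)
    overlap = len(clean & noisy)
    precision = overlap / len(noisy) if noisy else 0.0
    recall = overlap / len(clean) if clean else 0.0
    return precision, recall


rng = random.Random(0)  # fixed seed so the corruption is repeatable
transcript = "the president spoke about the budget today".split()
vocabulary = ["president", "budget", "weather", "market", "today", "spoke"]
noisy = inject_errors(transcript, 0.5, vocabulary, rng)
print(noisy)

# Hypothetical story IDs returned for one query from each kind of index.
print(relative_precision_recall(["s1", "s2", "s3", "s4"], ["s1", "s2", "s5"]))
```

In the second call, two of the three noisy results also appear in the clean results, and two of the four clean results are recovered.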
Speech recognition and retrieval performance
User Interface Concepts
Users need a variety of ways to search and browse, depending on the task being carried out and preferred style of working
• Visual icons
• One-line headlines
• Film strip views
• Video skims
• Transcript following of audio track
• Collages
• Semantic zooming
• Results set
• Named faces
• Skimming
Thumbnails, Filmstrips and Video Skims
Thumbnail:
• A single image that illustrates the content of a video
Filmstrip:
• A sequence of thumbnails that illustrate the flow of a video segment
Video skim:
• A short video that summarizes the contents of a longer sequence, by combining shorter sequences of video and sound that provide an overview of the full sequence
Creating a Filmstrip
Separate video sequence into shots
• Use techniques from image recognition to identify dramatic changes in scene. Frames with similar color characteristics are assumed to be part of a single shot.
Choose a sample frame
• Default is to select the middle frame from the shot.
• If there is camera motion, select the frame where the motion ends.
User feedback:
• Frames are tied to time sequence.
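The steps above can be sketched as follows. This is a simplified illustration, not Informedia's implementation: frames are represented as flat lists of 8-bit grayscale values rather than color images, the histogram distance and the threshold are assumptions chosen for the example, and camera-motion handling is omitted.

```python
def histogram(frame, bins=4):
    """Normalized intensity histogram of a frame (a list of 0-255 values)."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def split_into_shots(frames, threshold=0.5):
    """Start a new shot wherever consecutive histograms differ sharply.

    Returns a list of (start, end) frame index pairs, end exclusive.
    """
    shots, start = [], 0
    for i in range(1, len(frames)):
        prev, curr = histogram(frames[i - 1]), histogram(frames[i])
        distance = sum(abs(a - b) for a, b in zip(prev, curr))  # L1 distance
        if distance > threshold:
            shots.append((start, i))
            start = i
    if frames:
        shots.append((start, len(frames)))
    return shots


def filmstrip(frames, threshold=0.5):
    """Pick the middle frame index of each shot (the default rule on the slide)."""
    return [(start + end - 1) // 2 for start, end in split_into_shots(frames, threshold)]


# Toy video: three dark frames, then four bright frames.
dark, bright = [10, 20, 30, 40], [200, 210, 220, 230]
video = [dark] * 3 + [bright] * 4
print(split_into_shots(video))  # prints [(0, 3), (3, 7)]
print(filmstrip(video))         # prints [1, 4]
```

Because each chosen frame is identified by its index, the filmstrip keeps the tie to the time sequence.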
Creating Video Skims
Static:
• Precomputed based on video and audio phrases
• Fixed compression, e.g., one minute skim of 10 minute sequence
Dynamic:
• After a query, skim is created to emphasize context of the hit
• Variable compression selected by user
• Adjustable during playback
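A static skim with fixed compression can be sketched as a selection problem: keep the highest-scoring phrases until the time budget is used up, then play them in their original order. The phrase boundaries, the importance scores, and the greedy selection rule are all assumptions for illustration; the lecture does not describe how Informedia ranks phrases.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    start: float      # seconds from the start of the sequence
    duration: float   # seconds
    score: float      # hypothetical importance score


def static_skim(phrases, total_duration, compression=10.0):
    """Greedily pick high-scoring phrases within a budget of total_duration / compression."""
    budget = total_duration / compression
    chosen, used = [], 0.0
    for phrase in sorted(phrases, key=lambda p: p.score, reverse=True):
        if used + phrase.duration <= budget:
            chosen.append(phrase)
            used += phrase.duration
    return sorted(chosen, key=lambda p: p.start)  # restore time order


# A 10-minute (600 s) sequence reduced to at most 60 s.
phrases = [
    Phrase(start=0, duration=20, score=0.9),
    Phrase(start=120, duration=30, score=0.4),
    Phrase(start=300, duration=25, score=0.8),
    Phrase(start=480, duration=15, score=0.7),
]
skim = static_skim(phrases, total_duration=600, compression=10)
print([(p.start, p.duration) for p in skim])  # prints [(0, 20), (300, 25), (480, 15)]
```

A dynamic skim could reuse the same selection with scores raised near the query hit and with `compression` supplied by the user.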
Limits to Scalability
Informedia has demonstrated effective information discovery with moderately large collections
Problems with increased scale:
• Technical -- storage, bandwidth, etc.
• Diversity of content -- difficult to tune heuristics
• User interfaces -- complexity of browsing grows with scale
Lessons Learned
• Searching and browsing must be considered integrated parts of a single information discovery process.
• Data (content and metadata), computing systems (e.g., search engines), and user interfaces must be designed together.
• Multi-modal methods compensate for incomplete or error-prone data.
CS 430 / INFO 430 Information Retrieval
Lecture 23
Architecture of Information Retrieval Systems
Basic Architecture 1: Single Homogeneous Collection
• Documents and indexes are held on a single computer system (may be several computers).
• The user interface and search methods are selected for the specific service.
Examples: Medline (medical information), Cornell University library catalog
[Diagram: Index and Documents.]
Basic Architecture 2: Several Similar Collections -- One Computer System
• Several more or less similar collections are held on a single computer system.
• Each collection is indexed separately using the same software, procedures, algorithms, etc. (but tuned for each collection, e.g., stoplists).
• The user interface is the same (or very similar) for each service.
Examples: OCLC's FirstSearch
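A minimal sketch of this arrangement: one indexing routine shared by every collection, with the stoplist as the per-collection tuning parameter. The collection names, documents, and stoplists are hypothetical.

```python
from collections import defaultdict


def build_index(documents, stoplist):
    """Build an inverted index (term -> set of doc IDs), skipping stoplist terms."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            if term not in stoplist:
                index[term].add(doc_id)
    return index


# Hypothetical collections held on the same system.
collections = {
    "medicine": {"m1": "the patient study", "m2": "study of the heart"},
    "law": {"l1": "the court ruling", "l2": "ruling of the court"},
}
# Same software, but a stoplist tuned for each collection.
stoplists = {
    "medicine": {"the", "of", "study"},
    "law": {"the", "of", "court"},
}
indexes = {name: build_index(docs, stoplists[name]) for name, docs in collections.items()}
print(sorted(indexes["medicine"]))  # prints ['heart', 'patient']
print(sorted(indexes["law"]))       # prints ['ruling']
```

A word such as "study" carries little information in the hypothetical medical collection, so it is dropped there, while the same code would keep it in any collection whose stoplist omits it.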
Distributed Architecture 1: Standard Search Protocols
[Diagram: "Find x" queries sent from user interfaces to search services.]
Strict adherence to standards allows any user interface to search any conforming search service.
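The benefit of a shared standard can be illustrated with an abstract interface: any client written against the interface works with any service that implements it. The `SearchService` interface and the two services below are hypothetical and do not model any particular protocol.

```python
from abc import ABC, abstractmethod


class SearchService(ABC):
    """Hypothetical stand-in for a standard search protocol."""

    @abstractmethod
    def find(self, query: str) -> list[str]:
        """Return the IDs of records that match the query."""


class CatalogService(SearchService):
    def __init__(self, records: dict[str, str]):
        self.records = records

    def find(self, query: str) -> list[str]:
        return sorted(k for k, text in self.records.items() if query in text)


class ArchiveService(SearchService):
    def __init__(self, titles: list[str]):
        self.titles = titles

    def find(self, query: str) -> list[str]:
        return [f"archive:{i}" for i, t in enumerate(self.titles) if query in t]


def user_interface(service: SearchService, query: str) -> list[str]:
    """A client that depends only on the standard, not on any one service."""
    return service.find(query)


catalog = CatalogService({"c1": "video retrieval", "c2": "text retrieval"})
archive = ArchiveService(["speech recognition", "video skims"])
print(user_interface(catalog, "video"))  # prints ['c1']
print(user_interface(archive, "video"))  # prints ['archive:1']
```

The two services store their data differently, but `user_interface` needs no changes to search either one.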
Distributed Architecture 2: Broadcast Search (a.k.a. Federated Search)
[Diagram: "Find x" sent to the interface service.]
An interface server broadcasts a query to each collection, combines the results and returns them to the user.
Examples: Dienst (digital library protocol), Web metasearch services
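A broadcast search can be sketched as a loop over collections followed by a merge. In this illustration each collection is a hypothetical in-memory search function, and results are merged by raw score, which assumes that scores from different collections are comparable. That assumption often fails in practice and is one of the difficulties of this architecture.

```python
def broadcast_search(query, collections, limit=10):
    """Send the query to every collection and merge the scored results.

    Each collection is a callable that returns a list of (doc_id, score) pairs.
    """
    merged = []
    for name, search in collections.items():
        for doc_id, score in search(query):
            merged.append((score, name, doc_id))
    merged.sort(key=lambda hit: hit[0], reverse=True)  # best scores first
    return [(name, doc_id, score) for score, name, doc_id in merged[:limit]]


def make_collection(docs):
    """Hypothetical collection: score = number of times the query occurs in the text."""
    def search(query):
        hits = [(doc_id, text.split().count(query)) for doc_id, text in docs.items()]
        return [(doc_id, score) for doc_id, score in hits if score > 0]
    return search


collections = {
    "A": make_collection({"a1": "video video skim", "a2": "speech"}),
    "B": make_collection({"b1": "video index"}),
}
print(broadcast_search("video", collections))  # prints [('A', 'a1', 2), ('B', 'b1', 1)]
```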
Distributed Architecture 3: Centralized Search Services
Batch indexing: Metadata about all items is accumulated in a central system.
Real-time searching: The user (a) searches the central system, and (b) retrieves items from collections.
Examples: Union catalogs, Web search services
[Diagram: The user sends "Find x" to the search service (search), then retrieves items from the collections (retrieve).]
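The two phases of a centralized search service can be sketched as below: a batch step that gathers metadata into one central index, and a real-time step in which the user searches that index and then retrieves the items from their home collections. The collections, metadata records, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical collections: each holds full items plus short metadata records.
COLLECTIONS = {
    "library": {"items": {"b1": "<full text of book 1>"},
                "metadata": {"b1": "information retrieval textbook"}},
    "video": {"items": {"v1": "<video segment 1>"},
              "metadata": {"v1": "news broadcast about information policy"}},
}


def batch_index(collections):
    """Batch step: accumulate metadata from every collection in one central index."""
    index = defaultdict(set)
    for name, collection in collections.items():
        for item_id, record in collection["metadata"].items():
            for term in record.lower().split():
                index[term].add((name, item_id))
    return index


def search(index, term):
    """Real-time step (a): look the term up in the central index only."""
    return sorted(index.get(term.lower(), set()))


def retrieve(collections, name, item_id):
    """Real-time step (b): fetch the item itself from its home collection."""
    return collections[name]["items"][item_id]


central = batch_index(COLLECTIONS)
hits = search(central, "information")
print(hits)  # prints [('library', 'b1'), ('video', 'v1')]
print([retrieve(COLLECTIONS, name, item_id) for name, item_id in hits])
```

Only the metadata is copied to the central system; the items themselves stay in the collections until a user asks for them.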