

In the name of Allah, the Most Gracious, the Most Merciful

Al-Quds University College of Science & Technology

Department of Computer

Database 2

Mining Multimedia Data

Numan Bader, ID: 9910540

Instructor: Dr. Bade Al-Sartawi

Tuesday, April 11, 2023


Mining Multimedia Data

Abstract

What is multimedia data mining? Is multimedia data mining just a new combination of buzzwords, or is it a new interdisciplinary field which not only incorporates methods and techniques from the relevant disciplines, but is also capable of producing new methodologies and influencing related interdisciplinary fields? Rather than giving an overview of existing methods and techniques, or presenting a particular technique, this paper aims to present some facets of multimedia data mining in the context of its potential to influence some relatively new interdisciplinary domains.

Multimedia data mining, also known as knowledge discovery from multimedia, is not, contrary to popular belief, information retrieval from multimedia, but the extraction of interesting patterns and new facts from multimedia objects. Knowledge discovery in multimedia databases is focused on the synergy between two fields: knowledge discovery and multimedia databases. Knowledge discovery and data mining, which consist of extracting valuable and relevant knowledge from large volumes of data, have received much attention in recent years. The approaches used for knowledge discovery are non-trivial and often domain specific, depending on the canonical mining primitives. The data patterns discovered are typically used in decision-making, whether in business, in scientific research, or in other domains.

While significant research has been done on knowledge discovery from large corpora, most of the approaches are related to numerical transactional data such as market-basket analysis, web activities, etc.; thus very little has been achieved on mining multimedia data, probably due to the complexity of multimedia objects and multimedia repositories. The aim of this workshop is to bring together experts in digital media content analysis, state-of-the-art data mining and knowledge discovery in multimedia database systems, knowledge engineers, and domain experts from diverse applied disciplines with potential in multimedia data mining. There has been a massive increase in the information available on electronic networks. This profusion of resources on the World Wide Web has given rise to considerable interest in the research community. Traditional information retrieval techniques have been applied to the document collection on the Internet, and a panoply of search engines and tools have been proposed and implemented. However, the effectiveness of these tools is not satisfactory. None of them is capable of discovering knowledge from the Internet, and the Web is still growing at an alarming rate.

In a recent report on the future of database research, known as the Asilomar Report, it has been predicted that ten years from now the majority of human information will be available on the World Wide Web, and it has been observed that the database research community has contributed little to the Web thus far. In this work we propose a structure, called a Virtual Web View, on top of the existing Web. Through this virtual view, data mining and information retrieval from the Web can be performed by drilling through the visualized results.

The preliminary experiments demonstrate that multimedia data mining may lead to interesting and fruitful knowledge discoveries in multimedia databases. "Multimedia," "digital media," and "data mining" are perhaps among the ten most overused terms of the last decade. The field of multimedia and digital media is at the intersection of several major fields, including computing, telecommunications, desktop publishing, digital arts, the television/movie/game/broadcasting industry, and audio-video electronics. The advent of the Internet and of low-cost digital audio/video sensors accelerated the development of distributed multimedia systems and on-line multimedia communication. Their applications span distance learning, digital libraries, home entertainment, the fine arts, and fundamental and applied science and research. As a result there is some multiplicity of definitions and fluctuation in terminology [2]. In this paper, digital (multi)media denotes the computer-mediated and controlled integration of numeric data, text, graphics and other geometry representations (CAD drawings, 3D models, virtual universes), images, animation, sound, video, and any other type of information medium which can be represented, stored, processed, and transmitted over a network in digital form.


This report presents a brief overview of multimedia data mining and the corresponding workshop series at the ACM SIGKDD conference series on data mining and knowledge discovery. It summarizes the presentations, conclusions, and directions for future work that were discussed during the 3rd edition of the International Workshop on Multimedia Data Mining.

Keywords: multimedia data mining, knowledge discovery, multimedia databases, image content mining, digital media, sound analysis, video analysis.

Introduction

Digital multimedia differs from previous forms of combined media in that the bits that represent text, images, animations, audio, video, and other signals can be treated as data by computer programs. One facet of this data, diverse as it is in terms of underlying models and formats, is that it is synchronized and integrated, and hence can be treated as integral data records. Such records can be found in a number of areas of human endeavor.

Modern medicine generates huge amounts of such digital data. For example, a medical data record may include SPECT images, DNA microarray data, ECG signals, clinical measurements such as blood pressure and cholesterol levels, and the description of the diagnosis given by the physician's interpretation of all these data. Another example is architectural design and the related Architecture, Engineering and Construction (AEC) industry. An architectural design data record may include CAD files with floor plans and data for generating 3D models, links to building component databases and the instances of these components stored in corresponding formats, images of the environment, text descriptions and drawings from the initial and evolved design requirements, financial and personnel data, and other data related to the other disciplines involved in the AEC industry, e.g. civil and electrical engineering.

Virtual communities (in the broad sense of the word, which includes any communities mediated by digital technologies) are another example where the generated data constitutes an integral data record. Such data may include member profiles, the content generated by the virtual community, and communication data in different formats, including email, chat records, SMS messages, and video conferencing records. Not all multimedia data is so diverse. An example of less diverse data, but larger in terms of the collected amount, is the data generated by video surveillance systems, where each integral data record roughly consists of a set of time-stamped images, the video frames. In any case, the collection of such integral data records constitutes a multimedia data set. The challenge of extracting meaningful patterns from such data sets has led to research and development in the area of multimedia data mining.

This is a challenging field due to the non-structured nature of multimedia data. Such ubiquitous data is required, if not essential, in many applications. Multimedia databases are widespread and multimedia data sets are extremely large. There are tools for managing and searching within such collections, but the need for tools to extract hidden useful knowledge embedded within multimedia data is becoming critical for many decision-making applications. The tools needed today are tools for discovering relationships between data items or segments within images, classifying images based on their content, extracting patterns from sound, categorizing speech and music, recognizing and tracking objects in video streams, and discovering relations between different multimedia components. A result of the rapid progress in these fields is a number of cross-cutting challenges for computer systems research and development, including:

*enlarged data sets with a variety of formats and structures;

*a variety of models for the integration of media elements and components;

*demands on the computational efficiency of media analysis and retrieval algorithms;


*visualization metaphors.

Multimedia (or digital media) representations comprise a collection of domain descriptions in their "native" formats. There is variety in terms of the degree to which a representation is structured and the degree to which its specification complies with some formal model of knowledge representation and specification. The term "hypermedia" is added to stress the presence of links in the digital media under consideration.

OBJECTIVE

MDM/KDD WORKSHOP SERIES: The workshop series was initiated in early 1998; however, the first successful event was staged at KDD2000. Since the beginning of the century, the MDM/KDD workshops at the ACM SIGKDD forums (MDM/KDD2000, MDM/KDD2001, and MDM/KDD2002, held in conjunction with KDD2000 in Boston, KDD2001 in San Francisco, and KDD2002 in Edmonton, respectively) have been bringing together cross-disciplinary experts in the analysis of digital multimedia content, multimedia databases, spatial data analysis, analysis of data in collaborative virtual environments, and knowledge engineers and domain experts from different applied disciplines related to multimedia data mining. MDM/KDD2000 revealed a variety of topics that come under the umbrella of multimedia data mining. Accepted papers were presented in three sessions:

Mining spatial multimedia data; Mining audio data and multimedia support; and Mining image and video data [1].

The presentations at the workshop showed that many researchers and developers in the areas of multimedia information systems and digital media turn to data mining methods for techniques that can improve indexing and retrieval in digital media. The workshop ended with a discussion on the scope of multimedia data mining and the need to establish an annual forum. While text, maps, video, sound, and images typically fall into the realm of multimedia, research fields such as spatial data mining and text mining were already known as active disciplines. There was a consensus that multimedia data mining is emerging as its own distinct area of research and development. The work in the area was expected to focus on algorithms and methods for mining from images, sound, and video streams. Workshop participants identified the need for:

(i) the development and application of specific methods, techniques, and tools for multimedia data mining; and (ii) frameworks that provide a consistent methodology for multimedia data analysis and for the integration of discovered knowledge back into the systems where it can be utilized.

These conclusions, and to some extent decisions, influenced the selection of the papers for presentation at MDM/KDD2001, grouped in the following streams: Frameworks for multimedia mining (2 papers); Multimedia mining for information retrieval (4 papers); and Applications of multimedia mining (6 papers) [2]. The workshop discussion revised the scope of multimedia data mining outlined during the previous workshop, clearly identifying the need to approach multimedia data as a "single unit" rather than ignoring some layers in favor of others. The participants acknowledged the high potential of multimedia data mining methods in medical domains, design, and the creative industries. There was an agreement that the research and development in multimedia mining should be extended into the area of collaborative virtual environments, 3D virtual reality systems, and music.

THIS YEAR'S WORKSHOP: The 3rd International Workshop on Multimedia Data Mining attempted to address the above-mentioned issues, looking at specific issues in pattern extraction from image data, sound, and video; suitable multimedia representations and formats that can assist multimedia data mining; and advanced architectures of multimedia data mining systems. The papers selected for presentation provided interesting coverage of these issues and some technical solutions. They were grouped in the following three streams:


*Frameworks for Multimedia Data Mining;

*Multimedia Data Mining Methods and Algorithms;

*Applications of Multimedia Data Mining, including applications in medical image analysis and applications based on multimedia processing.

This mini-symposium will try to address these and related issues concerning large multimedia databases. There will be four focused presentations by experts in content-based retrieval of images, video, and speech, followed by a panel discussion on multimedia. The hypermedia representations comprise a collection of case descriptions represented as text in free or table format and other multimedia data, such as CAD drawings, images, video, sound, etc. Another characteristic of hypermedia is the use of links, where the links can connect information within a case, between different cases, or to data that lies outside the case library. Usually, the representation of the cases in the library is organized according to some conceptual model of the domain.

General Area Of Research

In Multimedia Data Mining Framework for Raw Video Sequences, authors from the Department of Computer Science and Engineering, University of Texas at Arlington, USA, presented a general framework for real-time video data mining from "raw" videos (e.g. traffic videos, surveillance videos). In the proposed technique, the first step in mining raw video data was grouping the input frames into a set of basic units, called segments, which became the organizational blocks of the video database for video data mining. Then, based on features (motion, objects, colors, etc.) extracted from each segment, the segments could be clustered into similar groups for detecting interesting patterns. The focus within the presented framework was on motion as a feature, and on how to compute and represent it for further processing. The multi-level hierarchical segment clustering procedure used category and motion. Preliminary experimental results were promising. Readers may also see the framework for multimedia data mining from traffic video sequences presented by a joint research team from Florida International University at an earlier workshop [3]. In An Innovative Concept for Image Information Mining, authors from the Remote Sensing Technology Institute – IMF, German Aerospace Center – DLR, Germany, and Klaus Seidel (Computer Vision Lab, ETH, Switzerland) introduced the concept of "image information mining" and discussed a system that implements this concept. The approach is based on modeling the causalities which link the image-signal contents to the objects and structures within the interest of the users. The basic idea is to split the information representation into four steps:

(1) image feature extraction using a library of algorithms to obtain a quasi-complete signal description;

(2) unsupervised grouping into a large number of clusters, to be suitable for a large set of tasks;

(3) data reduction by parametric modeling of the clusters; and

(4) supervised learning of user semantics, that is, the level where, instead of being programmed, the system is trained by a set of examples.

Through these steps the links from image content to user semantics are created. Step 4 employs advanced visualization tools. At the time of the workshop the system had been prototyped for inclusion in a new generation of intelligent satellite ground segment systems, value-adding tools in the area of geo-information, and several applications in medicine and biometrics (Proceedings of the 2nd Int. Workshop on Multimedia Data Mining, MDM/KDD2001).
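To make the four-step split concrete, the following small Python sketch (not the DLR/ETH system; the feature choices, the cluster count, and the classifier are illustrative assumptions) runs hand-crafted feature extraction, unsupervised over-clustering, parametric (Gaussian) reduction of each cluster, and a supervised classifier that links cluster signatures to user-supplied labels.

# Minimal sketch of the four-step "image information mining" pipeline.
# Assumptions: images are small grayscale arrays; features, cluster count,
# and the classifier are illustrative choices, not the published system.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def extract_features(image):
    """Step 1: quasi-complete signal description (here: mean, std, gradients)."""
    gy, gx = np.gradient(image.astype(float))
    return np.array([image.mean(), image.std(), np.abs(gx).mean(), np.abs(gy).mean()])

def mine_images(images, labels, n_clusters=20):
    X = np.vstack([extract_features(im) for im in images])

    # Step 2: unsupervised grouping into a large number of clusters.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)

    # Step 3: data reduction by parametric modelling (mean/covariance per cluster).
    cluster_models = {c: (X[km.labels_ == c].mean(axis=0), np.cov(X[km.labels_ == c].T))
                      for c in range(n_clusters) if (km.labels_ == c).sum() > 1}

    # Step 4: supervised learning of user semantics from examples:
    # the classifier links cluster membership (signal level) to user labels.
    signatures = np.eye(n_clusters)[km.labels_]          # one-hot cluster signature
    clf = LogisticRegression(max_iter=1000).fit(signatures, labels)
    return km, cluster_models, clf

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imgs = [rng.random((16, 16)) for _ in range(200)]    # toy "images"
    lbls = rng.integers(0, 2, size=200)                  # toy user labels
    mine_images(imgs, lbls)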

Multimedia Data Mining Methods and Algorithms

The first presentation in the session, on Multimedia Data Mining Using P-trees, focused on the specific research work of the DataSURG group at North Dakota State University, USA. The group came to data mining from the context of evaluating remotely sensed images for use in agricultural applications. During that research a spatial data structure was developed that provided an efficient, loss-less, data-mining-ready representation of the data. In time the similarity between a sequence of remotely sensed images and some categories of multimedia data became apparent. The Peano Count Tree (P-tree) technology provided an efficient way to store and mine sets of images and related data. For most multimedia data mining applications, feature extraction converts the pertinent data to a structured relational or tabular form, and then the tuples or rows are analyzed. The proposed P-tree data structure design addressed such a data mining setting.
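The count-tree idea can be illustrated with a rough, hypothetical sketch; it is not the DataSURG P-tree implementation, which operates on Peano-ordered image bit planes. A bit column from the tabular representation is split recursively, each node stores the count of 1-bits in its span, and "pure" spans are never expanded, so counts over ranges can be answered without scanning the raw data.

# Minimal sketch of a count tree over a single bit column (illustration only).
class CountTree:
    def __init__(self, bits):
        self.n = len(bits)
        self.count = sum(bits)                 # 1-bit count for this span
        if self.n > 1 and 0 < self.count < self.n:
            mid = self.n // 2
            self.left = CountTree(bits[:mid])  # recurse only on "mixed" spans
            self.right = CountTree(bits[mid:])
        else:
            self.left = self.right = None      # pure span: fully compressed

    def range_count(self, lo, hi):
        """Number of 1-bits in positions [lo, hi) without touching raw data."""
        lo, hi = max(lo, 0), min(hi, self.n)
        if lo >= hi:
            return 0
        if self.left is None:                  # pure span: proportional count
            return (hi - lo) if self.count == self.n else 0
        mid = self.n // 2
        return self.left.range_count(lo, hi) + self.right.range_count(lo - mid, hi - mid)

bits = [1, 1, 0, 1, 0, 0, 1, 1]
tree = CountTree(bits)
print(tree.range_count(2, 7))                  # -> 2 (the 1-bits at positions 3 and 6)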

In Scale Space Exploration for Mining Image Information Content, Mariana Ciucu, Patrick Heas, and Mihai Datcu (Remote Sensing Technology Institute – IMF, German Aerospace Center – DLR, Germany) and James C. Tilton (NASA's Goddard Space Flight Center, USA) described an application of a scale-space clustering algorithm (melting) for the exploration of image information content. Clustering by melting considers the feature space as a thermodynamical ensemble and groups the data by minimizing the free energy, with the temperature as a scale parameter. The algorithm was implemented through a multi-tree structure. With this structure, they were able to explore the image content as an information mining function and to obtain a more compact data structure. They retained the maximum of information in scale space because they memorized the bifurcation points and the trajectories of the cluster centers in the scale space. The information encoded in the tree structure enabled fast reconstruction and exploration of the data cluster structure and the investigation of hierarchical sequences of image classifications.
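The melting idea can be loosely illustrated with a deterministic-annealing style clustering sketch. This is not the authors' multi-tree implementation and it omits the tracking of bifurcation points, but it shows the role of the temperature as a scale parameter over soft assignments; all parameter values are illustrative assumptions.

# Deterministic-annealing clustering sketch (illustration only).
import numpy as np

def anneal_cluster(X, k=3, t_start=5.0, t_end=0.05, steps=40, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for temperature in np.geomspace(t_start, t_end, steps):
        # Soft (Gibbs) assignments: high temperature -> nearly uniform,
        # low temperature -> hard nearest-center assignment.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        d2 -= d2.min(axis=1, keepdims=True)              # numerical stability
        p = np.exp(-d2 / temperature)
        p /= p.sum(axis=1, keepdims=True)
        # Re-estimate centers as responsibility-weighted means (free-energy descent),
        # with tiny jitter so coinciding centers can separate as the scale decreases.
        centers = (p.T @ X) / (p.sum(axis=0)[:, None] + 1e-12)
        centers += rng.normal(scale=1e-3, size=centers.shape)
    return centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in ([0, 0], [3, 0], [0, 3])])
print(anneal_cluster(X, k=3))                            # roughly the three blob centers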

Applications of Multimedia Data Mining

Applications in Medical Image Analysis: Researchers from the University of Alberta, Edmonton, Canada presented, in Mammography Classification by an Association Rule-based Classifier, the continuation of their work on the classification of medical (mammography) images, discussed at MDM/KDD2001 [5]. They proposed an association rule-based classifier that was tested on a real dataset of medical images. The proposed approach included a significant (and important) pre-processing phase, a phase for mining the resulting transactional database, and a final phase for organizing the derived association rules into a classification model. The presented experimental results showed that the method performed well, reaching over 80% accuracy. The authors emphasized the importance of the data-cleaning phase in building an accurate data mining architecture for image classification. In An Application of Data Mining in Detection of Myocardial Ischemia Utilizing Pre- and Post-Stress Echo Images, Pramod K. Singh, Simeon J. Simoff (University of Technology Sydney, Australia) and David Feng (University of Sydney, Australia) presented initial work on the development of a data mining approach for the computer-assisted detection of myocardial ischemia. The proposed approach is based on the automatic identification of the endocardial and epicardial boundaries of the left ventricle (LV) from images generated from echocardiogram data. These images are of poor quality and, similar to the mammography images discussed in the previous presentation, require significant preprocessing. The overall algorithm includes LV-wall boundary identification, segmentation, and further comparative analysis of wall segments in pre- and post-stress echocardiograms.

Applications in Content-Based Multimedia Processing: In From Data to Insight: The Community of Multimedia Agents, Gang Wei, Valery A. Petrushin and Anatole V. Gershman (Accenture Technology Labs, Chicago, USA) presented work devoted to creating an open environment for developing, testing, learning, and prototyping multimedia content analysis and annotation methods. The work is part of the multimedia agents project COMMA. The environment served as a medium for researchers to contribute and share their achievements while protecting their proprietary techniques. Each method was represented as an agent that could communicate with the other agents registered in the environment using templates based on the Descriptors and Description Schemes in the emerging MPEG-7 standard. Such an approach allowed agents developed by different organizations to operate and communicate with each other seamlessly, regardless of their programming languages and internal architectures. To facilitate the construction of media analysis methods, the research team provided a development environment. This environment enabled researchers to compare the performance of different agents and to combine them into more powerful and robust system prototypes. COMMA could also serve as a learning environment for researchers and students to acquire and test cutting-edge multimedia analysis algorithms and to share media agents. Some background to this work can be found in Petrushin's previous work on PERSEUS.
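The association rule-based classification step described for the mammography work can be illustrated, in a much simplified and hypothetical form, by the sketch below: discretized image features become transactional items, frequent itemsets with a class label as consequent become rules, and matching rules vote on the class of a new image. The binning, thresholds, and data are invented for illustration and do not reproduce the published classifier.

# Minimal association-rule classifier sketch (illustration only).
from itertools import combinations
from collections import Counter

def to_items(features):
    """Discretize numeric features into categorical items (hypothetical binning)."""
    return frozenset(f"f{i}={'high' if v > 0.5 else 'low'}" for i, v in enumerate(features))

def mine_rules(transactions, labels, min_support=0.2, min_confidence=0.7):
    n = len(transactions)
    rules = []
    for size in (1, 2):                       # itemsets of size 1 and 2
        support = Counter()
        with_label = Counter()
        for items, label in zip(transactions, labels):
            for combo in combinations(sorted(items), size):
                support[combo] += 1
                with_label[(combo, label)] += 1
        for (combo, label), cnt in with_label.items():
            if support[combo] / n >= min_support and cnt / support[combo] >= min_confidence:
                rules.append((set(combo), label, cnt / support[combo]))
    return rules

def classify(rules, features, default="unknown"):
    items = to_items(features)
    votes = Counter()
    for antecedent, label, confidence in rules:
        if antecedent <= items:               # rule applies to this image
            votes[label] += confidence
    return votes.most_common(1)[0][0] if votes else default

data = [[0.9, 0.2], [0.8, 0.1], [0.7, 0.3], [0.2, 0.9], [0.1, 0.8], [0.3, 0.7]]
labels = ["malignant", "malignant", "malignant", "benign", "benign", "benign"]
rules = mine_rules([to_items(x) for x in data], labels)
print(classify(rules, [0.85, 0.25]))          # expected: "malignant"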

In A Content-Based Video Description Scheme and Video Database Navigator, Sadiye Guler and Ian Pushee (Northrop Grumman Information Technology, USA) introduced a unified framework for a comprehensive video description scheme and presented the "database navigator", a browsing and manipulation tool for video data mining. The proposed description scheme was based on the structure and the semantics of the video, incorporating scene, camera, object, and behavior information pertaining to a large class of video data. The Database Navigator was designed to exploit both the hierarchical structure of video data (the clips, shots, and objects) and the semantic structure, such as scene geometry and object behaviors. The navigator provided means for visual data mining of multimedia data:


intuitive presentation, interactive manipulation, the ability to visualize the information and data from a number of perspectives, and the ability to annotate and correlate the data in the video database. The authors demonstrated a working prototype.

In Subjective Interpretation of Complex Data: Requirements for Supporting the Kansei Mining Process, Nadia Bianchi-Berthouze and Tomofumi Hayashi (University of Aizu, Aizu-Wakamatsu, Japan) presented the continuation of Bianchi-Berthouze's work on modeling visual impressions from the point of view of multimedia data mining, presented at MDM/KDD2001 [7]. The new work described a data warehouse for the multimedia information mining process, where a unique characteristic of the data warehouse was its ability to store multiple hierarchical descriptions of the multimedia data. Such a characteristic is necessary to allow mining multimedia data not only at different levels of abstraction, but also according to multiple interpretations of the content. The proposed framework could be generalized to support the analysis of any type of complex data that relates to subjective cognitive processes whose interpretation would be highly variable. In User Concept Pattern Discovery Using Relevance Feedback and Multiple Instance Learning for Content-Based Image Retrieval, Xin Huang, Shu-Ching Chen (Florida International University, USA), Mei-Ling Shyu (University of Miami, USA) and Chengcui Zhang (Florida International University, USA) proposed a multimedia data mining framework that incorporated multiple instance learning into user relevance feedback to discover user concept patterns, in particular to find where the user's most interesting region is located and how to map the local feature vector of that region to the user's high-level concept pattern. This underlying mapping could be progressively discovered through the proposed feedback and learning procedure. The role of the user in the retrieval system was to guide the mining process according to the user's focus of attention.
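Relevance feedback of the kind used to steer retrieval toward the user's concept can be illustrated with a simpler, classical Rocchio-style query update; this is not the multiple instance learning method of the paper, and the feature vectors and weights below are illustrative assumptions.

# Minimal relevance-feedback sketch for content-based image retrieval
# (a Rocchio-style update over global feature vectors; illustration only).
import numpy as np

def retrieve(query, database, top_k=5):
    """Rank database feature vectors by Euclidean distance to the query."""
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)[:top_k]

def feedback_update(query, database, relevant_ids, irrelevant_ids,
                    alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward images marked relevant, away from the rest."""
    rel = database[relevant_ids].mean(axis=0) if len(relevant_ids) else 0.0
    irr = database[irrelevant_ids].mean(axis=0) if len(irrelevant_ids) else 0.0
    return alpha * query + beta * rel - gamma * irr

rng = np.random.default_rng(0)
db = rng.random((100, 8))          # 100 images, 8-dimensional feature vectors
q = rng.random(8)                  # initial query features

hits = retrieve(q, db)
print("initial hits:", hits)
q = feedback_update(q, db, relevant_ids=hits[:2], irrelevant_ids=hits[2:])
print("after feedback:", retrieve(q, db))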

Computer Vision: Multimedia Data Mining

Image Similarity

GOAL

In this project, we are interested in the question of finding pairs of near-duplicate images in a massive image database. We have two primary concerns: robustly determining when two images are near-duplicates of each other (under a suitable definition of "near-duplicate"), and determining very quickly which images in a database are near-duplicates of a query image.

We say that two images are near-duplicates if one is a transformed version of the other. Transformations can be geometric, like rotation, scaling, cropping, or translation, or "image-processing" changes like compression and filtering. Note that this problem is a restricted version of the general image retrieval problem.

MOTIVATION

The massive image datasets collected by web crawlers contain a large number of duplicate and near-duplicate images. The same image may have been stored at different resolutions or in different formats. It would be very useful to index these datasets and either store only one copy of the duplicate images or add information about the existence of duplicates to the database. When the crawler downloads a new image, it can use the index to decide whether to store the image or not. An image comparison/retrieval scheme like the one we are developing will be very useful in such a situation.

Another scenario where our system can be applied is copyright protection. Watermarking is a commonly used tool for enforcing copyright protection. Watermarks typically make very slight changes to images. When users want to determine whether a particular image is owned by them, they can use our system to quickly find the images in their catalogue (which may contain a very large number of images) that are nearly similar to the suspect image, and then apply a sophisticated (and possibly time-consuming) watermark-detection algorithm.


PREVIOUS WORK

All previous techniques for searching multimedia databases for similar images operate in two stages. They first formulate a method for comparing two images for similarity. In the second stage, they use the comparison algorithm to detect which images in a database are similar to the query image.

There are many previous techniques for measuring the similarity between two images. Most of these methods fall into two classes. One class of techniques computes interesting and useful statistics (such as a histogram or a filter response) for the entire image and then determines the distance between the images as a function of the computed statistics, using well-known metrics such as the Euclidean and Manhattan distances. The other class of techniques breaks up the image into regions of various sizes at different scales and computes statistics for each region. These methods then combine the different pieces of statistics and compute the distance between the combinations. Alternatively, they treat the statistics as a set or bag and compare the images based on the overlap between the two sets.
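A small example of the first class of techniques, assuming grayscale images and simple global histograms, is sketched below; it compares the histogram vectors with the Euclidean and Manhattan distances mentioned above. Real systems typically use richer color and texture statistics.

# Global-statistics similarity: grayscale histograms compared with
# Euclidean and Manhattan distances (illustration only).
import numpy as np

def histogram_feature(image, bins=32):
    hist, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()                     # normalize to a distribution

def euclidean(h1, h2):
    return float(np.sqrt(((h1 - h2) ** 2).sum()))

def manhattan(h1, h2):
    return float(np.abs(h1 - h2).sum())

rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = np.clip(a + rng.normal(0, 0.02, a.shape), 0, 1)   # slightly perturbed copy of a
c = rng.beta(2, 5, (64, 64))                          # image with a different intensity profile

ha, hb, hc = map(histogram_feature, (a, b, c))
print("a vs b:", euclidean(ha, hb), manhattan(ha, hb))  # small distances
print("a vs c:", euclidean(ha, hc), manhattan(ha, hc))  # larger distances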

Most methods for detecting images in the database that are similar to a query image check every image in the database for similarity, a process that is very slow for very large data sets. While techniques do exist that speed up the search by indexing the database, they fail to restrict the search to a small portion of the database when applied to our problem. The reason why the entire database, or most of it, is searched for many queries is related to the manner in which an image is represented in terms of its statistics. The standard form is to combine the statistics into a vector in a high-dimensional space (tens or hundreds of dimensions) and to define the "distance" between the images to be an appropriate measure of distance between the corresponding vectors. The typical measures used are the Euclidean and Manhattan metrics. Two images are then said to be similar if the distance between the corresponding vectors is small. Searching for points near a query point in these high-dimensional spaces is an extremely challenging problem. The performance of all known and popular data structures for this problem degrades considerably in high dimensions. For instance, it is well known that the k-d tree and its variants perform poorly in dimensions higher than twenty. In such high dimensions, nearly the entire data structure is searched when it is queried for a near or nearest neighbor. This problem is compounded when the distance measure used is not a metric (e.g., the K-L divergence). Standard similarity metrics cannot take full advantage of recent advances in multidimensional near-neighbor searching such as locality-sensitive hashing (LSH), because they do not satisfy the properties required by LSH.

OUR STRATEGY

Our approach has two major strengths. First, we have developed a novel measure of image similarity that is robust to the common transformations performed on images, e.g., rotation, translation, scaling, cropping, filtering, compression, etc. Second, our measure of similarity has the highly desirable property that it satisfies the conditions required by the LSH framework for near-neighbor searching. The LSH algorithm has the crucial (mathematically provable) property that it finds a near-neighbor by examining only a very small fraction of the database. Thus, it performs extremely efficiently for databases with millions of images, even when the entire database does not fit into the memory of the computer. In fact, when the database does not fit into memory, the LSH algorithm finds near-similar images with a very small number of disk accesses.
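A generic LSH sketch is shown below. It uses random hyperplanes for cosine similarity, which is not our similarity measure; it only illustrates how hashing restricts a near-neighbor search to a small candidate set instead of scanning the whole database. The class and parameter values are illustrative assumptions.

# Generic locality-sensitive hashing sketch (random hyperplanes, cosine similarity).
import numpy as np
from collections import defaultdict

class CosineLSH:
    def __init__(self, dim, n_bits=16, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.normal(size=(n_bits, dim)) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, planes, x):
        return tuple((planes @ x) > 0)            # sign pattern = hash bucket

    def index(self, vectors):
        self.vectors = np.asarray(vectors)
        for i, x in enumerate(self.vectors):
            for planes, table in zip(self.planes, self.tables):
                table[self._key(planes, x)].append(i)

    def query(self, x, top_k=3):
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table[self._key(planes, x)])
        if not candidates:
            return []
        cand = sorted(candidates)                 # small candidate set, not the full database
        sims = self.vectors[cand] @ x / (
            np.linalg.norm(self.vectors[cand], axis=1) * np.linalg.norm(x) + 1e-12)
        return [cand[i] for i in np.argsort(-sims)[:top_k]]

rng = np.random.default_rng(1)
db = rng.normal(size=(1000, 64))                  # stand-in for image feature vectors
lsh = CosineLSH(dim=64)
lsh.index(db)
q = db[42] + rng.normal(scale=0.05, size=64)      # near-duplicate of item 42
print(lsh.query(q))                               # item 42 should rank first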


Smart Kiosk Project
Human-Computer Interaction Systems Group

Kiosks are free-standing boxes with a display, an interactive user interface, and possibly a network connection. They are designed to provide information, sell products, and entertain. Kiosks are typically located in high-traffic areas like museums and stores. Existing kiosk applications include information centers at theme parks like Disney's Epcot Center, music CD preview stations, airplane/movie/concert ticket dispensers, and custom greeting card machines. Current kiosks are limited in their interaction with people to responding to touch screen or keyboard input.

The Smart Kiosk project was initiated at HP to explore how advances in computer technology could be applied to improve public kiosks. The goal of this project is to develop a "Smart Kiosk" that interacts with people in a natural, intuitive fashion. The ideal Smart Kiosk will recognize and track people in its vicinity, communicate with them visually and audibly, and interact in a friendly, natural manner.

To test kiosk technologies and explore user requirements, we have been building a series of kiosk prototypes. Each prototype addresses a real-world problem and provides an experimental test vehicle for exploring core kiosk technologies. The first prototype placed outside of our laboratory was installed in the Cybersmith Café in Cambridge, Massachusetts in September 1997.

SPEECH RECOGNITION BASED MULTIMEDIA INDEXING

SpeechBot is the first Internet search site for indexing streaming spoken audio on the Web. Unlike previous attempts to index spoken audio on the Web, which have relied on adjacent text, metadata, hand-supplied transcripts, or closed captions, SpeechBot uses automatic speech recognition technology to transcribe and index documents that do not have transcripts or other content information. The use of speech recognition permits the efficient and cost-effective indexing of thousands of hours of audio content that were previously inaccessible. Because of this indexing, SpeechBot allows users to quickly search for relevant content in long audio documents and yields high precision on first-page retrieved items.

SpeechBot indexes streaming media files based on their content, much as conventional search sites index ordinary Web pages by their text content. Like conventional search sites, SpeechBot does not store or serve the multimedia files themselves, but rather provides users with links. The index is continually updated using SpeechBot's highly scalable architecture.
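A toy sketch of content indexing for spoken audio is given below: recognized (word, timestamp) pairs are placed in an inverted index, so a text query jumps directly to positions inside long audio documents. The class, file names, and ranking are illustrative assumptions and do not reproduce SpeechBot's actual pipeline or data structures.

# Toy inverted index over speech-recognizer output (illustration only).
from collections import defaultdict

class TranscriptIndex:
    def __init__(self):
        self.postings = defaultdict(list)     # word -> [(doc_id, time_sec), ...]

    def add_transcript(self, doc_id, timed_words):
        for time_sec, word in timed_words:
            self.postings[word.lower()].append((doc_id, time_sec))

    def search(self, query):
        """Return (doc_id, [(word, time), ...]) hits, documents with most hits first."""
        hits = defaultdict(list)
        for word in query.lower().split():
            for doc_id, time_sec in self.postings.get(word, []):
                hits[doc_id].append((word, time_sec))
        return sorted(hits.items(), key=lambda kv: -len(kv[1]))

index = TranscriptIndex()
index.add_transcript("news_0415.rm", [(12.1, "multimedia"), (12.9, "indexing"),
                                      (40.3, "speech"), (41.0, "recognition")])
index.add_transcript("talk_0502.rm", [(5.0, "speech"), (300.4, "kiosk")])
print(index.search("speech recognition"))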

The SpeechBot site was designed as a demonstration vehicle for HP Corporate Research's work on multimedia indexing for the Internet. This technology includes workflow systems for crawling, downloading, transcoding, and indexing multimedia files; several technologies (including speech recognition) for analyzing and annotating multimedia content; and search site technology to handle user queries and serve appropriate Web pages.

Video and Image Processing Group

The Video and Image Processing Group at Cambridge Research Lab is developing core enabling technologies to allow motion video, still images, and audio data to be accommodated in HP's computer systems and information appliances.


The central project is Media to, an exploratory information appliance whose goal is to tie together research efforts in a number of human-computer interaction areas in the lab, in particular, efforts from the speech and vision groups.

Facial Animation

As humans, we are wet-wired to respond to faces. Cells in the brain activate when presented with images of faces and respond directly to facial orientation and pose. Understanding the human response to faces has been the subject of much investigation by the scientific community over many years; for example, Charles Darwin dealt precisely with these issues.

Tapping into human responses to faces promises to reshape the human-computer experience. The ability to present talking faces provides a unique opportunity for multimedia providers to create animated characters to deliver messages, guides to provide information about a site, or to bring email and chat avatars to life. These opportunities have resulted in a tremendous surge of interest and growth over the past five years.

A Grand Challenge for Facial Synthesis

Even as the fidelity of facial synthesis improves, it remains hard to deceive the observer. This is because we are all experts at interpreting and understanding the subtle motions and expressions displayed on the face. Consequently, when we are presented with a synthetic face of an individual, the illusion of reality is soon lost, as we can quickly determine that it is a fake.

It is clear that at some point in the future we will be able to synthesize characters that can masquerade as real people, but how do we know how close we are to reality? Is it possible to take a single snapshot of an individual, record some voice samples, and then re-animate the individual in a new sequence that is indistinguishable from the real person? Essentially this is a "Turing Test for Facial Animation". We are not there yet, and it remains a grand challenge for computer synthesis.

The Turing Test for Facial Animation

Alan Turing first described a test for machine intelligence in his 1950 article "Computing Machinery and Intelligence" (Mind, Vol. 59, No. 236, pp. 433-460) as follows: "The interrogator is connected to one person and one machine via a terminal, and therefore can't see her counterparts. Her task is to find out which of the two candidates is the machine and which is human, only by asking them questions. If the interrogator cannot make a decision within a certain time (Turing proposed five minutes, but the exact amount of time is generally considered irrelevant), the machine is intelligent."

The visual equivalent of this test would be approximately the same test with a visual surrogate, and could be done in two parts: the first a passive observation with no interaction, and the second a more challenging test with interrogator interaction.

Face Works

At CRL we have been working on real-time facial animation. This has resulted in a multimedia tool, Face Works, capable of generating talking faces from a JPEG image and real speech.

Vision-based Interaction Group

Human-computer interaction is on the brink of radical change. The catalyst for this change will be the computer's ability to sense and recognize its users, to see and reconstruct its environment, and to respond visually and audibly to these stimuli. The Vision and Graphics Group at Cambridge Research Lab researches vision-based human-computer interaction. Our interest is in building real systems for real users, for example, a smart kiosk that sees its users and interacts with them in a natural, intelligent way. Our expertise is in computer vision, 3D model building, computer graphics, facial animation, and behavior modeling.


Current Work

We are developing a Smart Kiosk, that is, a kiosk that sees its users and interacts with them in a natural way. The image below shows our earliest kiosk prototype (circa 1995), which demonstrated the ability to sense a user with a camera and respond in real-time via a talking synthetic face. More information is available about these efforts.

Discovering implicit knowledge in hypermedia case bases is substantially different from data mining in databases. The organizational units in database data mining are the data tables, in particular their columns or rows. Inside the hypermedia case base, the organizational unit is the case or sub-case, which merely consists of one or more multimedia pages. The pages comprise a variety of data formats. Knowledge discovery, then, in our use of the term, involves finding patterns in primarily unstructured data. More formally, the knowledge discovery process in multimedia case representations can be viewed as machine learning where a case library replaces the training set. As illustrated in Figure 8, the information model of the case library constitutes the initial basis for the multimedia data mining [10]. During data segmentation, multimedia data are divided into logically interconnected segments. For example, in a hypermedia case library each segment can include one or more pages. Within a particular media type a segment may have a different meaning; for example, within a text a segment could be a paragraph or a sentence. The actual mining and analysis procedures are expected to reveal relations within the segments and between the segments at different levels. For instance, the text analysis can identify relations between concepts presented in the text and the CAD drawings presented on that page (or, vice versa, find that the actual CAD drawings are not connected with the content of the text description). The analysis within text segments can identify word concordances that denote complex terms not explicitly defined in the case base. Extracted patterns are incorporated and linked under the framework of the information model. As a result there can be additional attributes, changes in links, revision of identified attributes, and changes in some attribute values. Some paragraphs, images, or other media segments could become insignificant. Consequently, the information model should be able to accommodate changes in the structure and media content. The model is expanded in Figure 9. In case-based reasoning systems, specific interest can be focused on finding patterns in the cases that can assist with indexing and adapting cases, as a way of improving the retrieval of related previous experience and of indicating when an adaptation lies outside some reasonable constraints, based on the experience in the case base. Patterns in the form of dynamic thematic paths can assist with the navigation in retrieved cases. The framework combines two consecutive, complementary strategies, data-driven and hypothesis-driven exploration, discussed in more detail in [12]. The data mining schema described above can be integrated into the learning loop of case-based reasoning. The overall enhanced model of case-based reasoning with a knowledge discovery back-end, as shown in Figure 10, illustrates the dynamics of case manipulation, analysis, mining, knowledge formulation, and case update. The visual "symmetry" in Figure 10 reflects in some sense the mutual benefit from the amalgamation of these computing approaches. On the one hand, multimedia data mining has the potential to improve the case-based system. On the other hand, the case-based paradigm provides a mechanism for the incorporation and management of discovered knowledge. On the indexing side, potential advantages of the extended case-based reasoning model include:

*generation of term indexing schemes, based on the words used in the text representation [11] (see the sketch after this list);

*generation of term indexing schemes, based on relating terms to regularities discovered in other media types;

*generation of alternative indexing schemes (for example, a graph structure indexing scheme), based on structural patterns discovered in the graphics, image, audio and video media in the case library. (A thematic path is a set of multimedia pages, each of which is part of the case library, that are relevant to the explanation and illustration of a particular concept and have to be visited in a particular sequence.)

*association of multiple indexing schemes to one case library;

*dynamic generation of the above listed indexing schemes.
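The first item above, a term indexing scheme built from the words in the text representation, can be sketched as follows. The case texts, the TF-IDF weighting, and the threshold are illustrative assumptions rather than the cited design-case system; the sketch only shows how terms can be linked back to the cases they index.

# Tiny term-indexing sketch over the text part of a case library (illustration only).
from sklearn.feature_extraction.text import TfidfVectorizer

cases = {
    "case_01": "steel frame office tower with curtain wall facade",
    "case_02": "timber frame residential building with pitched roof",
    "case_03": "concrete office building with raised floor and curtain wall",
}

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(cases.values())       # cases x terms
terms = vectorizer.get_feature_names_out()
case_ids = list(cases.keys())

# Term index: for each term, the cases where its TF-IDF weight is significant.
term_index = {}
for j, term in enumerate(terms):
    weights = matrix[:, j].toarray().ravel()
    linked = [(case_ids[i], round(float(w), 3)) for i, w in enumerate(weights) if w > 0.2]
    if linked:
        term_index[term] = linked

print(term_index.get("curtain"))     # cases sharing the "curtain wall" concept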


Multimedia Miner

Multimedia Miner is a prototype of a data mining system for mining high-level multimedia information and knowledge from large multimedia databases.


REFERENCES

[1] Berthold, M. R., F. Sudweeks, S. Newton and R. Coyne, "Clustering on the Net: Applying an Autoassociative Neural Network to Computer-Mediated Discussions," Journal of Computer Mediated Communication, 2(4), http://www.ascusc.org/jcmc/vol2/issue4/bert-hold.html (1997).

[2] Fluckiger, F., Understanding Networked Multimedia: Applications and Technology, Prentice Hall, London (1995).

[3] Ip, H. H. S. and A. W. M. Smeulders, eds., Multimedia Information Analysis and Retrieval, Springer, Heidelberg (1998).

[4] Jones, S., ed., Doing Internet Research, Sage Publications, Thousand Oaks, CA, 29-55 (1999).

[5] Knuth, D. E., The Art of Computer Programming, Vol. 1: Fundamental Algorithms, Addison-Wesley, Reading, MA, 311-312 (1973).

[6] Maher, M. L., S. J. Simoff and A. Cicognani, "Potentials and limitations of Virtual Design Studio," Interactive Construction On-line, January, a1 (1997).

[7] Maher, M. L. and S. J. Simoff, "Knowledge discovery from multimedia case libraries," in Smith, I., ed., Artificial Intelligence in Structural Engineering, Springer, Berlin, 197-213 (1998).

[8] Maher, M. L., S. J. Simoff and A. Cicognani, Understanding Virtual Design Studios, Springer, Heidelberg (2000).

[9] Sudweeks, F., M. McLaughlin and S. Rafaeli, eds., Network and Netplay: Virtual Groups on the Internet, AAAI/MIT Press, Menlo Park, CA, 191-220 (1998).

[10] Simoff, S. J. and M. L. Maher, "Ontology-based multimedia data mining for design information retrieval," Proceedings of the ASCE Computing Congress, Cambridge, MA, 310-320 (1998).

[11] Simoff, S. J. and M. L. Maher, "Deriving ontology from design cases," International Journal of Design Computing, 1, http://www.arch.usyd.edu.au/kcdc/journal/vol1 (1998).

[12] Simoff, S. J. and M. L. Maher, "Knowledge Discovery in Hypermedia Case Libraries - A Methodological Framework," Proceedings of the Knowledge Acquisition Workshop, 12th Australian Joint Conference on Artificial Intelligence, AI'99, 213-225 (1999).

[13] Simoff, S. J. and M. L. Maher, "Analysing Participation in Collaborative Design Environments," Design Studies, 21, 119-144 (2000).

[14] Zaïane, O. R., J. Han, Z.-N. Li, S. H. Chee and J. Y. Chiang, "Multimedia Miner: A System Prototype for Multimedia Data Mining," Proceedings of the ACM SIGMOD International Conference on Management of Data, 581-583 (1998).

[15] Zaïane, O. R., J. Han, Z.-N. Li and J. Hou, "Mining Multimedia Data," Proceedings of CASCON'98: Meeting of Minds, Toronto, Canada, 83-96 (1998).