James Pittman, February 9, 2011, EEL 6788
MoVi: Mobile Phone based Video Highlights via Collaborative Sensing
Xuan Bao, Department of ECE, Duke University
Romit Roy Choudhury, Department of ECE, Duke University
Outline
◦ Introduction
◦ Assumptions
◦ System Overview
◦ Challenges
◦ Design Elements
◦ Evaluation
◦ Experiments and Discussion of Results
◦ Limitations and Conclusions
Introduction
Basic Concepts
◦ Replace sensor motes with mobile phones in social settings
◦ Sensors in these settings will record a large amount of continuous data
◦ How do you distill all of the data from a group of sensors in a social setting?
◦ Can mobile phone sensors in a social setting be used to create “highlights” of the occasion?
Introduction
Basic Concepts
◦ Develop trigger concepts to know when to sense with phones
◦ Derive values for sensed data to determine which sensor is recording the ‘best’ data
◦ Combine system-based sensor results to create Highlights of social occasions
◦ Compare MoVi Highlights to human-created manual Highlights
Assumptions
To make this system work, assumptions about the situation are required:
1. People are wearing a camera
2. People are wearing a sensor (mobile phone)
These can be the same device.
System Overview
The MoVi system has 4 parts:
1. Group Management
◦ Analyze data to compute social groups among phones
2. Trigger Detection
◦ Scan the data for potentially interesting events
System Overview
The MoVi system has 4 parts (continued):
3. View Selector
◦ Pick a sensor or group with the best “view” of the event
4. Event Segmentation
◦ Extract the appropriate section of video that fully captures the event
System Overview
Client/Server Architecture for MoVi
Challenges
Group Management
1. Correctly partitioning the mobile devices into groups
2. Identifying “social zones” based on social context
3. Mapping the phones into the zones and into groups, and keeping them updated
Challenges
Event Detection
1. Recognizing which events are socially “interesting”
2. Deriving rules to classify the events as interesting
Challenges
View Selection
1. Determining the best view from the group of sensors that witness an event
2. Designing heuristics to eliminate poor candidates
Challenges
Event Segmentation
1. Taking event triggers and converting them to a logical beginning and end for a segment of video
2. Identifying and learning patterns in social events
Design Elements
Social Group Identification – Acoustic
Initial groupings are seeded by a random phone periodically playing a high-frequency ringtone.
The phones that overhear the ringtone are scored with a similarity measure, and the ones closest to the transmitter are grouped.
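The ringtone-based grouping above can be sketched roughly as follows. This is only an illustration: it assumes each phone reports a short vector of spectral magnitudes recorded while the ringtone played, and it uses cosine similarity with a 0.8 threshold. The similarity function, the threshold, and every name here are assumptions, not the authors' actual measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def group_by_ringtone(transmitter_spectrum, recordings, threshold=0.8):
    """Return sorted ids of phones whose recording resembles the transmitter's.

    `recordings` maps a (hypothetical) phone id to its recorded spectrum.
    """
    return sorted(
        phone
        for phone, spectrum in recordings.items()
        if cosine_similarity(transmitter_spectrum, spectrum) >= threshold
    )

# Illustrative data only: two phones near the transmitter, one elsewhere.
recordings = {
    "phone_a": [0.9, 0.1, 0.0, 0.8],
    "phone_b": [0.8, 0.2, 0.1, 0.7],
    "phone_c": [0.1, 0.9, 0.8, 0.0],
}
group = group_by_ringtone([1.0, 0.0, 0.0, 1.0], recordings)
```

With this toy data, the two phones whose spectra resemble the ringtone end up in the seeded group and the third does not.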
Design Elements
Social Group Identification – Acoustic
Ambient sound is hard to classify but easier to detect than the ringtones.
The authors classify the ambient sound using Support Vector Machines (SVM).
Mel-Frequency Cepstral Coefficients (MFCC) are used as features in the SVM.
◦ MFCCs are a representation of the sound spectrum that approximates the human auditory system
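The claim that MFCCs approximate human hearing comes from the mel scale, which spaces frequencies the way listeners perceive pitch. A minimal sketch of that mapping is shown below, using the widely quoted 2595·log10(1 + f/700) form. The slides do not say which mel variant the authors used, so treat the formula choice as an assumption.

```python
import math

def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (common 2595 * log10 variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps in mels correspond to wider and wider steps in Hz,
# so high frequencies are represented more coarsely than low ones.
low_step = mel_to_hz(600) - mel_to_hz(500)
high_step = mel_to_hz(2600) - mel_to_hz(2500)
```

A full MFCC computation would also apply a mel filter bank, a logarithm, and a discrete cosine transform; those steps are omitted here.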
Design Elements
Social Group Identification – Visual
Grouping through light intensity
◦ Grouped using similarity functions similar to the ones for sound
To avoid issues with sensitivity due to orientation, the classes for light were restricted to 3 types.
Design Elements
Social Group Identification – Visual
Grouping through View Similarity
◦ If multiple people simultaneously look at a specific person or area, they have a similar view
◦ A spatiogram yields a similarity measure that can be extracted even if the views are from different angles
◦ View similarity is the highest priority for grouping
Design Elements
Trigger Detection – Specific Events
Triggers derived from human activities
◦ Laughter, clapping, shouting, etc.
◦ Too many of these for the initial work
Decided to start with laughter
◦ Created laughter samples
◦ Created negative samples of conversation and background noise
Used these to derive a trigger
Design Elements
Trigger Detection – Group Behavior
Looking for a majority of group members to behave similarly
◦ Similar view
◦ Group rotation
◦ Similar acoustic ambience
Design Elements
Trigger Detection – Group Behavior
Unusual View Similarity
◦ Multiple cameras are found to be viewing the same object from different angles
Design Elements
Unusual View Similarity
Design Elements
Trigger Detection – Group Behavior
Group Rotation
◦ Using the phones' built-in compass, you can detect when multiple members of the group turn in the same direction at the same time
◦ Example: everyone turning when someone new enters, or toward a birthday cake in time to sing “happy birthday”
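The group rotation trigger can be sketched as a majority check on compass headings. The sketch below assumes each phone reports a heading in degrees before and after a short interval. The 45-degree minimum turn, the 30-degree agreement tolerance, and the function names are illustrative assumptions rather than values from the paper.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two compass headings, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def group_rotation_trigger(before, after, min_turn=45, tolerance=30):
    """True if a majority of phones turned and now face a common direction.

    `before` and `after` map the same phone ids to headings in degrees.
    """
    turned = [p for p in before if angular_diff(before[p], after[p]) >= min_turn]
    # Try each turned phone's new heading as the candidate common direction.
    for ref in turned:
        agreeing = sum(
            1 for p in turned if angular_diff(after[p], after[ref]) <= tolerance
        )
        if agreeing * 2 > len(before):
            return True
    return False

# Illustrative headings: three of five phones swing toward roughly east.
before = {"a": 10, "b": 350, "c": 180, "d": 90, "e": 200}
after = {"a": 100, "b": 95, "c": 90, "d": 92, "e": 200}
fired = group_rotation_trigger(before, after)
```

Handling the wrap-around at 0/360 degrees in `angular_diff` matters here, since a phone turning from 350 to 95 degrees has rotated 105 degrees, not 255.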
Design Elements
Trigger Detection – Group Behavior
Ambience Fluctuation
◦ When the light or sound ambience changes by more than a threshold, this can be a trigger
◦ When this happens across multiple sensors, especially within a short period of time, it is considered a good trigger
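A minimal sketch of the ambience fluctuation trigger follows. It assumes each sensor provides an evenly sampled series of ambient levels (light or sound), marks a fluctuation when consecutive samples differ by more than a threshold, and fires when enough sensors fluctuate within a short window. The sample-to-sample comparison, the window size, and the names are assumptions made for illustration.

```python
def fluctuation_times(samples, threshold):
    """Indices where the level jumps by more than `threshold` from the previous sample."""
    return [
        t for t in range(1, len(samples))
        if abs(samples[t] - samples[t - 1]) > threshold
    ]

def ambience_triggers(readings, threshold, min_sensors=2, window=2):
    """Times at which at least `min_sensors` sensors fluctuate within `window` samples."""
    events = {s: fluctuation_times(x, threshold) for s, x in readings.items()}
    all_times = sorted({t for times in events.values() for t in times})
    triggers = []
    for t in all_times:
        sensors = [
            s for s, times in events.items()
            if any(abs(u - t) <= window for u in times)
        ]
        if len(sensors) >= min_sensors:
            triggers.append(t)
    return triggers

# Illustrative light levels: the lights dim for sensors a and b, not for c.
readings = {
    "a": [50, 50, 50, 10, 10, 10],
    "b": [48, 49, 49, 47, 12, 12],
    "c": [50, 50, 51, 50, 50, 50],
}
triggers = ambience_triggers(readings, threshold=20)
```

Requiring agreement across sensors filters out changes that affect only one phone, such as a hand covering its light sensor.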
Design Elements
Trigger Detection – Neighbor Assistance
Adding humans into the sensor loop
◦ Any time a user specifically takes a picture it is considered significant, and a signal is transmitted
◦ Any other sensor in the vicinity oriented in the same compass direction will be recruited as a candidate for a highlight
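The neighbor recruitment step can be sketched as a filter over nearby phones. The sketch below treats "in the vicinity" as membership in the same social group and "same compass direction" as a heading within 30 degrees of the photographer's. Both interpretations, along with the data layout, are assumptions for illustration.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two compass headings, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def recruit_candidates(photographer, phones, max_angle=30):
    """Ids of same-group phones facing roughly the photographer's direction.

    `phones` maps a phone id to a (group, heading_in_degrees) pair.
    """
    group, heading = phones[photographer]
    return sorted(
        pid
        for pid, (g, h) in phones.items()
        if pid != photographer and g == group and angular_diff(h, heading) <= max_angle
    )

# Illustrative data: p1 takes a picture while facing roughly north.
phones = {
    "p1": ("kitchen", 355),
    "p2": ("kitchen", 10),
    "p3": ("kitchen", 180),
    "p4": ("porch", 0),
}
candidates = recruit_candidates("p1", phones)
```

Here only p2 is recruited: p3 is in the same group but faces the opposite way, and p4 faces the right way but is in a different group.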
Design Elements
View Selection
A module is required to select videos that have a “good view”.
4 heuristics are used:
1. Face count: more faces = higher priority
2. Accelerometer reading ranking: less movement = higher ranking
3. Light intensity: “regular” light is preferred to dark or overly bright
4. Human in the loop: human-triggered events will be rated highly
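One way to combine the four heuristics is a lexicographic ranking, sketched below. The slides do not state how the heuristics are weighted or ordered, so the priority order used here (human trigger, then regular light, then face count, then steadiness), the light range, and the `View` fields are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class View:
    phone: str
    face_count: int        # faces detected in the clip
    movement: float        # e.g. accelerometer variance; lower is steadier
    light: float           # normalized intensity in [0, 1]
    human_triggered: bool  # True if a user took a picture at this moment

def rank_key(view, light_range=(0.2, 0.8)):
    """Sort key: larger tuples indicate better candidate views."""
    regular_light = light_range[0] <= view.light <= light_range[1]
    return (view.human_triggered, regular_light, view.face_count, -view.movement)

def best_view(views):
    """Pick the candidate with the highest rank key."""
    return max(views, key=rank_key)

# Illustrative candidates for one event.
views = [
    View("a", face_count=3, movement=0.9, light=0.5, human_triggered=False),
    View("b", face_count=3, movement=0.2, light=0.5, human_triggered=False),
    View("c", face_count=5, movement=0.1, light=0.95, human_triggered=False),
]
chosen = best_view(views)
```

In this example, view c is rejected for being overly bright despite having the most faces, and view b beats view a because it is steadier.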
Design Elements
View Selection
Design Elements
Event Segmentation
The last module is necessary to identify the logical start and end of the detected events.
A clap or laugh is a trigger, but the system must find the event (such as a song or speech or joke) and include that for the highlight to make sense.
Design Elements
Event Segmentation
Example: when laughter is detected, rewind the video to try to find the beginning of the joke, going back through the sound classification to find transitions.
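The rewind idea can be sketched over a per-second list of sound class labels. The sketch assumes the event starts where the sound class preceding the trigger began (for example, the speech that led up to the laughter) and ends when the trigger sound stops. The label names and this boundary rule are illustrative assumptions, not the authors' segmentation algorithm.

```python
def segment_event(labels, trigger_index):
    """Return inclusive (start, end) indices of the event around a trigger.

    `labels` holds one sound class label per second of video.
    """
    trigger_class = labels[trigger_index]

    # Extend forward to the end of the trigger sound (e.g. the laughter).
    end = trigger_index
    while end + 1 < len(labels) and labels[end + 1] == trigger_class:
        end += 1

    # Rewind to where the trigger sound began.
    start = trigger_index
    while start > 0 and labels[start - 1] == trigger_class:
        start -= 1

    # Keep rewinding through the class that led up to it, until a transition.
    if start > 0:
        lead_class = labels[start - 1]
        while start > 0 and labels[start - 1] == lead_class:
            start -= 1
    return start, end

labels = ["noise", "noise", "speech", "speech", "speech",
          "laughter", "laughter", "noise"]
segment = segment_event(labels, trigger_index=5)
```

For this toy sequence the segment covers the speech and the laughter that follows it, but not the surrounding background noise.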
Evaluation: Experiments & Results
MoVi system tested in 3 experiments
1. Controlled setting
2. Field Experiment: Thanksgiving Party
3. Field Experiment: SmartHome Tour
5 participants with iPod Nanos (for video) and Nokia N95 phones (other sensors)
Other stationary cameras
Evaluation: Experiments & Results
iPods can record 1.5 hours (5400 seconds), which results in a 5 x 5400 matrix of 1-second videos
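The matrix dimensions follow directly from the recording time and the number of participants. The short sketch below just reproduces that arithmetic; representing each 1-second clip as a (camera, second) pair is an assumption made for illustration.

```python
PARTICIPANTS = 5
RECORDING_HOURS = 1.5

seconds = int(RECORDING_HOURS * 3600)  # 1.5 hours expressed in seconds

# One row per camera, one column per second; each cell stands for a 1-second clip.
clip_matrix = [
    [(camera, t) for t in range(seconds)]
    for camera in range(PARTICIPANTS)
]
total_clips = sum(len(row) for row in clip_matrix)
```

A highlight can then be thought of as a selection of cells from this matrix, with at most one camera chosen for any given second.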
Evaluation: Experiments & Results
Control experiment
Field Experiment 1
Field Experiment 2
Evaluation: Experiments & Results
Uncontrolled Experiments – Evaluation Metrics
Human Selected – Picked by users (union of the events selected by multiple humans)
Non-Relevant – Events not picked by users
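The precision, recall, and fall-out figures reported for the field experiments can be computed from these two sets. The sketch below uses the standard information-retrieval definitions, which are assumed to be the ones meant here; the event ids are made up for illustration and are not data from the experiments.

```python
def retrieval_metrics(movi_selected, human_selected, all_events):
    """Return (precision, recall, fall_out) for a set of selected events."""
    movi = set(movi_selected)
    human = set(human_selected)
    non_relevant = set(all_events) - human  # events not picked by users

    hits = movi & human
    precision = len(hits) / len(movi) if movi else 0.0
    recall = len(hits) / len(human) if human else 0.0
    fall_out = len(movi & non_relevant) / len(non_relevant) if non_relevant else 0.0
    return precision, recall, fall_out

# Hypothetical example: 10 one-second clips, 4 picked by humans, 5 by the system.
all_events = range(10)
human_selected = {0, 1, 2, 3}
movi_selected = {2, 3, 4, 5, 6}
precision, recall, fall_out = retrieval_metrics(
    movi_selected, human_selected, all_events
)
```

Higher precision and recall are better, while a lower fall-out means fewer non-relevant clips were included in the highlights.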
Evaluation: Experiments & Results
Overall: 0.3852 Precision, 0.3885 Recall, 0.2109 Fall-out
Overall: 0.3048 Precision, 0.4759 Recall, 0.2318 Fall-out
Discussion of Results
The results are quite favorable when viewed as an improvement over random selection (more than 100% improvement).
The subjective nature of human interest and the strict, exact scoring of the metrics make the results reasonable but not groundbreaking.
That said, the concept appears sound as a first step toward a long-term collaborative sensing project.
Limitations to be overcome
Retrieval accuracy – defining human interest
Unsatisfying camera views – what if every view of a situation is bad?
Energy consumption – constant use of the sensors greatly shortens the overall duration
Privacy – how to handle social occasions where not everyone signs on to the concept
Greater algorithmic sophistication – would like to be able to handle more options
Dissimilar movement between phones and iPods – dealing with differences in sensing based on sensor location
Conclusions
The MoVi system is one part of the new concept of “social activity coverage”.
MoVi is able to sense many different types of social triggers and create a highlight of a social occasion that is at least somewhat close to a result hand-picked by a human.
It shows promise with future increases in sophistication.
References
http://en.wikipedia.org/wiki/Mel-frequency_cepstrum