
Privacy in Video Surveilled Areas

Torsten Spindler
ETH Zurich, Prof. Ludger Hovestadt
Institute of Building Technology, CAAD
Wolfgang Pauli Str. 15
8093 Zurich, [email protected]

Daniel Roth
ETH Zurich, Prof. Luc van Gool
Computer Vision Laboratory
Sternwartstrasse 7

Christoph Wartmann
ETH Zurich, Prof. Ludger Hovestadt
Institute of Building Technology, CAAD
Wolfgang Pauli Str. 15

Prof. Andreas Steffen
University of Applied Sciences Rapperswil
Institute for Internet Technologies and Applications
Oberseestr. 10
8640 Rapperswil, [email protected]

Abstract

We present a system prototype for self-determination and privacy enhancement in video surveilled areas by integrating computer vision and cryptographic techniques into networked building automation systems. This paper describes research work that has been done within the first half of the collaborative blue-c-II project and is conducted by an interdisciplinary team of researchers. Persons in a video stream control their visibility on a per-viewer basis and can choose to allow either the real view or an obscured image to be seen. The parts of the video stream that show a person are protected by an AES cipher and can be sent over untrusted networks. Experimental results are presented using the example of a meeting room scenario. The paper concludes with remarks on usability and the problems encountered.

Surveillance, Cryptography, Computer Vision, Building Automation

1 Introduction

Today video surveillance systems are being deployed worldwide, covering public places, corporate buildings and private homes with CCTV and networked cameras. In our work we focus on technical solutions to preserve the privacy of individuals within buildings despite the omnipresence of cameras. The scenarios for which we built a system prototype are set in typical office buildings.

To enhance privacy in such an environment, our approach is to use techniques provided by computer vision and cryptography, supported by facilities provided by a building automation system, to give surveilled persons the power to control their video information. This approach is embedded in a larger research project where the capabilities of conventional and 3D video cameras are explored [10]. Within this context our sub-project is deploying results from research in the field of computer-vision based tracking to a prototype system. In our collaborative project the prototype is examined from the points of view of both architects and computer engineers.

Furthermore, the actual use of our rather hidden and invisible technology by a monitored person is also of great importance to us. Additional demands are ease of use and a tight integration with existing controls of a networked building automation system. Fig. 1(a) presents a use case diagram of the involved actors and the system. An observed person can allow or deny access to the parts of a video stream that contain images of her or him. A viewer can see the original video stream where no person is visible. For every person present in the video stream the viewer sees either the original clear image, if access has been allowed (Fig. 1(b)), or an obscured image, if access has been denied (Fig. 1(c)).

1.1 Related Work

In this paper different techniques from computer vision, cryptography, distributed computing and building automation are presented. All four disciplines are by themselves research fields with a rich tradition. As far as we know, the combination of all four is novel. However, combinations of just two of them were presented in previous work. Senior et al. [20] reviewed privacy in video surveillance techniques for future systems. Furthermore they presented a privacy console and PrivacyCam, both implementing a subset of the proposed methods allowing some vision analysis, transformation and encryption of the video. However, the access control was fixed to a privacy console and only based on users' privileges given by the system owner, for example a security company. Our system in contrast allows individual persons to grant or deny access to other persons or groups with the help of a variety of devices. Additionally, our system focuses on being used within a building and being integrated with a building automation system to allow ease of use for surveilled individuals. Dufaux et al. [3] presented a system using computer vision detectors to scramble and encrypt image regions in the JPEG 2000 image format. The amount of image scrambling can be controlled and the system has been tested with different detectors such as skin color and face detectors. However, the system does not use tracking methods and thus is not able to apply an individual privacy status when multiple people are present in a scene. A startup company, Emitall [7], now uses this technology to encrypt the videos of surveillance cameras; if the need arises for law enforcement agencies to view the clear image, they can then decrypt the stream. Another startup company, Eptascape [8], is trying to bring its privacy enhancing technologies to the market. The first product of this company is an MPEG-7 encoder and real-time tracking hardware box, which can be directly attached to CCTV cameras. The proprietary tracking algorithm seems to be based on segmentation and therefore similar to our tracker. However, the example videos do not show changing lighting conditions and the tracker does not seem to be able to handle occlusion, a prerequisite for tracking multiple people individually.

Figure 1. System Overview: (a) UML use case diagram (the Observed Person allows or forbids viewing, the Viewer views, both through the Privacy System); (b) clear video; (c) obscured video

The outline of this paper is as follows. Section 2 gives an overview of our interdisciplinary approach. The system architecture is explained in Section 3, while the deployed computer vision methods are examined more thoroughly in Section 4. Section 5 presents the user interface. Results are discussed in Section 6, and Section 7 concludes the paper with remarks on problems encountered and an outlook.

2 Overview

The Privacy in Video Surveilled Areas (PiViSA) project combines computer vision and cryptographic technologies. It is meant to be integrated with a networked building automation system. The project's environment is the building, more specifically an office building, and its scope is a room. Our goal is to enhance privacy for a person who is surveilled by networked video cameras.

By privacy we mean the ability of a person or a group to be unobserved by others. In a conventional building the privacy of a room is determined by the ability to see or listen to what is happening in the room. Based on our informal definition of privacy, walls, windows and doors influence the privacy of a room in a building. The privacy of a room can be controlled by opening and closing doors or curtains. When a room is observed, the observed persons can either be aware of the observation or unaware. This concept can be found in different architectural settings. The most famous example is most likely Jeremy Bentham's Panopticon [16], though it has never been built. Conceived as a prison, it enables guards to oversee inmates without being seen. A number of buildings provide more limited observation systems. For example, a city house by Gaudi allowed the house owner to spy on the women's room. For similar purposes the famous Opera Garnier in Paris contains portholes towards the dancers' preparation area. We believe that in a modern building, with the use of automation infrastructure and video surveillance equipment, a complete loss of privacy can happen. In this context our system is meant to be used as an electronic successor to a curtain. While the easiest way to achieve this result would be to stop all surveillance of a room when privacy is wanted, our approach allows for fine-grained control. Unlike a curtain, which blocks visibility and reduces audibility, our system enables an observed person to control who can see them in the captured video.

To achieve this goal, the video stream taken by a camera is available in different versions. Similar to the approach presented by Senior et al. [20], a public version is available where privacy relevant information is removed or obscured. If an observed person grants access to an observer, an additional clear video is sent, protected by a cipher, to the observer's display, where it is merged with the publicly available obscured version. The roles of video controlled spaces, building automation and cryptography for our project are explained before details of the system are presented.

Figure 2. The PiViSA architecture of networked nodes (Camera, Track, Obscure, Crypt, Decrypt, Merge and Display nodes connected by the Network)

2.1 Video Controlled Spaces

Over the last decade a growing number of public spaces and commercial buildings have installed surveillance cameras. Access to recordings of these cameras is usually strictly limited to security staff members and the like. However, this does not necessarily prevent misuse. The question of who monitors the surveillant is still as important as ever. Recently three guards were sentenced for abusing their video surveillance equipment to spy on a woman [18]. Despite the possibility of abuse, more and more video surveillance systems are being deployed as the advantages seem to outweigh the disadvantages. For example, residents of the London neighborhood Shoreditch have access to local surveillance cameras on their home television with the help of a set-top box [25]. While on a different scale, it is also noteworthy that in the UK a system to monitor license plates of cars nationwide is being installed [12]. The possibility to profile not only general traffic patterns, but also the use of individual cars is quite intriguing. The countrywide scope of this project is of particular interest. We believe that it is more than likely that such a serious effort to automatically monitor the traffic in a whole country will lead to similar projects within buildings. In Variations on a Theme Park [21] the role of controlled spaces, foremost in the form of malls, has been examined. The ability to track and identify customers in such a closed environment might offer new possibilities. If an economic benefit is to be expected or a customer's desire can be satisfied, it will be implemented eventually.

2.2 Building Automation

We look at the issue of privacy in a video surveilled area against the background of an intelligent building, a smart home or, probably better termed, a computer integrated building as first introduced by Frank Duffy [4]. Such a building utilizes computers to improve the level of comfort by providing personalized services and can also provide information for facility management systems. Most current building automation systems use field bus systems that can be seen as a closed and proprietary network technology. However, the transition to more versatile IP-network based systems is under way. It is noteworthy that the older field bus systems are already being retrofitted with IP-network gateways to offer their services via HTTP and other protocols. New systems often use TCP/IP networks as a core technology.

In addition to the adoption of network technologies, the tasks are expanded beyond traditional automation. Next to the control of heating, ventilation and air conditioning, the integration of services like IP based telephony, television and video on demand is possible. The common communication and control channel for such a system is the already available network infrastructure. At the chair for Computer Aided Architectural Design (CAAD) at the Swiss Federal Institute of Technology, research was undertaken on how to add multimedia services to an IP based building automation system. The chosen system was RaumComputer [17], which uses the Java based OSGi service platform [15]. The multimedia services were added and a generic interface for hand held devices was developed [24] [23]. With a single interface it is possible to control all services a room provides, from light control to video services. The technical devices in the room, ranging from speaker, microphone and light to TV, were abstracted as services and are no longer used as single devices. Rather, the room itself becomes a container of available services and provides a unified user interface. Within this context surveillance cameras and video conferencing systems were used to extend visibility between rooms. A personalized user interface for controlling the room's services was provided after an RFID based authentication. We expect that such a building automation system will know if a room is currently occupied and by whom.

As a next step the building automation system could provide valuable information for a computer vision based tracking system. The accuracy of the tracking is influenced by changes in the observed environment. Some of these changes are known by a computer integrated building and could be relayed as events to the tracking systems. Examples of such events are the opening and closing of curtains and doors, changing illumination, and giving presentations. The tracking system could react to these changes and either ignore certain parts of the scene or adapt to the new environment. We think that this communication could increase the robustness of a visual tracker in the mentioned situations.

2.3 Cryptography

Cryptography is used to protect information against spying and tampering. In our system we define two types of data that need protection. The clear regions of a video stream, where a person or object is visible, are one type. The other type is the management and administrative data. For these two types we use two different cryptographic components. The primary concerns for the protection of management and administrative data are secrecy and tamper resistance. The primary concern for the protection of the video streams is speed, because typical surveillance cameras produce between a few hundred kilobits and several megabits of video data per second.

For the first task asymmetric cryptographic technologies can be used. For example, the SSL/TLS protocol suite allows a secure channel to be established between nodes on a network. With the use of passwords, certificates or challenge-response mechanisms, nodes can authenticate and information can then be sent securely between nodes. Also, information that is necessary to authenticate and authorize may need to be protected. If possible, facilities provided by a building automation system should be used to store and access this information.

For the second task a symmetric cipher in stream mode can be used [9]. Secret keys for the cipher can be transported securely over the secure channel between nodes described above. Important characteristics for the cipher are speed, security and hardware availability.

3 System Architecture

With the focus on a building and the rooms therein, we have chosen a distributed approach. Nodes connected by a network are the building blocks of our system. Based on our experience with adding media streaming to building automation systems, we adhere to the idea that every node provides a specialized service that is loosely coupled with the rest of the system. While this enhances the chance to use parts of the system in different contexts, there is a price to pay in terms of performance.

Fig. 2 gives an overview of the different nodes involved in sending a video stream. The outer ring shows the different nodes, with either clear or obscured images which are sent over the network. Nodes can be specialized hardware, such as cameras, or reside on a host PC alone or together.

Fig. 3 presents a typical data flow from camera to display. Starting from the camera node the clear video stream is delivered to the nodes that track, obscure and encrypt. The track node analyzes the video and is described in Section 4. The result of the tracking process is sent to the nodes responsible for encrypting and obscuring the stream. The result sent by the track node contains information on regions which define the location of the objects and persons found, and internal identifiers for these. While the track node is following an object it will use the same identifier for a region in every tracking result. However, when a person leaves the observed area and enters again, the person will get a different identifier. The task of relating those identifiers to identities of persons is given to an identify node. This node uses information about identities from the configuration node and the relative or absolute positions of tracked regions as well as the images of the video streams at these positions. Once the person is identified and their identity is related to the tracker's identifier, the encrypting node protects the clear image with a cipher. The key to be used in this cipher is issued by the key manager node, where session keys are created when needed. The session key is distributed by the key manager node to participating encrypt nodes and decrypt nodes. An encrypt or decrypt node participates when it is associated with a merge node that is assigned to a display node, which is in turn controlled by an authorized observer. The merge node takes the clear image information sent by the decrypt node and uses it to replace the part of the video stream that has been obscured by the obscure node. Lastly, the display node shows the final video stream. Depending on the number of persons and objects tracked and the permissions for these, the displayed video will contain a combination of clear and obscured parts.

Figure 3. Network paths for the video stream and tracking information (nodes: Camera, Track, Obscure, Encrypt, Decrypt, Merge, Display)
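The merge step can be illustrated with a minimal Python sketch. It assumes that frames are plain lists of pixel rows and that a decrypted region arrives as a small patch together with its top-left corner; the function name `merge_region` and this region layout are illustrative assumptions, not the prototype's actual interface.

```python
def merge_region(obscured_frame, clear_patch, x, y):
    """Return a copy of obscured_frame with clear_patch pasted at (x, y).

    Frames and patches are lists of pixel rows; (x, y) is the assumed
    top-left corner of the region inside the frame.
    """
    merged = [row[:] for row in obscured_frame]  # leave the input untouched
    for dy, patch_row in enumerate(clear_patch):
        for dx, pixel in enumerate(patch_row):
            merged[y + dy][x + dx] = pixel
    return merged


def compose_view(obscured_frame, permitted_patches):
    """Merge every patch the viewer is permitted to see.

    permitted_patches maps a tracker identifier to (patch, x, y); regions
    that are absent from the mapping simply stay obscured.
    """
    view = obscured_frame
    for patch, x, y in permitted_patches.values():
        view = merge_region(view, patch, x, y)
    return view
```

Regions for which no decrypted patch is available are left as they appear in the obscured stream, which matches the fallback behaviour described above.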

3.1 Network Protocol

In our project the network communication uses the UDP protocol to transfer large amounts of data. The video streams themselves contribute most of the network traffic. A video stream sent by the obscuring node is named obscured stream and one sent by an encrypting node is named protected stream. Where possible we use UDP multicast to reduce the load on the network. In Fig. 3 the camera can use multicast to reach the obscure, track and encrypt nodes. All our protocols, except the ones for video streaming, are based on XML messages. While XML is not a lightweight approach, it is considered to be best suited for integration into existing frameworks. Here is an example of the message sent by the tracking node to the encrypt and obscure nodes:

    <obscure>
      <frame number="804" timestamp="1203932276">
        <objectlist>
          <object id="1">
            <box xc="446" yc="242" w="68" h="78"/>
          </object>
          <object id="0">
            <box xc="254" yc="262" w="132" h="172"/>
          </object>
        </objectlist>
      </frame>
    </obscure>

The message refers to an internal frame count (804) and uses a timestamp within the MPEG stream of the camera for synchronization with other components. Within this frame, two objects have been detected and their coordinates are given as attributes of the box tag.
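A receiving node could read such a message with a few lines of Python, as in the following sketch. The element and attribute names are taken from the example message; the function name `parse_tracking_message` and the returned tuple layout are assumptions made for illustration, and the sketch does not interpret what `xc` and `yc` refer to within the box.

```python
import xml.etree.ElementTree as ET

MESSAGE = """<obscure>
  <frame number="804" timestamp="1203932276">
    <objectlist>
      <object id="1"><box xc="446" yc="242" w="68" h="78"/></object>
      <object id="0"><box xc="254" yc="262" w="132" h="172"/></object>
    </objectlist>
  </frame>
</obscure>"""


def parse_tracking_message(text):
    """Return (frame number, timestamp, {object id: (xc, yc, w, h)})."""
    root = ET.fromstring(text)
    frame = root.find("frame")
    regions = {}
    for obj in frame.iter("object"):
        box = obj.find("box")
        # Keep the four box attributes in the order used by the message.
        regions[int(obj.get("id"))] = tuple(
            int(box.get(name)) for name in ("xc", "yc", "w", "h")
        )
    return int(frame.get("number")), int(frame.get("timestamp")), regions


frame_number, timestamp, regions = parse_tracking_message(MESSAGE)
```

Keying the regions by tracker identifier makes it straightforward to look up the permission state of each region later on.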

The obscure node takes the information sent by the tracker and applies two filters to the stream. First, a block filter is applied that averages color information and removes the texture from the image. Second, all colors are reversed by an invert filter. The latter is used to make an obscured region noticeable when it is applied to an area of uniform color with little texture. This is mainly useful for illustration and debugging purposes. Fig. 4 shows an image as sent out by the obscure node.

Figure 4. Multi-person tracking and obscuring in a conference room
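The two filters of the obscure node can be sketched in Python as follows. For brevity the sketch works on a single 8-bit channel stored as a list of rows, whereas a color image would apply the same steps per channel; the function name, the top-left region convention and the block size of 8 pixels are assumptions, not values from the prototype.

```python
def obscure_region(frame, x, y, w, h, block=8):
    """Block-average and then invert the w x h region whose top-left corner is (x, y).

    frame is a list of rows of 8-bit values and is modified in place.
    """
    for by in range(y, y + h, block):
        for bx in range(x, x + w, block):
            ys = range(by, min(by + block, y + h))
            xs = range(bx, min(bx + block, x + w))
            values = [frame[j][i] for j in ys for i in xs]
            mean = sum(values) // len(values)   # block filter: removes texture
            for j in ys:
                for i in xs:
                    frame[j][i] = 255 - mean    # invert filter: makes the region stand out
    return frame
```

Because the inversion is applied to the averaged value, a region that was already uniform still changes visibly, which is the purpose of the second filter.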

3.2 Scalability

For better scalability we group our system into two parts: one part produces protected video streams, the other consumes them. The producer acts like a streaming server and the consumer like a streaming client. The grouping of the nodes is shown in Fig. 5. Camera, tracking, encrypting and obscuring nodes define the producer, while decrypting, merging and displaying nodes define the consumer. Due to the use of multicast to send the streams, new consumers can be added at no additional cost in network traffic when the streams are already present in the network segment. If a new consumer has permission to view a protected stream sent by an encrypt node, only this stream will be sent in addition to the existing protected streams and the obscured stream. A producer will only send its streams when a consumer has a demand for them. The number of producers that can send parallel streams is limited by the available network bandwidth. Also, the number of protected streams can change over time. Eventually a situation may occur where permission is available for a protected stream, but the network cannot take the additional traffic. The number of nodes tasked with encrypting and decrypting is flexible and it is possible to balance the load between them. The identifier sent by the tracking node can, for example, be used for allocating a specific node to a region. The tracking node is one of the computationally most demanding processes in our setup. Therefore, it would be desirable to have multiple tracking nodes, each working on only a subpart of the camera image. However, the effort to keep a consistent view of the surveilled scene may outweigh the performance gained, and we do not use multiple trackers in our prototype. Another reason to increase the number of tracking nodes could be the demand for very specialized tracking methods, where multiple complementary tracking methods could improve the overall reliability.

Figure 5. Consumer and Producer (producer: Camera, Track, Encrypt and Obscure nodes; consumer: Decrypt, Merge and Display nodes)
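One simple way to use the tracker identifier for load balancing is a modulo assignment, sketched below in Python. The text only states that the identifier can be used for allocating a node to a region; the modulo rule, the function name and the node names are assumptions chosen for illustration.

```python
def assign_encrypt_node(region_id, encrypt_nodes):
    """Pick an encrypt node for a tracked region from its tracker identifier."""
    if not encrypt_nodes:
        raise ValueError("no encrypt nodes available")
    # The same identifier always maps to the same node while the node list is unchanged.
    return encrypt_nodes[region_id % len(encrypt_nodes)]


nodes = ["encrypt-a", "encrypt-b"]  # hypothetical node names
assignment = {region_id: assign_encrypt_node(region_id, nodes) for region_id in range(4)}
```

Since the tracker keeps the identifier stable while it follows an object, a region stays on the same encrypt node for as long as it is tracked.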

3.3 Key Management

The key manager node controls the protected parts of the video stream. For every observed and identified person, a list of persons with authorization to see a clear image of the observed person is maintained. Additionally, the key manager is tasked with authentication. Authentication can either be implemented within the key manager or be delegated by the key manager to an external service.
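The per-person authorization list can be pictured as a mapping from each observed person to the set of viewers allowed to see the clear image. The following Python sketch is a minimal illustration; the class and method names are hypothetical and do not describe the prototype's key manager.

```python
class AccessList:
    """Maps each observed person to the viewers allowed to see the clear image."""

    def __init__(self):
        self._viewers = {}  # observed person -> set of authorized viewers

    def grant(self, observed, viewer):
        self._viewers.setdefault(observed, set()).add(viewer)

    def revoke(self, observed, viewer):
        self._viewers.get(observed, set()).discard(viewer)

    def may_view_clear(self, observed, viewer):
        # Anyone who is not listed only sees the obscured image.
        return viewer in self._viewers.get(observed, set())
```

Denying by default means that a viewer who has never been granted access, or an observed person without any list, always results in the obscured view.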

To transfer the clear video securely between the producer and consumer (Fig. 5) the AES-256 cipher in counter mode is used. More specifically, it is used in the encrypt and decrypt nodes of all participating producers and consumers. A ciphertext $C$ is created by $C_i := P_i \oplus K_i$, with $P$ being the payload and $K$ the key [9]. For every ciphertext that is created the key $K$ is composed of a random key and an initialization vector. The initialization vector is composed of a nonce¹ and a counter. The secrecy of the protected information depends on the key's secrecy. Therefore, the random key and the nonce part of the initialization vector have to be transferred securely between the key manager and the nodes tasked with encrypting or decrypting.

We do not assume that in our distributed system all nodes share the same knowledge of the system's state. Therefore the session key is changed periodically by the key manager. When either an encrypt or a decrypt node fails to receive the key change message, the decrypt node cannot decode the incoming stream and the obscured region is shown. When all participating encrypt and decrypt nodes fail to receive the key change message, they will continue to work until the counter repeats. Then the encrypt node is obliged to stop using the session key, as the same key would otherwise be used twice, possibly compromising security.

¹ Number used once, see [9].
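The structure of counter mode, including the rule to stop before the counter repeats, can be sketched with the Python standard library. Note the substitution: the standard library has no AES, so this sketch derives each key stream block with HMAC-SHA256 as a stand-in for the AES-256 block operation. It therefore shows the mode of operation, not the cipher used in the prototype. The 8-byte nonce, the 32-bit counter width and all names are likewise assumptions.

```python
import hashlib
import hmac
import os

BLOCK = 32             # key stream block size in this sketch (SHA-256 output)
MAX_BLOCKS = 2 ** 32   # assumed 32-bit counter


def keystream_block(key, nonce, counter):
    # Stand-in for the AES-256 block operation on (nonce || counter); NOT real AES.
    return hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()


def ctr_xor(key, nonce, data, first_counter=0):
    """Compute C_i = P_i XOR K_i block by block; the same call decrypts."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        counter = first_counter + offset // BLOCK
        if counter >= MAX_BLOCKS:
            # Continuing would reuse key stream, so a new session key is required.
            raise OverflowError("counter would repeat; request a new session key")
        block = keystream_block(key, nonce, counter)
        out.extend(p ^ k for p, k in zip(data[offset:offset + BLOCK], block))
    return bytes(out)


session_key = os.urandom(32)  # random 256-bit session key
nonce = os.urandom(8)
protected = ctr_xor(session_key, nonce, b"clear pixels of region 1")
recovered = ctr_xor(session_key, nonce, protected)
```

Because encryption and decryption are the same XOR operation, a decrypt node that holds an outdated session key produces unusable output rather than the clear image, which is why the obscured region is shown in that case.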

For playback of a recorded video it is necessary to store the session keys, assuming that the obscured and encrypted streams and regions are saved and not the original video. The key manager can use files or databases to store the keys and is responsible for protecting them. When the key manager provides authentication, the keys may be protected by a secret of the person, for example a password or private key.

The key manager is also responsible for providing authentication. Authentication is needed for assigning a person to a display node and for observed persons. As authentication is also needed by the building automation system, it is considered advantageous for the key manager to interface with this facility. The key manager may utilize a service like LDAP or RADIUS to delegate authentication.

3.4 Security

As our system follows an open and distributed approach there are several areas that need to be protected. Within the consumer and producer (Fig. 5) the video is sent unprotected. The channel between the key manager and the encrypt and decrypt nodes needs protection when session keys are transferred. The authentication process and the transmission of authorization information are vulnerable as well. Furthermore, the obscured stream may be altered.

To open a secure channel between the key manager and the encrypting and decrypting nodes we use the SSL protocol [14]. With the use of host keys we identify participating nodes. The HTTPS protocol is often supported by streaming video cameras. This protocol uses SSL to transport information securely and we therefore use it in the producer, if possible. As access via HTTPS will be unicast, a multicast reflector can be used to spread the stream. Within the consumer SSL and host keys can be used as well. To protect against attacks from the person using the system, a client-side certificate or pass-phrase can be used. Communication between the system and the client can again be done over SSL.

Additionally there are a number of security problems the system does not attempt to address. First, a person with access permission to a clear video can record or stream the video without protection. Second, the cameras involved might be accessed directly, circumventing the complete system. Third, control over the key management node will result in control over all protected streams.

3.5 Prototype

A prototype was developed and deployed to gain insight into the technical feasibility of our approach. First, a reduced producer was implemented consisting of a manual identification, camera, track and obscure node. A prototype interface was implemented that allowed switching between the clear and the obscured stream. A first usability test with colleagues showed strong interest in further improving the ease of use of the system and a wish to also manually control the size of the obscured area. Second, the cryptographic system made of key management, configuration, encrypt and decrypt nodes was put together. Also, a user interface to manually add an obscured area to the video was implemented. Finally, the merging node was implemented and replaced the semi-automatic switching between obscured and protected streams. Simplifications were made with respect to the number of allowed regions to track and the absence of an automatic identification. Authentication is provided by the building automation's bar code reader, which is used to read an employee's or student's identification card. A unique identifier is embedded there for use with the school's libraries. Currently RFID is being evaluated as a replacement for the bar code reader.

4 Visual Tracking

Automatic detection and tracking of objects from video cameras is a very active research topic in the computer vision community. Applications such as visual surveillance, intelligent living environments and human behavior analysis rely directly on new visual tracking algorithms. Within the context of “privacy in video surveilled areas” a real-time visual tracking method [19] will be presented, showing the capabilities of vision based trackers.

The visual tracking presented in this section is most closely related to the following publications [13, 2]. Mittal and Davis [13] developed a multi-camera system which also uses Bayesian classification. However, the 3D segmentation approach owes its robustness to the use of multiple cameras, which stand in the way of a real-time implementation. Capellades et al. [2] implemented an appearance based tracker which applies Bayesian classification only to pre-segmented foreground pixels, while we let all the models (object models, the background model and a model explaining newly appeared objects) compete for all pixels, thus maintaining a consistent probabilistic approach throughout the whole algorithm.

4.1 Bayesian Per-Pixel Classification

The proposed method performs a per-pixel classification to assign every pixel to one of the different objects that have been identified, including the background. The classification is based on the probability that a given pixel belongs to one of the objects given its specific color and position. The object probabilities are determined on the basis of two components. On the one hand, the appearance of the different objects is learned and updated, which yields indications of how compatible observed pixel colors are with these models. On the other hand, a motion model makes predictions of where to expect the different objects, based on their previous position. The approach is akin to Bayesian filtering approaches, but has been slimmed down to strike a good balance between robustness and speed.

Fig. 6 sketches the tracking framework with its probability images. Different characteristics of every object, such as its appearance and motion, are incorporated in specialized models and updated over time. The segmentation image assigns every pixel individually to the object with the highest product of the probabilities of its appearance and motion models.

Figure 6. Tracking framework
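The per-pixel decision rule can be sketched in a few lines of Python. To keep the example short it uses strong simplifications that are assumptions and not part of the method in [19]: color is a single grey value, position is one-dimensional, and every model consists of one Gaussian for appearance and one for position.

```python
import math


def gaussian(x, mean, sigma):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def classify_pixel(color, position, models):
    """Return the name of the model with the highest appearance * motion score.

    models maps a name to (color_mean, color_sigma, pos_mean, pos_sigma).
    """
    best_name, best_score = None, -1.0
    for name, (c_mean, c_sigma, p_mean, p_sigma) in models.items():
        score = gaussian(color, c_mean, c_sigma) * gaussian(position, p_mean, p_sigma)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical models: a dark background that may appear anywhere and a
# bright person expected around position 40.
models = {
    "background": (50.0, 10.0, 100.0, 1000.0),
    "person": (200.0, 20.0, 40.0, 15.0),
}
label = classify_pixel(195.0, 42.0, models)
```

All models compete for every pixel, so the background is treated as just another candidate rather than being removed in a separate pre-segmentation step.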

4.2 Tracking Models

Color Model: All our appearance models use Gaussian mixtures in RGB color space. Stauffer and Grimson [22] proposed this popular choice for modeling scene backgrounds with time-adaptive per-pixel mixtures of Gaussians (TAPPMOGs). However, we modified this approach to fit into our multi-model approach.

Appearance Background Model: In contrast to Stauffer's algorithm, which combines foreground and background in one color model per pixel, we use individual models for whole foreground objects. Therefore we model the background with only a single Gaussian per pixel. The background model is initialized at system startup.

Detecting new Objects: Pixels with a color significantly different from the background model have a low probability. If it is lower than a certain threshold, the pixel is assigned to a generic ‘new object model’. If whole regions fall on this model, a new foreground object is detected and its motion and appearance models are initialized.

Appearance Foreground Model: For the appearance models we use a so-called ‘sliced object model’, as it divides the person into a fixed number of horizontal slices of equal height. For each slice, color models with multiple Gaussians are generated, representing the most important colors in that region of a person (Fig. 7(b) and 7(c)).

Motion Model: The size and movement of each foreground object are predicted individually. We use a linear Kalman filter for the 2D position in the image and a weighted average filter for the object size (Fig. 7(a)).

Figure 7. Different specialized models: (a) motion model bounding box, (b) 7 slices, (c) probability image
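The motion model can be illustrated with a small Python sketch. The text does not specify the state model or the noise parameters, so the following are assumptions: a constant-velocity state per image axis, independent axes (one filter each for x and y), and hypothetical values for the process noise `q`, the measurement noise `r`, the initial covariance and the averaging weight.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (state: position, velocity)."""

    def __init__(self, position, q=0.01, r=1.0):
        self.p, self.v = float(position), 0.0
        self.P = [[1.0, 0.0], [0.0, 100.0]]  # velocity is initially very uncertain
        self.q, self.r = q, r                # assumed process / measurement noise

    def predict(self, dt=1.0):
        (a, b), (c, d) = self.P
        self.p += self.v * dt
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q)
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.p

    def update(self, measured):
        (a, b), (c, d) = self.P
        s = a + self.r                 # innovation variance
        k0, k1 = a / s, c / s          # Kalman gain for position and velocity
        residual = measured - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        # P <- (I - K H) P with H = [1, 0]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def smooth_size(previous, measured, weight=0.2):
    """Weighted average of the previous and the newly measured object size."""
    return (1 - weight) * previous + weight * measured
```

A tracked object would hold two `AxisKalman` instances for its image position and call `smooth_size` separately for width and height, so that a single noisy segmentation does not make the bounding box jump.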

5 Interface

The primary task of the user interface is to control a person's appearance on a viewer's display in our privacy enhancing system. The observed person must be able to allow or deny access to their clear view for every viewer or for whole groups.

Our requirements for this interface are:

• Ease of use

• Ease of integration in other interfaces

• Deployable on fixed terminals as well as on hand held devices

To fulfill these requirements we first took a look at related areas. We found that internet based chat applications deal with similar problems: services like ICQ or Jabber use the concept of friends or buddies to manage a potentially very high number of users easily with the help of lists.

The design and use of our interface prototype adapts several techniques found in instant messaging clients to the special needs of our privacy application. The prototype of the user interface is programmed in Flash [11], which is well known to work with most browsers and on many computer platforms. This allows us to use a variety of end user devices like fixed workplaces and laptops, touch-screen displays and very compact and mobile internet tablets like the Nokia 770. It would even be possible to deploy the interface on a Flash capable cellphone.

In our system prototype the user interface queries the key manager for a list of persons. The protocol used for information exchange between key manager and user interface is XML based. Information can be relayed between the two by an HTTP proxy, which allows the key manager to operate from within a private network. After the interface has received the list of pre-registered persons, a login screen that will look familiar to many computer users is shown (Fig. 8).

Figure 8. Login Screen of User Interface Prototype
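To make the exchange with the key manager concrete, the following sketch parses a person list from an XML reply. The paper only states that the protocol is XML based, so the element and attribute names (`persons`, `person`, `id`, `name`), the sample data, and the function name `parse_person_list` are purely hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical reply from the key manager; this schema is an assumption
# made for illustration and is not the prototype's actual format.
SAMPLE_REPLY = """
<persons>
  <person id="1" name="Alice"/>
  <person id="2" name="Bob"/>
</persons>
"""


def parse_person_list(xml_text):
    """Return (id, name) pairs for every <person> element in the reply."""
    root = ET.fromstring(xml_text.strip())
    return [(p.get("id"), p.get("name")) for p in root.findall("person")]
```

The resulting list is what a login screen would need to offer the pre-registered persons for selection.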

After identification with a valid password the list of potential viewers is displayed. In Fig. 9 the viewers that will see the obscured image are listed under ‘user’. Those viewers that are allowed to see the clear image are found below ‘superuser’. The terms ‘user’ and ‘superuser’ are configurable, and the ones chosen in the example give a hint of the observed person’s favorite operating system.

To give or revoke viewers’ access to the clear video images, an intuitive drag-and-drop mechanism is used. The prototype interface follows conventions set by online chat systems, which are known to work for a large number of people.
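The state behind the two lists can be sketched as a small data structure. The class name `PrivacySettings` and its methods are hypothetical. The rule that unknown viewers only see the obscured image is an assumption chosen as a conservative default.

```python
class PrivacySettings:
    """Viewer lists of one observed person, mirroring the two lists in the interface."""

    def __init__(self, viewers):
        self.obscured = set(viewers)  # viewers who see the obscured image
        self.clear = set()            # viewers who may see the clear image

    def grant(self, viewer):
        """Corresponds to dragging a viewer into the clear-view list."""
        self.obscured.discard(viewer)
        self.clear.add(viewer)

    def revoke(self, viewer):
        """Corresponds to dragging a viewer back into the obscured list."""
        self.clear.discard(viewer)
        self.obscured.add(viewer)

    def sees_clear(self, viewer):
        # Assumption: viewers not listed anywhere get the obscured image.
        return viewer in self.clear
```

Each drag-and-drop action then maps to a single `grant` or `revoke` call.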

6 Results

Figure 9. Privacy Control Screen of User Interface Prototype

This section demonstrates several of these features using different sequences and datasets recorded with our system prototype. A more detailed tracking evaluation with public video databases is available in [19]. Our first scene contains two people entering a meeting room for a discussion. Fig. 10 shows the camera view with the two persons and a bounding box around each of them, showing the successful tracking by the algorithm. Thanks to our multi-model tracking framework, the persons are detected during the whole meeting without fading into the background.

Figure 10. Multi-person tracking in a conference room

The advantages of the presented tracking method for the proposed scenario are:

• Discrete and different models for foreground objects and the background allow the tracking of moving and motionless objects.

• Adaptive color models for foreground and background adapt individually to changing lighting conditions.

• Multiple people can be tracked individually, and occlusions between two objects can be handled for objects with different color models.

• The current implementation runs in real time on a Linux workstation at QVGA (320x240) resolution.

Occlusion can be handled by the tracking framework as shown in Fig. 11. However, sufficiently discriminable color models of the objects are a prerequisite for correct operation. During the occlusion phase, appearance and motion models are no longer updated. This results in frozen color models and fixed object dimensions during the whole occlusion phase. During partial occlusion, the object positions are taken from the non-overlapping object parts. During complete occlusion the velocity of the object is assumed constant until some object parts reappear.

6.1 Limitations of visual tracking

Current tracking methods are not able to fully model all possible events happening in front of a camera, nor are they able to gain an in-depth understanding of the scene, due to limited computational resources. For our segmentation based approach, all changes in the image that are of a reasonable size and not related to persons are a source of detection or tracking errors. Fig. 12 shows such problems in the form of a projection screen, where the tracker cannot distinguish between real objects and projected ones. Furthermore, changes in the environment, like the door which was left open in Fig. 12, can lead to wrong objects if new images differ significantly from the initially learned background model. The adaptation of the background is designed to handle slow changes in the image, like varying lighting conditions.

In most security applications, it is not sufficient to use just a single technology to provide an adequate security standard. Therefore, we do not claim that computer vision based techniques alone can provide a foolproof system for privacy. Instead, a combination of multiple techniques should be used. Information from the building automation system about open doors, projector screens, or lights turning on or off is a possible solution to these problems.

6.2 Synchronization

A synchronization mechanism had to be deployed as the different nodes process their information at different speeds. Influenced by the utilized MPEG-2 video cameras (Axis-230 [1]), a timestamp embedded within the initial camera video stream is used. Due to the difference in frame rates between the camera, which delivers 25 frames per second (fps), and the tracking node, which delivers 10-15 fps, a delay in the obscured video is very noticeable. This excludes the use of our system in a live video conference setting.

Figure 11. Occlusion handling between two persons. The lower image shows the per-pixel classification.

Figure 12. A large change in the background can lead to wrong objects.

Figure 13. Synchronization problem

Fig. 13 illustrates another problem we found. The face of the person sitting on the right is not obscured. Even a sitting person will move slightly and can move out of the region detected by the tracking node between frame updates. Therefore the obscure node adds a fixed margin of 16 pixels to each side of the tracked region. This value has been derived from observations of the running system and is adjustable.
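The margin can be sketched as a simple box expansion. The 16 pixel default and the QVGA frame size come from the text. The function name `add_margin`, the `(x0, y0, x1, y1)` box layout, and the clamping to the frame borders are assumptions for illustration.

```python
def add_margin(box, margin=16, frame_width=320, frame_height=240):
    """Grow a tracked region by `margin` pixels on every side.

    `box` is (x0, y0, x1, y1) in pixel coordinates. Clamping the result
    to the frame is an assumption, not a documented prototype behavior.
    """
    x0, y0, x1, y1 = box
    return (
        max(0, x0 - margin),
        max(0, y0 - margin),
        min(frame_width, x1 + margin),
        min(frame_height, y1 + margin),
    )
```

For example, the region `(100, 50, 150, 120)` grows to `(84, 34, 166, 136)`, while a region close to the image border is cut off at the border.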

When information arrives at a merging node it also needs to be synchronized, as the obscured stream will usually arrive ahead of the regions sent by decrypting nodes. Again the same timestamp is used.
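A minimal sketch of timestamp-based matching at a merging node follows, assuming that each obscured frame and each set of decrypted regions carries the camera timestamp as a key. The class `TimestampMerger` and its methods are hypothetical. A real node would also need timeouts and a bounded buffer, which are omitted here.

```python
class TimestampMerger:
    """Pairs obscured frames with decrypted regions that share a timestamp."""

    def __init__(self):
        self._frames = {}   # timestamp -> obscured frame waiting for regions
        self._regions = {}  # timestamp -> regions waiting for their frame

    def add_frame(self, timestamp, frame):
        """Store a frame, or return (frame, regions) if the regions are already here."""
        if timestamp in self._regions:
            return frame, self._regions.pop(timestamp)
        self._frames[timestamp] = frame
        return None

    def add_regions(self, timestamp, regions):
        """Store regions, or return (frame, regions) if the frame is already here."""
        if timestamp in self._frames:
            return self._frames.pop(timestamp), regions
        self._regions[timestamp] = regions
        return None

    def pending(self):
        """Number of items still waiting for their counterpart."""
        return len(self._frames) + len(self._regions)
```

Because the obscured stream usually arrives first, frames accumulate in the buffer until the matching regions arrive, which is one source of the latency discussed in Section 6.3.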

6.3 Latency

Due to the need for synchronizing, and thus waiting for information to arrive, the prototype has a high latency that varies between 1 and 10 seconds depending on the system’s configuration. Several nodes add to the latency. Beginning with the camera, some latency is added due to the use of MPEG-2. Next, the tracking node has to analyze and understand the scene. The merging node has to wait for the decrypting node to deliver the unobscured information, and it has to buffer the video stream in order to send a continuous stream to the displaying node.

To reduce the latency within the camera, an open camera system like the Elphel 333 [5] might be used. The camera is open to modification and comes with a programmable FPGA. The FPGA source code that runs on the camera is available as Open Source [6]. The task of obscuring and encrypting parts of the video could then be done on the camera itself. The latency of the merging and displaying nodes could be reduced further if they were combined into a single node.

6.4 Multiple identities

When persons are standing very close to each other it might not be possible to separate and identify them individually. Instead the tracker node will only notice a single region and send a single identity. Therefore, the system needs to support groups of identities. In such a case the group of people will only be visible to a viewer who holds the permission to view all persons in the group. However, the whole system will behave trivially in very crowded situations, when many people cover the whole camera image and either the tracker combines them into a single big group or the massive occlusions lead to wrong tracking results. But given that the scope of the project is a room within an office building, we do not expect this to happen very often.
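The group rule can be written as a short check. The function name `group_visible_to` and the dictionary layout are hypothetical. Treating an empty group as not visible is an assumption.

```python
def group_visible_to(viewer, group, clear_viewers):
    """Return True if `viewer` may see the merged region of `group` in clear.

    `clear_viewers` maps each person to the set of viewers that person
    has granted clear access. The viewer needs permission from every
    person in the group; an empty group is treated as not visible.
    """
    return bool(group) and all(
        viewer in clear_viewers.get(person, set()) for person in group
    )
```

A single person in the group who has not granted access is therefore enough to keep the whole merged region obscured for that viewer.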

6.5 Interface Usability

The interface prototype was presented to colleagues at the chair for CAAD. Overall the interface was found to be intuitively usable. The features requested during this usability test were:

1. the ability to control the margin to the tracked regionby the observed person

2. the ability to obscure the complete video

3. the ability to define groups of persons

4. color coding to identify persons currently viewing

5. the ability to control a display in the room to view the recorded video

The first four items are already implemented or planned, while item 5 should be solved by closer integration with the interface of the building automation system.

7 Conclusion

The presented work focused on an automatic distributed system that provides reliable, scalable and efficient protection of a person’s privacy while being surveilled by video cameras in a building. Using computer vision, cryptographic technologies and an integrated interface we can achieve this partially.

A prototype was successfully built on top of a building automation system. We added specialized processing nodes to build a scalable and distributed setup containing nodes for camera, tracking, key management, encryption, decryption and display. An easy-to-use graphical user interface allows everybody in a building to control the system from a variety of different electronic devices.

The functionality of the presented system prototype also showed today’s technological limitations. What has been achieved is a system where a person can give and revoke access to others. Only authorized persons can view the clear video while others see only an obscured version. If a viewer has the permission to see a subset of all persons in a surveilled scene, the resulting video contains mixed clear and obscured images of persons. However, the people have to be correctly identified manually, and reliable tracking cannot always be guaranteed.

Due to the limitations in identification and in today’s visual tracking methods, further components will have to be added. A bar code scanner is one possible solution for the identification problem, as it is not feasible to robustly identify people from CCTV cameras automatically. Even with a bar code on an ID card, the observed person still has to be willing to provide their identity truthfully. It is quite possible that a person might be tempted to take on a false identity. Deployment of an authentication method that makes this harder will affect the ease of use and acceptance of the system. For example, entering a pass phrase or identifying by biometric methods (iris scan, fingerprint) is thought to be an awkward procedure in an everyday office situation.

Even when the above-mentioned limitations are solved, some problems are permanent. What happens to video information that can be viewed by an authorized person who distributes the video further? Currently there is no way to control a digital video’s distribution when the system is open. Only with very restrictive digital rights management (DRM) methods is it possible to limit distribution. But many methods used in DRM are not applicable here. Most prominently, a loss in video quality does not prevent abuse, since a degraded copy is still usable. Also, people with access to the personal keys will be able to see all videos.

8 Acknowledgments

The authors gratefully acknowledge support by the ETH Zurich project blue-c-II and Swiss SNF NCCR project IM2.

References

[1] Axis network cameras. http://www.axis.com.

[2] M. B. Capellades, D. Doermann, D. DeMenthon, and R. Chellappa. An appearance based approach for human and object tracking. In ICIP, 2003.

[3] F. Dufaux and T. Ebrahimi. Smart video surveillance system preserving privacy. In Image and Video Communications and Processing, 2005.

[4] F. Duffy. The intelligent building in Europe. In DEGW London, 1991.

[5] Elphel network cameras. http://www.elphel.com.

[6] Elphel open source. http://sourceforge.net/projects/elphel.

[7] EMITALL Surveillance SA. http://www.emitall.com.

[8] Eptascape Inc. http://www.eptascape.com.

[9] N. Ferguson and B. Schneier. Practical Cryptography. Wiley, 2003.

[10] BlueC II Project. http://blue-c-ii.ethz.ch.

[11] Adobe (former Macromedia) FLASH. http://www.macromedia.com.

[12] Britain will be first country to monitor every car journey. news.independent.co.uk/uk/transport/article334686.ece.

[13] A. Mittal and L. S. Davis. M2tracker: A multi-view approach to segmenting and tracking people in a cluttered scene using region-based stereo. May 2002.

[14] The openssl library implements ssl and tls protocols. http://www.openssl.org.

[15] OSGi Service Platform. http://www.osgi.org.

[16] Jeremy Bentham’s Panopticon. http://en.wikipedia.org/wiki/Panopticon.

[17] RaumComputer building automation system. http://www.raumcomputer.de.

[18] The Register: CCTV Peeping Toms jailed. www.theregister.co.uk/2006/01/13/cctv_men_jailed/.

[19] D. Roth, P. Doubek, and L. Van Gool. Bayesian pixel classification for human tracking. In MOTION 2005, January 2005.

[20] A. Senior, S. Pankanti, A. Hampapur, L. Brown, Y.-L. Tian, A. Ekin, J. Connell, C. F. Shu, and M. Lu. Enabling video privacy through computer vision. IEEE Security and Privacy, 03(3):50–57, 2005.

[21] M. Sorkin. Variations on a Theme Park: The New American City and the End of Public Space. Hill and Wang, 1992.

[22] C. Stauffer and W. Grimson. Adaptive background mixture models for real-time tracking. In CVPR, 1999.

[23] K. Strehlke. Merging building automation systems and multimedia systems - a framework for computer integrated buildings. In Innovation in Architecture, Engineering and Construction, 2005.

[24] K. Strehlke, M. Ochsendorf, and U. Bahr. Generative interfaces and scenarios for intelligent architecture - a framework for computer integrated buildings. In Generative Art Conference, 2004.

[25] S. Swinford. The Sunday Times - Asbo TV helps residents watch out. http://www.timesonline.co.uk/article/0,,2087-1974974,00.html, January 08 2006.
