IEEE Projects 2012-2013 - Parallel and Distributed Computing


Uploaded by k-sundaresh-ka on 10 May 2015



Elysium Technologies Private Limited Approved by ISO 9001:2008 and AICTE for SKP Training Singapore | Madurai | Trichy | Coimbatore | Cochin | Kollam | Chennai http://www.elysiumtechnologies.com, [email protected]

IEEE Final Year Projects 2012 | Student Projects | Parallel and Distributed Computing Projects

IEEE FINAL YEAR PROJECTS 2012 – 2013

Parallel and Distributed Computing

Corporate Office: Madurai

227-230, Church road, Anna nagar, Madurai – 625 020.

0452 – 4390702, 4392702, +9199447933980

Email: [email protected], [email protected]

Website: www.elysiumtechnologies.com

Branch Office: Trichy

15, III Floor, SI Towers, Melapudur main road, Trichy – 620 001.

0431 – 4002234, +919790464324.

Email: [email protected], [email protected].

Website: www.elysiumtechnologies.com

Branch Office: Coimbatore

577/4, DB Road, RS Puram, Opp to KFC, Coimbatore – 641 002.

+919677751577

Website: www.elysiumtechnologies.com, Email: [email protected]

Branch Office: Kollam

Surya Complex, Vendor junction, Kollam – 691 010, Kerala.

0474 – 2723622, +919446505482.

Email: [email protected].

Website: www.elysiumtechnologies.com

Branch Office: Cochin

4th Floor, Anjali Complex, near south over bridge, Valanjambalam,

Cochin – 682 016, Kerala.

0484 – 6006002, +917736004002.

PARALLEL AND DISTRIBUTED COMPUTING 2012 – 2013

EGC 3201: A Cluster-on-a-Chip Architecture for High-Throughput Phylogeny Search

In this paper, we describe an FPGA-based coprocessor architecture that performs a high-throughput branch-and-bound search of the space of phylogenetic trees corresponding to the number of input taxa. Our coprocessor architecture is designed to accelerate maximum-parsimony phylogeny reconstruction for gene-order and sequence data and is amenable to both exhaustive and heuristic tree searches. Our architecture exposes coarse-grain parallelism by dividing the search space among parallel processing elements (PEs), and each PE exposes fine-grain memory parallelism for its lower-bound computation, the kernel computation performed by each PE. Inter-PE communication is performed entirely on-chip. When using this coprocessor for maximum-parsimony reconstruction for gene-order data, our coprocessor achieves a 40X improvement over software in search throughput, corresponding to a 14X end-to-end application improvement when including all communication and system overheads.

EGC 3202: A Framework for Routing Performance Analysis in Delay Tolerant Networks with Application to Non-Cooperative Networks

In this paper, we present a framework for analyzing routing performance in delay tolerant networks (DTNs). Unlike previous work, our framework is aimed at characterizing the exact distribution of relevant performance metrics, which is a substantial improvement over existing studies characterizing either the expected value of a metric or an asymptotic approximation of its actual distribution. In particular, the considered performance metrics are packet delivery delay and communication cost, expressed as the number of copies of a packet circulating in the network at the time of delivery. Our proposed framework is based on a characterization of the routing process as a stochastic coloring process and can be applied to model the performance of most stateless delay tolerant routing protocols, such as epidemic, two-hop, and spray-and-wait. After introducing the framework, we present examples of its application to derive the packet delivery delay and communication cost distributions of two such protocols, namely epidemic and two-hop routing. Characterizing the packet delivery delay and communication cost distributions is important for investigating fundamental properties of delay tolerant networks. As an example, we show how the packet delivery delay distribution can be used to estimate how epidemic routing performance changes in the presence of different degrees of node cooperation within the network. More specifically, we consider fully cooperative, noncooperative, and probabilistic cooperative scenarios, and derive nearly exact expressions for the packet delivery rate (PDR) under these scenarios based on our proposed framework. A comparison of the obtained packet delivery rate estimates in the various cooperation scenarios suggests that even a modest level of node cooperation (probabilistic cooperation with a low probability of cooperation) is sufficient to achieve a 2-fold performance improvement with respect to the most pessimistic scenario, in which all potential forwarders drop packets.
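The epidemic-routing metrics discussed above (delivery delay and number of copies at delivery) can be illustrated with a tiny Monte Carlo sketch. This is not the paper's analytical framework: it assumes a simple model where pairwise contacts follow exponential inter-contact times and every contact with a packet carrier creates a new copy, and all parameters here are invented for illustration.

```python
import random

def epidemic_trial(n_nodes=20, contact_rate=0.1):
    """One simulated epidemic-routing delivery. Node 0 is the source and
    node n_nodes-1 the destination. Contacts between any infected/susceptible
    pair occur at exponential rate contact_rate; every contact copies the packet.
    Returns (delivery_delay, copies_at_delivery)."""
    infected = {0}          # nodes currently holding a copy
    t = 0.0
    while (n_nodes - 1) not in infected:
        # Time to the next infecting contact depends on the number of
        # infected-susceptible pairs (superposition of Poisson processes).
        pairs = len(infected) * (n_nodes - len(infected))
        t += random.expovariate(contact_rate * pairs)
        # A uniformly chosen susceptible node receives a copy.
        susceptible = [v for v in range(n_nodes) if v not in infected]
        infected.add(random.choice(susceptible))
    return t, len(infected)

delay, copies = epidemic_trial()
```

Averaging many such trials gives an empirical delivery-delay distribution, the quantity the framework above characterizes exactly.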

EGC 3203: A Game-Theoretic Approach to the Formation of Clustered Overlay Networks

In many large-scale content sharing applications, participants or nodes are connected with each other based on their content or interests, thus forming clusters. In this paper, we model the formation of such clustered overlays as a strategic game, where nodes determine their cluster membership with the goal of improving the recall of their queries. We study the evolution of such overlays both theoretically and experimentally in terms of stability, optimality, load balance, and the required overhead. We show that, in general, decisions made independently by each node using only local information lead to overall cost-effective cluster configurations that are also dynamically adaptable to system updates such as churn and query or content changes.

EGC 3204: A Novel Parallel Scan for Multicore Processors and Its Application in Sparse Matrix-Vector Multiplication

We present a novel parallel algorithm for computing scan operations on x86 multicore processors. The best previously known parallel scan for this platform requires the number of processors to be a power of two; our proposed method removes this constraint. The design of the algorithm takes the architectural characteristics of x86 multicore processors into account, so that the rate of cache misses is reduced and the cost of thread synchronization and management is minimized. Tests on a machine with dual-socket quad-core Intel Xeon E5405 processors showed that the proposed solution outperformed the best known parallel reference. A novel approach to sparse matrix-vector multiplication (SpMV) based on the proposed scan is then explained. Unlike existing approaches, which use backward segmented operations, ours uses forward ones for more efficient caching. An implementation of the proposed SpMV was tested against the SpMV in Intel's Math Kernel Library (MKL) and merits were found in the proposed approach.

EGC 3205: A QoS Oriented Vertical Handoff Scheme for WiMAX/WLAN Overlay Networks

Recently, a number of wireless communication technologies have been migrating toward heterogeneous overlay networks. The integration of Mobile WiMAX and WLAN seems to be a promising approach due to their homogeneous nature and complementary characteristics. In this paper, we investigate several important issues in the interworking of Mobile WiMAX and WLAN networks. We address a tightly coupled interworking architecture. Further, a seamless and proactive vertical handoff scheme is designed on top of this architecture, with the aim of always providing the best quality of service (QoS) to users. Both application performance and network conditions are considered in the handoff process. Moreover, we derive evaluation algorithms to estimate the conditions of both WiMAX and WLAN networks in terms of available bandwidth and packet delay. A simulation study has demonstrated that the proposed schemes can keep stations always best connected.
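The power-of-two restriction that the parallel scan work above removes can be seen in the classic two-pass block scan, which works for any number of workers: each worker scans its own block, the per-block totals are scanned, and each block is then offset. The sequential Python sketch below illustrates the algorithm only; it is not the paper's x86 implementation.

```python
def blocked_inclusive_scan(data, n_workers=3):
    """Two-pass inclusive prefix sum over n_workers blocks of roughly
    equal size; n_workers need not be a power of two."""
    n = len(data)
    bounds = [n * i // n_workers for i in range(n_workers + 1)]
    # Pass 1: each worker scans its own block independently.
    blocks = []
    for w in range(n_workers):
        block, acc = [], 0
        for x in data[bounds[w]:bounds[w + 1]]:
            acc += x
            block.append(acc)
        blocks.append(block)
    # Scan the per-block totals to get each block's starting offset.
    offsets, acc = [], 0
    for b in blocks:
        offsets.append(acc)
        acc += b[-1] if b else 0
    # Pass 2: each worker adds its offset to every element of its block.
    out = []
    for off, b in zip(offsets, blocks):
        out.extend(x + off for x in b)
    return out
```

For example, `blocked_inclusive_scan([1, 2, 3, 4, 5], 3)` yields `[1, 3, 6, 10, 15]` regardless of how the five elements split across the three blocks.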

EGC 3206: A Rendezvous-Based Approach Enabling Energy-Efficient Sensory Data Collection with Mobile Sinks

A large class of Wireless Sensor Network (WSN) applications involves a set of isolated urban areas (e.g., urban parks or building blocks) covered by sensor nodes (SNs) monitoring environmental parameters. Mobile sinks (MSs) mounted upon urban vehicles with fixed trajectories (e.g., buses) provide the ideal infrastructure to effectively retrieve sensory data from such isolated WSN fields. Existing approaches involve either single-hop transfer of data from SNs that lie within the MS's range or heavy involvement of network periphery nodes in data retrieval, processing, buffering, and delivering tasks. These nodes run the risk of rapid energy exhaustion, resulting in loss of network connectivity and decreased network lifetime. Our proposed protocol aims at minimizing the overall network overhead and energy expenditure associated with the multihop data retrieval process while also ensuring balanced energy consumption among SNs and prolonged network lifetime. This is achieved by building cluster structures consisting of member nodes that route their measured data to their assigned cluster head (CH). CHs perform data filtering upon raw data, exploiting potential spatial-temporal data redundancy, and forward the filtered information to appropriate end nodes with sufficient residual energy located in proximity to the MS's trajectory. Simulation results confirm the effectiveness of our approach as well as its performance gain over alternative methods.

EGC 3207: A Sequentially Consistent Multiprocessor Architecture for Out-of-Order Retirement of Instructions

Out-of-order retirement of instructions has been shown to be an effective technique to increase the number of in-flight instructions. This form of runtime scheduling can reduce pipeline stalls caused by head-of-line blocking effects in the reorder buffer (ROB). Expanding the width of the instruction window can be highly beneficial to multiprocessors that implement a strict memory model, especially when both loads and stores encounter long latencies due to cache misses and their stalls must be overlapped with instruction execution to overcome the memory latencies. Based on the Validation Buffer (VB) architecture (a previously proposed out-of-order retirement, checkpoint-free architecture for single processors), this paper proposes a cost-effective, scalable, out-of-order retirement multiprocessor capable of enforcing sequential consistency without impacting the design of the memory hierarchy or interconnect. Our simulation results indicate that utilizing a VB can speed up both relaxed and sequentially consistent in-order retirement in future multiprocessor systems by between 3 and 20 percent, depending on the ROB size.

EGC 3208: A Survey and Evaluation of Topology-Agnostic Deterministic Routing Algorithms

Most standard cluster interconnect technologies are flexible with respect to network topology. This has spawned a substantial amount of research on topology-agnostic routing algorithms, which make no assumption about the network structure, thus providing the flexibility needed to route on irregular networks. Actually, such an irregularity should be
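The rendezvous idea above, forwarding filtered data to energy-rich nodes near the mobile sink's trajectory, can be sketched as a simple selection rule. This toy version is not the paper's protocol; the horizontal-trajectory model, the range threshold, and the node tuples are all invented for illustration.

```python
def pick_relays(clusters, trajectory_y=0.0, range_limit=10.0):
    """clusters: dict cluster_id -> list of (x, y, residual_energy) nodes.
    For each cluster, pick the node that lies within range_limit of a
    horizontal sink trajectory at y = trajectory_y and has the highest
    residual energy. Clusters with no node in range get no relay."""
    relays = {}
    for cid, members in clusters.items():
        in_range = [m for m in members if abs(m[1] - trajectory_y) <= range_limit]
        if in_range:
            relays[cid] = max(in_range, key=lambda m: m[2])
    return relays

clusters = {
    "park": [(0.0, 5.0, 2.0), (1.0, 3.0, 9.0), (2.0, 50.0, 99.0)],
    "block": [(5.0, 40.0, 7.0)],  # entirely out of range of the trajectory
}
relays = pick_relays(clusters)
```

Here the high-energy node at y = 50 is skipped because it cannot reach the sink, capturing the trade-off between residual energy and proximity to the trajectory.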

often interpreted as minor modifications of some regular interconnection pattern, such as those induced by faults. In fact, topology-agnostic routing algorithms are also becoming increasingly useful for networks on chip (NoCs), where faults may make the preferred 2D mesh topology irregular. Existing topology-agnostic routing algorithms were developed for varying purposes, giving them different and not always comparable properties. Details are scattered among many papers, each with distinct conditions, making comparison difficult. This paper presents a comprehensive overview of the known topology-agnostic routing algorithms. We classify these algorithms by their most important properties and evaluate them consistently. This provides significant insight into the algorithms and their appropriateness for different on- and off-chip environments.

EGC 3209: A Survey of Parallel Programming Models and Tools in the Multi- and Many-Core Era

In this work, we present a survey of the different parallel programming models and tools available today, with special consideration of their suitability for high-performance computing. We review the shared and distributed memory approaches, as well as the current heterogeneous parallel programming model. In addition, we analyze how the partitioned global address space (PGAS) and hybrid parallel programming models are used to combine the advantages of shared and distributed memory systems. The work is completed by considering languages with specific parallel support and the distributed programming paradigm. In all cases, we present characteristics, strengths, and weaknesses. The study shows that the availability of multi-core CPUs has given new impulse to the shared memory parallel programming approach. In addition, we find that hybrid parallel programming is the current way of harnessing the capabilities of computer clusters with multi-core nodes. On the other hand, heterogeneous programming is found to be an increasingly popular paradigm, as a consequence of the availability of multi-core CPU+GPU systems. The use of open industry standards like OpenMP, MPI, or OpenCL, as opposed to proprietary solutions, seems to be the way to uniformize and extend the use of parallel programming models.

EGC 3210: A Systematic Approach toward Automated Performance Analysis and Tuning

High productivity is critical in harnessing the power of high-performance computing systems to solve science and engineering problems. It is a challenge to bridge the gap between hardware complexity and software limitations. Despite significant progress in programming languages, compilers, and performance tools, tuning an application remains largely a manual task and is done mostly by experts. In this paper, we propose a systematic approach toward automated performance analysis and tuning that we expect to improve the productivity of performance debugging significantly. Our approach seeks to build a framework that facilitates the combination of expert knowledge, compiler techniques, and performance research for performance diagnosis and solution discovery. With our framework, once a diagnosis and tuning strategy has been developed, it can be stored in an open and extensible database and thus be reused in the future. We demonstrate the effectiveness of our approach through the automated performance analysis and tuning of two scientific applications. We show that the tuning process is highly automated and the performance improvement is significant.
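The shared-memory model that the survey above reviews boils down to workers operating on one address space and combining partial results. As a minimal, language-neutral illustration (not tied to any of the surveyed tools; OpenMP or MPI would be the real choices), a threaded parallel reduction in Python:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, n_workers=4):
    """Shared-memory style parallel reduction: split the input among
    worker threads, reduce each chunk, then combine the partial sums.
    All threads share the same address space, so no data is copied."""
    n = len(data)
    bounds = [n * i // n_workers for i in range(n_workers + 1)]
    chunks = [data[bounds[i]:bounds[i + 1]] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)
```

A distributed-memory version of the same reduction would instead exchange the partial sums over messages (e.g., an MPI reduce), which is exactly the split the hybrid and PGAS models try to bridge.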

EGC 3211: A Transport-Friendly NIC for Multicore/Multiprocessor Systems

Receive side scaling (RSS) is an NIC technology that provides the benefits of parallel receive processing in multiprocessing environments. However, RSS lacks a critical data steering mechanism that would automatically steer incoming network data to the core on which its application thread resides. This absence causes inefficient cache usage if an application thread is not running on the core on which RSS has scheduled the received traffic to be processed, and results in degraded performance. To remedy this RSS limitation, Intel's Ethernet Flow Director technology has been introduced. However, our analysis shows that Flow Director can cause significant packet reordering, which has various negative impacts in high-speed networks. We propose an NIC data steering mechanism, mainly targeted at TCP, to remedy the RSS and Flow Director limitations. We term an NIC with such a data steering mechanism "A Transport-Friendly NIC" (A-TFN). Experimental results have proven the effectiveness of A-TFN in accelerating TCP/IP performance.

EGC 3212: A Two-Dimensional Low-Diameter Scalable On-Chip Network for Interconnecting Thousands of Cores

This paper introduces the Spidergon-Donut (SD) on-chip interconnection network for interconnecting 1,000 cores in future MPSoCs and CMPs. The SD network, which extends the Spidergon network into the second dimension, significantly reduces the network diameter, well below that of the popular 2D Mesh and Torus networks, at the cost of one extra node degree and roughly 25 percent more links. A detailed construction of the SD network, a method to reshuffle the SD network's nodes for layout onto the 2D plane, and simple one-to-one and broadcast routing algorithms for the SD network are presented. The various configurations of the SD network are analyzed and compared, including detailed area and delay studies. The paper concludes that a hybrid version of the SD network, with smaller SD instances interconnected by a crossbar, is a feasible low-diameter topology for interconnecting the cores of a thousand-core system.

EGC 3213: Accelerating Matrix Operations with Improved Deeply Pipelined Vector Reduction

Many scientific and engineering applications involve matrix operations, in which reduction of vectors is a common operation. If the core operator of the reduction is deeply pipelined, which is usually the case, dependencies between the input data elements cause data hazards. To tackle this problem, we propose a new reduction method with low latency and high pipeline utilization. The performance of the proposed design is evaluated for both single data set and multiple data set scenarios. Further, QR decomposition is used to demonstrate how the proposed method can accelerate its execution. We implement the design on an FPGA and compare its results to other
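The data hazard that deeply pipelined reduction must overcome is that each addition depends on the previous partial sum, stalling a pipelined adder. A standard way around it (illustrative only, not the paper's method) is to rotate additions across as many independent accumulators as the pipeline is deep, then fold the partials at the end. A behavioral Python sketch, where `depth` stands in for an assumed adder pipeline depth:

```python
def pipelined_reduce(values, depth=4):
    """Reduce `values` using `depth` independent accumulators so that no
    addition depends on the immediately preceding one; a real pipelined
    adder could then accept one operand pair per cycle. The short fold
    at the end combines the partial sums."""
    accs = [0.0] * depth
    for i, v in enumerate(values):
        # Each accumulator is revisited only every `depth` additions,
        # so its previous result has left the pipeline by then.
        accs[i % depth] += v
    total = 0.0
    for a in accs:  # tail fold of the partial sums
        total += a
    return total
```

Note that with floating-point inputs this changes the summation order, so results can differ from a strictly sequential sum by rounding; hardware reduction circuits must account for the same effect.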

methods.

EGC 3214: Adaptive Forwarding Delay Control for VANET Data Aggregation

In-network data aggregation is a useful technique for reducing redundant data and improving communication efficiency. Traditional data aggregation schemes for wireless sensor networks usually rely on a fixed routing structure to ensure that data can be aggregated at certain sensor nodes, but such schemes cannot be applied in highly mobile vehicular environments. In this paper, we propose an adaptive forwarding delay control scheme, namely Catch-Up, which dynamically changes the forwarding speed of nearby reports so that they have a better chance to meet each other and be aggregated together. The Catch-Up scheme is based on a distributed learning algorithm: each vehicle learns from local observations and chooses a delay based on the learning results. Simulation results demonstrate that our scheme can efficiently reduce the number of redundant reports and achieve a good trade-off between delay and communication overhead.

EGC 3215: Aho-Corasick String Matching on Shared-Memory Parallel Architectures

String matching requires a combination of (sometimes all of) the following characteristics: high and/or predictable performance, support for large data sets, and flexibility of integration and customization. This paper compares several software-based implementations of the Aho-Corasick algorithm for high-performance systems. We focus on the matching of unknown inputs streamed from a single source, typical of security applications and difficult to manage since the input cannot be preprocessed to obtain locality. We consider shared-memory architectures (Niagara 2, x86 multiprocessors, and Cray XMT) and distributed-memory architectures with homogeneous (InfiniBand cluster of x86 multicores) or heterogeneous processing elements (InfiniBand cluster of x86 multicores with NVIDIA Tesla C1060 GPUs). We describe how each solution achieves the objectives of supporting large dictionaries, sustaining high performance, and enabling customization and flexibility using various data sets.

EGC 3216: An Efficient Adaptive Deadlock-Free Routing Algorithm for Torus Networks

A deadlock-free minimal routing algorithm called clue is first proposed for VCT (virtual cut-through)-switched tori. Only two virtual channels are required. One channel is used by a deadlock-free routing algorithm for the mesh subnetwork based on a known base routing scheme, such as negative-first or dimension-order routing. The other channel is similar to an adaptive channel. This combination yields a novel fully adaptive minimal routing scheme, because the first channel does not supply routing paths for every source-destination pair. Two further algorithms, named flow controlled clue and wormhole clue, are also proposed. Flow controlled clue is proposed for VCT-switched tori and is fully adaptive, minimal, and deadlock-free with no virtual channels. Each input port requires at least two buffers, each of which
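The Aho-Corasick algorithm compared above builds a goto trie over the dictionary, failure links that redirect the automaton on a mismatch, and output sets per state, so a text is scanned in a single pass regardless of dictionary size. A compact reference implementation (a textbook sequential version, not one of the paper's parallel ones):

```python
from collections import deque

def build_ac(patterns):
    """Build the Aho-Corasick automaton: goto trie, failure links, outputs."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    # BFS from the root to set failure links and merge outputs.
    q = deque(goto[0].values())        # depth-1 states keep fail = 0
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]     # inherit matches ending at the fail state
    return goto, fail, out

def ac_search(text, automaton):
    """Stream `text` through the automaton; yield (start_index, pattern)."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i - len(p) + 1, p) for p in out[s])
    return hits

hits = ac_search("ushers", build_ac(["he", "she", "his", "hers"]))
```

On "ushers" this reports "she" at index 1 and "he" and "hers" at index 2, the classic worked example for the algorithm.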

is able to keep a packet. A simple but well-designed flow control function is used in the proposed flow controlled clue routing algorithm to avoid deadlocks. Wormhole clue is proposed for wormhole-switched tori. It is partially adaptive because we add some constraints to the adaptive channels for deadlock avoidance. It is shown that clue and flow controlled clue work better than the bubble flow control scheme under several popular traffic patterns in 3-dimensional (3D) tori. In wormhole-switched tori, the advantage of wormhole clue over Duato's protocol is also very apparent.

EGC 3217: An Efficient Approach for Mobile Asset Tracking Using Contexts

Due to the heterogeneity of smart interconnected devices, cellular applications, and surrounding (GPS-aware) environments, there is a need for a realistic approach to tracking mobile assets. Current tracking systems are costly and inefficient over wireless data transmission systems where cost is based on the rate of data being sent. Our aim is to develop an efficient and improved geographical asset tracking solution that conserves valuable mobile resources by dynamically adapting the tracking scheme by means of context-aware personalized route learning techniques. We perform this tracking by proactively monitoring the context information in a distributed, efficient, and scalable fashion. Context profiles, which indicate the characteristics of a route based on environmental conditions, are utilized to dynamically represent the values of the asset's properties. We designed and implemented an adaptive learning based scheme that makes an optimized judgment about data transmission. The work is complemented with theoretical and practical evaluations showing that significant costs can be saved and operational efficiency can be achieved.

EGC 3218: An Efficient Prediction-Based Routing in Disruption-Tolerant Networks

Routing is one of the most challenging open problems in disruption-tolerant networks (DTNs) because of the short-lived wireless connectivity environment. To deal with this issue, researchers have investigated routing based on the prediction of future contacts, taking advantage of nodes' mobility history. However, most previous work focused on predicting whether two nodes would have a contact, without considering the time of the contact. This paper proposes predict and relay (PER), an efficient routing algorithm for DTNs in which nodes determine the probability distribution of future contact times and choose a proper next hop in order to improve the end-to-end delivery probability. The algorithm is based on two observations: first, nodes usually move around a set of well-visited landmark points instead of moving randomly; second, node mobility behavior is semi-deterministic and can be predicted once there is sufficient mobility history information. Specifically, our approach employs a time-homogeneous semi-Markov process model that describes node mobility as transitions between landmarks, which we then extend to handle the transition time between two landmarks. A simulation study shows that this approach improves the delivery ratio and also reduces the delivery latency compared to traditional DTN routing schemes.
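The mobility-history idea behind prediction-based DTN routing can be reduced to its simplest form: count observed landmark-to-landmark transitions and predict the most frequent successor. This is only a first-order Markov sketch, not PER's semi-Markov model with transition times, and the landmark names and history below are invented.

```python
from collections import Counter, defaultdict

def transition_model(history):
    """history: ordered list of visited landmarks.
    Returns landmark -> Counter of observed next-landmark frequencies."""
    model = defaultdict(Counter)
    for cur, nxt in zip(history, history[1:]):
        model[cur][nxt] += 1
    return model

def predict_next(model, current):
    """Most frequently observed successor of `current`, or None if unseen."""
    if current not in model or not model[current]:
        return None
    return model[current].most_common(1)[0][0]

# Hypothetical visit log for one node.
history = ["home", "cafe", "office", "cafe", "office", "home"]
model = transition_model(history)
```

A semi-Markov extension would additionally record the elapsed time of each transition, which is what lets PER reason about *when* a future contact happens rather than just whether it happens.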

EGC 3219: An Intelligent Task Allocation Scheme for Multihop Wireless Networks

Emerging applications in Multihop Wireless Networks (MHWNs) require considerable processing power, which often may be beyond the capability of individual nodes. Parallel processing provides a promising solution: a program is partitioned into multiple small tasks and each task is executed concurrently on independent nodes. However, multihop wireless communication is inevitable in such networks and can have an adverse effect on distributed processing. In this paper, an adaptive, intelligent task mapping and scheduling scheme based on a genetic algorithm is proposed to provide real-time guarantees. The solution enables efficient parallel processing by considering only node collaborations with cost-effective communications. Furthermore, to alleviate the power scarcity of MHWNs, a hybrid fitness function is derived and embedded in the algorithm to extend the overall network lifetime via workload balancing among the collaborative nodes, while still meeting arbitrary application deadlines. Simulation results show significant performance improvement over existing mechanisms in various testing environments.

EGC 3220: An MDP-Based Dynamic Optimization Methodology for Wireless Sensor Networks

Wireless sensor networks (WSNs) are distributed systems that have proliferated across diverse application domains (e.g., security/defense, health care). One commonality across all WSN domains is the need to meet application requirements (e.g., lifetime, responsiveness) through domain-specific sensor node design. Techniques such as sensor node parameter tuning enable WSN designers to specialize tunable parameters (e.g., processor voltage and frequency, sensing frequency) to meet these application requirements. However, given WSN domain diversity, varying environmental situations (stimuli), and sensor node complexity, sensor node parameter tuning is a very challenging task. In this paper, we propose an automated Markov Decision Process (MDP)-based methodology to prescribe optimal sensor node operation (selection of values for tunable parameters such as processor voltage, processor frequency, and sensing frequency) to meet application requirements and adapt to changing environmental stimuli. Numerical results confirm the optimality of our proposed methodology and reveal that it meets application requirements more closely than other feasible policies.

EGC 3221: Analysis of Impact of TXOP Allocation on IEEE 802.11e EDCA under Variable Network Load

In this paper, we investigate the impact of transmission opportunity (TXOP), arbitration interframe space (AIFS), and contention window on the performance of an IEEE 802.11e cluster with four traffic classes under Poisson frame arrivals. We derive an analytical model of the cluster using a queuing model of individual nodes, a discrete-time Markov chain, and probabilistic modeling of the backoff process. The analytical model demonstrates the complex interaction between

Page 10: Ieee projects 2012 2013 - Parallal and Distributed Computing

Elysium Technologies Private Limited Approved by ISO 9001:2008 and AICTE for SKP Training Singapore | Madurai | Trichy | Coimbatore | Cochin | Kollam | Chennai http://www.elysiumtechnologies.com, [email protected]

IEEE Final Year Projects 2012 |Student Projects | Parallel and Distributed Computing

Projects

EGC

3222

EGC

3223

TXOP, on one side, and AIFS and contention window, on the other. We derive saturation and stability points for all traffic

classes and discuss their dependency on TXOP allocations. Our results indicate that use of nonzero TXOP parameter

under Poisson frame arrivals improves performance slightly by separating points of saturation and instability. More

substantial performance improvements should be expected by deploying TXOP differentiation under bursty traffic. Since

all traffic classes need to operate in stable, nonsaturated regime, this work has important implications for the design of

congestion control and admission control schemes in IEEE 802.11e clusters.

Low-power Wireless Networks (LWNs) have become increasingly available for mission-critical applications such as

security surveillance and disaster response. In particular, emerging low-power wireless audio platforms provide an

economical solution for ad hoc voice communication in emergency scenarios. In this paper, we develop a system called

Adaptive Stream Multicast (ASM) for voice communication over multihop LWNs. ASM is composed of several novel

components specially designed to deliver robust voice quality for multiple sinks in dynamic environments: 1) an

empirical model to automatically evaluate the voice quality perceived at sinks based on current network condition; 2) a

feedback-based Forward Error Correction (FEC) scheme where the source can adapt its coding redundancy ratio

dynamically in response to the voice quality variation at sinks; 3) a Tree-based Opportunistic Routing (TOR) protocol

that fully exploits the broadcast opportunities on a tree based on novel forwarder selection and coordination rules; and

4) a distributed admission control algorithm that ensures the voice quality guarantees when admitting new voice

streams. ASM has been implemented on a low-power hardware platform and extensively evaluated through experiments

on a test bed of 18 nodes. The experiment results show that ASM can achieve satisfactory multicast voice quality in

dynamic environments while incurring low communication overhead.
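The feedback-based FEC component can be illustrated with a minimal controller sketch. The quality scale, target score, step size, and bounds below are hypothetical assumptions, not ASM's actual controller:

```python
# Illustrative sketch of feedback-driven FEC redundancy adaptation (the target,
# step size, and bounds are hypothetical, not ASM's actual parameters).

def adapt_redundancy(ratio, reported_quality,
                     target=3.5, step=0.05, lo=0.0, hi=0.5):
    """Raise the coding redundancy ratio when sinks report voice quality
    below target (a MOS-like score is assumed), lower it when quality has
    comfortable headroom."""
    if reported_quality < target:
        ratio = min(hi, ratio + step)      # add parity to fight packet loss
    elif reported_quality > target + 0.3:
        ratio = max(lo, ratio - step)      # shed overhead when quality is ample
    return ratio

# A degraded link drives redundancy up over successive feedback rounds.
r = 0.1
for quality in [3.0, 3.1, 3.2]:   # sink feedback below the target score
    r = adapt_redundancy(r, quality)
```

The point of the sketch is the closed loop: the source reacts only to quality reported by sinks, so no explicit channel model is required.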

Localization of wireless sensor nodes has long been regarded as a problem that is difficult to solve, especially when

considering characteristics of real-world environments. This paper formally describes, designs, implements, and

evaluates a novel localization system called Spotlight. The system uses spatiotemporal properties of well-controlled

events in the network, light in this case, to obtain locations of sensor nodes. Performance of the system is evaluated

through deployments of Mica2 and XSM motes in an outdoor environment, where 20 cm localization error is achieved. A

sensor network consisting of any number of nodes deployed in a 2,500 m² area can be localized in under 10 minutes. A

Submeter localization error in an outdoor environment is made possible without equipping the wireless sensor nodes

with specialized ranging hardware.

Asymmetric Event-Driven Node Localization in Wireless Sensor Networks

ASM: Adaptive Voice Stream Multicast over Low-Power Wireless Networks

EGC 3226

EGC 3225

EGC 3224

To reduce the cost of infrastructure and electrical energy, enterprise datacenters consolidate workloads on the same

physical hardware. Often, these workloads comprise both transactional and long-running analytic computations. Such

consolidation brings new performance management challenges due to the intrinsically different nature of a

heterogeneous set of mixed workloads, ranging from scientific simulations to multitier transactional applications. The

fact that such different workloads have different natures imposes the need for new scheduling mechanisms to manage

collocated heterogeneous sets of applications, such as running a web application and a batch job on the same physical

server, with differentiated performance goals. In this paper, we present a technique that enables existing middleware to

fairly manage mixed workloads: long-running jobs and transactional applications. Our technique permits collocation of

the workload types on the same physical hardware, and leverages virtualization control mechanisms to perform online

system reconfiguration. In our experiments, including simulations as well as a prototype system built on top of state-of-the-art commercial middleware, we demonstrate that our technique maximizes mixed workload performance while

providing service differentiation based on high-level performance goals.

This paper presents an innovative router design, called Rotary Router, which successfully addresses CMP

cost/performance constraints. The router structure is based on two independent rings, which force packets to circulate

either clockwise or counterclockwise, traveling through every port of the router. These two rings constitute a completely

decentralized arbitration scheme that enables a simple, but efficient way to connect every input port to every output

port. The proposed router is able to avoid network deadlock, livelock, and starvation without requiring data-path

modifications. The organization of the router permits the inclusion of throughput enhancement techniques without

significantly penalizing the implementation cost. In particular, the router performs adaptive routing, eliminates HOL

blocking, and carries out implicit congestion control using simple arbitration and buffering strategies. Additionally, the

proposal is capable of avoiding end-to-end deadlock at the coherence protocol level with no physical or virtual resource

replication, while guaranteeing in-order packet delivery. This facilitates router management and improves storage

utilization. Using a comprehensive evaluation framework that includes full-system simulation and hardware description,

the proposal is compared with two representative router counterparts. The results obtained demonstrate the Rotary

Router's substantial performance and efficiency advantages.

Balancing the Trade-Offs between Query Delay and Data Availability in MANETs

Balancing Performance and Cost in CMP Interconnection Networks

Autonomic Placement of Mixed Batch and Transactional Workloads

EGC 3229

EGC 3227

EGC 3228

In mobile ad hoc networks (MANETs), nodes move freely and link/node failures are common, which leads to frequent

network partitions. When a network partition occurs, mobile nodes in one partition are not able to access data hosted by

nodes in other partitions, which significantly degrades the performance of data access. To deal with this problem, we

apply data replication techniques. Existing data replication solutions in both wired and wireless networks aim at either

reducing the query delay or improving the data availability, but not both. As both metrics are important for mobile

nodes, we propose schemes to balance the trade-offs between data availability and query delay under different system

settings and requirements. Extensive simulation results show that the proposed schemes can achieve a balance

between these two metrics and provide satisfactory system performance.

The injection of false data is a well-known and serious threat to wireless sensor networks, in which an adversary reports

bogus information to the sink, causing erroneous decisions at the upper level and energy waste in en-route nodes. In this paper, we

propose a novel bandwidth-efficient cooperative authentication (BECAN) scheme for filtering injected false data. Based

on the random graph characteristics of sensor node deployment and the cooperative bit-compressed authentication

technique, the proposed BECAN scheme can save energy by detecting and filtering the majority of injected false

data early, with minor extra overhead at the en-route nodes. In addition, only a very small fraction of injected false data needs

to be checked by the sink, which thus largely reduces the burden of the sink. Both theoretical and simulation results are

given to demonstrate the effectiveness of the proposed scheme in terms of high filtering probability and energy saving.

Sensor networks have their own distinguishing characteristics that set them apart from other types of networks. Several

techniques have been proposed in the literature to address some of the fundamental problems faced by a sensor

network design. Most of the proposed techniques attempt to solve one problem in isolation from the others; hence,

protocol designers have to face the same common challenges again and again. This, in turn, has a direct impact on the

complexity of the protocols and on energy consumption. Instead of using this approach, we propose BEES, a

lightweight bio-inspired backbone construction protocol that can help mitigate many of the typical challenges in sensor

networks by allowing the development of simpler network protocols. We show how BEES can help mitigate many of the

typical challenges inherent to sensor networks including sensor localization, clustering, and data aggregation among

others.

BloomCast: Efficient and Effective Full-Text Retrieval in Unstructured P2P Networks

BEES: Bio-Inspired Backbone Selection in Wireless Sensor Networks

BECAN: A Bandwidth-Efficient Cooperative Authentication Scheme for Filtering Injected False Data in Wireless Sensor Networks

EGC 3231

EGC 3230

Efficient and effective full-text retrieval in unstructured peer-to-peer networks remains a challenge in the research

community. First, it is difficult, if not impossible, for unstructured P2P systems to effectively locate items with

guaranteed recall. Second, existing schemes to improve search success rate often rely on replicating a large number of

item replicas across the wide area network, incurring large communication and storage costs. In this paper,

we propose BloomCast, an efficient and effective full-text retrieval scheme for unstructured P2P networks. By leveraging

a hybrid P2P protocol, BloomCast replicates the items uniformly at random across the P2P networks, achieving a

guaranteed recall at a communication cost of O(√N), where N is the size of the network. Furthermore, by casting Bloom

Filters instead of the raw documents across the network, BloomCast significantly reduces the communication and

storage costs for replication. We demonstrate the power of BloomCast design through both mathematical proof and

comprehensive simulations based on the query logs from a major commercial search engine and NIST TREC WT10G

data collection. Results show that BloomCast achieves an average query recall of 91 percent, which outperforms the

existing WP algorithm by 18 percent, while BloomCast greatly reduces the search latency for query processing by 57

percent.
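The core space saving comes from replicating a compact Bloom filter of a document's terms instead of the document itself. A minimal sketch follows; the filter size `m`, hash count `k`, and the sample terms are illustrative assumptions:

```python
# Sketch of summarizing a document's terms in a Bloom filter so that only the
# filter, not the raw document, is replicated (m and k are illustrative).
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=4):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, term):
        # Derive k bit positions per term from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{term}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, term):
        for p in self._positions(term):
            self.bits |= 1 << p

    def __contains__(self, term):
        # May report false positives, but never false negatives.
        return all(self.bits >> p & 1 for p in self._positions(term))

# A node summarizes its document's terms and casts only the filter.
bf = BloomFilter()
for term in ["full", "text", "retrieval"]:
    bf.add(term)
```

Because a filter occupies a fixed number of bits regardless of document size, uniform random replication of filters keeps the O(√N) communication cost small in absolute terms.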

Self-stabilization is a versatile approach to fault-tolerance since it permits a distributed system to recover from any

transient fault that arbitrarily corrupts the contents of all memories in the system. Byzantine tolerance is an attractive

feature of distributed systems, as it permits coping with arbitrary malicious behaviors. Combining these two properties

proved difficult: it is impossible to contain the spatial impact of Byzantine nodes in a self-stabilizing context for global

tasks such as tree orientation and tree construction. We present and illustrate a new concept of Byzantine containment

in stabilization. Our property, called Strong Stabilization, makes it possible to contain the impact of Byzantine nodes if they

actually perform too many Byzantine actions. We derive impossibility results for strong stabilization and present

strongly stabilizing protocols for tree orientation and tree construction that are optimal with respect to the number of

Byzantine nodes that can be tolerated in a self-stabilizing context.

Data collection is a fundamental function provided by wireless sensor networks. How to efficiently collect sensing data

from all sensor nodes is critical to the performance of sensor networks. In this paper, we aim to understand the

theoretical limits of data collection in a TDMA-based sensor network in terms of possible and achievable maximum

capacity. Previously, the study of data collection capacity has concentrated on large-scale random networks. However,

in most of the practical sensor applications, the sensor network is not uniformly deployed and the number of sensors

may not be as huge as in theory. Therefore, it is necessary to study the capacity of data collection in an arbitrary

network. In this paper, we first derive the upper and lower bounds for data collection capacity in arbitrary networks under protocol interference and disk graph models.

Capacity of Data Collection in Arbitrary Wireless Sensor Networks

Bounding the Impact of Unbounded Attacks in Stabilization

EGC 3233

EGC 3232

We show that a simple BFS tree-based method can lead to order-optimal performance for arbitrary sensor networks. We then study the capacity bounds of data collection under a

general graph model, where two nearby nodes may be unable to communicate due to barriers or path fading, and

discuss performance implications. Finally, we provide discussions on the design of data collection under a physical

interference model or a Gaussian channel model.
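The tree underlying the order-optimal collection method can be built with a plain breadth-first search from the sink; each node then forwards its data to its parent. The small topology below is a made-up example, not from the paper:

```python
# BFS tree construction from the sink: parent pointers define the collection
# tree (node IDs and the topology are illustrative).
from collections import deque

def bfs_tree(adj, sink):
    """Return parent pointers of a BFS tree rooted at the sink; forwarding
    each node's data to its parent realizes tree-based collection."""
    parent = {sink: None}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

# Node 0 is the sink of a small, non-uniformly deployed topology.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
parents = bfs_tree(adj, 0)
```

BFS keeps every node as close to the sink as the topology allows, which is what makes the simple tree competitive even in arbitrary (non-random) deployments.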

Over the past decades, caching has become the key technology used for bridging the performance gap across memory

hierarchies via temporal or spatial localities; in particular, the effect is prominent in disk storage systems. Applications

that involve heavy I/O activities, which are common in the cloud, probably benefit the most from caching. The use of

local volatile memory as cache might be a natural alternative, but many well-known restrictions, such as capacity and

the utilization of host machines, hinder its effective use. In addition to technical challenges, providing cache services in

clouds encounters a major practical issue (quality of service or service level agreement issue) of pricing. Currently,

(public) cloud users are limited to a small set of uniform and coarse-grained service offerings, such as High-Memory and

High-CPU in Amazon EC2. In this paper, we present the cache as a service (CaaS) model as an optional service to typical

infrastructure service offerings. Specifically, the cloud provider sets aside a large pool of memory that can be

dynamically partitioned and allocated to standard infrastructure services as disk cache. We first investigate the

feasibility of providing CaaS with the proof-of-concept elastic cache system (using dedicated remote memory servers)

built and validated on the actual system, and practical benefits of CaaS for both users and providers (i.e., performance

and profit, respectively) are thoroughly studied with a novel pricing scheme. Our CaaS model helps to leverage the

cloud economy greatly in that 1) the extra user cost for the I/O performance gain is minimal, if it exists at all, and 2) the

provider's profit increases due to improvements in server consolidation resulting from that performance gain. Through

extensive experiments with eight resource allocation strategies, we demonstrate that our CaaS model can be a

promising cost-efficient solution for both users and providers.

With continued Moore's law scaling, multicore-based architectures are becoming the de facto design paradigm for

achieving low-cost and performance/power-efficient processing systems through effective exploitation of available

parallelism in software and hardware. A crucial subsystem within multicores is the on-chip interconnection network that

orchestrates high-bandwidth, low-latency, and low-power communication of data. Much previous work has focused on

improving the design of on-chip networks but without more fully taking into consideration the on-chip communication

behavior of application workloads that can be exploited by the network design. A significant portion of this paper analyzes and models on-chip network traffic characteristics of representative application workloads.

Communication-Aware Globally-Coordinated On-Chip Networks

Cashing in on the Cache in the Cloud

EGC 3235

EGC 3234

Leveraged by this,

the notion of globally coordinated on-chip networks is proposed in which application communication behavior, captured

by traffic profiling, is utilized in the design and configuration of on-chip networks so as to support prevailing traffic flows

well, in a globally coordinated manner. This is applied to the design of a hybrid network consisting of a mesh augmented

with configurable multidrop (bus-like) spanning channels that serve as express paths for traffic flows benefiting from

them, according to the characterized traffic profile. Evaluations reveal that network latency and energy consumption for

a 64-core system running OpenMP benchmarks can be improved on average by 15 and 27 percent, respectively, with

globally coordinated on-chip networks.

Consensus is central to several applications including collaborative ones which a wireless ad hoc network can facilitate

for mobile users in terrains with no infrastructure support for communication. We solve the consensus problem in a

sparse network in which a node can at times have no other node in its wireless range and useful end-to-end connectivity

between nodes can just be a temporary feature that emerges at arbitrary intervals of time for any given node pair.

Efficient one-to-many dissemination, essential for consensus, now becomes a challenge; a sufficient number of

destinations cannot deliver a multicast unless nodes retain the multicast message to exercise opportunistic

forwarding. Seeking to keep storage and bandwidth costs low, we propose two protocols. An eventually relinquishing

(◇ RC) protocol that does not store messages for long is used for attempting at consensus, and an eventually quiescent

(◇ QC) one that stops forwarding messages after a while is used for concluding consensus. Use of the ◇ RC protocol

poses additional challenges for consensus, when the fraction, f/n, of nodes that can crash is 1/4 ≤ f/n < 1/2. Consensus

latency and packet overhead are measured through simulations and both decrease considerably even for a modest

increase in network density.

Recently, utility Grids have emerged as a new model of service provisioning in heterogeneous distributed systems. In

this model, users negotiate with service providers on their required Quality of Service and on the corresponding price to

reach a Service Level Agreement. One of the most challenging problems in utility Grids is workflow scheduling, i.e., the

problem of satisfying the QoS of the users as well as minimizing the cost of workflow execution. In this paper, we

propose a new QoS-based workflow scheduling algorithm based on a novel concept called Partial Critical Paths (PCP),

which tries to minimize the cost of workflow execution while meeting a user-defined deadline. The PCP algorithm has two

phases: in the deadline distribution phase it recursively assigns subdeadlines to the tasks on the partial critical paths

ending at previously assigned tasks, and in the planning phase it assigns the cheapest service to each task while

meeting its subdeadline. The simulation results show that the performance of the PCP algorithm is very promising.
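A much-simplified sketch of the deadline distribution phase follows: subdeadlines are assigned proportionally to estimated task runtimes along a single critical path. The recursion over partial critical paths is omitted, and all numbers are hypothetical:

```python
# Simplified deadline distribution along one critical path (a reduction of the
# paper's recursive partial-critical-path assignment; numbers are hypothetical).

def distribute_deadline(path_runtimes, deadline):
    """Split a user deadline into cumulative per-task subdeadlines along a
    critical path, proportional to each task's estimated runtime."""
    total = sum(path_runtimes)
    subdeadlines, elapsed = [], 0.0
    for rt in path_runtimes:
        elapsed += deadline * rt / total
        subdeadlines.append(elapsed)
    return subdeadlines

# Three tasks on a critical path with estimated runtimes 10 s, 30 s, 10 s,
# and a 100 s user deadline.
subs = distribute_deadline([10, 30, 10], 100.0)
```

In the planning phase each task would then be mapped to the cheapest service whose estimated finish time respects its subdeadline.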

Cost-Driven Scheduling of Grid Workflows Using Partial Critical Paths

Consensus in Sparse, Mobile Ad Hoc Networks

EGC 3238

EGC 3237

EGC 3236

In duty-cycled wireless sensor networks (WSNs) for stochastic event monitoring, existing efforts are mainly

concentrated on energy-efficient scheduling of sensor nodes to guarantee the coverage performance, ignoring another

crucial issue of connectivity. The connectivity problem is extremely challenging in the duty-cycled WSNs due to the fact

that the link connections between nodes are transient and thus unstable. In this paper, we propose a new kind of network,

partitioned synchronous network, to jointly address the coverage and connectivity problem. We analyze the coverage

and connectivity performances of partitioned synchronous network and compare them with those of existing

asynchronous network. We perform extensive simulations to demonstrate that the proposed partitioned synchronous

network has a better connectivity performance than that of the asynchronous network, while the coverage performances of the two

types of networks are close.

It is well known that sensor duty-cycling is an important mechanism that helps densely deployed wireless sensor

networks (WSNs) save energy. On the other hand, geographic forwarding is an efficient scheme for WSNs as it requires

maintaining only local topology information to forward data to their destination. Most of geographic forwarding

protocols assume that all sensors are always on (or active) during forwarding. However, such an assumption is

unrealistic for real-world applications where sensors are switched on or off (or inactive). In this paper, we describe our

cover-sense-inform (CSI) framework for k-covered WSNs, where each point in a sensor field is covered by at least k

active sensors. In CSI, k-coverage, sensor scheduling, and data forwarding are jointly considered. Based on our

previous work on connected k-coverage [3], we propose the first design of geographic forwarding protocols for duty-cycled k-covered WSNs with and without data aggregation. Then, we evaluate the performance of our joint k-coverage

and geographic forwarding protocols and compare them to CCP [37], a k-Coverage Configuration Protocol, with a

geographic forwarding protocol on top of it, such as BVGF [36], which we have slightly updated in such a way that it

considers energy for a fair comparison. Simulation results show that our joint protocols outperform CCP+BVGF.

A wireless sensor network can get separated into multiple connected components due to the failure of some of its

nodes, which is called a “cut.” In this paper, we consider the problem of detecting cuts by the remaining nodes of a

wireless sensor network. We propose an algorithm that allows 1) every node to detect when the connectivity to a

specially designated node has been lost, and 2) one or more nodes (that are connected to the special node after the cut)

to detect the occurrence of the cut. The algorithm is distributed and asynchronous: every node needs to communicate

Cut Detection in Wireless Sensor Networks

CSI: An Energy-Aware Cover-Sense-Inform Framework for k-Covered Wireless Sensor Networks

Coverage and Connectivity in Duty-Cycled Wireless Sensor Networks for Event Monitoring

EGC 3241

EGC 3240

EGC 3239

with only those nodes that are within its communication range. The algorithm is based on the iterative computation of a

fictitious “electrical potential” of the nodes. The convergence rate of the underlying iterative scheme is independent of

the size and structure of the network. We demonstrate the effectiveness of the proposed algorithm through simulations

and a real hardware implementation.
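The iterative potential computation can be sketched as repeated local averaging with the special node pinned at a positive value; nodes cut off from the special node see their potential stay at zero. The update rule and constants below are illustrative assumptions, not the paper's exact scheme:

```python
# Iterative "electrical potential" sketch: each node repeatedly recomputes its
# state from its neighbors, with the source node pinned at a positive potential.
# After a cut, potentials in the component without the source remain at zero.
# (The damped-averaging rule and constants are illustrative only.)

def iterate_potentials(adj, source, source_value=100.0, rounds=200):
    pot = {u: 0.0 for u in adj}
    for _ in range(rounds):
        new = {}
        for u in adj:
            if u == source:
                new[u] = source_value           # the special node pins its state
            else:
                nbrs = adj[u]
                # Damped average of neighbor potentials (purely local update).
                new[u] = sum(pot[v] for v in nbrs) / (len(nbrs) + 1)
        pot = new
    return pot

# Two components: {0, 1, 2} contains the source; {3, 4} has been cut off.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
pot = iterate_potentials(adj, source=0)
# Nodes 3 and 4 see zero potential and can conclude they lost the source.
```

Each node uses only its neighbors' values, which is what makes the scheme distributed and lets the side without the source detect the cut locally.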

In this paper, we propose a distributed asynchronous clock synchronization (DCS) protocol for Delay Tolerant Networks

(DTNs). Different from existing clock synchronization protocols, the proposed DCS protocol can achieve global clock

synchronization among mobile nodes within the network over asynchronous and intermittent connections with long

delays. Convergence of the clock values can be reached by compensating for clock errors using mutual relative clock

information that is propagated in the network by contacted nodes. The level of clock accuracy is depreciated over

time to account for long delays between contact opportunities. Mathematical analysis and simulation

results for various network scenarios are presented to demonstrate the convergence and performance of the DCS

protocol. It is shown that the DCS protocol can achieve faster clock convergence and, as a result, reduce energy

cost by half for neighbor discovery.
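The mutual compensation idea can be reduced to a toy pairwise sketch: on contact, two nodes exchange relative clock information and converge toward each other. The midpoint rule below is a simplification and omits the accuracy depreciation the protocol also performs:

```python
# Toy pairwise clock compensation on contact (a simplification of DCS's
# mutual relative-clock exchange; real DCS also ages accuracy over delays).

def compensate(offset_a, offset_b):
    """On contact, both nodes move to the midpoint of their clock offsets,
    eliminating their mutual clock error for this pair."""
    mid = (offset_a + offset_b) / 2.0
    return mid, mid

a, b = 0.0, 8.0          # clock offsets (seconds) from an ideal reference
a, b = compensate(a, b)  # after contact both read the same compensated value
```

Repeated over many intermittent contacts, such pairwise averaging is what drives network-wide convergence despite the absence of continuous connectivity.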

RFID has been gaining popularity due to its variety of applications, such as inventory control and localization. One

important issue in RFID systems is tag identification. In RFID systems, a tag randomly selects a slot to send a Random

Number (RN) packet to contend for identification. Collision happens when multiple tags select the same slot, which

makes the RN packet undecodable and thus reduces the channel utilization. In this paper, we redesign the RN pattern to

make the collided RNs decodable. By leveraging the collision slots, the system performance can be dramatically

enhanced. This novel scheme is called DDC, which is able to directly decode the collisions without exact knowledge of

collided RNs. In the DDC scheme, we modify the RN generator in RFID tag and add a collision decoding scheme for RFID

reader. We implement DDC on a GNU Radio and USRP2 based testbed to verify its feasibility. Both theoretical analysis and

testbed experiments show that DDC achieves a 40 percent tag read rate gain compared with the traditional RFID protocol.

Massively parallel applications often require periodic data checkpointing for program restart and post-run data analysis.

Although high performance computing systems provide massive parallelism and computing power to fulfill the crucial

requirements of the scientific applications, the I/O tasks of high-end applications do not scale. Strict data consistency semantics adopted from traditional file systems are inadequate for homogeneous parallel computing platforms.

Delegation-Based I/O Mechanism for High Performance Computing Systems

DDC: A Novel Scheme to Directly Decode the Collisions in UHF RFID Systems

DCS: Distributed Asynchronous Clock Synchronization in Delay Tolerant Networks

EGC 3243

EGC 3242

For high-performance parallel applications, independent I/O is critical, particularly if checkpointing data are dynamically created

or irregularly partitioned. In particular, parallel programs generating a large number of unrelated I/O accesses on large-scale systems often face serious I/O serializations introduced by lock contention and conflicts at the file system layer. As

these applications may not be able to utilize the I/O optimizations requiring process synchronization, they pose a great

challenge for parallel I/O architecture and software designs. We propose an I/O mechanism to bridge the gap between

scientific applications and parallel storage systems. A static file domain partitioning method is developed to align the I/O

requests and produce a client-server mapping that minimizes the file lock acquisition costs and eliminates the lock

contention. Our performance evaluations of production application I/O kernels demonstrate scalable performance and

achieve high I/O bandwidths.

We consider the problems of 1) estimating the physical locations of nodes in an indoor wireless network, and 2)

estimating the channel noise in a MIMO wireless network, since knowing these parameters is important to many tasks

of a wireless network such as network management, event detection, location-based service, and routing. A hierarchical

support vector machines (H-SVM) scheme is proposed with the following advantages. First, H-SVM offers an efficient

evaluation procedure in a distributed manner due to hierarchical structure. Second, H-SVM could determine these

parameters based only on simpler network information, e.g., the hop counts, without requiring particular ranging

hardware. Third, the exact mean and the variance of the estimation error introduced by H-SVM are derived which are

seldom addressed in previous works. Furthermore, we present a parallel learning algorithm to reduce the computation

time required for the proposed H-SVM. Thanks to a quicker matrix diagonalization technique, our algorithm can reduce

the traditional SVM learning complexity from O(n³) to O(n²), where n is the training sample size. Finally, the simulation

results verify the validity and effectiveness of the proposed H-SVM with the parallel learning algorithm.

This work introduces the Distributed Network Reachability (DNR) algorithm, a distributed system-level diagnosis

algorithm that allows every node of a partitionable arbitrary topology network to determine which portions of the

network are reachable and unreachable. DNR is the first distributed diagnosis algorithm that works in the presence of

network partitions and healings caused by dynamic fault and repair events. Both crash and timing faults are assumed,

and a faulty node is indistinguishable from a network partition. Every link is alternately tested by one of its adjacent nodes

at subsequent testing intervals. Upon the detection of a new event, the new diagnostic information is disseminated to

reachable nodes. New events can occur before the dissemination completes. Any time a new event is detected or

informed, a working node may compute the network reachability using local diagnostic information. The bounded correctness of DNR is proved, including the bounded diagnostic latency, bounded startup, and accuracy. Simulation results are presented for several random and regular topologies, showing the performance of the algorithm under highly dynamic fault situations.

Distributed Diagnosis of Dynamic Events in Partitionable Arbitrary Topology Networks

Determination of Wireless Networks Parameters through Parallel Hierarchical Support Vector Machines
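The core reachability step in DNR — a node combining locally held link-test results into the set of nodes it can currently reach — can be sketched in a few lines of Python. This is a toy illustration with invented data structures, not the DNR protocol itself:

```python
from collections import deque

def reachable_nodes(start, link_state):
    """BFS over links whose most recent test succeeded.

    link_state: dict mapping frozenset({a, b}) -> True if the link's latest
    test (by one of its adjacent nodes) found it working. This dict is a toy
    stand-in for DNR's disseminated diagnostic information.
    """
    adj = {}
    for link, up in link_state.items():
        if up:
            a, b = tuple(link)
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen
```

Because links are stored as unordered pairs, either endpoint of a link can run the same computation on the same diagnostic information.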

Evidence propagation is a major step in exact inference, a key problem in exploring probabilistic graphical models. In

this paper, we propose a novel approach for parallelizing evidence propagation in junction trees on clusters. Our

proposed method explores structural parallelism in a given junction tree. We decompose a junction tree into a set of

subtrees, each consisting of one or multiple leaf-root paths in the junction tree. In evidence propagation, we first

perform evidence collection in these subtrees concurrently. Then, the partially updated subtrees exchange data for

junction tree merging, so that all the cliques in the junction tree can be fully updated for evidence collection. Finally,

evidence distribution is performed in all the subtrees to complete evidence propagation. Since merging subtrees

requires communication across processors, we propose a technique called bitmap partitioning to explore the tradeoff

between bandwidth utilization efficiency and the overhead due to the startup latency of message passing. We

implemented the proposed method using Message Passing Interface (MPI) on a state-of-the-art Myrinet cluster

consisting of 128 processors. Compared with a baseline method, our technique results in improved scalability.

High-speed routers rely on well-designed packet buffers that support multiple queues, provide large capacity and short

response times. Some researchers suggested combined SRAM/DRAM hierarchical buffer architectures to meet these

challenges. However, these architectures suffer from either large SRAM requirement or high time-complexity in the

memory management. In this paper, we present a scalable, efficient, and novel distributed packet buffer architecture. Two

fundamental issues need to be addressed to make this architecture feasible: 1) how to minimize the overhead of an

individual packet buffer; and 2) how to design scalable packet buffers using independent buffer subsystems. We

address these issues by first designing an efficient compact buffer that reduces the SRAM size requirement by (k-1)/k.

Then, we introduce a feasible way of coordinating multiple subsystems with a load-balancing algorithm that maximizes

the overall system performance. Both theoretical analysis and experimental results demonstrate that our load-balancing

algorithm and the distributed packet buffer architecture can easily scale to meet the buffering needs of high bandwidth

links and satisfy the requirements of scale and support for multiple queues.
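A minimal sketch of the coordination idea — dispatching arriving packets across k independent buffer subsystems — assuming a simple least-occupied heuristic rather than the paper's actual load-balancing algorithm:

```python
import heapq

class DistributedPacketBuffer:
    """Toy load balancer over k independent buffer subsystems.

    Each arriving packet goes to the currently least-occupied subsystem;
    a heap keyed by occupancy makes each dispatch O(log k). This is an
    illustrative stand-in, not the algorithm from the paper.
    """
    def __init__(self, k):
        self.queues = [[] for _ in range(k)]
        self.heap = [(0, i) for i in range(k)]   # (occupancy, subsystem id)

    def enqueue(self, pkt):
        occ, i = heapq.heappop(self.heap)        # least-occupied subsystem
        self.queues[i].append(pkt)
        heapq.heappush(self.heap, (occ + 1, i))
        return i
```

Keeping the subsystems balanced is what lets each one be sized independently, so the architecture scales by adding subsystems.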

Distributed Privacy-Preserving Access Control in Sensor Networks

Distributed Packet Buffers for High-Bandwidth Switches and Routers

Distributed Evidence Propagation in Junction Trees on Clusters


The owner and users of a sensor network may be different, which necessitates privacy-preserving access control. On the one hand, the network owner needs to enforce strict access control so that the sensed data are accessible only to users willing to pay. On the other hand, users wish to protect their respective data access patterns, whose disclosure may be used against their interests. This paper presents DP²AC, a Distributed Privacy-Preserving Access Control scheme for sensor networks, which is the first work of its kind. Users in DP²AC purchase tokens from the network owner, with which they query data from sensor nodes, which reply only after validating the tokens. The use of blind signatures in token generation ensures that tokens are publicly verifiable yet unlinkable to user identities, so privacy-preserving access control is achieved. A central component of DP²AC is preventing malicious users from reusing tokens, for which we propose a suite of distributed token reuse detection (DTRD) schemes that do not involve the base station. These schemes share the essential idea that a sensor node checks with some other nodes (called witnesses) whether a token has been used, but they differ in how the witnesses are chosen. We thoroughly compare their performance with regard to TRD capability, communication overhead, storage overhead, and attack resilience. The efficacy and efficiency of DP²AC are confirmed by detailed performance evaluations.
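One plausible witness-selection scheme is to hash the token itself to a deterministic witness set, so any node holding the token can recompute the same witnesses and ask them whether the token was seen before. The paper compares several such schemes; the hash-based choice below is a hypothetical illustration:

```python
import hashlib

def choose_witnesses(token, node_ids, w):
    """Map a token to w deterministic witness nodes via repeated hashing.

    Every node that receives the token derives the same witness set, so
    token reuse is detected without involving the base station.
    (Hypothetical scheme for illustration only.)
    """
    nodes = sorted(node_ids)
    witnesses = []
    i = 0
    while len(witnesses) < min(w, len(nodes)):
        digest = hashlib.sha256(f"{token}:{i}".encode()).digest()
        cand = nodes[int.from_bytes(digest[:4], "big") % len(nodes)]
        if cand not in witnesses:
            witnesses.append(cand)
        i += 1
    return witnesses
```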

ZigBee, a unique communication standard designed for low-rate wireless personal area networks, has extremely low

complexity, cost, and power consumption for wireless connectivity in inexpensive, portable, and mobile devices. Among

the well-known ZigBee topologies, ZigBee cluster-tree is especially suitable for low-power and low-cost wireless sensor

networks because it supports power saving operations and light-weight routing. In a constructed wireless sensor

network, the information about some area of interest may require further investigation such that more traffic will be

generated. However, the restricted routing of a ZigBee cluster-tree network may not be able to provide sufficient

bandwidth for the increased traffic load, so the additional information may not be delivered successfully. In this paper,

we present an adoptive-parent-based framework for a ZigBee cluster-tree network to increase bandwidth utilization

without generating any extra message exchange. To optimize the throughput in the framework, we model the process as

a vertex-constraint maximum flow problem, and develop a distributed algorithm that is fully compatible with the ZigBee

standard. The optimality and convergence property of the algorithm are proved theoretically. Finally, the results of

simulation experiments demonstrate the significant performance improvement achieved by the proposed framework and

algorithm over existing approaches.
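The vertex-constraint maximum flow formulation can be reduced to an ordinary edge-capacitated max-flow problem by the textbook node-splitting transform. The sketch below shows the generic reduction, not the paper's distributed algorithm:

```python
def split_vertices(edges, vcap):
    """Reduce vertex capacities to edge capacities by node splitting.

    Each vertex v becomes an internal edge v_in -> v_out with capacity
    vcap[v]; an original edge (u, v, c) becomes (u_out, v_in, c). Any
    standard max-flow solver can then handle the vertex-constrained
    problem on the transformed graph.
    """
    out = []
    for v, c in vcap.items():
        out.append((f"{v}_in", f"{v}_out", c))
    for u, v, c in edges:
        out.append((f"{u}_out", f"{v}_in", c))
    return out
```

In the ZigBee setting, the vertex capacity would model the bandwidth a router (or adoptive parent) can forward.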

Distributed Throughput Optimization for ZigBee Cluster-Tree Networks

Distributed Uplink Power Control in Multiservice Wireless Networks via a Game Theoretic

Approach with Convex Pricing


In this paper, the problem of efficient distributed power control via convex pricing of users' transmission power in the

uplink of CDMA wireless networks supporting multiple services is addressed. Each user is associated with a nested

utility function, which appropriately represents his degree of satisfaction in relation to the expected trade-off between

his QoS-aware actual uplink throughput performance and the corresponding power consumption. Initially, a Multiservice

Uplink Power Control game (MSUPC) is formulated, where each user aims selfishly at maximizing his utility-based

performance under the imposed physical limitations and its unique Nash equilibrium point is determined. Then the

inefficiency of MSUPC game's Nash equilibrium is proven and a usage-based convex pricing policy of the transmission

power is introduced, which offers a more effective approach compared to the linear pricing schemes that have been

adopted in the literature. Consequently, a Multiservice Uplink Power Control game with Convex Pricing (MSUPC-CP) is

formulated and its unique Pareto optimal Nash equilibrium is determined. A distributed iterative algorithm for computing

MSUPC-CP game's equilibrium is proposed, while the overall approach's efficiency is illustrated via modeling and

simulation.
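The flavor of such distributed best-response dynamics with a convex (quadratic) power price can be shown with a toy two-user game. The utility, gains, and pricing below are illustrative assumptions, not the MSUPC-CP model itself:

```python
import math

def best_response(g, interference, c):
    # Maximize ln(1 + g*p/I) - c*p**2 over p >= 0. The stationarity
    # condition g/(I + g*p) = 2*c*p gives 2*c*g*p**2 + 2*c*I*p - g = 0,
    # whose positive root is:
    return (-2 * c * interference
            + math.sqrt(4 * c**2 * interference**2 + 8 * c * g**2)) / (4 * c * g)

def iterate_power_game(gains, noise, c, rounds=300):
    """Synchronous best-response dynamics: each user repeatedly picks the
    power maximizing its utility given the others' current powers."""
    p = [0.0] * len(gains)
    for _ in range(rounds):
        p = [best_response(gains[i],
                           noise + sum(gains[j] * p[j]
                                       for j in range(len(p)) if j != i),
                           c)
             for i in range(len(p))]
    return p
```

For this toy game the iteration settles at a fixed point where every user's power is a best response, i.e., a Nash equilibrium of the priced game.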

Wireless sensor networks may be deployed in many applications to detect and track events of interest. Events can be

either point events with an exact location and constant shape, or region events which cover a large area and have

dynamic shapes. While both types of events have received attention, no event detection and tracking protocol in

existing wireless sensor network research is able to identify and track region events with dynamic identities, which arise

when events are created or destroyed through splitting and merging. In this paper, we propose DRAGON, an event

detection and tracking protocol which is able to handle all types of events including region events with dynamic

identities. DRAGON employs two physics metaphors: event center of mass, to give an approximate location to the event;

and node momentum, to guide the detection of event merges and splits. Both detailed theoretical analysis and extensive

performance studies of DRAGON's properties demonstrate that DRAGON's execution is distributed among the sensor

nodes, has low latency, is energy efficient, is able to run on a wide array of physical deployments, and has performance

which scales well with event size, speed, and count.
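The first physics metaphor, the event center of mass, amounts to a weighted centroid over the detecting nodes. A minimal sketch, assuming each node reports its position and a sensing intensity used as the weight:

```python
def event_center_of_mass(readings):
    """Weighted centroid of detecting nodes (toy version of DRAGON's
    center-of-mass metaphor). readings: [(x, y, intensity), ...]."""
    total = sum(w for _, _, w in readings)
    x = sum(xi * w for xi, _, w in readings) / total
    y = sum(yi * w for _, yi, w in readings) / total
    return x, y
```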

In mobile-beacon assisted sensor localization, beacon mobility scheduling aims to determine the best beacon trajectory

so that each sensor receives sufficient beacon signals and becomes localized with minimum delay. We propose a novel

DeteRministic dynamic bEAcon Mobility Scheduling (DREAMS) algorithm, without requiring any prior knowledge of the

sensory field. In this algorithm, the beacon trajectory is defined as the track of Depth-First Traversal (DFT) of the

network graph, thus deterministic. The mobile beacon performs DFT dynamically, under the instruction of nearby

sensors on the fly. It moves from sensor to sensor in an intelligent heuristic manner according to Received Signal

Strength (RSS)-based distance measurements. We prove that DREAMS guarantees full localization (every sensor is

localized) when the measurements are noise-free, and derive an upper bound on the beacon's total moving distance in this case. Then, we suggest applying node elimination and a Local Minimum Spanning Tree (LMST) to shorten the beacon tour and reduce delay. Further, we extend DREAMS to multibeacon scenarios, where beacons with different coordinate systems compete to localize sensors; loser beacons adopt the winner beacons' coordinate system and cooperate in subsequent localization. All sensors are finally localized in a commonly agreed coordinate system. Through simulation, we show that DREAMS guarantees full localization even with noisy distance measurements. We evaluate its performance in terms of localization delay and communication overhead in comparison with a previously proposed static path-based scheduling method.

Dynamic Beacon Mobility Scheduling for Sensor Localization

DRAGON: Detection and Tracking of Dynamic Amorphous Events in Wireless Sensor Networks
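The DREAMS beacon track is the depth-first traversal of the network graph, including the physical backtracking moves. The sketch below fixes the visiting order by sorting neighbor IDs; in the actual scheme the next sensor is chosen online from RSS-based distance hints:

```python
def dft_trajectory(adj, start):
    """Depth-first traversal order of the network graph, as a stand-in for
    the DREAMS beacon track. Returns the sequence of sensors visited,
    including the backtracking moves the beacon physically makes."""
    visited, track = set(), []
    def dfs(u):
        visited.add(u)
        track.append(u)
        for v in sorted(adj[u]):
            if v not in visited:
                dfs(v)
                track.append(u)   # beacon returns to u after the subtree
    dfs(start)
    return track
```

The length of the returned track (in hops) corresponds to the beacon's total moving distance that the node-elimination and LMST refinements try to shorten.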

We propose a novel job scheduling approach for homogeneous cluster computing platforms. Its key feature is the use of

virtual machine technology to share fractional node resources in a precise and controlled manner. Other VM-based

scheduling approaches have focused primarily on technical issues or extensions to existing batch scheduling systems,

while we take a more aggressive approach and seek to find heuristics that maximize an objective metric correlated with

job performance. We derive absolute performance bounds and develop algorithms for the online nonclairvoyant version

of our scheduling problem. We further evaluate these algorithms in simulation against both synthetic and real-world

HPC workloads and compare our algorithms to standard batch scheduling approaches. We find that our approach

improves over batch scheduling by orders of magnitude in terms of job stretch, while leading to comparable or better

resource utilization. Our results demonstrate that virtualization technology coupled with lightweight online scheduling

strategies can afford dramatic improvements in performance for executing HPC workloads.
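Job stretch, the objective metric referenced above, is a job's time in the system divided by its runtime on a dedicated machine (the standard definition; details of the paper's variant may differ):

```python
def stretch(release, finish, ideal_runtime):
    """Stretch = (finish - release) / runtime on a dedicated machine."""
    return (finish - release) / ideal_runtime

def max_stretch(jobs):
    """Worst stretch over a workload; jobs: [(release, finish, ideal), ...]."""
    return max(stretch(r, f, t) for r, f, t in jobs)
```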

Dynamic programming (DP) is a popular and efficient technique in many scientific applications such as computational

biology. Nevertheless, its performance is limited due to the burgeoning volume of scientific data, and parallelism is

necessary and crucial to keep the computation time at acceptable levels. The intrinsically strong data dependency of

dynamic programming makes it difficult and error-prone for the programmer to write a correct and efficient parallel

program. Therefore, this paper builds a runtime system named EasyPDP aiming at parallelizing dynamic programming

algorithms on multicore and multiprocessor platforms. Under the concept of software reusability and complexity

reduction of parallel programming, a DAG Data Driven Model is proposed, which supports those applications with a

strong data interdependence relationship. Based on the model, EasyPDP runtime system is designed and implemented.

It automatically handles thread creation, dynamic data task allocation and scheduling, data partitioning, and fault

tolerance. Five frequently used DAG patterns from biological dynamic programming algorithms have been put into the

DAG pattern library of EasyPDP, so that the programmer can choose to use any of them according to his/her specific

application. Besides, an ideal computing distribution model is proposed to derive the optimal values for the performance-tuning arguments of EasyPDP. We evaluate the performance potential and fault-tolerance features of EasyPDP on a multicore system, and compare EasyPDP with other methods such as Block-Cycle Wavefront (BCW). The experimental results illustrate that EasyPDP provides an efficient infrastructure for dynamic programming algorithms.

EasyPDP: An Efficient Parallel Dynamic Programming Runtime System for Computational Biology

Dynamic Fractional Resource Scheduling versus Batch Scheduling

In this paper, we show that the hexagonal mesh networks developed in the early 1990s are a special case of the EJ

networks that have been considered more recently. Using a node addressing scheme based on the EJ number system,

we give a shortest path routing algorithm for hexagonal mesh networks. We also extend the known efficient one-to-all

broadcasting algorithm on hexagonal mesh networks to algorithms for one-to-one personalized broadcasting, all-to-all broadcasting, and all-to-all personalized broadcasting. Their time complexity and optimality are analyzed.
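In a hexagonal mesh each node has six neighbors, and the shortest-path length between two nodes in axial coordinates follows the standard hex-grid metric. The sketch below shows only this grid metric, not the paper's EJ-number-system addressing or routing:

```python
# The six neighbor offsets of a hexagonal-mesh node, in axial coordinates.
HEX_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def hex_distance(a, b):
    """Shortest-path hop count between two hexagonal-mesh nodes in axial
    coordinates; equivalent to max(|x|, |y|, |z|) in cube coordinates
    where x + y + z = 0."""
    dq, dr = b[0] - a[0], b[1] - a[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2
```

A shortest route greedily takes any direction in HEX_DIRS that strictly decreases this distance.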

Traditional software-based barrier implementations for shared memory parallel machines tend to produce hotspots in

terms of memory and network contention as the number of processors increases. This could limit their applicability to

future many-core CMPs in which possibly several dozens of cores would need to be synchronized efficiently. In this

work, we develop GBarrier, a hardware-based barrier mechanism especially aimed at providing efficient barriers in

future many-core CMPs. Our proposal deploys a dedicated G-line-based network to allow for fast and efficient signaling

of barrier arrival and departure. Since GBarrier does not have any influence on the memory system, we avoid all

coherence activity and barrier-related network traffic that traditional approaches introduce and that restrict scalability.

Through detailed simulations of a 32-core CMP, we compare GBarrier against one of the most efficient software-based

barrier implementations for a set of kernels and scientific applications. Evaluation results show average reductions of 54

and 21 percent in execution time, 53 and 18 percent in network traffic, and also 76 and 31 percent in the energy-delay²

product metric for the full CMP when the kernels and scientific applications, respectively, are considered.
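For contrast with the hardware mechanism, a classic software baseline is the sense-reversing barrier: shared state that every core must touch, which is exactly the contention hotspot GBarrier avoids. A minimal sketch (illustrative, not the paper's comparison baseline):

```python
import threading

class SenseReversingBarrier:
    """Software sense-reversing barrier. Each episode flips a shared
    'sense' flag instead of resetting per-thread state, so the barrier
    is immediately reusable."""
    def __init__(self, n):
        self._n = n
        self._count = 0
        self._sense = False
        self._cv = threading.Condition()

    def wait(self):
        with self._cv:
            my_sense = not self._sense
            self._count += 1
            if self._count == self._n:     # last arriver releases everyone
                self._count = 0
                self._sense = my_sense
                self._cv.notify_all()
            else:
                while self._sense != my_sense:
                    self._cv.wait()
```

Every waiter reads and writes the shared counter and flag, which is what produces the memory and network contention the abstract mentions as processor counts grow.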

The master/worker (MW) paradigm can be used as an approach to parallel discrete event simulation (PDES) on

metacomputing systems. MW PDES applications incur overheads not found in conventional PDES executions executing

on tightly coupled machines. We introduce four optimization techniques in MW PDES systems on public resource and

desktop grid infrastructures. Work unit caching, pipelined state updates, expedited message delivery, and adaptive work

unit scheduling mechanisms in the context of MW PDES are described. These optimizations provide significant performance benefits when used in tandem. We present results showing that an optimized MW PDES system using these techniques can achieve performance comparable to a traditional PDES system for queueing-network and particle-physics simulation applications, while providing execution capability across metacomputing systems.

Efficient Master/Worker Parallel Discrete Event Simulation on Metacomputing Systems

Efficient Hardware Barrier Synchronization in Many-Core CMPs

Efficient Communication Algorithms in Hexagonal Mesh Interconnection Networks

Dynamic virtual server provisioning is critical to quality-of-service assurance for multitier Internet applications. In this

paper, we address three important challenging problems. First, we propose an efficient server provisioning approach on

multitier clusters based on an end-to-end resource allocation optimization model. The goal is to minimize the number of virtual

servers allocated to the system while the average end-to-end response time guarantee is satisfied. Second, we design a

model-independent fuzzy controller for bounding an important performance metric, the 90th-percentile response time of

requests flowing through the multitier architecture. Third, to compensate for the latency due to the dynamic addition of

virtual servers, we design a self-tuning component that adaptively adjusts the output scaling factor of the fuzzy

controller according to the transient behavior of the end-to-end response time. Extensive simulation results, using two

representative customer behavior models in a typical three-tier web cluster, demonstrate that the provisioning approach

is able to significantly reduce the number of virtual servers allocated for the performance guarantee compared to an

existing representative approach. The approach integrated with the model-independent self-tuning fuzzy controller can

efficiently assure the average and the 90th-percentile end-to-end response time guarantees on multitier clusters.
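The controlled quantity is the 90th-percentile end-to-end response time. A crude threshold controller conveys the idea; the paper's model-independent fuzzy controller with a self-tuning output scaling factor is considerably more refined:

```python
def percentile(samples, p):
    """Nearest-rank percentile (simple definition, adequate for a sketch)."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, int(round(p / 100.0 * len(ordered))) - 1))
    return ordered[k]

def provisioning_step(response_times, target, servers, step=1):
    """Toy controller: add a virtual server while the 90th-percentile
    response time exceeds the target; remove one when there is ample
    slack. A stand-in for the paper's fuzzy controller."""
    p90 = percentile(response_times, 90)
    if p90 > target:
        return servers + step
    if p90 < 0.5 * target and servers > 1:
        return servers - step
    return servers
```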

Interconnection networks with adaptive routing are susceptible to deadlock, which could lead to performance

degradation or system failure. Detecting deadlocks at runtime is challenging because of their highly distributed

characteristics. In this paper, we present a deadlock detection method that utilizes runtime transitive closure (TC)

computation to discover the existence of deadlock-equivalence sets, which imply loops of requests in networks-on-chip

(NoCs). This detection scheme guarantees the discovery of all true deadlocks without false alarms in contrast with state-

of-the-art approximation and heuristic approaches. A distributed TC-network architecture, which couples with the NoC

infrastructure, is also presented to realize the detection mechanism efficiently. Detailed hardware realization

architectures and schematics are also discussed. Our results based on a cycle-accurate simulator demonstrate the

effectiveness of the proposed method. It drastically outperforms timing-based deadlock detection mechanisms by

eliminating false detections and, thus, reducing energy wastage in retransmission for various traffic scenarios including

real-world application. We found that timing-based methods may produce two orders of magnitude more deadlock

alarms than the TC-network method. Moreover, the implementations presented in this paper demonstrate that the

hardware overhead of TC-networks is insignificant.
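The detection criterion reduces to computing the transitive closure of the request graph and looking for a node that reaches itself, i.e., a cycle of requests. A software sketch using Warshall's algorithm (the paper computes the closure in a dedicated hardware TC-network):

```python
def transitive_closure(n, edges):
    """Warshall's algorithm over a request graph on n resources."""
    tc = [[False] * n for _ in range(n)]
    for u, v in edges:
        tc[u][v] = True
    for k in range(n):
        for i in range(n):
            if tc[i][k]:
                for j in range(n):
                    if tc[k][j]:
                        tc[i][j] = True
    return tc

def deadlocked(n, edges):
    """A deadlock-equivalence set exists iff some resource reaches itself
    through the closure, i.e., the request graph contains a cycle."""
    tc = transitive_closure(n, edges)
    return any(tc[i][i] for i in range(n))
```

Unlike timeout heuristics, this test raises no false alarms: a self-reachable node exists exactly when a request cycle exists.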

Embedded Transitive Closure Network for Runtime Deadlock Detection in Networks-on-Chip

Efficient Server Provisioning with Control for End-to-End Response Time Guarantee on Multitier Clusters


Cloud computing economically enables the paradigm of data service outsourcing. However, to protect data privacy,

sensitive cloud data have to be encrypted before being outsourced to the commercial public cloud, which makes effective data

utilization service a very challenging task. Although traditional searchable encryption techniques allow users to securely

search over encrypted data through keywords, they support only Boolean search and are not yet sufficient to meet the

effective data utilization need that is inherently demanded by the large number of users and the huge amount of data files in

cloud. In this paper, we define and solve the problem of secure ranked keyword search over encrypted cloud data.

Ranked search greatly enhances system usability by enabling search result relevance ranking instead of sending

undifferentiated results, and further ensures the file retrieval accuracy. Specifically, we explore the statistical measure

approach, i.e., relevance score, from information retrieval to build a secure searchable index, and develop a one-to-many

order-preserving mapping technique to properly protect the sensitive score information. The resulting design is able

to facilitate efficient server-side ranking without losing keyword privacy. Thorough analysis shows that our proposed

solution enjoys “as-strong-as-possible” security guarantee compared to previous searchable encryption schemes, while

correctly realizing the goal of ranked keyword search. Extensive experimental results demonstrate the efficiency of the

proposed solution.
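The plaintext side of the scheme is standard ranked retrieval: each keyword maps to documents scored by a TF-IDF-style relevance measure, and the server returns the top-k matches. The sketch below keeps scores in the clear; in the actual scheme they are protected by the one-to-many order-preserving mapping. The scoring formula is a common variant, not necessarily the paper's:

```python
import math
from collections import Counter

def build_index(docs):
    """Per-keyword posting lists with TF-IDF relevance scores.
    docs: dict doc_id -> list of words."""
    n = len(docs)
    df = Counter(w for words in docs.values() for w in set(words))
    index = {}
    for doc_id, words in docs.items():
        tf = Counter(words)
        for w, f in tf.items():
            score = (1 + math.log(f)) * math.log(1 + n / df[w])
            index.setdefault(w, []).append((score, doc_id))
    for postings in index.values():
        postings.sort(reverse=True)          # best score first
    return index

def ranked_search(index, keyword, top_k=3):
    """Return the top-k documents for a keyword, most relevant first."""
    return [doc for _, doc in index.get(keyword, [])[:top_k]]
```

Because an order-preserving mapping keeps the sort order of scores, the server can run the same top-k selection over the mapped values without learning the scores themselves.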

For lightly loaded multicore processors that contain more processing cores than running tasks and have dynamic

voltage and frequency scaling capability, we address the energy-efficient scheduling of periodic real-time tasks. First,

we introduce two energy-saving techniques for the lightly loaded multicore processors: exploiting overabundant cores

for executing a task in parallel with a lower frequency and turning off power of rarely used cores. Next, we verify that if

the two introduced techniques are supported, then the problem of minimizing energy consumption of real-time tasks

while meeting their deadlines is NP-hard on a lightly loaded multicore processor. Finally, we propose a polynomial-time

scheduling scheme that provides a near minimum-energy feasible schedule. The difference of energy consumption

between the provided schedule and the minimum-energy schedule is limited. The scheme saves up to 64 percent of the

processing core energy consumed by the previous scheme that executes each task on a separate core.
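The first technique rests on a simple energy argument: under a cubic dynamic-power model, splitting the same work across m cores clocked at f/m meets the same deadline while cutting dynamic energy roughly as 1/m². The model below is a common approximation; the paper's energy model differs in detail:

```python
def dynamic_energy(cycles, freq, k=1.0):
    """Toy CMOS model: power ~ k*f**3, runtime = cycles/f,
    so dynamic energy ~ k * cycles * f**2."""
    return k * cycles * freq ** 2

def parallel_energy(cycles, cores, freq):
    """Energy for the same work split evenly over `cores` cores, each
    clocked at freq/cores so the deadline is still met."""
    per_core = cycles / cores
    return cores * dynamic_energy(per_core, freq / cores)
```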

Energy-Efficient Topology Control in Cooperative Ad Hoc Networks

Energy-Efficient Scheduling of Periodic Real-Time Tasks on Lightly Loaded Multicore

Processors

Enabling Secure and Efficient Ranked Keyword Search over Outsourced Cloud Data


Cooperative communication (CC) exploits spatial diversity by allowing multiple nodes to cooperatively relay signals to

the receiver so that the combined signal at the receiver can be correctly decoded. Since CC can reduce the transmission

power and extend the transmission coverage, it has been considered in topology control protocols [1], [2]. However,

prior research on topology control with CC focuses only on maintaining network connectivity and minimizing the transmission power of each node, while ignoring the energy efficiency of paths in the constructed topologies. This may

cause inefficient routes and hurt the overall network performance in cooperative ad hoc networks. In this paper, to

address this problem, we introduce a new topology control problem: energy-efficient topology control problem with

cooperative communication, and propose two topology control algorithms to build cooperative energy spanners in

which the energy efficiency of individual paths are guaranteed. Both proposed algorithms can be performed in

distributed and localized fashion while maintaining globally efficient paths. Simulation results confirm the good performance of the proposed algorithms.

Declustering techniques reduce query response times through parallel I/O by distributing data among multiple devices.

Except for a few cases, it is not possible to find declustering schemes that are optimal for all spatial range queries. As a

result of this, most of the research on declustering has focused on finding schemes with low worst-case additive error.

Number-theoretic declustering techniques provide low additive error and high threshold. In this paper, we investigate

equivalent disk allocations and focus on number-theoretic declustering. Most of the number-theoretic disk allocations

are equivalent and provide the same additive error and threshold. Investigation of equivalent allocations simplifies

schemes to find allocations with desirable properties. By keeping one of the equivalent disk allocations, we can reduce

the complexity of searching for good disk allocations under various criteria such as additive error and threshold. Using the proposed scheme, we collected the most extensive experimental results on additive error and threshold in 2,

3, and 4 dimensions.
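A 2D number-theoretic (lattice) declustering assigns tile (i, j) to disk (i + h·j) mod M for some multiplier h; different multipliers yield different, often equivalent, allocations. A small sketch, with a helper that counts how many distinct disks a range query touches:

```python
def disk_allocation(rows, cols, num_disks, h):
    """Lattice declustering: tile (i, j) goes to disk (i + h*j) % num_disks."""
    return [[(i + h * j) % num_disks for j in range(cols)]
            for i in range(rows)]

def disks_touched(alloc, r0, r1, c0, c1):
    """Distinct disks hit by the range query [r0..r1] x [c0..c1]; an ideal
    allocation spreads the range over as many disks as possible."""
    return len({alloc[i][j]
                for i in range(r0, r1 + 1)
                for j in range(c0, c1 + 1)})
```

The additive error studied in the abstract is the gap between the parallel I/O an allocation achieves on a range query and the best achievable for that query size.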

This paper proposes a parallel simulation methodology to speed up network simulations on modern multicore systems.

In this paper, we present the design and implementation of this approach and the performance speedups achieved

under various network conditions. This methodology provides two unique and important advantages: 1) one can readily

enjoy performance speedups without using an unfamiliar simulation language/library to rewrite his protocol module

code for parallel simulations, and 2) one can conduct parallel simulations in the same way as when he conducts

sequential simulations. We implemented this methodology and evaluated its performance speedups on the popular ns-2

network simulator. Our results show that this methodology is feasible and can provide satisfactory performance

speedups under high event load conditions on wired networks.

Exploiting Event-Level Parallelism for Parallel Network Simulation on Multicore Systems

Equivalent Disk Allocations


Jamming attacks are especially harmful when ensuring the dependability of wireless communication. Finding the

position of a jammer will enable the network to actively exploit a wide range of defense strategies. In this paper, we

focus on developing mechanisms to localize a jammer by exploiting neighbor changes. We first conduct jamming effect

analysis to examine how the communication range alters with the jammer's location and transmission power using the free-space model. Then, we show that a node's affected communication range can be estimated purely by examining its

neighbor changes caused by jamming attacks and thus, we can perform the jammer location estimation by solving a

least-squares (LSQ) problem that exploits the changes of communication range. Compared with our previous iterative-

search-based virtual force algorithm, our LSQ-based algorithm exhibits lower computational cost (i.e., one step instead

of iterative searches) and higher localization accuracy. Furthermore, we analyze the localization challenges in real

systems by building the log-normal shadowing model empirically and devising an adaptive LSQ-based algorithm to

address those challenges. The extensive evaluation shows that the adaptive LSQ-based algorithm can effectively

estimate the location of the jammer even in a highly complex propagation environment.
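The one-step LSQ estimate works by linearizing the circle equations: subtracting the first node's equation from the others leaves a linear system in the jammer's coordinates. A textbook sketch with hand-rolled 2x2 normal equations (not the paper's adaptive variant):

```python
def locate_jammer(anchors, dists):
    """Linearized least-squares jammer position from nodes' estimated
    distances to the jammer (derived, in the paper, from their changed
    communication ranges).
    anchors: [(x, y), ...] known node positions; dists: matching distances.
    """
    (x0, y0), d0 = anchors[0], dists[0]
    rows, rhs = [], []
    for (xi, yi), di in zip(anchors[1:], dists[1:]):
        # (x-xi)^2+(y-yi)^2 = di^2 minus the first equation gives:
        # 2(xi-x0)x + 2(yi-y0)y = d0^2 - di^2 + xi^2 - x0^2 + yi^2 - y0^2
        rows.append((2 * (xi - x0), 2 * (yi - y0)))
        rhs.append(d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2)
    # Normal equations (A^T A) p = A^T b for the two unknowns.
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * v for r, v in zip(rows, rhs))
    b2 = sum(r[1] * v for r, v in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det,
            (a11 * b2 - a12 * b1) / det)
```

With noise-free distances and non-collinear anchors, the estimate is exact in one step, which is the computational advantage over iterative virtual-force search.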

The fast-growing traffic of Peer-to-Peer (P2P) applications, most notably BitTorrent (BT), is putting unprecedented

pressure on Internet Service Providers (ISPs). P2P locality has, therefore, been widely suggested to mitigate the costly

inter-ISP traffic. In this paper, we for the first time examine the existence and distribution of the locality through a large-

scale hybrid PlanetLab-Internet measurement. We find that even in the most popular Autonomous Systems (ASes), very

few individual torrents are able to form large enough local clusters of peers, making state-of-the-art locality mechanisms

for individual torrents quite inefficient. Inspired by peers' multiple torrent behavior, we develop a novel framework that

traces and recovers the available contents at peers across multiple torrents, and thus effectively amplifies the

possibilities of local sharing. We address the key design issues in this framework, in particular, the detection of peer

migration across the torrents. We develop a smart detection mechanism with shared trackers, which achieves 45

percent success rate without any tracker-level communication overhead. We further demonstrate strong evidence that

the migrations are not random, but follow certain patterns with correlations. This leads to torrent clustering, a practical

enhancement that can increase the detection rate to 75 percent, thus greatly facilitating locality across multiple torrents.

The simulation results indicate that our framework can successfully reduce the cross-ISP traffic and minimize the

possible degradation of peers' downloading experiences.

Exploring the Optimal Replication Strategy in P2P-VoD Systems: Characterization and Evaluation

Exploring Peer-to-Peer Locality in Multiple Torrent Environment

Exploiting Jamming-Caused Neighbor Changes for Jammer Localization


EGC 3266

EGC 3267

Content providers of P2P Video-on-Demand (P2P-VoD) services aim to provide a high-quality, scalable service to users and, at the same time, operate the system at a manageable cost. Given the volume-based charging model of ISPs, it is in the best interest of P2P-VoD content providers to reduce peers' accesses to the content server so as to reduce the operating cost. In this paper, we address an important open problem: what is the “optimal replication ratio” in a P2P-VoD system such that peers receive service from each other while the traffic to the content server is reduced. We address two fundamental problems: (1) what is the optimal replication ratio of a movie given its popularity, and (2) how to achieve the optimal ratios in a distributed and dynamic fashion. We formally show how movie popularities impact the server's workload, and formulate video replication as an optimization problem. We show that the conventional wisdom of proportional replication is non-optimal, and expand the design space to both a passive replacement policy and an active push policy to achieve the optimal replication ratios. We consider practical implementation issues, evaluate the performance of P2P-VoD systems, and show that our algorithms can greatly reduce the server's workload and improve streaming quality.

Aggregation of data values plays an important role in distributed computations, in particular over peer-to-peer and sensor networks, as it can provide a summary of some global system property and direct the actions of self-adaptive distributed algorithms. Examples include using estimates of the network size to dimension distributed hash tables, or estimates of the average system load to direct load balancing. Distributed aggregation using nonidempotent functions, like sums, is not trivial, as it is not easy to prevent a given value from being accounted for multiple times; this is especially the case if no centralized algorithms or global identifiers can be used. This paper introduces Extrema Propagation, a probabilistic technique for the distributed estimation of the sum of positive real numbers. The technique relies on the exchange of duplicate-insensitive messages and can be applied in flood and/or epidemic settings where multipath routing occurs; it is tolerant of message loss; it is fast, as the number of message-exchange steps can be made just slightly above the theoretical minimum; and it is fully distributed, with no single point of failure and with the result produced at every node.
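The duplicate-insensitive mechanism behind this style of aggregation can be sketched as follows (a simplified single-process illustration, not the authors' code): each node seeds a vector of K exponential variates with rate equal to its value; vectors are merged by pointwise minimum, which is idempotent, so epidemic dissemination and multipath routing cannot double-count a node's contribution; and the sum is estimated from the aggregated minima.

```python
import random

K = 1000  # vector size; larger K gives lower relative error

def seed(value, rng):
    """Each node draws K exponentials with rate equal to its value."""
    return [rng.expovariate(value) for _ in range(K)]

def merge(a, b):
    """Duplicate-insensitive aggregation: pointwise minimum.
    Idempotent, so re-merging the same vector changes nothing."""
    return [min(x, y) for x, y in zip(a, b)]

def estimate_sum(vec):
    """The minimum of Exp(v_i) variables is Exp(sum v_i); an unbiased
    estimator of the sum is (K - 1) / sum(vec)."""
    return (K - 1) / sum(vec)

rng = random.Random(42)
values = [3.0, 1.5, 0.5, 5.0]           # positive values held at 4 nodes
vectors = [seed(v, rng) for v in values]
agg = vectors[0]
for v in vectors[1:]:
    agg = merge(agg, v)
print(estimate_sum(agg))  # ≈ 10 (the true sum), within a few percent
```

Setting every node's value to 1 turns the same machinery into a network-size estimator, matching the paper's title.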

We explore two different threading approaches on a graphics processing unit (GPU), exploiting two different characteristics of the current GPU architecture. The fat-thread approach tries to minimize data-access time by relying on shared memory and registers, potentially sacrificing parallelism. The thin-thread approach maximizes parallelism and tries to hide access latencies. We apply these two approaches to the parallel stochastic simulation of chemical reaction systems using the stochastic simulation algorithm (SSA) by Gillespie [14]. In these cases, the proposed thin-thread approach shows comparable performance while eliminating the limitation on the reaction system's size.

Fat versus Thin Threading Approach on GPUs: Application to Stochastic Simulation of Chemical Reactions

Extrema Propagation: Fast Distributed Estimation of Sums and Network Sizes


EGC 3268

EGC 3269

EGC 3270

We demonstrate that the network flux over a sensor network provides fingerprint information about the mobile users within the field. Such information is openly exposed in the physical space and easy to access through passive sniffing. We present a theoretical model that abstracts the network flux according to the statuses of mobile users. We fit the theoretical model to the network flux measurements through Nonlinear Least Squares (NLS) and develop an algorithm that iteratively approaches the NLS solution by Sequential Monte Carlo estimation. With sparse measurements of the flux information at individual sensor nodes, we show that it is easy to identify the mobile users within the network and instantly track their movements without breaking into the details of the communication packets. Our study indicates that most existing systems are vulnerable to such attacks on the privacy of mobile users. We further propose a set of countermeasures that redistribute and reshape the network traffic to preserve the location privacy of mobile users. With a trace-driven simulation, we demonstrate the substantial threat of the attacks and the effectiveness of the proposed countermeasures.

Peer-to-peer (P2P) live video streaming systems have recently received substantial attention, with commercial deployments gaining increased popularity on the Internet. It is evident from our practical experience with real-world systems that it is not uncommon for hundreds of thousands of users to join a program in the first few minutes of a live broadcast. Such a severe flash-crowd phenomenon in live streaming poses significant challenges to system design. In this paper, we develop, for the first time, a mathematical model to: 1) capture the fundamental relationship between time and scale in P2P live streaming systems under a flash crowd, and 2) explore the design principle of population control to alleviate the impact of the flash crowd. We carry out rigorous analysis that brings forth an in-depth understanding of the effects of the gossip protocol and peer dynamics. In particular, we demonstrate that there exists an upper bound on the system scale with respect to a time constraint. By trading peer startup delays in the initial stage of a flash crowd for system scale, we design a simple and flexible population control framework that can alleviate the flash crowd without otherwise costly server deployment.

This paper presents a programming language, DSystemJ, for dynamic distributed Globally Asynchronous Locally Synchronous (GALS) systems, together with its formal model of computation, formal syntax and semantics, and its compilation and

Formal Semantics, Compilation and Execution of the GALS Programming Language DSystemJ

Flash Crowd in P2P Live Streaming Systems: Fundamental Characteristics and Design Implications

Fingerprinting Mobile User Positions in Sensor Networks: Attacks and Countermeasures


EGC 3271

EGC 3272

EGC 3273

implementation. The language is aimed at dynamic distributed systems that use socket-based communication protocols for communicating between components. DSystemJ allows the creation and control at runtime of asynchronous processes called clock domains, their mobility on a distributed execution platform, as well as the runtime reconfiguration of the system's functionality and topology. As DSystemJ is based on a GALS model of computation and has a formal semantics, it offers very safe mechanisms for the implementation of distributed systems, as well as potential for their formal verification. The details and principles of its compilation, as well as the required runtime support, are described. The runtime support is implemented in the SystemJ GALS language, which can be considered a static subset of DSystemJ.

In this paper, we propose a new class of graphs called generalized recursive circulant graphs, an extension of recursive circulant graphs. While retaining the attractive properties of recursive circulant graphs, the new class achieves more flexibility in varying the number of vertices. Network properties of recursive circulant graphs, such as degree, connectivity, and diameter, are adapted to the new graph class with more concise expressions. In particular, we use a multidimensional vertex labeling scheme for generalized recursive circulant graphs. Based on the labeling scheme, a shortest-path routing algorithm for the graph class is proposed, and its correctness is proved.
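For reference, the base class being generalized, the recursive circulant graph G(N, d), connects vertex v to v ± d^i (mod N) for every power d^i < N. A small sketch of its construction (the paper's generalized variant and labeling scheme are not reproduced here):

```python
from collections import deque

def recursive_circulant(N, d):
    """Adjacency sets of the recursive circulant graph G(N, d):
    vertex v is adjacent to v +/- d^i (mod N) for all d^i < N."""
    jumps, step = [], 1
    while step < N:
        jumps.append(step)
        step *= d
    adj = {v: set() for v in range(N)}
    for v in range(N):
        for j in jumps:
            adj[v].add((v + j) % N)
            adj[v].add((v - j) % N)
    return adj

def is_connected(adj):
    """Plain BFS connectivity check from vertex 0."""
    seen, q = {0}, deque([0])
    while q:
        for w in adj[q.popleft()]:
            if w not in seen:
                seen.add(w)
                q.append(w)
    return len(seen) == len(adj)

g = recursive_circulant(16, 4)     # G(16, 4): jump sizes 1 and 4
print(len(g[0]), is_connected(g))  # 4 True
```

Vertex 0 of G(16, 4) is adjacent to 1, 15, 4, and 12, matching the degree formula for recursive circulant graphs.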

We consider the use of commodity graphics processing units (GPUs) for the common task of numerically integrating ordinary differential equations (ODEs), achieving speedups of up to 115-fold over comparable serial CPU implementations, and 15-fold over multithreaded CPU code with SIMD intrinsics. Using Lorenz '96 models as a case study, single- and double-precision benchmarks are established for both the widely used DOPRI5 method and the computationally tailored low-storage RK4(3)5[2R+]C method. A range of configurations is assessed for each, including multithreading and SIMD intrinsics on the CPU, and GPU kernels parallelized over both the dimensionality of the ODE system and the number of trajectories. On the GPU, we draw particular attention to the problem of variable task length among threads of the same warp, proposing a lightweight strategy of assigning multiple data items to each thread to reduce the prevalence of redundant operations. A simple analysis suggests that the strategy can draw performance close to that of ideal parallelism, while empirical results demonstrate up to a 10 percent improvement over the standard approach.
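The parallelization over trajectories can be illustrated with a CPU sketch: a classical RK4 step (a stand-in for the paper's DOPRI5 and low-storage schemes) applied to the Lorenz '96 right-hand side, vectorized over a batch of trajectories in the same data-parallel pattern a GPU kernel would map over threads. An illustrative sketch, not the paper's implementation:

```python
import numpy as np

def lorenz96(x, F=8.0):
    """Lorenz '96 right-hand side, vectorized over trajectories:
    dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, x has shape
    (n_traj, dim) with cyclic indexing handled by np.roll."""
    return ((np.roll(x, -1, axis=1) - np.roll(x, 2, axis=1))
            * np.roll(x, 1, axis=1) - x + F)

def rk4_step(f, x, h):
    """One classical RK4 step applied to all trajectories at once."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(0)
x = 8.0 + 0.1 * rng.standard_normal((1024, 40))  # 1024 trajectories, dim 40
for _ in range(100):
    x = rk4_step(lorenz96, x, h=0.01)
print(x.shape)  # (1024, 40)
```

Every trajectory takes the same fixed-size step here; the variable-task-length problem the paper targets arises once adaptive methods like DOPRI5 let each trajectory choose its own step count.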

Grouping-Enhanced Resilient Probabilistic En-Route Filtering of Injected False Data in WSNs

GPU Acceleration of Runge-Kutta Integrators

Generalized Recursive Circulant Graphs


EGC 3274

EGC 3275

EGC 3276

In wireless sensor networks, an adversary may use compromised sensor nodes to inject false reports in order to exhaust network energy or trigger false alarms. In response to the shortcomings of existing schemes in security resiliency, applicability, and filtering effectiveness, this paper proposes a scheme referred to as Grouping-enhanced Resilient Probabilistic En-route Filtering (GRPEF). In GRPEF, an efficient distributed algorithm is proposed to group nodes without incurring extra groups, and a multiaxis-division-based approach for deriving location-aware keys is used to overcome the threshold problem and remove the dependence on sink immobility and routing protocols. Compared to existing schemes, GRPEF significantly improves the effectiveness of en-route filtering and can be applied to sensor networks with mobile sinks while preserving resiliency.

We show that the 2a × a rectangular twisted torus introduced by Cámara et al. [5] is edge decomposable into two Hamiltonian cycles. In the process, we show that the 2a × a × a prismatic twisted torus is edge decomposable into three Hamiltonian cycles, and that the 2a × a × a prismatic doubly twisted torus admits two edge-disjoint Hamiltonian cycles.

With Moore's law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multicore to exploit this high transistor density for high performance. Embedded systems differ from traditional high-performance supercomputers in that power is a first-order constraint for embedded systems, whereas performance is the major benchmark for supercomputers. The increase in on-chip transistor density exacerbates power/thermal issues in embedded systems, which necessitates novel hardware/software power/thermal management techniques to meet the ever-increasing high-performance embedded computing demands in an energy-efficient manner. This paper outlines typical requirements of embedded applications and discusses state-of-the-art hardware/software high-performance energy-efficient embedded computing (HPEEC) techniques that help meet these requirements. We also discuss modern multicore processors that leverage these HPEEC techniques to deliver high performance per watt. Finally, we present design challenges and future research directions for HPEEC system development.

Due to their ability to hide the complexity generated by the messages exchanged between processes, shared objects are one of the main abstractions provided to developers of distributed applications. Implementations of such objects in modern distributed systems have to take into account the fact that almost all services implemented on top of distributed infrastructures are no longer fully managed, due to either their size or their maintenance cost. Therefore,

Implementing a Regular Register in an Eventually Synchronous Distributed System Prone to Continuous Churn

High-Performance Energy-Efficient Multicore Embedded Computing

Hamiltonian Decomposition of the Rectangular Twisted Torus


EGC 3277

EGC 3278

these infrastructures exhibit several autonomic behaviors in order to, for example, tolerate failures and the continuous arrival and departure of nodes (the churn phenomenon). Among all shared objects, the register is a fundamental one. Several protocols have been proposed to build fault-resilient registers on top of message-passing systems; unfortunately, failures are not the only challenge in modern distributed systems, and new issues arise in the presence of churn. This paper addresses the construction of a multiwriter/multireader regular register in an eventually synchronous distributed system affected by the continuous arrival and departure of participants. In particular, a general protocol implementing a regular register is proposed, and feasibility conditions associated with the arrival and departure of processes are given. The protocol is proved correct under the assumption that a constraint on the churn is satisfied.

Greedy forwarding is a simple yet efficient technique employed by many routing protocols. It is ideal for realizing point-to-point routing in wireless sensor networks because packets can be delivered by maintaining only a small set of neighbors' information, regardless of network size. It has been successfully employed by geographic routing, which assumes that a packet can be moved closer to the destination in the network topology if it is forwarded geographically closer to the destination in the physical space. This assumption, however, may lead packets to a local minimum, where no neighbor of the sender is closer to the destination, or onto low-quality routes comprising long-distance hops with low packet reception ratio. To address the local minimum problem, we propose a topology-aware routing (TAR) protocol that efficiently encodes a network topology into a low-dimensional virtual coordinate space in which hop distances between pairwise nodes are preserved. Based on precise hop-distance comparison, TAR can assist greedy forwarding in finding the right neighbor that is one hop closer to the destination, achieving a high success ratio of packet delivery without location information. Further, we improve routing quality by embedding the network topology based on the metric of expected transmission count (ETX). ETX embedding accurately encodes both a network's topological structure and channel quality into nodes' small-size virtual coordinates, which helps greedy forwarding guide a packet along the optimal path with the fewest transmissions. We evaluate our approaches through both simulations and experiments, showing that routing performance is improved in terms of routing success ratio and routing cost.
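The greedy forwarding rule and its local-minimum failure mode can be sketched in a few lines (an illustration with hypothetical names, not the TAR implementation). TAR's contribution is to supply virtual coordinates and a distance function under which this comparison rarely gets stuck:

```python
import math

def greedy_forward(adj, coords, src, dst, dist):
    """Greedy forwarding: repeatedly hand the packet to the neighbor
    closest to the destination under `dist`. Returns the path, or
    None when stuck at a local minimum (no neighbor improves on the
    current node)."""
    path, cur = [src], src
    while cur != dst:
        best = min(adj[cur], key=lambda n: dist(coords[n], coords[dst]))
        if dist(coords[best], coords[dst]) >= dist(coords[cur], coords[dst]):
            return None  # local minimum: greedy fails here
        path.append(best)
        cur = best
    return path

# Toy 3x3 grid where geographic greedy succeeds
coords = {i: (i % 3, i // 3) for i in range(9)}
adj = {i: [j for j in range(9)
           if abs(coords[i][0] - coords[j][0])
            + abs(coords[i][1] - coords[j][1]) == 1]
       for i in range(9)}
euclid = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
print(greedy_forward(adj, coords, 0, 8, euclid))  # [0, 1, 4, 5, 8]
```

Swapping `euclid` and the geographic `coords` for hop-preserving virtual coordinates is exactly the substitution the abstract describes.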

Maintaining interactivity is one of the key challenges in distributed virtual environments (DVEs). In this paper, we consider a new problem, termed the interactivity-constrained server provisioning problem, whose goal is to minimize the number of distributed servers needed to achieve a prespecified level of interactivity. We identify and formulate two variants of this new problem and show that both are NP-hard via reductions to the set covering problem. We then propose several computationally efficient approximation algorithms for solving the problem. The main algorithms exploit dependencies among distributed servers to make provisioning decisions. We conduct extensive experiments to

Interactivity-Constrained Server Provisioning in Large-Scale Distributed Virtual Environments

Improving End-to-End Routing Performance of Greedy Forwarding in Sensor Networks


EGC 3279

EGC 3280

evaluate the performance of the proposed algorithms. Specifically, we use both static Internet latency data available from prior measurements and topology generators, as well as the most recent dynamic latency data collected via our own large-scale deployment of a DVE performance monitoring system over PlanetLab. The results show that the newly proposed algorithms, which take interserver dependencies into account, significantly outperform the well-established set covering algorithm for both problem variants.

This paper proposes large-scale transient stability simulation based on the massively parallel architecture of multiple graphics processing units (GPUs). A robust and efficient instantaneous relaxation (IR)-based parallel processing technique, featuring implicit integration, full Newton iteration, and a sparse LU-based linear solver, is used to run the multiple GPUs simultaneously. This implementation highlights the combination of coarse-grained algorithm-level parallelism with the fine-grained data parallelism of the GPUs to accelerate large-scale transient stability simulation. Multithreaded parallel programming makes the entire implementation highly transparent, scalable, and efficient. Several large test systems are used for the simulation, with a maximum size of 9,984 buses and 2,560 synchronous generators, all modeled in detail, resulting in matrices larger than 20,000 × 20,000.

As sensors are energy-constrained devices, one challenge in wireless sensor networks (WSNs) is to guarantee coverage while maximizing network lifetime. In this paper, we leverage prediction to solve this challenging problem by exploiting temporal-spatial correlations among sensory data. The basic idea is that a sensor node can be turned off safely when its sensory information can be inferred through some prediction method, such as Bayesian inference. We adopt the concept of entropy from information theory to evaluate the information uncertainty about the region of interest (RoI). We formulate the problem as a minimum-weight submodular set cover problem, which is known to be NP-hard. To address this problem, an efficient centralized truncated greedy algorithm (TGA) is proposed. We prove a performance guarantee for TGA in terms of the ratio of the aggregate weight obtained by TGA to that of the optimal algorithm. Considering the decentralized nature of WSNs, we further present a distributed version of TGA, denoted DTGA, which obtains the same solution as TGA. Implementation issues such as network connectivity and communication cost are extensively discussed. We perform real-data experiments as well as simulations to demonstrate the advantage of DTGA over the only existing competing algorithm [1] and the impacts of different parameters associated with data correlations on the network lifetime.
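The greedy strategy underlying this kind of algorithm can be illustrated with the textbook weighted greedy for set cover: repeatedly pick the sensor with the best ratio of newly covered elements to weight. This is a generic stand-in (hypothetical names; the paper's entropy-based objective and truncation step are not reproduced here):

```python
def greedy_set_cover(universe, sets, weights):
    """Weighted greedy for (sub)modular set cover: at each step pick
    the set maximizing newly-covered elements per unit weight."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(
            (s for s in sets if sets[s] & uncovered),
            key=lambda s: len(sets[s] & uncovered) / weights[s],
            default=None,
        )
        if best is None:
            raise ValueError("instance is infeasible")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# Toy instance: four sensors, each covering part of a 6-point region
sensors = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
w = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
print(greedy_set_cover({1, 2, 3, 4, 5, 6}, sensors, w))  # ['a', 'c']
```

For submodular objectives this greedy carries the classical logarithmic approximation guarantee, which is the flavor of bound proved for TGA.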

Leveraging Prediction to Improve the Coverage of Wireless Sensor Networks

Large-Scale Transient Stability Simulation of Electrical Power Systems on Parallel GPUs


EGC 3281

EGC 3282

EGC 3283

Energy awareness in computation and protocol management is becoming a crucial factor in the design of protocols and algorithms. At the same time, in order to support node mobility, scalable routing strategies have been designed, and these protocols try to account for path duration in order to respect QoS constraints and to reduce route discovery procedures. Energy saving and path duration/stability can often be contrasting goals, and trying to satisfy both can be very difficult. In this paper, a novel routing strategy is proposed that accounts for link stability and minimum drain-rate energy consumption. In order to verify the correctness of the proposed solution, a biobjective optimization formulation has been designed, and a novel routing protocol called the Link-stAbility and Energy aware Routing protocol (LAER) is proposed. This routing scheme is compared with three other protocols: PERRA, GPSR, and E-GPSR. Protocol performance is evaluated in terms of data packet delivery ratio, normalized control overhead, link duration, node lifetime, and average energy consumption.

In this paper, we address the problem of balancing network traffic load when the data generated in a wireless sensor network are stored on the sensor nodes themselves and accessed through querying a geographic hash table. Existing approaches balance network load by changing the georouting protocol used to forward queries in the geographic hash table. However, this comes at the expense of considerably complicating the routing process, which no longer occurs along (near) straight-line trajectories but requires computing complex geometric transformations. In this paper, we demonstrate that it is possible to balance network traffic load in a geographic hash table without changing the underlying georouting protocol. Instead of changing the (near) straight-line georouting protocol used to send a query from the node issuing the query (the source) to the node managing the queried key (the destination), we propose to “reverse engineer” the hash function used to store data in the network, implementing a form of “load-aware” assignment of key ranges to wireless sensor nodes. This methodology is instantiated in two specific approaches: an analytical one, in which the destination density function yielding quasiperfect load balancing is analytically characterized under uniformity assumptions on node locations and query sources; and an iterative, heuristic approach that can be used whenever these uniformity assumptions do not hold. To prove the practicality of our load balancing methodology, we have performed extensive simulations resembling realistic wireless sensor network deployments, showing the effectiveness of the two proposed approaches in considerably improving load balancing and extending network lifetime. Simulation results also show that the proposed technique achieves better load balancing than an existing approach based on modifying georouting.
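The “load-aware” assignment of key ranges can be sketched as follows (an illustrative reconstruction, not the paper's derivation): partition the hash space into per-node ranges whose widths follow a target density, so that a node which should absorb less traffic owns a narrower range and therefore becomes the destination of fewer queries.

```python
import bisect
import hashlib

def build_ranges(nodes, weights):
    """Partition [0, 1) into per-node key ranges whose widths are
    proportional to the given target weights."""
    total = sum(weights[n] for n in nodes)
    bounds, acc = [], 0.0
    for n in nodes:
        acc += weights[n] / total
        bounds.append(acc)
    return bounds

def lookup(key, nodes, bounds):
    """Hash a key to a point in [0, 1) and return the owning node."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    point = h / 2**256
    return nodes[bisect.bisect_right(bounds, point)]

nodes = ["n1", "n2", "n3"]
# Give n2 a narrower range, e.g. because it sits in a congested region:
bounds = build_ranges(nodes, {"n1": 0.45, "n2": 0.10, "n3": 0.45})
counts = {n: 0 for n in nodes}
for i in range(10000):
    counts[lookup(f"key-{i}", nodes, bounds)] += 1
print(counts)  # n2 receives roughly 10 percent of the keys
```

The weights here are hand-picked; in the paper's analytical variant the destination density is derived so that forwarding load, not just storage load, is equalized.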

Meeting Soft Deadlines in Scientific Workflows Using Resubmission Impact

Load Balancing Hashing in Geographic Hash Tables

Link-Stability and Energy Aware Routing Protocol in Distributed Wireless Networks


EGC 3284

EGC 3285

We propose a new heuristic called Resubmission Impact to support fault-tolerant execution of scientific workflows in heterogeneous parallel and distributed computing environments. In contrast to related approaches, our method can be effectively used in new or unfamiliar environments, even in the absence of historical executions or failure-trace models. On top of this method, we propose a dynamic enactment and rescheduling heuristic able to execute workflows with a high degree of fault tolerance while taking soft deadlines into account. Simulated experiments with three real-world workflows in the Austrian Grid demonstrate that our method significantly reduces resource waste compared to conservative task replication and resubmission techniques, while having a comparable makespan and only a slight decrease in success probability. Moreover, the dynamic enactment method successfully meets soft deadlines in faulty environments in the absence of historical failure-trace information or models.

Broadcast is an essential and widely used operation in multihop wireless networks. Minimum-latency broadcast scheduling (MLBS) aims to find a collision-free schedule for broadcast with minimum latency. Previous work on MLBS mostly assumes that nodes are always active and is thus not suitable for duty-cycled scenarios. In this paper, we investigate the MLBS problem in duty-cycled multihop wireless networks (the MLBSDC problem). We prove that both the one-to-all and the all-to-all MLBSDC problems are NP-hard. We propose a novel approximation algorithm called OTAB for the one-to-all MLBSDC problem, and two approximation algorithms called UTB and UNB for the all-to-all MLBSDC problem under the unit-size and unbounded-size message models, respectively. The approximation ratios of the OTAB, UTB, and UNB algorithms are at most 17|T|, 17|T| + 20, and (Δ + 22)|T|, respectively, where |T| denotes the number of time slots in a scheduling period and Δ denotes the maximum node degree of the network. The overhead of our algorithms is at most a constant times the minimum overhead in terms of the total number of transmissions. We also devise a method called Prune to further reduce the overhead of our algorithms. Extensive simulations are conducted to evaluate the performance of our algorithms.

Multicluster systems have emerged as a promising infrastructure for the provisioning of cost-effective high-performance computing and communications. Analytical models of communication networks in cluster systems have been widely reported. However, for tractability and simplicity, existing models assume that network traffic follows a nonbursty Poisson arrival process and that message destinations are uniformly distributed. Recent measurement studies have shown that the traffic generated by real-world applications is bursty in both the spatial domain (i.e., nonuniform distribution of message destinations) and the temporal domain (i.e., bursty message arrival process). In order to obtain a comprehensive understanding of system performance, a novel analytical model is developed for communication networks in multicluster systems in the presence of spatio-temporal bursty traffic. Spatial traffic burstiness is captured by the communication locality, and temporal traffic burstiness is modeled

Modeling and Analysis of Communication Networks in Multicluster Systems under Spatio-Temporal Bursty Traffic

Minimum Latency Broadcast Scheduling in Duty-Cycled Multihop Wireless Networks


EGC 3286

EGC 3287

EGC 3288

by the Markov-modulated Poisson process. After validating its accuracy through extensive simulation experiments, the model is used to investigate the impact of bursty message arrivals and communication locality on network performance. The analytical results demonstrate that communication locality can relieve the degrading effects of bursty message arrivals on network performance.
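The temporal model used here, the Markov-modulated Poisson process (MMPP), can be simulated with a short sketch: arrivals are Poisson with a rate that depends on the state of a background Markov chain, which produces the clustered, bursty arrival pattern that a plain Poisson process cannot. The two-state parameters below are illustrative, not taken from the paper.

```python
import random

def mmpp_arrivals(rates, switch, t_end, rng):
    """Simulate a two-state Markov-modulated Poisson process.
    While the modulating chain is in state s, arrivals occur at
    rate rates[s] and the chain leaves s at rate switch[s]; by
    memorylessness the next event is the minimum of two
    exponentials. Returns arrival times up to t_end."""
    t, s, arrivals = 0.0, 0, []
    while t < t_end:
        t_switch = rng.expovariate(switch[s])  # time to next state change
        t_arr = rng.expovariate(rates[s])      # time to next arrival
        if t_arr < t_switch:
            t += t_arr
            if t < t_end:
                arrivals.append(t)
        else:
            t += t_switch
            s = 1 - s                          # flip modulating state
    return arrivals

rng = random.Random(7)
# Bursty source: a long quiet state (rate 1) and short intense bursts (rate 50)
arr = mmpp_arrivals(rates=[1.0, 50.0], switch=[0.2, 2.0], t_end=1000.0, rng=rng)
print(len(arr))  # clustered arrivals; long-run average rate ≈ 5.5 per unit time
```

The long-run rate follows from the chain's stationary distribution (mean sojourns of 5 and 0.5 time units give weights ≈ 0.91 and 0.09, so ≈ 0.91·1 + 0.09·50 ≈ 5.5).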

Memoryless online routing (MOR) algorithms are suitable for applications that use only local information to find paths, and Delaunay triangulations (DTs) are a class of geometric graphs widely proposed as network topologies. Motivated by these two facts, this paper presents a variety of new MOR algorithms that work for Delaunay triangulations, greatly enriching the family of such algorithms. The paper also evaluates and compares these new algorithms with three existing MOR algorithms. The experimental results shed light on their performance in terms of both the Euclidean and link metrics, and also reveal certain properties of Delaunay triangulations. Finally, the paper poses three open problems and explains their importance.

In computer networks, multicast models a class of data dissemination applications, where a common data item is routed

to multiple receivers simultaneously. The routing of multicast flows across the network may incur a cost, and such a

cost is to be recovered from payments by receivers who enjoy the multicast service. In reality, a group of potential

multicast receivers exist at different network locations. Each receiver has a valuation for receiving the multicast service,

but such valuation is private information known only to the receiver herself. A multicast scheme asks each potential receiver to report her

valuation, then decides which subset of potential receivers to serve, how to route the multicast flow to them, and how

much to charge each of them. A multicast scheme is strategyproof if no receiver has incentive to lie about her true

valuation. It is further group strategyproof if no group of colluding receivers has incentive to lie. We study multicast

schemes that target group strategyproofness, in both directed and undirected networks. Our main results reveal that

under group strategyproofness, a compromise is necessary in either routing optimality or budget balance. We also

design multicast schemes that pursue maximum budget balance while guaranteeing group strategyproofness and

routing optimality.

On Coverage of Wireless Sensor Networks for Rolling Terrains

On Achieving Group-Strategyproof Multicast

New Memoryless Online Routing Algorithms for Delaunay Triangulations

EGC 3289
EGC 3290

Deriving the proper density to achieve region coverage under random sensor deployment is a fundamentally important

problem in the area of wireless sensor networks. Most existing works on sensor coverage mainly concentrate on the

two-dimensional (2D) plane coverage, which assumes that all the sensors are deployed on an ideal plane. In many real applications, however, sensors are also deployed on three-dimensional (3D) rolling surfaces. Motivated by this, we

study the coverage problem of wireless sensor networks for the rolling terrains, and derive the expected coverage ratios

under stochastic sensor deployment. According to different terrain features, we investigate two kinds of terrain

coverage problems: the regular terrain coverage problem and the irregular terrain coverage problem. Specifically, we

derive the general expression of the expected coverage ratio for an arbitrary surface z=f(x, y) and build two models, cone

model and Cos-revolution model, to estimate the expected coverage ratios for regular terrains. For irregular terrains, we

propose a digital elevation model (DEM) based method to calculate the expected coverage ratio and design an algorithm

to estimate the expected coverage ratio of a region of interest by using only its contour map. We also

conduct extensive simulations to validate and evaluate our proposed models and schemes.
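The expected coverage ratio for a surface z = f(x, y) can be cross-checked numerically. Below is a Monte Carlo sketch of that quantity, not the paper's analytical derivation; the sensing range, region size, and the two example terrains are assumptions for illustration:

```python
import math, random

def expected_coverage(f, n_sensors, r, side, trials=100, samples=100, seed=1):
    """Monte Carlo estimate of the expected coverage ratio of a terrain
    z = f(x, y): n_sensors sensors land uniformly at random over a
    side x side region, and a terrain point is covered when its 3D
    distance to some sensor (also sitting on the terrain) is <= r."""
    rng = random.Random(seed)
    covered = total = 0
    for _ in range(trials):
        s3d = [(x, y, f(x, y))
               for x, y in ((rng.uniform(0, side), rng.uniform(0, side))
                            for _ in range(n_sensors))]
        for _ in range(samples):
            px, py = rng.uniform(0, side), rng.uniform(0, side)
            p = (px, py, f(px, py))
            total += 1
            covered += any(math.dist(p, s) <= r for s in s3d)
    return covered / total

flat = expected_coverage(lambda x, y: 0.0, n_sensors=20, r=2.0, side=10.0)
hilly = expected_coverage(lambda x, y: 3.0 * math.sin(x), n_sensors=20, r=2.0, side=10.0)
# Rolling terrain stretches 3D distances, so its coverage ratio falls below the 2D case
```

Comparing the flat and rolling cases with the same sensor density illustrates why plane-coverage results overestimate coverage on real terrains.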

Wireless Sensor Networks (WSNs) are key for various applications that involve long-term and low-cost monitoring and

actuating. In these applications, sensor nodes use batteries as the sole energy source. Therefore, energy efficiency

becomes critical. We observe that many WSN applications require redundant sensor nodes to achieve fault tolerance

and Quality of Service (QoS) of the sensing. However, the same redundancy may not be necessary for multihop

communication because of the light traffic load and the stable wireless links. In this paper, we present a novel sleep-scheduling technique called Virtual Backbone Scheduling (VBS). VBS is designed for WSNs that have redundant sensor

nodes. VBS forms multiple overlapped backbones that work alternately to prolong the network lifetime. In VBS,

traffic is only forwarded by backbone sensor nodes, and the rest of the sensor nodes turn off their radios to save

energy. The rotation of multiple backbones makes sure that the energy consumption of all sensor nodes is balanced,

which fully utilizes the energy and achieves a longer network lifetime compared to the existing techniques. The

scheduling problem of VBS is formulated as the Maximum Lifetime Backbone Scheduling (MLBS) problem. Since the

MLBS problem is NP-hard, we propose approximation algorithms based on the Schedule Transition Graph (STG) and

Virtual Scheduling Graph (VSG). We also present an Iterative Local Replacement (ILR) scheme as a distributed

implementation. Theoretical analyses and simulation studies verify that VBS is superior to the existing techniques.

Unstructured peer-to-peer (P2P) file-sharing networks are popular in the mass market. As the peers participating in

unstructured networks interconnect randomly, they rely on flooding query messages to discover objects of interest and

thus introduce remarkable network traffic. Empirical measurement studies indicate that the peers in P2P networks have

similar preferences, and researchers have recently proposed unstructured P2P networks that organize participating peers by exploiting their similarity. The resultant networks may not perform searches efficiently and effectively because existing overlay topology construction algorithms often create unstructured P2P networks without performance guarantees. Thus, we propose a novel overlay formation algorithm for unstructured P2P networks. Based on the file-sharing pattern exhibiting the power-law property, our proposal is unique in that it provides rigorous performance guarantees. Theoretical performance results conclude that, with constant probability, 1) searching for an object in our proposed network takes O(ln^c N) hops (where c is a small constant), and 2) the search progressively and effectively exploits the similarity of peers. In addition, the success ratio of discovering an object approximates 100 percent. We validate our theoretical analysis and compare our proposal to competing algorithms in simulations. Based on the simulation results, our proposal clearly outperforms the competing algorithms in terms of 1) the hop count of routing a query message, 2) the success ratio of resolving a query, 3) the number of messages required to resolve a query, and 4) the message overhead for maintaining and forming the overlay.

On Optimizing Overlay Topologies for Search in Unstructured Peer-to-Peer Networks

On Maximizing the Lifetime of Wireless Sensor Networks Using Virtual Backbone Scheduling

EGC 3291
EGC 3292

In this paper, we propose a systematic approach to designing and deploying an RFID-Assisted Navigation System (RFID-ANS) for VANETs. RFID-ANS consists of passive tags deployed on roads to provide navigation information, while the

RFID readers attached to the center of the vehicle bumper query the tag when passing by to obtain the data for

navigation guidance. We analyze the design criteria of RFID-ANS and present the design of the RFID reader in detail to

support vehicles at high speeds. We also jointly consider the scheduling of the read attempts and the deployment of

RFID tags based on the navigation requirements to support seamless navigation. The estimation of the vehicle position

and its accuracy are also investigated.

We are interested in the sensor networks for scientific applications to cover and measure statistics on the sea surface.

Due to flows and waves, the sensor nodes may gradually lose their positions, leaving the points of interest uncovered.

Manual readjustment is costly and cannot be performed in time. We argue that a network of mobile sensor nodes which

can perform self-adjustment is the best candidate to maintain the coverage of the surface area. In our application, we

face a unique double mobility coverage problem. That is, there is an uncontrollable mobility, U-Mobility, by the flows

which breaks the coverage of the sensor network. Moreover, there is also a controllable mobility, C-Mobility, by the

mobile nodes which we can utilize to reinstall the coverage. Our objective is to build an energy efficient scheme for the

sensor network coverage issue with this double mobility behavior. A key observation of our scheme is that the motion of

the flow is not only a curse but also a potential fortune. The sensor nodes can be pushed to some

locations by the U-Mobility that potentially help to improve the overall coverage. With that taken into consideration, more

efficient movement decisions can be made. To this end, we present a dominating set maintenance scheme to maximally

On the Double Mobility Problem for Water Surface Coverage with Mobile Sensor Networks

On the Design and Deployment of RFID Assisted Navigation Systems for VANETs

EGC 3293
EGC 3294
EGC 3295

exploit the U-Mobility and balance the energy consumption among all the sensor nodes. We prove that the coverage is

guaranteed in our scheme. We further propose a fully distributed protocol that addresses a set of practical issues.

Through extensive simulation, we demonstrate that the network lifetime can be significantly extended, compared to a

straightforward back-to-original reposition scheme.

Consider a wireless multihop network where nodes are randomly distributed in a given area following a homogeneous

Poisson process. The hop count statistics, viz. the probabilities related to the number of hops between two nodes, are

important for performance analysis of the multihop networks. In this paper, we provide analytical results on the

probability that two nodes separated by a known Euclidean distance are k hops apart in networks subject to both

shadowing and small-scale fading. Some interesting results are derived which have generic significance. For example, it

is shown that the locations of nodes three or more hops away provide little information in determining the relationship

of a node with other nodes in the network. This observation is useful for the design of distributed routing, localization,

and network security algorithms. As an illustration of the application of our results, we derive the effective energy

consumption per successfully transmitted packet in end-to-end packet transmissions. We show that there exists an

optimum transmission range which minimizes the effective energy consumption. The results provide useful guidelines

on the design of a randomly deployed network in a more realistic radio environment.
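The hop count statistics in question can be approximated numerically. The sketch below uses a homogeneous Poisson node field with simple unit-disk links; the shadowing and small-scale fading the paper additionally models are omitted, and all numeric parameters are illustrative:

```python
import math, random
from collections import deque

def poisson(rng, lam):
    # Knuth's method; adequate for the small means used here
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def hop_count(nodes, radio, s, t):
    """BFS hop count between node indices s and t in the disk graph."""
    dist, q = {s: 0}, deque([s])
    while q:
        u = q.popleft()
        if u == t:
            return dist[u]
        for v in range(len(nodes)):
            if v not in dist and math.dist(nodes[u], nodes[v]) <= radio:
                dist[v] = dist[u] + 1
                q.append(v)
    return None  # disconnected

def hop_pmf(d, density, side, radio, k_max=6, trials=300, seed=2):
    """Estimate P(two nodes a Euclidean distance d apart are k hops apart)
    when relays form a homogeneous Poisson field over a side x side area."""
    rng = random.Random(seed)
    counts = [0] * (k_max + 1)
    for _ in range(trials):
        n = poisson(rng, density * side * side)
        pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
        pts += [(side / 2 - d / 2, side / 2), (side / 2 + d / 2, side / 2)]
        h = hop_count(pts, radio, len(pts) - 2, len(pts) - 1)
        if h is not None and h <= k_max:
            counts[h] += 1
    return [c / trials for c in counts]

pmf = hop_pmf(d=2.5, density=2.0, side=6.0, radio=1.0)
# Two hops of range 1 span at most distance 2, so the mass starts at k = 3
```

With a fading link model, each `math.dist(...) <= radio` test would be replaced by a per-link success probability, which is where the paper's analysis departs from this disk-graph sketch.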

This paper presents an online scheduling methodology for task graphs with communication edges for multiprocessor

embedded systems. The proposed methodology is designed for task graphs which are dynamic in nature either due to

the presence of conditional paths or due to the presence of tasks whose execution times vary. We have assumed

homogeneous processors with broadcast and point-to-point communication models and have presented online

algorithms for them. We show that this technique adapts better to variation in task graphs at runtime and provides better

schedule length compared to a static scheduling methodology. Experimental results indicate up to 21.5 percent average

improvement over purely static schedulers. The effects of model parameters like number of processors, memory, and

other task graph parameters on performance are investigated in this paper.
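A static baseline of the kind such online schedulers are compared against can be sketched as a greedy list scheduler on homogeneous processors. The task names, costs, and diamond-shaped graph below are illustrative, and communication delays and contention are ignored in this sketch:

```python
import heapq

def list_schedule(tasks, deps, cost, nprocs):
    """Greedy list scheduling of a task graph on homogeneous processors.

    deps[t] is the set of predecessors of t; each ready task is placed on
    the earliest-free processor. Returns (makespan, start-time map).
    """
    indeg = {t: len(deps[t]) for t in tasks}
    succs = {t: [u for u in tasks if t in deps[u]] for t in tasks}
    ready = [t for t in tasks if indeg[t] == 0]
    heapq.heapify(ready)
    procs = [0.0] * nprocs          # time each processor becomes free
    start, finish = {}, {}
    while ready:
        t = heapq.heappop(ready)
        # Earliest start: all predecessors finished and a processor is free
        est = max((finish[p] for p in deps[t]), default=0.0)
        i = min(range(nprocs), key=procs.__getitem__)
        start[t] = max(est, procs[i])
        finish[t] = start[t] + cost[t]
        procs[i] = finish[t]
        for u in succs[t]:
            indeg[u] -= 1
            if indeg[u] == 0:
                heapq.heappush(ready, u)
    return max(finish.values()), start

# Diamond graph: A feeds B and C, which both feed D.
deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
cost = {"A": 1.0, "B": 2.0, "C": 2.0, "D": 1.0}
makespan, start = list_schedule(list(deps), deps, cost, nprocs=2)
# B and C run in parallel after A, so the makespan is 1 + 2 + 1 = 4
```

An online scheduler improves on this baseline exactly when actual execution times diverge from the static `cost` estimates at runtime.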

Resource allocation and job scheduling are the core functions of grid computing. These functions are based on

adequate information of available resources. Timely acquiring resource status information is of great importance in

Online System for Grid Resource Monitoring and Machine Learning-Based Prediction

Online Scheduling of Dynamic Task Graphs with Communication and Contention for Multiprocessors

On the Hop Count Statistics in Wireless Multihop Networks Subject to Fading

ensuring overall performance of grid computing. This work aims at building a distributed system for grid resource monitoring and prediction. In this paper, we present the design and evaluation of a system architecture for grid resource monitoring and prediction. We discuss the key issues for system implementation, including machine learning-based methodologies for modeling and optimization of resource prediction models. Evaluations are performed on a prototype system. Our experimental results indicate that the efficiency and accuracy of our system meet the demands of an online system for grid resource monitoring and prediction.

EGC 3296
EGC 3297
EGC 3298

Loops are the main source of parallelism in many applications. This paper solves the open problem of extracting the

maximal number of iterations from a loop to run in parallel on chip multiprocessors. Our algorithm solves it optimally by

migrating the weights of parallelism-inhibiting dependences on dependence cycles in two phases. First, we model

dependence migration with retiming and formulate this classic loop parallelization task as a graph optimization problem,

i.e., one of finding retiming values for its nodes so that the minimum nonzero edge weight in the graph is maximized. We

present our algorithm in three stages with each being built incrementally on the preceding one. Second, the optimal

code for a loop is generated from the retimed graph of the loop found in the first phase. We demonstrate the

effectiveness of our optimal algorithm by comparing with a number of representative nonoptimal algorithms using a set

of benchmarks frequently used in prior work and a set of graphs generated by TGFF.

With the recent advent of cloud computing, the concept of outsourcing computations, initiated by volunteer computing

efforts, is being revamped. While the two paradigms differ in several dimensions, they also share challenges, stemming

from the lack of trust between outsourcers and workers. In this work, we propose a unifying trust framework, where

correct participation is financially rewarded: neither participant is trusted, yet outsourced computations are efficiently

verified and validly remunerated. We propose three solutions for this problem, relying on an offline bank to generate and

redeem payments; the bank is oblivious to interactions between outsourcers and workers. We propose several attacks

that can be launched against our framework and study the effectiveness of our solutions. We implemented our most

secure solution and our experiments show that it is efficient: the bank can perform hundreds of payment transactions

per second and the overheads imposed on outsourcers and workers are negligible.

Performance Analysis of Cloud Computing Centers Using M/G/m/m+r Systems

Payments for Outsourced Computations

Optimally Maximizing Iteration-Level Loop Parallelism

EGC 3299
EGC 3300

Successful development of the cloud computing paradigm necessitates accurate performance evaluation of cloud data

centers. As exact modeling of cloud centers is not feasible due to the nature of cloud centers and diversity of user

requests, we describe a novel approximate analytical model for performance evaluation of cloud server farms and solve

it to obtain accurate estimation of the complete probability distribution of the request response time and other important

performance indicators. The model allows cloud operators to determine the relationship between the number of servers

and input buffer size, on one side, and the performance indicators such as mean number of tasks in the system,

blocking probability, and probability that a task will obtain immediate service, on the other.
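The flavor of such a model can be illustrated with the Markovian special case M/M/m/m+r, whose stationary distribution has a simple closed form; exponential service here is a simplification of the paper's general service times, and the example parameters are illustrative:

```python
from math import factorial

def mmm_mr(lam, mu, m, r):
    """Stationary metrics of an M/M/m/m+r queue: m servers, r buffer slots.

    Returns (mean number of tasks, blocking probability, probability of
    immediate service). By PASTA, an arriving task observes the
    stationary distribution.
    """
    rho = lam / mu
    p = []
    for k in range(m + r + 1):
        if k <= m:
            p.append(rho ** k / factorial(k))          # k tasks, all in service
        else:
            p.append(rho ** m / factorial(m) * (rho / m) ** (k - m))  # k - m queued
    z = sum(p)
    p = [x / z for x in p]
    mean_tasks = sum(k * pk for k, pk in enumerate(p))
    blocking = p[m + r]        # arrival finds all servers and buffer slots busy
    immediate = sum(p[:m])     # arrival finds at least one free server
    return mean_tasks, blocking, immediate

mean_tasks, p_block, p_now = mmm_mr(lam=8.0, mu=1.0, m=10, r=5)
```

Sweeping `m` and `r` in such a model is exactly how an operator would relate server count and buffer size to blocking probability and immediate-service probability.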

Wireless sensor networks (WSNs) have been widely used in many areas for critical infrastructure monitoring and

information collection. While confidentiality of the message can be ensured through content encryption, it is much more

difficult to adequately address source-location privacy (SLP). For WSNs, SLP service is further complicated by the

fact that sensor nodes generally consist of low-cost, low-power radio devices: computationally intensive cryptographic algorithms (such as public-key cryptosystems) and large-scale broadcasting-based protocols may not be

suitable. In this paper, we first propose criteria to quantitatively measure source-location information leakage in routing-

based SLP protection schemes for WSNs. Through this model, we identify vulnerabilities of some well-known SLP

protection schemes. We then propose a scheme to provide SLP through routing to a randomly selected intermediate

node (RSIN) and a network mixing ring (NMR). Our security analysis, based on the proposed criteria, shows that the

proposed scheme can provide excellent SLP. The comprehensive simulation results demonstrate that the proposed

scheme is very efficient and can achieve a high message delivery ratio. We believe it can be used in many practical

applications.

Recently, several data aggregation schemes based on privacy homomorphism encryption have been proposed and

investigated on wireless sensor networks. These data aggregation schemes provide better security compared with

traditional aggregation since cluster heads (aggregator) can directly aggregate the ciphertexts without decryption;

consequently, transmission overhead is reduced. However, the base station only retrieves the aggregated result, not

individual data, which causes two problems. First, the usage of aggregation functions is constrained. For example, the

base station cannot retrieve the maximum value of all sensing data if the aggregated result is the summation of sensing

data. Second, the base station cannot confirm data integrity and authenticity via attaching message digests or

signatures to each sensing sample. In this paper, we attempt to overcome the above two drawbacks. In our design, the

base station can recover all sensing data even if these data have been aggregated. This property is called “recoverable.”

Experimental results demonstrate that the transmission overhead is still reduced even though our approach makes the sensing data recoverable. Furthermore, the design has been generalized and adopted on both homogeneous and heterogeneous wireless sensor networks.

RCDA: Recoverable Concealed Data Aggregation for Data Integrity in Wireless Sensor Networks

Quantitative Measurement and Design of Source-Location Privacy Schemes for Wireless Sensor Networks

EGC 3301
EGC 3302
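The privacy-homomorphism idea that concealed data aggregation builds on can be illustrated with an additively homomorphic cipher in the style of Castelluccia et al.; the modulus, keys, and readings below are toy values, and real schemes derive per-node keys from shared secrets:

```python
import random

M = 10 ** 6  # modulus; must exceed the largest possible aggregate sum

def encrypt(m, key):
    return (m + key) % M           # c = (m + k) mod M, additively homomorphic

def aggregate(ciphertexts):
    return sum(ciphertexts) % M    # cluster head adds ciphertexts, no decryption

def decrypt_sum(agg, keys):
    return (agg - sum(keys)) % M   # base station strips the sum of all keys

rng = random.Random(7)
readings = [21, 35, 18, 42]
keys = [rng.randrange(M) for _ in readings]
total = decrypt_sum(aggregate(encrypt(m, k) for m, k in zip(readings, keys)), keys)
# The base station recovers only the sum (116), not the individual readings --
# precisely the limitation that a "recoverable" design removes.
```

The cluster head never sees a plaintext, which is the security gain; the loss of individual readings at the base station is the drawback the abstract describes.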

RFID tag identification is a crucial problem in UHF RFID systems. Traditional tag identification algorithms can be

classified into two categories, ALOHA-based and tree-based. Both of them are inefficient due to the incidental high

coordination cost. In this paper, we bring CSMA into UHF RFID systems to enhance tag read rate by reducing

coordination cost. However, it is not straightforward due to the simple hardware design of passive RFID tags, which is

unable to sense the transmissions or collisions of other tags. To tackle this challenge, we propose receiver-based CSMA

(RCSMA) in this paper. In RCSMA, the reader notifies the tags of the channel condition. According to the different sensing results of the reader's notifications, the tags take corresponding actions, e.g., random backoff. RCSMA does not require special

RFID tag hardware design. An absorbing Markov chain model is presented to analyze the performance of RCSMA and

shown to be consistent with the simulation results. Compared with optimized ALOHA-based algorithms and optimized

tree-based algorithms, RCSMA can enhance the tag read rate by 30-70 percent under different reader and tag data rates.

This paper presents the design, deployment, and evaluation of a real-world sensor network system in an active volcano -

Mount St. Helens. In volcano monitoring, maintenance is extremely hard and system robustness is one of the

biggest concerns. However, most system research to date has focused more on performance improvement and less on

system robustness. In our system design, to address this challenge, automatic fault detection and recovery mechanisms

were designed to autonomously roll the system back to the initial state if exceptions occur. To enable remote

management, we designed a configurable sensing and flexible remote command and control mechanism with the

support of a reliable dissemination protocol. To maximize data quality, we designed event detection algorithms to

identify volcanic events and prioritize the data, and then deliver higher priority data with higher delivery ratio with an

adaptive data transmission protocol. Also, a light-weight adaptive linear predictive compression algorithm and localized

TDMA MAC protocol were designed to improve network throughput. With these techniques and other improvements on

intelligence and robustness based on a previous trial deployment, we air-dropped 13 stations into the crater and around

the flanks of Mount St. Helens in July 2009. During the deployment, the nodes autonomously discovered each other, even while airborne, and immediately formed a smart mesh network for data delivery. We conducted rigorous system

evaluations and discovered many interesting findings on data quality, radio connectivity, network performance, as well

as the influence of environmental factors.

Real-World Sensor Network for Long-Term Volcano Monitoring: Design and Findings

RCSMA: Receiver-Based Carrier Sense Multiple Access in UHF RFID Systems

EGC 3303
EGC 3304
EGC 3305

Some of today's transactional memory (TM) systems implement the two-phase-locking (2PL) algorithm, which aborts transactions every time a

conflict occurs. 2PL is a simple algorithm that provides fast transactional operations. However, it limits concurrency in

benchmarks with high contention because it increases the rate of aborts. We propose the use of a more relaxed

concurrency control algorithm to provide better concurrency. This algorithm is based on the conflict-serializability (CS)

model. Unlike 2PL, it allows some transactions to commit successfully even when they make conflicting accesses. We

implement this algorithm in an STM system and evaluate its performance on 16 cores using standard benchmarks. Our

evaluation shows that the algorithm improves the performance of applications with long transactions and high abort

rates. Throughput is improved by up to 2.99 times despite the overheads of testing for CS at runtime. These

improvements come with little additional implementation complexity and require no changes to the transactional

programming model. We also propose an adaptive approach that switches between 2PL and CS to mitigate the overhead

in applications that have low abort rates.
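The conflict-serializability test at the heart of such a scheme can be sketched as a precedence-graph cycle check; the schedules below are textbook toy examples, not the paper's benchmarks:

```python
def conflict_serializable(schedule):
    """Test conflict-serializability: build the precedence graph over
    transactions (edge Ti -> Tj when an operation of Ti conflicts with a
    later operation of Tj on the same item, at least one being a write)
    and report whether it is acyclic.

    schedule: list of (txn, op, item) triples with op in {'r', 'w'}.
    """
    nodes = {t for t, _, _ in schedule}
    adj = {t: set() for t in nodes}
    for i, (ti, oi, xi) in enumerate(schedule):
        for tj, oj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "w" in (oi, oj):
                adj[ti].add(tj)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in nodes}
    def has_cycle(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY or (color[v] == WHITE and has_cycle(v)):
                return True  # back edge: the precedence graph has a cycle
        color[u] = BLACK
        return False
    return not any(color[t] == WHITE and has_cycle(t) for t in nodes)

# Conflicting accesses, but all in one consistent order: serializable.
ok = conflict_serializable([("T1", "r", "x"), ("T1", "w", "x"),
                            ("T2", "r", "x"), ("T2", "w", "x")])
# T1 -> T2 on x and T2 -> T1 on y form a cycle: not conflict-serializable.
bad = conflict_serializable([("T1", "r", "x"), ("T2", "w", "x"),
                             ("T2", "r", "y"), ("T1", "w", "y")])
```

A 2PL system would abort on the first conflicting access in both schedules; a CS-based system can commit the first one, which is where the extra concurrency comes from.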

Weak reliability and low energy efficiency are the inherent problems in Underwater Sensor Networks (USNs)

characterized by acoustic channels. Although multiple-path communications coupled with Forward Error Correction

(FEC) can achieve high performance for USNs, the low probability of successful recovery of received packets in the

destination node significantly affects the overall Packet Error Rate (PER) and the number of multiple paths required,

which in turn becomes a critical factor for reliability and energy consumption. In this paper, a novel Multiple-path FEC

approach (M-FEC) based on Hamming coding is proposed for improving reliability and energy efficiency in USNs. A

Markovian model is developed to formulate the probability of M-FEC and calculate the overall PER for the proposed

decision and feedback scheme, which can reduce the number of the multiple paths and achieve the desirable overall

PER in M-FEC. Compared to the existing multipath communication scheme, extensive simulation experiments show that

the proposed approach achieves significantly lower packet delay while consuming only 20-30 percent of energy in

multiple-path USNs with various Bit Error Rates (BER).
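The per-packet FEC building block named above can be illustrated with the classic Hamming(7,4) code, which corrects any single-bit error per codeword; the bit layout below is the standard one, while the paper's actual coding and multipath combination are more involved:

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit Hamming codeword [p1 p2 d1 p3 d2 d3 d4]."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
# Any single bit flipped along a path is repaired at the destination:
corrupted = codeword[:]
corrupted[4] ^= 1
recovered = hamming74_decode(corrupted)   # == data
```

Each path's per-bit error rate then determines how often a codeword suffers more than one flip, which is exactly the recovery probability the Markovian analysis tracks.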

In unstructured peer-to-peer networks, the average response latency and traffic cost of a query are two main

performance metrics. Controlled-flooding resource query algorithms are widely used in unstructured networks such as

peer-to-peer networks. In this paper, we propose a novel algorithm named Selective Dynamic Query (SDQ). Based on mathematical programming, SDQ calculates the optimal combination of an integer TTL value and a set of neighbors to control the scope of the next query. Our results demonstrate that SDQ provides finer-grained control than other algorithms: its response latency is close to the well-known minimum achieved by Expanding Ring; at the same time, its traffic cost is also close to the minimum. To the best of our knowledge, this is the first work capable of achieving such a trade-off between response latency and traffic cost.

Revisiting Dynamic Query Protocols in Unstructured Peer-to-Peer Networks

Reliable and Energy-Efficient Multipath Communications in Underwater Sensor Networks

Relaxed Concurrency Control in Software Transactional Memory

EGC 3306
EGC 3307
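The Expanding Ring baseline that controlled-flooding schemes such as SDQ are measured against is easy to sketch: flood with TTL 1, then 2, and so on, re-flooding the inner rings on every retry. The four-node path graph below is an illustrative topology:

```python
def flood(adj, src, ttl, target):
    """TTL-limited flood; returns (target found?, messages sent)."""
    frontier, seen, msgs = {src}, {src}, 0
    for _ in range(ttl):
        nxt = set()
        for u in frontier:
            for v in adj[u]:
                msgs += 1                 # every forwarded copy costs a message
                if v not in seen:
                    seen.add(v)
                    nxt.add(v)
        frontier = nxt
    return target in seen, msgs

def expanding_ring(adj, src, target, max_ttl):
    """Retry the flood with TTL = 1, 2, ... until the object is found.
    Latency is near-minimal, but inner rings are re-flooded each round --
    the traffic overhead that finer-grained TTL/neighbor selection avoids."""
    total = 0
    for ttl in range(1, max_ttl + 1):
        found, msgs = flood(adj, src, ttl, target)
        total += msgs
        if found:
            return ttl, total
    return None, total

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a 4-node path
ttl_used, msgs_total = expanding_ring(adj, 0, 3, max_ttl=5)
# Finds the object at TTL 3 but spends 1 + 3 + 5 = 9 messages overall,
# versus 5 for a single TTL-3 flood.
```

The gap between the 9-message retry total and the 5-message single flood is the latency-versus-traffic trade-off that the abstract describes.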

Runtime detection of contextual properties is one of the primary approaches to enabling context-awareness in pervasive

computing scenarios. Among various properties the applications may specify, the concurrency property, i.e., property

delineating concurrency among contextual activities, is of great importance. This is because the concurrency property is

one of the most frequently specified properties by context-aware applications. Moreover, the concurrency property

serves as the basis for specification of many other properties. Existing schemes implicitly assume that context

collecting devices share the same notion of time. Thus, the concurrency property can be easily detected. However, this

assumption does not necessarily hold in pervasive computing environments, which are characterized by the

asynchronous coordination among heterogeneous computing entities. To cope with this challenge, we identify and

address three essential issues. First, we introduce logical time to model behavior of the asynchronous pervasive

computing environment. Second, we propose the logic for specification of the concurrency property. Third, we propose

the Concurrent contextual Activity Detection in Asynchronous environments (CADA) algorithm, which achieves runtime

detection of the concurrency property. Performance analysis and experimental evaluation show that CADA effectively

detects the concurrency property in asynchronous pervasive computing scenarios.
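The logical-time machinery such detection rests on can be sketched with vector clocks: two activities are concurrent exactly when neither clock dominates the other. The three-device clocks below are illustrative:

```python
def happens_before(a, b):
    """Vector-clock order: a -> b iff a is componentwise <= b and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    # Concurrent contextual activities: ordered in neither direction.
    return not happens_before(a, b) and not happens_before(b, a)

# Clocks as recorded by three asynchronous context-collecting devices.
act_a, act_b, act_c = (2, 1, 0), (1, 2, 0), (2, 2, 1)
```

Here `act_a` and `act_b` are concurrent (each is ahead of the other on some device), while both happen before `act_c`; no shared physical clock is needed for this判定 to be made — only the logical clocks.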

In RFID literature, most “privacy-preserving” protocols require the reader to search all tags in the system in order to

identify a single tag. In another class of protocols, the search complexity is reduced to be logarithmic in the number of

tags, but it comes with two major drawbacks: it requires a large communication overhead over the fragile wireless

channel, and the compromise of a tag in the system reveals secret information about other, uncompromised, tags in the

same system. In this work, we take a different approach to addressing the time complexity of private identification in large-scale RFID systems. We utilize the special architecture of RFID systems to propose a symmetric-key privacy-preserving

authentication protocol for RFID systems with constant-time identification. Instead of increasing communication

overhead, the existence of a large storage device in RFID systems, the database, is utilized for improving the time

efficiency of tag identification.

Scalable RFID Systems: A Privacy-Preserving Protocol with Constant-Time Identification

Runtime Detection of the Concurrency Property in Asynchronous Pervasive Computing Environments
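The storage-for-time trade behind constant-time identification can be sketched as follows: the back-end database precomputes the pseudonyms each tag may use next, so identification is a single hash-table lookup rather than an O(N) key search. The PRF, key strings, and lookahead window below are illustrative stand-ins for the protocol's actual primitives:

```python
import hashlib

def prf(key, counter):
    # Illustrative pseudorandom function for deriving pseudonyms.
    return hashlib.sha256(f"{key}:{counter}".encode()).hexdigest()

class Backend:
    """Back-end database trading storage for identification time."""

    def __init__(self, tags, lookahead=3):
        # Precompute the next `lookahead` pseudonyms of every tag.
        self.table = {prf(key, c): tag_id
                      for tag_id, key in tags.items()
                      for c in range(lookahead)}

    def identify(self, pseudonym):
        # Constant expected time, independent of the number of tags.
        return self.table.get(pseudonym)

backend = Backend({"tag-A": "k1", "tag-B": "k2"})
```

A tag replies with its current pseudonym, the back end looks it up in O(1) expected time, and both sides then advance their counters; the table is large, but large storage is exactly the resource an RFID back-end database has to spare.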

EGC 3308
EGC 3309
EGC 3310

Self-protection refers to the ability of a system to detect illegal behaviors and to fight back against intrusions with countermeasures. This article presents the design, the implementation, and the evaluation of a self-protected system which

targets clustered distributed applications. Our approach is based on the structural knowledge of the cluster and of the

distributed applications. This knowledge allows the detection of known and unknown attacks if an illegal communication

channel is used. The current prototype is a self-protected JEE infrastructure (Java 2 Enterprise Edition) with firewall-based intrusion detection. Our prototype induces a low performance penalty for applications.

Existing data storage systems based on the hierarchical directory-tree organization do not meet the scalability and functionality requirements of exponentially growing data sets and increasingly complex metadata queries in large-scale, exabyte-level file systems with billions of files. This paper proposes a novel decentralized semantic-aware metadata organization, called SmartStore, which exploits the semantics of file metadata to judiciously aggregate correlated files into semantic-aware groups using information retrieval tools. The key idea of SmartStore is to limit the search scope of a complex metadata query to a single semantically correlated group, or a minimal number of them, and thereby avoid or alleviate brute-force search over the entire system. The decentralized design of SmartStore improves system scalability and reduces query latency for complex queries (including range and top-k queries). Moreover, it is also conducive to constructing semantic-aware caching and to serving conventional filename-based point queries. We have implemented a prototype of SmartStore, and extensive experiments based on real-world traces show that SmartStore significantly improves system scalability and reduces query latency over database approaches. To the best of our knowledge, this is the first study on the implementation of complex queries in large-scale file systems.
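The search-scope-limiting idea can be illustrated with a toy sketch (hypothetical names and attributes, not the authors' code): files are aggregated into groups keyed on a shared metadata attribute, and a range query is answered by scanning only the matching group rather than every file in the system.

```python
from collections import defaultdict

def build_groups(files):
    """Aggregate files into groups keyed by a metadata attribute (a toy
    stand-in for SmartStore's semantic grouping via information retrieval)."""
    groups = defaultdict(list)
    for name, meta in files.items():
        groups[meta["topic"]].append((name, meta))
    return groups

def range_query(groups, topic, lo, hi):
    """Answer a range query on file size within a single semantic group,
    avoiding a brute-force scan of the entire system."""
    return sorted(name for name, meta in groups.get(topic, [])
                  if lo <= meta["size"] <= hi)

files = {
    "a.log": {"topic": "logs", "size": 120},
    "b.log": {"topic": "logs", "size": 900},
    "c.csv": {"topic": "data", "size": 300},
}
groups = build_groups(files)
print(range_query(groups, "logs", 100, 500))  # ['a.log']
```

Only the "logs" group is touched by the query; the "data" group is never scanned.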

In this paper, we propose a Fine-Grained Cycle Sharing (FGCS) system capable of exploiting idle Graphics Processing Units (GPUs) to accelerate sequence homology search in local area network environments. Our system exploits short idle periods on GPUs by running small parts of guest programs such that each part completes within hundreds of milliseconds. To detect such short idle periods across the pool of registered resources, our system continuously monitors keyboard and mouse activity via event handlers rather than waiting for a screensaver, as is typically done in existing systems. Our system also divides guest tasks into small parts according to a performance model that estimates the execution time of each part. This task division strategy minimizes any disruption to the owners of the

Sequence Homology Search Using Fine Grained Cycle Sharing of Idle GPUs

Semantic-Aware Metadata Organization Paradigm in Next-Generation File Systems

Self-Protection in a Clustered Distributed System


EGC 3311

EGC 3312

GPU resources. Experimental results show that our FGCS system running on two nondedicated GPUs achieves 111-116 percent of the throughput of a single dedicated GPU. Furthermore, our system provides over twice the throughput of a screensaver-based system. We also show that the idle periods detected by our system constitute half of the system uptime. We believe that the GPUs hidden and often unused in office environments provide a powerful solution for sequence homology search.
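The two mechanisms described above, event-based idle detection and model-driven task division, can be sketched as follows (the threshold, slot length, and linear performance model are assumed values for illustration, not the paper's):

```python
IDLE_THRESHOLD = 0.5  # seconds without input before a host counts as idle (assumed)

class IdleDetector:
    """Toy stand-in for event-handler-based idle detection: instead of a
    screensaver timeout, idleness begins as soon as the last keyboard or
    mouse event is old enough."""
    def __init__(self):
        self.last_event = 0.0

    def on_input_event(self, t):
        self.last_event = t

    def is_idle(self, now):
        return now - self.last_event >= IDLE_THRESHOLD

def split_task(total_work, time_per_unit, slot=0.2):
    """Divide a guest task into parts that each fit one short idle slot,
    using a hypothetical linear performance model (time_per_unit)."""
    units_per_part = max(1, int(slot / time_per_unit))
    parts = [units_per_part] * (total_work // units_per_part)
    if total_work % units_per_part:
        parts.append(total_work % units_per_part)
    return parts

d = IdleDetector()
d.on_input_event(10.0)
print(d.is_idle(10.1))       # False: the owner just typed
print(d.is_idle(11.0))       # True: no input for a full second
print(split_task(10, 0.05))  # [4, 4, 2]
```

Each part of the split task fits within a single short idle slot, so guest work can be preempted quickly when the owner returns.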

Surveillance is a critical problem for harbor protection, border control, and the security of commercial facilities. Effectively protecting vast near-coast sea surfaces and busy harbor areas from intrusions by unauthorized marine vessels, such as pirates, smugglers, or illegal fishermen, is particularly challenging. In this paper, we present an innovative solution for ship intrusion detection. We deploy an experimental Wireless Sensor Network (WSN), equipped with three-axis accelerometer sensors, on the sea surface to detect ships. Using signal processing techniques and cooperative processing, we can detect passing ships by distinguishing ship-generated waves from ocean waves. We design a three-tier intrusion detection system that exploits the spatial and temporal correlations of an intrusion to increase detection reliability. We conduct evaluations with real data collected in our initial experiments, and provide a quantitative analysis of the detection system, including the successful detection ratio, detection latency, and an estimate of an intruding vessel's velocity.
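A crude sketch of the wave-distinguishing step (not the paper's algorithm, which uses more sophisticated signal processing): windows of accelerometer samples whose short-term energy exceeds a threshold are flagged as ship-generated, with the window size and threshold as assumed values.

```python
def detect_ship(samples, window=8, threshold=0.5):
    """Flag sample windows whose variance exceeds a threshold, a crude
    stand-in for separating high-energy ship-generated waves from the
    low-amplitude ocean background (window/threshold are illustrative)."""
    hits = []
    for start in range(0, len(samples) - window + 1, window):
        w = samples[start:start + window]
        mean = sum(w) / window
        var = sum((x - mean) ** 2 for x in w) / window
        hits.append(var > threshold)
    return hits

calm  = [0.1, -0.1] * 8   # low-amplitude ocean background (16 samples)
burst = [2.0, -2.0] * 4   # high-energy ship wake (8 samples)
print(detect_ship(calm + burst))  # [False, False, True]
```

In the paper's three-tier design, per-node flags like these would then be fused across neighboring sensors to exploit spatial and temporal correlation.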

In today's data centers, precisely controlling server power consumption is an essential way to avoid system failures caused by power capacity overload or overheating due to increasingly high server density. While various power control strategies have recently been proposed, existing solutions are not scalable to control the power consumption of an entire large-scale data center, because they are designed only for a single server or a rack enclosure. In a modern data center, however, power control needs to be enforced at three levels: rack enclosure, power distribution unit, and the entire data center, due to the physical and contractual power limits at each level. This paper presents SHIP, a highly scalable hierarchical power control architecture for large-scale data centers. SHIP is designed based on well-established control theory for analytical assurance of control accuracy and system stability. Empirical results on a physical testbed show that our control solution provides precise power control, as well as power differentiation for optimized system performance and desired server priorities. In addition, extensive simulation results based on a real trace file demonstrate the efficacy of our control solution in large-scale data centers composed of 5,415 servers.
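The hierarchical idea, a budget enforced at each level and subdivided for the level below, can be sketched with a simple proportional split (SHIP itself uses control-theoretic loops, not this naive division; all numbers here are illustrative):

```python
def allocate(budget, demands):
    """Split a power budget among children in proportion to their demands,
    never exceeding the budget. A simplified stand-in for SHIP's per-level
    controllers (which use feedback control rather than one-shot division)."""
    total = sum(demands)
    if total <= budget:
        return list(demands)          # everyone fits: no throttling needed
    return [budget * d / total for d in demands]

datacenter_budget = 1200.0
pdu_demands = [800.0, 600.0]          # two PDUs (hypothetical demands, watts)
pdu_budgets = allocate(datacenter_budget, pdu_demands)
rack_budgets = allocate(pdu_budgets[0], [500.0, 500.0])  # racks under PDU 0
print(pdu_budgets)   # proportional: ~685.7 and ~514.3, summing to 1200
print(rack_budgets)
```

The same routine runs at the data-center, PDU, and rack levels, so each level only needs to know its own budget and its children's demands.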

SHIP: A Scalable Hierarchical Power Control Architecture for Large-Scale Data Centers

Ship Detection with Wireless Sensor Networks


EGC 3313

EGC 3314

EGC 3315

In this paper, we focus on critical event monitoring in wireless sensor networks (WSNs), where only a small number of packets need to be transmitted most of the time. When a critical event occurs, an alarm message should be broadcast to the entire network as soon as possible. To prolong network lifetime, sleep scheduling methods are commonly employed in WSNs, but they introduce significant broadcast delay, especially in large-scale WSNs. In this paper, we propose a novel sleep scheduling method to reduce the delay of alarm broadcasting from any sensor node in a WSN. Specifically, we design two predetermined traffic paths for the transmission of alarm messages and, for each path, a corresponding level-by-level offset-based wake-up pattern. When a critical event occurs, an alarm is quickly transmitted along one of the traffic paths to a center node, and is then immediately broadcast by the center node along the other path without collision. Two key contributions are that the broadcast delay is independent of the density of nodes and that the energy consumption is ultra low. Specifically, the upper bound on the broadcast delay is only 3D + 2L, where D is the maximum hop count from any node to the center node, L is the length of the sleeping duty cycle, and the unit is the length of a time slot. Extensive simulations are conducted to evaluate the performance of the proposed method against existing works.
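The stated bound is easy to evaluate for a concrete deployment (the network parameters below are illustrative, not from the paper):

```python
def delay_upper_bound(D, L):
    """Upper bound (in time slots) on alarm broadcast delay under the
    two-path sleep scheduling scheme: 3D + 2L, where D is the maximum hop
    count to the center node and L is the sleeping duty-cycle length."""
    return 3 * D + 2 * L

# For a network whose farthest node is 20 hops from the center and whose
# sleeping duty cycle spans 10 slots:
print(delay_upper_bound(20, 10))  # 80
```

Note that node density does not appear in the formula, which is the first of the two contributions claimed above.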

A major challenge in wireless networks is maximizing throughput. High throughput in a wireless network requires a scheduling policy of relatively low complexity with a provable efficiency ratio, which measures the performance of the policy in terms of throughput and stability. Most scheduling policies that achieve provable ratios select, at the onset of every frame, a subset of links to transmit data in the immediately following frame. In this paper, we propose a policy that allows links to transmit data in any future frame by means of frame reservations. This new reservation-based distributed scheduling approach improves the capacity of the system and provides greater throughput. First, we create a framework to analyze the stability of reservation-based scheduling systems. Then, to demonstrate its efficacy, we propose a reservation-based distributed scheduling policy for IEEE 802.16 mesh networks and use the new framework to find sufficient conditions for the stability of the network under this policy, i.e., a lower bound on its efficiency ratio. Finally, by means of simulation, we validate the mathematical analysis and compare the performance of our policy with nonreservation-based policies.

Multiprocessor operating systems (OSs) pose several unique and conflicting challenges to System Virtual Machines

(System VMs). For example, most existing system VMs resort to gang scheduling a guest OS's virtual processors

Supporting Overcommitted Virtual Machines through Hardware Spin Detection

Stability Analysis of Reservation-Based Scheduling Policies in Wireless Networks

Sleep Scheduling for Critical Event Monitoring in Wireless Sensor Networks


EGC 3316

EGC 3317

(VCPUs) to avoid OS synchronization overhead. However, gang scheduling is infeasible for some application domains and inflexible in others. In an overcommitted environment, an individual guest OS has more VCPUs than available physical processors (PCPUs), precluding the use of gang scheduling. In such an environment, we demonstrate a more than twofold increase in application runtime when transparently virtualizing a chip multiprocessor's cores. To combat this problem, we propose a hardware technique to detect when a VCPU is wasting CPU cycles and to preempt that VCPU to run a different, more productive VCPU. Our technique can dramatically reduce cycles wasted on OS synchronization, without requiring any semantic information from the software. We then present a server consolidation case study to demonstrate the potential of the more flexible scheduling policies enabled by our technique. We propose one such policy that logically partitions the CMP cores between guest VMs. This policy increases throughput by 10-25 percent for consolidated server workloads due to improved cache locality and core utilization.
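A software toy in the spirit of spin detection (the actual proposal is a hardware mechanism; this heuristic and its threshold are assumptions for illustration): a VCPU that keeps loading the same address and seeing the same value, with no intervening store, is likely spinning on a lock and can be preempted.

```python
def spinning(trace, limit=5):
    """Flag a VCPU as spinning if it repeatedly loads the same (address,
    value) pair with no intervening store. `limit` is an assumed threshold,
    not taken from the paper."""
    streak, last = 0, None
    for op, addr, val in trace:
        if op == "load" and (addr, val) == last:
            streak += 1
            if streak >= limit:
                return True
        else:
            streak = 0
            last = (addr, val) if op == "load" else None
    return False

busy_wait = [("load", 0x40, 1)] * 6          # lock word never changes
progress  = [("load", 0x40, 1), ("store", 0x40, 0)] * 4
print(spinning(busy_wait))  # True
print(spinning(progress))   # False
```

Crucially, like the hardware technique, this looks only at the memory access stream and needs no semantic information from the guest software.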

Cache sharing on modern Chip Multiprocessors (CMPs) reduces communication latency among corunning threads, but also causes interthread cache contention. Most previous studies of the influence of cache sharing have concentrated on the design or management of the shared cache. The observed influence is often constrained by reliance on simulators, the use of out-of-date benchmarks, or limited coverage of the deciding factors. This paper describes a systematic measurement of this influence covering most of the potentially important factors. The measurement shows some surprising results. Contrary to the commonly perceived importance of cache sharing, neither the positive nor the negative effects of cache sharing are significant for most program executions in the PARSEC benchmark suite, regardless of the type of parallelism, input data sets, architectures, numbers of threads, and assignments of threads to cores. After a detailed analysis, we find that the main reason is a mismatch between the software design (and compilation) of multithreaded applications and CMP architectures. By performing source code transformations on the programs in a cache-sharing-aware manner, we observe up to a 53 percent performance increase when the threads are placed on cores appropriately, confirming the software-hardware mismatch as a main reason for the observed insignificance of the influence of cache sharing, and indicating the important role of cache-sharing-aware transformations, a topic only sporadically studied so far, for exploiting the power of shared cache.

Mobile sinks (MSs) are vital in many wireless sensor network (WSN) applications for efficient data accumulation, localized sensor reprogramming, and distinguishing and revoking compromised sensors. However, in sensor networks that use existing key predistribution schemes for pairwise key establishment and authentication between sensor nodes and mobile sinks, employing mobile sinks for data collection raises a new security challenge: in the basic probabilistic and q-composite key predistribution schemes, an attacker can easily obtain a large number of keys by capturing a small fraction of nodes, and hence can gain control of the network by deploying a

The Three-Tier Security Scheme in Wireless Sensor Networks with Mobile Sinks

The Significance of CMP Cache Sharing on Contemporary Multithreaded Applications


EGC 3318

EGC 3319

replicated mobile sink preloaded with some compromised keys. This article describes a three-tier general framework that permits the use of any pairwise key predistribution scheme as its basic component. The new framework requires two separate key pools, one for the mobile sink to access the network, and one for pairwise key establishment between the sensors. To further reduce the damage caused by stationary access node replication attacks, we have strengthened the authentication mechanism between the sensor and the stationary access node in the proposed framework. Through detailed analysis, we show that our security framework has higher network resilience to a mobile sink replication attack than the polynomial pool-based scheme.
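The separation of the two key pools can be sketched as follows (pool sizes and key-ring sizes are illustrative, not the framework's parameters): ordinary sensors draw keys only from the static pool, mobile sinks only from the mobile pool, and stationary access nodes bridge the two.

```python
import random

rng = random.Random(1)

# Two disjoint key pools, as in the three-tier framework: one reserved for
# mobile-sink access, one for sensor-to-sensor pairwise keys.
mobile_pool = [f"m{i}" for i in range(20)]
static_pool = [f"s{i}" for i in range(20)]

def key_ring(pool, k=5):
    """Preload a node with a random subset of one pool (toy predistribution)."""
    return set(rng.sample(pool, k))

sensor      = key_ring(static_pool)                          # ordinary sensor
mobile_sink = key_ring(mobile_pool)                          # mobile sink
access_node = key_ring(static_pool) | key_ring(mobile_pool)  # stationary access node

# Capturing ordinary sensors never exposes mobile-pool keys, so an attacker
# cannot assemble a working replicated mobile sink from sensor key rings.
print(sensor.isdisjoint(mobile_pool))       # True
print(mobile_sink.isdisjoint(static_pool))  # True
```

This is the structural property that limits the damage of node capture: the keys needed to impersonate a mobile sink simply never reside on ordinary sensors.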

This paper investigates throughput and delay under a newly predominant traffic pattern called converge-cast, where each of the n nodes in the network acts as a destination with k randomly chosen sources corresponding to it. Adopting Multiple-Input-Multiple-Output (MIMO) technology, we devise two many-to-one cooperative schemes under converge-cast, one for static and one for mobile ad hoc networks (MANETs). In a static network, our scheme makes heavy use of hierarchical cooperative MIMO transmission. This feature overcomes the bottleneck that prevents converge-cast traffic from yielding ideal performance in traditional ad hoc networks, by turning the originally interfering signals into interference-resistant ones. It helps to achieve an aggregate throughput of up to Ω(n^(1-ε)) for any ε > 0. In the mobile ad hoc case, our scheme is characterized by joint transmission from multiple nodes to multiple receivers. With an optimal network division in which the number of nodes per cell is bounded by a constant, the achievable per-node throughput can reach Θ(1) with the corresponding delay reduced to Θ(k). The gain comes from the strong and intelligent cooperation between nodes in our scheme, along with the maximum number of concurrent active cells and the shortest waiting time before transmission for each node within a cell. This, to a great extent, increases the chances for each destination to receive the data it needs with minimum overhead from extra transmission. Moreover, our converge-cast-based analysis unifies and generalizes previous work, since the results derived for converge-cast in our schemes also cover other traffic patterns. Last but not least, our cooperative schemes are of interest not only from a theoretical perspective but also shed light on the future design of MIMO schemes in wireless networks.

Contemporary traffic demands call for efficient infrastructures capable of sustaining increasing volumes of social communications. In this work, we focus on improving the properties of wireless multihop networks with social features through network evolution. Specifically, we introduce a framework based on inverse Topology Control (iTC) for distributively modifying the transmission radius of selected nodes according to social paradigms. Distributed iTC mechanisms are proposed for exploiting evolutionary network churn in the form of edge/node modifications, without significantly impacting available resources. We employ continuum theory to analytically describe the proposed top-down approach of infusing social features into physical topologies. Through simulations, we demonstrate how these

Topology Enhancements in Wireless Multihop Networks: A Top-Down Approach

Converge cast with MIMO in Wireless Networks


EGC 3320

EGC 3321

EGC 3322

mechanisms achieve their goal of reducing the average path length, so that a wireless multihop network scales like a social one while retaining its original multihop character. We study the impact of the proposed topology modifications on the operation and performance of the network with respect to the average throughput, delay, and energy consumption of the induced network.
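The average-path-length metric targeted here is straightforward to compute, and a single added shortcut edge already illustrates the effect (the 8-node ring below is a toy topology, not from the paper):

```python
from collections import deque

def avg_path_length(n, edges):
    """Mean shortest-path hop count over all ordered node pairs of an
    undirected graph, computed via BFS from every node."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    total = pairs = 0
    for src in range(n):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(d for v, d in dist.items() if v != src)
        pairs += n - 1
    return total / pairs

ring = [(i, (i + 1) % 8) for i in range(8)]    # 8-node ring
print(avg_path_length(8, ring))                # 16/7 = 2.2857...
print(avg_path_length(8, ring + [(0, 4)]))     # shortcut lowers the mean
```

Social-style shortcut edges shrink the average path length, which is exactly the "scale like a social network" behavior the iTC mechanisms aim for, while the underlying multihop ring remains intact.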

Online forums have long been the most popular platform for people to communicate and share ideas. Nowadays, with the boom of multimedia sharing, users tend to share more and more with their online peers within online communities such as forums. The server-client model of forums has been used since their creation in the mid-1990s. However, this model has begun to fall short of meeting the increasing demand for bandwidth and storage resources as more and more people share ever larger amounts of multimedia content. In this work, we first investigate the unique properties of forums based on data collected from the Disney discussion boards. Based on these properties, we design a scheme, called Multimedia Board (MBoard), to support P2P-based multimedia sharing in forums. Extensive trace-driven simulation results using real trace data show that MBoard can significantly reduce the load on the server while maintaining a high quality of service for the users.

Dense matrix inversion is a basic procedure in many linear algebra algorithms. Any factorization-based dense matrix inversion algorithm involves the inversion of one or two triangular matrices. In this work, we present an improved implementation of parallel triangular matrix inversion for heterogeneous multicore CPU/dual-GPU systems.
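The underlying kernel can be sketched sequentially (the paper's contribution is the parallel CPU/GPU implementation; this is only the textbook forward-substitution algorithm it builds on): the inverse of a lower-triangular matrix is again lower triangular, and each column of the inverse can be computed independently, which is what makes the problem parallelizable.

```python
def lower_tri_inverse(T):
    """Invert a lower-triangular matrix by forward substitution, solving
    T @ X = I one column at a time. Columns are independent of each other,
    so they can be computed in parallel across cores or GPUs."""
    n = len(T)
    X = [[0.0] * n for _ in range(n)]
    for j in range(n):                # each column j is an independent solve
        for i in range(j, n):
            s = 1.0 if i == j else 0.0
            s -= sum(T[i][k] * X[k][j] for k in range(j, i))
            X[i][j] = s / T[i][i]
    return X

T = [[2.0, 0.0],
     [1.0, 4.0]]
print(lower_tri_inverse(T))  # [[0.5, 0.0], [-0.125, 0.25]]
```

Verifying: multiplying T by the result reproduces the identity matrix.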

The Web Services Atomic Transactions (WS-AT) specification makes it possible for businesses to engage in standard distributed transaction processing over the Internet using Web Services technology. For such business applications, trustworthy coordination of WS-AT is crucial. In this paper, we explain how to render WS-AT coordination trustworthy by applying Byzantine Fault Tolerance (BFT) techniques. More specifically, we show how to protect the core services described in the WS-AT specification, namely the Activation service, the Registration service, the Completion service, and the Coordinator service, against Byzantine faults. The main contribution of this work is that it exploits the semantics of the WS-AT services to minimize the use of Byzantine Agreement (BA), instead of applying BFT techniques naively, which would be prohibitively expensive. We have incorporated our BFT protocols and mechanisms into an open-source framework that implements the WS-AT specification. The resulting BFT framework for WS-AT is useful for business applications that are based on WS-AT and require a high degree of dependability, security, and trust.

Trustworthy Coordination of Web Services Atomic Transactions

Triangular Matrix Inversion on Heterogeneous Multicore Systems

Toward P2P-Based Multimedia Sharing in User Generated Contents


EGC 3323

EGC 3324

Read-copy update (RCU) is a synchronization technique that often replaces reader-writer locking because RCU's read-side primitives are both wait-free and an order of magnitude faster than uncontended locking. Although RCU updates are relatively heavyweight, the importance of read-side performance is increasing as computing systems become more responsive to changes in their environments. RCU is heavily used in several kernel-level environments. Unfortunately, kernel-level implementations rely on facilities that are often unavailable to user applications. The few prior user-level RCU implementations either provided inefficient read-side primitives or restricted the application architecture. This paper fills this gap by describing efficient and flexible RCU implementations based on primitives commonly available to user-level applications. Finally, this paper compares these RCU implementations with each other and with standard locking, which enables choosing the best mechanism for a given workload. This work opens the door to widespread use of RCU in user applications.
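The grace-period bookkeeping at the heart of RCU can be sketched in a few lines (a deliberately simplified, single-threaded model of the epoch logic; real user-level RCU additionally needs per-thread variables and memory barriers, which are the hard part the paper addresses):

```python
class ToyRCU:
    """Minimal epoch-based RCU sketch: each reader publishes the global
    epoch it entered at; an updater's grace period is complete once no
    reader is still inside an older epoch."""
    def __init__(self):
        self.epoch = 1
        self.readers = {}          # reader id -> epoch at read_lock, 0 = quiescent

    def read_lock(self, rid):
        self.readers[rid] = self.epoch

    def read_unlock(self, rid):
        self.readers[rid] = 0

    def grace_period_done(self):
        return all(e == 0 or e >= self.epoch for e in self.readers.values())

    def start_grace_period(self):
        self.epoch += 1            # readers from now on are in the new epoch
        return self.grace_period_done()

rcu = ToyRCU()
rcu.read_lock("r1")
print(rcu.start_grace_period())  # False: r1 entered in the old epoch
rcu.read_unlock("r1")
print(rcu.grace_period_done())   # True: all readers quiescent, old data reclaimable
```

Once the grace period completes, the updater knows no reader can still hold a reference to the old version of the data, so it may be freed.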

We propose and analyze a distributed cooperative caching strategy based on Evolutive Summary Counters (ESC), a new data structure that stores an approximate record of the data accesses in each computing node of a search engine. The ESC capture the frequency of accesses to the elements of a data collection, and the evolution of the access patterns for each node in a network of computers. The ESC can be efficiently summarized into what we call ESC-summaries to obtain approximate statistics of the document entries accessed by each computing node. We use the ESC-summaries to introduce two algorithms that manage our distributed caching strategy: one for the distribution of the cache contents, ESC-placement, and another for the search of documents in the distributed cache, ESC-search. While the former improves the hit rate of the system and keeps a large fraction of data accesses local, the latter reduces network traffic by restricting the number of nodes queried to find a document. We show that our cooperative caching approach outperforms state-of-the-art models in hit rate, throughput, and location recall for multiple scenarios, i.e., different query distributions and systems with varying degrees of complexity.
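A toy model of the counter-and-summary idea (the aging rule, summary form, and search rule here are assumptions for illustration, not the authors' exact data structure): counts decay periodically so old accesses fade, summaries keep only the hottest entries, and a search queries only nodes whose summary mentions the document.

```python
from collections import Counter

class EvolutiveCounter:
    """Toy evolutive counter: counts are periodically halved so that stale
    accesses fade and the counter tracks how the access pattern evolves."""
    def __init__(self):
        self.counts = Counter()

    def record(self, doc):
        self.counts[doc] += 1

    def age(self):
        for doc in list(self.counts):
            self.counts[doc] //= 2
            if self.counts[doc] == 0:
                del self.counts[doc]

    def summary(self, k=2):
        """ESC-summary stand-in: the k most frequently accessed documents."""
        return {doc for doc, _ in self.counts.most_common(k)}

esc = EvolutiveCounter()
for doc in ["a", "a", "a", "b"]:
    esc.record(doc)
esc.age()                      # 'a': 3 -> 1, 'b': 1 -> 0 (dropped)
print(esc.summary())           # {'a'}

# ESC-search stand-in: query only nodes whose summary contains the document,
# instead of broadcasting the lookup to every node.
node_summaries = {"n1": esc.summary(), "n2": {"c"}}
print([n for n, s in node_summaries.items() if "a" in s])  # ['n1']
```

The halving step is what makes the counters "evolutive": a document accessed heavily last epoch but not since will quickly drop out of the summary, steering both placement and search toward the current working set.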

User-Level Implementations of Read-Copy Update

Using Evolutive Summary Counters for Efficient Cooperative Caching in Search Engines