arxiv:1803.01451v2 [math.oc] 20 apr 2018 · a comprehensive decision-making system must also be...

16
Near-optimal planning using approximate dynamic programming to enhance post-hazard community resilience management Saeed Nozhati a,1,* , Yugandhar Sarkale b,1 , Bruce Ellingwood a , Edwin K.P. Chong b,c , Hussam Mahmoud a a Dept. of Civil and Environmental Engineering, Colorado State University, Fort Collins, CO 80523-1372 b Dept. of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523-1373 c Dept. of Mathematics, Colorado State University, Fort Collins, CO 80523-1874 Abstract The lack of a comprehensive decision-making approach at the community level is an important problem that warrants immediate attention. Network-level decision-making algorithms need to solve large-scale optimization problems that pose computational challenges. The complexity of the optimization problems increases when various sources of uncertainty are considered. This research introduces a sequential discrete optimization approach, as a decision-making framework at the community level for recovery management. The proposed mathematical approach leverages approximate dynamic programming along with heuristics for the determination of recovery actions. Our methodology overcomes the curse of dimensionality and manages multi-state, large-scale infrastructure systems following disasters. We also provide computational results showing that our methodology not only incorporates recovery policies of responsible public and private entities within the community but also substantially enhances the performance of their underlying strategies with limited resources. The methodology can be implemented efficiently to identify near-optimal recovery decisions following a severe earthquake based on multiple objectives for an electrical power network of a testbed community coarsely modeled after Gilroy, California, United States. The proposed optimization method supports risk-informed community decision makers within chaotic post-hazard circumstances. Keywords: Approximate Dynamic Programming, Combinatorial Optimization, Community Resilience, Electrical Power Network, Rollout Algorithm 1. Introduction In the modern era, the functionality of infras- tructure systems is of significant importance in pro- viding continuous services to communities and in supporting their public health and safety. Nat- ural and anthropogenic hazards pose significant challenges to infrastructure systems and cause un- desirable system malfunctions and consequences. Past experiences show that these malfunctions are not always inevitable despite design strategies like * Corresponding author Email addresses: [email protected] (Saeed Nozhati ), [email protected] (Yugandhar Sarkale), [email protected] (Bruce Ellingwood), [email protected] (Edwin K.P. Chong), [email protected] (Hussam Mahmoud) 1 Contributed equally, listed alphabetically. increasing system redundancy and reliability [1]. Therefore, a sequential rational decision-making framework should enable malfunctioned systems to be restored in a timely manner after the hazards. Further, post-event stressors and chaotic circum- stances, time limitations, budget and resource con- straints, and complexities in the community recov- ery process, which are twinned with catastrophe, highlight the necessity for a comprehensive risk- informed decision-making framework for recovery management at the community level. A compre- hensive decision-making framework must take into account indirect and delayed consequences of deci- sions (also called the post-effect property of deci- sions), which requires foresight or planning. Such a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision spaces to make the most rational plans at the community Preprint submitted to Reliability Engineering & System Safety April 23, 2018 arXiv:1803.01451v2 [math.OC] 20 Apr 2018

Upload: others

Post on 01-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

Near-optimal planning using approximate dynamic programmingto enhance post-hazard community resilience management

Saeed Nozhatia,1,∗, Yugandhar Sarkaleb,1, Bruce Ellingwooda, Edwin K.P. Chongb,c, Hussam Mahmouda

aDept. of Civil and Environmental Engineering, Colorado State University, Fort Collins, CO 80523-1372bDept. of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523-1373

cDept. of Mathematics, Colorado State University, Fort Collins, CO 80523-1874

Abstract

The lack of a comprehensive decision-making approach at the community level is an important problemthat warrants immediate attention. Network-level decision-making algorithms need to solve large-scaleoptimization problems that pose computational challenges. The complexity of the optimization problemsincreases when various sources of uncertainty are considered. This research introduces a sequential discreteoptimization approach, as a decision-making framework at the community level for recovery management.The proposed mathematical approach leverages approximate dynamic programming along with heuristicsfor the determination of recovery actions. Our methodology overcomes the curse of dimensionality andmanages multi-state, large-scale infrastructure systems following disasters. We also provide computationalresults showing that our methodology not only incorporates recovery policies of responsible public andprivate entities within the community but also substantially enhances the performance of their underlyingstrategies with limited resources. The methodology can be implemented efficiently to identify near-optimalrecovery decisions following a severe earthquake based on multiple objectives for an electrical power networkof a testbed community coarsely modeled after Gilroy, California, United States. The proposed optimizationmethod supports risk-informed community decision makers within chaotic post-hazard circumstances.

Keywords: Approximate Dynamic Programming, Combinatorial Optimization, Community Resilience,Electrical Power Network, Rollout Algorithm

1. Introduction

In the modern era, the functionality of infras-tructure systems is of significant importance in pro-viding continuous services to communities and insupporting their public health and safety. Nat-ural and anthropogenic hazards pose significantchallenges to infrastructure systems and cause un-desirable system malfunctions and consequences.Past experiences show that these malfunctions arenot always inevitable despite design strategies like

∗Corresponding authorEmail addresses: [email protected]

(Saeed Nozhati ), [email protected](Yugandhar Sarkale), [email protected](Bruce Ellingwood), [email protected] (EdwinK.P. Chong), [email protected] (HussamMahmoud)

1Contributed equally, listed alphabetically.

increasing system redundancy and reliability [1].Therefore, a sequential rational decision-makingframework should enable malfunctioned systems tobe restored in a timely manner after the hazards.Further, post-event stressors and chaotic circum-stances, time limitations, budget and resource con-straints, and complexities in the community recov-ery process, which are twinned with catastrophe,highlight the necessity for a comprehensive risk-informed decision-making framework for recoverymanagement at the community level. A compre-hensive decision-making framework must take intoaccount indirect and delayed consequences of deci-sions (also called the post-effect property of deci-sions), which requires foresight or planning. Sucha comprehensive decision-making system must alsobe able to handle large-scale scheduling problemsthat encompass large combinatorial decision spacesto make the most rational plans at the community

Preprint submitted to Reliability Engineering & System Safety April 23, 2018

arX

iv:1

803.

0145

1v2

[m

ath.

OC

] 2

0 A

pr 2

018

Page 2: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

level.Developing efficient computational methodolo-

gies for sequential decision-making problems hasbeen a subject of significant interest [2, 3]. In thecontext of civil engineering, several studies haveutilized the framework of dynamic programmingfor management of bridges and pavement mainte-nance [4–8]. Typical methodological formulationsemploy principles of dynamic programming thatutilize state-action pairs. In this study, we de-velop a powerful and relatively unexplored method-ological framework of formulating large infrastruc-ture problems as string-actions, which will be de-scribed in Section 5.2. Our formulation does notrequire an explicit state-space model; therefore, itis shielded against the common problem of stateexplosion when such methodologies are employed.The sequential decision-making methodology pre-sented here not only manages network-level infras-tructure but also considers the interconnectednessand cascading effects in the entire recovery processthat have not been addressed in the past studies.

Dynamic programming formulations frequentlysuffer from the curse of dimensionality. This prob-lem is further aggravated when we have to deal withlarge combinatorial decision spaces characteristic ofcommunity recovery. Therefore, using approxima-tion techniques in conjunction with the dynamicprogramming formalism is essential. There are sev-eral approximation techniques available in the lit-erature [9–12]. Here, we use a promising class ofapproximation techniques called rollout algorithms.We show how rollout algorithms blend naturallywith our string-action formulation. Together, theyform a robust tool to overcome some of the lim-itations faced in the application of dynamic pro-gramming techniques to massive real-world prob-lems. The proposed approach is able to handle thecurse of dimensionality in its analysis and manage-ment of multi-state, large-scale infrastructure sys-tems and data sources. The proposed methodologyis also able to consider and improve the current re-covery policies of responsible public and private en-tities within the community.

Among infrastructure systems, electrical powernetworks (EPNs) are particularly critical insofar asthe functionality of most other networks, and criti-cal facilities depend on EPN functionality and man-agement. Hence, the method is illustrated in an ap-plication to recovery management of the modeledEPN in Gilroy, California following a severe earth-quake. The illustrative example shows how the pro-

posed approach can be implemented efficiently toidentify near-optimal recovery decisions. The com-puted near-optimal decisions restored the EPN ofGilroy in a timely manner, for residential buildingsas well as main food retailers, as an example of crit-ical facilities that need electricity to support publichealth in the aftermath of hazards.

The remainder of this study is structured as fol-lows. In Section 2, we introduce the backgroundof system resilience and the system modeling usedin this study. In Section 3, we introduce the casestudy used in this paper. In Section 4, we describethe earthquake modeling, fragility, and restorationassessments. In Section 5, we provide a mathemat-ical formulation of our optimization problem. InSection 6, we describe the solution method to solvethe optimization problem. In Section 7, we demon-strate the performance of the rollout algorithm withthe string-action formulation through multiple sim-ulations. In Section 8, we present a brief conclusionof this research.

2. System Resilience

The term resilience is defined in a variety ofways. Generally speaking, resilience can be definedas “the ability to prepare for and adapt to changingconditions and withstand and recover rapidly fromdisruptions”[13]. Hence, resilience of a community(or a system) is usually delineated with the measureof community functionality, shown by the verticalaxis of Fig. 1 and four attributes of robustness, ra-pidity, redundancy, and resourcefulness [14]. Fig. 1illustrates the concept of functionality, which canbe defined as the ability of a system to support itsplanned mission, for example, by providing electric-ity to people and facilities. The understanding ofinterdependencies among the components of a sys-tem is essential to quantify system functionality andresilience. These interdependencies produce cas-cading failures where a large-scale cascade may betriggered by the malfunction of a single or few com-ponents [15]. Further, they contribute to the recov-ery rate and difficulty of the entire recovery processof a system. Different factors affect the recoveryrate of a system, among which modification beforedisruptive events (ex-ante mitigations), different re-covery policies (ex-post actions), and nature of thedisruption are prominent [16]. Fig. 1 also highlightsdifferent sources of uncertainty that are associatedwith community functionality assessment and haveremarkable impacts in different stages from prior to

2

Page 3: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

Figure 1: Schematic representation of resilience concept(adopted from [14, 18])

the event to the end of the recovery process. There-fore, any employed model to assess the recovery pro-cess should be able to consider the impacts of theinfluencing parameters.

In this study, the dependency of networks is mod-eled through an adjacency matrix A = [xij ], wherexij ∈ [0, 1] indicates the magnitude of dependencybetween components i and j [17]. In this gen-eral form, the adjacency matrix A can be a time-dependent stochastic matrix to capture the uncer-tainties in the dependencies and probable time-dependent variations.

According to the literature, the resilience index Rfor each system is defined by the following equation[14, 19]:

R =

∫ te+TLC

te

Q(t)

TLCdt (1)

where Q(t) is the functionality of a system at time t,TLC is the control time of the system, and te is thetime of occurrence of event e, as shown in Fig. 1.

3. Description of Case Study

In the case study of this paper, the communityin Gilroy, California, USA is used as an exampleto illustrate the proposed approach. Gilroy is lo-cated approximately 50 kilometers (km) south ofthe city of San Jose with a population of 48,821 atthe time of the 2010 census (see Fig. 2) [20]. Thestudy area is divided into 36 gridded rectangles todefine the community and encompasses 41.9 km2

area of Gilroy. In this study, we do not cover allthe characteristics of Gilroy; however, the adoptedmodel has a resolution that is sufficient to study the

Figure 2: Map of Gilroy’s population over the defined grids

methodology at the community level under hazardevents.

Gilroy contains 13 food retailers, but six main re-tailers, each of which has more than 100 employees,provide the main food requirements of Gilroy in-habitants [21], as shown in Fig. 3 and summarizedin Table 1.

To assign the probabilities of shopping activityto each urban grid rectangle, the gravity model [22]is used. The gravity model identifies the shoppinglocation probabilistically, given the location of resi-dences. These probabilities are assigned to be pro-portional to food retailers' capacities and inverselycorresponding to retailers' distances from centers ofurban grid rectangles. Consequently, distant smalllocations are less likely to be selected than closelarge locations.

If the center of an urban grid is c, then food re-tailer r is chosen according to the following distri-bution [22]:

P (r|c) ∝ wrebTcr (2)

where wr is the capacity of food retailer r, deter-mined by Table 1, b is a negative constant, and Tcris the travel time from urban grid rectangle c tofood retailer r. Google's Distance Matrix API wascalled from within R by using the ggmap package[23] to provide distances and travel times for theassumed transportation mode of driving.

Fig. 4 depicts the EPN components, locatedwithin the defined boundary. Llagas power sub-station, the main source of power in the definedboundary, is supplied by an 115 kV transmissionline. Distribution line components are positionedat 100 m and modeled from the substation to the

3

Page 4: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

Table 1: The main food retailers of Gilroy

Food Retailer Walmart Costco Target Mi Pueblo Food Nob Hill Foods SafewayNumber of Employees 395 220 130 106 100 130

Figure 3: Gilroy’s main food retailers

urban grids centers, food retailers, and water net-work facilities. In this study, the modeled EPN has327 components.

4. Hazard and Damage Assessment

4.1. Earthquake Simulation

The seismicity of the Gilroy region of Califor-nia is mainly controlled by the San Andreas Fault(SAF), which caused numerous destructive earth-quakes like the Loma Prieta earthquake [24]. Thespatial estimation of ground-motion amplitudesfrom earthquakes is an essential element of risk as-sessment, typically characterized by ground-motionprediction equations (GMPEs). GMPEs requireseveral parameters, such as earthquake magnitudeMw, fault properties (Fp), soil conditions (i.e., theaverage shear-wave velocity in the top 30 m of soil,Vs30), and epicentral distances (R) to compute theseismic intensity measure (IM) at any point. Mod-ern GMPEs typically take the form

ln(IM) = f(Mw, R, Vs30, Fp) + ε1σ + ε2τ

ln(IM) = ln(IM) + ε1σ + ε2τ(3)

where σ and τ reflect the intra-event (within event)and inter-event (event-to-event) uncertainty respec-tively [25]. In this study, the GMPE proposed by

Figure 4: The modeled electrical power network of Gilroy

Figure 5: The map of shear velocity at Gilroy area

Ambrahamson et al. [26] is used, and a ground mo-tion similar to the Loma Prieta earthquake, one ofthe most devastating hazards that Gilroy has ex-perienced [24], with epicenter approximately 12 kmof Gilroy downtown on the SAF projection is sim-ulated. Figs. 5 and 6 show the map of Vs30 andground motion field for Peak Ground Acceleration(PGA), respectively.

4.2. Fragility Function and Restoration

In the event of an earthquake, the relations be-tween ground-motion intensities and earthquakedamage are pivotal elements in the loss estimationand the risk analysis of a community. Fragilitycurves describe the probability of experiencing or

4

Page 5: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

Figure 6: The simulation of median of peak ground acceler-ation field

exceeding a particular level of damage as a func-tion of hazard intensity. It is customary to modelcomponent fragilities with lognormal distributions[27]. The conditional probability of being in or ex-ceeding a particular damage state (ds), conditionedon a particular level of intensity measure IM = im,is defined by

P (DS ≥ ds|IM = im) = Φ

(ln(im)− λ

ξ

)(4)

where Φ is the standard normal distribution; λ andξ are the mean and standard deviation of ln(im).The fragility curves can be obtained based on a)post-earthquake damage evaluation data (empiri-cal curves) [28] b) structural modeling (analyticalcurves) [29] c) expert opinions (heuristics curves)[30]. In the present study, the seismic fragilitycurves included in [31, 32] are used for illustration.

To restore a network, a number of availableresource units, N , as a generic single numberincluding equipment, replacement components,and repair crews are considered for assignmentto damaged components, and each damagedcomponent is assumed to require only one unit ofresource [33]. The restoration times based on thelevel of damage, used in this study, are presentedin Table 2, based on [31, 34].

5. Optimization Problem Description

5.1. Introduction

After an earthquake event occurs, each EPNcomponent ends up in one of the damage states asshown in Table 2. Let the total number of dam-aged components be M . Note that M ≤ 327. Both

M and N are non-negative integers. Also, in thisstudy, N �M . This assumption is justified by theavailability of limited resources with the plannerwhere large number of components are damaged inthe aftermath of a severe hazard.

A decision maker or planner has the task of as-signing units of resources to these damaged com-ponents. A decision maker has a heuristic or ex-pert policy on the basis of which he can make hisdecisions to optimize multiple objectives. The pre-cise nature of the objective of the planner can vary,which will be described in detail in Section 5.2. Par-ticularly, at the first decision epoch, the decisionmaker or a resource planner deploys N unit of re-sources at N out of M damaged components. Eachunit of resource is assigned to a distinct damagedcomponent. At every subsequent decision epoch,the planner must have an option of reassigning someor all of the resources to new locations based on hisheuristics and objectives. He must have the flexi-bility of such a reassignment even if the repair workat the currently assigned locations is not complete.At every decision epoch, it is possible to forestallthe reassignment of the units of resource that havenot completed the repair work; however, we chooseto solve the more general problem of preemptive as-signment, where non-preemption at few or all thelocations is a special case of our problem. Thepreemptive assignment problem is a richer decisionproblem than the non-preemptive case in the sensethat the process of optimizing the decision actionsis a more complex task because the size of the de-cision space is bigger.

In this study, we assume that the outcome ofthe decisions is fully predictable [35, 36]. We arepreparing a separate study to address the relaxationof this assumption, i.e., when outcomes of decisionsexhibit uncertainty. The modified methods to dealwith the stochastic problem will form a part of aseparate paper [37].

We improve upon the solutions offered by heuris-tics of the planner by formulating our optimizationproblem as a dynamic program and solving it usingthe rollout approach.

5.2. Optimization Problem Formulation

Suppose that the decision maker starts makingdecisions and assigning repair locations to differ-ent units of resource. The number of such non-trivial decisions to be made is less than or equal toM − N . When M becomes less than or equal to

5

Page 6: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

Table 2: Restoration times based on the level of damage

Damage StatesComponent Undamaged Minor Moderate Extensive CompleteElectric sub-station 0 1 3 7 30Transmission line component 0 0.5 1 1 2Distribution line component 0 0.5 1 1 1

N (because of sequential application of repair ac-tions to the damaged components), the assignmentof units of resource becomes a trivial problem in oursetting because each unit can simply be assignedone to one, in any order, to the damaged compo-nents. Consequently, a strict optimal assignmentcan be achieved in the trivial case. The size of thistrivial assignment problem reduces by one for everynew decision epoch until all the remaining damagedcomponents are repaired. The additional units ofresources retire because deploying more than oneunit of resource to the same location does not de-crease the repair time associated with that damagedcomponent. Henceforth, we focus on the non-trivialassignment problem.

Let the variable t denote the decision epoch,and let Dt be the set of all damaged componentsbefore a repair action xt is performed. Let tenddenote the decision epoch at which repair actionxtend is selected so that |Dtend+1| ≤ N . Note thatt ∈ A := (1, 2, . . . , tend). Let X = (x1, x2 . . . , xtend)represent the string of actions owing to the non-trivial assignment. We say that a repair action iscompleted when at least 1 out of the N damagedcomponents is repaired. Let P(Dt) be the powersetof Dt. Let,

PN (Dt) = {C ∈ P(Dt) : |C| = N} (5)

so that xt ∈ PN (Dt). Let Rt be the set of all re-paired components after the repair action xt is com-pleted. Note that Dt+1 = Dt\Rt, ∀t ∈ A, where1 ≤ |Dtend+1| ≤ N , and the decision-making prob-lem moves into the trivial assignment problem pre-viously discussed.

We wish to calculate a string X of repair actionsthat optimizes our objective functions F (X). Wedeal with two objective functions in this study de-noted by mapping F1 and F2.

• Objective 1: Let the variable p represent thepopulation of Gilroy and γ represent a constantthreshold. Let X1 = (x1, . . . , xi) be the stringof repair actions that results in restoration of

electricity to γ × p number of people. Here,xi ∈ PN (Di), where Di is the number of dam-aged component at the ith decision epoch. Letn represent the time required to restore elec-tricity to γ × p number of people as a result ofrepair actions X1. Formally,

F1 (X1) = n. (6)

Objective 1 is to compute the optimal solutionX∗1 given by

X∗1 = arg minX1

F1(X1). (7)

We explain the precise meaning of restorationof electricity to people in more detail in Sec-tion 7.1. To sum up, in objective 1, our aim isto find a string of actions that minimizes thenumber of days needed to restore electricity toa certain fraction (γ) of the total population ofGilroy.

• Objective 2: We define the mapping F2 interms of number of people who have electric-ity per unit of time; our objective is to max-imize this mapping over a string of repair ac-tions. Let the variable kt denote the total timeelapsed between the completion of repair ac-tion xt−1 and xt, ∀t ∈ A\{1}, in which k1 isthe time elapsed between the start and com-pletion of repair action x1. Let ht be the totalnumber of people who have electricity after therepair action xt is complete. Then,

F2(X) =1

ktend

tend∑t=1

ht × kt. (8)

We are interested in the optimal solution X∗

given by

X∗ = arg maxX

F2(X). (9)

Note that our objective function in the secondcase F2(X) mimics the resilience index and can be

6

Page 7: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

interpreted in terms of (1). Particularly, the inte-gral in (1) is replaced by a sum because of discretedecision epochs, Q(t) is replaced by the productht× kt, ktend is analogous to TLC , and the integrallimits are changed to represent the discrete decisionepochs.

6. Optimization Problem Solution

Calculating X∗ or X∗1 is a sequential optimiza-tion problem. The decision maker applies the re-pair action xt at the decision epoch t to maximizeor minimize a cumulative objective function. Thestring of actions, as represented in X or X1, arean outcome of this sequential decision-making pro-cess. This is particularly relevant in the contextof dynamic programming where numerous solutiontechniques are available for the sequential optimiza-tion problem. Rollout is one such method thatoriginated in dynamic programming. It is possi-ble to use the dynamic programming formalism todescribe the method of rollout, but here we accom-plish this by starting from first principles [38]. Wewill draw comparisons between rollout with firstprinciples and rollout in dynamic programming atsuitable junctions. The description of the rolloutalgorithm is inherently tied with the notion of ap-proximate dynamic programming.

6.1. Approximate Dynamic Programming

Let’s focus our attention on objective 1. Theextension of this methodology to objective 2 isstraightforward; we need to adapt notation usedfor objective 2 in the methodology presented below,and a maximization problem replaces a minimiza-tion problem. Recall that we are interested in theoptimal solution X∗1 given by (7). This can be cal-culated in the following manner:First calculate x∗1 as follows:

x∗1 ∈ arg minx1

J1(x1) (10)

where the function J1 is defined by

J1(x1) = minx2,...,xi

F1(X1). (11)

Next, calculate x∗2 as:

x∗2 ∈ arg minx2

J2(x∗1, x2) (12)

where the function J2 is defined by

J2(x1, x2) = minx3,...,xi

F1(X1). (13)

Similarly, we calculate the α-solution as follows:

x∗α ∈ arg minxα

Jα(x∗1, . . . , x∗α−1, xα) (14)

where the function Jα is defined by

Jα(x1, . . . , xα) = minxα+1,...,xi

F1(X1). (15)

The functions Jα are called the optimal cost-to-gofunctions and are defined by the following recursion:

Jα(x1, . . . , xα) = minxα+1

Jα+1(x1, . . . , xα+1), (16)

where the boundary condition is given by:

Ji(X1) = F1(X1). (17)

Note that J is a standard notation used to representcost-to-go functions in the dynamic programmingliterature.

The approach discussed above to calculate theoptimal solutions is typical of the dynamic pro-gramming formulation. However, except for veryspecial problems, such a formulation cannot besolved exactly because calculating and storing theoptimal cost-to-go functions Jα can be numeri-cally intensive. Particularly, for our problem, let|PN (Dt)| = βt; then the storage of Jα requires atable of size

Sα =

α∏t=1

βt, (18)

where α ≤ i for objective 1, and α ≤ tend for objec-tive 2. In the dynamic programming literature, thisis called as the curse of dimensionality. If we con-sider objective 2 and wish to calculate Jα such thatα = M−N (we assume for the sake of this examplethat only a single damaged component is repairedat each t), then for 50 damaged components and10 unit of resources, Sα ≈ 10280. In practice, Jα in(14) is replaced by an approximation denoted by Jα.In the literature, Jα is called as a scoring functionor approximate cost-to-go function [39]. One way tocalculate Jα is with the aid of a heuristic; there areseveral ways to approximate Jα that do not utilizeheuristic algorithms. All such approximation meth-ods fall under the category of approximate dynamicprogramming.

The method of rollout utilizes a heuristic in theapproximation process. We provide a more detaileddiscussion on the heuristic in Section 6.2. Supposethat a heuristic H is used to approximate the min-imization in (15), and let Hα(x1, . . . , xα) denote

7

Page 8: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

the corresponding approximate optimal value; thenrollout yields the suboptimal solution by replacingJα with Hα in (14):

xα ∈ arg minxα

Hα(x1, . . . , xα−1, xα). (19)

The heuristic used in the rollout algorithm is usu-ally termed as the base heuristic. In many practicalproblems, rollout results in a significant improve-ment over the underlying base heuristic to solve theapproximate dynamic programming problem [39].

6.2. Rollout Algorithm

It is possible to define the base heuristic H inseveral ways:

(i) The current recovery policy of regionally re-sponsible public and private entities,

(ii) The importance analyses that prioritize the im-portance of components based on the consid-ered importance factors [40],

(iii) The greedy algorithm that computes thegreedy heuristic [41, 42],

(iv) A random policy without any pre-assumption,

(v) A pre-defined empirical policy; e.g., baseheuristic based on the maximum node and linkbetweenness (shortest path), as for example,used in the studies of [33, 43].

The rollout method described in Section 6.1, us-ing first principles and string-action formulation,for a discrete, deterministic, and sequential opti-mization problem has interpretations in terms ofthe policy iteration algorithm in dynamic program-ming. The policy iteration algorithm (see [44] forthe details of the policy iteration algorithm includ-ing the definition of policy in the dynamic program-ming sense) computes an improved policy (policyimprovement step), given a base policy (station-ary), by evaluating the performance of the basepolicy. The policy evaluation step is typically per-formed through simulations [37]. Rollout policy canbe viewed as the improved policy calculated usingthe policy iteration algorithm after a single iter-ation of the policy improvement step. For a dis-crete and deterministic optimization problem, thebase policy used in the policy iteration algorithmis equivalent to the base heuristic, and the rolloutpolicy consists of the repeated application of thisheuristic. This approach was used by the authors

in [39] where they provide performance guaranteeson the basic rollout approach and discuss variationsto the rollout algorithm. Henceforth, for our pur-poses, base policy and base heuristic will be consid-ered indistinguishable.

On a historical note, the term rollout was firstcoined by Tesauro in reference to creating computerprograms that play backgammon [45]. An approachsimilar to rollout was also shown much earlier in[46].

Ideally, we would like the rollout method to neverperform worse than the underlying base heuristic(guarantee performance). This is possible undereach of the following three cases [39]:

1. The rollout method is terminating (called asoptimized rollout).

2. The rollout method utilizes a base heuristicthat is sequentially consistent (called as roll-out).

3. The rollout method is terminating and utilizesa base heuristic that is sequentially improving(extended rollout and fortified rollout).

A sequentially consistent heuristic guarantees thatthe rollout method is terminating. It also guaran-tees that the base heuristic is sequentially improv-ing. Therefore, 3 and 1 are the special cases of2 with a less restrictive property imposed on thebase heuristic (that of sequential improvement ortermination). When the base heuristic is sequen-tially consistent, the fortified and extended rolloutmethod are the same as the rollout method.

A heuristic must posses the property of termina-tion to be used as a base heuristic in the rolloutmethod. Even if the base heuristic is terminat-ing, the rollout method need not be terminating.Apart from the sequential consistency of the baseheuristic, the rollout method is guaranteed to beterminating if it is applied on problems that ex-hibit special structure. Our problem exhibits sucha structure. In particular, a finite number of dam-aged components in our problem are equivalent tothe finite node set in [39]. Therefore, the rolloutmethod in this study is terminating. In such a sce-nario, we could use the optimized rollout algorithmto guarantee performance without putting any re-striction on the base heuristic to be used in the pro-posed formulation; however, a wiser base heuristiccan potentially enhance further the computed roll-out policy. Nevertheless, our problem does not re-quire any special structure on the base heuristic for

8

Page 9: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

the rollout method to be sequentially improving,which is justified later in this section.

In the terminology of dynamic programming, abase heuristic that admits sequential consistency isanalogous to the Markov or stationary policy. Sim-ilarly, the terminating rollout method defines a roll-out policy that is stationary.

Two different base heuristics are considered inthis study. The first base heuristic is a randomheuristic denoted by H. The idea behind consid-eration of this heuristic is that in actuality thereare cases where there is no thought-out strategy orthe computation of such a scheme is computation-ally expensive. We will show though simulationsthat the rollout formulation can accept a randombase policy at the community level from a decisionmaker and improve it significantly. The secondbase heuristic is called a smart heuristic becauseit is based on the importance of components andexpert judgment, denoted by H. The importancefactors used in prioritizing the importance of thecomponents can accommodate the contribution ofeach component in the network. This base heuris-tic is similar in spirit to the items (ii) and (v) listedabove. More description on the assignment of unitsof resources based on H and H is described in Sec-tion 7.1.1. We also argue there that H and H aresequentially consistent. Therefore, in this study,and for our choice of heuristics, the extended, for-tified, and rollout method are equivalent.

LetH be any heuristic algorithm; the state of thisalgorithm at the first decision epoch is j1, wherej1 = (x1). Similarly, the state of the algorithm atthe αth decision epoch is the α-solution given byjα = (x1, . . . , xα), i.e., the algorithm generates thepath of the states (j1, j2, . . . , jα). Note that j0 is thedummy initial state of the algorithm H. The algo-rithm H terminates when α = i for objective 1, andα = tend for objective 2. Henceforth, in this section,we consider only objective 1 without any loss of gen-erality. Let Hα(jα) denote the cost-to-go startingfrom the α-solution, generated by applying H (i.e.,H is used to evaluate the cost-to-go). The cost-to-go associated with the algorithm H is equal to theterminal reward, i.e., Hα(jα) = F1(X1). Therefore,we have: H1(j1) = H2(j2) = . . . = Hi(ji). We usethis heuristic cost-to-go in (14) to find an approx-imate solution to our problem. This approxima-tion algorithm is termed as “Rollout on H” (RH)owing to its structure that is similar to the ap-proximate dynamic programming approach rollout.The RH algorithm generates the path of the states

(j1, j2, . . . , ji) as follows:

jα = arg minδ∈N(jα−1)

J(δ), α = 1, . . . , i (20)

where, jα−1 = (x1, . . . , xα−1), and

N(jα−1) = {(x1, . . . , xα−1, x)|x ∈ PN (Dα)}. (21)

The algorithm RH is sequentially improving withrespect to H and outperforms H (see [47] for thedetails of the proof).

The RH algorithm described above is termed asone-step lookahead approach because the repair ac-tion at any decision epoch t (current step) is opti-mized by minimizing the cost-to-go given the repairaction at t (see (20)). It is possible to generalizethis approach to incorporate multi-step lookahead.Suppose that we optimize the repair actions at anydecision epoch t and t + 1 (current and the nextstep combined) by minimizing the cost-to-go giventhe repair actions for the current and next steps.This can be viewed as a two-step lookahead ap-proach. Note the similarity of this approach withthe dynamic programming formulation from firstprinciples in Section 6.1, except for the difficultyof estimating the cost-to-go values J exactly. Also,note that a two-step lookahead approach is com-putationally more intensive than the one-step ap-proach. In principle, it is possible to extend it tostep size λ, where 1 ≤ λ ≤ i. However, as λ in-creases, the computational complexity of the algo-rithm increases exponentially. Particularly, when λis selected equal to i at the first decision epoch, theRH algorithm finds the exact optimal solution byexhaustively searching through all possible combi-nations of repair action at each t, with computa-tional complexity O(Si). Also, note that RH pro-vides a tighter upper bound on the optimal objec-tive value compared to the bound obtained fromthe original heuristic approach.

7. Results

7.1. Discussion

We show simulation results for two differentcases. In Case 1, we assume that people have elec-tricity when their household units have electricity.Recall that the city is divided into different grid-ded rectangles according to population heat maps(Fig. 2), and different components of the EPN net-work serving these grids are depicted in Fig. 4. The

9

Page 10: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

entire population living in a particular gridded rect-angle will not have electricity until all the EPNcomponents serving that grid are either undamagedor repaired post-hazard (functional). Conversely, ifthe EPN components serving a particular griddedrectangle are functional, all household units in thatgridded rectangle are assumed to have electricity.

In Case 2, along with household units, we incor-porate food retailers into the analysis. We say thatpeople have the benefit of electric utility only whenthe EPN components serving their gridded rectan-gles are functional, and they go to a food retailerthat is functional. A food retailer is functional (inthe electric utility sense) when all the EPN com-ponents serving the retailer are functional. Themapping of number of people who access a partic-ular food retailer is done at each urban grid rect-angle and follows the gravity model explained inSection 3.

In both the cases, the probability that a criticalfacility like a food retailer or an urban grid rectangleG has electricity is

P (EG) := P

(n⋂l=1

EEl

)(22)

where n is the minimum number of EPN compo-nents required to supply electricity to G, EG is theevent that G has electricity, and EEl is the eventthat the lth EPN component has electricity. Thesample space is a singleton set that has the out-come, “has electricity.”

For all the simulation results provided hence-forth, the number of units of resource available withthe planner is fixed at 10.

7.1.1. Case 1: Repair Action Optimization of EPNfor Household Units

The search space PN (Dt) undergoes a combina-torial explosion for modest values of N and Dt, ateach t, until few decision epochs before moving intothe trivial assignment problem, where the value ofβt is small. Because of the combinatorial nature ofthe assignment problem, we would like to reducethe search space for the rollout algorithm, at eacht, without sacrificing on the performance. Becausewe consider EPN for only household units in thissection, it is possible to illustrate techniques, to re-duce the size of the search space for our rolloutalgorithm, that provide a good insight into formu-lating such methods for other similar problems. Wepresent two representative methods to deal with

Figure 7: Electrical Power Network of Gilroy

the combinatorial explosion of the search space,namely, 1-step heuristic and N-step heuristic. Notethat these heuristics are not the same as the baseheuristic H or H.

Before we describe the 1-step and N-step heuris-tic, we digress to discuss H and H. Both, H andH, have a preordained method of assigning unitsof resources to the damaged locations. This or-der of assignment remains fixed at each t. In H,this order is decided randomly; while in H, it is de-cided based on importance factors. Let’s illustratethis further with the help of an example. Supposethat we name each of the components of the EPNwith serial numbers from 1 to 327 as shown par-tially in Fig. 7; the assignment of these numbers tothe EPN components is based on H and remainsfixed at each t, where a damaged component witha lower number is always assigned unit of resourcebefore a damaged component with a higher number,based on the availability of units of resource. There-fore, the serial numbers depict the preordained pri-ority assigned to components that is decided be-fore decision-making starts. E.g., if the componentnumber 21 and 22 are both damaged, the decisionmaker will assign one unit of resource to compo-nent 21 first and then schedule repair of component22, contingent on availability of resources. Such afixed pre-decided assignment of unit of resource byheuristic algorithm H and H matches the definitionof a consistent path generation in [39]. Therefore,H and H are sequentially consistent. Note that theassignment of numbers 1 to 327 in Fig. 7 is assumedonly for illustration purposes; the rollout methodcan incorporate a different preordained order de-fined by H and H.

We now discuss the 1-step and N-step heuristic.

10

Page 11: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

1−19

20−22 202−207 313−321

23−35 45−52 208−211 322−327

36−44 225−231 53−57 212−224

Figure 8: Electrical Power Network Tree

In Fig. 7, note that each successive EPN compo-nent (labeled 1-327), is dependent upon the priorEPN component for electricity. E.g., component227 is dependent upon component 50. Similarly,component 55 is dependent upon component 50.The components in the branch 53-57 and 225-231depend upon the component 52 for electricity. Weexploit this serial nature of an EPN by represent-ing the EPN network as a tree structure as shownin Fig. 8. Each number in the EPN tree representsan EPN component; each node represents a groupof components determined by the label of the node,and the arcs of the tree capture the dependence ofthe nodes.

If the number of damaged components in the rootnode of the EPN tree is greater than N , then itwould be unwise to assign a unit of resource at thefirst decision epoch to the fringe nodes of our EPNtree because we do not get any benefit until werepair damaged components in the root node. Assoon as the number of damaged components in theroot node of the EPN tree becomes less than N ,only then we explore the assignment problem atother levels of the EPN tree.1-step Heuristic: We increase the pool of can-

didate damaged components, where the assignmentof units of resources must be considered, to all thedamaged components of the next level of the EPNtree if and only if the number of damaged compo-nents at the current level of the EPN tree is less

Figure 9: Electrical power network recovery due to baseheuristic H with one standard deviation band

than N . Even after considering the next level ofthe EPN tree, if the number of damaged compo-nents is less than N , we take one more step and ac-count for all damaged components two levels belowthe current level. We repeat this until the pool ofcandidate damaged components is greater than orequal to N , or the levels of EPN tree are exhausted.

N-step Heuristic: Note that it might be possi-ble to ignore few nodes at each level of the EPN treeand assign units of resources to only some promis-ing nodes. This is achieved in the N-step heuris-tic (here N in N-step is not same as N -number ofworkers). Specifically, if the number of the dam-aged components at the current level of the EPNtree is less than N , then the algorithm searches fora node at the next level that has the least num-ber of damaged components, adds these damagedcomponents to the pool of damaged components,and checks if the total number of damaged compo-nents at all explored levels is less than N . If thetotal number of candidate damaged components isstill less than N , the previous process is repeated atunexplored nodes until the pool of damaged compo-nents is greater than or equal to N , or the levels ofthe EPN tree are exhausted. Essentially, we do notconsider the set (Dt) of all damaged components,at each t, but only a subset of Dt denoted by D1

t

(1-step heuristic) and DNt (N-step heuristic).

We simulate multiple damage scenarios followingan earthquake with the models discussed previouslyin Section 4. On average, the total number of dam-aged components in any scenario exceeds 60%.

We show the performance of H in Fig. 9. Thefaint lines depict plots of EPN recovery for multi-ple scenarios when H is used for decision making.Here the objective pursued by the decision makeris objective 2. The black line shows the mean of all

11

Page 12: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

0 5 10 15 20 25 30 35 40Time (days)

0

10

20

30

40

50

Num

ber

of p

eopl

e w

ith e

lect

ricity

103

Base HeuristicRollout w/ 1-step HeuristicRollout w/ N-step Heuristic

Figure 10: Comparison of base heuristic (H) vs. rolloutalgorithm with 1-step heuristic vs. rollout algorithm withN-step heuristic for multiple scenarios for objective 2.

the recovery trajectories, and the red lines show thestandard deviation. Henceforth, in various plots,instead of plotting the recovery trajectories for allthe scenarios, we compare the mean of the differenttrajectories.

Figure 11: Histogram of F2(X) with Base (H), rollout with1-step and rollout with N-step heuristic for multiple scenarios

Fig. 10 shows the performance of our RH al-gorithm with respect to H. Our simulation re-sults demonstrate significant improvement over al-gorithm H when RH is used, for both the 1-stepcase and the N-step case. Another result is the per-formance shown by the 1-step heuristic with respectto the N-step heuristic. Even though the N-stepheuristic skips some of the nodes at each level inEPN tree, to accommodate more promising nodesinto the search process, the performance improve-ment shown is minimal. Even though all the dam-aged components are not used to define the searchspace of the rollout algorithm, only a small subsetis chosen with the use of either 1-step and N-stepheuristic (limited EPN tree search), the improve-ment shown by RH over H is significant. This is

0 10 20 30 40 50Time (days)

0

10

20

30

40

50

Num

ber

of p

eopl

e w

ith e

lect

ricity

103

Base HeuristicRollout with 1-step HeuristicRollout with N-step Heuristic

Figure 12: Comparison of base heuristic (H) vs. rolloutalgorithm with 1-step heuristic vs. rollout algorithm withN-step heuristic for multiple scenarios for objective 2.

because pruning the search space of rollout algo-rithm using a subset of Dt (restricting an exhaus-tive search), is only a small part of the entire rolloutalgorithm. Further explanation for such a behavioris suitably explained in Section 7.1.2.

Fig. 11 shows the histogram of values of F2(X)for multiple scenarios, as a result of application ofstring-actions computed using H and RH (1-stepand N-step heuristic). The rollout algorithms showsubstantial improvement over H, for our problem.

Fig. 12 shows simulation results for multiple sce-narios when objective 2 is optimized, but when His considered instead of H. This simulation studyhighlights interesting behavior exhibited by the roll-out algorithm. In the initial phase of decision mak-ing, the algorithm H competes with the rollout al-gorithm RH, slightly outperforming the rollout al-gorithm in many instances. However, after a periodof 10 days, rollout (both 1-step and N-step heuris-tic) comprehensively outperforms H. Because roll-out has the lookahead property, it offers conserva-tive repair decisions initially (despite staying com-petitive with H), in anticipation of overcoming theloss suffered due to initial conservative decisions.Optimizing with foresight is an emergent behaviorexhibited by our optimization methodology, whichcan offer significant advantages in critical decision-making scenarios.

7.1.2. Case 2: Repair Action Optimization of EPNfor Household Units & Retailers

It is difficult to come up with techniques similarto 1-step and N-step heuristic to reduce the size ofthe search space PN (Dt) for objective 1 and ob-jective 2, when multiple networks are consideredsimultaneously in the analysis. This is because any

12

Page 13: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

0 20 40 60 80 100Number of scenarios

5

10

15

20

25

30

Num

ber

of d

ays

to r

each

thre

shol

d

Base HeuristicRollout

Figure 13: Cumulative moving average plot for objective 1where γ = .8 with base heuristic (H) and rollout algorithm

such technique would have to simultaneously bal-ance the pruning of candidate damaged componentsin Fig. 8 (to form subsets like D1

t and DNt ), serving

both food retailers and household units. There isno natural/simple way of achieving this as in thecase of the 1-step and N-step heuristics where onlyhousehold units were considered. Whenever we con-sider complex objective functions and interactionbetween networks, it is difficult to prune the actionspace in a physically meaningful way just by apply-ing heuristics. For our case, any such heuristic willhave to incorporate the gravity model explained inSection 3. The heuristic must also consider the net-work topology and actual physical position of im-portant EPN components within the network. Aspreviously seen, our methodology works well evenif we select only a small subset of Dt (D1

t and DNt )

to construct PN (Dt) to avoid huge computationalcosts. This is because our methodology leveragesthe use of the one-step lookahead property withconsistent and sequential improvement over algo-rithm H or H, to overcome any degradation in per-formance as a result of choosing Dt � Dt. This isfurther justified in the simulation results shown inFigs. 13 to 16.

In Figs. 13 and 14, H is used as the base heuristic.This base heuristic is the same as the one used inthe simulations shown in Case 1, which is not par-ticularly well tuned for Case 2. Despite this, RHshows a stark improvement over H. Fig. 13 showsthat the rollout algorithm, with the random selec-tion of candidate damaged components (Dt), signif-icantly outperforms H for objective 1. When we se-lect the candidate damaged locations randomly, inaddition to the randomly selected damaged compo-nents, we add to the set PN (Dt) the damaged com-

0 5 10 15 20 25 30 35 40Time (Days)

0

10

20

30

40

50

No.

of p

eopl

e be

nefit

ted

by E

PN

rec

over

y 103

Base HeuristicRollout

Figure 14: Comparison of base heuristic (H) vs rollout algo-rithm

0 20 40 60 80 100Number of scenarios

7

8

9

10

11

12

13

14

Num

ber

of d

ays

to r

each

thre

shol

d Base HeuristicRollout

Figure 15: Comparison of base heuristic (H) vs. rolloutalgorithm for multiple scenarios for objective 1.

ponents selected by H at each t. For the rolloutalgorithm, the mean number of days, over multipledamage scenarios, to provide electricity to γ timesthe total number of people is approximately 8 days,whereas for the base heuristic it is approximately30. Similarly, Fig. 14 shows that for objective 2the benefit of EPN recovery per unit of time withrollout is significantly better than H.

Figs. 15 and 16 provide the synopsis of the re-sults for both the objectives when H is considered.In Fig. 15, the number of days to reach a thresholdγ = 0.8 as a result of H algorithm is better than H.However, note that the number of days to achieveobjective 1 is still fewer using the rollout algorithm.The key inference from this observation is that roll-out might not always significantly outperform thebase heuristic but will never perform worse thanthe underlying heuristic.

For simulations in Figs. 15 and 16, the candi-date damaged locations in defining the search spacefor the rollout algorithm are again chosen randomlyand are a subset of Dt. As in the case of simula-

13

Page 14: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

0 5 10 15 20 25 30Time (days)

0

10

20

30

40

50

No.

of p

eopl

e be

nefit

ted

by E

PN

rec

over

y 103

Base HeuristicRollout

Figure 16: Comparison of base heuristic (H) vs. rolloutalgorithm for multiple scenarios for objective 2.

tions in Fig. 13, we add damaged components se-lected by H to the set PN (Dt). Note that the num-ber of days required to restore electricity to 80% ofpeople in Fig. 13 is a day less than that requiredin Fig. 15 despite performing rollout on a randombase heuristic instead of the smart base heuristicused in the later. This can be attributed to 3 rea-sons: a) The damaged components in the set (Dt)are chosen randomly for each simulation case. b) Hwas designed for the simulations in Section. 7.1.1and is not particularly well tuned for simulationswhen both household units and retailers are con-sidered simultaneously. c) H is used in simulationsin Fig. 15 to approximate the cost-to-go functionwhereas H is used in simulations in Fig. 13 for theapproximation of the scoring function.

Fig. 16 shows a behaviour similar to Fig. 12 (Case1) where H might outperform rollout in the short-term (as a result of myopic decision making on thepart of the heuristic), but in the long run, roll-out compressively improves upon the string-actionsprovided by algorithm H.

7.2. Computational Efforts

We provide a brief description of the computa-tional efforts undertaken to optimize our decision-making problem. We have simulated multiple dam-age scenarios for each simulation result provided inthis work. The solution methodology was imple-mented in MATLAB. The rollout algorithm givesmultiple calls to the base heuristic function andsearches for string-actions over the set PN (Dt),PN (D1

t ), or PN (DNt ). This is akin to giving mil-

lions of calls to the simulator. Therefore, our imple-mentation of the code had to achieve a run time onthe order of few micro seconds (µs) so that millions

of calls to the simulator are possible. A single callto the simulator will be defined as evaluating somerepair action xt ∈ PN (Dt), PN (D1

t ), or PN (DNt )

and completing the rollout process until all thedamaged components are repaired (complete roll-out [48]). Despite the use of best software practices,to mitigate large action spaces by coding a fast sim-ulator, it is imperative to match it with good hard-ware capability for simulations of significant size. Itis possible to parallelize the simulation process tofully exploit modern day hardware capabilities. Weran our simulations on the Summit super computer(see Section 9). Specifically, (100/376) PoweredgeC6320 Haswell nodes were used, where each nodehas 2x e5-2680v3 (2400/9024) cores. In Case 1, wenever encountered any |PN (D1

t )| or |PN (DNt )| ex-

ceeding 106 for any damage scenario, whereas forcase 2 we limited the |PN (Dt)| to at most 105,for all damage scenarios. On this system, with asimulator call run time of 1 µs, we managed torun 1011 simulator calls per node for different sim-ulation plots. Our computational efforts empha-size that for massive combinatorial decision-makingproblems, it is possible to achieve near-optimal per-formance with our methodology by harnessing pow-erful present day computational systems and imple-menting sound software practices.

8. Concluding Remarks

In this study, we proposed an optimization for-mulation based on the method of rollout, for theidentification of near-optimal community recoveryactions, following a disaster. Our methodologyutilizes approximate dynamic programming algo-rithms along with heuristics to overcome the curseof dimensionality in its analysis and management oflarge-scale infrastructure systems. We have shownthat the proposed approach can be implementedefficiently to identify near-optimal recovery deci-sions following a severe earthquake for the electri-cal power serving Gilroy. Different base heuristics,used at the community-level recovery, are used inthe simulation studies. The intended methodol-ogy and the rollout policies significantly enhancedthese base heuristics. Further, the efficient per-formance of the rollout formulation in optimizingdifferent common objective functions for commu-nity resilience is demonstrated. We believe thatthe methodology is adaptable to other infrastruc-ture systems and hazards.

14

Page 15: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

9. Acknowledgement

The research herein has been funded by theNational Science Foundation under Grant CMMI-1638284. This support is gratefully acknowledged.Any opinions, findings, conclusions, or recommen-dations presented in this material are solely those ofthe authors and do not necessarily reflect the viewsof the National Science Foundation.

This work utilized the RMACC Summit super-computer, which is supported by the National Sci-ence Foundation (awards ACI-1532235 and ACI-1532236), the University of Colorado Boulder andColorado State University. The RMACC Summitsupercomputer is a joint effort of the University ofColorado Boulder and Colorado State University.

10. References

References

[1] J. A. Nachlas, Reliability Engineering: ProbabilisticModels and Maintenance Methods, CRC Press, 2017.

[2] M. J. Kochenderfer, Decision Making Under Uncer-tainty: Theory and Application, MIT Press, 2015.

[3] D. P. Bertsekas, Dynamic Programming and OptimalControl, Vol. 1, Athena Sci. Belmont, MA, 1995.

[4] H. Meidani, R. Ghanem, Random Markov decision pro-cesses for sustainable infrastructure systems, Struct.and Infrastruct. Eng. 11 (5) (2015) 655–667.

[5] D. M. Frangopol, M. J. Kallen, J. M. Van Noortwijk,Probabilistic models for life-cycle performance of dete-riorating structures: review and future directions, Prog.in Struct. Eng. and Mater. 6 (4) (2004) 197–212.

[6] H. Ellis, M. Jiang, R. B. Corotis, Inspection, Main-tenance, and Repair with Partial Observability, J. ofInfrastruct. Syst. 1 (2) (1995) 92–99.

[7] R. B. Corotis, J. Hugh Ellis, M. Jiang, Modeling of risk-based inspection, maintenance and life-cycle cost withpartially observable Markov decision processes, Struct.and Infrastruct. Eng. 1 (1) (2005) 75–84.

[8] E. Fereshtehnejad, A. Shafieezadeh, A randomizedpoint-based value iteration POMDP enhanced with acounting process technique for optimal managementof multi-state multi-element systems, Struct. Saf. 65(2017) 113–125.

[9] D. P. Bertsekas, J. N. Tsitsiklis, Neuro-Dynamic Pro-gramming, 1st Edition, Athena Sci., 1996.

[10] K. D. Kuhn, Network-Level Infrastructure ManagementUsing Approximate Dynamic Programming, J. of In-frastruct. Syst. 16 (2) (2009) 103–111.

[11] A. Medury, S. Madanat, Incorporating network consid-erations into pavement management systems: A casefor approximate dynamic programming, Transp. Res.Part C: Emerg. Technol. 33 (2013) 134–150.

[12] L. Busoniu, R. Babuska, B. De Schutter, D. Ernst, Re-inforcement Learning and Dynamic Programming Us-ing Function Approximators, Vol. 39, CRC Press, 2010.

[13] Office of the Press Secretary, The White House, Presi-dential Policy Directive – Critical Infrastructure Secu-rity and Resilience (Feb 2013).

[14] M. Bruneau, S. E. Chang, R. T. Eguchi, G. C. Lee,T. D. ORourke, A. M. Reinhorn, M. Shinozuka, K. Tier-ney, W. A. Wallace, D. Von Winterfeldt, A Frameworkto Quantitatively Assess and Enhance the Seismic Re-silience of Communities, Earthq. Spectra 19 (4) (2003)733–752.

[15] S. V. Buldyrev, R. Parshani, G. Paul, H. E. Stanley,S. Havlin, Catastrophic cascade of failures in interde-pendent networks, Nat. 464 (7291) (2010) 1025.

[16] A. Barabadi, Y. Z. Ayele, Post-disaster infrastructurerecovery: Prediction of recovery rate using historicaldata, Reliab. Eng. & Syst. Saf. 169 (2018) 209–223.

[17] D. J. Watts, S. H. Strogatz, Collective dynamics ofsmall-world networks, Nat. 393 (6684) (1998) 440.

[18] T. McAllister, Research Needs for Developing a Risk-Informed Methodology for Community Resilience, J. ofStruct. Eng. 142 (8).

[19] G. P. Cimellaro, D. Solari, V. Arcidiacono, C. S.Renschler, A. Reinhorn, M. Bruneau, Community re-silience assessment integrating network interdependen-cies, Earthq. Eng. Res. Inst., 2014. doi:10.4231/

D3930NV8W.[20] The Association of Bay Area Governments (City of

Gilroy Annex) (2011).URL http://resilience.abag.ca.gov/wp-content/

documents/2010LHMP/Gilroy-Annex-2011.pdf

[21] M. Harnish, 2015-2023 HOUSING ELEMENT POL-ICY DOCUMENT AND BACKGROUND REPORT(Dec 2014).URL http://www.gilroy2040.com/wp-content/

uploads/2015/03/GilHE_Adopted_HE_Compiled_web.

pdf

[22] A. Adigaa, A. Agashea, S. Arifuzzamana, C. L.Barretta, R. Beckmana, K. Bisseta, J. Chena,Y. Chungbaeka, S. Eubanka, E. Guptaa, M. Khana,C. J. Kuhlmana, H. S. Mortveita, E. Nordberga,C. Riversa, P. Stretza, S. Swarupa, A. Wilsona,D. Xiea, Generating a synthetic population of theUnited States, Tech. rep. (Jan 2015).URL http://citeseerx.ist.psu.edu/viewdoc/

download?doi=10.1.1.712.1618&rep=rep1&type=pdf

[23] D. Kahle, H. Wickham, ggmap: Spatial Visualizationwith ggplot2, The R J. 5 (2013) 144–162.URL http://journal.r-project.org/archive/2013-

1/kahle-wickham.pdf

[24] National Research Council, Practical Lessons from theLoma Prieta Earthquake, The Natl. Acad. Press, Wash-ington, DC, 1994. doi:10.17226/2269.

[25] N. Jayaram, J. W. Baker, Correlation model for spa-tially distributed ground-motion intensities, Earthq.Eng. & Struct. Dyn. 38 (15) (2009) 1687–1708.

[26] N. A. Abrahamson, W. J. Silva, R. Kamai, Update ofthe AS08 ground-motion prediction equations based onthe NGA-West2 data set, Pac. Earthq. Eng. Res. Cent.,2013.

[27] R. P. Kennedy, M. K. Ravindra, Seismic fragilities fornuclear power plant risk studies, Nucl. Eng. and Des.79 (1) (1984) 47–68.

[28] M. Colombi, B. Borzi, H. Crowley, M. Onida,F. Meroni, R. Pinho, Deriving vulnerability curves us-ing Italian earthquake damage data, Bull. of Earthq.Eng. 6 (3) (2008) 485–504.

[29] A. Singhal, A. S. Kiremidjian, Method for ProbabilisticEvaluation of Seismic Structural Damage, J. of Struct.Eng. 122 (12) (1996) 1459–1467.

15

Page 16: arXiv:1803.01451v2 [math.OC] 20 Apr 2018 · a comprehensive decision-making system must also be able to handle large-scale scheduling problems that encompass large combinatorial decision

[30] K. Jaiswal, W. Aspinall, D. Perkins, D. Wald,K. Porter, Use of Expert Judgment Elicitation to Esti-mate Seismic Vulnerability of Selected Building Types,in: Proc. of the 15th World Conf. on Earthq. Eng., Lis-bon, Portugal, 2012.

[31] Department of Homeland Security, Emergency Pre-paredness and Response Directorate, FEMA, Mitiga-tion Division, Multi-hazard Loss Estimation Methodol-ogy, Earthquake Model: HAZUS-MH MR1, AdvancedEngineering Building Module, Washington, DC (Jan2003).URL https://www.hsdl.org/?view&did=11343

[32] L. Xie, J. Tang, H. Tang, Q. Xie, S. Xue, Seis-mic Fragility Assessment of Transmission Towers viaPerformance-based Analysis, in: Proc. of the 15thWorld Conf. on Earthq. Eng., Lisbon, Portugal.

[33] M. Ouyang, L. Duenas-Osorio, X. Min, A three-stageresilience analysis framework for urban infrastructuresystems, Struc. Saf. 36 (2012) 23–31.

[34] S. Nozhati, B. R. Ellingwood, H. Mahmoud, J. W.van de Lindt, Identifying and Analyzing Interdepen-dent Critical Infrastructure in Post-Earthquake UrbanReconstruction, in: Proc. of the 11th Natl. Conf. inEarthq. Eng., Earthq. Eng. Res. Inst., Los Angeles, CA,2018.

[35] S. Nozhati, B. R. Ellingwood, H. Mahmoud, Y. Sarkale,E. K. P. Chong, N. Rosenheim, An approximate dy-namic programming approach to community recoverymanagement, in: Eng. Mech. Inst., 2018.

[36] S. Nozhati, Y. Sarkale, B. R. Ellingwood, E. K. P.Chong, H. Mahmoud, A combined approximate dy-namic programming & simulated annealing optimiza-tion method to address community-level food securityin the aftermath of disasters, in: submitted to iEMSs2018. arXiv:1804.00250.

[37] Y. Sarkale, S. Nozhati, E. K. P. Chong, B. R. Elling-wood, H. Mahmoud, Solving Markov decision processesfor network-level post-hazard recovery via simulationoptimization and rollout, in: submitted to IEEE CASE2018 (Invited Paper). arXiv:1803.04144.

[38] D. P. Bertsekas, Rollout Algorithms for Discrete Opti-mization: A Survey, Springer-Verlag New York, 2013,pp. 2989–3013. doi:10.1007/978-1-4419-7997-1_8.

[39] D. P. Bertsekas, J. N. Tsitsiklis, C. Wu, Rollout Algo-rithms for Combinatorial Optimization, J. of Heuristics3 (3) (1997) 245–262. doi:10.1023/A:1009635226865.

[40] N. Limnios, Fault Trees, Wiley-ISTE, 2013.[41] Y. Ouyang, S. Madanat, Optimal scheduling of rehabil-

itation activities for multiple pavement facilities: exactand approximate solutions, Transportation Res. PartA: Policy and Pract. 38 (5) (2004) 347–365. doi:

10.1016/j.tra.2003.10.007.[42] Y. Liu, E. K. P. Chong, A. Pezeshki, Bounding the

greedy strategy in finite-horizon string optimization,in: 2015 IEEE 54th Annu. Conf. on Decis. and Con-trol (CDC), IEEE, 2015, pp. 3900–3905.

[43] H. Masoomi, J. W. van de Lindt, Restoration and func-tionality assessment of a community subjected to tor-nado hazard, Struct. and Infrastruct. Eng. 14 (3) (2018)275–291.

[44] R. A. Howard, Dynamic Programming and Markov Pro-cesses, MIT Press, Cambridge, MA, 1960.

[45] G. Tesauro, G. R. Galperin, On-line Policy Improve-ment using Monte-Carlo Search, in: M. C. Mozer,M. I. Jordan, T. Petsche (Eds.), Advances in Neural

Information Processing Systems 9, MIT Press, 1997,pp. 1068–1074.URL http://papers.nips.cc/paper/1302-on-line-

policy-improvement-using-monte-carlo-search.pdf

[46] B. Abramson, Expected-Outcome: A General Model ofStatic Evaluation, IEEE Trans. on Pattern Anal. andMach. Intell. 12 (2) (1990) 182–193. doi:10.1109/34.

44404.[47] S. Ragi, H. D. Mittelmann, E. K. P. Chong, Direc-

tional Sensor Control: Heuristic Approaches, IEEESens. J. 15 (1) (2015) 374–381. doi:10.1109/JSEN.

2014.2343019.[48] D. P. Bertsekas, Rollout Algorithms for Constrained

Dynamic Programming, Tech. rep., Lab. for Inf. & De-cis. Syst. (Apr 2005).

16