Can the Markovian influence graph simulate cascading resilience from historical outage data?

It is challenging to simulate the cascading line outages that can follow initial damage to the electric power transmission system from extreme events. Instead of model-based simulation, we propose using a Markovian influence graph driven by historical utility data to sample the cascades. The sampling method encompasses the rare, large cascades that contribute greatly to the blackout risk. This suggested new approach contributes a high-level simulation of cascading line outages that is driven by standard utility data.


I. INTRODUCTION
Historical transmission line outage data that is routinely collected by utilities can be processed and grouped into cascades of outages [1]. These processed cascades are the observational bedrock for the study of cascading failure, since they occurred in practice. However, if one assumes some initial outages and seeks to predict the probabilistic extent of further cascading, ranging from no further outages to blackouts, the historical cascades are limiting: the particular initial outages may not occur in the historical cascades, and even if they do occur once or twice, it is only one or two samples of the possible cascading outcomes. To address this problem, we propose using the Markovian influence graph to describe the statistics of the historical outage data, and then sample from the Markovian influence graph to simulate the consequences of some assumed initial outages. This gives a high-level and flexible statistical model of cascading that can be driven by standard utility data. In suggesting this approach, we are motivated by the resilience problem of estimating the cascading that can follow damage to the power transmission system in an extreme event [2]- [4].
Extreme events such as storm, fire, or earthquake can damage multiple power system components. Then further power system components can outage in a cascade. Usually the cascading only outages components without damaging them, but the cascading does make the blackout more widespread and impactful, and can seriously hinder the subsequent recovery from the event. There is considerable expertise modeling probabilistic power system component failure under extreme conditions of wind, flooding, icing, earthquake and fire. However, the cascading phase of resilience is much less well We thank Benjamin Carreras of BACV Solutions, Oak Ridge TN, David E. Newman of University of Alaska-Fairbanks, and José-Miguel Reynolds-Baredo of Universidad Carlos III de Madrid for providing the OPA results used to determine load shed. We thank Bonneville Power Administration for making publicly available the outage data that made this paper possible. The analysis and any conclusions are strictly those of the authors and not of BPA. We thank anonymous reviewers for their comments improving the paper. This work was supported by USA NSF grants 1609080 and 1735354. characterized. Given the initially damaged components, one can simulate the cascading using a model-based simulation. While useful, simulation only captures a limited subset of approximated cascading mechanisms. The alternative that we suggest and explore in this paper uses a Markovian influence graph driven by historical utility data to generate samples of the cascaded transmission lines. Throughout the paper we are interested in properly sampling from the largest cascades since these dominate the risk [5], because straightforward sampling does not work well for the larger cascades.
There is a large literature on model-based simulation of cascading (reviewed in [6], [7]), and substantial literature on influence or interaction graphs and fault chains driven by simulated data (reviewed in [8], [9]). To our knowledge, the only previous work on influence graphs driven by real data is [8], [10].
The need for higher-level statistical simulation of cascading arose from broader studies of the multiple phases of resilience. For example, Romero [2] optimized investments to improve resilience to earthquakes, and discussed but did not model the cascading phase of resilience. Recently Kelly-Gorham [3], [4] proposed a high-level statistical method driven largely by observed statistics called CRISP to quantify power transmission system overall resilience in all its phases. CRISP models the cascading phase of resilience by sampling from a probability distribution of the total number of lines out based on historical data. Then, in [4], given the number of lines out, the lines outaged in a cascade are chosen in accordance with an observed probability distribution [11] of network distance between cascaded line outages.
In this paper, given the initial outages, we aim to more accurately capture the spatial and interdependence statistics of which lines outage by simulating the recently developed Markovian influence graph driven by utility data. We also discuss possibilities for determining the load shed from the samples of cascaded lines and hence the risk of a widespread blackout caused by the extreme event and the cascading.

II. HISTORICAL DATA AND INFLUENCE GRAPH
This section summarizes the historical outage data and the Markovian influence graph used in this paper; all detail is referred to [8], [11].
Detailed transmission line outage data are routinely collected by utilities worldwide, such as in North America's Transmission Availability Data System (TADS). Here, as an illustration, we use some publicly available historical line outage data [12] recorded over fourteen years by a utility in the WECC interconnection of the USA. The data includes the forced line outages, names of the outaged lines, and their start time to the nearest minute. There are 10942 forced outages and over 500 different transmission lines outaging in the data with rated voltages ranging from 69 kV to 500 kV.
The line outages in the historical utility data are grouped into cascades and then divided into generations in each cascade using a simple method based on the outage timing that is described in [11]. This data processing combines into 6687 cascades a variety of dependent outages, including the initiating outages, cascading interactions within the network, and outages occurring in faster succession with a common cause such as bad weather. Thus the extreme conditions of interest in this paper are included in the data and its processing.
The cascades of outages divided into generations are then processed to form the Markovian influence graph. The Markovian influence graph is a Markov chain with discrete time steps and a discrete state space. Each new time step corresponds to a new generation of cascading. Each observed line outage (or a set of line outages that occur nearly simultaneously) forms a state of the Markov chain, and statistics of the pairwise interaction of states in successive generations are extracted. For example, given a specific line outage in a cascade generation, the probabilities of transition to other specific line outages in the next generation of the cascade are estimated from the transitions that occurred in the data. That is, the estimated transition probabilities represent the influence between successive line outages. At each state, there is a probability of the cascade stopping, which is represented as a transition to a special state with no line outages that is indicated by the empty set of outages {}. After the first transition to {}, the influence graph remains stopped at {} for all subsequent generations.
A simple 3-line toy example of forming the Markovian influence graph from outage data is shown in Figure 1. The Markov chain indicated by the influence graph at the bottom of Figure 1, models the statistics of outage data at the top of Figure 1. This example uses a constant transition matrix for ease of illustration, but the practical Markovian influence graph has varying transition matrices for different generations. This is because the transitions vary, especially between the first transition and subsequent transitions and because particular care is taken in forming the influence graph that the overall stopping probabilities at each generation match those in the data. The formation of the influence graph is somewhat intricate [8] and is designed to leverage the sparse data as much as feasible, particularly for the higher cascade generations that are rarer in the observed data.
The Markovian influence graph resulting from the historical outage data is a Markov chain with 1094 states. 543 of the states are single line outages, and the rest of the states have several line outages that occurred nearly simultaneously.
The Markovian influence graph in [8] exploits the asymptotic properties of the Markov chain to find the transmission lines most involved in large blackouts. In this paper we sample from the Markov chain to find cascades consistent with the assumed initial failures and the statistics of how cascades spread on the network.

III. SAMPLING CASCADES WITH THE INFLUENCE GRAPH
Let Y 0 be the set of initially failed lines that are damaged by the extreme event. We express Y 0 as a disjoint union of m Markov chain states: Let the rth Markov chain starting from state x (r) 0 but subsequently avoiding any initially failed states be X .. is easily obtained by preventing transitions to states in Y 0 by deleting the columns of the transition matrix corresponding to states in Y 0 and renormalizing. ) We write |x| for the number of line outages in state x. The number of lines out in the rth chain is and the total number of lines out is Note that (2) and (3) neglect any repeats of lines out within or between chains.

A. Simulating the influence graph
We first describe a straightforward but inferior way to do the simulation. For the rth chain we need to simulate X is produced at step j. Then the next state is produced as follows: Let e (r) j be the row vector with a one at the index of state x (r) j and zeros elsewhere. Let P k be the transition matrix from generation k to k + 1. Then e (r) j P k is a probability distribution over the states not in Y 0 1 . Sample from this probability distribution to obtain the state x (r) The problem with this straightforward way to do the simulation is that it will mainly sample the frequent short cascades with few line outages, so that a huge number of samples is needed to accurately estimate the longer cascades. An advantage of the influence graph is that it can be easily modified to sample more uniformly over the range of the possible cascades by manipulating the cascade stopping probabilities. Instead of allowing chains to stop by themselves, the stopping is inhibited until a maximum number of cascade generations g max is simulated, and then the chain stops. At each generation before g max , the line outages are recorded, and, although the chain does not stop, the probability that the state would have stopped is recorded. This gives many samples of the number of line outages for each of the generations 0, 1, 2, .., g max , and these samples range from a small to a large number of line outages. And the probability of stopping at each intermediate length cascade can be calculated.
We now give the details of this improved simulation. Suppose the rth chain is simulated and is at state x (r) k at generation k < g max . When the simulation samples from the probability distribution to obtain the next state x is the entry in the first column of the transition matrix P k corresponding to x (r) k . The probability that the rth chain has exactly k generations is More precisely, we have simulated (realized) one particular sequence of states x k that avoid stopping. Now, conditioned on the states that do happen occurring in this sequence, we compute in the Markov chain that does not avoid stopping the probability of stopping at generation k with (4).
We indicate the first simulation of the rth chain by the superscript (r; 1). We perform the first simulation of the rth chain up to generation g max and extract results for each generation k ≤ g max . For generation k, the total number of lines out is and the probability of n (r;1) k lines out is equal to q (r;1) k , since the number of lines out increases at each non-stopping generation. Repeating the simulation of the rth chain t times for the same initial state x and their probabilities q (r;s) k for s = 1, 2, ..., t and k = 0, 1, ..., g max . All these results are combined to give the distribution of the number of line outages N (r) in the simulations of the rth chain: where the indicator function I[·] limits the sums in (6) to the results giving v line outages. Thus (6) is the average of all the probabilities corresponding to the t possible occurrences of v line outages in the simulations of the rth chain. Then, according to (3) and assuming the chains are independent, we evaluate the distribution of the total number of lines out N by convolving the distributions N (1) , N (2) , ..., N (m) . The convolution is done by multiplying probability generating functions: The coefficient of

IV. PROBABILITY DISTRIBUTION OF LOAD SHED
The load shedding of a cascade is denoted as L. We want to estimate f L , the probability distribution of load shed. We do this by conditioning on the number of line outages.
The number of line outages N ranges from 0 to max , where 0 = number of lines in Y 0 . We partition the range of N into K disjoint bins B 1 , B 2 , ..., B K so that We use the following subsections to obtain f L|N ∈Bκ the distribution of the load shed given that the number of lines out are in bin B κ . The bins (8) are chosen large enough so that there is sufficient data in each bin to be able to approximate f L|N ∈Bκ .
From the distribution of N provided in section III, we can easily evaluate the bin probabilities: The idea is to evaluate the distribution of load shed f L by conditioning on the number of lines in the bins: We now use the OPA simulation to approximate f L|N ∈Bκ .

A. Load shed given the number of lines out
Given that the number of lines N outaged after cascading are in bin B κ , we want to obtain the distribution of load shed f L|N ∈Bκ . We use a probability distribution of load shed because we are trying to estimate the risk of a future extreme event, and the power system loading condition, generator dispatch and maintenance status for a future event are uncertain and variable. This variability will produces different load sheds for the same line outages, or the same number of line outages.
The OPA model [13]- [16] has been validated to approximate well the observed bulk statistics of blackouts of the WECC [17], [18]. Here, noting that our historical data is from part of WECC, we use OPA results on a 1553 bus model of WECC to generate the conditional distributions of load shed f L|N ∈Bκ , κ = 1, 2, 3, ..., 12. Note that OPA is a longterm simulation that samples from a variety of grid loading conditions. The OPA results consist of 58 903 cascades. Each cascade yields the load shed and the number of lines out.
The OPA results are easily sorted into the bins B 1 , B 2 , ..., B 12 according to the number of lines outaged in each cascade. Bin B κ has κ line outages for 1 ≤ κ ≤ 11, and bin B 12 has 12 or more line outages. Each bin has at least 83 data points. The empirical distribution for load shed in bin B κ is fit with the lognormal distribution f L|N ∈Bκ . Figure 2 shows three of these fits. The data points that have a fraction of load shed less than 0.01 are excluded. The mean µ and standard deviation σ of the lognormal distributions for the 12 bins are shown in Table I. The mean and standard deviation increase as the number of line outages in cascades increase, as expected. Moreover, the Kolmogorov-Smirnov test for each bin's fitting has a p-value at least 0.1, so these fits are statistically significant.

A. Simulation of line outages
We use the improved sampling of section III-A to sample cascades from the Markovian influence graph formed from the utility data in section II. Specifically, starting with assumed 3 initial outages, we simulate 100 cascades up to g max = 100 generations. Since the simulation also records data for each cascade stopping at any generation before 100 generations, this is equivalent to simulating 10 000 cascades, in which 100 cascades have 1 generation, 100 have 2 generations, and so on.
To contrast the improved sampling with straightforward sampling, we also simulate 10 000 cascades with the same initial outages using the straightforward sampling method of simply simulating until the cascade stops, with no special control of the stopping. The two simulations have close execution times. Figure 3 shows that the survival functions match except for some variability in the tail due to limited samples from the straightforward sampling. With the same simulation time, the improved sampling has two benefits: it has smaller standard deviations and generates more large cascade samples. For example, the standard deviation of the probability that cascades have more than 30 line outages is 0.00004 for improved sampling, and 0.0004 for straightforward sampling. As the number of line outages increases, this advantage is even more significant. The straightforward sampling focuses on the small cascades and does not sample enough large cascades to accurately estimate the large cascades. In contrast, the improved sampling samples uniformly across a full range of cascade sizes to better estimate a longer tail. The Markovian influence graph flexibly allows this improved sampling, addressing the straightforward sampling problem common in the literature of inherently undersampling large cascades.
Although in this paper we only estimate the distribution of the number of lines out, there is a wealth of detailed information in the simulated cascades that could be useful.

B. Distribution of load shed
After estimating distribution of the number of line outages N , we proceed to estimate the distribution of load shed using the method described in Section IV. Subsection IV-A calculates the conditional lognormal load shed distributions f L|N ∈B1 , f L|N ∈B2 , ..., f L|N ∈B12 . The distribution of N gives the bin weights b 1 , b 2 , ..., b 12 according to (9). Then the distribution of load shed f L is the mixture of f L|N ∈B1 , f L|N ∈B2 , ..., f L|N ∈B12 weighted by b 1 , b 2 , ..., b 12 as described by (10). Figure 4 shows the survival function of the distribution of load shed f L given 3 initial outages (red solid curve). In Figure  4, we also vary the number of initial outages to simulate different initial line damage scenarios. As the number of initial outages increases, the probability of large load shed increases. This section describes and contrasts the strengths and weaknesses of model-based simulation and simulation of the Markovian influence graph driven by historical data. Realism: A major limitation is that model-based simulations are practically constrained to approximate a limited subset of cascading mechanisms. The Markovian influence graph driven by historical data uses the statistics of real cascades, which encompass all the cascading mechanisms encountered in the historical period. It produces many cascades not observed in the real cascades. However, the Markovian influence graph does not describe the pairwise interactions between outages that could happen but that did not happen in the historical period. The power grid slowly changes over the historical period as components age and upgrades to the grid and operational procedures are made. The Markovian influence graph pools together all the interactions in the power system over the historical period. For example, if an interaction was mitigated half way during the historical period, it still contributes a possible interaction to the Markovian influence graph. Validation: When used to predict cascades, model-based simulation can produce cascades that are often judged to be credible, but most model-based simulations are not yet validated against historical cascade data. (One of the exceptions is the OPA simulation used in subsection IV-A, which is validated with WECC data in [17], [18].) An appropriate validation is reproducing the form of cascade statistics, and there is notable progress towards this goal [19], [20]. On the other hand, the Markovian influence graph describes the historical statistics of successive line outages, and reproduces the statistics of numbers of lines out [8], so important aspects of validation are inherent or already checked. There are additional assumptions in the processing of the historical data and the influence graph formulation, and some of these issues are discussed below. However, the more subtle validations of the Markovian influence graph would seem to require more elaborate approaches to statistical validation. Sampling of grid conditions: Another requirement, which is not always satisfied in the literature, is that model-based simulations should sample appropriately from a range of grid operating conditions [21]. Historical data inherently samples all the actual grid conditions encountered over the observed time, and this is often an appropriate sampling. Sampling of cascades: Predicted cascading is inherently probabilistic due to the many interactions and protection actions that involve thresholding in an uncertain environment. Note that even "deterministic" model-based simulations can sample from cascade possibilities by randomizing the grid conditions. As regards sampling technique, the Markovian influence graph easily allows computing the rarer but riskier long cascades while tracking the outcomes and probabilities of all the truncations of the long cascades, as explained in subsection III-A. A corresponding advantage in computing the largest cascades can be achieved for model-based simulation using splitting [22] or other methods. Markov assumption: The Markovian influence graph only describes the statistics of successive Markov states in the historical cascades. Each Markov state is a specific line outage or set of line outages. The issue is the extent to which one can assume that knowing the state in a cascade generation is sufficient to approximate the statistics of which state is in the next generation. This is a pragmatic but fairly strong assumption. Limited data: The Markovian influence graph is formed from historical data, which is limited in extent, especially for the higher cascade generations. This limitation can be partially mitigated [8], but not eliminated. In practice the higher generations are combined together in some ways to get sufficient data. Model-based simulations can, if not too detailed, produce larger amounts of cascading data. Commonality between cascades: In this paper, the Markovian influence graph describes the statistics of all types of cascades, but some of these may not be the cascades of interest. That is, there is an assumption that the same set of probabilistic cascading interactions tend to occur for all cascades. In particular, statistical patterns in small cascades are to some extent extrapolated to large cascades. It is certainly possible to restrict the historical data to the subset of cascades of interest if the subset is large enough, but there is the tradeoff that as data set becomes smaller, estimation becomes more uncertain. In model-based simulation it seems easier to restrict the cascades simulated, but the challenges of validation for the restricted subset of cascades remain.

VII. COMBINING HISTORICAL AND SIMULATED DATA
We observe that the Markovian influence graph could be used to combine historical cascading data from utilities with cascading data produced by model-based simulation. One method is to simply combine the cascades from both data sets and then form the Markovian influence graph from the combined data set. This method would weight the historical and model-based cascades according to their respective amounts of data. If a different weighting is desired; for example, an equal weight to historical and model-based cascades despite differing amounts of data, then this can be easily achieved by forming two separate Markovian influence graphs and taking their weighted average.
In particular, suppose that the historical cascades give a Markovian influence graph with transition matrices P hist k , k = 1, 2, ... and the model-based cascades give a Markovian influence graph with transition matrices P model k , k = 1, 2, .... We form the union of the states of the historical and modelbased influence graphs and add rows and columns of zeros to the transition matrices to expand them so that they describe transitions between the union of the states. This gives transition matrices P histexpand k and P modelexpand k , k = 1, 2, .... Then, if the desired weight on the historical data is w hist with 0 < w hist < 1, then the weighted average Markovian influence graph has transition matrices w hist P histexpand k + (1 − w hist )P modelexpand k , k = 1, 2, ... This construction relies on the fact that the weighted average of stochastic matrices are stochastic matrices. Combining the two influence graphs as described above is straightforward, assuming that the components that outage in the cascades are described in the same way and in the same format.

VIII. CONCLUSION
This paper suggests a new form of cascading simulation driven by the detailed transmission line outage data that is routinely collected by utilities. This historical outage data is first processed into cascades and generations within cascades, and then used to form the Markovian influence graph that describes the statistics of outages in successive cascade generations as a Markov chain. Some initial line outages are assumed, and in this paper these are the lines damaged by some extreme events, such as weather, fire, icing, or earthquake. Our immediate aim is to simulate and quantify the cascading of line outages after the initial damage. The Markovian influence graph is sampled to produce the simulated cascades. The simulated cascades are statistically similar to but more variable than the cascades in the historical data. The Markovian influence graph easily allows improved sampling that is more uniform across all sizes of cascades, and this gives better estimates of the large cascades that are rare but significant for cascade risk.
The Markovian influence graph produces cascades of specific line outages but no direct estimates of load shed. We show one way to estimate load shed by using a model-based simulation, OPA, to evaluate the probability distribution of load shed conditioned on the number of line outages. The distribution of load shed is then a weighted sum of these conditional distributions, with the weights determined by the line outage statistics produced by the Markovian influence graph. Other methods of estimating load shed can be developed and compared in future work. The combined result of the Markovian influence graph cascading simulation and the load shed estimation is probability distributions of load shed for choices of specific initial lines damaged by the extreme event. We also explain how the Markovian influence graph can describe a weighted combination of cascading data from historical data and from the model-based simulation.
We critically examine the strengths and limitations of model-based cascading simulation versus cascading simulation driven by historical data via the Markovian influence graph. While more work is needed to develop and test the Markovian influence graph simulation, it seems to be a viable alternative to model-based simulation. It is particularly exciting that the Markovian influence graph simulation has strengths and weaknesses that complement those of model-based simulation, so that future decisions about cascading resilience could be informed by two different and complementary approaches to cascading simulation.