Table of Contents

  1. Molloy 1985: Discrete Time Stochastic Petri Nets
  2. Glynn 1989: A GSMP Formalism for Discrete Event Systems
  3. pomp docs: A quick note on Exponential and Geometric processes from the pomp documentation

Discrete Time Petri Nets (Molloy 1985 DOI:10.1109/TSE.1985.232230)

For a discrete time stochastic Petri net (DTSPN), on each time step, if no events are in conflict, then the probability of any combination of firings and non-firings (a binary vector) is simply a product of \(\rho_{i}\) and \(1-\rho_{i}\) terms. The probabilities \(\rho_{i}\) are assigned by the designer a priori, and each is “the probability that the enabled transition \(t_{i}\) fires at the next time step, given (conditioned on) the fact that no other transition fires.” This means the firing times are Geometrically distributed, if \(\rho_{i}<1\).

Molloy’s equation 4 is: \[\rho_{i} = \frac{P[E_{i}]}{P[E_{i}\cup E_{0}]}\].

We can interpret this equation as follows: the denominator restricts the sample space to \(E_{i}\cup E_{0}\), that is, the world where either nothing happens or \(E_{i}\) happens (\(P[E_{i}\cup E_{0}] = P[E_{i}] + P[E_{0}] - P[E_{i}\cap E_{0}]\), and because the events are mutually exclusive, \(P[E_{i}\cap E_{0}] = 0\)). So the equation asks “given that only nothing or \(E_{i}\) can happen, what’s the probability \(E_{i}\) happens?”, which is precisely the interpretation of \(\rho_{i}\) used by the modeller.

In a set of mutually exclusive transitions \(\{t_{i}\}\), that equation holds for each one, so using the constraint \(0<\rho_{i}<1\) for all \(i\), one can solve for \(P[E_{0}]\) (he does not show this). Let’s consider the simple case with 2 mutually exclusive transitions. We get 3 equations and 3 unknowns (the elementary unconditional probabilities). The denominators here are sums; he writes unions, but because the events are mutually exclusive, the intersection term from the inclusion-exclusion principle has probability zero.

\[ P[E_{0}] + P[E_{1}] + P[E_{2}] = 1 \\ \rho_{1} = \frac{P[E_{1}]}{P[E_{1}] + P[E_{0}]} \\ \rho_{2} = \frac{P[E_{2}]}{P[E_{2}] + P[E_{0}]} \] Solving these is easy; start with \(P[E_{1}]\):

\[ \rho_{1} = \frac{P[E_{1}]}{P[E_{1}\cup E_{0}]} = \frac{P[E_{1}]}{P[E_{1}] + P[E_{0}]} \\ \rho_{1}( P[E_{1}] + P[E_{0}] ) = P[E_{1}] \\ \rho_{1}P[E_{1}] + \rho_{1}P[E_{0}] = P[E_{1}] \\ \rho_{1}P[E_{0}] = P[E_{1}] - \rho_{1}P[E_{1}] \\ \rho_{1}P[E_{0}] = P[E_{1}] (1 - \rho_{1}) \\ P[E_{1}] = \frac{\rho_{1}}{1 - \rho_{1}}P[E_{0}] \\ \] The same relationship holds generally for sets of mutually exclusive transitions. Then just use the fact that the probabilities of all events sum to 1 to solve for \(P[E_{0}]\).
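To make the arithmetic concrete, here is a quick numerical check of this scheme (my own sketch, not Molloy’s code, and in Python rather than the R used below): build each \(P[E_{i}]\) as a multiple of \(P[E_{0}]\), normalize, and confirm the conditionals \(\rho_{i}\) are recovered.

```python
# Sanity check (not from the paper): solve the 2-transition system numerically.
# Uses P[E_i] = rho_i/(1-rho_i) * P[E_0] and P[E_0] + P[E_1] + P[E_2] = 1.
rho = [0.5, 0.25]

weights = [r / (1 - r) for r in rho]   # P[E_i] as multiples of P[E_0]
p0 = 1 / (1 + sum(weights))            # from the sum-to-1 constraint
p = [w * p0 for w in weights]          # unconditional firing probabilities

# Recover each rho_i as the conditional probability P[E_i] / (P[E_i] + P[E_0]).
for rho_i, p_i in zip(rho, p):
    assert abs(rho_i - p_i / (p_i + p0)) < 1e-12
```

With \(\rho = (0.5, 0.25)\) this gives \(P[E_{0}] = P[E_{1}] = 3/7\) and \(P[E_{2}] = 1/7\), matching the R check further down.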

The case for \(\{t_{i}\}\) sets larger than 2 follows in the same way. The key is realizing that, due to mutual exclusivity, each \(P[E_{i}]\) can be written as a multiple of \(P[E_{0}]\); solve for all the \(P[E_{i}]\) values that way, then use the sum-to-1 constraint to get \(P[E_{0}]\). Molloy gives the probability that a particular transition in a set fires as:

\[ P[E_{i}] = \frac{\frac{\rho_{i}}{1-\rho_{i}}}{ 1 + \sum_{j} \frac{\rho_{j}}{1-\rho_{j}} } \]

molloy <- function(rho, i) {
  # probability that transition i fires, among a set of mutually
  # exclusive transitions with conditional firing probabilities rho
  (rho[i] / (1 - rho[i])) / (1 + sum(rho / (1 - rho)))
}

rho <- c(0.5,0.25)

We can check our solution works against his:

library(MASS) # for fractions()
fractions((rho[1] - prod(rho)) / (1 - prod(rho)))
## [1] 3/7
fractions(molloy(rho,1))
## [1] 3/7

If \(\rho_{i}=1\) is allowed for some subset \(T_{D}\subset T_{E}\) of the set of enabled, mutually exclusive (conflicting) transitions, the previous equations do not make sense. Molloy gives the only reasonable conflict resolution: for \(t_{i}\in T_{E}\) with \(\rho_{i}<1\), the probability of firing \(P[E_{i}]\) is zero. For \(t_{i}\in T_{D}\):

\[ P[E_{i}] = \frac{1}{|T_{D}|} \]

By making “chains” of \(\rho_{i}=1\) transitions and places, one can implement deterministic firing-time distributions in a DTSPN.

Now, that worked nicely when the sets of transitions were all mutually exclusive, but there are complications he remarks upon. For the 4 token SPN in Fig. 1, we have a set of transitions that are in conflict but not mutually exclusive. Based on the firing rules, the sample space contains the following exhaustive, mutually exclusive events:

\[ S = \{E_{0}, E_{1}, E_{2}, E_{3}, E_{12}, E_{23}, E_{13}\} \] So we can have no transitions fire, any single transition fire, or any combination of 2 transitions firing. The modeller gives us \(\{\rho_{1}, \rho_{2}, \rho_{3}\}\). We know \(P[E_{i}]\) for each of the events corresponding to a single transition firing from before. How to get the probability of the 2-set firing? He gives as an example \(P[E_{23}]\) but doesn’t show his work, and references his PhD thesis which is sadly not available online. We’ll try to reverse engineer his solution.

Assume that we declared:

\[ \rho_{2}\rho_{3} = \frac{P[E_{23}]}{P[E_{0} \cup E_{2} \cup E_{3} \cup E_{23}]} = \frac{P[E_{23}]}{ P[E_{0}] + P[E_{2}] + P[E_{3}] + P[E_{23}]} \] So that it’s the probability of both of them firing when the sample space is restricted to nothing firing, either one or the other, or both. Because of mutual exclusivity, we turn the denominator into a sum of probabilities. Now solve:

\[ \rho_{2}\rho_{3}(P[E_{0}] + P[E_{2}] + P[E_{3}] + P[E_{23}]) = P[E_{23}] \\ \rho_{2}\rho_{3}P[E_{0}] + \rho_{2}\rho_{3}P[E_{2}] + \rho_{2}\rho_{3}P[E_{3}] + \rho_{2}\rho_{3}P[E_{23}] = P[E_{23}] \\ \rho_{2}\rho_{3}P[E_{0}] + \rho_{2}\rho_{3}P[E_{2}] + \rho_{2}\rho_{3}P[E_{3}] = P[E_{23}] - \rho_{2}\rho_{3}P[E_{23}] \\ \rho_{2}\rho_{3}P[E_{0}] + \rho_{2}\rho_{3}P[E_{2}] + \rho_{2}\rho_{3}P[E_{3}] = P[E_{23}] (1- \rho_{2}\rho_{3}) \\ \rho_{2}\rho_{3}(P[E_{0}] + P[E_{2}] + P[E_{3}]) = P[E_{23}] (1- \rho_{2}\rho_{3}) \\ \frac{\rho_{2}\rho_{3}}{1- \rho_{2}\rho_{3}}(P[E_{0}] + P[E_{2}] + P[E_{3}]) = P[E_{23}] \\ \frac{\rho_{2}\rho_{3}}{1- \rho_{2}\rho_{3}} P[E_{0}] \left( 1 + \frac{\rho_{2}}{1-\rho_{2}} + \frac{\rho_{3}}{1-\rho_{3}} \right) = P[E_{23}] \\ \] Which is equal to his equation 7; the last step substitutes \(P[E_{2}] = \frac{\rho_{2}}{1-\rho_{2}}P[E_{0}]\) and \(P[E_{3}] = \frac{\rho_{3}}{1-\rho_{3}}P[E_{0}]\) from before.
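As a check on this reverse-engineered definition (my own sketch in Python, not Molloy’s code), we can construct all seven event probabilities for an arbitrary \(\{\rho_{1},\rho_{2},\rho_{3}\}\), normalize them, and verify that the declared conditional \(\rho_{j}\rho_{k}\) is recovered for every pair:

```python
# Sanity check (a sketch, not from the paper) of the reverse-engineered
# definition of P[E_jk] for three conflicting, non-exclusive transitions.
rho = {1: 0.5, 2: 0.25, 3: 0.4}                # arbitrary firing probabilities
w = {i: r / (1 - r) for i, r in rho.items()}   # P[E_i] / P[E_0]

pairs = [(1, 2), (1, 3), (2, 3)]
# P[E_jk] / P[E_0], from the equation-7 form derived above
w2 = {(j, k): (rho[j] * rho[k]) / (1 - rho[j] * rho[k]) * (1 + w[j] + w[k])
      for j, k in pairs}

p0 = 1 / (1 + sum(w.values()) + sum(w2.values()))  # all 7 events sum to 1
p = {i: wi * p0 for i, wi in w.items()}
p2 = {jk: wjk * p0 for jk, wjk in w2.items()}

# The declared conditional should hold for every pair:
for (j, k), pjk in p2.items():
    cond = pjk / (p0 + p[j] + p[k] + pjk)
    assert abs(cond - rho[j] * rho[k]) < 1e-12
```

The assertions pass for any \(0<\rho_{i}<1\), which is at least consistent with equation 7 being derived the way we guessed.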

A GSMP Formalism for Discrete Event Systems (Glynn 1989) DOI:10.1109/5.21067

2 definitions:

  • CVDS: continuous variable dynamical system
  • DEDS: discrete event dynamical system

The CVDS/DEDS analogy

An output process \(s(t)\) for CVDS is defined as \(s(t) = h(x(t))\), where \(x(t)\) is the internal state of the system and \(h\) is some transformation of it. Nothing here yet corresponds to a map or differential giving system evolution.

For a DEDS he lets \(S(t)\) be the output process for a system with piecewise constant trajectories, such that if the DEDS is a queueing system it is:

\[ S(t) = \sum_{n=0}^{\infty} S_{n} I(\Lambda(n) \leq t < \Lambda(n+1)) \]

Then, \(\Delta_{n+1} = \Lambda(n+1) - \Lambda(n)\) is the time that the system dwells in the state attained by the n-th transition, and \(S_{n}\) is the associated output value. The output process needs the existence of a stochastic process \(X = (X_{n}:n\geq 0)\) describing the evolution of the internal state of the system. The “observables” are related by: \((S_{n},\Delta_{n}) = (h_{1}(X_{n}),h_{2}(X_{n}))\).

The next thing is how to describe the evolution of the internal state for each system. For CVDS we can use a mapping, \(x=f(x)\), which takes you from one spot in state space to another. Interestingly enough he notes that this representation is “noncausal”, creating difficulties both mathematically and computationally. For that reason, we usually use a (possibly time-dependent) differential relationship \(x^{\prime}(t) = f(t,x(s):0\leq s \leq t)\). Then the model is “causally defined in terms of the infinitesimal characteristics of the system”, which is a local specification. For classic differential equations, the considered dynamics are \(x^{\prime}(t) = f(t,x(t))\).

For DEDS the “noncausal” representation is \(X = f(X,\eta)\), where \(\eta\) is a sequence of i.i.d. random variables. The causal representation is \(X_{n+1} = f_{n+1}(\eta_{n+1},X_{k}:0\leq k \leq n)\). For Markov processes, it becomes \(X_{n+1} = f_{n+1}(X_{n},\eta_{n+1})\).
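As a concrete illustration of the causal Markov form \(X_{n+1} = f_{n+1}(X_{n},\eta_{n+1})\) (my example, not one from the paper), the Lindley recursion for waiting times in a single-server queue fits it exactly:

```python
import random

# Illustrative example (not from Glynn's paper): the Lindley recursion
# W_{n+1} = max(W_n + S_n - A_{n+1}, 0) is a causal Markov DEDS
# X_{n+1} = f(X_n, eta_{n+1}), with eta bundling the i.i.d. service and
# interarrival draws for that step.
random.seed(1)

def f(w, eta):
    service, interarrival = eta
    return max(w + service - interarrival, 0.0)

w = 0.0
for _ in range(1000):
    # mean service 1/1.2 < mean interarrival 1, so the queue is stable
    eta = (random.expovariate(1.2), random.expovariate(1.0))
    w = f(w, eta)
```

The point is only that the next state depends on the past solely through \(X_{n}\) and fresh noise \(\eta_{n+1}\), which is the Markov special case of the causal representation.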

GSMP (Generalized Semi-Markov Processes)

To discuss GSMP, Glynn uses the \(X_{n+1} = f_{n+1}(\eta_{n+1},X_{k}:0\leq k \leq n)\) DEDS representation. \(S\) is the countable set of states, \(E\) is the countable set of events, \(s\in S\) is a physical state a system may exist in, and \(E(s)\) is the finite set of events that can trigger transitions out of \(s\). For each \(e\in E(s)\), a clock \(c_{e}\) is associated which gives the amount of time elapsed since \(c_{e}\) was last activated. Each clock’s time elapses at a rate/speed \(r_{se}\). The rates can be a function of state. Let \(C(s)\) be the set of clock readings possible in \(s\in S\).

Algorithm for GSMP

To describe a general algorithm to simulate from a GSMP we’re gonna need more notation. A GSMP is a GSSMC (General State Space Markov Chain) on \(\Sigma\), the state space where the internal state sequence \(X\) takes values in \(\Sigma = \bigcup_{s\in S} \{s\}\times[0,\infty)\times C(s)\). Therefore, the moment the system arrives in a state at the n-th transition, occurring at time \(\Lambda(n)\), the state is given by \(X_{n}=(S_{n},\Delta_{n},C_{n})\). The first element of the state is the physical state, the second is the dwell time, and the third is the vector of (valid) clock readings. Because the clocks accrue time deterministically within a physical state, the clock vector upon entry to the state is sufficient to describe the system. The dwell time still needs to be carried in the state because the joint probability of a jump and its destination changes as time accrues, meaning the distribution any simulation algorithm should sample from is incomplete without it.

Let \(\overrightarrow{X}_{n}=(X_{0},...,X_{n})\) be a random variable and \(\overrightarrow{x}_{n}=(x_{0},...,x_{n})\) be a realization (sampled trajectory) from it, where \(x_{i}\in\Sigma\).

For each \(e\in E(s)\):

  • CDF \(F(t;\overrightarrow{x}_{n},e)\) over values \(0\leq t\) giving the probability that \(e\) fires at or before time \(t\), if it were the first to fire.
  • The survival is \(\overline{F}(t;\overrightarrow{x}_{n},e) = 1 - F(t;\overrightarrow{x}_{n},e)\).
  • \(G(t+dt; \overrightarrow{x}_{n},e) = F(t+dt;\overrightarrow{x}_{n},e) \prod\limits_{e^\prime\in E(S_{n});e^\prime \neq e} \overline{F}(t;\overrightarrow{x}_{n},e^\prime)\): the joint probability that \(e\) fires by time \(t+dt\) while no other event has fired by \(t\), so that \(\Delta_{n+1}\in[t,t+dt)\) and \(e^{*}_{n+1} = e\) (where the asterisk means it attains the minimum, i.e., is the first to fire).

Glynn calls \(F\) a residual life distribution. I suspect that means it’s defined as the distribution over time remaining to fire, if “time” is a normalized, unit-rate quantity, putting everything on an even footing. Otherwise the division we will see later on by the current clock speed doesn’t make sense. So when it’s sampled, it’s asking the question “if this event has accrued this much unit time so far, how much unit time is left before it fires?” It should be equivalent to integrated hazard.

In his notation \(F_{a}(x;\overrightarrow{x}_{n},e) = F(ax;\overrightarrow{x}_{n},e), a\geq 0\) to make the probabilities relative to the entry point \(\Lambda(n)\) but I think it’s more clear to Just Deal With it (the extra notation).

The simulation algorithm upon arriving in a state \(\overrightarrow{X}_{n}\) to sample \(X_{n+1}\) is:

  1. For each enabled event \(e\in E(S_{n})\), sample a random variate \(Y_{e,n} \sim F(\cdot;\overrightarrow{X}_{n},e)\).
  2. Set \(\Delta_{n+1} = \min\{Y_{e,n}/r_{S_{n},e}:e\in E(S_{n})\}\) and \(e^{*}_{n+1}\) is the unique event which achieves the minimum for \(\Delta_{n+1}\).
  3. Sample the new physical state \(S_{n+1}\) from \(p(\cdot; \overrightarrow{X}_{n},e^{*}_{n+1})\).
  4. Set:
    1. \(C_{n+1,e}=-1\), for \(e\not\in E(S_{n+1})\) (newly disabled clocks are set to some null value)
    2. \(C_{n+1,e}=0\), for \(e\in N(S_{n+1};S_{n},e^{*}_{n+1})\) (newly enabled clocks are set to zero)
    3. \(C_{n+1,e} = C_{n,e} + \Delta_{n+1} r_{S_{n},e}\), for \(e\in O(S_{n+1};S_{n},e^{*}_{n+1})\) (old clocks that still run are updated with how much they ticked during the last state dwell)
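A minimal sketch of one such transition step (my own toy instantiation, not Glynn’s code), assuming exponential residual-life distributions, unit clock speeds, and a birth-death physical state (a queue length):

```python
import random

# Toy GSMP step (a sketch, not Glynn's algorithm verbatim): exponential
# residual lives, unit clock speeds (so dividing by r_{s,e} is a no-op),
# and an M/M/1-like state space where s is the queue length.
random.seed(2)

RATES = {"arrival": 1.0, "service": 1.5}   # assumed event rates

def enabled(s):
    return ["arrival"] if s == 0 else ["arrival", "service"]

def next_state(s, event):
    # deterministic destination here; in general, sample from p(.; x, e*)
    return s + 1 if event == "arrival" else s - 1

def gsmp_step(s, clocks):
    # 1. sample a residual life Y_e for each enabled event; exponentials are
    #    memoryless, so the accrued clock doesn't change this distribution
    y = {e: random.expovariate(RATES[e]) for e in enabled(s)}
    # 2. the event attaining the minimum fires; its Y is the dwell time
    e_star = min(y, key=y.get)
    dwell = y[e_star]
    # 3. move to the new physical state
    s_new = next_state(s, e_star)
    # 4. clock bookkeeping: drop disabled, zero newly enabled, advance old
    new_clocks = {}
    for e in enabled(s_new):
        if e == e_star or e not in enabled(s):
            new_clocks[e] = 0.0                          # newly (re)enabled
        else:
            new_clocks[e] = clocks.get(e, 0.0) + dwell   # old clock keeps ticking
    return s_new, dwell, new_clocks

s, clocks = 0, {"arrival": 0.0}
for _ in range(10):
    s, dwell, clocks = gsmp_step(s, clocks)
```

With exponential clocks the bookkeeping in step 4 is actually redundant (memorylessness), but it mirrors the structure of the general algorithm, where non-exponential residual-life distributions would need the accrued clock readings.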

Such an algorithm can sample from time-inhomogeneous generalized semi-Markov processes. It’s inhomogeneous because the residual life distributions \(F\) and transition \(p\) can depend on the entire history of \(X\). Also, it probably draws far more random numbers than it needs to. Anderson would later (18 years later, to be specific) cut down the number to just 1 per event (after initialization), and also demystified the “residual life” distributions via random time changes. Thank god.

A quick note on Exponential and Geometric processes from the pomp documentation

When approximating a continuous time process with a discrete time one, fixups have to be made in order to have a proper approximation, meaning something that will converge to the continuous process as \(\Delta t\rightarrow 0\). We’ll look at this through the example of approximating an Exponentially distributed waiting time with a Geometric one, on a time step \(\Delta t\). The pomp docs do this. Consider a process that has waiting time \(T\) such that:

\[ T\sim \text{Exponential}(R) \] Then if we set:

\[ r = \frac{\log(1 + R\Delta t)}{\Delta t} \]

And then the discrete process:

\[ N \sim \text{Geometric}(p = 1 - e^{-r\Delta t}) \]

Has the same expected waiting time \(\mathbb{E}[N\Delta t] = \mathbb{E}[T]\). They say “in particular, \(r \to R\) as \(\Delta t \to 0\)”.

Why should \(r\) be equal to this strange equation? Intuitively it would seem that setting \(r = R\Delta t\) or something similarly simple should work, right? Not quite. Let’s first evaluate their claim about the limit.

  1. \(\lim\limits_{\Delta t \to 0} \frac{\log(1 + R\Delta t)}{\Delta t}\) has an indeterminate form, so we apply L’Hopital’s rule.
  2. \(\lim\limits_{\Delta t \to 0} \frac{ \frac{d}{d\Delta t} \left( \log(1 + R\Delta t) \right) }{ \frac{d}{d\Delta t} \left( \Delta t \right)} = \frac{ \left( \frac{1}{1 + R\Delta t} \right) \frac{d}{d\Delta t}\left( 1 + R\Delta t \right) }{1} = \left( \frac{1}{1 + R\Delta t} \right) \left( \frac{d}{d\Delta t}(1) + \frac{d}{d\Delta t}(R\Delta t) \right) = \frac{R}{1+R\Delta t}\)
  3. In L’Hopital’s rule, don’t forget to actually take the limit: \(\lim\limits_{\Delta t \to 0} \frac{R}{1+R\Delta t} = R\)

Okay great. But how did they get that? Not so hard: just set the expectations equal to each other and solve. This requires first assuming that the success probability of the Geometric random variate equals an exponential CDF evaluated at \(\Delta t\) (an integrated hazard), and using the number-of-failures convention \(\mathbb{E}[N] = (1-p)/p\) (as in R’s rgeom).

  1. \(\Delta t \frac{ e^{-r\Delta t} }{1-e^{-r \Delta t}} = \frac{1}{R}\)
  2. \(\frac{ e^{-r\Delta t} }{1-e^{-r \Delta t}} = \frac{1}{R\Delta t}\)
  3. Take reciprocals of both sides: \(\left( \frac{ e^{-r\Delta t} }{1-e^{-r \Delta t}} \right)^{-1} = \left( \frac{1}{R\Delta t} \right)^{-1}\)
  4. \(\frac{1-e^{-r \Delta t}}{e^{-r \Delta t}} = R\Delta t\)
  5. \(e^{r \Delta t}(1-e^{-r \Delta t}) = R\Delta t\)
  6. \(e^{r \Delta t}-(e^{r \Delta t}e^{-r \Delta t}) = R\Delta t\)
  7. \(e^{r \Delta t}-e^{r \Delta t-r \Delta t} = R\Delta t\)
  8. \(e^{r \Delta t}-1 = R\Delta t\)
  9. \(e^{r \Delta t} = 1+R\Delta t\)
  10. \(r \Delta t = \log(1+R\Delta t)\)
  11. \(r = \frac{\log(1+R\Delta t)}{\Delta t}\)
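The whole relationship is easy to check numerically (a sketch under the number-of-failures Geometric convention used in step 1 above):

```python
import math

# Check the pomp rate correction: with r = log(1 + R*dt)/dt and
# N ~ Geometric(p = 1 - exp(-r*dt)) counting failures (E[N] = (1-p)/p),
# the expected waiting time E[N*dt] equals E[T] = 1/R exactly, for any dt.
R = 2.0
for dt in (1.0, 0.1, 0.001):
    r = math.log(1 + R * dt) / dt
    p = 1 - math.exp(-r * dt)
    assert abs(dt * (1 - p) / p - 1 / R) < 1e-9

# and r -> R as dt -> 0:
assert abs(math.log(1 + R * 0.001) / 0.001 - R) < 0.01
```

Note the match of expectations is exact for every \(\Delta t\), not just in the limit; only the rate \(r\) itself converges to \(R\).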