Transition Probabilities

Many of the general models implemented in Birdepy have no explicit expression for their transition probabilities \(p_{i,j}(t) := \mathbb P(Z(t)=j\mid Z(0)=i)\), \(i,j\in \mathcal S\), \(t\geq 0\). However, it is often desirable to compute these transition probabilities since they allow for a deeper understanding of the future (random) evolution of the process. Practically speaking, this may underpin some performance analysis of a system which is being modelled by the process. Additionally, transition probabilities allow likelihood functions of discretely-observed sample paths to be evaluated, which opens up the possibility of parameter estimation (as discussed in the next section). This page outlines nine methods of approximating \(p_{i,j}(t)\), as implemented in birdepy.probability(). Three of these methods are matrix-based and rely on state space truncation, another method uses Laplace transforms, and the remaining four methods use approximation models.

To obtain transition probabilities in BirDePy simply call birdepy.probability() (after importing BirDePy). For example, to obtain an array containing the probabilities of transitioning from \(Z(0)=20\) to \(Z(1)=25\) for the ‘Verhulst’ model with \(\boldsymbol \theta = (0.8, 0.4, 0.01, 0.001)\):

import birdepy as bd
bd.probability(20, 25, 1.0, [0.8, 0.4, 0.01, 0.001], model='Verhulst', method='expm')

Which returns:

array([[0.08189476]])

The method ‘expm’ can be replaced by one of the nine choices described on this page. Note that output is an array since it is possible to input initial and final population sizes as lists.

Matrix exponential

The Kolmogorov forward equation for CTMCs provides the foundational property that the collection of transition probability functions \(p_{i,j}(t)\) satisfy, for \(t\ge0\), the system of first order ordinary differential equations

\[\begin{split}\frac{ d p_{i,0}(t)}{d t} &= \mu_1 p_{i,1}(t) - \lambda_0p_{i,0}(t),\\ \frac{ d p_{i,j}(t)}{d t} &= \lambda_{j-1}p_{i,j-1}(t) + \mu_{j+1}p_{i,j+1}(t)- (\lambda_j+\mu_j)p_{i,j}(t), \quad j\ge 1,\\\end{split}\]

with \(p_{i,i}(0) = 1\) and \(p_{i,j}(0)=0\) for \(j\ne i\). Let \(Q\) be the generator of \(Z\), that is, the square matrix with diagonal entries \(q_{i,i}=-(\lambda_i+\mu_i)\), upper diagonal entries \(q_{i,i+1}=\lambda_i\), and lower diagonal entries \(q_{i-1,i}=\mu_i\) (and zeros elsewhere). Also collect the transition probabilities into a matrix \(P(t) = \big(p_{i,j}(t); i,j\in\mathcal S\big)\). Then the equation above can be written in matrix form as \(\frac{d}{d t} P(t) = P(t)Q\). When \(Q\) is finite it immediately follows that

\[P(t) = \lim_{k\to\infty}\sum_{n=0}^k \frac{1}{n!}(Qt)^{n} =: \exp(Qt).\]

See birdepy.probability(method=’expm’) for the implementation of this method in BirDePy.

Uniformization

Due to the special nature of the generator \(Q\) (specifically that it has non-negative off-diagonal entries and row-sums equal to 0), an alternative procedure known as uniformization, introduced by [Jensen1953], is available for computing \(P(t)\). Consider a discrete-time Markov chain (DTMC) with probability transition matrix \(A:=Q/a+I\), where \(a:=\max_z |\lambda_z+\mu_z|\) when this exists (for example when \(|\mathcal S|<\infty\)), and where \(I\) denotes the identity matrix of appropriate size. Suppose that this DTMC transitions at the event times of a Poisson process \((N(t),~t\ge0)\) with rate \(a\). Using this construction, we can show that

\[\mathbb P(Z(t)=j\mid Z(0)=i, N(t)=n)=(A^n)_{i,j}.\]

Therefore, conditioning on \(N(t)\) provides

\[P(t) = \lim_{k\to\infty}\sum_{n=0}^k \frac{(ta)^ne^{-ta}}{n!} A^n.\]

In addition to potentially requiring truncation of the state space for many models of interest, another source of approximation error for this method is that a finite \(k\) must be chosen in the above infinite series.

This method is discussed at length in [Grassman1977] and [vanDijkEtAl2018].

See birdepy.probability(method=’uniform’) for the implementation of this method in BirDePy.

Erlangization

Using probabilistic arguments, Erlangization is yet another matrix-based method for computing \(P(t)\). Let \(T\) be an exponentially distributed random variable with mean \(\eta^{-1}>0\). Define \(r^{(\eta)}_{i,j} := \mathbb P(Z(T) = j\mid Z(0)= i)\) and collect these quantities into \(R(\eta) = \big(r^{(\eta)}_{i,j},~i,j\in \mathcal S)\). The matrix \(R(\eta)\) can be viewed as the one-step transition probability matrix of a DTMC embedded in \(Z\) at the epochs of a Poisson process with rate \(\eta\). By conditioning on the first transition of \(Z\) and the expiry of the time \(T\), we obtain the recursive expressions

\[r^{(\eta)}_{i,j}= \frac{\mu_ir^{(\eta)}_{i-1,j} + \lambda_ir^{(\eta)}_{i+1,j} + \eta 1_{\{i=j\}}}{\lambda_i + \mu_i + \eta},\quad i,j\in\mathcal S.\]

This system of equations can be written in matrix form in terms of \(R(\eta)\), whose solution is given in terms of the generator \(Q\) by

\[R(\eta) = \eta\big(\eta I-Q)\big)^{-1}.\]

Therefore, if we let \(\eta:=k/t\) and \(S_{k,t}\) be an Erlang distributed random variable with rate parameter \(k/t\) and shape parameter \(k\), then the \((i,j)`th entry of the matrix :math:`R(k/t)^k\) contains \(\mathbb P(Z(S_{k,t})=j\mid Z(0)=i)\). The expected value of \(S_{k,t}\) is \(t\) and the variance of \(S_{k,t}\) is \(t/k\). This means that

\[P(t) = \lim_{k\to\infty}R(k/t)^k.\]

The Erlangization method for approximating transition probabilities is discussed in [AsmussenEtAl2002], [MandjesTaylor2016] and [StanfordEtAl2011] for models related to birth-and-death processes. Similar to the uniformization method, error arises in the Erlangization method since the state space may need to be truncated, and the infinite limit above needs to be approximated by a finite \(k\).

See birdepy.probability(method=’Erlang’) for the implementation of this method in BirDePy.

Inverse Laplace transform

The Laplace transform of the transition function \(p_{i,j}(t)\) is

\[f_{i,j}(s) = \mathcal L[p_{i,j}](s) = \int_0^\infty p_{i,j}(t)e^{-st} d t.\]

Let \(\frac{u_1}{v_1+}\frac{u_2}{v_2+}\frac{u_3}{v_3+}\cdots\) be a short-hand notation for the continued fraction

\[\dfrac{u_1}{v_1+\dfrac{u_2}{v_2+\dfrac{u_3}{v_3 +\cdots }}},\]

where \((u_i,~i=1,2,\dots)\) and \((v_i,~i=1,2,\dots)\) are sequences of real numbers. As first reported in [Murphy1975] and detailed in [CrawfordSuchard2012], the Laplace transform takes the continued fraction form

\[\begin{split}f_{i,j}(s) = \left\{ \begin{array}{ll} \left(\prod_{k=j+1}^i\mu_k\right)\frac{B_j(s)}{B_{i+1}(s)+}\frac{B_i(s)\,a_{i+2}}{b_{i+2}(s)+}\frac{a_{i+3}}{b_{i+3}(s)+}\cdots, & \text{for } j \le i,\\[0.5em] \left(\prod_{k=i}^{j-1}\lambda_k\right) \frac{B_i(s)}{B_{j+1}(s)+}\frac{B_j(s)\,a_{j+1}}{b_{j+2}(s)+}\frac{a_{j+3}}{b_{j+3}(s)+}\cdots, & \text{for } j \ge i, \end{array}\right.\end{split}\]

where

\[\begin{split}B_0(s)&=1,\\ B_1(s)&=b_1(s),\quad\text{and}\\ B_k(s)&=b_k(s)B_{k-1}(s)+a_kB_{k-2}(s),\quad\text{for } k\ge 2,\end{split}\]

with \(a_1=1\) and \(a_j = -\lambda_{j-2}{\mu_{j-1}}\) for \(j\ge2\), and \(b_1(s)=s+\lambda_0\) and \(b_j(s)=s+\lambda_{j-1}+\mu_{j-1}\) for \(j\ge2\). An advantage of the continued fraction form is that it can be evaluated using the Lentz algorithm to a user-specified error tolerance. The Laplace transform can then be numerically inverted to obtain \(p_{i,j}(t)\).

See birdepy.probability(method=’ilt’) for the implementation of this method in BirDePy.

Diffusion approximation

Define \((Z_i^{(r)},~r \in\mathbb N)\) to be a parametric family of CTMCs which evolve according to transition rates \(\lambda_z^{(r)} = r\lambda_{z/r}\) and \(\mu_z^{(r)} = r\mu_{z/r}\) with \(Z_i^{(r)}(0)=i\). For \(s \in[0, t]\) define the diffusion-scaled process

\[\tilde Z_i(s) = \lim_{r\to\infty}\sqrt{r}\left(\hat Z_i^{(r)}(s) - \hat z_i(s)\right),\]

where \(\hat Z_i^{(r)}:=\frac{1}{r} Z_i^{(r)}(s)\) and \(\hat z_i(s)\) satisfies

\[\frac{ d }{d s}\hat z_i(s) = \lambda_{\hat z_i(s)} - \mu_{\hat z_i(s)},\qquad \hat z_i(0) = i.\]

Loosely speaking, Theorem~3.5 in [Kurtz1971] can be used to show that, under some regularity conditions, \((\hat Z_i^{(r)}(s),~s\in[0,t])\) converges weakly, as \(r\to\infty\), in the space of cadlag functions on \([0, t]\) to a zero-mean Gaussian diffusion \((\tilde Z_i(s),~s\in[0,t])\) with variance

\[\sigma_i^2(s) := \text{Var}(\tilde Z(s)) = M(s)^2\int_0^s \left(\lambda_{\hat z(\tau)} + \mu_{\hat z(\tau)}\right) M(\tau)^{-2} d \tau\]

for each \(s\in[0, t]\), where \(M(s):= \exp\left(\int_0^s B(\tau)d \tau\right)\) with \(B(\tau) = H\left(\hat z(\tau)\right)\) defined in terms of

\[H(z) = \frac{d}{d z}\Big(\lambda_z - \mu_z\Big).\]

In particular this implies that

\[p_{i,j}(t) \approx \mathbb P(\tilde Z_i(t)=j).\]

Hence \(p_{i,j}(t)\) can be approximated by the probability density of a normally distributed random variable with mean \(\hat z_i(t)\) and variance \(\sigma_i^2(t)\) as given above.

This approach is very closely related to the functional central limit theorem. The diffusion approach to approximate transition probabilities is used in [RossEtAl2009], and discussed at length in [Allen2008].

See birdepy.probability(method=’da’) for the implementation of this method in BirDePy.

Ornstein–Uhlenbeck approximation

The diffusion approximation discussed above can be substantially simplified if it is assumed that \(Z\) is fluctuating about a steady state point \(z_{\text{eq}}\) of \(\hat z\). Such a point occurs when the birth rate is equal to the death rate, i.e., \(z_{\text{eq}}\) satisfies \(\lambda_{z_{\text{eq}}} = \mu_{z_{\text{eq}}}\). In this case, as argued in [RossEtAl2006], the limiting Gaussian diffusion is an Ornstein–Uhlenbeck process. Therefore, \(p_{i,j}(t)\) can be approximated by the density of a normally distributed random variable with mean \(\tilde z= z_{\text{eq}} + e^{H(z_{\text{eq}})t}(i - z_{\text{eq}})\) and variance \(\tilde\sigma^2 = \frac{\lambda_{z_{\text{eq}}} + \mu_{z_{\text{eq}}}}{2H(z_{\text{eq}})}\left(e^{2H(z_{\text{eq}})t}-1\right)\). This method assumes that \(z_{\text{eq}}\) is asymptotically stable (i.e., \(H(z_{\text{eq}})<0\)), and as such favors values of \(z_{\text{eq}}\) that minimize \(H(z_{\text{eq}})\).

See birdepy.probability(method=’oua’) for the implementation of this method in BirDePy.

Galton–Watson approximation

A PSDBDP can be approximated by a piecewise-linear birth-and-death process by decomposing time into sub-intervals of finite length, and letting the per-individual birth and death rates be constant over each time interval; more precisely, if \(z\) is the population size at the beginning of a time interval, then we let \(\lambda := \frac{1}{z}\lambda_{z}\) and \(\mu := \frac{1}{z}\mu_{z}\) over that interval.

Suppose we want to approximate \(p_{i,j}(t)\) for some \(i,j\in\mathcal{S}\) and \(t> 0\). Consider a linear birth-and-death process \(\mathring Z^{(b)}\) with per-individual birth rate \(\lambda = \frac{1}{b}\lambda_{b}\) and per-individual death rate \(\mu = \frac{1}{b}\mu_{b}\), where possible choices of \(b\) include \(b=i\), \(b=j\), \(b=\max(i,j)\), \(b=\min(i,j)\), and \(b=(i+j)/2\). The transition probability \(p_{i,j}(t)\) can then be approximated by

\[p_{i,j}(t) \approx \mathring p_{i,j}(t) :=\mathbb P(\mathring Z^{(b)}(t)=j\mid \mathring Z^{(b)}(0)=i).\]

What constitutes a good choice of \(b\) will depend highly on \(i\), \(j\) and the model under study. This approximation is particularly convenient since it is well known (see [Guttorp1991]) that \(\mathring p_{i,0}(t) = \beta_1(t)^{i}\), and for \(j\ge1\),

\[\mathring p_{i,j}(t) = \sum_{k=\max(0, i-j)}^{i-1} \binom{i}{k} \binom{j -1}{i-k-1} \beta_{1}(t)^k\Big[\big\{1-\beta_{1}(t)\big\}\big\{1-\beta_{2}(t)\big\}\Big]^{j-k} \beta_{2}(t)^{j-i+k},\]

where \(\beta_1(t)\) and \(\beta_2(t)\) are given by

\[\begin{split}\beta_1(t) = \left\{\begin{array}{cc} \mu\{\exp\big((\lambda-\mu)t\big)-1\}/\{\lambda \exp\big((\lambda-\mu)t\big)-\mu\} & \text{if } \lambda\ne \mu, \\ \lambda t/(1+\lambda t) & \text{if } \lambda = \mu. \end{array}\right.\end{split}\]

and

\[\begin{split}\beta_2(t) = \left\{ \begin{array}{cc} \lambda\beta_1(t)/\mu& \text{if } \lambda\ne \mu, \\ \beta_1(\tau) & \text{if } \lambda = \mu. \end{array}\right.\end{split}\]

See birdepy.probability(method=’gwa’) for the implementation of this method in BirDePy.

When the binomial coefficients above cause numerical problems or take a long time to compute, an alternative expression developed by [DavisonEtAl2021] using a saddlepoint approximation [Butler2007] may be used. See birdepy.probability(method=’gwasa’) for the implementation of this method in BirDePy.

Simulation

Using birdepy.simulate.discrete() or birdepy.gpu_functions.discrete() it is possible to obtain \(k\) realizations of \(Z(t)\) conditional on \(Z(0)=i\). The proportion of these realisations which equal \(j\) can be used to approximate the transition probability \(p_{i,j}(t)\). That is,

\[p_{i,j}(t) \approx \frac{1}{k}\sum_{n=1}^k 1_{\{\hat Z_n(t)=j\}}\]

where \(\hat Z_l(t)\) are simulated realisations of \(Z(t)\) and \(I_{\{A\}}\) evaluates to 1 when \(A\) is true and otherwise evaluates to 0.

See birdepy.probability(method=’sim’) and birdepy.gpu_functions.probability() for the implementations of this method in BirDePy.

Summary

The table below summarizes methods described on this page, and gives the label used to call them in birdepy.estimate().

Methods for computing transition probabilities.

Method

Label

Brief description

Matrix exponential

'expm'

Uses \(P(t)=\exp(Qt)\).

Uniformization

'uniform'

Evaluates probability using an approximating discrete-time process.

Erlangization

'Erlang'

Evaluates probability at an Erlang-distributed time.

Inverse Laplace transform

'ilt'

Numerically inverts Laplace transform.

Diffusion approx.

'da'

Approx. true model by a general diffusion-scaled model.

Ornstein–Uhlenbeck approx.

'oua'

Approx.true model by a simple diffusion process.

Galton–Watson approximation

'gwa'

Approx.~true model with a linear model.

Saddlepoint approximation

'gwasa'

As above combined with a saddlepoint approx

Simulation

'sim'

Average of Monte Carlo samples.

These methods are also described in detail in [HautphennePatch2021a], which can be downloaded here. If you use BirDePy for published research, then please cite this paper.

[Grassman1977]

Grassman, W., 1977. Transient solutions in Markovian queues. European Journal of Operations Research, 1(6):396–402.

[vanDijkEtAl2018]

van Dijk, N. M., van Brummelen, S. P. J., & Boucherie, R. J. (2018). Uniformization: Basics, extensions and applications. Performance evaluation, 118, 8-32.

[Jensen1953]

Jensen, A. (1953). Markoff chains as an aid in the study of Markoff processes. Scandinavian Actuarial Journal, 1953(sup1), 87-91.

[AsmussenEtAl2002]

Asmussen, S., Avram, F. and Usabel, M., 2002. Erlangian approximations for finite-horizon ruin probabilities, ASTIN Bulletin: The Journal of the IAA, 32(2)267–281.

[MandjesTaylor2016]

Mandjes, M. and Taylor, P., 2016. The running maximum of a level-dependent quasi-birth-death process, Probability in the Engineering and Informational Sciences, 30(2):212–223.

[StanfordEtAl2011]

Stanford, D.A., Yu, K. and Ren, J., 2011. Erlangian approximation to finite time ruin probabilities in perturbed risk models, Scandinavian Actuarial Journal, 2011(1):38–58.

[Murphy1975]

Murhy, J. A., & O’donohoe, M. R. (1975). Some properties of continued fractions with applications in Markov processes. IMA Journal of Applied Mathematics, 16(1), 57-71.

[CrawfordSuchard2012]

Crawford, F.W. and Suchard, M.A. Transition probabilities for general birth-and-death processes with applications in ecology, genetics, and evolution. Journal of Mathematical Biology, 65(3):553–580.

[Kurtz1971]

Kurtz, T.J., 1971. Limit theorems for sequences of jump Markov processes, Journal of Applied Probability, 8(2):344–356.

[RossEtAl2009]

Ross, J. V., Pagendam, D. E., & Pollett, P. K. (2009). On parameter estimation in population models II: multi-dimensional processes and transient dynamics. Theoretical Population Biology, 75(2-3), 123-132.

[Allen2008]

Allen, L. J. (2008). An introduction to stochastic epidemic models. In Mathematical Epidemiology (pp. 81-130). Springer, Berlin, Heidelberg.

[RossEtAl2006]

Ross, J. V., Taimre, T., & Pollett, P. K. (2006). On parameter estimation in population models. Theoretical Population Biology, 70(4), 498-510.

[Guttorp1991]

Guttorp, P., 1991. Statistical inference for branching processes, Vol. 122. Wiley-Interscience.

[DavisonEtAl2021]

Davison, A.C., Hautphenne, S. and Kraus, A., 2021. Parameter estimation for discretely observed linear birth‐and‐death processes. Biometrics, 77(1), pp.186-196.

[Butler2007]

Butler, R.W., 2007. Saddlepoint approximations with applications (Vol. 22). Cambridge University Press.