Simulation Algorithms

A few exceptions aside, explicit expressions for quantities of interest such as the probability distribution, or the expected value, of the population size of a general birth-and-death process at a given time, do not exist. By providing many examples of possible outcomes, Monte Carlo simulation can allow interesting questions related to these processes to be examined. Simulation can also be used to compare the performance of other tools, such as those used for parameter estimation, by providing synthetic data for the tools to be tested on. Last, but not least, some parameter estimation techniques directly rely on simulated sample paths to generate estimates.

Simulation of a birth-and-death process can be continuous or discrete. The goal of continuous simulation is to obtain a sequence of times at which the process transitions, together with the corresponding sequence of states the process transitions into. In a continuous simulation, the output provides sufficient information to recover the state of the process at any time within the simulation horizon. This is desirable since the output contains enough information for a user to know whether certain events have occurred, for example whether the process has crossed a certain boundary by a certain time. In contrast, the goal of discrete simulation is to obtain samples of a birth-and-death process at pre-specified observation times. In this case, the state of the process at those observation times is the output. An advantage of discrete simulation is that the same observation times can be specified for each sample path so that they can be combined together to find, for example, the expected value or variance of a process as a function of time.

Exact

Given an initial population size \(z_0\), to perform continuous simulation of a birth-and-death process over the interval \([0,t]\), initialize \(j:=0\), \(t_0:=0\), and then repeat these steps:

Generate an outcome \(\Delta\) of an exponentially distributed random variable with mean \((\lambda_{z_j}+\mu_{z_j})^{-1}\).
If \(t_j+\Delta\le t\), set \(t_{j+1}:=t_j+\Delta\); otherwise stop.
With probability \(\lambda_{z_j}/(\lambda_{z_j}+\mu_{z_j})\) set \(z_{j+1}:=z_j+1\); otherwise set \(z_{j+1}:=z_j-1\).
Set \(j:=j+1\).

See birdepy.simulate.continuous() for an implementation of this algorithm.

Discrete simulation proceeds in a similar fashion. Given an initial population size \(z_0\), to simulate a birth-and-death process at the time points \(t_0<t_1<\dots <t_n\), initialize \(j:=1\), \(s:=t_0+\Delta\) where \(\Delta\) is an outcome of an exponentially distributed random variable with mean \((\lambda_{z}+\mu_{z})^{-1}\), and \(z:=z_0\), and then repeat these steps:

While \(s \le t_j\):
1. With probability \(\lambda_{z}/(\lambda_{z}+\mu_{z})\) set \(z:=z+1\); otherwise set \(z:=z-1\).
2. Set \(s := s +\Delta\) where \(\Delta\) is an outcome of an exponentially distributed random variable with mean \((\lambda_{z}+\mu_{z})^{-1}\).
Set \(z_{j}:=z\).
If \(j<n\) set \(j:=j+1\); otherwise stop.

Observe that \(t_j\) has a different interpretation depending on whether continuous or discrete simulation is being performed. It is thought within the probability community that this exact approach to simulation of CTMCs was first considered by Joseph L. Doob and his collaborators around 1945. To our knowledge, the first implementation on a computer is mentioned in [Kendall1950], and the algorithm was then popularized more widely in [Gillespie1977].

See birdepy.simulate.discrete() (with method set to ‘exact’) for an implementation of this algorithm.

\(\tau\) leaping

A well-known drawback of exact simulation algorithms is that they may take considerable computational time to produce sample paths. An alternative approximate simulation method for population processes known as tau-leaping, introduced by [Gillespie2001], allows the user to trade accuracy for speed. This algorithm partitions time into intervals of predetermined constant length \(\tau\). Conditional on \(Z(0)=z\) at the start of an interval, the state of \(Z(\tau)\) is approximated by the difference of two Poisson distributed random variables \(\mathfrak L\) and \(\mathfrak M\) which are respectively intended to approximate the total number of births and deaths that occur within the interval. Given an initial population size \(z_0\), the basic Euler form of the algorithm can be used to perform discrete simulation by initializing \(j:=1\), \(s:=t_0+\tau\), and \(z:=z_0\), and then repeating these steps:

While \(s\le t_j\):
1. Set \(z := z +\mathfrak L -\mathfrak M\) where \(\mathfrak L\) and \(\mathfrak M\) are are outcomes of Poisson distributed random variables with respective means \(\lambda_{z}\tau\) and \(\mu_{z}\tau\).
2. Set \(s:=s+\tau\).
Set \(z_{j}:=z\).
If \(j<n\) set \(j:=j+1\); otherwise stop.

See birdepy.simulate.discrete() (with method set to ‘ea’) for an implementation of this algorithm.

Several variations of this algorithm have been developed which focus primarily on step-size selection and on ensuring that the population size does not go negative during the simulation.

Midpoint approximation

In [AndersonGangulyKurtz2011] a midpoint variant of the basic algorithm is developed and elegantly analyzed. The algorithm is the same as the Euler version described above with the exception that the two Poisson distributed random variables \(\mathfrak L\) and \(\mathfrak M\) have respective means \(\lambda_{z+\rho(z)}\tau\) and \(\mu_{z+\rho(z)}\tau\), where \(\rho(z)=\frac{1}{2}\tau\big(\lambda_z-\mu_z\big)\) approximates \(\mathbb E [Z(\frac{1}{2}\tau)-Z(0)\,|\,Z(0)=z]\). The basic idea here is that determining the number of births and deaths using an approximation to the population size at the midpoint of an interval may increase accuracy relative to using the population size at the beginning of the interval. This algorithm can provide substantial improvements in accuracy relative to the Euler version at only a minor increase in computational cost. Both of the above algorithms can be thought of as generating piecewise approximations to a general birth-and-death process, where zero-order approximations of the birth and death rates are utilized within each subinterval. See birdepy.simulate.discrete() (with method set to ‘ma’) for an implementation of this algorithm.

Galton–Watson approximation

Recently a new approach to perform discrete simulation of a general birth-and-death processes utilizing linear (first-order) approximations of the birth and death rates within each subinterval was proposed in [HautphennePatch2021b]. This algorithm approximates any birth-and-death process by piecewise-linear birth-and-death processes. It also uses the fact that a discretely-observed linear birth-and-death process corresponds to a linear fractional Galton–Watson process (see [Harris1963]). This means that for a linear birth-and-death process \(Z\) with birth rates \(\lambda_z=\lambda z\) and death rates \(\mu_z=\mu z\), the probability that a family generated by a single individual at time \(0\) ‘becomes extinct’ before time \(\tau\) is given by

\[\begin{split}\beta_1(\tau) = \left\{\begin{array}{cc} \mu\{\exp\big((\lambda-\mu)\tau\big)-1\}/\{\lambda \exp\big((\lambda-\mu)\tau\big)-\mu\} & \text{if } \lambda\ne \mu, \\ \lambda \tau/(1+\lambda \tau) & \text{if } \lambda = \mu. \end{array}\right.\end{split}\]

Each family which survives results in \(H\) individuals being present at time \(\tau\), where \(\mathbb P(H=h) = (1-\beta_2(\tau))\beta_2(\tau)^{h-1}\) with

\[\begin{split}\beta_2(\tau) = \left\{ \begin{array}{cc} \lambda\beta_1(\tau)/\mu& \text{if } \lambda\ne \mu, \\ \beta_1(\tau) & \text{if } \lambda = \mu. \end{array}\right.\end{split}\]

Therefore, given \(\mathring Z(0)=z\), a realization of \(\mathring Z(\tau)\) can be obtained by generating an outcome of the random variable \(\mathfrak B\) which is binomially distributed with \(z\) trials and success probability \(1-\beta_1(\tau)\) (defined above), and then using the fact that \(\mathring Z(\tau)\) follows a negative binomial distribution with parameters \(\mathfrak B\) and \(1-\beta_2(\tau)\), that is,

\[\mathbb P(\mathring Z(t_j) = k\,|\,\mathfrak B) = \binom{k+\mathfrak B-1}{k}\beta_2(\tau)^{\mathfrak B}(1-\beta_2(\tau))^k.\]

So, if \(Z\) is a general birth-and-death process, conditional on \(Z(0)\), \(Z(\tau)\) can be approximated by \(\mathring Z(\tau)\) by setting the per-individual rates of the approximating linear process \(\mathring Z\) equal to those of \(Z\) at time \(0\), that is, \(\lambda = \lambda_{Z(0)}/Z(0)\) and \(\mu=\mu_{Z(0)}/Z(0)\). Therefore, to obtain an approximate simulation of \((Z(t_j),~j=0,1,\dots,n)\), initialize \(j:=1\), \(s:=t_0+\tau\), and \(z:=z_0\), and repeat these steps:

While \(s\le t_j\):
1. Set \(\lambda := \lambda_{z}/z\) and \(\mu := \mu_{z}/z\).
2. Generate a binomially distributed random variable \(\mathfrak B\) with success probability \(1-\beta_1(\tau)\) with \(\beta_1(\tau)\) as given above and number of trials \(z\).
3. Generate a negatively binomially distributed random variable \(\mathfrak C\) with success probability \(1-\beta_2(\tau)\) with \(\beta_2(\tau)\) as given above and number of trials \(\mathfrak B\).
4. Set \(z := z +\mathfrak C\) and \(s:=s+\tau\).
Set \(z_j:=z\).
If \(j<n\) set \(j:=j+1\); otherwise stop.

This algorithm can be implemented efficiently and is highly accurate. It also explicitly avoids the possibility of the population size becoming negative during the simulation. See birdepy.simulate.discrete() (with method set to ‘gwa’) for an implementation of this algorithm.

Summary

The table below summarizes the discrete simulation methods described on this page, and gives the label used to call them in birdepy.simulate.discrete().

Methods for simulating sample paths of general birth-and-death processes.
Method	Label	Brief description
Exact	`'exact'`	Utilizes all jumps of the process.
Euler approximation	`'ea'`	Population changes between \(\tau\)-sized intervals governed by Poisson random variables with parameters depending on population sizes at beginning of intervals.
Midpoint approximation	`'ma'`	Population changes between \(\tau\)-sized intervals governed by Poisson random variables with parameters depending on estimate of population sizes at midpoints of intervals.
Galton–Watson approximation	`'gwa'`	Population changes between \(\tau\)-sized intervals governed by linear birth-and-death processes with parameters depending on population sizes at beginning of intervals.

These methods are also described in detail in [HautphennePatch2021a] and [HautphennePatch2021b], which can be downloaded here and here. If you use BirDePy for published research, then please cite these papers.

Reproducibility

BirDePy uses numpy.random to generate random numbers. Functions which utilize random numbers have a seed parameter that can be used for reproducibility. The seed parameters accept either an integer number or a Generator as input. Sometimes a script uses multiple functions and the user only wishes to specify a single seed. The recommended way to handle this situation is to create a generator rng with the preferred seed:

import numpy as np
seed = 2021
rng = np.random.default_rng(seed)

The generator can then be passed to all of the functions which have a seed parameter. For example:

import birdepy as bd
for _ in range(10):
    print(bd.simulate.discrete(1, 'Poisson', 0, times=[10], seed=rng)[0])

[Kendall1950]

Kendall, D.G., 1950. An artificial realization of a simple “birth-and-death” process. Journal of the Royal Statistical Society. Series B (Methodological), 12(1), pp.116-119.

[Gillespie1977]

Gillespie, D.T., 1977. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25), pp.2340-2361.

[Gillespie2001]

Gillespie, D.T., 2001. Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics, 115(4), pp.1716-1733.

[HautphennePatch2021b] (1,2)

Hautphenne, S. and Patch, B., 2021. Simulating population-size-dependent birth-and-death processes using CUDA and piecewise approximations. In Proceedings of the International Congress on Modelling and Simulation 2021. (to appear)

[AndersonGangulyKurtz2011]

Anderson, D.F., Ganguly, A. and Kurtz, T.G., 2011. Error analysis of tau-leap simulation methods. The Annals of Applied Probability, 21(6), pp.2226-2262.

[Harris1963]

Harris, T. E. (1963). The theory of branching processes (Vol. 6). Berlin: Springer.

[HautphennePatch2021a]

Hautphenne, S., and Patch, B., (2021). Birth-and-death Processes in Python: The BirDePy Package.