chouca
user guidechouca
is engine in R for stochastic
cellular automata, which are models that describes the temporal
dynamics of a 2D grid made of cells. Each cell can be in one of a finite
number of states, and switch states over time with probabilities that
depend on their neighborhood.
A picture being worth a thousand words, the output of a stochastic cellular automaton looks like the following image in practice:
(NOTE: Please see animated version there).
Stochastic cellular automata are widely used to describe the dynamics
of landscapes, for example, forests or corals growing over space. Yet,
implementations are often ad-hoc, and done by the authors themselves,
which is prone to bugs and often leads to slow implementations.
chouca
tries to avoid this by providing a common base onto
which such models can be built. The goal is for the user to declare the
probabilities of state transitions of the model, then
chouca
handles the rest.
chouca
supports a wide selection of models, while
preserving decent performance. This is done by writing most of the code
in C++, and being able to emit and compile C++ code on the fly for a
specific model at run time. This enables compiler optimizations that
would be impossible with pre-compiled code, and often improves
simulation speed by one or two orders of magnitude. chouca
can also use a memoisation-based strategy to run simulations, in which
probabilities are not recomputed for each cell at each iteration,
providing another boost in performance. For supported models,
chouca
can be 3 to 4 orders of magnitude faster than common
implementations.
While chouca
is built mainly for stochastic cellular
automata, it can also handle non-stochastic cellular automata, such as
Conway’s Game of Life or the rock-paper-scissor
automaton. This is because a deterministic cellular automaton is
simply a stochastic celular automata in which transition probabilities
are always ones or zero.
chouca
To give you an idea on how chouca
works, let’s implement
a small forest gap model (Kubo et al. 1996). In such model, cells can be
either ‘empty’, or filled with a tree, what we call here the ‘forest’
state. Trees produce seeds that can fly over the whole landscape and
land haphazardly in any cell. Thus, the probability of an empty cell to
switch to the forested state is proportional to p+, the proportion of
cells with trees in the landscape. If we represent the empty state as
0 and the forest state as +, this probability of transition is
thus:
P(0 → +) = αp+
where α is the rate at which seeds are produced by each tree.
Trees naturally die over time, and because wind bursts are stronger in empty areas, trees have a higher probability of dying when they have empty neighboring cells:
P(+ → 0) = δ0 + δqempty
Here, qempty denotes the proportion of cells around the focal cell that are empty, i.e. in the “empty” state. qempty is equal to one when a cell has all of its neighbors in the empty state, and zero when none of the neighbors is in this state.
This model has three parameters, α, δ0 and δ, and can be defined in
chouca
using the following syntax:
kubo_model <- camodel(
transition(from = "0", to = "+", ~ alpha * p["+"]), # reproduction of trees
transition(from = "+", to = "0", ~ delta0 + delta * q["0"]), # death
parms = list(alpha = 0.6, delta0 = 0.05, delta = 0.4),
wrap = TRUE,
neighbors = 8
)
We first declare two transitions, including the states between which
they occur, and an expression describing how to compute the probability.
In that expression, p
refers to the proportions of cells of
the landscape in each state, and q
refers to the the
proportion of neighbors in each state. q["+"]
for example
represents the proportion of neighbors of a cell in the +
state (forest), which is a number between 0 and 1.
We then set the argument parms
to a list containing the
numerical values of the model parameters. We use the argument
wrap = TRUE
to state that the model will be run on a space
in which the edges “wrap around”, and the first column/top row of the
landscape is neighboring the rightmost/bottom column (a toric space). We
use the argument neighbors = 8
to state that we want to
consider all 8 neighbors of a cell, i.e. including those that are
diagonal. This type of neighborhood is referred to as the Moore
neighborhood, the other supported type being a 4-way (or Von-Neumann)
neighborhood.
A summary of the model can be printed on the R command line, that recalls the model definition and options.
## Stochastic Cellular Automaton
##
## States: 0 +
##
## Transition: 0 -> +
## ~ alpha * p["+"]
## Transition: + -> 0
## ~ delta0 + delta * q["0"]
##
## Neighborhood kernel: x (8 counted)
## Wrap: TRUE
## Max error: 1.110223e-16 (OK)
## Max rel error: 3.100901e-16 (OK)
You can also plot the model structure (the states and the
transitions) as a graph, using the generic function
plot()
:
The next step is to define the initial landscape from which we want
to start the simulation. We can do so by using the function
generate_initmat
:
Here we initalize the covers to half forest and half empty space
using the argument pvec
, and use a landscape of 128 rows by
256 columns filled at random. We can now run the simulation:
## iter = 0 ( 0 %) 0: 0.5 +: 0.5
## iter = 25 ( 25 %) 0:0.232 +:0.768 [195.31 iter/s]
## iter = 50 ( 50 %) 0:0.232 +:0.768 [200.00 iter/s]
## iter = 75 ( 75 %) 0:0.228 +:0.772 [198.41 iter/s]
## iter = 100 (100 %) 0:0.227 +:0.773 [200.00 iter/s]
We can then display the results, either in the R console using
summary()
, as a plot using the generic function
plot()
, which displays the global covers of each state
through time, or image()
, which displays the landscape:
## Stochastic Cellular Automaton simulation
##
## Times: 0 to 100
## Landscape size: 128x256
## Global covers:
## t 0 +
## [96,] 95 0.2308655 0.7691345
## [97,] 96 0.2268982 0.7731018
## [98,] 97 0.2257690 0.7742310
## [99,] 98 0.2261963 0.7738037
## [100,] 99 0.2289734 0.7710266
## [101,] 100 0.2270508 0.7729492
## Landscapes saved: 2
##
## The following methods are available:
## image levels plot
##
## Extract simulation results using one of:
## out[["output"]][["covers"]]
## out[["output"]][["snapshots"]]
chouca
works only with models that use either a 4x4 or
8x8 neighborhood (von Neumann or Moore neighborhood, respectively), and
in which the effects are isotropic, i.e. all neighbors have a similar
effect regardless of whether they are above, or below a given cell.
Cellular automata that involve a preferencial direction cannot be
implemented in chouca
, for example with water running
downslope in a landscape (Mayor et al. 2013).
The probabilities of transition can have any form, but respecting those two principles will work better:
To be accurately computed, transition probabilities must be linear polynomials of the products of the local or global proportion of cells in a given state. In other words, a transition probability defined between states i and j as P(i → j) = 0.1 + pi + pjqi works perfectly, but something involving non-linear functions such as P(0 → +) = 0.1 + sin(pi) may not be correctly handled (see below for the consequences).
There is some tolerance for discontinuities in transition rules. Rules such as “if one neighbor is in state i, then the probability of transition is increased” are OK. However, such rule involving multiple types of neighbors are not, e.g. “if one neighbor is in state i and another one in state j, then…”. This type of discontinuity will lead to approximations in transition probabilities.
Overall, the assumed functional form of transitions probabilities must be the following:
P(a → b) = β0 + ∑s ∈ Sfs(qs) + ζ(p, p) + ζ(p, q) + ζ(q, q)
where S represents the set of states of the cellular automaton, fs can be any function of qs. ζ(x, y) is, for two vectors of length n x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn), the sum of K terms ζ(x, y) = β1x1a0y2b0 + β2x2a1y1b1 + β3x3a2y1b1 + ...βKxnaKynbK
All the (βi)i ∈ [1 : K],
(a)i ∈ [1 : K],
(b)i ∈ [1 : K]
are constant model parameters. Many models can be represented using this
functional form - however, this may not always be the case, especially
as soon as non-linear functions are involved (exp, sin, etc.), or
functions with discontinuities. This functional form looks daunting, but
chouca
will warn you if the transition probabilities cannot
be accurately computed, so the best way to go is probably to write your
model and see if it is supported.
For all models, chouca
reports an estimation of the
error on the probabilities of transition, and whether probabilities can
reach problematic values above one or below zero. For example, the above
forest model can be perfectly represented by chouca
, so the
reported error is virtually zero:
## Stochastic Cellular Automaton
##
## States: 0 +
##
## Transition: 0 -> +
## ~ alpha * p["+"]
## Transition: + -> 0
## ~ delta0 + delta * q["0"]
##
## Neighborhood kernel: x (8 counted)
## Wrap: TRUE
## Max error: 1.110223e-16 (OK)
## Max rel error: 3.100901e-16 (OK)
The following model, which includes an exponential function of the neighborhood, has a non-zero error, because this exponential function cannot be perfectly approximated by a polynomial:
mod <- camodel(
transition(from = "dead", to = "live", ~ 0.1 + exp(0.2 * q["dead"] + p["dead"])),
transition(from = "live", to = "dead", ~ 0.1),
wrap = TRUE,
neighbors = 8
)
print(mod)
## Stochastic Cellular Automaton
##
## States: dead live
##
## Transition: dead -> live
## ~ 0.1 + exp(0.2 * q["dead"] + p["dead"])
## Transition: live -> dead
## ~ 0.1
##
## Neighborhood kernel: x (8 counted)
## Wrap: TRUE
## Max error: 6.575415e-05 (OK)
## Max rel error: 3.889084e-05 (OK)
The reported maximum absolute error on the probability of transitions
is 6.6e-05, which is very small. The relative maximum error, in other
words the error in percentage of the exact probability, is also very
small (3.9e-05). Because both these probabilities are very small, you
are probably safe to run this model in chouca
and obtain
correct results, despite the fact that it does not follow the ideal
form.
The following model gives more trouble:
mod <- camodel(transition(from = "dead", to = "live",
~ 0.1 + sin(pi * q["dead"] * p["dead"] )),
transition(from = "live", to = "dead",
~ 0.1),
wrap = TRUE,
neighbors = 8)
## Warning: Residual error in computed probabilities
## max error: 0.224
## max rel error: 0.866
## Problematic probability expression:
## Transition from dead to live
## ~ 0.1 + sin(pi * q["dead"] * p["dead"])
## Stochastic Cellular Automaton
##
## States: dead live
##
## Transition: dead -> live
## ~ 0.1 + sin(pi * q["dead"] * p["dead"])
## Transition: live -> dead
## ~ 0.1
##
## Neighborhood kernel: x (8 counted)
## Wrap: TRUE
## Max error: 0.2243486 (WARNING)
## Max rel error: 0.8656398 (WARNING)
Here, both the absolute and relative errors on the computed
probabilities are non-negligible (0.22 and 0.87). A warning is produced,
and the problematic transition is pointed out. You should probably
revise this model if you want to make it work with chouca
.
A possible approach is to approximate the function by a taylor expansion
of sin, for
example using the expansion in zero: $sin(x)
\approx x - \frac{x^3}{3!} + \frac{x^5}{5!}$.
sin_approx <- function(x) {
x - x^3/factorial(3) + x^5/factorial(5)
}
mod <- camodel(transition(from = "dead", to = "live",
~ 0.1 + sin_approx(pi * q["dead"] * p["dead"]) ),
transition(from = "live", to = "dead",
~ 0.1),
wrap = TRUE,
neighbors = 8)
print(mod) # negligible error this time, as we use an approximation of sin()
## Stochastic Cellular Automaton
##
## States: dead live
##
## Transition: dead -> live
## ~ 0.1 + sin_approx(pi * q["dead"] * p["dead"])
## Transition: live -> dead
## ~ 0.1
##
## Neighborhood kernel: x (8 counted)
## Wrap: TRUE
## Max error: 3.10853e-09 (OK)
## Max rel error: 3.224527e-09 (OK)
One of the main appeal of chouca
is its performance:
which is affected by various factors. The main one is the model
complexity. Using a model with transition probabilities that are
complicated products of global or neighbor covers, or that do not fit
well the expected functional form will lead to poor performance. When
possible, try to make your model follow this form. That being said,
several approaches exist to improve simulation speed, regardless of the
model: (i) on-the-fly compilation, (ii) memoisation of transition
probabilities and (iii) multithreading.
On-the-fly compilation can be enabled using the “compiled” engine to
run the cellular automaton. This is done by setting the
engine
control argument to the string “compiled”:
control_args <- list(engine = "compiled",
precompute_probas = FALSE) # see below for meaning of this parameter
out <- run_camodel(kubo_model,
initmat = init_mat,
times = seq(0, 512),
control = control_args)
## iter = 0 ( 0 %) 0: 0.5 +: 0.5
## iter = 128 ( 25 %) 0:0.231 +:0.769 [1523.81 iter/s]
## iter = 256 ( 50 %) 0:0.23 +:0.77 [1560.98 iter/s]
## iter = 384 ( 75 %) 0:0.23 +:0.77 [1580.25 iter/s]
## iter = 512 (100 %) 0:0.223 +:0.777 [1580.25 iter/s]
On first run, this model will take a few extra seconds to run as the C++ code corresponding to the model needs to be compiled. However, the simulation itself will be much faster. Compilation only needs to be done when the structure of the model or the size of the landscape changes. A simple change in parameter value will not trigger a new compilation, except when the probabilities of transition can be simplified, for example when a parameter is set to zero.
Another approach to better performance lies in the precomputation of
probabilities. The naive approach to run a CA is to consider each cell
in the landscape one by one, and compute its probability of transition
to every state it could switch to. This may entail a lot of repeated
computations, as many cells will have the same neighborhood
configurations. To avoid this, we can compute the probabilities of
transition for every neighborhood configuration, and just look the
number up to know whether cells switch or not. This approach is enabled
by using the argument precompute_probas
in the control
list:
control_args <- list(engine = "compiled",
precompute_probas = TRUE,
console_output_every = 128) # report less often on console
out <- run_camodel(kubo_model,
initmat = init_mat,
times = seq(0, 512),
control = control_args)
## iter = 0 ( 0 %) 0: 0.5 +: 0.5
## iter = 128 ( 25 %) 0:0.23 +:0.77 [4266.67 iter/s]
## iter = 256 ( 50 %) 0:0.224 +:0.776 [4413.79 iter/s]
## iter = 384 ( 75 %) 0:0.222 +:0.778 [4571.43 iter/s]
## iter = 512 (100 %) 0:0.227 +:0.773 [4571.43 iter/s]
This can dramatically improve the performance for models that have a small number of states and simple transition rules. However, it may be counter-productive for models with a larger number of states, as the number of neighborhood configurations grows very quickly with the number of different model states (S4 or S8 for S states, depending on the type of neighborhood). By default, a simple heuristic is used to make a decent choice.
The last approach to improve performance is to use multithreading. In
chouca
, the approach used considers the updating of cells
in parallel, instead of doing it one-by-one. However, there is some
synchronisation work required between threads to make sure the different
cores are not writing to shared data structures at the same time. This
often results in a significant overhead, and thus only small performance
improvements when using memoisation.
control_args <- list(engine = "compiled",
precompute_probas = TRUE,
cores = 2,
console_output_every = 128)
out <- run_camodel(kubo_model,
initmat = init_mat,
times = seq(0, 512),
control = control_args)
## iter = 0 ( 0 %) 0: 0.5 +: 0.5
## iter = 128 ( 25 %) 0:0.23 +:0.77 [4923.08 iter/s]
## iter = 256 ( 50 %) 0:0.232 +:0.768 [5565.22 iter/s]
## iter = 384 ( 75 %) 0:0.225 +:0.775 [5565.22 iter/s]
## iter = 512 (100 %) 0:0.229 +:0.771 [5333.33 iter/s]
Future improvements of the package may improve this situation. In the
meantime, if you happen to be running several simulations at the same
time (which is often the case), you are probably better off
parallelizing at a higher level. Note that if you are not using
memoisation (precompute_probas = FALSE
), then using
multiple cores can be an interesting strategy.
Finally, because the model code is compiled on the fly, you may be
able to use optimizations that are specific to your combination of
platform and compiler. The best way to do so is to edit your platform
~/.R/Makevars
file to add the flags you want, such as
CXXFLAGS=-O3 -march=native -mtune=native
. However, do not
expect too much as most of the time this does not bring much more
performance to the table.
Various options are available to access simulation outputs. When
nothing is specified, the percentage of cells in each state will be
saved at each time step specified in the times
vector
(argument to run_camodel()
), and the landscape at the first
and last time step will be saved. These can be accessed by extracting
elements from the object returned by run_camodel()
.
Global covers are saved as a matrix whose first column is the time, and the others are the percentages of cells in each state:
init <- generate_initmat(kubo_model, c(0.5, 0.5),
nr = 128)
run <- run_camodel(kubo_model, init, times = seq(0, 128, by = 1))
## iter = 0 ( 0 %) 0:0.508 +:0.492
## iter = 32 ( 25 %) 0:0.228 +:0.772 [390.24 iter/s]
## iter = 64 ( 50 %) 0:0.233 +:0.767 [400.00 iter/s]
## iter = 96 ( 75 %) 0:0.223 +:0.777 [400.00 iter/s]
## iter = 128 (100 %) 0:0.235 +:0.765 [400.00 iter/s]
# Extract covers and display the last lines of the table
covers <- run[["output"]][["covers"]]
tail(covers)
## t 0 +
## [124,] 123 0.2249756 0.7750244
## [125,] 124 0.2313232 0.7686768
## [126,] 125 0.2316895 0.7683105
## [127,] 126 0.2304077 0.7695923
## [128,] 127 0.2308960 0.7691040
## [129,] 128 0.2348633 0.7651367
The landscape are also available, using a similar syntax. A nifty
function to display landscapes is available in the
spatialwarnings
package, display_matrix()
:
## Warning in FUN(X[[i]], ...): The matrix has only two unique values, but it is
## not of logical type. Did you mean to use TRUE/FALSE values?
## Warning in FUN(X[[i]], ...): The matrix has only two unique values, but it is
## not of logical type. Did you mean to use TRUE/FALSE values?
Sometimes, we do not want to store everything at all time steps. One
option to reduce output is to specify a different times
vector to run_camodel()
. This will not affect the precision
of the simulation, but will simply change the points in time for which
simulation output is saved.
## iter = 0 ( 0 %) 0:0.508 +:0.492
## iter = 32 ( 25 %) 0:0.233 +:0.767 [390.24 iter/s]
## iter = 64 ( 50 %) 0:0.231 +:0.769 [400.00 iter/s]
## iter = 96 ( 75 %) 0:0.218 +:0.782 [400.00 iter/s]
## iter = 128 (100 %) 0:0.23 +:0.77 [400.00 iter/s]
## t 0 +
## [12,] 88 0.2177124 0.7822876
## [13,] 96 0.2183228 0.7816772
## [14,] 104 0.2331543 0.7668457
## [15,] 112 0.2297974 0.7702026
## [16,] 120 0.2262573 0.7737427
## [17,] 128 0.2301025 0.7698975
plot(covers[ ,"t"], covers[ ,"+"], type = "l", col = "red")
lines(covers2[ ,"t"], covers2[ ,"+"], col = "blue")
Another option is to set the control argument
save_covers_every
to something higher than one. This will
only save simulation output at time points that are multiple of this
value:
ctrl <- list(save_covers_every = 8)
run3 <- run_camodel(kubo_model, init, times = seq(0, 128, by = 1),
control = ctrl)
## iter = 0 ( 0 %) 0:0.508 +:0.492
## iter = 32 ( 25 %) 0:0.236 +:0.764 [390.24 iter/s]
## iter = 64 ( 50 %) 0:0.23 +:0.77 [400.00 iter/s]
## iter = 96 ( 75 %) 0:0.222 +:0.778 [400.00 iter/s]
## iter = 128 (100 %) 0:0.229 +:0.771 [400.00 iter/s]
## t 0 +
## [12,] 88 0.2252808 0.7747192
## [13,] 96 0.2217407 0.7782593
## [14,] 104 0.2294312 0.7705688
## [15,] 112 0.2334595 0.7665405
## [16,] 120 0.2288208 0.7711792
## [17,] 128 0.2288208 0.7711792
plot(covers[ ,"t"], covers[ ,"+"], type = "b", col = "red",
xlab = "t", ylab = "forest c over")
lines(covers2[ ,"t"], covers2[ ,"+"], col = "blue")
points(covers2[ ,"t"], covers2[ ,"+"], col = "blue")
lines(covers3[ ,"t"], covers3[ ,"+"], col = "darkgreen")
points(covers3[ ,"t"], covers3[ ,"+"], col = "darkgreen")
legend(x = "right",
legend = c("original", "reduced 'times' vector", "using 'save_covers_every'"),
col = c("red", "blue", "darkgreen"),
lty = c(1, 1, 1))
A similar argument exists for saving landscapes,
save_snapshots_every
, to reduce or increase the number of
landscapes being saved during the simulation, with similar effects.
One of the easiest way to dissect the results of your simulation is
simply to store the landscape, or the covers of each state as the
simulation runs, then do what you want with them afterwards. This is not
always possible, however, as the landscapes may be large and not that
many may fit in your computer’s memory, or your model may be running for
a large number of iterations. To avoid this, chouca
can
take a special function that will be executed as the simulation is
running.
This function must have a specific form: its first argument should be
t
, the current time of the simulation (an integer), and the
second should be mat
, the current landscape (a
matrix
object containing the levels of a factor).
For example, we can compute how the spatial autocorrelation changes in the landscape as the simulation is running, measured by the Moran’s I index (available in package spatialwarnings):
mod <- ca_library("forestgap")
init <- generate_initmat(mod, c(TREE = .5, EMPTY = .5), nr = 256)
# Define the function that computes AC at lag-1 using Moran's I
compute_aclag1 <- function(t, mat) {
m <- matrix(mat == "TREE", nrow = nrow(mat), ncol = ncol(mat))
data.frame(t = t, ac = spatialwarnings::raw_moran(mat))
}
control_args <- list(save_covers_every = 0,
custom_output_every = 1,
custom_output_fun = compute_aclag1)
out <- run_camodel(mod, init, times = seq(0, 64), control = control_args)
## iter = 0 ( 0 %) EMPTY:0.501 TREE:0.499
## iter = 16 ( 25 %) EMPTY:0.531 TREE:0.469 [72.73 iter/s]
## iter = 32 ( 50 %) EMPTY:0.531 TREE:0.469 [79.21 iter/s]
## iter = 48 ( 75 %) EMPTY:0.531 TREE:0.469 [80.81 iter/s]
## iter = 64 (100 %) EMPTY:0.531 TREE:0.469 [79.21 iter/s]
The results can be extracted as follows:
By default, all custom outputs are stored in a list, so a common
operation is to transform it into a table using rbind
or
any other function:
## t ac
## 56 55 0.011228878
## 57 56 0.007447191
## 58 57 0.011228878
## 59 58 0.007447191
## 60 59 0.011228878
## 61 60 0.007447191
## 62 61 0.011228878
## 63 62 0.007447191
## 64 63 0.011228878
## 65 64 0.007447191
The ability to carry out on-line computations has the nice side-effect that simulations can be visualised as they are running. This can be done, again by passing a specific function that will be executed as the simulation runs. The package includes a few helpers to do so.
The first one is landscape_plotter()
, which can be used
as follow, and will display the landscape as it is being run. We use
here the “rock-paper-scissor” model included with chouca
for its pretty patterns:
mod <- ca_library("rock-paper-scissor")
ctrl <- list(custom_output_every = 1,
precompute_probas = FALSE,
custom_output_fun = landscape_plotter(mod))
init <- generate_initmat(mod, rep(1, 3)/3, 100, 178)
out <- run_camodel(mod, init, seq(0, 128), control = ctrl)
See also the animated version for this output.
Another option to visualize simulations as they run is using
trace_plotter()
, which will display the covers of each
state in the landscape during the run:
ctrl <- list(custom_output_every = 1,
custom_output_fun = trace_plotter(mod, init, max_samples = 128),
console_output_every = 128)
out <- run_camodel(mod, init, seq(0, 512), control = ctrl)
See also the animated version for this output.
These two functions have many options to customize their output to
your liking, which are documented in ?trace_plotter
or
?landscape_plotter
. Of course, using them involves a large
performance penalty as plots needs to be redrawn during the execution of
the simulation: make sure you remove them if you need to run a large
amount of simulations!
chouca
has several caveats you may bump into as you use
it:
It would probably be unwise to define stochastic cellular
automata with more than 100 different states. chouca
works
better with models that have a low number of states and/or
transitions.
M1/2 Macs, and Macs in general have spotty support for OpenMP: you may or may not be able to use multiple threads on your machine when running simulations.
Kubo, Takuya, Yoh Iwasa, and Naoki Furumoto. 1996. “Forest Spatial Dynamics with Gap Expansion: Total Gap Area and Gap Size Distribution.” Journal of Theoretical Biology 180 (3): 229–46.
Mayor, Ángeles G., Sonia Kéfi, Susana Bautista, Francisco Rodríguez, Fabrizio Cartení, and Max Rietkerk. 2013. “Feedbacks between Vegetation Pattern and Resource Loss Dramatically Decrease Ecosystem Resilience and Restoration Potential in a Simple Dryland Model.” Landscape Ecology 28 (5): 931–42. https://doi.org/10.1007/s10980-013-9870-4.