These are all very good questions. I'll try to answer them below.
Herbert Sauro wrote:
>Hyss certainly looks different from the normal run of the mill
>simulators that abound.
>To be honest I wish I understood the theory behind stochastic simulation
>better, there are a number of things which nag me and I've asked around
>but no one has given me a good answer. Three things in particular:
>1. What are the rules regarding the use of different rate laws in
The short answer is that the reaction propensities (or rates) of each
event in a stochastic simulation must be positive, but are otherwise
restricted only by their application. Stochastic simulations (or, to be
more specific, jump Markov processes) may be used to describe traffic
flow patterns, reaction-diffusion systems, population balances (on the
ecological level), and other systems which do not have to be strictly
physical systems. Since systems biology is mainly interested in
describing physical reaction-diffusion (or just reaction) systems, the
reaction rate laws must make physical sense. But all of these systems
may be equally described using the 'Master equation', which is the
evolution equation for jump Markov processes and is typically unsolvable.
The paper describing the original stochastic simulation algorithm
(Gillespie, 1976) gives a simple derivation of the stochastic version of
mass action kinetics. Molecules which regularly collide with one another
have a small probability of reacting. The more combinations of molecules
that collide and react, the higher the rate of the reaction. So a = k *
#(combinations of molecules colliding), which is a = k * #A*#B for
bi-molecular 2nd order reactions, or a = k * #A*(#A - 1) for 2nd order
reactions between molecules of the same species (A + A, i.e.
dimerization). If one assumes that the molecules
are hard-ball spheres, then a specific kinetic constant, k, can be
analytically derived (of course, this assumption is rarely true and
experiments are typically required to calculate the specific kinetic
constant.) In these cases, k is in units of [molecules sec]^-1 and #
refers to the number of molecules.
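In Python, those propensity expressions look like the following (a sketch; the function names are mine, and note that some conventions write the A + A case with a factor of 1/2, which can be absorbed into k):

```python
def propensity_bimolecular(k, nA, nB):
    """A + B -> products: a = k * #A * #B.
    k is the stochastic rate constant in (molecules * sec)^-1."""
    return k * nA * nB

def propensity_dimerization(k, nA):
    """A + A -> products: a = k * #A * (#A - 1), proportional to the
    number of colliding pairs drawn from a single species (some
    conventions include a factor of 1/2, absorbed here into k)."""
    return k * nA * (nA - 1)
```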
Now, technically, any other expression for a reaction rate law relies on
some additional assumption. Rao & Arkin (2002) rederived the Michaelis
Menten rate
law expressions from the Master equation, using a quasi steady state
approximation (QSSA). The difference between the deterministic and
stochastic versions is that the probabilistic QSSA does not assume that
the number of molecules (or concentration) of the enzyme-substrate
complex is constant. Instead, it assumes that the probability
distribution of the number of molecules is constant (or, that the
probability of a molecule being consumed or produced does not change
over time, which is the same thing). The same sort of restrictions
apply: the timescale of the kcat parameter must be slower than the
timescales of the k1 and k-1 parameters. In the end, the rate law
expression is the same as the ODE version, but this should not be taken
as proof that all reaction rate laws are instantly convertible.
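Under that stochastic QSSA, the propensity has the same functional form as the deterministic rate law, evaluated on molecule numbers. A sketch (the function name and the example numbers are mine; Km is expressed in molecules here):

```python
def propensity_michaelis_menten(kcat, nE_total, nS, Km):
    """Slow-timescale propensity for S -> P via an enzyme, following
    the stochastic QSSA of Rao & Arkin (2002). Valid only when kcat
    is slow relative to k1 and k-1, i.e. binding equilibrates quickly.
    Km is expressed in molecule numbers."""
    return kcat * nE_total * nS / (Km + nS)
```

For example, with kcat = 1.0 per second, 10 enzyme molecules, 100 substrate molecules and Km = 100 molecules, this gives a propensity of 5 events per second.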
While some people just take a system of ODEs and convert them to a
stochastic version, they will often be converting reaction rate terms
that utilized assumptions that are not valid in the physical sense of a
mixture of molecules stochastically colliding and reacting. Of course,
they will get a similar answer to the system of ODEs (with some possibly
interesting fluctuations), but there is no reason why those stochastic
dynamics will correctly predict what is happening in reality. That's
because the assumptions made in creating the system of ODEs did not
consider what those assumptions do to the fluctuations / internal noise
of the system.
It would probably be very informative to compare the dynamics of a
system using just elementary reactions and one using more complex
reaction rate expressions, which use assumptions that may or may not be
true in a jump Markov process. (There was a paper doing this for the
Michaelis Menten expressions, but I forget the reference. Other reaction
rate laws have not been tested.)
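As a starting point for such a comparison, here is a minimal direct-method SSA (Gillespie 1976) in Python. This is a sketch, not Hy3S code; the A --> B test system and its rate constant are illustrative:

```python
import math
import random

def ssa(x0, stoich, propensities, t_end, rng):
    """Gillespie's direct method.
    x0: dict of initial molecule counts; stoich: list of dicts mapping
    species to their change when reaction j fires; propensities: list
    of functions of the state. Returns the state at t_end."""
    x = dict(x0)
    t = 0.0
    while t < t_end:
        a = [f(x) for f in propensities]
        a0 = sum(a)
        if a0 <= 0.0:
            break                                  # nothing can fire
        # exponential waiting time; 1 - r avoids log(0)
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # choose reaction j with probability a[j] / a0
        r, acc, j = rng.random() * a0, 0.0, 0
        for j, aj in enumerate(a):
            acc += aj
            if r <= acc:
                break
        for species, change in stoich[j].items():
            x[species] += change
    return x

# Illustrative system: A --> B with stochastic rate constant 1.0 per sec
final = ssa({"A": 100, "B": 0},
            [{"A": -1, "B": +1}],
            [lambda s: 1.0 * s["A"]],
            t_end=10.0, rng=random.Random(0))
```

Swapping the single propensity for an elementary-reaction set (E + S <--> ES --> E + P) versus a lumped Michaelis Menten expression is exactly the kind of comparison described above.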
>2. I've seen a few hybrid schemes for continuous/discrete simulators but
>I've seen little theory to back them up. In particular the question of
>how to mesh a variable step size continuous solver with a variable step
>size stochastic solver does not appear to have been satisfactorily
>answered (or has it?).
Ok, yeah, there are about 3-4 hybrid schemes out there. Some I think are
not good, others are... and you can tell for yourself by asking a few
questions:
Did they compute the error between their method and the original
SSA's (including the mean, the variance, and the probability
distribution itself)? It is difficult to eye-ball the accuracy of the
solution, especially when you're looking for error in the higher moments.
What approximations did they make to convert the discrete system to a
continuous one? How do they check to make sure the approximation is
valid? How often, during the simulation, do they check?
What examples did they use to demonstrate their method? If they
choose only linear examples (A --> B, B --> C, ad infinitum), then
there's no guarantee it will work for non-linear ones. Did they choose
only trivial non-linear ones (A + B --> C and reverse)? That's a bit of
cheating, because there's an analytical solution for the dimerization
reaction and, if they use it in the simulation method, then they're
misrepresenting the ability of their method.
How much theory do they include with the method? What approximations
did they make when they derived their equations? I've seen quite a few
people make the following invalid approximation: if the probability of a
reaction event is greater than X, then it is now a deterministic
reaction rate. That's a huge leap and there's no support for that
approximation at all. It also produces a lot of error.
Ok, so here is how I answer the question of combining a continuous &
jump Markov process (continuous-stochastic / discrete-stochastic
description). Writing equations here is a bit awkward, but the paper is
available online. I'll sketch through the derivation.
First, you partition the system into fast/continuous and slow reactions.
There's a few ways to do this and I described a very simple one: if the
number of molecules of either reactant or product species is greater
than epsilon and if the reaction propensity is greater than lambda, then
the reaction is a fast/continuous one. Otherwise, it is slow. By
increasing epsilon and lambda, the approximation from
discrete-stochastic to continuous-stochastic generates less error. We
repartition the system dynamically and often.
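A sketch of that partitioning rule in Python (my reading: every species taking part in the reaction must exceed epsilon, and the propensity must exceed lambda; the function name and data layout are mine):

```python
def partition_reactions(reactions, state, eps, lam):
    """Split reactions into fast/continuous and slow/discrete indices.
    reactions: list of (species_involved, propensity_fn) pairs.
    eps: molecule-count threshold; lam: propensity threshold.
    Raising eps and lam reduces the error of treating a reaction as
    continuous. Meant to be re-run often as the state evolves."""
    fast, slow = [], []
    for j, (species, prop) in enumerate(reactions):
        if all(state[s] > eps for s in species) and prop(state) > lam:
            fast.append(j)
        else:
            slow.append(j)
    return fast, slow
```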
Second, describe the effects of the subsystem of fast/continuous
reactions using the chemical Langevin equation (CLE), which is a system
of Ito stochastic differential equations (SDEs). (I'm not the first
person to do this. Haseltine & Rawlings (2002) use a similar method and
Gillespie wrote a derivation of the CLE in 2000.) This approximates a
subset of reactions from a jump Markov process to a continuous Markov
process. Other examples of continuous Markov processes are random walks,
diffusion, and Brownian dynamics.
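The CLE for the fast subsystem can be advanced with the simplest stochastic integrator, an Euler-Maruyama step. A sketch (names mine; in practice you would use one of the better integrators from the numerical SDE literature):

```python
import math
import random

def cle_em_step(x, stoich, propensities, dt, rng):
    """One Euler-Maruyama step of the chemical Langevin equation
    (Gillespie 2000):
    dX = sum_j v_j a_j(X) dt + sum_j v_j sqrt(a_j(X)) dW_j,
    with independent Wiener increments dW_j ~ N(0, dt)."""
    a = [max(f(x), 0.0) for f in propensities]  # propensities must be >= 0
    x_new = dict(x)
    for j, aj in enumerate(a):
        dW = rng.gauss(0.0, math.sqrt(dt))      # Wiener increment
        for species, v in stoich[j].items():
            x_new[species] += v * aj * dt + v * math.sqrt(aj) * dW
    return x_new
```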
Third (and the most important part), derive a system of (what I call)
Jump equations, which are another subsystem of Ito SDEs and whose
solution are the times of the slow reactions. These SDEs are coupled to
the CLE and are solved simultaneously using an SDE numerical integrator.
I'll sketch their derivation here:
Start by writing down the time-dependent probability of some event
happening. The event may occur according to an exponential, a gamma, a
Gaussian, or any other computable distribution. A typical Monte Carlo
technique to sample the event times of these distributions is to
equate the integral of the probability distribution (which is called the
cumulative distribution function) to a uniform random number. Now
replace the upper bound of that integral with a variable, which is the
time of the next event, and rearrange the equation so that the left hand
side is now equal to some residual. The residual is always negative when
the event has just started to become possible (i.e. when the last event
occurred) and it is equal to zero when the event has occurred. If the
event is generated according to an exponential distribution (which is
what stochastic chemical kinetics uses), then the rate of change of the
residual is the reaction propensity and the initial condition of the
residual is log(r), where r is a uniform random number between 0 and 1.
If the reaction propensity is constant over time then you will exactly
recover Gillespie's algebraic equation for generating the reaction
times. However, if the reaction propensity is changing based on what the
fast/continuous reactions are doing, then you need to numerically
integrate the Jump equations to get the reaction times.
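A sketch of that jump-equation integration for one slow reaction, using crude forward Euler (the name and step size are mine; in the actual method the residual is integrated alongside the CLE by the SDE solver):

```python
import math

def next_slow_reaction_time(prop, t0, r, dt=1e-4, t_max=1e3):
    """Integrate the jump-equation residual dR/dt = a(t) from the
    initial condition R(t0) = log(r), with r uniform on (0, 1); the
    time at which R crosses zero is the next firing time of the slow
    reaction. prop(t) may vary over time because the fast/continuous
    subsystem changes the slow reaction's propensity."""
    R = math.log(r)            # negative at the start
    t = t0
    while R < 0.0 and t < t_max:
        R += prop(t) * dt      # forward Euler on the residual
        t += dt
    return t
```

With a constant propensity a, this recovers Gillespie's algebraic formula tau = -ln(r) / a, up to the size of the integration step.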
So the coupling between the fast/continuous and slow reactions is
explicitly taken into account. The simultaneous solution of both
subsystems of SDEs may be done using very well-defined stochastic
numerical integrators. The slow reactions are still discrete events.
There's a lot of good theory on the global error of a particular
stochastic numerical integrator (good book on the topic by Kloeden &
Platen). So you can determine what the error of your solution is by
using numerical SDE theory, which has been in development for 40 years
(but is still not as well developed as numerical ODE theory.) The method
is very accurate, especially when the fast reactions are very fast.
In the paper, I also put forth an additional and optional approximation
called the 'Multiple Slow Reaction' approximation. If, for example, you
have a slow reaction whose reaction times are dependent on some species
which is only slightly changed by the execution of the slow reactions,
then you can execute multiple slow reaction events (in order and
recomputing their rates each time) in between the numerical integration
of the CLE. It speeds up the simulation when there are many, many slow
reactions, whose combined effects are small compared to the fast
reactions. We're still working on a way to dynamically calculate how
many 'multiple reaction executions' is too many. The method works fine
without the approximation, but it slows down for very large systems (due
to the cost of repeatedly integrating the CLE for each slow reaction event.)
>3. As we all appreciate here, systems biology is not just building a
>model and hitting the run button, it is also understanding the model
>(amongst other things) and for that we have a great variety of tools and
>particularly theory to help us. My impression is that stochastic models
>(including hybrid ones, and deterministic hybrid) do not have the same
>degree of assistance, therefore what can one do to dissect a stochastic
>model in the same manner we do with a continuous model?
In deterministic systems, there's a lot of theory on flows, stability,
and bifurcation. The topic of 'non-linear analysis and dynamics' is
replete with methods that can systematically analyze any system of ODEs
(Auto is a good program for doing this). For stochastic systems
(specifically, either jump or continuous Markov processes), there is
much less theory that treats these systems as dynamical ones. But,
there's a few people working on this topic and I think the field is
making rapid progress. For example, both ODEs and SDEs generate 'flows'
in space. A trajectory in a system of ODEs does not cross other
trajectories and it has only one mapping from origin to destination for
any given time. A trajectory in a system of SDEs does not generate a
unique mapping from one place to another. There is a probability
distribution of being at any place at some time.
But, there is a tricky idea of a 'cocycle' that says that the space of
all possible trajectories moved forward by X amount of time and then
again moved forward by Y amount of time is equal to the space of all
possible trajectories moved forward by X + Y time. Probably makes sense,
but it was very hard to prove. (Please, someone correct me if I got that
wrong. Measure theory is not my thing.) Using the twin ideas of a 'flow'
and a 'cocycle', mathematicians can start to think about stochastic
systems as random dynamical systems. Instead of fixed points (or roots),
you now have random attractors. Instead of stability, you now have
escape rates. Instead of bifurcations, you now have fluid changes in the
shape of the random attractor as a function of some parameter (which, in
reality, is a lot more accurate!!). So the theory is sparse, but
present, and is rapidly developing.
Eventually, the tools will be good enough to always work and always
produce an accurate solution in as little time as possible. The methods
aren't that far developed, but progress is fast. The software at
http://hysss.sourceforge.net can be thought of as a development
playground for these new types of stochastic methods, both for people to
use in systems biology research and for faster progress in the
development of better methods.
I hope that acceptably answers the questions. (And if you've read this
far, then I congratulate you on your patience.)
>From: Howard Salis [mailto:email@example.com]
>Sent: Thursday, July 21, 2005 3:04 PM
>To: SBML Discussion List
>Subject: Re: [sbml-discuss] What do SBML libraries need to focus on
>Well, since you're asking, check out http://hysss.sourceforge.net. It's
>GPL'd, of course.
>Though, it's not complete and we'll soon be adding yet another (very
>innovative) hybrid method which utilizes a new type of probabilistic
>steady state approximation.
>The (slightly crude, but very capable) GUI is written in Matlab. I
>obviously don't spend a lot of time on the GUI.
>Herbert Sauro wrote:
>>This remark is to the whole community, how many ode/gillespie wrappers
>>do we really need? Are there now almost 100 listed on the sbml site?
>>Instead of writing yet another sbml/ode wrapper isn't it time we
>>actually did something truly innovative?
>>From: Michael Hucka [mailto:firstname.lastname@example.org]
>>Sent: Thursday, July 21, 2005 1:20 PM
>>To: SBML Discussion List
>>Cc: LibSBML Discussion List
>>Subject: Re: [sbml-discuss] What do SBML libraries need to focus on
>>>>>>>On 21 Jul 2005, Stefan Hoops <email@example.com> wrote:
>> shoops> What I definitely exclude are the following points:
>> shoops> - Computing higher-level information about a model.
>> shoops> - Solvers (mentioned by Howard Salis)
>> shoops> This can and should be provided in additional
>> shoops> libraries or tools by whoever wants to contribute.
>>Agreed. There is currently no plan to roll solvers into libSBML. If
>>we (SBML Team) do ever work on solution engines, I foresee them being
>>separate packages or libraries; it makes more sense that way.