Introduction
The current document defines the package multi Version 1 of SBML Level 3 Version 1.
Graphical and typographical conventions
We use the following typographical conventions to distinguish objects and data types from other entities:
Class | Names of classes begin with a capital letter and are printed in a bold, sans-serif typeface.
attribute | Names of attributes begin with a lowercase letter and are printed in a bold, italic, sans-serif typeface.
value | CDATA (character data; see http://en.wikipedia.org/wiki/CDATA), that is, the textual content of an element or the value of an attribute, is printed in an italic, sans-serif typeface.
code | Examples of XML code are surrounded by boxes and have a different background color than the rest of the text.
Some SpeciesTypes can represent binding sites. Those binding sites can be linked to other binding sites. In this document, a binding site can be represented in four different contexts:
Throughout the text, the American spelling is used rather than the British one.
Motivation
The multi package addresses two different, though related, problems commonly encountered when trying to model biological processes:
- The representation of entities that can exist in different states that affect their behavior (multi-state entities). Those entities carry state features, sometimes many of them, each able to take different values. This may result in a combinatorial explosion of alternative states taken by the entities.
- The creation and behavior of complexes made up of different components (multi-component entities). The rules of assembly may lead to an unbounded list of species, with the number of components and their topology impossible to specify before the simulation.
As a simple example of a multi-state, multi-component entity, consider a ligand-gated ion channel with only one state feature, the pore, that can adopt three different values: closed, open and desensitized. In addition to this state feature, the channel can be bound to a scaffold through an anchor.
Taking into account the different values of the state feature, plus the status of the anchoring site (bound or not), this receptor can exist in six different states:
- free, closed
- free, open
- free, desensitized
- anchored, closed
- anchored, open
- anchored, desensitized
With the SBML core, any reaction involving the receptor, such as binding to a ligand, has to be written six times. Some of the state features and/or bonds may not affect the binding of the ligand, but the reactions have to be enumerated nevertheless if we want to keep track of all the populations. Writing out all the possibilities is tedious in the best case, and in the worst case impossible because of the combinatorial explosion. If an entity possesses 4 bi-valued features and 2 tri-valued features, the number of possible states is 2^4 × 3^2 = 144. A dodecamer of CaMKII with 5 different characteristics per subunit, each taking two values (e.g. activity, binding to calmodulin and to ATP, phosphorylation on threonines 286 and 306), exhibits 12 × 5 = 60 state features, and consequently 2^60 possible states, that is, more than a billion billion. Writing such a model by enumerating all possible states and reactions is not feasible, and one needs a way to describe only the relevant states of this species rather than all the possible ones.
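The counts quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `count_states` and the value labels are assumptions made for this example and are not part of the multi package. It assumes that all state features are independent, so that the number of states is the product of the number of values of each feature.

```python
from itertools import product
from math import prod


def count_states(feature_sizes):
    """Number of distinct states of an entity whose features are independent.

    feature_sizes lists, for each feature, how many values it can take.
    (Hypothetical helper, for illustration only.)
    """
    return prod(feature_sizes)


# Ligand-gated ion channel: one anchoring site (2 values), one pore feature (3 values).
anchor_values = ["free", "anchored"]
pore_values = ["closed", "open", "desensitized"]
channel_states = list(product(anchor_values, pore_values))

# Entity with 4 bi-valued features and 2 tri-valued features.
four_binary_two_ternary = count_states([2] * 4 + [3] * 2)

# CaMKII dodecamer: 12 subunits x 5 bi-valued characteristics = 60 features.
camkii_dodecamer = count_states([2] * (12 * 5))

print(len(channel_states))       # 6
print(four_binary_two_ternary)   # 144
print(camkii_dodecamer)          # 1152921504606846976
```

The last figure, 2^60, is about 1.15 × 10^18, which is why explicit enumeration of states is not an option for such an entity.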
Another problem addressed by the multi package is the unbounded list of multi-component entities. Imagine a situation where we would like to model the growth of microtubules from dimers of tubulin. We cannot possibly enumerate all the possible microtubules of different lengths. Furthermore, the length of a microtubule does not affect the rate at which a new dimer of tubulin is incorporated. The only thing we need to encode is the binding between a tubulin dimer incorporated in a microtubule and a free tubulin dimer.
In the schema above, the grey binding site is either bound or unbound, and it can be bound to another tubulin dimer or a microtubule containing 100 tubulin dimers.
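The idea that one binding rule replaces an unbounded list of reactions can be sketched as follows. This is a toy illustration under stated assumptions, not an encoding prescribed by the multi package: the class `Microtubule` and the function `elongate` are hypothetical names, and a microtubule is reduced to its length in tubulin dimers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microtubule:
    """Toy polymer, described only by its number of tubulin dimers (assumption)."""
    n_dimers: int


def elongate(microtubule, free_dimers):
    """Single rule: the free binding site at the polymer end binds one free dimer.

    The rule does not depend on the current length, so it applies equally to
    a lone dimer and to a microtubule of 100 dimers.
    """
    if free_dimers < 1:
        raise ValueError("no free tubulin dimer available")
    return Microtubule(microtubule.n_dimers + 1), free_dimers - 1


# The same rule is applied to a short and to a long microtubule.
short, pool = elongate(Microtubule(1), free_dimers=10)
long_, pool = elongate(Microtubule(100), pool)
print(short.n_dimers, long_.n_dimers, pool)  # 2 101 8
```

With an explicit enumeration, each length would need its own species and its own elongation reaction; here a single rule covers every length.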
To handle the problems described above, a field of modeling called rule-based modeling was developed [1]. The main idea of rule-based modeling is to write down the rules that reactions must obey rather than the reactions themselves. An example of a language used to describe rule-based models in biology is BioNetGen [2]. Another approach to avoiding the combinatorial explosion of possible cases is multi-agent modeling, where one represents all the interacting entities individually rather than as pools. If the number of possible cases exceeds the number of entities, this method, otherwise verbose, becomes parsimonious. Examples of multi-agent software used in biology are StochSim [3, 4] and Simmune [5, 6].
The graphical equivalent of the package multi Version 1 of SBML Level 3 Version 1 is the SBGN Entity Relationship language.
Finally, the multi package makes it possible to encode entities that are made of components belonging to different compartments. Those entities cannot be species, since a species is attached to a single compartment, and the compartments in SBML Level 3 Version 1 do not overlap. In multi, speciesTypes representing species attached to different compartments can be linked.
Past work on this problem or similar topics
Proposals for supporting multistate and multicomponent species have a long history in SBML. Here is a reconstruction in chronological order:
- Andrew Finney was probably the first to propose, in March 2001, SBML extensions to support complex species, covering multistate species and species made up of graphs of components, as part of a collection of proposals for new SBML development. At the 3rd Workshop on Software Platforms for Systems Biology in June 2001, Nicolas Le Novère gave a presentation entitled Multistate molecules and complex objects, proposing to extend Andrew's multistate proposal.
- Nicolas Le Novère and Tom Shimizu came up in July 2001 with an alternative proposal for encoding and using states in SBML. A slightly extended and corrected version of this proposal was presented by Nicolas at the 5th Workshop on Software Platforms for Systems Biology in July 2002. Nicolas Le Novère, Tom Shimizu and Andrew Finney published a complete description of this extension in December 2002.
- In March 2004, before the 2nd SBML hackathon, Andrew Finney published an updated proposal to encode complex species made up of several components. Planned as an extension for SBML Level 3, the document also described SpeciesTypes, which would later be incorporated into SBML Level 2, from Version 2 onward.
- In October 2004, Michael Blinov published, together with Jim Faeder, Byron Goldstein, Andrew Finney and Bill Hlavacek, an alternative proposal for encoding multi-component species, which also offered some ways of encoding multistate features.
- Anika Oellrich started to implement new SBML L2 support for StochSim in spring 2007, storing multistate information in proprietary annotations. This led in June 2007 to a proposal for Level 3 by Le Novère and Oellrich, meant to work in conjunction with Finney's 2004 multicomponent proposal. The proposal was presented at the 12th SBML forum meeting. A lightly corrected version was published in December 2007.
- Also at the 12th SBML forum meeting, Michael Blinov presented an updated version of his proposal. He later published two proposals for SBML L3, one with a hierarchical speciesTypes structure and one with a non-hierarchical speciesTypes structure.
- On December 6 and 7, 2007, an SBML Focused Videoconference was held, which launched the effort to develop the Level 3 package multi.


