Andrew Finney, Michael Hucka
{afinney,mhucka}@cds.caltech.edu
Systems Biology Workbench Development Group
ERATO Kitano Symbiotic Systems Project
Control and Dynamical Systems, MC 107-81
California Institute of Technology, Pasadena, CA 91125, USA
http://www.sbml.org
Principal Investigators: John Doyle and Hiroaki Kitano
SBML Level 2, Version 1 (Final)
June 28, 2003
We present the Systems Biology Markup Language (SBML) Level 2, a model representation formalism for systems biology. SBML is oriented towards describing systems of biochemical reactions of the sort common in research on a number of topics, including cell signaling pathways, metabolic pathways, biochemical reactions, gene regulation, and many others. SBML is defined in a neutral fashion with respect to programming languages and software encoding; however, it is primarily oriented towards allowing models to be encoded using XML, the eXtensible Markup Language (Bray et al., 2000; Bosak and Bray, 1999). This document contains many examples of SBML models written in XML, as well as an XML Schema (Fallside, 2000; Thompson et al., 2000; Biron and Malhotra, 2000) that defines SBML Level 2. A downloadable copy of the XML Schema and other related documents and software are also available from the SBML project web site, http://www.sbml.org/.
Major releases of SBML are termed levels. SBML Level 2 evolved out of SBML Level 1 (Hucka et al., 2001,2003). All of the structures of Level 1 can be mapped in a straightforward fashion to Level 2. In addition, a large subset of the structures in Level 2 can be mapped to Level 1. However, the levels remain distinct; a valid SBML Level 1 document is not a valid SBML Level 2 document, and likewise, a valid SBML Level 2 document is not a valid SBML Level 1 document. Appendix B lists the differences between Level 1 and Level 2.
SBML Level 2 was created in collaboration with the authors of the following systems: BASIS (Kirkwood et al., 2003), Bio Sketch Pad (Belta et al., 2003), BioSpreadsheet (McCollum and Lancaster, 2003), BioSpice (Arkin, 2001), CellDesigner (Funahashi and Kitano, 2003), Cellerator (Shapiro et al., 2001,2003), COPASI (Mendes, 2000), DBsolve (Goryanin et al., 1999; Goryanin, 2001), E-CELL (Tomita et al., 2001,1999), ESS (Peterson and Drager, 2003), Gepasi (Mendes, 1997,2001), Jarnac (Sauro, 2000; Sauro and Fell, 1991), JDesigner (Sauro, 2001), JigCell (Vass et al., 2003), MCell (Bartol and Stiles, 2002), NetBuilder (Schilstra and Bolouri, 2002), PathScout (Minch et al., 2003), ProMoT/DIVA (Stelling et al., 2001), StochSim (Morton-Firth and Bray, 1998; Bray et al., 2001), and Virtual Cell (Schaff et al., 2001,2000). SBML Level 2 was developed with the help of these packages' authors, as well as help and collaboration from the creators of CellML (Hedley et al., 2001) and many other individuals listed in the Acknowledgments (Section 6.2).
SBML Level 2 is meant to support basic biochemical network models and the kinds of operations that are possible in existing analysis/simulation tools. Future software tools will undoubtedly require further evolution of SBML, and we expect that higher SBML levels will add structures and facilities on top of Level 2 after the simulation community has had time to gain experience with the current language definition. In Section 6.1, we discuss extensions that will likely be included in SBML Level 3.
The definition of the model description language presented here does not specify how programs should communicate or read/write SBML. We assume that for a simulation program to communicate a model encoded in SBML, the program will have to translate its internal data structures to and from SBML, use a suitable transmission medium and protocol, etc., but these issues are outside of the scope of this document.
We define SBML using a graphical notation based upon UML, the Unified Modeling Language (Oestereich, 1999; Eriksson and Penker, 1998). This UML-based definition in turn is used to define an XML Schema (Fallside, 2000; Thompson et al., 2000; Biron and Malhotra, 2000) for SBML. There are three main advantages to using UML as a basis for defining SBML data structures. First, compared to using other notations or a programming language, the UML visual representations are generally easier to grasp by readers who are not computer scientists. Second, the visual notation is implementation-neutral: the defined structures can be encoded in any concrete implementation language (not just XML, but C, Java and other languages as well). Third, UML is a de facto industry standard that is documented in many sources. Readers are therefore more likely to be familiar with it than with other notations.
Our notation and our approach for mapping it to XML Schema is explained in a separate document (Hucka, 2000). Appendix A presents a summary, and examples throughout this document illustrate the approach. All data types in SBML follow XML Schema datatype definitions and conventions.
We follow certain naming and typographical conventions throughout this document. Specifically, the names of data structure attributes or fields begin with a lowercase letter, and the names of data structures and types begin with an uppercase letter. Keywords (names of types, XML elements, etc.) are written in a typewriter-style font; for example, Compartment is a type name and compartment is a field name. Likewise, literal XML examples are also written in a typewriter-style font.
The following is an example of a simple network of biochemical reactions that can be represented in SBML:
A software package can read an SBML model description and translate it into its own internal format for model analysis. For example, a package might provide the ability to simulate the model by constructing differential equations representing the network and then performing numerical time integration on the equations to explore the model's dynamic behavior.
SBML allows models of arbitrary complexity to be represented. Each type of component in a model is described using a specific type of data structure that organizes the relevant information. The data structures determine how the resulting model is encoded in XML.
In the sections that follow, we describe in detail SBML's various constructs and their uses. Section 3 first introduces a few basic structures which are used throughout SBML Level 2, then Section 4 provides details on each of the main components. Section 5 provides a number of complete examples of models encoded in XML using SBML Level 2. Section 6 contains a list of anticipated enhancements that will be made in Level 3 and a discussion of other efforts related to SBML. Appendix A summarizes the UML-based notation used in this document, Appendix B describes the differences between SBML Level 1 Version 2 and SBML Level 2 as described in this document, and finally, Appendix C provides the complete XML Schema for SBML Level 2.
This section covers certain concepts and constructs that are used repeatedly in the rest of SBML Level 2.
The base types in SBML (e.g., integer, double, and others) are taken directly from XML Schema (Fallside, 2000; Thompson et al., 2000; Biron and Malhotra, 2000). SBML defines additional data types and structures beyond this. Every structure composing an SBML Level 2 model definition has a specific data type that is derived directly or indirectly from a single abstract type called SBase. This base type is designed to allow a modeler or a software package to attach arbitrary information to each major structure or list in an SBML model. The definition of SBase is presented in Figure 1.
SBase contains three fields, all of which are optional: metaid, notes and annotation. These fields are discussed separately in the following subsections.
The metaid field is present for supporting metadata annotations using RDF (Resource Description Framework; Lassila and Swick, 1999). It has a data type of ID (the XML identifier type) and serves as an anchor for metadata references. Metadata expressed using RDF can be placed anywhere within an sbml element and its subelements, except within MathML elements. The metadata elements can include RDF description elements whose rdf:about attributes contain the values of the metaid fields of SBML elements in the model. The form of the RDF element content in SBML should follow the form described in the CellML Metadata Specification (Cuellar et al., 2002).
The field notes in SBase is a container for XHTML content. It is intended to serve as a place for storing optional information intended to be seen by humans. Typically, the notes field will contain user comments about the structure in which the notes field is enclosed. Every data object derived directly or indirectly from type SBase can have a separate value for notes, allowing users considerable freedom when adding comments to their models. Section 5 provides examples of using notes in different models.
SBase includes the field called annotation to provide a container for software-generated annotations that are not intended to be seen by humans. This field is a container for arbitrary data (XML type any). As with the user-visible notes field, every data object can have its own value for annotation. Section 3.2 provides guidelines for using this field.
The overall SBML inheritance hierarchy is depicted in Figure 2. In addition to the relationships shown, all substructures such as trigger on Event and the listOf lists are also derived from SBase. (However, the notes and annotation elements contained inside SBase are not derived from SBase.)
In other type definitions presented below, we follow the UML convention of hiding the attributes derived from a parent type such as SBase. It should be kept in mind that these attributes are always available.
The annotation field in the definition of SBase is formally unconstrained in order that software developers may attach any information they need to the structures in an SBML model. However, it is important that this facility not be misused. In particular, it is critical that information essential to a model definition is not stored in annotation. Parameter values, functional dependencies between model structures, etc., should not be recorded as annotations.
Here are examples of the kinds of data that may be appropriately stored in annotation: (a) information about the graphical layout of model components; (b) application-specific processing instructions that do not change the essence of a model; (c) identification information for cross-referencing components in a model with items in a database.
Different applications may use XML Namespaces (Bray et al., 1999) to specify the intended vocabulary of a particular annotation. Here is an example. Suppose a particular application needs to annotate data structures in an SBML model definition with screen layout information and a time stamp. The application's developers should choose a URI (Uniform Resource Identifier; Harold and Means 2001; W3C 2000a) reference that uniquely identifies the vocabulary the application will use for such annotations, and a prefix string for the annotations. For illustration purposes, let us say the URI reference is ``http://www.mysim.org/ns'' and the chosen prefix is mysim. An example of an annotation might then be as follows:
The namespace prefix mysim is used to qualify the XML elements mysim:nodecolors and mysim:timestamp; presumably these symbols have meaning to the application. This example places the XML Namespace information on annotation itself rather than on a higher-level enclosing construct or the enclosing document level, but other placements would be valid as well (Bray et al., 1999).
The use of XML Namespaces permits multiple applications to place annotations on XML elements of a model without risking interference or element name collisions. Annotations stored by different simulation packages can thus coexist in the same model definition. Although XML Namespace names must be URI references, an XML Namespace name is not required to be directly usable in the sense of identifying an actual, retrievable document or resource on the Internet (Bray et al., 1999). In the example above, ``http://www.mysim.org/ns'' is the namespace name (URI). The name is simply intended to enable unique identification of constructs, and using URIs is a common and simple way of creating a unique name string. For the convenience of developers of simulation and analysis tools, we reserve certain namespace names for use with annotations in SBML. These reserved names are listed in Table 1.
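To make the coexistence of annotations concrete, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. It parses a hypothetical annotation holding elements from the mysim vocabulary above and from a second, invented vocabulary (``http://www.othertool.org/ns''), then selects each tool's elements by namespace URI rather than by prefix. The attribute names, the timestamp value and the function name are illustrative assumptions, not part of SBML.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation content written by two different tools.
# The mysim URI follows the example above; the "other" vocabulary is invented.
ANNOTATION = """
<annotation xmlns:mysim="http://www.mysim.org/ns"
            xmlns:other="http://www.othertool.org/ns">
  <mysim:nodecolors mysim:bgcolor="green" mysim:fgcolor="white"/>
  <mysim:timestamp>2003-06-28</mysim:timestamp>
  <other:layout x="10" y="20"/>
</annotation>
"""


def elements_in_namespace(annotation_xml: str, namespace_uri: str) -> list:
    """Return the local names of top-level annotation children in one namespace.

    ElementTree expands prefixes to "{uri}localname", so matching is done on
    the namespace URI and the prefix a tool happened to choose is irrelevant.
    """
    root = ET.fromstring(annotation_xml.strip())
    qualifier = "{" + namespace_uri + "}"
    return [
        child.tag[len(qualifier):]
        for child in root
        if child.tag.startswith(qualifier)
    ]
```

Calling `elements_in_namespace(ANNOTATION, "http://www.mysim.org/ns")` yields the two mysim elements, while the other tool's URI yields only its layout element; neither tool needs to know about the other's vocabulary.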
Note that the namespaces being referred to here are XML Namespaces specifically in the context of the annotation field on SBase. The namespace issue here is unrelated to the namespaces discussed in Section 3.5 in the context of SId and symbols in SBML.
As will become apparent below, most structures in SBML include two common fields: id and name. The id field is usually required for most structures and is used to identify a component within the model definition. Other SBML structures can refer to the component using this identifier. Section 3.4 provides a definition of the data type SId used for the id field, and Section 3.5 describes the scoping and namespace rules for these identifiers.
The equality of SId values is determined by an exact character sequence match; i.e., comparisons of these identifiers must be performed in a case-sensitive manner. This applies to all uses of SId including the identifiers of unit definitions.
In contrast to the id field, the name field is optional and is not intended to be used for cross-referencing purposes within a model. Its purpose instead is to provide a human-readable label for the component. The data type of the name field is the type string defined in XML Schema (Thompson et al., 2000; Biron and Malhotra, 2000). This type includes all Unicode characters (Unicode Consortium, 1996) except for two delimiter characters, 0xFFFE and 0xFFFF (Biron and Malhotra, 2000). In addition, the following quoting rules specified by XML for character data (Bray et al., 2000, Section 2.4) must be obeyed:
The recommended practice for handling name is as follows. If a software tool has the capability for displaying the content of name fields, it should display this content to the user as a component's label instead of the component's id field. If the user interface does not have this capability (e.g., because it cannot display or use special characters in symbol names), or if the name field is missing on a given component, then the user interface should display the value of the id field instead. (Script language interpreters are especially likely to display id fields instead of name fields.)
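The recommended display rule can be captured in a few lines. The following Python sketch is a minimal illustration, not a required API: the function name and the `can_show_name` flag (standing in for whether a given interface can render the content of name) are hypothetical, and treating an empty name like a missing one is our own assumption.

```python
from typing import Optional


def display_label(sid: str, name: Optional[str], can_show_name: bool = True) -> str:
    """Pick the label a user interface should show for an SBML component.

    Prefer the human-readable name when the interface can render it;
    otherwise, or when name is missing, fall back to the id.
    An empty name is treated the same as a missing one (an assumption).
    """
    if can_show_name and name:
        return name
    return sid
```

For instance, a component with id `glc_c` and name ``glucose (cytosol)'' would be labeled by its name in a graphical tool, but by `glc_c` in an interface that cannot display parentheses and spaces in symbol names.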
As a consequence of the above, authors of systems that automatically generate the values of id fields should be aware that some systems may display the ids to the user. Authors therefore may wish to take some care to have their software create id values that are reasonably easy for humans to type and read.
An additional point worth mentioning is that, although there are restrictions on the uniqueness of id values (see Section 3.5 below), there are no restrictions on the uniqueness of name values in a model. This allows a software package more leeway in assigning component identifiers. For example, a species in an SBML model must be located in a compartment, which means that if the same species appears in multiple compartments (e.g., in the context of a transport reaction), they must be given different identifiers. It is currently the case that users and software differ sharply in philosophy about how to treat this situation: some treat these as different species, and others treat them as the same species located in different places. Those in the latter group often want to use the same name but have different id values for the differently-localized ``instances'' of the species. The freedom from restrictions on name values enables SBML to accommodate both philosophies.
The type SId is the type of the id field found on the majority of SBML components. SId is a data type derived from the basic XML type string, but with restrictions about the types of characters permitted and the sequence in which they may appear. Its definition is shown in Figure 3.
The SId is purposefully not derived from the XML ID type. Using XML's ID would force all SBML identifiers to exist in a single global namespace, which would affect not only the form of local parameter definitions but also future extensions for supporting model/submodel composition. Further, the use of the ID type for SBML identifiers would have limited utility because MathML ci elements are not of the type IDREF (see Section 3.6). Since the IDREF-ID linkage cannot be exploited in MathML constructs, the utility of the XML ID type is greatly reduced.
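As we read Figure 3, an SId must begin with an ASCII letter or an underscore and may continue with ASCII letters, digits and underscores. The Python sketch below checks that reading with a regular expression and also illustrates the exact, case-sensitive comparison rule of Section 3.3. Treat the pattern as our assumption about the figure rather than a normative restatement; the function names are hypothetical.

```python
import re

# Assumed SId grammar: (letter | '_') (letter | digit | '_')*, ASCII letters only.
_SID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_sid(candidate: str) -> bool:
    """True if the whole string matches the SId grammar.

    fullmatch() is used instead of match() so that trailing characters,
    including a trailing newline, cause the check to fail.
    """
    return _SID_PATTERN.fullmatch(candidate) is not None


def same_sid(a: str, b: str) -> bool:
    """SId equality is an exact, case-sensitive character-sequence match."""
    return a == b
```

Under this reading, `S1` and `_k2` are valid identifiers, whereas `2S` and `k-1` are not, and `ATP` and `atp` are two distinct identifiers.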
A biochemical network model can contain a large number of components representing different parts of a model. This leads to a problem in deciding the scope of an identifier: in what contexts does a given identifier X represent the same thing? The approaches used in existing simulation packages tend to fall into two categories which we may call global and local. The global approach places all identifiers into a single global namespace, so that an identifier X represents the same thing wherever it appears in a given model definition. The local approach places symbols in different namespaces depending on the context, where the context may be, for example, individual reaction rate expressions. The latter approach means that a user may use the same identifier X in different rate expressions and have each instance represent a different quantity.
The fact that different simulation programs may use different rules for identifier resolution poses a problem for the exchange of models between simulation tools. Without careful consideration, a model written out in SBML format by one program may be misinterpreted by another program. SBML Level 2 must therefore include a specific set of rules for treating identifiers and namespaces.
The namespace rules in SBML Level 2 are relatively straightforward and are intended to avoid this problem with a minimum of requirements on the implementation of software tools:
The set of rules above can enable software packages using either local or global namespaces for parameters to exchange SBML model definitions. In particular, software environments using local namespaces for parameters internally should be able to accept SBML model definitions without needing to change component identifiers. Environments using a global namespace for parameters internally can perform a simple manipulation of the identifiers of local parameter elements within reaction definitions to avoid name collisions. (An example approach for the latter would be the following: when receiving an SBML-encoded model, prefix each parameter identifier inside each reaction with a string constructed from the reaction's identifier; when writing an SBML-encoded model, strip off the prefix.)
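The prefixing approach described in parentheses above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the model is reduced to a mapping from reaction identifiers to the identifiers of their local parameters, and the separator ``__'' and the function names are our own hypothetical choices. A real importer would also have to rewrite the ci references inside each kinetic law and check that the new names do not collide with existing global identifiers.

```python
# Hypothetical separator; "__" keeps the result a valid SId because
# underscores are permitted in identifiers.
SEP = "__"


def globalize(local_params: dict) -> dict:
    """On import: prefix each reaction's local parameter ids with the reaction id.

    local_params maps a reaction id to the list of parameter ids declared
    inside that reaction's kinetic law.
    """
    return {
        rid: [rid + SEP + pid for pid in pids]
        for rid, pids in local_params.items()
    }


def localize(global_params: dict) -> dict:
    """On export: strip the reaction-id prefix that globalize() added."""
    result = {}
    for rid, pids in global_params.items():
        prefix = rid + SEP
        result[rid] = [
            pid[len(prefix):] if pid.startswith(prefix) else pid
            for pid in pids
        ]
    return result
```

Two reactions that each declare a local parameter `k` thus become `R1__k` and `R2__k` inside a global-namespace tool, and writing the model back out restores the original local names.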
The namespace rules described here will hopefully provide a clean transition path to future levels of SBML, when submodels are introduced (Section 6.1). Submodels will provide the ability to compose one model from a collection of other models. This capability will have to be built on top of SBML Level 2's namespace organization. A straightforward approach to handling namespaces is to make each submodel's namespace private. The rules governing namespaces within a submodel can simply be the Level 2 namespace rules described here, with each submodel having its own namespace that is global within that submodel.
Mathematical expressions in SBML Level 2 are represented using MathML 2.0 (W3C, 2000b), the XML standard for describing mathematics in machine-readable format. It is used in the definitions of functions (Section 4.3), rules (Section 4.8), reaction kinetics (Section 4.9.7), stoichiometries (Section 4.9.5) and events (Section 4.10). The KineticLaw, StoichiometryMath, EventAssignment and Rule structures each have a single MathML math subelement. A function definition has a single lambda subelement. The Event structure has two math fields, trigger and delay, each containing a single MathML math element.
The XML namespace URI for all MathML elements is ``http://www.w3.org/1998/Math/MathML''. [See the W3C document by Bray et al. (1999) for more information about using XML namespaces.] The examples in Section 5 illustrate the use of this namespace and MathML in SBML.
The subset of MathML elements used in SBML Level 2 is similar to that used by CellML and is itemized below:
The inclusion of logical operators, relational operators, piecewise, piece, and otherwise elements facilitates the encoding of discontinuous expressions. Elements for representing partial differential calculus are not included. We anticipate that the requirements for partial differential calculus will be addressed in proposals for SBML Level 3 geometry representations (see Section 6.1).
The following are the only attributes permitted on MathML elements in SBML:
Missing values for these attributes are to be treated in the same way as defined by MathML. These restrictions on attributes are designed to confine the MathML elements to their default semantics and to avoid conflicts in the interpretation of the type of token elements.
The following are the only permissible values for the type attribute on MathML cn elements: ``e-notation'', ``real'', ``integer'', and ``rational''. The value of the type attribute defaults to ``real''.
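In MathML 2.0, the two-part number types separate their parts with an empty sep element: `<cn type="e-notation"> a <sep/> b </cn>` denotes a × 10^b, and `<cn type="rational"> p <sep/> q </cn>` denotes p/q. The Python sketch below reads all four permitted types with xml.etree.ElementTree. The function name is hypothetical, and returning every value as a float is a simplification of ours.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"


def cn_value(cn: ET.Element) -> float:
    """Numeric value of a MathML cn element for the type values SBML allows."""
    kind = cn.get("type", "real")  # SBML default when the attribute is absent
    first = (cn.text or "").strip()
    if kind == "real":
        return float(first)
    if kind == "integer":
        return float(int(first))
    sep = cn.find(MATHML_NS + "sep")
    if sep is None:
        raise ValueError(f"cn of type {kind!r} needs a <sep/> element")
    second = (sep.tail or "").strip()  # text after <sep/> is its tail
    if kind == "e-notation":
        # a <sep/> b means a * 10^b; build "aeb" and let float() parse it.
        return float(f"{first}e{second}")
    if kind == "rational":
        # p <sep/> q means p / q, computed exactly before converting.
        return float(Fraction(int(first), int(second)))
    raise ValueError(f"type {kind!r} is not permitted on cn in SBML Level 2")
```

With this reading, `<cn type="e-notation"> 1.5 <sep/> 3 </cn>` evaluates to 1500.0 and a cn element with no type attribute is parsed as a real.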
The content of a ci element must obey MathML whitespace rules and contain an identifier that is declared elsewhere in the model. The set of possible identifiers that can appear in a ci element depends on the containing structure in which the ci is used:
SBML Level 2 uses the MathML csymbol element to denote certain built-in mathematical entities without introducing reserved names into the component identifier namespace. The encoding field of csymbol should be set to text. The definitionURL should be set to one of the following predefined SBML symbol URLs:
The following examples demonstrate these concepts. The XML fragment below encodes a formula that refers to simulation time through the built-in csymbol for time.
As a further example, the following XML fragment encodes an equation that can also be written in an alternative notation:
Note that it is not necessary for a parser to access the resource pointed to by the definitionURL; in this context, the URL should be interpreted as a URI. Also, the content of the csymbol element is for rendering purposes only and can be ignored by the parser.
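The rule that a csymbol is recognized by its definitionURL, never by its text content and without dereferencing the URL, can be shown with a small evaluator. The Python sketch below handles only a tiny MathML subset (math, cn, ci, plus, times and one csymbol). It assumes ``http://www.sbml.org/sbml/symbols/time'' as the URL of the time symbol; the formula x + t, the identifier x and its value are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

M = "{http://www.w3.org/1998/Math/MathML}"
# Assumed definitionURL for the SBML Level 2 time symbol.
TIME_URL = "http://www.sbml.org/sbml/symbols/time"


def evaluate(node: ET.Element, values: dict, time: float) -> float:
    """Evaluate a tiny MathML subset: math, cn, ci, the time csymbol, plus, times."""
    if node.tag == M + "math":
        return evaluate(node[0], values, time)
    if node.tag == M + "cn":
        return float(node.text)
    if node.tag == M + "ci":
        return values[node.text.strip()]
    if node.tag == M + "csymbol":
        # Identify the symbol by its URL only; the element's text is just a
        # display name, and the URL is never fetched.
        if node.get("definitionURL") == TIME_URL:
            return time
        raise ValueError(f"unsupported csymbol {node.get('definitionURL')!r}")
    if node.tag == M + "apply":
        operator, *operands = list(node)
        args = [evaluate(child, values, time) for child in operands]
        if operator.tag == M + "plus":
            return sum(args)
        if operator.tag == M + "times":
            return math.prod(args)
        raise ValueError(f"unsupported operator {operator.tag}")
    raise ValueError(f"unsupported element {node.tag}")


FORMULA = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply>
    <plus/>
    <ci> x </ci>
    <csymbol encoding="text"
             definitionURL="http://www.sbml.org/sbml/symbols/time"> t </csymbol>
  </apply>
</math>
"""
```

Evaluating `ET.fromstring(FORMULA.strip())` with x = 2.0 at time 3.5 gives 5.5; renaming the csymbol's text from `t` to `time` changes nothing, because only the URL is consulted.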
In this section, we define each of the major data structures in SBML. To provide illustrations of their use, we give partial model definitions in XML. Section 5 provides many full examples of SBML in XML.
The outermost portion of an SBML Level 2 model definition consists of a single Sbml structure enclosing a single Model structure (see next Section). The definition of Sbml is shown in Figure 4.
The XML namespace URI for SBML Level 2 is ``http://www.sbml.org/sbml/level2''. All SBML Level 2 elements should be encoded using this URI by assigning this URI to either the default namespace or a tag prefix. The character encoding for SBML is UTF-8. SBML documents should include the encoding attribute with the value UTF-8 in the XML prologue.
In the transformation of UML to XML used in this document, the Sbml structure is turned into an element named sbml. The element has two required attributes: level and version. For SBML Level 2 Version 1, these attributes must be set to ``2'' and ``1'', respectively. (The version attribute is present in case SBML Level 2 must be revised in the future to correct errors.)
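A minimal Python sketch of producing this outermost structure with the standard-library ElementTree module follows. It binds the Level 2 namespace URI to the default namespace, sets the level and version attributes, and writes a UTF-8 XML declaration. The function name and model identifier are hypothetical; note that `ET.register_namespace` changes a process-wide prefix table, and `xml_declaration` requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SBML_L2_NS = "http://www.sbml.org/sbml/level2"


def sbml_document(model_id: str) -> bytes:
    """Serialize <sbml level="2" version="1"> around a <model> carrying only an id."""
    # Map the SBML namespace to the default (empty) prefix so elements are
    # written as <sbml xmlns="..."> rather than <ns0:sbml ...>.
    ET.register_namespace("", SBML_L2_NS)
    root = ET.Element(f"{{{SBML_L2_NS}}}sbml", {"level": "2", "version": "1"})
    ET.SubElement(root, f"{{{SBML_L2_NS}}}model", {"id": model_id})
    # xml_declaration=True emits <?xml ... encoding='UTF-8'?> before the root.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Parsing the result back yields an sbml element in the Level 2 namespace whose single child is the model.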
The following is an abbreviated example of the outermost content of an SBML model definition in XML:
The Model structure is the highest-level construct in an SBML data stream or document. Its definition is shown in Figure 5. Only one component of type Model is allowed per instance of an SBML document or data stream, although it does not necessarily need to represent a single biological entity.
Model serves as a container for FunctionDefinition, UnitDefinition, Compartment, Species, Parameter, Rule, Reaction and Event components. All of these components are optional; that is, the lists in each of the respective fields are permitted to have zero length. (However, there are dependencies between components, such that defining some requires defining others. For example, as explained in other sections below, defining a species requires defining a compartment, and defining a reaction requires defining a species.)
The Model structure has an optional field, id, used to give the model an identifier. The identifier must be a text string conforming to the syntax permitted by the SId data type described in Section 3.4. Model also has an optional name field, of type string. The name and id fields should be used as described in Section 3.3.
In the XML encoding of an SBML model, the lists of species, compartments, unit definitions, parameters, reactions, function definitions, rules and events are translated into lists of XML elements enclosed within elements of the form listOf___s, where the blank is replaced by the name of the component type (e.g., ``Reaction''). The resulting XML data object has the form illustrated by the following skeletal model:
Readers may wonder about the motivations for the listOf___s notation. A simpler approach to creating the lists of components would be to place them all directly at the top level under <model> ... </model>. We chose instead to group them within XML elements named listOf___s, because we believe this helps organize the components and makes visual reading of model definitions easier. These listOf___s elements are derived from SBase, which enables each list to contain its own metaid, notes and annotation fields. Further details of how listOf___s elements implement UML lists are described in Appendix A.
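The naming scheme can be illustrated with a short Python sketch that fills in the blank of listOf___s for each component type. Two points are assumptions on our part: that Species, being already plural, takes no extra ``s'' (giving listOfSpecies), and that the wrappers appear in the order in which the component types are listed for Model in Section 4.2. The function names are hypothetical.

```python
# Component types in the order they are listed for Model in Section 4.2
# (assumed here to be the order of the listOf elements as well).
COMPONENT_TYPES = [
    "FunctionDefinition", "UnitDefinition", "Compartment", "Species",
    "Parameter", "Rule", "Reaction", "Event",
]


def list_element_name(type_name: str) -> str:
    """Fill in the blank of listOf___s; a name already ending in "s" is kept."""
    plural = type_name if type_name.endswith("s") else type_name + "s"
    return "listOf" + plural


def wrapper_names(present_types) -> list:
    """listOf element names for the non-empty component lists, in model order.

    Every list is optional, so types with no components get no wrapper.
    """
    return [list_element_name(t) for t in COMPONENT_TYPES if t in present_types]
```

A model with compartments, species and reactions would therefore contain listOfCompartments, listOfSpecies and listOfReactions, in that order.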
The FunctionDefinition structure associates an identifier with a function definition. The identifier can then be used in any subsequent MathML apply elements. FunctionDefinition is shown in Figure 6.
The FunctionDefinition structure has three fields: id, name and math. Their purposes are explained in the following subsections.
The id and name fields have types SId and string, respectively, and operate in the manner described in Section 3.3. MathML ci elements can refer to the function defined by a FunctionDefinition using the value of its id field.
The math field is a container for MathML content that defines the function. The content of this field can only be a MathML lambda element. This is the only place in SBML where a lambda element can be used. The function is only available for use in other MathML elements that follow the FunctionDefinition structure in an SBML model. (These restrictions prevent recursive and mutually-recursive functions from being expressed.) The lambda element can contain any of the elements in the MathML subset listed in Section 3.6.1 but not any further lambda elements.
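One way to see why the ordering rule rules out recursion is to give each function access only to the functions defined before it. The Python sketch below does that while evaluating calls to user-defined functions (an apply whose first child is a ci naming the function). It supports only cn, ci, plus, times and function calls; the function names and the two-argument example g(x, y) = x·y + 1 are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

M = "{http://www.w3.org/1998/Math/MathML}"


def load_functions(definitions):
    """definitions: ordered (id, <lambda> element) pairs, as they appear in the model.

    Each function records a snapshot of the functions defined before it,
    so its body can never call itself or a later function.
    """
    functions = {}
    for fid, lam in definitions:
        params = [bvar.find(M + "ci").text.strip() for bvar in lam.findall(M + "bvar")]
        body = next(child for child in lam if child.tag != M + "bvar")
        functions[fid] = (params, body, dict(functions))
    return functions


def evaluate(node, env, functions):
    """Evaluate cn, ci, plus, times and calls to user-defined functions."""
    if node.tag == M + "cn":
        return float(node.text)
    if node.tag == M + "ci":
        return env[node.text.strip()]
    if node.tag == M + "apply":
        head, *operands = list(node)
        args = [evaluate(arg, env, functions) for arg in operands]
        if head.tag == M + "plus":
            return sum(args)
        if head.tag == M + "times":
            return math.prod(args)
        if head.tag == M + "ci":
            # A KeyError here means the function is unknown or not yet defined.
            params, body, visible = functions[head.text.strip()]
            return evaluate(body, dict(zip(params, args)), visible)
    raise ValueError(f"unsupported element {node.tag}")
```

Calling g with arguments 2 and 3 yields 7.0, while a function h whose body calls h fails with a KeyError, because h is not in the set of functions visible to its own body.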
The following abbreviated SBML example shows a FunctionDefinition structure that defines a function by means of a MathML lambda expression:
Units may be supplied in a number of contexts in an SBML model. The units of the following mathematical entities can be specified explicitly: constants, initial conditions, symbols in formulae and the results of formulae. Rather than having to give a complete unit definition on every structure, SBML provides a facility for defining identified units which can be reused throughout a model. In addition, by default, SBML mathematical entities have units composed from built-in units in a consistent fashion (see Sections 4.4.3, 4.5.4, 4.6.4 and 4.9.7). By redefining the built-in units it is possible to change the units used throughout a model in a simple and consistent manner.
The SBML UnitDefinition and Unit structures enable combinations of units to be given abbreviated names, and enable built-in units to be redefined. The definitions of UnitDefinition and Unit are shown in Figure 7.
An instance of a UnitDefinition consists of an id field of type SId, an optional string field name and a list of structures of type Unit. As mentioned in Section 3.5, unit identifiers defined by the id field are considered to be in a separate global namespace distinct from the namespace of other identifiers in a model; thus, unit identifiers cannot collide with the identifiers of species, compartments, reactions, etc.
The approach to defining units in SBML is compositional; for example, a unit such as $\mathrm{metre}\,\mathrm{second}^{-2}$ is constructed by combining, within the same UnitDefinition Unit list, a Unit structure representing $\mathrm{metre}$ with a Unit structure representing $\mathrm{second}^{-2}$.
The Unit structure is described in the next subsection.
The Unit data structure has one required field, kind, whose value must be taken from UnitKind, an enumeration of base units. The possible values of UnitKind are given in Table 2.
Note that the set of acceptable values for the field kind does not include units defined by UnitDefinition structures. This means that the units definition feature in SBML is not hierarchical: user-defined units cannot be built on top of other user-defined units, only on top of base units. (SBML differs from CellML in this respect; CellML allows the construction of hierarchical unit definitions.)
A Unit structure represents a (possibly transformed) reference to a base unit chosen from UnitKind. The formula for a single transformation represented by a Unit structure, relating the original base unit to the new unit, is as follows:
The optional multiplier field can be used to multiply the kind unit by a real-numbered factor; this enables the definition of units that are not power-of-ten multiples of SI units. For instance, a multiplier of 0.3048 could be used to define ``foot'' as a measure of length in terms of a metre. The multiplier field has a default value of ``1'' (one). Finally, the offset field is used to represent the addition of a constant in the transformation of the kind unit. For example, an offset value of ``32.0'' would be needed to define Fahrenheit in terms of degrees Celsius. The offset field has a default value of ``0'' (zero).
The composition of Unit structures within a UnitDefinition to create more complex units involves a linear product according to the following formula:
The following example illustrates the definition of an abbreviation named ``mmls'' for the units $\mathrm{mmol}\,\mathrm{l}^{-1}\,\mathrm{s}^{-1}$:
The following example defines Fahrenheit:
There are five special unit names in SBML, listed in Table 3, corresponding to the five types of quantities or built-in units that play roles in biochemical reactions: amount of substance, volume, area, length and time. All SBML mathematical entities apart from parameters have default units. These default units are composed from the set of built-in units. Further, SBML defines defaults for the built-in units, listed in the third column of Table 3, all with default scale and offset values of zero and a default multiplier value of one.
A field that defines the units for a mathematical entity (e.g., the field units on Parameter) can refer to a named unit chosen from among the following:
Within certain limits, a model may change the built-in units by reassigning the keywords ``substance'', ``length'', ``area'', ``time'', and ``volume'' in a UnitDefinition. The second column in Table 3 lists the set of units that should be used in redefining a given built-in unit.
The following example illustrates how to change the built-in units of volume to be milliliters. If this definition appeared in a model, the units of volume on all components that did not explicitly specify different units would be changed to milliliters.
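The compositional scheme can be made concrete by collapsing a Unit list into a single numeric factor plus a set of kind exponents. The Python sketch below assumes that every offset is zero (offset is deliberately left out), that exponent defaults to 1, and that each Unit contributes (multiplier × 10^scale)^exponent to the factor. The class and function names are hypothetical, and encoding millilitres as litre with scale −3 is our own choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One Unit entry: scale 0 and multiplier 1 as defaults, exponent 1 assumed."""
    kind: str
    exponent: int = 1
    scale: int = 0
    multiplier: float = 1.0


def combine(units):
    """Collapse a UnitDefinition's Unit list into (numeric factor, kind exponents).

    Assumes every offset is zero, so each Unit contributes
    (multiplier * 10**scale) ** exponent to the overall factor.
    """
    factor = 1.0
    dims = {}
    for u in units:
        factor *= (u.multiplier * 10.0 ** u.scale) ** u.exponent
        dims[u.kind] = dims.get(u.kind, 0) + u.exponent
    # Drop kinds whose exponents cancel out (e.g. litre * litre^-1).
    return factor, {kind: e for kind, e in dims.items() if e != 0}


MMLS = [Unit("mole", scale=-3), Unit("litre", exponent=-1), Unit("second", exponent=-1)]
MILLILITRE = [Unit("litre", scale=-3)]
```

Here `combine(MMLS)` gives a factor of 10^-3 with kinds mole^1 litre^-1 second^-1, and `combine(MILLILITRE)` gives 10^-3 litre.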
Software developers are asked to pay special attention to the units used in an SBML model. Different users and developers sometimes make different assumptions about units, and these assumptions may not correspond to what is defined in SBML. Sections 3.6.3, 4.6.4 and 4.9.7 have particularly important notes about the usage of units in SBML.
A compartment in SBML represents a bounded space in which species are located. Compartments do not necessarily have to correspond to actual structures inside or outside of a cell, although models are often designed that way. The definition of Compartment is shown in Figure 8.
It is worth pointing out that, although compartments are optional in the overall definition of Model (see Section 4.2), every species in an SBML model must be located in a compartment. This in turn means that if a model declares any species, the model must also declare at least one compartment.
Compartment has one required field, id, of type SId, to give the compartment a unique identifier by which other parts of an SBML model definition can refer to it. A compartment can also have an optional name field of type string. Identifiers and names should be used according to the guidelines described in Section 3.3.
A Compartment structure has an optional field spatialDimensions, whose value must be a non-negative integer indicating the number of spatial dimensions possessed by the compartment. The maximum value of spatialDimensions is ``3'', meaning a three-dimensional structure (a volume). Other permissible values are ``2'' (for a two-dimensional area), ``1'' (for a one-dimensional curve), and ``0'' (for a point). The default value is ``3''.
Each compartment has an optional floating-point field named size, representing the total size of the compartment. The size field enables concentrations of species to be calculated in the absence of geometry information. Note in particular that in SBML Level 2, a missing size value does not imply that the compartment size is 1. (This is unlike the definition of compartment volume in SBML Level 1.) The size field must not be present if the spatialDimensions field has a value of ``0''. When the spatialDimensions field does not have a value of ``0'', a missing value for size signifies that the value is unknown, is determined by an assignment rule, is not required for analysis, or is available from an external data source.
The units associated with the compartment's size value may be explicitly set using the optional field units. The value chosen for this field must be either one of the base units from Table 2, or the built-in units ``volume'', ``area'', ``length'' or ``dimensionless'', or a new unit defined by a unit definition in the enclosing model. The type of units assigned to the units field must also agree with the number of spatial dimensions of the compartment; that is, they must be units of volume if the value of spatialDimensions is ``3''; they must be units of area if the value of spatialDimensions is ``2''; they must be units of length if the value of spatialDimensions is ``1''; and they must be ``dimensionless'' if the value of spatialDimensions is ``0''. The default units depend on the value of the compartment's spatialDimensions field according to the following rule: for spatial dimensions of 3, 2, 1 or 0, the compartment has the default units of volume, area, length and dimensionless, respectively. (See Table 3 and Table 2.)
The units of the compartment size, as defined by the units field, are used in the following ways:
A Compartment also has an optional boolean field called constant that indicates whether the compartment's size stays constant or can vary during a simulation. A value of ``false'' indicates the compartment's size can be determined by rules (see Section 4.8), and the value of the size field should be taken as being the initial size of the compartment. The default value for the constant field is ``true'' because in the most common modeling scenarios at the time of this writing, compartment sizes remain constant. The constant field must default to or be set to ``true'' if the spatialDimensions field is 0.
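The constraints on spatialDimensions, size, units and constant can be gathered into a small consistency check. The Python sketch below is illustrative only: the Compartment dataclass, check_compartment and effective_units are hypothetical, units are represented as plain strings, and checking that explicit units are of the right kind (volume, area or length) for the dimensionality is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Default size units by number of spatial dimensions (rule stated in this section).
DEFAULT_UNITS = {3: "volume", 2: "area", 1: "length", 0: "dimensionless"}


@dataclass
class Compartment:
    """Hypothetical in-memory Compartment; field names mirror SBML."""
    id: str
    spatialDimensions: int = 3
    size: Optional[float] = None
    units: Optional[str] = None
    constant: bool = True


def check_compartment(c):
    """Return a list of violated constraints (empty if none were found)."""
    if c.spatialDimensions not in DEFAULT_UNITS:
        return ["spatialDimensions must be 0, 1, 2 or 3"]
    problems = []
    if c.spatialDimensions == 0:
        if c.size is not None:
            problems.append("size must be absent when spatialDimensions is 0")
        if not c.constant:
            problems.append("constant must be true when spatialDimensions is 0")
        if c.units not in (None, "dimensionless"):
            problems.append("units must be dimensionless when spatialDimensions is 0")
    return problems


def effective_units(c):
    """Explicit units if given, otherwise the default for the dimensionality."""
    return c.units if c.units is not None else DEFAULT_UNITS[c.spatialDimensions]


cell = Compartment("cell", size=1.0)
print(check_compartment(cell), effective_units(cell))  # [] volume
```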
The optional field outside of type SId can be used to express containment relationships between compartments. If present, the value of outside for a given compartment must be the identifier of another compartment enclosing it, or in other words, the compartment that is ``outside'' of it. This enables the representation of simple topological relationships between compartments, for those simulation systems that can make use of the information (e.g., for drawing simple diagrams of compartments).
Although containment relationships are partly taken into account by the compartmental localization of reactants and products, it is not always possible to determine purely from the reaction equations whether one compartment is meant to be located within another. In the absence of a value for outside, compartment definitions in SBML Level 2 do not have any implied spatial relationships between each other. For many modeling applications, the transfer of substances described by the reactions in a model sufficiently expresses the relationships between the compartments. (As discussed in Section 6.1, we expect that SBML Level 3 will introduce the ability to define geometries and spatial qualities.)
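Because each outside value names a single enclosing compartment, the containment information forms a chain that software can follow outward. The Python sketch below does so. The enclosing_chain function and the compartment ids (cytosol, membrane, extracellular) are hypothetical, and treating a cycle as an error is our own reading, since a compartment cannot meaningfully enclose itself.

```python
def enclosing_chain(compartment_id, outside):
    """List the compartments enclosing compartment_id, innermost first.

    outside maps each compartment id to the id given in its 'outside'
    field, or None when the field is absent.
    """
    chain = []
    seen = {compartment_id}
    current = outside.get(compartment_id)
    while current is not None:
        if current not in outside:
            raise ValueError(f"'outside' refers to unknown compartment {current!r}")
        if current in seen:
            raise ValueError(f"containment cycle through {current!r}")
        chain.append(current)
        seen.add(current)
        current = outside[current]
    return chain


outside = {"cytosol": "membrane", "membrane": "extracellular", "extracellular": None}
print(enclosing_chain("cytosol", outside))  # ['membrane', 'extracellular']
```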
The following example illustrates two compartments in an abbreviated SBML example of a model definition:
The following is an example of using outside to model a cell membrane. To express that a compartment named B has a membrane that is modeled as another compartment M, which in turn is located within another compartment A, one would write:
The term species refers to chemical entities that take part in reactions. These include simple ions (e.g., protons, calcium), simple molecules (e.g., glucose, ATP), large molecules (e.g., RNA, polysaccharides, and proteins), and others. The Species data structure is intended to represent these entities. Its definition is shown in Figure 9.
As with other major structures in SBML, Species has a mandatory field, id, used to give the species an identifier. The identifier must be a text string conforming to the syntax permitted by the SId data type described in Section 3.4. Species also has an optional name field, of type string. The name and id fields should be used as described in Section 3.3.
The required field compartment, also of type SId, is used to identify the compartment in which the species is located. The field's value must be the identifier of an existing Compartment structure. It is important to note that there is no default value for the compartment field on Species; every species in an SBML model must be assigned a compartment, and consequently, a model must define at least one compartment if that model contains any species.
The optional fields initialAmount and initialConcentration, both having a data type of double, are used to set the initial quantity of the species in the named compartment. These fields are mutually exclusive; i.e., only one can have a value on any given instance of a Species structure. Also, initialConcentration must not have a value if the species' compartment has a spatialDimensions value of ``0'' or if the value of the species' hasOnlySubstanceUnits field is ``true''. A missing initialAmount or initialConcentration value implies that the value is unknown, is set by an assignment rule, is not required for analysis, or is available from an external data source.
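The restrictions on initialAmount and initialConcentration translate directly into a validation routine. The Python sketch below is a minimal illustration with a hypothetical function name and argument layout; leaving both values unset is accepted, as the text permits.

```python
def check_initial_quantity(initial_amount=None, initial_concentration=None,
                           compartment_dimensions=3, has_only_substance_units=False):
    """Return the constraints on a species' initial quantity that are violated.

    None stands for a missing value, which is always allowed.
    """
    problems = []
    if initial_amount is not None and initial_concentration is not None:
        problems.append("initialAmount and initialConcentration are mutually exclusive")
    if initial_concentration is not None and (
            compartment_dimensions == 0 or has_only_substance_units):
        problems.append("initialConcentration is not allowed for this species")
    return problems


print(check_initial_quantity(initial_amount=10.0))  # []
print(check_initial_quantity(initial_concentration=0.5, has_only_substance_units=True))
```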
The units of the value in the initialAmount field are those given by the substanceUnits field of the Species structure. The units of the value in the initialConcentration field are the units of the species, as described in the next subsection.
The units associated with a species' quantity, referred to as the units of the species, are determined via the optional fields substanceUnits, spatialSizeUnits and hasOnlySubstanceUnits.
The hasOnlySubstanceUnits field is a boolean that defaults to ``false''. The units of the species are of the form substance/size units (i.e., concentration units, using a broad definition of concentration) if the compartment's spatialDimensions is non-zero and hasOnlySubstanceUnits has the value ``false''. The units of the species are of the form substance if spatialDimensions is zero or hasOnlySubstanceUnits has the value ``true''. The substance units are those defined by the substanceUnits field, and the size units are those given by the spatialSizeUnits field.
For both substanceUnits and spatialSizeUnits, the value chosen must be either a base unit from Table 2, a built-in unit from Table 3, or a new unit defined by a unit definition in the enclosing model. The chosen units for substanceUnits must be a variant of mole or item units. The substanceUnits field defaults to the built-in unit ``substance'' shown in Table 3.
The type of units assigned to the spatialSizeUnits field must agree with the number of spatial dimensions of the species' compartment. Specifically, they must be units of volume if the value of the compartment's spatialDimensions is ``3''; they must be units of area if the value of spatialDimensions is ``2''; and they must be units of length if the value of spatialDimensions is ``1''. The spatialSizeUnits must not have a value if spatialDimensions on the compartment has a value of ``0'', or if the species' hasOnlySubstanceUnits field has a value of ``true''. The default value of the spatialSizeUnits is the value of the units field of the species' compartment.
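The rules for the units of a species can be summarized as a small function. In the Python sketch below, which is hypothetical and represents units as plain strings, the size units fall back first to the compartment's units and then to the compartment default for its dimensionality; kind-checking of explicit units is omitted.

```python
# Default compartment units by spatialDimensions (as for Compartment.units).
DEFAULT_SIZE_UNITS = {3: "volume", 2: "area", 1: "length"}


def species_units(substance_units="substance", spatial_size_units=None,
                  has_only_substance_units=False, compartment_dimensions=3,
                  compartment_units=None):
    """Return the units of a species as a 'substance' or 'substance/size' string."""
    if compartment_dimensions == 0 or has_only_substance_units:
        # Only substance units apply; spatialSizeUnits must then be unset.
        if spatial_size_units is not None:
            raise ValueError("spatialSizeUnits must not be set for this species")
        return substance_units
    if spatial_size_units is not None:
        size_units = spatial_size_units
    elif compartment_units is not None:
        size_units = compartment_units
    else:
        size_units = DEFAULT_SIZE_UNITS[compartment_dimensions]
    return f"{substance_units}/{size_units}"


print(species_units())                                 # substance/volume
print(species_units(has_only_substance_units=True))   # substance
print(species_units("item", compartment_units="litre"))  # item/litre
```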
The units of the species are used in the following ways:
The Species structure has an optional boolean field named constant used to indicate whether the concentration of that species can vary during a simulation. The default value is ``false'', indicating that the species' concentration can be determined by reactions and rules.
Another optional field defined for Species is boundaryCondition. By default, when a species is a product or reactant of one or more reactions, its concentration is determined by those reactions. In SBML, it is possible to indicate that a given species' concentration is not determined by the set of reactions even when that species occurs as a product or reactant; i.e., the species is on the boundary of the reaction system but is a component of the rest of the model. The boolean field boundaryCondition can be used to indicate this. The value of the field defaults to ``false'', indicating the species is part of the reaction system. Table 4 shows how to interpret the combined values of the boundaryCondition and constant fields. In practice, a boundaryCondition value of ``true'' means a differential equation derived from the reaction definitions should not be generated for the species. The example model in Section 5.5 contains all four possible combinations of the boundaryCondition and constant fields on species elements, and Section 5.6 contains a translation into ODEs of a model that uses the boundaryCondition and constant fields.
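The practical consequence of boundaryCondition and constant for ODE generation can be sketched in Python. This is a simplification under stated assumptions: reaction rates are taken to be in substance per time, the result is the rate of change of each species' amount (dividing by compartment size to obtain concentrations is omitted), rules are ignored, and the function and data layout are hypothetical. Species with either flag set to ``true'' receive no reaction-derived equation; excluding constant species as well is our reading of the constant field.

```python
def reaction_derivatives(species, reactions, rates):
    """Assemble reaction-derived rates of change for non-boundary species.

    species:   species id -> dict of optional 'boundaryCondition'/'constant' flags
    reactions: reaction id -> (reactants, products), each species id -> stoichiometry
    rates:     reaction id -> current reaction rate (substance per time)
    """
    derivs = {sid: 0.0 for sid, flags in species.items()
              if not flags.get("boundaryCondition", False)
              and not flags.get("constant", False)}
    for rid, (reactants, products) in reactions.items():
        v = rates[rid]
        for sid, stoich in reactants.items():
            if sid in derivs:  # boundary and constant species are skipped
                derivs[sid] -= stoich * v
        for sid, stoich in products.items():
            if sid in derivs:
                derivs[sid] += stoich * v
    return derivs


species = {"X0": {"boundaryCondition": True}, "S1": {}, "X1": {"boundaryCondition": True}}
reactions = {"J1": ({"X0": 1}, {"S1": 1}), "J2": ({"S1": 1}, {"X1": 1})}
print(reaction_derivatives(species, reactions, {"J1": 2.0, "J2": 0.5}))  # {'S1': 1.5}
```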
The optional field charge takes an integer indicating the charge on the species (in terms of electrons, not the SI unit coulombs). This may be useful when the species is a charged ion such as calcium ($\mathrm{Ca}^{2+}$).
The following example shows two species definitions within an abbreviated SBML model definition. The example shows that species are listed under the heading listOfSpecies in the model:
A Parameter structure is used to declare a variable for use in mathematical formulae in an SBML model definition. By default, parameters have a constant value for the duration of a simulation, and for this reason are called ``parameters'' instead of variables in SBML. The definition of Parameter is shown in Figure 10.
Parameter has one required field, id, of type SId, to give the parameter a unique identifier by which other parts of an SBML model definition can refer to it. A parameter can also have an optional name field of type string. Identifiers and names should be used according to the guidelines described in Section 3.3.
The optional field value determines the value (of type double) assigned to the identifier. A missing value implies that the value is unknown, is determined by an assignment rule, is not required for analysis, or is available from an external data source. If the parameter is not constant, then the value field contains the initial value.
The units associated with the value of the parameter are specified by the field units. These units are used when the parameter identifier appears in MathML expressions and in AssignmentRule structures setting the value of the parameter. A RateRule structure determining the value of the parameter, if one exists, has units of parameter units/time, where parameter units are the units assigned to the parameter and time is the built-in unit of time. The value assigned to the parameter's units field must be chosen from one of the following possibilities: one of the base unit names from Table 2; one of the built-in unit names appearing in the first column of Table 3; or the name of a new unit defined in the list of unit definitions in the enclosing Model structure. There are no constraints on which units can be chosen from these sets. There are no default units for parameters.
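The unit bookkeeping for a RateRule amounts to dividing the variable's units by the built-in time unit. In the hypothetical Python helper below, units are represented as a mapping from unit name to exponent, an illustrative convention rather than anything defined by SBML.

```python
def rate_rule_units(variable_units):
    """Units expected for a RateRule's math: variable units divided by 'time'.

    variable_units maps unit names to exponents, e.g. {"mole": 1, "litre": -1}.
    The input mapping is not modified.
    """
    result = dict(variable_units)
    result["time"] = result.get("time", 0) - 1
    # Remove any unit whose exponent has cancelled to zero.
    return {name: e for name, e in result.items() if e != 0}


print(rate_rule_units({"mole": 1, "litre": -1}))  # {'mole': 1, 'litre': -1, 'time': -1}
print(rate_rule_units({"time": 1}))               # {}
```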
The Parameter structure has an optional boolean field named constant which indicates whether the parameter's value can vary during a simulation. The field's default value is ``true''; a value of ``false'' indicates the parameter's value can be changed by rules (see Section 4.8) and the value is actually intended to be the initial value of the parameter.
Parameters can be defined in two places in SBML: in lists of parameters defined at the top level in a Model structure and within individual reaction definitions (as described in Section 4.9). Parameters defined at the top level are global to the whole model; parameters that are defined within a reaction are local to the particular reaction and (within that reaction) override any global parameters having the same names (See Section 3.5 for further details). Parameters local to a reaction cannot be changed by rules and therefore are implicitly always constant; thus, parameter definitions within Reaction structures should not have their constant field set.
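The shadowing of global parameters by reaction-local ones behaves like a two-level scope. The Python sketch below models it with the standard library's collections.ChainMap, which searches its mappings in order; reaction_scope and the parameter ids are hypothetical.

```python
from collections import ChainMap


def reaction_scope(global_parameters, local_parameters):
    """Identifier lookup inside one reaction's kinetic law.

    Local (reaction-level) parameters are searched first, so they shadow
    model-level parameters with the same id; other ids fall through.
    """
    return ChainMap(local_parameters, global_parameters)


model_params = {"k1": 0.1, "k2": 2.0}
scope_j1 = reaction_scope(model_params, {"k1": 5.0})
print(scope_j1["k1"], scope_j1["k2"])  # 5.0 2.0
print(model_params["k1"])              # 0.1 (the global value is untouched)
```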
The following is an example of parameters defined at the Model level:
Rules provide a way to create constraints on variables for cases in which the constraints cannot be expressed using either reactions (Section 4.9) or the assignment of an initial value to a component in a model. There are two orthogonal dimensions by which rules can be described. First, there are three different possible functional forms, corresponding to the following three general cases (where $x$ is a variable, $f$ is some arbitrary function, $\mathbf{V}$ is a vector of variables that does not include $x$, and $\mathbf{W}$ is a vector of variables that may include $x$):
| Algebraic | left-hand side is zero: | $0 = f(\mathbf{W})$ |
| Assignment | left-hand side is a scalar: | $x = f(\mathbf{V})$ |
| Rate | left-hand side is a rate-of-change: | $dx/dt = f(\mathbf{W})$ |
The second dimension concerns the role of variable $x$ in the equations above: $x$ can be the identifier of a compartment (to set its size), a species (to set its concentration), or a parameter (to set its value).
In their general form given above, there is little to distinguish between assignment and algebraic rules. They are treated as separate cases for the following reasons:
The approach taken to covering these cases in SBML is to define an abstract Rule structure containing only one field, math, to hold the right-hand side expression, then to derive subtypes of Rule that add fields to distinguish the cases of algebraic, assignment and rate rules. Figure 11 gives the definitions of Rule and the subtypes derived from it. The figure shows there are three subtypes, AlgebraicRule, AssignmentRule and RateRule derived directly from Rule. These correspond to the cases Algebraic, Assignment and Rate described above respectively.
The rule type AlgebraicRule is used to express equations that are neither assignments of model variables nor rates of change. AlgebraicRule does not add any fields to the basic Rule; its role is simply to distinguish this case from the other cases. An example of the use of AlgebraicRule structures is given in Section 5.4.
The rule type AssignmentRule is used to express equations that set the values of variables. The left-hand side (the variable field) of an assignment rule can refer to the identifier of a species, compartment, or parameter. Two or more RateRule or AssignmentRule structures cannot have the same left-hand side or variable field value in an SBML model definition. In all cases, as would be expected, the units of the formula representing the right hand side, the math field, are identical to the units associated with the left hand side, the variable field, when that variable appears in other formulae.
The effects of an AssignmentRule structure are in general terms the same, but differ in the precise details depending on the type of variable being set:
Restrictions: In a given SBML Level 2 model, there cannot be both an AssignmentRule variable field and a SpeciesReference species field having the same value. (See Section 4.9 for the definition of SpeciesReference.) This means an assignment rule cannot be defined for a species that is created or destroyed in a reaction. The only exception is when the given species is a boundary condition; i.e., on the Species structure the boundaryCondition field is set to ``true''.
The rule type RateRule is used to express equations that determine the rates of change of variables. The left-hand side (the variable of a rate rule) can refer to the identifier of a species, compartment, or parameter. Two or more RateRule or AssignmentRule structures cannot have the same left-hand side or variable field value in an SBML model definition. In all cases, as would be expected, the units of the formula representing the right hand side, in the math field, are of the form x/time where x are the same units as associated with the symbol in the variable field, when that variable appears in other formulae. time is a built-in unit (see Section 4.4). The effects of a RateRule are in general terms the same, but differ in the precise details depending on which variable is being set:
Restrictions: In a given model, there cannot be both a SpeciesReference species field and a RateRule variable field having the same value. (See Section 4.9 for the definition of SpeciesReference.) This means a rate rule cannot be defined for a species that is created or destroyed in a reaction. The only exception is when the given species is a boundary condition; i.e., on the Species structure that defines the species the boundaryCondition field is set to ``true''.
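The restrictions on the targets of AssignmentRule and RateRule structures can be checked together. In the Python sketch below, each rule is a hypothetical (kind, variable) pair, and the ids appearing in SpeciesReference species fields and the ids of boundary species are passed in as sets; algebraic rules are omitted because they have no variable field.

```python
from collections import Counter


def check_rule_targets(rules, species_refs, boundary_species):
    """Check assignment and rate rule targets against the restrictions above.

    rules:            list of (kind, variable), kind being 'assignment' or 'rate'
    species_refs:     ids used in any SpeciesReference species field
    boundary_species: ids of species whose boundaryCondition is true
    """
    problems = []
    counts = Counter(var for _, var in rules)
    for var, n in counts.items():
        if n > 1:
            problems.append(f"{var}: more than one assignment/rate rule")
    for kind, var in rules:
        if var in species_refs and var not in boundary_species:
            problems.append(f"{var}: {kind} rule on a reacting non-boundary species")
    return problems


print(check_rule_targets([("rate", "X0")], {"X0"}, {"X0"}))  # []
print(check_rule_targets([("rate", "S1")], {"S1"}, set()))
```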
SBML specifically does not stipulate the form of the algorithms that can be applied to rules and reactions. For example, SBML does not specify when or how often rules should be evaluated. The constraints described by rules and kinetic rate laws are meant to apply collectively to the set of variable values for a specific instant in time.
The ordering of assignment rules is significant: they are always evaluated in the order given in SBML.
No more than one assignment or rate rule can be defined for a given identifier. No assignment or rate rule can be defined for an identifier whose corresponding structure has the field constant set to true.
An assignment rule for a given identifier overrides the initial value assigned to that identifier; i.e., the initial value should be ignored. This does not mean that a structure declaring an identifier can be omitted if there is an assignment rule for that identifier. For example, there must be a Parameter structure for a given parameter if there is a rule for that parameter.
The math field of an assignment rule structure can contain any identifier in a MathML ci element except for the following: (a) identifiers for which there exists a subsequent assignment rule, and (b) the identifier for which the rule is defined. These constraints are designed to eliminate algebraic loops among the scalar rules; eliminating algebraic loops ensures that assignment rules can be evaluated any number of times without the result of those evaluations changing. As an example, consider the following equations, in the order shown:
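The constraints on assignment rules can be checked mechanically, and the property they guarantee (repeated evaluation does not change the result) can be observed directly. In the Python sketch below, each rule is a hypothetical triple of target identifier, the set of identifiers used in its math, and a Python function standing in for the MathML expression; real software would collect the identifiers from the MathML ci elements.

```python
def check_assignment_rules(rules, constant_ids):
    """Check ordered assignment rules against the constraints on rules.

    rules is an ordered list of (variable, ids_used_in_math, function).
    Returns a list of problems; an empty list means no violation was found.
    """
    problems = []
    for i, (var, used, _) in enumerate(rules):
        if var in constant_ids:
            problems.append(f"{var}: rule targets a constant identifier")
        if var in used:  # constraint (b): the rule's own identifier
            problems.append(f"{var}: math refers to the rule's own variable")
        later = {v for v, _, _ in rules[i + 1:]}
        if used & later:  # constraint (a): identifiers assigned by later rules
            problems.append(f"{var}: math uses later-assigned {sorted(used & later)}")
    return problems


def evaluate_in_order(rules, values):
    """Apply each assignment rule once, in document order."""
    env = dict(values)
    for var, _, fn in rules:
        env[var] = fn(env)
    return env


rules = [
    ("x", {"y"}, lambda env: env["y"] + 1),
    ("z", {"x"}, lambda env: env["x"] * 2),
]
first = evaluate_in_order(rules, {"y": 3})
print(first)                                     # {'y': 3, 'x': 4, 'z': 8}
print(evaluate_in_order(rules, first) == first)  # True
```

Reversing the two rules above would make the rule for z use x before x is assigned, which is exactly the situation constraint (a) forbids.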
This section contains an example set of rules. Consider the following set of equations:
A reaction represents any transformation, transport or binding process, typically a chemical reaction, that can change the amount of one or more species. In SBML, a reaction is defined primarily in terms of the participating reactants and products (and their corresponding stoichiometries), along with optional modifier species, an optional kinetic law describing the rate at which the reaction takes place, and optional parameters entering into the kinetic law. These various parts of a reaction are recorded in the SBML Reaction type defined in Figure 12.
As with most other main structures in SBML, the Reaction data structure includes a required id and an optional name. These must be used according to the guidelines described in Section 3.3.
The reactant species, product species and modifier species in a reaction are described using the fields reactant, product and modifiers, respectively. These fields are optional lists of SpeciesReference and ModifierSpeciesReference structures, as shown in Figure 12. They are described in more detail in Sections 4.9.5 and 4.9.6 below. The abstract type SimpleSpeciesReference is shown simply to demonstrate the common field, species, of the SpeciesReference and ModifierSpeciesReference structures. In future levels of SBML it is anticipated that SimpleSpeciesReference will have additional fields.
The optional boolean field reversible indicates whether the reaction is reversible; if left unspecified in a model, it defaults to a value of ``true''. Although the reversibility of a reaction is determined by its rate law, the need to allow rate expressions in SBML to be optional leads to the need for a flag indicating reversibility. Information about reversibility in the absence of a KineticLaw in a Reaction is useful in certain kinds of structural analyses such as elementary mode analysis. It is true that the presence of this information in two places (i.e., the rate expression and the flag reversible) leaves open the possibility of a model containing contradictory information, but the creation of such a model would indicate an error on the part of the software generating it. Software developers must take care to guard against logical contradictions in the definitions of reactions.
The optional boolean field fast signifies, when set to ``true'', that the given reaction is a ``fast'' one compared to others in the system being modeled. This may be relevant when computing equilibrium concentrations of rapidly equilibrating reactions. Simulation/analysis packages may choose to use this information to reduce the number of ODEs required and thereby optimize such computations. (A simulator/analysis package that has no facilities for dealing with fast reactions can ignore this field. In theory, if the choice of which reactions are fast is correctly made, then a simulation performed with them should give the same results as a simulation performed without fast reactions. However, currently there appears to be no single unambiguous method for designating which reactions should be considered fast, and some users may designate a reaction as fast when in fact it is not.) The fast field does not have a default value: a missing value indicates that the modeller does not know, or does not wish to specify, the rate of the reaction relative to other reactions in the model.
Every species that enters into a given reaction must appear in that reaction's lists of reactants, products or modifiers. In an SBML model, all species that participate in any reaction are listed in the listOfSpecies field of the top-level Model data structure (see Section 4.2). Lists of products, reactants and modifiers in Reaction structures do not introduce new species, but rather, they refer back to those listed in the model's listOfSpecies. For reactants and products, the connection is made using the SpeciesReference data structure defined in Figure 12.
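Because reactions only refer back to species declared in listOfSpecies, software can verify those references in one pass. The Python sketch below is a hypothetical helper: for each reaction it takes the species ids gathered from its reactants, products and modifiers, and reports any that are not declared.

```python
def undeclared_species(list_of_species, reactions):
    """Return, per reaction, the referenced species ids missing from listOfSpecies.

    reactions maps a reaction id to the species ids referenced by its
    reactants, products and modifiers.
    """
    declared = set(list_of_species)
    missing = {}
    for rid, refs in reactions.items():
        unknown = sorted(set(refs) - declared)
        if unknown:
            missing[rid] = unknown
    return missing


print(undeclared_species(["X0", "S1"], {"J1": ["X0", "S1"], "J2": ["S1", "X1"]}))
# {'J2': ['X1']}
```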
In SpeciesReference, the field species of type SId must refer to the identifier of an existing species defined in the enclosing Model structure. The stoichiometry for the product or reactant can be specified using either the stoichiometry or stoichiometryMath fields of the SpeciesReference structure. The stoichiometry field is of type double and should contain values greater than 0. The stoichiometryMath field is implemented as an element containing a MathML math expression in dimensionless units. Only one of the stoichiometry and stoichiometryMath fields should be used on a given SpeciesReference structure. When neither field is present, the stoichiometry associated with the SpeciesReference structure is ``1''.
When generating SBML Level 2, it is recommended for maximal interoperability that the stoichiometry field be used in preference to the stoichiometryMath field and that the stoichiometry field contains integer values. Parsing software should expect and handle appropriately all possible values of the stoichiometry and stoichiometryMath fields including, for example, non-integer values for stoichiometry.
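How a reader of SBML might settle on the stoichiometry of one SpeciesReference is sketched below in Python. The function is hypothetical; stoichiometry_math is a Python callable standing in for the MathML expression, and treating the presence of both fields as an error is a stricter reading than the specification's ``should''. Non-integer values are accepted, as parsing software is expected to handle them.

```python
def effective_stoichiometry(stoichiometry=None, stoichiometry_math=None, env=None):
    """Stoichiometry of one SpeciesReference.

    stoichiometry_math, if given, is a callable standing in for the MathML
    expression and is evaluated against env (identifier -> value).
    """
    if stoichiometry is not None and stoichiometry_math is not None:
        raise ValueError("use only one of stoichiometry and stoichiometryMath")
    if stoichiometry_math is not None:
        return float(stoichiometry_math(env or {}))
    if stoichiometry is not None:
        return float(stoichiometry)  # non-integer values are accepted
    return 1.0  # neither field present


print(effective_stoichiometry())   # 1.0
print(effective_stoichiometry(2))  # 2.0
print(effective_stoichiometry(stoichiometry_math=lambda e: e["x"], env={"x": 3}))  # 3.0
```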
The following is a simple example of a species reference for species ``X0'', with stoichiometry 2, in a list of reactants within a reaction named ``J1'':
The following is a more complex example of a species reference for species ``X0'', with a stoichiometry expression consisting of the parameter x: