FAQ
Date of last content update: 2008-6-26
This Frequently Asked Questions (FAQ) document answers common questions about the Systems Biology Markup Language (SBML). It is a non-normative document that does not define any aspect of SBML; rather, it is intended to provide additional information in an easily accessible and readable form.
General questions about SBML
What is SBML?
The short answer is this: the Systems Biology Markup Language (SBML) is a machine-readable exchange format for computational models of biological processes. Its strength is in representing phenomena at the scale of biochemical reactions, but it is not limited to that. By supporting SBML as an input and output format, different software tools can operate on the same representation of a model, removing chances for errors in translation and assuring a common starting point for analyses and simulations.
A slightly longer (but still relatively short) answer can be found in the separate Basic Introduction to SBML.
What kind of models can you represent in SBML?
This question is difficult to answer directly. One way to get a sense for what can be represented in SBML is to look at the kinds of models that have been represented in SBML. A good starting place for that is BioModels Database.
However, a lot depends on how a modeler chooses to express a model. A common abstraction used when describing cellular phenomena is to describe the system as a set of chemical entities linked by processes (reactions) that can transform one entity into another or transport entities between compartments. A compartment in SBML is a location having a defined size or extent (which may be in terms of volume, area, length, or a point). Every chemical species in an SBML model must be located in a compartment. It is worth noting that compartments do not have to map one-to-one to biological structures; compartments can be conceptual too. But SBML is by no means limited to encoding biochemical reactions. One can encode any mathematical rule linking quantitative characteristics of the biological system, including, but not limited to, electrical behaviour and growth. SBML can also describe discrete events that are triggered by state changes in the modeled system.
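As a minimal sketch of the compartment/species relationship, the Python below (standard library only) builds a Level 2 Version 3 model with one compartment and one species located in it, then checks that every species refers to a declared compartment. The identifiers (cytosol, glc) are hypothetical, and a real model would contain much more.

```python
import xml.etree.ElementTree as ET

# Namespace URI of SBML Level 2 Version 3.
SBML_NS = "http://www.sbml.org/sbml/level2/version3"
ET.register_namespace("", SBML_NS)

def q(tag):
    """Qualify a tag name with the SBML namespace."""
    return f"{{{SBML_NS}}}{tag}"

def minimal_model():
    sbml = ET.Element(q("sbml"), {"level": "2", "version": "3"})
    model = ET.SubElement(sbml, q("model"), {"id": "example"})
    compartments = ET.SubElement(model, q("listOfCompartments"))
    # A compartment with a defined size (volume 1 in the default units).
    ET.SubElement(compartments, q("compartment"), {"id": "cytosol", "size": "1"})
    species = ET.SubElement(model, q("listOfSpecies"))
    # Every species names the compartment in which it is located.
    ET.SubElement(species, q("species"),
                  {"id": "glc", "name": "glucose", "compartment": "cytosol",
                   "initialConcentration": "5"})
    return sbml

def species_without_compartment(sbml):
    """Ids of species whose compartment attribute matches no declared compartment."""
    declared = {c.get("id") for c in sbml.iter(q("compartment"))}
    return [s.get("id") for s in sbml.iter(q("species"))
            if s.get("compartment") not in declared]

doc = minimal_model()
print(ET.tostring(doc, encoding="unicode"))
print(species_without_compartment(doc))  # []
```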
How is SBML different from BioPAX?
While BioPAX is meant to facilitate the exchange of biological pathways, SBML has been designed to facilitate the exchange and reuse of quantitative models, not necessarily limited to biochemical events. SBML models contain information about sizes, amounts, and kinetics that cannot be expressed in BioPAX. Conversely, because BioPAX is an ontology, it can define much more precisely the identity of the objects considered, whether physical entities or biochemical events. In SBML, this information may be encoded by annotating model components with terms from the Systems Biology Ontology. Although SBML and BioPAX do not fulfill the same purpose, it is nevertheless possible to convert one into the other. Examples of tools providing this service are BiNoM and BioModels Database.
How is SBML different from CellML?
CellML is another XML-based format for encoding quantitative models. CellML is being developed by the Bioengineering Institute at the University of Auckland and collaborating groups. The chief differences between CellML and SBML can perhaps be described as follows. While a model encoded in SBML is based on successive, hierarchical declarations of model constituents, a CellML model is built as a network of components. A component can contain variables, mathematical expressions, metadata, etc. In CellML, the biological information is entirely stored in metadata rather than in the language elements. In SBML, the language elements were more directly influenced by present-day biochemical network simulation software, and the mathematical expressions are more constrained than what is permitted in CellML's subset of MathML.
Although SBML and CellML cannot at present be fully interconverted, conversion is nevertheless sometimes possible. Examples of tools providing this service can be found at http://www.ebi.ac.uk/biomodels/download.html.
Is SBML just an XML format?
Yes and no. The primary encoding of SBML is indeed XML, a popular text-based language for expressing structured data in a generic fashion. However, a design goal of SBML has always been to define it in terms of a language-independent formalism (specifically, using UML) and then map that to XML, so that mappings to other formats may be easier.
Isn't SBML too complicated to write?
Don't write SBML by hand. Instead, use software tools that provide higher-level interfaces to reading, writing, and manipulating SBML. Some provide graphical user interfaces, while others provide textual interfaces where you can write models in terms of chemical reactions. Take a look at our SBML Software Guide for help finding a tool that may be suitable for your needs.
Where is SBML defined?
The Systems Biology Markup Language is formally defined in the specification documents.
Where can I get working SBML models?
BioModels Database provides a database of hundreds of published models in SBML format. The models in the database have been checked by humans to correspond to the publication and have been annotated with links to other data resources to make searching easier.
What are the SBML Levels?
Levels in SBML are a way of managing complexity in the continued evolution and enhancement of the language. SBML is being developed in a series of levels, where each level adds new features and fixes problems with the previous level. The lowest-numbered levels provide fundamental features that are common to all biochemical network models. Higher-numbered levels add more features that are specific to particular classes of tools. Any level can be used as a standard for interchanging models.
What are the differences between Levels 1 and 2?
The changes in SBML Level 2 include: replacing SBML Level 1's text-string based format for mathematical expressions with a subset of MathML (a W3C standard), introducing support for metadata, introducing support for named function definitions, introducing explicit modifier species such as catalysts in reactions, and introducing new constructs for discrete events and time delays. In Version 2 of Level 2, additional major changes include new constructs for types of species, types of compartments, initial assignments, constraints, and a standard approach for annotating model components with cross-references to terms from ontologies and controlled vocabularies. In Version 3 of Level 2, a number of small but important corrections were introduced, the consistency of the unit system was improved, and the UML notation in the specification document was much improved in clarity.
Why is Level 1 still being kept around if Level 2 exists?
There exist tools that either were developed before the creation of SBML Level 2 or for which Level 1 is more appropriate. SBML Level 1 therefore continues to have relevance even with the existence of Level 2.
Note that since all Level 1 models can be translated to SBML Level 2, tools that read SBML Level 2 can be made to support Level 1 reasonably easily. Moreover, the availability of libSBML makes it much easier for application developers to support different SBML levels in software applications. Among other features, libSBML has a built-in Level 1 to Level 2 translation facility.
Are there tutorials about SBML?
The SBML Team occasionally puts on tutorials at conferences such as the International Conference on Systems Biology (ICSB), as well as topic-specific tutorials at SBML workshops such as the SBML Hackathons. Please check the Events page and the News page on SBML.org for information about possible upcoming events. Slides and other materials are available online on the SBML.org website.
What is the MIME type for SBML?
The MIME media subtype for SBML is application/sbml+xml and it is defined by RFC 3823. The goal of defining a MIME type for SBML is to enable applications to recognize files and data streams as being in SBML format by virtue of being tagged with the SBML MIME type.
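A web server or client can act on this tagging. The sketch below uses Python's standard mimetypes module to associate the SBML MIME type with a file extension; the .sbml extension is an assumption for illustration, since SBML files are often simply named .xml.

```python
import mimetypes

SBML_MIME_TYPE = "application/sbml+xml"  # defined by RFC 3823

# Assumption: SBML files use a ".sbml" extension. Files named ".xml" would
# instead need the type set explicitly by whoever serves them.
mimetypes.add_type(SBML_MIME_TYPE, ".sbml")

def content_type_for(path):
    """Content-Type to send for a file, falling back to a generic binary type."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"

print(content_type_for("model.sbml"))  # application/sbml+xml
```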
Questions about software support of SBML
Is there a list of software packages that support SBML?
The list of tools supporting SBML has grown to over 120 at the time of this writing. It is no longer feasible to maintain a list in this FAQ. Instead, we refer readers to the SBML Software Guide, which provides links to known software packages supporting SBML. The guide provides both a compact feature matrix as well as a longer annotated overview.
Where can I find certified SBML software?
There is no certification process for software today. As of this writing (5 May 2008), the SBML Team is hard at work on a comprehensive SBML Test Suite. This Suite will make it possible to test SBML support objectively and will help assess the degree of SBML support in different software packages.
However, it is unlikely there will ever be a full "certification" mechanism for SBML. The development and support of SBML is funded primarily by government grants, and we simply do not have the resources it would take to run a true certification process of the sort common in industry.
Are software libraries available for programming with SBML?
Yes. The SBML Software Guide includes information about known libraries for programming SBML support. The SBML Team itself has developed 3 free and open-source packages that can be used to support SBML in different environments:
- libSBML is a portable, embeddable API library providing language interfaces for C, C++, Java, Lisp, MATLAB, Octave, Perl, Python, and Ruby. It runs on Linux, MacOS and Windows.
- MathSBML is a package for working with SBML in Mathematica.
- SBMLToolbox is a package for working with SBML in MATLAB.
Which level of SBML should I use in my software?
We recommend supporting the highest SBML Level that your software can support, because higher Levels tend to fix design problems in lower Levels of SBML. However, if your software cannot support some of the features of a higher Level of SBML, a lower SBML Level may be more suitable. Note that within a given Level of SBML, you should always support the highest Version of the specification for that Level.
What if I can't encode some feature that my software has?
You can try storing the data in SBML's <annotation> elements. These are described in some detail in the SBML Level 1 and Level 2 specification documents. An <annotation> element can be enclosed within any SBML element and can contain elements of any namespace. Note that annotations should not contain data that is already encoded, or could be encoded, in SBML.
How should I structure annotations?
The annotation data enclosed in a specific SBML element is assumed by other applications to be directly associated with that specific element. Therefore, it is important to decompose and locate annotation data appropriately in an SBML document. Avoid encoding all your annotations in a single top-level <annotation> element. For example, the data associated with an individual species in a model should be encoded in the <annotation> element enclosed within the <species> element representing that species in the SBML file.
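A minimal sketch of this placement using Python's standard library: application-specific data (here, hypothetical diagram coordinates in a made-up namespace http://example.org/mytool) goes inside the <annotation> of the <species> element it describes. The helper tool_data is illustrative, not part of any SBML library; it reuses a single top-level element for the tool's namespace rather than adding a new one on each call.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version3"
# Hypothetical namespace owned by one application; substitute your own URI.
TOOL_NS = "http://example.org/mytool"
ET.register_namespace("", SBML_NS)
ET.register_namespace("mytool", TOOL_NS)

def tool_data(element):
    """Return the element's <mytool:data> annotation, creating it if needed.

    The data lives in the <annotation> of the very element it describes, and
    only one top-level element in the tool's namespace is ever created there.
    """
    annotation = element.find(f"{{{SBML_NS}}}annotation")
    if annotation is None:
        annotation = ET.SubElement(element, f"{{{SBML_NS}}}annotation")
    data = annotation.find(f"{{{TOOL_NS}}}data")
    if data is None:
        data = ET.SubElement(annotation, f"{{{TOOL_NS}}}data")
    return data

species = ET.Element(f"{{{SBML_NS}}}species", {"id": "glc", "compartment": "cytosol"})
# Diagram coordinates are not part of core SBML, so they go in the annotation.
tool_data(species).set("x", "120")
tool_data(species).set("y", "45")
print(ET.tostring(species, encoding="unicode"))
```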
How should I include database identifiers such as ChEBI identifiers?
Annotations involving database identifiers can be created using the scheme described in Section 6 of the SBML Level 2 Version 3 specification. The approach uses RDF annotations together with the specific BioModels qualifier elements detailed in the SBML specification. You can find examples of models using this approach in BioModels Database.
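The following Python sketch builds the general shape of such an RDF annotation with the standard library: an rdf:Description that refers to the element's metaid, a bqbiol:is qualifier, and an rdf:Bag of resource URIs. The metaid and the ChEBI URI (written in the MIRIAM URN style used by BioModels Database) are illustrative assumptions; Section 6 of the specification gives the exact required structure, including any additional namespace declarations this sketch omits.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version3"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL = "http://biomodels.net/biology-qualifiers/"
for prefix, uri in [("", SBML_NS), ("rdf", RDF), ("bqbiol", BQBIOL)]:
    ET.register_namespace(prefix, uri)

def add_is_annotation(element, resource_uris):
    """Annotate `element` as being the entity identified by each resource URI."""
    metaid = element.get("metaid")
    if metaid is None:
        # The RDF description points at the element through its metaid.
        raise ValueError("element needs a metaid attribute")
    annotation = ET.SubElement(element, f"{{{SBML_NS}}}annotation")
    rdf = ET.SubElement(annotation, f"{{{RDF}}}RDF")
    description = ET.SubElement(rdf, f"{{{RDF}}}Description",
                                {f"{{{RDF}}}about": "#" + metaid})
    qualifier = ET.SubElement(description, f"{{{BQBIOL}}}is")
    bag = ET.SubElement(qualifier, f"{{{RDF}}}Bag")
    for uri in resource_uris:
        ET.SubElement(bag, f"{{{RDF}}}li", {f"{{{RDF}}}resource": uri})
    return annotation

species = ET.Element(f"{{{SBML_NS}}}species",
                     {"metaid": "meta_glc", "id": "glc", "compartment": "cytosol"})
# Illustrative ChEBI reference; check the identifier against ChEBI itself.
add_is_annotation(species, ["urn:miriam:obo.chebi:CHEBI%3A17234"])
print(ET.tostring(species, encoding="unicode"))
```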
What should my software do when it encounters incorrect SBML?
Although an application can't be expected to detect all possible errors in an SBML document, it should do as much as it can to detect errors of syntax and self-consistency. Such errors indicate that something is clearly wrong and that whatever (or whoever) wrote the model made an error. You may want to double-check the validity of the model by testing it with the online SBML Validator. If the SBML file fails validation, the model should be rejected because it cannot be used as-is. (Incidentally, if you encounter consistent differences between an SBML specification and a software package that claims to be compliant with that specification, please report them to the sbml-interoperability mailing list.)
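As a sketch of the cheapest checks an application might make before interpreting a model (not a substitute for the SBML Validator or libSBML's consistency checks), the Python below reports input that is not well-formed XML, lacks an <sbml> root with integer level and version attributes, or does not contain exactly one <model>. The particular checks are our own choice of what is useful, not a list taken from the specification.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def basic_sbml_errors(text):
    """Cheap syntax checks to run before interpreting a model (not a validator)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    errors = []
    if local_name(root.tag) != "sbml":
        errors.append("root element is not <sbml>")
    for attr in ("level", "version"):
        if not (root.get(attr) or "").isdigit():
            errors.append(f"missing or non-integer '{attr}' attribute")
    models = [child for child in root if local_name(child.tag) == "model"]
    if len(models) != 1:
        errors.append("expected exactly one <model> element")
    return errors

GOOD = ('<sbml xmlns="http://www.sbml.org/sbml/level2/version3" '
        'level="2" version="3"><model id="m"/></sbml>')
print(basic_sbml_errors(GOOD))            # []
print(basic_sbml_errors("<sbml level="))  # one "not well-formed XML" error
```

A tool following the advice above would refuse to load the model whenever the returned list is non-empty.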
Detecting and handling incorrect SBML is different from detecting and handling an invalid model encoded in SBML.
How can I test whether I've implemented SBML support as intended?
The SBML Test Suite will provide a large set of input files and corresponding results, and allow you to test your software's implementation of SBML handling. The SBML Test Suite is currently under development and we expect to release it publicly in August 2008.
Questions about SBML features and their use
Why are non-biochemical features such as explicit equations included in SBML?
The aim of SBML is to enable the construction of quantitative models that describe both the activity of biochemical networks and interaction of biochemical networks and other phenomena. SBML allows the declaration of variables (non-constant parameters) and associated ODEs and DAEs to describe these phenomena. Examples of these phenomena include the mechanical force generated by muscle cells and the electrical potential across a synapse.
Why use MathML? It's much more complicated than text strings
Here is a partial list of motivations for why the switch to MathML was made in SBML Level 2, in no particular order:
- The list of operators available in the text-string formula notation of Level 1 was judged to be limited. People wanted to expand the mathematical vocabulary to include additional functions (both built-in and user-defined), mathematical constants, logical operators, relational operators and a special symbol to represent time. Rather than growing the simple C-like syntax of Level 1 into something more complicated and esoteric in order to support these features, and consequently having to manage two standards in two different formats (XML and text string formulas), we chose to leverage an existing standard for expressing mathematical formulas in Level 2: the content portion of MathML.
- There is no standard text-string formula syntax to choose from. The notation in Level 1 was inspired by C, but as many people have pointed out repeatedly, there are differences, and these differences need to be dealt with by software tools parsing the infix notation. Thus, this particular problem exists no matter what notation or encoding you choose; the infix text-string notation offered no advantage in this regard. Now imagine if we had to grow the syntax to accommodate more operators, user-defined functions, etc. Even more people would complain about differences due to a non-standard mathematical syntax.
- Related to the above: using MathML means we can avoid having to define reserved words for various language features, such as the time symbol and the delay function. MathML has a mechanism for introducing special terms and operators without having to define new identifiers in the language. Without MathML, we would have had to choose arbitrarily an identifier for each of those quantities, and every new one that was deemed important in the future. Parsing and generating expressions using these identifiers would be problematic in tools that used different built-in symbol values (for example, if a tool uses 't' instead of 'time' for the time symbol).
- Using MathML allows us to extend SBML without introducing new non-XML syntax. For example, if we wanted to introduce some form of modularity, we might want a '.' operator in expressions to reference components of submodel instances. We could agree on a MathML operator for this purpose, which would be tool-neutral, rather than again creating an arbitrary syntax that tools would have to parse and that may or may not resemble the syntax used within the tools.
- Whether you parse formulas written as text strings, or parse formulas written as MathML, your software still needs to build up expression trees. Once that's done, there is in principle not much difference between the two.
- MathML is proper XML, which means that tools using XML parsers can work with it directly. Authors do not have to write a different kind of parser for the text-string infix syntax; they can use a generic XML parser if they wish. Further, libraries specialized for MathML could be used by software developers, possibly saving development time and effort. (Of course, the use of libSBML isolates software tools from all this even further.)
- Making SBML all-XML means that SBML is more amenable to tools that can process, manipulate, and store XML, such as XSLT, XQuery, XPath, and other XML technologies. For example, it has made it possible to write XSLT transformations that convert CellML 1.1 to SBML Level 2. It would have been difficult to construct text-string formulas from CellML reaction definitions using XSLT transformations.
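To illustrate how directly MathML maps onto an expression tree, here is a minimal Python evaluator built on a generic XML parser. It handles only <cn>, <ci>, and <apply> with five operators, a small subset of what SBML Level 2 permits; the rate expression and its parameter values are hypothetical.

```python
import math
import operator
import xml.etree.ElementTree as ET

MATHML = "{http://www.w3.org/1998/Math/MathML}"

# A small subset of content-MathML operators; SBML Level 2 permits many more.
OPERATORS = {
    "plus": lambda *args: sum(args),
    "times": lambda *args: math.prod(args),
    "minus": lambda a, b=None: -a if b is None else a - b,
    "divide": operator.truediv,
    "power": operator.pow,
}

def evaluate(node, env):
    """Recursively evaluate a content-MathML element against variable values."""
    tag = node.tag.replace(MATHML, "")
    if tag == "math":
        return evaluate(node[0], env)
    if tag == "cn":
        return float(node.text)
    if tag == "ci":
        return env[node.text.strip()]
    if tag == "apply":
        # The first child names the operator; the rest are its operands.
        op, *operands = list(node)
        name = op.tag.replace(MATHML, "")
        return OPERATORS[name](*(evaluate(arg, env) for arg in operands))
    raise ValueError(f"unsupported MathML element <{tag}>")

# Hypothetical Michaelis-Menten style rate: Vmax * S / (Km + S)
RATE = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><divide/>
    <apply><times/><ci> Vmax </ci><ci> S </ci></apply>
    <apply><plus/><ci> Km </ci><ci> S </ci></apply>
  </apply>
</math>"""

print(evaluate(ET.fromstring(RATE), {"Vmax": 10.0, "S": 2.0, "Km": 2.0}))  # 5.0
```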
All that said, there are some disadvantages to using MathML in SBML. One is that by introducing MathML part-way through the evolution of SBML, we have created a legacy support problem by having two formula representations with which to contend and interconvert. Another is that people perceive MathML to require greater effort to support, but whether this is true in practice depends on the underlying system. For some applications, it is actually easier to parse and handle MathML than a text-string representation of mathematical formulas, because the MathML expression structure is already made explicit and can be read using available XML software.
The SBML notion of a species seems peculiar, doesn't it?
Well, no, or yes, depending on your definition of "peculiar".
The SBML construct called species represents a pool, that is, a set of "things" that are treated as being indistinguishable from the standpoint of the processes (reactions) in which they participate. When the "same" species (a chemical or other thing) is present in different compartments, each must be treated as a different pool. The reason is that the concentrations or partial pressures may differ between compartments, so the chemical activities differ as well. Also, because the pH may differ between compartments, the electrochemical properties of a given chemical entity may differ too (think of an enzyme in the cytosol and the same enzyme in a lysosome). Analytical software will therefore have to construct different state variables for the different pools, even if the pools contain the same kind of "thing". This is actually a common concept in biochemical simulation, dating back to some of the earliest simulation software.
If you need to express a link between species with different identifiers, you can use the species type construct available since SBML Level 2 Version 2.
Can I have two species with the same name attribute value?
Yes, this is perfectly legal SBML. Of course, you would only want to do that if the species are actually the same conceptual type of entity—you wouldn't want to give the same names to, say, glucose-6-phosphate and ATP in a model, because it wouldn't make any sense.
Species and compartment identifiers in SBML refer to "things" that can participate in dynamical behaviors, but each identifier does not have to refer to a single unique entity. It is possible that the same conceptual entity appears in multiple contexts in a model. Since a species must be given a unique identifier in each compartment in which it appears (see the answer to the previous question for an explanation of why), it is convenient to give the species definitions all the same names. It will usually make more sense to humans that way, and software can track the separate amounts of species in the different compartments by their identifiers.
Note that beginning with SBML Level 2 Version 2, there are explicit constructs for species types and compartment types. If you are using names to convey the idea that different entities are the same conceptual "thing" despite having different identifiers, you may want to indicate the relationship more strongly by defining common species types or compartment types, and then declaring the species/compartments to be of the appropriate types.
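A minimal sketch of the bookkeeping this implies, in Python: two hypothetical glucose pools share a name and a species type but have distinct identifiers, and a simulator keys its state variables by identifier, so each pool is tracked separately. The species_type field stands in for the Level 2 Version 2+ construct and is optional.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Species:
    id: str                             # unique: one identifier per pool
    name: str                           # human-readable; may be shared
    compartment: str
    species_type: Optional[str] = None  # Level 2 Version 2 and later

model_species = [
    Species("glc_cyt", "glucose", "cytosol", species_type="glucose"),
    Species("glc_lys", "glucose", "lysosome", species_type="glucose"),
]

# One state variable per pool, keyed by identifier rather than by name.
state = {s.id: 0.0 for s in model_species}

def pools_of_type(type_id):
    """Identifiers of all pools declared to be of the given species type."""
    return [s.id for s in model_species if s.species_type == type_id]

print(sorted(state))             # ['glc_cyt', 'glc_lys']
print(pools_of_type("glucose"))  # ['glc_cyt', 'glc_lys']
```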
Why doesn't SBML Level 2 define a default compartment?
Software developers are sometimes bothered by the fact that SBML does not specify a default compartment; all compartments in SBML must be defined explicitly. There are several reasons for this:
- A model that uses a single unit-volume compartment is making explicit an important underlying assumption about the model. Leaving it implicit would be more prone to errors.
- SBML would have to define a reserved identifier to refer to the default compartment. This is a recipe for an eventual identifier collision when someone, somewhere, accidentally uses the same identifier.
- A default compartment would only save effort in developing the SBML writing component of a software tool. The writing component is the easy part; reading and interpreting is the harder part. Defining a default compartment would not help readers much, if at all.
- A default compartment would be a special case which all SBML parsing programs would have to handle specially.
How do you represent models that don't define a compartment?
It will be necessary to create a compartment in the SBML representation of the model. One approach is to locate all species in a single compartment with unit volume. The default units system of SBML will ensure that this unit volume representation is exactly equivalent to a model dealing with concentrations, including rate laws defined in substance/volume/time units.
When making changes like this to accommodate SBML requirements, it is a good idea to write a note (perhaps stored in a <notes> element inside the top-level <model> element) explaining what has been done. This will help future readers of the SBML file to understand why certain choices were made.
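As a sketch of both suggestions, the Python below (standard library only) places species from a source model that had no compartments into a single hypothetical compartment named cell with size 1, and records the change as XHTML in the model's <notes>. The identifiers and the wording of the note are illustrative.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version3"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SBML_NS)
ET.register_namespace("xhtml", XHTML_NS)

def s(tag):
    """Qualify a tag name with the SBML namespace."""
    return f"{{{SBML_NS}}}{tag}"

def wrap_in_unit_compartment(species_ids, comp_id="cell"):
    """Build an SBML model placing all species in one unit-volume compartment."""
    sbml = ET.Element(s("sbml"), {"level": "2", "version": "3"})
    model = ET.SubElement(sbml, s("model"), {"id": "converted"})
    # Notes hold XHTML and come first among the model's children.
    notes = ET.SubElement(model, s("notes"))
    para = ET.SubElement(notes, f"{{{XHTML_NS}}}p")
    para.text = (f"The original model defined no compartment; all species were "
                 f"placed in compartment '{comp_id}' with size 1.")
    compartments = ET.SubElement(model, s("listOfCompartments"))
    ET.SubElement(compartments, s("compartment"), {"id": comp_id, "size": "1"})
    species = ET.SubElement(model, s("listOfSpecies"))
    for sid in species_ids:
        ET.SubElement(species, s("species"), {"id": sid, "compartment": comp_id})
    return sbml

doc = wrap_in_unit_compartment(["A", "B"])
print(ET.tostring(doc, encoding="unicode"))
```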
Why is there a distinction between "assignment" and "algebraic" rules? Aren't they equivalent?
Although it is typically easy to transform between assignment and algebraic rules, SBML provides separate constructs for them, for the following reasons:
- Algebraic rules define the point in the model where there is a circular dependency between variables. For instance, the equations x = 2 * y and y = x + 1 have a circular dependency. It is not possible to form such a dependency in scalar rules (see the SBML Level 2 specification). At least one of the example equations would have to be encoded as an algebraic rule in SBML.
- Many tools are not capable of supporting algebraic rules (DAEs).
- Those tools that do support algebraic rules make the distinction between assignment rules and algebraic rules.
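A minimal sketch of the check a tool might run to decide whether a set of equations can stay as assignment rules: build the dependency graph among assigned variables and look for a cycle with depth-first search. For brevity, the right-hand sides are plain infix strings scanned by a crude identifier tokenizer (an assumption; SBML Level 2 stores them as MathML).

```python
import re

def identifiers(expr):
    """Identifiers appearing in an infix formula (crude tokenizer)."""
    return set(re.findall(r"[A-Za-z_]\w*", expr))

def find_cycle(rules):
    """Return one dependency cycle among assignment rules, or None.

    `rules` maps each assigned variable to the formula assigned to it.
    """
    deps = {var: identifiers(rhs) & set(rules) for var, rhs in rules.items()}
    state = {var: "unvisited" for var in deps}
    path = []

    def visit(var):
        state[var] = "in progress"
        path.append(var)
        for dep in sorted(deps[var]):
            if state[dep] == "in progress":
                # dep is already on the current path: we have closed a loop.
                return path[path.index(dep):] + [dep]
            if state[dep] == "unvisited":
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[var] = "done"
        return None

    for var in sorted(deps):
        if state[var] == "unvisited":
            cycle = visit(var)
            if cycle:
                return cycle
    return None

print(find_cycle({"x": "2 * y", "y": "x + 1"}))  # ['x', 'y', 'x']
print(find_cycle({"x": "2 * y", "y": "k + 1"}))  # None
```

When a cycle is reported, at least one equation on it has to be expressed as an algebraic rule instead.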
Why can't user-defined functions be recursive in Level 2?
Function definitions in SBML Level 2 are designed so that they can be substituted in place of the function call operator; that is, they are deliberately defined so that software tools can treat them like macros rather than functions. This would not be possible if functions were allowed to be recursive.
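Because function definitions are macro-like, a reader can inline them before simulation. The Python sketch below does this on expressions written as nested tuples (an assumed stand-in for parsed MathML) and refuses recursive definitions, since inlining a recursive function would never terminate. The function mm is hypothetical.

```python
def substitute(expr, bindings):
    """Replace identifier leaves named in `bindings`; operators are left alone."""
    if isinstance(expr, tuple):
        head, *args = expr
        return (head, *(substitute(arg, bindings) for arg in args))
    if isinstance(expr, str):
        return bindings.get(expr, expr)
    return expr

def expand(expr, functions, active=()):
    """Inline calls to user-defined functions, as a macro processor would.

    Expressions are tuples (operator, arg1, ...) whose leaves are numbers or
    identifiers; `functions` maps a name to (parameter_names, body).
    """
    if not isinstance(expr, tuple):
        return expr
    head, *args = expr
    args = [expand(arg, functions, active) for arg in args]
    if head not in functions:
        return (head, *args)
    if head in active:
        raise ValueError(f"recursive function definition: {head}")
    params, body = functions[head]
    inlined = substitute(body, dict(zip(params, args)))
    return expand(inlined, functions, active + (head,))

FUNCS = {"mm": (("vmax", "km", "s"),
                ("divide", ("times", "vmax", "s"), ("plus", "km", "s")))}
print(expand(("mm", "V", "K", "S"), FUNCS))
# ('divide', ('times', 'V', 'S'), ('plus', 'K', 'S'))

try:
    expand(("f", "a"), {"f": (("x",), ("f", "x"))})
except ValueError as err:
    print(err)  # recursive function definition: f
```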
Why doesn't SBML provide a way to define constants?
It does. Use the SBML parameter construct and set the attribute constant to true. See the next question.
Are you saying that parameters may not be constant in SBML? That's crazy talk!
Yes. There are at least two reasons for this:
- The object data structure defining a variable (other than species or compartment) and a constant would be nearly identical. The only difference is that one would be called constant and the other allowed to vary. SBML simply uses a more parsimonious representation involving the use of just one object, with a flag, constant, indicating whether the symbol value is constant during a simulation.
- Some modelers and software systems actually do use the concept of time-varying parameters. See, for example, this FAQ item from SAAM II. In SAAM II, "any parameter could in fact be defined as time-varying".
And you probably thought we were just making this stuff up!
Why was the constant attribute on species/compartments/parameters introduced?
Given a model that doesn't contain algebraic rules, it is possible to infer which components (species, compartments, and parameters) are meant to be variables by examining the set of scalar rules, rate rules, and reactions. However, given a model containing algebraic rules, you need to know which symbols are variables and which are constants in order to solve the system of equations. The occurrence of a symbol in an algebraic rule doesn't imply that the symbol is a variable.
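A small Python sketch of this distinction, with hypothetical symbols: without algebraic rules, the changing symbols can be read off rule targets and reaction participants; for an algebraic rule, only the constant flags say which symbols are the unknowns. The sketch simplifies by ignoring events and boundary species, and writes the rule as an infix string rather than MathML.

```python
import re

def inferred_variables(assignment_rules, rate_rules, reactions):
    """Symbols that change over time, readable off a model with no algebraic rules.

    assignment_rules, rate_rules: dicts mapping a target symbol to its formula.
    reactions: iterable of (reactant_ids, product_ids) pairs.
    """
    variables = set(assignment_rules) | set(rate_rules)
    for reactants, products in reactions:
        variables |= set(reactants) | set(products)
    return variables

def algebraic_unknowns(rule, constant):
    """Unknowns of an algebraic rule 0 = rule, decided by the `constant` flags."""
    symbols = set(re.findall(r"[A-Za-z_]\w*", rule))
    # Appearing in the rule says nothing by itself; the flags settle it.
    return {sym for sym in symbols if not constant[sym]}

print(sorted(inferred_variables({"z": "2 * x"}, {"V": "k * x"}, [(["A"], ["B"])])))
# ['A', 'B', 'V', 'z']
print(algebraic_unknowns("x + y - total", {"x": False, "y": True, "total": True}))
# {'x'}
```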
Why can't you assign different units of time to (e.g.) event delays?
SBML Level 2 Version 1 provided this capability. It defined unit attributes on various SBML components such as kinetic laws and event delays, letting a model redefine units for individual quantities. Unfortunately, this turned out to introduce serious practical problems. First, one could construct models in which it was impossible, without additional information, to convert quantities to the same consistent units throughout the model (a necessary prerequisite to constructing a system of equations from the model definition). Second, and in practice more important, the freedom to reassign units in so many different contexts may have been convenient for model writers, but it made it hugely more difficult for model readers to interpret a model—it placed a large burden on the software interpreting a model. And third, it was much more error prone, with modelers creating models where they did not realize they had made unintended errors in unit consistency.
SBML Level 2 Version 2 removed most of the places where units could be redefined on individual components, but left some (notably, the time units on event delays). SBML Level 2 Version 3 further removed these attributes. These actions were taken based on the experiences of SBML users and developers. (See, for example, this discussion thread from 2005.)
A parameter has no units declared; what units does it have?
SBML assumes that the parameter has the units appropriate for its use within a model. In some cases it may be possible to derive these units from a mathematical expression using the parameter, assuming that the units of all other parts of the expression are known.
However, if parameters with undeclared units are used, checking unit consistency becomes difficult, if not impossible. It is therefore advisable, where possible, to declare units for all parameters within a model.
I want to use fractional exponents on units, how can I do this?
The SBML unit construct restricts the attribute exponent to an integer value. Thus, it is not possible to explicitly declare a unit with a fractional exponent. There are also restrictions on the units of expressions to which power or root functions may be applied. These restrictions are required to ensure that parameters and mathematical expressions used within SBML are physically sensible.
It is possible to overcome the restrictions by declaring additional parameters, with appropriate units, that can be used to normalise values within expressions. For example, consider an expression such as
where [] denotes a concentration. This would not be a valid expression within SBML since it produces intermediate units of concentration^(1/2). To correctly encode this, declare a parameter p, with value 1 and units equal to the units of concentration. Using this parameter and rewriting the expression as
produces the same numeric result, whilst preserving physically sensible units at all stages of the calculation.
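As a hypothetical illustration of the technique (the rate law below is an assumption, not taken from a particular model), consider a rate that depends on the square root of a concentration [S]:

```latex
% [S]^{1/2} alone would carry units of concentration^{1/2}, which SBML cannot declare.
v = k\,[S]^{1/2}
\quad\longrightarrow\quad
v = k\left(\frac{[S]}{p}\right)^{1/2},
\qquad p = 1 \text{ (units of concentration)}
```

Since [S]/p is dimensionless, so is its square root, and k can carry the units of v directly; because p = 1, both forms give the same number.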
Does the 'same units' in assignments mean dimensionally or actually equivalent?
It means they must actually be the same!
There are several constructs in SBML where a mathematical expression can be used to assign a value to a variable (species, compartment or parameter) within the model. The specification states that the units of both sides of the equation should be the same. This refers to the actual physical unit, not the dimensionality: a metre is not the same as a foot!
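In SBML Level 2, each factor of a unit definition denotes (multiplier × 10^scale × kind)^exponent. The Python sketch below (our own helpers, not a libSBML API) compares two such definitions both ways: a foot, modeled as metre with multiplier 0.3048, is dimensionally equivalent to a metre but is not the same unit, so it would not satisfy the "same units" requirement.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    """One factor of an SBML unit definition: (multiplier * 10**scale * kind)**exponent."""
    kind: str
    exponent: int = 1
    scale: int = 0
    multiplier: float = 1.0

    def factor(self):
        # Numeric size of this factor relative to kind**exponent.
        return (self.multiplier * 10 ** self.scale) ** self.exponent

def dimensions(definition):
    """Net exponent of each base unit kind, dropping kinds that cancel out."""
    dims = {}
    for unit in definition:
        dims[unit.kind] = dims.get(unit.kind, 0) + unit.exponent
    return {kind: exp for kind, exp in dims.items() if exp != 0}

def same_dimensions(a, b):
    return dimensions(a) == dimensions(b)

def same_units(a, b):
    """True only if the dimensions match AND the overall scale factors agree."""
    scale_a = math.prod(u.factor() for u in a)
    scale_b = math.prod(u.factor() for u in b)
    return same_dimensions(a, b) and math.isclose(scale_a, scale_b)

metre = [Unit("metre")]
foot = [Unit("metre", multiplier=0.3048)]
print(same_dimensions(metre, foot), same_units(metre, foot))  # True False
```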
Questions about the SBML development process
Where did the name "SBML" come from?
When SBML was first conceived, around the year 2000, Hiroaki Kitano suggested the name Systems Biology Markup Language. The name stuck.
What is the overall SBML development process?
SBML development has been and continues to be motivated and directed by the systems biology community. The process is managed by the SBML Editors (see next question), but they do so under the control of the community. The editors collect proposals for changes to SBML from the SBML Working Groups and from other groups and individuals, and then seek to establish a consensus in the community about how to proceed with the proposals. With this information, the editors assemble some of the proposals into a draft specification for a new edition of SBML. After this draft has been reviewed by the community, it becomes a final specification for the new edition of SBML. (Edition in this context can be either a new SBML Level, or a new version of an existing level, or a new release of an existing version.)
Who are the SBML Editors?
The SBML Editors are listed on a separate page on the sbml.org website.
What are the "SBML Forum" Meetings?
These are annual face-to-face meetings of the SBML community. The formal title of the meetings is the Workshops on Software Platforms for Systems Biology. They are held as satellite workshops of the annual International Conference on Systems Biology (ICSB), usually in the fall or early winter. SBML Forum meetings allow for significant discussion of new SBML proposals and interoperability issues. Presentations and other materials from every meeting are archived in the Events area of the SBML.org website.
Why isn't SBML part of a standards process like OMG?
Some time ago, the then SBML Editors considered submitting SBML as a proposal to the Object Management Group (OMG) in response to a request for proposals (RFP) for pathway representations. However, the SBML community decided at the 7th Forum meeting that while it would be useful to have the endorsement of a standards body like the OMG, people's time and resources would be better spent working on SBML development rather than conforming to all the standards requirements of the OMG process.
This does not rule out the possibility of seeking standards-body recognition sometime in the future.
How do I propose changes or additions for SBML Level X?
There are several ways, with the first one below being preferred because it's the quickest and easiest:
- Start a discussion on the sbml-discuss list/forum. This is sure to provoke a response. Doing so also helps find out whether the capability already exists in SBML in some other form, because someone will point it out if it does.
- You can also attend an SBML event, in particular the annual SBML Forum meetings, where proposed changes to SBML are a major discussion topic.
- Finally, if you are shy or just want to pose a question in advance of making public statements, you can send email to the SBML Editors.
SBML development is too slow—can't it be faster?
This is a can't-win situation. The archives of the sbml-discuss mailing list as well as anecdotes from SBML workshops show that for every person who complains about SBML development being too slow, there is another who complains SBML is changing too rapidly. It seems impossible to please everyone.
Where does the funding come from to keep SBML development going?
The initial development of SBML from its inception through the year 2003 was principally funded by the Japan Science and Technology Agency under the ERATO Kitano Symbiotic Systems Project headed by Hiroaki Kitano. Many agencies and commercial organizations supported smaller parts of overall SBML development as well as workshops and travel expenses. Many more academic organizations supported people who spent considerable time working on SBML and related projects even though it was not an official aspect of their research. Since 2003, the primary source of stable funding has been the National Institute of General Medical Sciences under grant GM070923 to Michael Hucka (Chair of the SBML Editors). A more detailed list of funding acknowledgments is available on a separate page.
Questions about the SBML website
What is the difference between sbml.org and sbml.info?
There is no difference. They are alternative names for the same website, provided as a convenience to SBML users and web searchers. We tend to refer to "the SBML site" or "the SBML portal" as being http://sbml.org, but the other address should work just as well.
Who runs sbml.org?
The SBML Team maintains the server at the California Institute of Technology in Pasadena, California, USA.
Miscellaneous questions
I have some SBML that hasn't been formatted nicely. Is there a way to clean it up?
LibSBML includes a demo program that simply echoes whatever SBML is given to it, and in the process of writing the output, it does a pretty reasonable job of pretty-printing the XML.
If you want a more general solution with more control over formatting, you may want to look at HTML Tidy, a free, open-source, general-purpose pretty-printer which (despite its name) will work with XML too. It can be embedded into applications.
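If neither tool is at hand, Python's standard xml.dom.minidom can also re-indent a file. The sketch below first drops whitespace-only text nodes so that existing line breaks do not pile up in the output. Note that this also removes whitespace between inline elements in XHTML <notes>, so it is a rough tool rather than a faithful reformatter; the input string is a made-up example.

```python
from xml.dom import minidom

def _drop_blank_text(node):
    """Remove whitespace-only text nodes so toprettyxml() can re-indent cleanly."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            _drop_blank_text(child)

def pretty_print(xml_text, indent="  "):
    """Parse an XML document and return it re-indented."""
    dom = minidom.parseString(xml_text)
    _drop_blank_text(dom)
    return dom.toprettyxml(indent=indent)

messy = '<sbml level="2" version="3">\n<model id="m">   <listOfSpecies/></model></sbml>'
print(pretty_print(messy))
```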


