# SBML Level 3 Core Proposal 2008 08 28

Authors: Mike Hucka, Sven Sahle, Stefan Hoops

This is a working document collecting the results of discussions and work to date on the form of SBML Level 3 Core. Separate pages describe the results of ongoing work about packages that can be added on top of the Core. Community feedback is sought on all aspects of these plans for Level 3.

## General principles for Level 3 Core

The following are the general principles being followed in the development of Level 3 Core. These principles arose from discussions in the SBML community at various times, together with additional considerations developed by the SBML Editors in the process of working on SBML Level 3.

1. Level 3 Core will not be a complete redesign of SBML.

Many aspects of SBML are, frankly, not pretty. As tends to happen with formats and other computer artifacts that evolve over time, the structure of SBML and many of its specific details are not ideal, and are not what one would choose if given the opportunity to redesign SBML with 20-20 hindsight. Despite this, the current consensus among heavy SBML users is that the situation is not bad enough to justify a complete ground-up redesign. Large changes to SBML would require large changes in supporting software, and the bottom line is that no one feels it is worth the time, effort and resources to go that far. Consequently, Level 3 Core will be highly similar to Level 2 Version 4.

2. Level 3 Core will not contain unnecessary changes compared to Level 2 Version 4.

Given the opportunity of creating a whole new Level of SBML, there is a temptation to make small changes just because we can (e.g., to fix some long-standing wart). As a general principle of Level 3 development, the SBML Editors feel it is better to refrain from making changes that do not represent either (1) technical fixes, (2) changes necessary for supporting Level 3 packages, or (3) removal of deprecated features.

3. Definitions will be primarily at the level of object structures and textual explanations.

The form of the definition of the XML (specifically, whether an XML Schema is provided, or whether other technology such as RELAX NG is used) is a technicality that needs to be resolved, but is not the main problem in defining SBML. The definition of SBML components and use should remain at the level of object structures (using UML notation) and associated textual descriptions.

## Overall characteristics of Level 3 Core

Level 3 Core will be based on SBML Level 2 Version 4. The following is a list of changes planned relative to SBML Level 2 Version 4.

### Package specification

Level 3 will add a construct to the <sbml> element enabling a model to declare which SBML Level 3 packages are used in the model. The following is an example of what this will look like:

    <sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1"
          xmlns:layout="http://www.sbml.org/sbml/level3/version1/layout/version1">
      <listOfPackages>
        <package namespace="http://www.sbml.org/sbml/level3/version1/layout/version1" required="false"/>
        <package namespace="http://www.sbml.org/sbml/level3/version1/render/version1" required="false"/>
      </listOfPackages>
      <model id="TheModel">
        ...
      </model>
    </sbml>


The packages are identified by their XML namespaces. The required attribute is used to indicate whether a given package is required for correct mathematical interpretation of the model, or whether the package is optional.

No declaration is necessary for the Level 3 Core package, since it is the base package and support for it is required in any case.
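
As an illustration of how a reading tool might act on this declaration, here is a minimal Python sketch. It assumes the element and attribute names from the example above; the `SUPPORTED` set and the `unsupported_packages` helper are hypothetical names introduced only for this sketch, and the elided model content is replaced by an empty `<model>` so that the text parses.

```python
import xml.etree.ElementTree as ET

CORE_NS = "http://www.sbml.org/sbml/level3/version1/core"

# Hypothetical: the package namespaces this particular tool understands.
SUPPORTED = {"http://www.sbml.org/sbml/level3/version1/layout/version1"}

DOC = """\
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <listOfPackages>
    <package namespace="http://www.sbml.org/sbml/level3/version1/layout/version1" required="false"/>
    <package namespace="http://www.sbml.org/sbml/level3/version1/render/version1" required="false"/>
  </listOfPackages>
  <model id="TheModel"/>
</sbml>
"""


def unsupported_packages(xml_text, supported):
    """Return (blocking, ignorable) lists of unsupported package namespaces."""
    root = ET.fromstring(xml_text)
    blocking, ignorable = [], []
    path = f"{{{CORE_NS}}}listOfPackages/{{{CORE_NS}}}package"
    for pkg in root.iterfind(path):
        ns = pkg.get("namespace")
        if ns in supported:
            continue
        # required="true" means the model's mathematics cannot be
        # interpreted correctly without the package.
        if pkg.get("required") == "true":
            blocking.append(ns)
        else:
            ignorable.append(ns)
    return blocking, ignorable


blocking, ignorable = unsupported_packages(DOC, SUPPORTED)
```

Under these assumptions, a tool could refuse to simulate a model when `blocking` is non-empty and merely warn about the entries in `ignorable`.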

### Use of XPath for references to identifiers

A problem that will be faced in SBML Level 3 packages is how to reference identifiers inside a model from outside of that model. For example, this is faced by the Layout package, since it places a list of layouts outside of a <model>; it will also be faced by the package for hierarchical model composition, in which multiple models will potentially be encapsulated inside another.

Rather than defining a novel hierarchical referencing scheme using some object structure in SBML, the Level 3 approach will be to use XPath syntax. The expected approach is as follows. Currently, identifier references in SBML are simply strings whose values are required to be SId identifier values (e.g., the species attribute of speciesReference). In Level 3, all such references will be defined to be strings whose values are XPath expressions, with suitable limitations placed on the scope of the referenced entities. (The reason for the last point is that XPath can technically reference anything inside an XML document, but in the places where SBML allows a reference, the target must be a specific kind of thing such as a species identifier, compartment identifier, etc. So SBML must stipulate that this requirement is not violated by the change to using XPath.)
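
The scoping restriction can be sketched in Python as follows. This is only an illustration under stated assumptions: the proposal does not fix the form of the XPath expressions, so the path used here, the `resolve` helper and the sample model are all hypothetical. Python's `xml.etree.ElementTree` supports only a subset of XPath, which is enough for this purpose.

```python
import xml.etree.ElementTree as ET

# Prefix map used when evaluating paths; the prefix "s" is an arbitrary choice.
NS = {"s": "http://www.sbml.org/sbml/level3/version1/core"}

DOC = """\
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="TheModel">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="S1" compartment="cell"/>
    </listOfSpecies>
  </model>
</sbml>
"""


def resolve(root, path, expected_tag):
    """Resolve a reference and check it hits exactly one element of the expected kind."""
    matches = root.findall(path, NS)
    if len(matches) != 1:
        raise ValueError(f"{path!r} matched {len(matches)} elements, expected 1")
    target = matches[0]
    # ElementTree stores tags as "{namespace}localname".
    if target.tag != f"{{{NS['s']}}}{expected_tag}":
        raise ValueError(f"{path!r} does not point at a <{expected_tag}>")
    return target


root = ET.fromstring(DOC)
species = resolve(root, "./s:model/s:listOfSpecies/s:species[@id='S1']", "species")
```

A reference that resolves to a compartment where a species is required raises `ValueError`, which is the kind of check the restriction described above would demand of software.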

### Removal of default values

A source of confusion and error in interpreting models in SBML Level 2, as well as a burden on implementors, is the specification of default values for certain attributes of SBML components. These include default values for stoichiometries, default units, etc.

The presence of a default means that an SBML model file does not need to mention the defaulted attributes and their values. It is far too easy for a reader to misinterpret such a model: a reader would need to memorize the SBML specification in complete detail to know that unmentioned attributes exist, and to know their values. Further, some SBML default values are controversial, with different modelers preferring one to another. Finally, the fact that default values exist at all means that a model is in some sense not self-contained.

For SBML Level 3, the SBML Editors have concluded the best approach is simply to remove all default values from the specification. This means that any attribute or element not included in the XML does not exist. The cost is an increase in verbosity and file size. The former is only an issue for human readers (computer software doesn't care), while the latter issue is largely mitigated by the speeds and storage capacities of modern computers.
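
The difference for software that reads models can be sketched as follows. This is a simplified Python illustration, not a statement of how any SBML library behaves: the `stoichiometry` helper is hypothetical, it ignores the Level 2 stoichiometryMath element, and it relies on the Level 2 rule that an omitted stoichiometry attribute defaults to 1.

```python
import xml.etree.ElementTree as ET


def stoichiometry(species_ref, level):
    """Return the stoichiometry of a speciesReference element, or None if unset."""
    raw = species_ref.get("stoichiometry")
    if raw is not None:
        return float(raw)
    # Level 2 defines a default of 1 for an omitted stoichiometry.
    # Under the Level 3 proposal an omitted attribute simply does not exist,
    # so there is no value to report.
    return 1.0 if level == 2 else None


ref = ET.fromstring('<speciesReference species="S1"/>')
level2_value = stoichiometry(ref, 2)
level3_value = stoichiometry(ref, 3)
```

Here `level2_value` is a number that appears nowhere in the file, whereas `level3_value` is `None`, which makes the omission visible to the caller.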

Consequences for the units of an SBML model:

• There are no "built-in" units anymore. The units "substance", "time", "volume", "area" and "length" have to be explicitly defined if required. If they are not, all the values appearing in the model and resulting from a simulation of the model are purely numerical.
• To attach a unit to an SBML entity, a unit, which can be arbitrarily named, needs to be defined in the listOfUnitDefinitions. This unit then has to be referred to by the relevant SBML element (with XPointer).
• If some SBML entities do not have units attached, no assumptions are made, and unit checking will be limited.

NB: Since the units of time and of reaction extent have to be the same for the whole model, they shall be defined in newly created attributes.

### Removal of compartment outside attribute

The outside attribute on the SBML compartment component in Level 2 is intended for expressing (weak) spatial/topological relationships between compartments. Level 3 is expected to bring real geometry facilities. Consequently, outside will be removed in Level 3.

### Removal of species charge attribute

The charge attribute on the SBML species component has been deprecated for some time. It will be removed in Level 3.

### Units of the kinetic law

In SBML Level 2, the units of the kinetic law are always amount-of-substance per time. The units of "amount-of-substance" are established by the SBML built-in unit symbol substance, and the units of time are established by the SBML built-in unit symbol time. Both of these can be redefined using SBML's unit definition facilities.

The kinetic law of a reaction is meant to indicate how frequently the reaction happens, or in other words, the reaction events per time. The stoichiometry then indicates how much of the substances involved in the reaction (the substrates and products) are consumed/produced when the reaction occurs. However, this notion turns out to be incompatible with the fact that in Level 2, the units for amount-of-substance can also be a mass unit like gram, meaning that the speed of a reaction can be in units of mass per time. In addition, it is also confusing that a unit like mole, which is usually employed in the context of a number of particles, can also indicate a number of reaction events.

The solution for Level 3 is to have separate unit definitions for amount-of-substance (which can be based on mole, gram, item, or a dimensionless number) and for reaction extent (which can only be a dimensionless number). The units of the rate law will then be the units of reaction extent divided by the units of time.
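
As a worked illustration (the symbols $v$ and $U_t$ are introduced here for convenience and are not part of the proposal), let $U_r$ denote the units of reaction extent and $U_t$ the units of time. The units of a rate law $v$ are then:

$[v] = \frac{U_r}{U_t}$

For example, with a dimensionless reaction extent and time measured in seconds, the rate law has units of $\mathrm{s}^{-1}$, regardless of whether the species amounts in the model are expressed in mole, gram or item.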

### Specification of yield on species

The stoichiometry attribute and stoichiometryMath element on speciesReferences in Level 2 are not normal biochemical stoichiometries, because they embody a unit conversion. This conversion is necessary to map the units of the species concentration to the units needed in order for the reaction kineticLaw to have the units of substance/time.

This scheme has proven to be the source of frequent errors and confusion among users of SBML. The problem is that the unit conversion is not made explicit, and is therefore hidden and easily missed. Further, there is no way in SBML Level 2 to attach a unit definition to the stoichiometry, to allow proper unit verification by software.

For Level 3, this will be resolved by adding two features: a yield attribute on the SBML species component, and a list of "conversion factors". The list of conversion factors will be parallel to the list of unit definitions, and allow the expression of both a number and a unit. This will require a change to the definition of <model> as shown below:

The definition of ConversionFactor will be a simple extension of Parameter. The difference is that conversion factors are constant quantities, so the constant attribute is set to true. (The fact that conversion factors are derived from the Parameter object class is just a convenience, to avoid having to define another class of objects in SBML. The conversion factors are put into a separate list, not the list of parameters, and are not used in the same way as parameters. Hopefully this will not lead to confusion.)

Finally, there is the addition of the yield attribute to species. The value of this attribute will be an XPath expression referencing a conversion factor defined in the model.

The meaning of yield on a species definition is the following. If the units of a given species are given as $U_s$, the value of yield is $Y$, and the units of reaction extent in the model are given as $U_r$, the yield establishes the following:

$U_s \cdot Y = U_r$
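
Rearranging this relation gives the units that the conversion factor itself must carry. This is a sketch that follows only from the relation above and from the earlier statement that reaction extent is dimensionless:

$Y = \frac{U_r}{U_s}$

For example, if a species is measured in mole, so that $U_s = \mathrm{mole}$, and $U_r$ is dimensionless, then $Y$ has units of $\mathrm{mole}^{-1}$. Software performing unit verification could check that the conversion factor referenced by yield carries exactly these units.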