All SBML components can have an optional attribute, metaid, to store a “meta identifier”. This attribute is used to support metadata annotations in an SBML document. Users of SBML-compatible software tools often do not have to see or interact with this attribute directly; it is something that software tools usually manage and hide from end users. Software developers do have to work with metaid attributes, however, and the allowed values for metaid have been a source of confusion for developers and advanced users of SBML. This page attempts to clarify the syntax of the allowed values of the SBML metaid attribute.
The origin of the metaid syntax
SBML uses the XML type ID as defined in the February 2004 version of the XML 1.0 specification. Although later versions of the XML specifications imposed more restrictions on the format of the ID type, SBML specifications have always continued to use the original definition. This is the source of confusion: although people often speak loosely and say that the metaid attribute takes on the values permitted by “the type ID defined in XML,” in reality the permitted values are taken from a specific version of the XML specification.
The specific definition of XML ID used in SBML is explicitly defined in the SBML specification documents, and has remained the same since the first introduction of the SBML metaid attribute. Developers and advanced users of SBML are advised to consult the definition in the SBML specification rather than searching for the XML specification documents, to avoid the risk of accidentally reading a different definition of XML ID.
The specific text of the SBML specifications related to the definition of metaid is reproduced further below on this page for reference. Informally, the value of an attribute of type ID must start with a letter, an underscore, or a colon. After the first character, it may contain any number of characters drawn from the categories shown below, as well as the characters '.', '-', '_', and ':'.
|Letter|range of Unicode characters considered to be letters|
|Digit|range of Unicode characters considered to be digits|
|CombiningChar|list of characters that add such things as accents to the preceding character|
|Extender|list of characters that extend the shape of the preceding character|
|ID|one of letter, underscore, colon followed by any number of characters of type NameChar|
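The informal rule above can be sketched as a conservative check. The following Python snippet is only an illustration and is not part of the SBML specification: it deliberately covers just the ASCII subset of the syntax (ASCII letters and digits plus '.', '-', '_', and ':'). Every string it accepts is a syntactically valid metaid value, but it rejects valid values that contain non-ASCII letters, digits, combining characters, or extenders. The function name is_ascii_metaid is hypothetical.

```python
import re

# ASCII-only subset of the ID syntax (an assumption made for brevity):
#   first character: ASCII letter, '_' or ':'
#   remaining characters: ASCII letter, ASCII digit, '.', '-', '_' or ':'
# The full syntax also allows many non-ASCII characters, which this
# pattern does not attempt to cover.
_ASCII_METAID = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def is_ascii_metaid(value: str) -> bool:
    """Return True if value matches the ASCII subset of the ID syntax."""
    return _ASCII_METAID.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["_meta1", "meta.id-2", ":x", "1abc", "", "has space"]:
        print(f"{candidate!r}: {is_ascii_metaid(candidate)}")
```

A tool that needs to accept the complete range of permitted values would have to replace the two character classes with the full Unicode ranges given in the XML 1.0 productions.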
The XML specification documents define these categories using ‘productions’, which are grammar rules written in a variant of BNF. The following subsections describe the Unicode characters allowed by each production.
Letter
These include the expected (to English users) lowercase and uppercase Latin characters, as well as letters from many other scripts. More precisely, letters are defined by the following production:
Letter ::= BaseChar | Ideographic
The XML specification then goes on to provide additional productions that list Unicode character ranges for BaseChar and Ideographic.
Digit
The Digit production includes 0–9 as well as decimal digits from other scripts.
CombiningChar
These are characters that are used to add an accent to the preceding character and therefore allow text such as å to be used.
Extender
These are characters that can extend the shape of a character, such as ·. The full set of Unicode characters that falls into each of the categories listed above can be found in Appendix B of the XML 1.0 specification (Bray et al., 2004).
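As a small illustration of what a combining character does, the sketch below uses Python's standard unicodedata module to show that the letter a followed by U+030A (combining ring above) composes to the single character å. One assumption to keep in mind: unicodedata follows the Unicode version bundled with the Python interpreter, not the Unicode 2.0 data on which the XML 1.0 productions are based, so it should be treated as an illustration rather than a way to test membership in the CombiningChar production.

```python
import unicodedata

base = "a"
ring_above = "\u030A"  # COMBINING RING ABOVE

# Two code points: the base letter followed by the combining mark.
decomposed = base + ring_above

# NFC normalization composes the pair into the single code point U+00E5.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))          # 2 1
print(composed == "\u00E5")                    # True
print(unicodedata.combining(ring_above) != 0)  # True: it is a combining mark
```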
The relevant text from the SBML specification
The SBML specification documents explicitly define the syntax of the XML ID type used. In recent Levels and Versions of SBML, the text is in section 3.1.6. The following is the text verbatim.
The XML Schema 1.0 type ID is identical to the XML 1.0 type ID. The literal representation of this type consists of strings of characters restricted as summarized in Figure 5.
NameChar ::= letter | digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
ID ::= ( letter | '_' | ':' ) NameChar*
Figure 5: Type ID expressed in the variant of BNF used by the XML 1.0 specification (Bray et al., 2004). The characters ( and ) are used for grouping, the character * indicates “zero or more times”, and the character | indicates “or”. The production letter consists of the basic upper and lower case alphabetic characters of the Latin alphabet along with a large number of related characters defined by Unicode 2.0; similarly, the production digit consists of the numerals 0..9 along with related Unicode 2.0 characters. The CombiningChar production is a list of characters that add such things as accents to the preceding character. (For example, the Unicode character #x030A when combined with ‘a’ produces ‘å’.) The Extender production is a list of characters that extend the shape of the preceding character. Please consult the XML 1.0 specification (Bray et al., 2004) for the complete definitions of letter, digit, CombiningChar, and Extender.
In SBML, type ID is the data type of the metaid attribute on SBase, described in Section 3.2. An important aspect of ID is the XML requirement that a given value of ID must be unique throughout an XML document. All data values of type ID are considered to reside in a single common global namespace spanning the entire XML document, regardless of the attribute where type ID is used and regardless of the level of nesting of the objects (or XML elements).
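The uniqueness requirement can be checked with a few lines of code. The following Python sketch uses the standard library's xml.etree.ElementTree to collect every metaid value in a document, at any nesting depth, and report values that occur more than once. It rests on several assumptions: the function name duplicate_metaids is hypothetical, the sample document is a simplified fragment written for illustration rather than a complete valid SBML model, and only attributes named metaid are inspected (any other attribute declared with type ID would have to be added to the same pool of values).

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List


def duplicate_metaids(xml_text: str) -> List[str]:
    """Return the metaid values that appear more than once in the document."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root and all descendants, so nesting depth
    # does not matter: all values share one document-wide namespace.
    counts = Counter(
        elem.attrib["metaid"] for elem in root.iter() if "metaid" in elem.attrib
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Simplified, illustrative fragment (not a complete SBML model).
SAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model metaid="_m1">
    <listOfSpecies>
      <species metaid="_s1" id="A"/>
      <species metaid="_m1" id="B"/>
    </listOfSpecies>
  </model>
</sbml>"""

print(duplicate_metaids(SAMPLE))  # ['_m1']
```

Here the second species reuses the value already given to the model, which violates the uniqueness rule even though the two elements are of different kinds and sit at different levels of nesting.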
Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., and Yergeau, F. (2004). Extensible markup language (XML) 1.0 (third edition), W3C recommendation 4-February-2004. Available via the World Wide Web at http://www.w3.org/TR/2004/REC-xml-20040204.