The syntax of 'metaid' values in SBML

All SBML components can have an optional attribute, metaid, to store a “meta identifier”. This attribute is used to support metadata annotations in an SBML document. Often, users of SBML-compatible software tools do not have to see or interact with this attribute directly – it is something that software tools usually manage and hide from end users. Software developers do have to work with metaid attributes, however, and the allowed values for metaid have been a source of confusion for developers and advanced users of SBML. This page attempts to clarify the syntax of the allowed values of the SBML metaid attribute.

The origin of the metaid attribute

SBML uses the XML type ID as defined in the third edition of the XML 1.0 specification, published in February 2004 (Bray et al., 2004). Later editions of the XML specification changed the definition of the characters permitted in names, and therefore in values of type ID, but the SBML specifications have always continued to use the original definition. This is the source of the confusion: people often say loosely that the metaid attribute takes the values permitted by “the type ID defined in XML,” but in reality the permitted values come from one specific edition of the XML specification.

The definition of XML ID used by SBML is given explicitly in the SBML specification documents, and it has not changed since the metaid attribute was first introduced. Developers and advanced users of SBML are advised to consult the definition in the SBML specification rather than searching for the XML specification documents, to avoid the risk of accidentally reading a different definition of XML ID.

Syntax explained

The text of the SBML specifications that defines metaid is reproduced further below on this page for reference. Informally, the value of an attribute of type ID must begin with a letter, an underscore, or a colon; the first character may then be followed by zero or more characters drawn from any of the categories summarized below.

Category        Explanation
dot             the character .
dash            the character -
underscore      the character _
colon           the character :
Letter          ranges of Unicode characters classed as letters
Digit           ranges of Unicode characters classed as digits
CombiningChar   characters that add marks such as accents to the preceding character
Extender        characters that extend the shape of the preceding character
NameChar        any Letter, Digit, CombiningChar, or Extender, or a dot, dash, underscore, or colon
ID              a Letter, an underscore, or a colon, followed by zero or more NameChar characters
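The rules in the table can be sketched as a regular expression. This minimal sketch assumes values restricted to ASCII: within ASCII, Letter reduces to A to Z and a to z, Digit reduces to 0 to 9, and CombiningChar and Extender contain no ASCII characters at all. The pattern is therefore exact for ASCII input, but it rejects legitimate non-ASCII IDs. The function name is_ascii_metaid is hypothetical and is not part of any SBML library.

```python
import re

# Sketch covering ONLY the ASCII subset of the XML 1.0 (third edition) ID
# syntax used by SBML. First character: Letter, '_' or ':'. Remaining
# characters: Letter, Digit, '.', '-', '_' or ':'. Non-ASCII Letters,
# Digits, CombiningChars and Extenders are deliberately not handled.
_ASCII_ID = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def is_ascii_metaid(value: str) -> bool:
    """Return True if value is a valid ID built only from ASCII characters."""
    return _ASCII_ID.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["_a1", "meta:id-3.x", "1abc", "-x", "", "a b"]:
        print(f"{candidate!r}: {is_ascii_metaid(candidate)}")
```

With this sketch, "_a1" and "meta:id-3.x" are accepted, while "1abc" and "-x" (bad first character), the empty string, and "a b" (space is not a NameChar) are rejected. A full validator would replace the ASCII classes with the complete Unicode tables from the XML 1.0 specification.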

XML Productions

The XML specification defines these character categories using ‘productions’: grammar rules written in the specification's own variant of Extended Backus-Naur Form (EBNF) notation. The following subsections summarize the productions relevant to the ID type.

Production Letter

These include the lowercase and uppercase Latin letters familiar to English speakers, as well as letters from many other scripts. More precisely, letters are defined by the following production:

Letter ::= BaseChar | Ideographic
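Because BaseChar and Ideographic are unions of code-point ranges, checking membership in Letter amounts to a range lookup. The sketch below is illustrative only: it assumes a hypothetical excerpt containing just the first few BaseChar ranges plus the three Ideographic ranges, as transcribed from Appendix B of the XML 1.0 third edition, and the name in_letter_excerpt is invented for this example. The real BaseChar production has many more ranges.

```python
import bisect

# Hypothetical EXCERPT of Letter ::= BaseChar | Ideographic (XML 1.0, third
# edition, Appendix B). Only the first few BaseChar ranges are included;
# consult the specification for the complete list.
_LETTER_RANGES = sorted([
    (0x0041, 0x005A),  # A-Z
    (0x0061, 0x007A),  # a-z
    (0x00C0, 0x00D6),  # Latin-1 letters before U+00D7 MULTIPLICATION SIGN
    (0x00D8, 0x00F6),  # Latin-1 letters before U+00F7 DIVISION SIGN
    (0x00F8, 0x00FF),  # remaining Latin-1 letters
    (0x0100, 0x0131),  # start of Latin Extended-A
    (0x3007, 0x3007),  # Ideographic: ideographic number zero
    (0x3021, 0x3029),  # Ideographic: Hangzhou numerals
    (0x4E00, 0x9FA5),  # Ideographic: CJK unified ideographs
])
_STARTS = [lo for lo, _ in _LETTER_RANGES]


def in_letter_excerpt(ch: str) -> bool:
    """Return True if the single character ch lies in an excerpted range."""
    cp = ord(ch)
    # Find the last range whose start is <= cp, then check its upper bound.
    i = bisect.bisect_right(_STARTS, cp) - 1
    return i >= 0 and _LETTER_RANGES[i][0] <= cp <= _LETTER_RANGES[i][1]
```

The gaps in the Latin-1 ranges are deliberate in the specification: "é" (U+00E9) is a Letter, but "×" (U+00D7) and "÷" (U+00F7) fall between ranges and are not, so neither may appear anywhere in a metaid value.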

The XML specification then goes on to provide additional productions that list Unicode character ranges for BaseChar and Ideographic.

Production Digit

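The Digit production includes the ASCII digits 0–9 as well as decimal digits from other scripts, such as the Arabic-Indic digits (U+0660 to U+0669) and the Devanagari digits (U+0966 to U+096F).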
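A tempting shortcut is to test characters with a general-purpose "is this a digit?" function, but such functions follow current Unicode properties rather than the fixed XML 1.0 tables. The sketch below assumes a hypothetical excerpt of the first four Digit ranges (the full production has more) and contrasts it with Python's built-in str.isdigit(); the name in_digit_excerpt is invented for this example.

```python
# Hypothetical EXCERPT of the XML 1.0 (third edition) Digit production;
# the full production lists further ranges (see Appendix B of the spec).
_DIGIT_EXCERPT = [
    (0x0030, 0x0039),  # ASCII digits 0-9
    (0x0660, 0x0669),  # Arabic-Indic digits
    (0x06F0, 0x06F9),  # Extended Arabic-Indic digits
    (0x0966, 0x096F),  # Devanagari digits
]


def in_digit_excerpt(ch: str) -> bool:
    """Return True if the single character ch lies in an excerpted range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _DIGIT_EXCERPT)


if __name__ == "__main__":
    for ch in ["7", "\u0663", "\u00B2"]:
        # str.isdigit() uses Unicode properties, not the XML 1.0 tables,
        # so it also accepts U+00B2 SUPERSCRIPT TWO.
        print(repr(ch), in_digit_excerpt(ch), ch.isdigit())
```

The Arabic-Indic digit three (U+0663) is an XML Digit, and str.isdigit() agrees. Superscript two (U+00B2), however, passes str.isdigit() but is not an XML 1.0 Digit, nor does it belong to any other NameChar category, so it is not allowed in a metaid value.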

Production CombiningChar

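These characters add a diacritic, such as an accent, to the preceding character; for example, the Unicode character U+030A (combining ring above) following ‘a’ produces ‘å’. Because CombiningChar appears only in NameChar, a combining character can never be the first character of an ID.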
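The ‘å’ example can be checked with Python's standard unicodedata module. This sketch assumes only that U+030A lies in the first CombiningChar range of XML 1.0 (third edition), [#x0300-#x0345]. It shows that the decomposed sequence (‘a’ followed by U+030A) and the precomposed character U+00E5 look the same but are different code-point sequences.

```python
import unicodedata

RING = "\u030A"  # COMBINING RING ABOVE
# U+030A lies in the first CombiningChar range of XML 1.0 (third edition).
in_combining_range = 0x0300 <= ord(RING) <= 0x0345

decomposed = "a" + RING                              # two code points
composed = unicodedata.normalize("NFC", decomposed)  # one code point, U+00E5

if __name__ == "__main__":
    print(in_combining_range, len(decomposed), len(composed), composed == "\u00E5")
```

Both "xa\u030A" and "x\u00E5" are syntactically valid IDs (a Letter followed by NameChar characters), but they differ at the code-point level, so software that compares metaid values without normalizing them will treat the two as distinct.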

Production Extender

These characters extend the shape of the preceding character; one example is the middle dot · (U+00B7). The complete set of Unicode characters in each of the categories above is listed in Appendix B (Character Classes) of the XML 1.0 specification (Bray et al., 2004).
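The Extender production is short enough to write out in full. The table below is transcribed from Appendix B of the XML 1.0 third edition and should be treated as a convenience copy to be checked against the specification; the function name is_extender is hypothetical.

```python
# Extender production, transcribed from Appendix B of XML 1.0 (third edition).
# Treat this as a convenience copy; verify it against the specification.
_EXTENDER = [
    (0x00B7, 0x00B7),  # MIDDLE DOT
    (0x02D0, 0x02D1),  # modifier letter (half) triangular colon
    (0x0387, 0x0387),  # GREEK ANO TELEIA
    (0x0640, 0x0640),  # ARABIC TATWEEL
    (0x0E46, 0x0E46),  # THAI CHARACTER MAIYAMOK
    (0x0EC6, 0x0EC6),  # LAO KO LA
    (0x3005, 0x3005),  # IDEOGRAPHIC ITERATION MARK
    (0x3031, 0x3035),  # vertical kana repeat marks
    (0x309D, 0x309E),  # hiragana iteration marks
    (0x30FC, 0x30FE),  # katakana prolonged sound mark and iteration marks
]


def is_extender(ch: str) -> bool:
    """Return True if the single character ch is an XML 1.0 Extender."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _EXTENDER)


if __name__ == "__main__":
    print(is_extender("\u00B7"))  # middle dot: True
    print(is_extender("."))       # full stop is its own NameChar: False
```

Like CombiningChar, Extender appears only in NameChar, so "a·b" is a valid ID while "·ab" is not.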

The relevant text from the SBML specification

The SBML specification documents explicitly define the syntax of the XML ID type used. In recent Levels and Versions of SBML, the text is in section 3.1.6. The following is the text verbatim.

The XML Schema 1.0 type ID is identical to the XML 1.0 type ID. The literal representation of this type consists of strings of characters restricted as summarized in Figure 5.

NameChar ::= letter | digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
ID ::= ( letter | '_' | ':' ) NameChar*

Figure 5: Type ID expressed in the variant of BNF used by the XML 1.0 specification (Bray et al., 2004). The characters ( and ) are used for grouping, the character * indicates “zero or more times”, and the character | indicates “or”. The production letter consists of the basic upper and lower case alphabetic characters of the Latin alphabet along with a large number of related characters defined by Unicode 2.0; similarly, the production digit consists of the numerals 0..9 along with related Unicode 2.0 characters. The CombiningChar production is a list of characters that add such things as accents to the preceding character. (For example, the Unicode character #x030A when combined with ‘a’ produces ‘å’.) The Extender production is a list of characters that extend the shape of the preceding character. Please consult the XML 1.0 specification (Bray et al., 2004) for the complete definitions of letter, digit, CombiningChar, and Extender.

In SBML, type ID is the data type of the metaid attribute on SBase, described in Section 3.2. An important aspect of ID is the XML requirement that a given value of ID must be unique throughout an XML document. All data values of type ID are considered to reside in a single common global namespace spanning the entire XML document, regardless of the attribute where type ID is used and regardless of the level of nesting of the objects (or XML elements).
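A parser that merely checks well-formedness, such as Python's xml.etree.ElementTree, does not enforce the uniqueness requirement quoted above, so a tool may want to check it separately. The sketch below assumes that metaid is the only ID-typed attribute of interest (as the quoted text states for SBML) and that it appears unprefixed on elements; the function name duplicate_metaids and the fragment EXAMPLE are hypothetical, and EXAMPLE deliberately omits attributes that real SBML requires.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_metaids(sbml_text: str) -> list[str]:
    """Return, sorted, every metaid value used on more than one element."""
    root = ET.fromstring(sbml_text)
    # metaid is an unprefixed attribute, so its key in .attrib is plain
    # "metaid" even though the elements are in the SBML namespace.
    # root.iter() visits the root and every descendant, at any nesting depth.
    counts = Counter(
        elem.attrib["metaid"] for elem in root.iter() if "metaid" in elem.attrib
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical, deliberately incomplete SBML-like fragment in which "m1"
# is used both on the model and on a species.
EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model metaid="m1">
    <listOfSpecies>
      <species metaid="s1" id="A"/>
      <species metaid="m1" id="B"/>
    </listOfSpecies>
  </model>
</sbml>"""

if __name__ == "__main__":
    print(duplicate_metaids(EXAMPLE))  # ['m1']
```

Because all ID values share one document-wide namespace, the check walks every element regardless of nesting depth rather than comparing siblings only.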

References

Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., and Yergeau, F. (2004). Extensible markup language (XML) 1.0 (third edition), W3C recommendation 4-February-2004. Available via the World Wide Web at http://www.w3.org/TR/2004/REC-xml-20040204.