About annotations in Level 2
The following is an attempt to clarify some issues in the SBML annotation scheme defined in SBML Level 2. The fact that SBML Level 2 uses RDF, but not all RDF, has been a source of confusion. The use of a very restricted subset of RDF is intended to let software developers avoid having to implement full-on RDF parsing, but at the same time, understanding what can and cannot be put into SBML Level 2 merits elaboration. We discuss these issues here and provide guidelines for how to encode certain kinds of information that SBML developers have requested.
What is allowed inside
rdf:Description in SBML Level 2?
In Section 6.3, the SBML Level 2 specification states that RDF content in an
<annotation> element, when not attached to the
<model> element, essentially must take the following form (and here we've shortened the list of XML namespace declarations to a typical minimum set):
<SBML_ELEMENT ... metaid="SBML_META_ID" ...>
<rdf:Description rdf:about="#SBML_META_ID" >
<rdf:li rdf:resource="MIRIAM_URN" />
The different elements have the following meanings:
| || The SBML element being annotated. Can be any SBML element such as |
| || The |
| || A BioModels qualifier expressing the nature of the relationship between the SBML object being annotated and the target of the annotation (i.e., the thing pointed to by the |
| ||The URI of the (external) resource being referenced.|
| || A placeholder for "more of the same", meaning, zero or more elements of the same form as the immediately preceding element. In the template above, there are 2 places where this appears: inside the |
| ||A placeholder for either no content, or content that is other than the content defined by this Level 2 annotation scheme|
There are some important restrictions to keep in mind regarding the SBML Level 2 scheme. The first restriction (stipulated in Section 3.2.4 of the SBML Level 2 Version 4 specification) is that there can be only one
<rdf:RDF> element inside a given
<annotation> element. Consequently, all RDF content associated with a given SBML object must be contained in this single
<rdf:RDF> element. The second restriction (stipulated in Section 6.3 of the SBML Level 2 Version 4 specification) is that SBML/MIRIAM annotations are only recognized when they are in the first
<rdf:Description> element within the
<rdf:RDF> element. (This is shown in red type in the fragment above.) There can be other content inside the
<rdf:RDF> element, but as far as the SBML MIRIAM annotation scheme is concerned, such other RDF content is private to a given model or software application and falls outside the simple scheme defined by SBML Level 2. (These restrictions are an effort to reduce the burden on software developers—it is presumed that the highly restricted scheme means an application does not need a full-blown RDF parser/writer in order to parse or write the SBML MIRIAM annotations.)
A third restriction is that the structure of the
<rdf:Description> element (shown in red type above) is actually limited to only what is shown here, and the structure of the
QUALIFIER element is likewise limited to just an
<rdf:Bag> containing only
<rdf:li> elements that only use
rdf:resource attributes. For better or worse, this is what SBML Level 2 defines as the scheme for annotations. While an application could write other RDF content inside the first
<rdf:Description> element, it would not be compliant with SBML Level 2. Applications should put other RDF content outside, where indicated above by the
... after the first
<rdf:Description> element. The sections below give examples of doing this.
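To illustrate why the restricted scheme does not require a full RDF parser, here is a minimal sketch in Python that reads the SBML MIRIAM annotations using only the standard library's XML parser. The function name `read_miriam_annotations` and the sample fragment are our own illustrations, not part of SBML or libSBML, and the sketch assumes an SBML fragment with no default XML namespace, as in the examples on this page.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def read_miriam_annotations(sbml_element_xml):
    """Return {qualifier tag: [resource URIs]} from the first rdf:Description.

    Hypothetical helper: assumes the fragment has no default namespace.
    """
    root = ET.fromstring(sbml_element_xml)
    rdf = root.find(f"annotation/{{{RDF_NS}}}RDF")
    if rdf is None:
        return {}
    # Only the first rdf:Description is covered by the Level 2 scheme.
    description = rdf.find(f"{{{RDF_NS}}}Description")
    if description is None:
        return {}
    # It must point back at the metaid of the enclosing SBML element.
    if description.get(f"{{{RDF_NS}}}about") != "#" + root.get("metaid", ""):
        return {}
    result = {}
    for qualifier in description:
        bag = qualifier.find(f"{{{RDF_NS}}}Bag")
        if bag is None:
            continue  # not in the QUALIFIER/rdf:Bag/rdf:li form; ignore it
        for item in bag.findall(f"{{{RDF_NS}}}li"):
            uri = item.get(f"{{{RDF_NS}}}resource")
            if uri:
                result.setdefault(qualifier.tag, []).append(uri)
    return result

MIRIAM_SAMPLE = """
<reaction metaid="metaid_0000052" id="vPFK">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#metaid_0000052">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:kegg.reaction:R00756"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</reaction>
"""

miriam_annotations = read_miriam_annotations(MIRIAM_SAMPLE)
```

The returned dictionary is keyed by the namespace-qualified qualifier name, so `bqbiol:is` appears as `{http://biomodels.net/biology-qualifiers/}is`. Any RDF content after the first rdf:Description is deliberately ignored.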
Can a MIRIAM_URN refer to something inside the model?
No. Although this is perhaps not clear in the SBML Level 2 specification, the restriction on the
<rdf:li rdf:resource="MIRIAM_URN"> elements is that the value must be a MIRIAM URN. Although this simplifies reading model definitions, it unfortunately limits what can be put within this annotation framework.
An application or model author is free to put other RDF content elsewhere (for example, after the first
<rdf:Description> element, as mentioned above), but then it becomes a proprietary annotation, and there is no guarantee that another software program will be prepared to interpret it in the intended way.
How do I put evidence codes in annotations?
There are at least two ways to do this. One approach puts the annotations entirely within the first
<rdf:Description> element, making it SBML MIRIAM compliant, but it comes with a limitation that may or may not be significant in a particular situation. The second approach puts the annotation outside the first
<rdf:Description> element, and while it is correct RDF and provides more power and flexibility, it is not compliant with the restricted SBML MIRIAM scheme of SBML Level 2 (but may nevertheless be understood by software that can parse full RDF). We describe each approach in turn.
MIRIAM Resources defines a URI for the ECO evidence code ontology accessed via the Ontology Lookup Service. To use it in the SBML Level 2 annotation scheme, use the
isDescribedBy biological qualifier from the BioModels qualifiers list. The following is an example of annotating a reaction with an ECO code:
<reaction metaid="metaid_0000052" id="vPFK" name="Phosphofructokinase">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" >
      <rdf:Description rdf:about="#metaid_0000052">
        <bqbiol:isDescribedBy>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.eco:ECO%3A0000183"/>
          </rdf:Bag>
        </bqbiol:isDescribedBy>
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:kegg.reaction:R00756"/>
            <rdf:li rdf:resource="urn:miriam:reactome:REACT_736"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
  ...
</reaction>
The above says that the reaction "is described by" the given evidence code
ECO:0000183 ("Inferred from experiment/inferred from cell-free assay"), which is to be interpreted as the kind of evidence that supports the presence of the reaction in the model. (Since the evidence code is attached to the reaction and refers to it, this is the only possible interpretation.) This approach is simple and fits entirely within the scheme defined in SBML Level 2.
Depending on a modeler's goals, this evidence association may be just what is sought. However, sometimes a modeler wants to describe the evidence associated with a particular statement about an SBML model entity. An example is associating an evidence code with a claim that a species occurs in a particular compartment: it is not a statement about the species, but rather about a relationship involving the species. The SBML Level 2 annotation scheme does not permit this kind of annotation-referring-to-another-annotation structure. We therefore describe an alternative (approach #2) below.
The second approach uses other RDF syntax (specifically, RDF Reification) placed outside of the first
<rdf:Description> element controlled by the SBML MIRIAM annotation scheme. RDF reification provides a vocabulary for describing RDF statements using RDF, which is exactly the situation we face when we want to attach evidence codes to particular SBML annotations. RDF reification uses a
<rdf:Statement> element and three RDF property elements for the subject, object, and predicate involved in the relationship being stated.
The following example illustrates the syntax and approach:
<species metaid="metaid_0000052" id="S" compartment="ly">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" >
      <rdf:Description rdf:about="#metaid_0000052">
        <bqbiol:occursIn>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.go:GO%3A0005764"/>
          </rdf:Bag>
        </bqbiol:occursIn>
      </rdf:Description>
      <rdf:Statement>
        <rdf:subject rdf:resource="#metaid_0000052"/>
        <rdf:predicate rdf:resource="http://biomodels.net/biology-qualifiers/occursIn"/>
        <rdf:object rdf:resource="urn:miriam:obo.go:GO%3A0005764"/>
        <bqbiol:isDescribedBy>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.eco:ECO%3A0000004"/>
            <rdf:li rdf:resource="urn:miriam:pubmed:7017716"/>
          </rdf:Bag>
        </bqbiol:isDescribedBy>
      </rdf:Statement>
    </rdf:RDF>
  </annotation>
  ...
</species>
The first <rdf:Description> element above says that the species
S is located in a lysosome (a cellular structure identified by GO term
GO:0005764). That much conforms to the SBML MIRIAM annotation scheme defined in SBML Level 2.
After this, the annotation above contains a second RDF element, the
<rdf:Statement>. This element states that the relationship "
S occurs in
GO:0005764" is described by two things simultaneously: the PubMed publication number
7017716, and the ECO evidence code
ECO:0000004 ("inferred from cell fractionation assay"). The first part of the statement uses the properties rdf:subject, rdf:predicate, and
rdf:object, respectively, to identify the subject (here, the element with XML ID value
metaid_0000052, which is the element for species
S), the predicate (here, "occurs in", defined by a BioModels.net qualifier), and the target/object (here, the lysosome as identified by GO term
GO:0005764) of the relationship. The second part of the statement records the fact that this relationship is described by the given publication and evidence code.
It may seem as though there is duplication of information between the <rdf:Description> and
<rdf:Statement> elements, but there is not. The RDF reification statement in the second element is not an assertion that species
S is located in a lysosome; it is a statement about the assertion that species
S is located in a lysosome. As the RDF Primer puts it, "... when someone says that John said something about the weight of a tent, they are not making a statement about the weight of a tent themselves, they are making a statement about something John said".
Sample code for approaches #1 and #2
Sarah Keating wrote sample code demonstrating how to add the annotations above using libSBML. The programs are included in the libSBML distribution under the
examples/c++ directory. Here are direct links to the files:
- Example code for case #1 above: addingEvidenceCodes_1.cpp
- Example code for case #2 above: addingEvidenceCodes_2.cpp
Why are identifiers such as "CHEBI:0001234" percent-encoded?
A final comment about the identifiers encoded in the URNs: note how
ECO:0000183 is written as
ECO%3A0000183 in the value assigned to the attribute
rdf:resource. The reason for this rests in the definition of RDF/XML and URNs. The Wikipedia entry for percent-encoding explains it well; the essential point is that in the case of ECO identifiers (and ChEBI and some other database identifiers), it so happens that '
:' is part of the identifier itself. (I.e., the identifier is "
ECO:0000183", not "
0000183".) So it must be escaped when written as part of a URN, because '
:' is a reserved character representing the component separator in URNs. Applications must percent-encode '
:' characters that appear in entity identifiers (whether from ECO, ChEBI, GO, or other) when writing them in MIRIAM URIs, and percent-decode the identifiers when reading the URIs.
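The encoding and decoding step can be sketched in Python with the standard `urllib.parse` functions. The helper names `to_miriam_urn` and `from_miriam_urn` are hypothetical, and the sketch assumes the `urn:miriam:<datatype>:<identifier>` layout seen in the examples on this page.

```python
from urllib.parse import quote, unquote

def to_miriam_urn(data_type, identifier):
    """Build a MIRIAM URN, percent-encoding reserved characters such as ':'."""
    # safe="" makes quote() escape every reserved character, including ':' and '/'.
    return f"urn:miriam:{data_type}:{quote(identifier, safe='')}"

def from_miriam_urn(urn):
    """Split a MIRIAM URN into (data type, decoded identifier)."""
    prefix, scheme, data_type, encoded = urn.split(":", 3)
    if (prefix, scheme) != ("urn", "miriam"):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    return data_type, unquote(encoded)

eco_urn = to_miriam_urn("obo.eco", "ECO:0000183")
kegg_urn = to_miriam_urn("kegg.reaction", "R00756")
decoded = from_miriam_urn("urn:miriam:obo.go:GO%3A0005764")
```

Identifiers without reserved characters, such as the KEGG reaction identifier, pass through unchanged, so the same helper can be applied to every data type.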
How do I put InChI strings in annotations?
Certain modeling efforts want to attach InChI strings to SBML species elements, as a way of putting self-contained unique identifiers on the species. The limitations on SBML Level 2's RDF annotation content described above prevent putting the InChI content directly inside the
<rdf:Description> elements as defined by the SBML MIRIAM scheme. Consequently, in SBML Level 2, the annotations must be entered as "proprietary" annotations, outside the Level 2-defined scheme. We give an example of an approach to doing this here.
<species metaid="metaid_M_8" id="M_8" name="1-Methylnicotinamide" compartment="C_1">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" >
      <rdf:Description rdf:about="#metaid_M_8">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A16797"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
      <rdf:Description rdf:about="#metaid_M_8">
        <in:inchi xmlns:in="http://biomodels.net/inchi">
          InChI=1/C7H8N2O/c1-9-4-2-3-6(5-9)7(8)10/h2-5H,1H3,(H-,8,10)/p+1
        </in:inchi>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
The essential point of the scheme above is that it puts the InChI string as the content of an element other than the first
<rdf:Description> element inside the
<rdf:RDF> element. The SBML Level 2 specification only constrains the content of the first
<rdf:Description> element, and permits valid XML RDF content beyond it, so the above is valid RDF, but simply outside the scope of the SBML Level 2 annotation scheme. SBML-compliant software tools may not attempt to understand it, but tools with extended support for RDF may be able to.
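A tool that chooses to support this convention could read the InChI string along the following lines. This is a minimal Python sketch under stated assumptions: the function name `read_inchi` is hypothetical, and the element name and namespace are taken from the example above rather than from any published schema.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
# Assumption: element name and namespace as used in the example above.
INCHI_TAG = "{http://biomodels.net/inchi}inchi"

def read_inchi(rdf_xml, metaid):
    """Return the InChI string attached to `metaid`, or None if there is none."""
    root = ET.fromstring(rdf_xml)
    descriptions = root.findall(RDF + "Description")
    # Skip the first rdf:Description: it is reserved for the Level 2 MIRIAM scheme.
    for description in descriptions[1:]:
        if description.get(RDF + "about") != "#" + metaid:
            continue
        element = description.find(INCHI_TAG)
        if element is not None and element.text:
            # Surrounding whitespace comes from XML layout, not from the InChI.
            return element.text.strip()
    return None

INCHI_SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
  <rdf:Description rdf:about="#metaid_M_8">
    <bqbiol:is>
      <rdf:Bag>
        <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A16797"/>
      </rdf:Bag>
    </bqbiol:is>
  </rdf:Description>
  <rdf:Description rdf:about="#metaid_M_8">
    <in:inchi xmlns:in="http://biomodels.net/inchi">
      InChI=1/C7H8N2O/c1-9-4-2-3-6(5-9)7(8)10/h2-5H,1H3,(H-,8,10)/p+1
    </in:inchi>
  </rdf:Description>
</rdf:RDF>
"""

inchi = read_inchi(INCHI_SAMPLE, "metaid_M_8")
```

Matching on the namespace URI rather than on the `in:` prefix matters here, because the prefix is only a local abbreviation that another writer may choose differently.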
One last note concerns the use of the namespace URI
"http://biomodels.net/inchi". The use of a namespace-qualified element implies that the entity controlling the namespace has defined (or can define) the content that goes into the element. It may seem reasonable to use
"http://iupac.org/inchi" for the namespace string, but it turns out that IUPAC, the organization behind InChI, does not define this namespace (or any namespace, as far as we can determine). This introduces the question of just exactly what to use in this situation. Attempting to use
"http://iupac.org/inchi" would risk encountering a conflict in the future if IUPAC ever does define something using the
"http://iupac.org/inchi" namespace. Consequently, a different namespace must be used. The BioModels.net consortium will release a simple schema for this purpose, using the namespace URI