Philosophical and ontological differences always surface when a community of scholars attempts to codify a standard. One such difference surfaced briefly at the recent SBML Forum at Stanford, when it was announced that SpeciesType had been removed from the L3 core Public Review Draft.
While we had invested significant time and resources coding to the L2 spec and making use of SpeciesType, we acknowledge that our use of SpeciesType was basically a hack. Nevertheless, I believe the purpose for which we adopted SpeciesType is important and universal, and I would like to propose a simple addition to the L3 core specification that will meet this need.
From my perspective, as someone who was not present at the birth of SBML but who has built mathematical models of biological systems for more than 40 years, the SBML object Species is named and used in a way that hides its true ontological character. Furthermore, it is impossible to write an algorithm that extracts from a ListOfSpecies the fundamental entities that define each Species.
Here are some of the Species definitions from Alicia Smith’s justly famous model of Ran transport. These were copied from BioModels.net Model 164.
<species metaid="metaid_0000034" id="Carrier_Cytosol" name="Carrier_Cytosol" compartment="Cytosol" initialConcentration="11.8952664327711" spatialSizeUnits="litre">
<species metaid="metaid_0000035" id="Carrier_RanGTP_Cytosol" name="Carrier_RanGTP_Cytosol" compartment="Cytosol" initialConcentration="0.00182967434742422" spatialSizeUnits="litre">
<species metaid="metaid_0000036" id="RanGAP_Cytosol" name="RanGAP_Cytosol" compartment="Cytosol" initialConcentration="0.5" spatialSizeUnits="litre">
<species metaid="metaid_0000037" id="RanBP1_Cytosol" name="RanBP1_Cytosol" compartment="Cytosol" initialConcentration="2.91577340630959" spatialSizeUnits="litre">
<species metaid="metaid_0000038" id="RanBP1_Carrier_RanGTP_Cytosol" name="RanBP1_Carrier_RanGTP_Cytosol" compartment="Cytosol" initialConcentration="0.0842265936904004" spatialSizeUnits="litre">
<species metaid="metaid_0000039" id="NTF2_Nucleus" name="NTF2_Nucleus" compartment="Nucleus" initialConcentration="0.560888580955963" spatialSizeUnits="litre">
<species metaid="metaid_0000040" id="RanGDP_Nucleus" name="RanGDP_Nucleus" compartment="Nucleus" initialConcentration="0.0466849733424111" spatialSizeUnits="litre">
<species metaid="metaid_0000041" id="RCC1_Nucleus" name="RCC1_Nucleus" compartment="Nucleus" initialConcentration="0.4" spatialSizeUnits="litre">
<species metaid="metaid_0000042" id="RanGTP_Nucleus" name="RanGTP_Nucleus" compartment="Nucleus" initialConcentration="0.0118032373274648" spatialSizeUnits="litre">
<species metaid="metaid_0000043" id="NTF2_RanGDP_Nucleus" name="NTF2_RanGDP_Nucleus" compartment="Nucleus" initialConcentration="0.939111419044037" spatialSizeUnits="litre">
A species is defined by an entity-Compartment pair. Most commonly, the entity is an SBO material entity such as a macromolecule or a simple chemical. The problem that we want to solve is simple:
Given a Species definition, write an algorithm to extract the entity and the Compartment that define the Species.
I assert that this is currently impossible without annotations, and that the concept is so fundamental that it should be possible in the L3 core. You can easily extract the Compartment. Why not the entity?
Please understand that neither the species id nor the species name fulfills this need. There are no rules on the syntax of either that would, in the general case, permit a program to extract the identity of the entity (e.g. a molecule or molecular complex) from the species id or the species name.
The encoders of BioModels Model 164 did as well as they could, and a human might well succeed in picking out the molecular complex Ran:GTP from the species name RanGTP_Nucleus. But given the L3 core spec alone, no general algorithmic solution to the problem of identifying the material entity is possible.
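To make the problem concrete, here is a minimal Python sketch using only the standard library. It assumes the species elements are available as namespace-free XML (a real SBML file declares a namespace, omitted here for brevity) and abridges them to their id and compartment attributes. The function names `compartment_of` and `guess_entity` are hypothetical, and the third species (`s17`) is an invented example of an equally valid id that does not follow the Model 164 naming convention. The compartment comes straight from a required attribute; the entity can only be guessed by stripping a trailing `_<compartment>` suffix, a convention that nothing in the specification guarantees.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Two species ids copied (abridged) from BioModels Model 164, plus one
# hypothetical species whose id ignores the "<entity>_<compartment>" convention.
SBML_FRAGMENT = """
<listOfSpecies>
  <species id="RanGTP_Nucleus" compartment="Nucleus"/>
  <species id="RanBP1_Carrier_RanGTP_Cytosol" compartment="Cytosol"/>
  <species id="s17" name="RanGTP in nucleus" compartment="Nucleus"/>
</listOfSpecies>
"""


def compartment_of(species: ET.Element) -> str:
    # compartment is a required attribute, so this lookup always succeeds.
    return species.get("compartment")


def guess_entity(species: ET.Element) -> Optional[str]:
    # Heuristic only: assumes the id was built as "<entity>_<compartment>".
    # Nothing in the SBML specification requires this naming convention.
    suffix = "_" + compartment_of(species)
    sid = species.get("id")
    if sid.endswith(suffix):
        return sid[: -len(suffix)]
    return None  # convention not followed: the entity cannot be recovered


root = ET.fromstring(SBML_FRAGMENT.strip())
for sp in root.findall("species"):
    print(sp.get("id"), "->", guess_entity(sp), "in", compartment_of(sp))
```

Even where the heuristic "succeeds", it returns only a string such as `RanGTP`; a program still has no way to know that this string denotes the complex Ran:GTP, and for `s17` it returns nothing at all.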
The frequently heard proposal that entity information should be encoded in annotations suffers from multiple weaknesses, most of which were raised at the Forum, though to no avail in saving SpeciesType from a premature demise.
1. Application-specific annotations will not, in general, be interpretable by a program that imports an SBML file.
2. MIRIAM annotations rely on the presence, in some web resource, of exactly the molecule(s) you want to reference, including its post-translational modifications or its "active" or "inactive" state. Most such annotations stop at isVersionOf and so fail to distinguish among related molecules.
3. Research work frequently involves molecules that are not yet in any database. SBML should support work on the cutting edge as well as it supports work on textbook pathways.
A more philosophical objection to the current state of affairs is that, while Entity is one of the six primary controlled vocabularies in SBO, an SBML user has no way to map to the molecule/molecular complex branch of the SBO Entity tree. Table 5 on page 84 of the L3 spec is quick to point out that SBML Compartments map to the SBO material entity branch, but it appears ontologically questionable to assert, as Table 5 does, that an SBML Species also maps to the SBO material entity branch.
A Species is "Entity in Compartment," not just Entity. By saying that Species maps to material entity, we are perpetuating the misconception that Species really does identify a molecule or a molecular complex. Indeed, I suspect that many people read Species and think chemical species. If you are fond of logical disputation, you could argue that an entity in an entity IS an entity, but this is not the point. The point is that our current definition of Species is incomplete.
If you work with models that have only one compartment, you never encounter this difficulty, because your Species name can be the name of the molecule/entity with no ambiguity. But as soon as you want to put the same molecule in multiple Compartments, the uniqueness requirement on Species id forces you to add some version of _Nucleus to your Species id. This, of course, will make sense to a human reader, but human readability is just not the point of XML.
Thus I propose (based on conversations with Stefan Hoops and Jim Schaff, who bear no responsibility for this post, but whose support I would welcome) that we add to the L3 core a required list of entity elements, for example:
<entity id="Arf1" name="ARF1_HUMAN" … other attributes? />
<entity id="BFA" name="Brefeldin A" />
and insert a new required attribute, entity, in the Species element:
<species id="Arf1_Golgi" name="ARF1_HUMAN in Golgi" compartment="Golgi" entity="Arf1" initialAmount="1e5" boundaryCondition="false" constant="false" />
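Under the proposal, extracting the entity becomes a lookup rather than a guess. The Python sketch below assumes the proposed entity element and entity attribute as written above; the wrapper names `model`, `listOfEntities`, and `listOfSpecies`, the omission of the SBML namespace, the second species `Arf1_Cytosol`, and the function name `entity_compartment_pairs` are all hypothetical, chosen only for illustration and not taken from any published specification.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical model using the proposed <entity> element and entity attribute.
# Wrapper element names and the Arf1_Cytosol species are illustrative only.
PROPOSED_SBML = """
<model>
  <listOfEntities>
    <entity id="Arf1" name="ARF1_HUMAN"/>
    <entity id="BFA" name="Brefeldin A"/>
  </listOfEntities>
  <listOfSpecies>
    <species id="Arf1_Golgi" name="ARF1_HUMAN in Golgi" compartment="Golgi"
             entity="Arf1" initialAmount="1e5" boundaryCondition="false"
             constant="false"/>
    <species id="Arf1_Cytosol" compartment="Cytosol" entity="Arf1"
             initialAmount="0" boundaryCondition="false" constant="false"/>
  </listOfSpecies>
</model>
"""


def entity_compartment_pairs(model: ET.Element) -> Dict[str, Tuple[str, str]]:
    """Map each species id to its (entity name, compartment id) pair."""
    # Index the entity definitions by id so species can refer to them.
    entities = {e.get("id"): e for e in model.iter("entity")}
    pairs = {}
    for sp in model.iter("species"):
        # Both halves of the pair come from required attributes: no guessing.
        # A dangling entity reference would raise KeyError here.
        entity = entities[sp.get("entity")]
        pairs[sp.get("id")] = (entity.get("name"), sp.get("compartment"))
    return pairs


model = ET.fromstring(PROPOSED_SBML.strip())
for sid, (entity, comp) in entity_compartment_pairs(model).items():
    print(f"{sid}: {entity} in {comp}")
```

Because every species points to its entity by id, a program can also recognize that `Arf1_Golgi` and `Arf1_Cytosol` are the same molecule in two places, regardless of how their ids happen to be spelled.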
In summary, the L3 Public Review Draft spec defines a species (p.43) as “a pool of entities that (a) are considered indistinguishable from each other for the purposes of the model, (b) participate in reactions, and (c) are located in a specific compartment.” This proposal aims to make it possible to know immediately what entity is referred to by any given Species element.
A program reading an SBML file should not be left to guess the identity of the entity referred to in a Species element.
A Species is (most often) a pool of molecules in a place. The Species element carefully defines the place with a required Compartment attribute; it should define the molecule equally carefully, and the proposed "entity" attribute would do so in a simple, straightforward and useful way.