This is a discussion page for topics raised by Nicolas Le Novère regarding the future of SBML.
1. Is SBML only for process descriptions?
Question by Nicolas
This is possibly the most important message of this discussion! It has consequences not just for SBML, but for the entire domain of model encoding in biomedicine.
As described in the "history" message, the structure of SBML fundamentally derives from the representation of metabolic networks. Those networks are described using chemical kinetics and, at the level of modelling, by the description of processes. Process descriptions are just one approach to modelling systems. As systems biology rose and expanded over the last decades, other approaches gained ground, better suited to problems such as signalling pathways, gene regulatory networks, multi-cellular models etc. But SBML core is not well suited to encoding state-transition models, rule-based models, logical models, multi-agent models, discrete event simulations, statistical models etc.
The main question I would like to ask here is: "Should SBML be restricted to process descriptions?"
If the answer is "yes", why do we then try to stretch SBML with packages, for instance for rule-based models and logical models? Why not design a federation of languages? After all, we do not try to cover the same ground as NeuroML, NineML or PMML, and we are perfectly happy for them to take care of their respective modelling approaches. This will be the topic of a further post, "focused languages".
If the answer is "no", why then are process descriptions in the core? All SBML-supporting tools are meant to support (meaning at least to read) the core. Why do we force everyone to read constructs such as reactions, which are specific to process descriptions, even if those constructs are meaningless for logical models, rule-based models, models without pools etc.? This will be the topic of a further post, "slimmer SBML core".
| Comment by Nicolas: I do not understand why you write that. Your first sentence seems to imply that you equate modelling in systems biology with species and reactions. There may be several ways of seeing modelling in systems biology (I am not talking about the "other" systems biology, Bayesian modelling etc.), but none claims that. If it is the application of systems theory to biology, it is by no means restricted to the description of processes transforming pools (to generalise the reaction/species concept). If I read Bertalanffy, what I find is a lot of ODEs, with variables representing many different things. If I place the birth of systems biology at Hodgkin and Huxley (because of the cycle of experiment, modelling, simulation, analysis), I again find partial and ordinary differential equations, but the variables are voltages and gates, not pools. Modern models of neurons have thousands of "compartments" with tens of thousands of "channels", and they are simulated by numerically integrating ODEs, but there are no processes depleting and inflating pools, and the compartments are not containers. If I look at what people did in 1999, when the domain exploded, I also see multi-agent models, cellular automata, etc., all approaches not reducible to pools and processes. Limiting systems biology to only that is very restrictive, and is actually frequently used by people to dismiss SBML. It is true that reactions and species are a defining feature of the language. They exemplify one of the fiercest battles in DDMoRe, where some partners want to write the maths of pharmacometrics models, while others want to define the elements of the structural models, the maths emerging from them. So they are defining features, but there is no reason to consider that they should be THE defining features. We can perfectly imagine an SBML with the choice between e.g. three different approaches, each with its defining features.
SBML would not lose its identity without them, because the question is not "should we suppress reactions and species?" but "should we make them mandatory if SBML is not restricted to process descriptions and some software does not use them at all?".
Please move the "benefit" sentence to point 3.
And the fact that in some cases you can hack a model to use reactions and species to represent things that are not processes and pools, in order to get the same phenomenological result, is irrelevant. Such a model would be MIRIAM-incompliant anyway.
Please move the "hybrid" discussion to point 2.
|Comment by Sarah: We have an issue here that divides groups of people, without them really realising it. You have the people who will 'interpret' constructs in an adaptive way that facilitates what they want to model (exemplified by storing recipes in SBML), and you have people who will 'interpret' constructs in the way the name of the construct suggests to them it should be interpreted (exemplified by a comment that, now that MIRIAM URNs could be replaced by Identifiers, it would be possible to use them in a non-biochemical modelling context). So some people see 'reaction' and are quite happy to use it for gene regulation/Petri nets/baking, and others can only see a 'reaction' as "A process in which one or more substances (reactants) are chemically changed into one or more new substances (products)." [ref] This is a quirk of human nature.|
|Comment by Frank: To add my 2cts, I totally agree with the statement above. And I believe the next level should cater to both audiences. But not by *removing* compartment / species / reactions but by adding a 'variable' category. Over time we would then phase out the concept of varying parameters (as that is what variables are for). That way we can cleanly express both concepts. SBML without Compartments / Species / Reactions would no longer be SBML.|
|Comment by Nicolas: (First of all, I do not propose to remove Compartments / Species / Reactions from SBML, but from the core.) But keeping them in the core is indeed one of the possibilities. Then, however, we need to clearly say that SBML is for process models. To describe non-process-based models I often quote logical models, but since that is not convincing for Chris: what about multi-physics models, like those built with COMSOL? Dagmar Iber is certainly considered a systems biologist. Other examples are the models of cytoskeleton and structure deformation; see the work of Jean-François Joanny and Jacques Prost. Or the work of Jonathan Sherratt, with cellular automata for modelling flow. In 2000, Jonathan was certainly considered part of the systems biology community, at least in the UK. And finally, let's not forget all the models of the physiome community. Regardless of our small war with some of their members, the current models of heart, liver or kidney are very impressive, and there is no way we can encode them simply with process descriptions. If SBML is not SBML without Compartments / Species / Reactions, then we need to stop claiming that SBML aims to cover all modelling in biology or even systems biology, and to stop actively blocking the development of other languages.|
|Comment by Sarah: So, to actually address the question: I do not think that SBML is only for process description, and I think we should aim towards a core SBML that uses far more generic terms. Terms that do not require even the most pedantic of people to stretch their imagination in order to see how the construct applies to them. We have several such issues in Level 3. Parameters that might be constant or variable are another one.|
|Comment by Nicolas: I don't think this particular issue is a problem of terminology. It is a problem of mathematics and of model structure. The type of variables and relationships between them.|
2. Focused languages
Question by Nicolas
At the moment, the principle of SBML is to have one language that covers everything we want to encode. The idea of extending the language was proposed in 2001, and a system of packages to do so was proposed in 2002 and implemented in 2009. Over the last 9 years, this design has not been discussed, and it would perhaps be a good time to do so.
There are many different definitions of a "model". As far as SBML is concerned, a model is an ensemble of variables and their relationships that describes the behaviour of a system. For instance, all the relationships encoded in an SBML core document, whether rules, kineticLaws or events, are meant to form a single system of equations to be analysed as one. Some modelling approaches are not meant to be used together, but rather on their own. The models can then be connected, for instance through the results of simulations and analyses. Examples are process models, rule-based models and logical models. As already recognised in SBGN, those three representations of reality are fundamentally different, and one cannot generate meaningful hybrid models; one needs layers of transformation in between. Other examples are statistical population models.
It is therefore no surprise that among the packages that triggered the most controversies so far are "multi" and "qual". In multi, the community faced the challenge that a rule-based model is meant to be used upstream of a model description. A rule-based model is a set of rules which, when applied, allow a model description to be generated. There are currently at least two ways of doing so: by generating a process description model or by generating a multi-agent model. The challenge of the "multi" package is that we are trying to put together, in a single description, rule-based models, multi-agent models and process models. The entanglement of the different constructs makes the package pretty hard to use. The "qual" situation is the opposite. Since there is virtually no connection between logical models and process models, the conceptual framework, the variables and the maths being completely different, the constructs of the "qual" package interact very lightly with the core constructs. We could split off all the qual:X constructs, put them in their own file, and we would not lose anything.
Now that we have languages such as SED-ML to describe simulations and analyses based on several model descriptions, including variable transformations, maybe it would not be entirely inconceivable to have different formats better suited to given types of models, and developed by their own user communities. We actually do that already when it comes to well-organised communities: there are no plans I am aware of to develop SBML packages that would cover the types of models encoded in NeuroML and NineML.
Since 2002, XML technologies have evolved a lot, and it is not a problem anymore to link information coming from files in different formats. Note that other arenas of knowledge encoding use such federations of languages. The most widely used every day is probably Microsoft OOXML, which is a zipped archive containing files in various XML formats. Some are application-specific (WordprocessingML, SpreadsheetML and PresentationML), and others are cross-application (Office Math Markup Language, DrawingML, Extended properties, Custom properties, Variant Types, Custom XML data properties, Bibliography etc.). Another example is NeuroML. NeuroML v1.8 is a combination of languages covering different representations, such as ChannelML (state-transition models of ionic channels), MorphML (geometry of neurons to enable cable approximation) and NetworkML (for representing neuronal networks).
We can imagine a structure for SBML where one master file would coordinate the interplay of models encoded in several dedicated languages. Those languages could themselves be developed by the SBML community, or be imported from other communities (e.g. NeuroML, NineML, PharML etc.). Alternatively, such a federation of files could be part of a COMBINE archive, which could also include SED-ML files, SBGN-ML files etc.
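To make the federation idea concrete, here is a minimal sketch of what the manifest of such a bundle could look like, following the COMBINE archive manifest conventions (the file names, and the presence of a NeuroML part, are purely illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <!-- the archive itself -->
  <content location="." format="http://identifiers.org/combine.specifications/omex"/>
  <!-- a process-description model in SBML -->
  <content location="./metabolism.xml" format="http://identifiers.org/combine.specifications/sbml"/>
  <!-- a channel model delegated to another community language -->
  <content location="./channels.nml" format="http://identifiers.org/combine.specifications/neuroml"/>
  <!-- the simulation experiment tying the parts together -->
  <content location="./experiment.sedml" format="http://identifiers.org/combine.specifications/sed-ml"/>
</omexManifest>
```

Such a manifest only declares the parts; the coordination between the models (which variable maps to which) would still need a master description, be it an SBML file or a SED-ML experiment.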
| Comment by Chris: I'm hesitant about answering this one as I know we have a very different viewpoint. I'm a strong believer in hybrid modeling formalisms. I disagree that it is inconceivable to mix multi and qual models with core constructed models. This may not be easy to do with the way these packages are evolving now. It is not, however, in my opinion a fundamental limitation.
One case in point is that I was convinced that I needed completely new constructs to model genetic regulation. However, we were able to figure out a completely reasonable way to encode this as chemical reactions, and it is working great. Too often we go for new methods when the existing methods would work well. Indeed, SBML is an extremely rich modeling language. I have a lot of experience in formal modeling methods for a variety of systems including digital and analog hardware, software, and physical systems. The current SBML modeling constructs are capable of representing the same things as used in modeling formalisms for these domains. I'm even now using SBML to simulate these things as a result.
|Comment by Nicolas: I understand that in some cases one may develop hybrid models with different formalisms. But this is not mainstream (actually, I have yet to find a single example of such a model answering a biological question). We can indeed "hack" SBML to change its semantics and encode variables and relationships that were not meant to be encoded there. But 1) because the semantics is changed, those models cannot be shared anymore (e.g. the kineticLaw is not to be interpreted in substance per time anymore). As Lucian said in a post, an important feature of SBML is the fact that constructs have a semantic content. 2) The hacking cannot solve all problems. See my comment above about multi-physics, deformation, and cellular automata models. 3) The various modelling communities have ways of encoding their models, and they will not change their ways to fit with another community that decided it was the navel of the world.|
|Comment by Frank: I'm afraid I'm not clear what the question for this point is. Being involved in SED-ML, I welcome the concept of an archive format bundling a variety of other formats. At the same time, I am also a strong proponent of the single-file solution. If the files represent XML-based standards then there is nothing preventing us from re-using them as-is within SBML. One example being SBGN-ML: there is nothing wrong with an application storing SBGN maps within an SBML document if it chooses to do so.|
|Comment by Nicolas: Yup, I think you misunderstood. I mentioned the multi-file issue precisely to say that if we decided to go for several languages, there was no technical block.|
3. Slimmer core?
Question by Nicolas
As mentioned in the "process description" message, a tool supporting SBML is supposed to handle the core, if only because the packages are called from the core constructs. The problem is that the core package is pretty large, and does not restrict itself to generic constructs used by all packages. Because of history and pragmatic issues, the core blesses some constructs used only in specific modelling approaches. It also provides generic constructs that have seldom been used. All those constructs are forced upon all supporting software.
1) Constructs specific to process descriptions
compartment: A compartment with a size is useful when we want to define pools and densities. It is of no use for representations that use only non-pool variables, for instance through rules. Examples are physiological variables (tumour size, blood flow ...), physical parameters (voltage, light intensity ...), probabilities, ratios etc.
reaction: The reaction of SBML is fundamentally a process. It consumes elements from pools with a given stoichiometry, and it produces elements into pools with a given stoichiometry. The SBML specification explains how to assemble the kineticLaws of reactions into ODEs describing the evolution of the pools. Such an element is not useful for models such as state transitions, logical modelling of activity flows etc. The reaction element is a very complex element, and understanding it, including all its subelements, is a pretty big burden for people who would not use it.
2) Generic constructs that are seldom used
SBML Levels 1 and 2 did not have the concept of packages. Everything then had to be covered by the language and understood by everyone. With the exception of species and compartment types, most of the constructs of Level 2 ended up in Level 3 core. Here are some statistics of use from BioModels Database release 20:
- species: 120 100
- reactions: 119 575
- rules: 34 581
- of which algebraic rules: 0
- constraints: 0
- initialAssignment: 898
- events: 227
- reaction attribute "fast": 4 729
- delay csymbol: 16
- delay time: 1 175
The question is not to decide if those constructs are useful. Some are obviously useful for some people, and as the spectrum of models to cover becomes larger, they will become more supported and more used. But should constructs that have never been used in the past be part of the core and imposed upon everyone? It boils down to the question of "SBML support". If a tool needs to support everything in the core to be declared SBML-compliant, then we need to strip off most of the list above. If this is not the case, and tools can cherry-pick, what is the role of the packages at all?
| Comment by Chris: Already answered about process descriptions. I think we need to keep them.
As for the other constructs, I would not miss algebraic rules, though we support them; I'm not sure they are useful. I would also love to see the delay csymbol removed, or at least moved to a package, along with fast.
Not sure what you mean by delay time. Is this the time csymbol, or the delay in events? We use both a lot.
We use constraints a lot!!! I suppose we should submit some models to BioModels using them. They are great for checking properties during simulation. For example, if you want to know the probability that species A or B reaches 30 molecules first, you can create constraints and count up the number of simulations terminated by each constraint. This is also extremely simple to implement in a simulator, and it will be useful for all kinds of models including logical models (we already use it for these).
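For context, an SBML constraint is simply a MathML condition with an optional message; a simulator halts a run as soon as the condition becomes false. The counting scheme Chris describes would therefore use one constraint per terminating condition, roughly like this (the species id and message text are illustrative):

```xml
<constraint>
  <math xmlns="http://www.w3.org/1998/Math/MathML">
    <!-- the run terminates as soon as A is no longer below 30 molecules -->
    <apply> <lt/> <ci> A </ci> <cn> 30 </cn> </apply>
  </math>
  <message>
    <p xmlns="http://www.w3.org/1999/xhtml">A reached 30 molecules first</p>
  </message>
</constraint>
```

Counting, across repeated stochastic runs, how many terminate on this constraint versus the analogous one for B gives the probability estimate Chris mentions.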
|Comment by Frank: As indicated above, I agree with Chris that we need to keep them. There is no need for SBML to lose its identity. However, these concepts should remain optional to use; we want to provide an easy upgrade path. Compared to the generic elements, support for reactions is really a minimal burden. As for the other constructs, while I still don't see a need for algebraic rules, DDEs and 'fast' in the core, and personally would like to see them go into an add-on, I see that some groups have gone through the effort of implementing them. However, this only happened recently; until model editors take advantage of these features, we won't see them in BioModels.|
|Comment by Nicolas: Sorry, but if they are optional, why don't we put them in a package? I just don't understand this holy status of three particular elements. Regarding the other elements you mention, I agree that without support they can't be in BioModels Database. But look at the literature. How many models use algebraic rules? How many use fast reactions without defining the maths to solve them? Those constructs will never be used because there is no scientific case for them. The fact that a few software developers implemented support for them because the current specification contains them is not a reason for keeping them in the future. I am sorry, but the decision on what to keep in the core and what to ditch has always been taken by less than a handful of people, namely the developers of the two initial supporting software packages and their descendants (Gepasi/COPASI, Jarnac/SBW). As an example, even when a good half of the community complained about the disappearance of speciesTypes (wrongly IMHO, because groups is better), they went nevertheless; while nobody ever demanded to keep algebraic rules, yet they stayed. I am not complaining about the developers above, mind you (I owe so much to COPASI), but I believe it is time either to decide rationally or to admit that SBML's structure and evolution are arbitrary.|
4. Modularity in core
Question by Nicolas
Biology is modular. A biological entity, whether an enzyme, an organelle, a cell, a tissue or an organism, is a module that reads information from the environment, performs internal processes largely hidden from this environment, and then adapts its behavioural properties, which are themselves read by other entities in this environment. The entire basis of synthetic biology is modularity. Multi-scale modelling relies on modularity.
Knowledge representation is modular. Almost all programming languages and representation languages are modular. Modularity facilitates development and fixing; it facilitates exploration and visualisation. Modularity allows encapsulation and re-use. Modularity allows various expertises and workforces to be combined on a large-scale project.
Despite that, an SBML model is monolithic.
We actually discussed the CellML modular structure as a solution at the very beginning. But it was deemed too complicated at the time (it is hard to develop partial support for such a structure). Alternatives were proposed by other early members of the community (e.g. ProMot/Diva). But the urgent need was a simple format to encode metabolic models, and going for a simple monolithic format was definitely the right way to get quick endorsement at the time. Nevertheless, the approach used by CellML may be the right one from an engineering point of view, and we may now have better XML technologies to support it. In 2007, I proposed this approach as a prerequisite to the model composition package: http://sbml.org/Events/Other_Events/SBML_Composition_Workshop_2007/Modularity_In_Core I am not claiming that this is the solution, and I would probably describe it differently now. But I still think an SBML model should allow modules and encapsulation. This is the proper way to go, and it is entirely compatible with all we did so far. Current SBML models are just models with a single module.
Note: Modularity of SBML is different from model composition. In model composition, one takes existing, fully fledged SBML models and writes complex but necessary replacement and adaptation layers. If a model of the collection is modified carelessly, the entire composed model becomes incorrect. But modularity in core would certainly facilitate and empower the use of model composition.
|Comment by Chris: I agree that modularity should be in the core. We are now using the hierarchical model composition package which is great, but it is as you say more model composition than modularity. Some design decisions may be different such as use of ports, if we wanted true modularity.|
5. One file, one model, one simulation
Question by Nicolas
The current paradigm in the SBML ecosystem is 1 file = 1 model = 1 simulation. People may argue that this is not true in principle, but in practice it is. A set of equations describing a system is encoded in an SBML file. This SBML file is loaded into a modelling and simulation environment for further development or analysis. Except for the tools specifically dedicated to model integration, the vast majority of SBML-supporting software does not even offer the possibility of loading several models at the same time. The reasons for this situation are multiple, and were mostly relevant at the origin of SBML. In many modern cases, the paradigm is a hindrance.
For large-scale models (e.g. the Jamboree metabolic reconstructions), maintenance of the files is becoming a very complex task, because of their size but also because of merging problems.
In synthetic biology, the same building blocks are re-used in many different models.
In multi-scale modelling, each scale is better represented by its own set of equations, the variables being linked through variable transformations (e.g. a model of the metabolism of a cell, a model of a population of cells, a model of the different tissues in an organism).
In multi-approach modeling, different parts of the model may need to be analysed with different methods and actually form different systems of equations.
But I believe it is more general than that. When designing the future of SBML, we should not be hindered by the physical location of the information, whether it sits in one file or in one place. Solving this issue is a prerequisite for modularisation and for the existence of a panoply of languages. Note that the issue is peripherally addressed by the model composition package, and also by using SED-ML. But are these solutions generic enough?
|Comment by Chris: I agree with this one. The hierarchical model composition package has already removed the 1 file = 1 model = 1 simulation idea, by the way. It should indeed be made part of the core as discussed in an earlier message.|
6. Externalize initial conditions?
Question by Nicolas
At the moment, all the information, including the list of variables, the mathematical relationships and the initial conditions, is contained in one SBML file. The inclusion of initial conditions in the SBML file was discussed as early as 2001. At the time, software such as VCell and StochSim defined a model as the list of variables and relationships; the initial conditions were part of the model parametrisation, and the SBML model was seen as an "instance" of a model. In the case of a bi-stable system, different points of origin for the variables may lead to different behaviours. But saying that those represent different models, despite having the same species, reactions and parameters, is a bit odd.
MIRIAM states: "The model must be instantiable in a simulation: all quantitative attributes must be defined, including initial conditions."
There is nothing in that sentence saying that the initial conditions must be unique, and even less specifying where to store those conditions. One constraint in 2000/2001 was the obligation for SBML to be the vector of communication between SBW modules. This constraint is not in the way anymore. With the appearance of NuML, and its potential use in packages such as distrib, part of the initial conditions could possibly move out of the SBML file describing the model. The availability of large-scale quantitative datasets means we should be able to re-use models with different datasets. Some efforts, such as the workflows developed in Manchester, parameterise the models using experimental information.
Should we not question the presence of initial conditions in the main SBML file itself?
NB: Of course we can systematically override all the values in an SBML file using a SED-ML file, but this is quite a complex and cumbersome procedure.
| Comment by Chris: I'm not sure why this is hard to do with SED-ML. You can think of the initial conditions as a reasonable starting condition and use SED-ML to encode other starting conditions you are interested in. It is good to know a good place to start.
By the way, you can now do this natively in SBML with the hierarchical model composition package. You can have a model, and other models that include this model and change its initial conditions using replacements. Quite useful.
|Comment by Frank: Just to answer Chris's question: what makes this hard to do is simply the fact that SED-ML is agnostic to the modelling language; as such it has to use XPath expressions to modify parameters, which is a bit heavy when all you want to do is change some initial conditions.|
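To illustrate why this feels heavy, here is roughly what overriding a single initial amount looks like in a SED-ML Level 1 model element (the model id, source file and new value are illustrative):

```xml
<model id="m1" language="urn:sedml:language:sbml" source="model.xml">
  <listOfChanges>
    <!-- a full XPath expression is needed just to set one initial amount -->
    <changeAttribute
        target="/sbml:sbml/sbml:model/sbml:listOfSpecies/sbml:species[@id='X']/@initialAmount"
        newValue="10"/>
  </listOfChanges>
</model>
```

If the initial conditions lived outside the model file, pointing a simulation at a different condition set could be a single source reference instead of one XPath expression per value.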
| Comment by Sarah: I agree that including initial conditions within a model creates an 'instance' of the model. My own (very limited) biological modelling experience involved a Monte Carlo simulation; to my mind I ran the same model thousands of times - each time with different initial conditions. I did not have thousands of models.
I think abstracting the layer with initial conditions would be a good thing.
|Comment by Frank: I'd be fine with an abstraction layer or even multiple initial conditions. However I would maintain that an SBML model should at least contain one set of initial conditions, or provide the option of having one!|
7. Elements vs attributes
Question by Nicolas
SBML does not use element content at the moment. The only exceptions are coming from other languages such as XHTML or MathML. So instead of using:
<elementA> blah </elementA>
SBML will use
<elementA content="blah" />
There are many reasons to avoid XML attributes. Some are listed at: http://www.w3schools.com/dtd/dtd_el_vs_attr.asp
- attributes cannot contain multiple values (child elements can)
- attributes are not easily expandable (for future changes)
- attributes cannot describe structures (child elements can)
- attributes are more difficult to manipulate by program code
- attribute values are not easy to test against a DTD
But for SBML we have to add two other reasons:
1) In order to annotate a particular property, for instance the concentration or the location of a species, we need those properties to be elements and not attributes. In the example:
<species metaid="X" id="X" compartment="Y" initialAmount="1" units="U">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#X">
        <bqbiol:isDescribedBy>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/pubmed/12345" />
          </rdf:Bag>
        </bqbiol:isDescribedBy>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
What is the article pubmed/12345 about? The fact that X is contained in Y, or that there is 1 U of X in Y?
2) We cannot relate attributes to one another in the XML itself; we need the specification. In the string:
<species metaid="X" id="X" initialAmount="1" compartment="Y" units="U">
U is the unit of what? It may seem innocuous, the answer being RTFM. But that kind of answer is valid in a world where a model is loaded into SBML-aware software and interpreted by a human being. It becomes a lot less useful when we enter the realm of automated semantic reasoning.
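To make the two points above concrete, a purely hypothetical element-based serialisation could give each property its own metaid and attach the unit directly to the value it qualifies, so both annotation targets and unit relationships become unambiguous (none of this is existing SBML syntax):

```xml
<species metaid="X" id="X">
  <compartment metaid="X_location">Y</compartment>
  <initialAmount metaid="X_amount" units="U">1</initialAmount>
</species>
```

An rdf:Description with rdf:about="#X_amount" would then annotate the amount of X specifically, not the species as a whole or its location, and U visibly qualifies the initial amount rather than floating among sibling attributes.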
|Comment by Chris: This seems reasonable to me. I'm at an SBOL meeting right now, and we are headed towards an RDF/XML serialization which will use primarily elements.|
|Comment by Mike: I entirely agree that the next generation should be element-based. We could use attributes for such things as entity identifiers, as well as the units of a value contained in an element.|
|Comment by Sarah: There is also an issue with xml attributes and namespaces that is only now becoming apparent. We should definitely avoid attributes wherever possible.|
|Comment by Mike: I'd also like to introduce the question of whether we should be focusing at the XML level when developing the representation format. It's added as a separate question further below on this page.|
|Comment by Frank: At first glance this seems really scary to me. Instead I would prefer a hybrid solution, that is, 'allow both': you could either use an attribute, or, in case you need to annotate or do something extraordinary, you would be allowed to have, instead of the attribute, a child element with the same name as the attribute.|
|Comment by Nicolas: Hmmmm. I programmed my first XML parsers using the Perl package XML::Simple, which did not distinguish between the two, and THAT was scary. I would not like to have the choice between element and attribute for the same concept. But completely banning all attributes is maybe not the best choice either. A rule of thumb could be: unbounded value = always an element; restricted value = possibly an attribute. Examples of attributes would be things with boolean values, units etc.|
8. Semantics of element names
Question by Nicolas
The names of many SBML elements are semantically loaded. This caused a lot of confusion in the past. The most common reaction is "SBML is only for biochemical models". A few reasons why those names carry a biological meaning are: 1) in 2000, SBML *was* for biochemical models; 2) in 2000 there was no other suitable means to encode biological meaning; 3) in 2000 SBML was supposed to be written and read by humans; 4) in 2000 SBML was developed and processed by biologists.
None of those reasons are valid anymore.
The main examples of semantically loaded element names are:
The Systems Biology Markup Language
Over the last 10 years, systems biology has evolved a lot, and SBML only covers part of the models developed in systems biology. Conversely, because of its quality and the quality of its software support, SBML has been used outside systems biology: BioModels Database contains electrical models of neurons, models of tumour growth, and even of zombie infection. I suggest we stop writing "The Systems Biology Markup Language (SBML)" and consider the acronym as a name, similar to SRI or CellML. That would require a cleanup of the specification and the website.
Species
The element species represents "molecular species". But really in SBML we want to represent all kinds of pools: a species can be a pool of molecules, of cells, or of individuals. These pools are then consumed and produced by reactions. Moreover, in the life sciences a species, when used without a qualifier, is assumed to be a taxonomic unit. Renaming "species" to "pool" would bring a genericity to SBML that would encourage its adoption beyond the systems biology community.
Reaction, reactant, product
The SBML reactions represent all processes consuming and producing pools, not just biochemical "reactions". For instance, transports are not, strictly speaking, reactions, particularly passive ones. If I model the transformation of lymphocytes into plasmacytes upon presentation of antigens, most people would not call that a reaction (even if it may result in... an allergic reaction).
Similarly, the elements reactant and product are not only biologically loaded, but they also imply a directionality of the process that is not there. If the kineticLaw is negative, the product is consumed and the reactant produced. It would be better to have a single list of poolReference elements, with stoichiometries being negative, positive or null.
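A hypothetical serialization of such a single list (the names process, poolReference and listOfPoolReferences are illustrative only, not part of any specification) could look like:

```xml
<!-- A process written with one list of pool references; the sign of
     the stoichiometry replaces the reactant/product distinction -->
<process id="transport1">
  <listOfPoolReferences>
    <poolReference pool="Ca_cytosol" stoichiometry="-1"/>
    <poolReference pool="Ca_ER" stoichiometry="1"/>
  </listOfPoolReferences>
</process>
```

Reversing the sign of the rate then simply reverses the direction of the process, with no need to swap reactants and products.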
All the biological semantics should be externalised to the proper tools, e.g. controlled vocabularies and ontologies.
Species, compartments and parameters
The fact that parameters can be constant or not, as can species, has triggered much confusion. In modeling, there are state variables and parameters. Parameter values do not define the state of the system (although the decision on what is a variable and what is a parameter is sometimes hard to take). There should perhaps be one element for all variables, as in CellML, the nature of a variable being controlled by a typing mechanism. A compartment could then be a container for some pools engaged in some processes, but a member of a pool in others. This deserves a specific discussion. Cf. post "generic Variable".
Other small issues
Some attribute names are misleading. For instance "concentration" instead of "density".
A very often misinterpreted element is "constraint", which is taken as a constraint put on the model, akin to a boundary condition, as in the package FBC ("Flux Balance Constraints"). In fact, a "constraint" is a test of the system state and the resulting action taken by the analysis tool (in that sense, "constraint" does not belong to SBML, but to SED-ML).
In pharmacometrics, a "compartment" is what we call a species, i.e. a pool of a given chemical in a given "vessel". We may consider switching from compartment to container, which explicitly says it contains something.
|Comment by Chris: I think keeping SBML is needed unless we really plan to develop something completely new.
I like the idea of changing species, reaction, and parameter to more generic names. Even today at the SBOL meeting there was reluctance to use species in a model, because more than one molecular species may be involved in a lumped fashion. The name parameter is also confusing, as a parameter's value could change dynamically.|
9. Using generic variables with controlled vocabulary instead of species, parameters
Question by Nicolas
The fact that SBML has blessed variables, called species and compartment, and generic parameters, which can be either variable or constant, has brought confusion for most of the decade. Because of the variable support of the rules, some users stored values that were not pool sizes in species (e.g. probabilities), while others created variables that represented pools.
If we incorporate modularity in the core, and we use this modularity to represent multi-scale models, we may end up with pools (species) of compartments. Would it not be much handier to have something like:
<listOfVariables>
  <variable id="S1" type="pool">
    <listOfProperties>
      <property type="quantity">
        <value>1</value>
        <unit>millimole</unit>
      </property>
    </listOfProperties>
  </variable>
  <variable id="C1" type="container">
    <listOfProperties>
      <property type="size">
        <value>1e-18</value>
        <!-- here the unit is the global one, Litre -->
      </property>
    </listOfProperties>
  </variable>
</listOfVariables>
|Comment by Chris: This actually seems logical. It is more like a programming language with variables that have types.|
|Comment by Frank: While I like the notion of introducing variables (and their properties), I maintain that the species construct should remain. The biggest concern I have with assigning properties to variables is that these properties might depend on other variables. We should very much aim not to create another nightmare such as the hasOnlySubstanceUnits attribute on species.|
|Comment by Nicolas: I have to say, this issue is slightly dependent on the "only process description" and "several languages" questions above. But I am not sure I understand your concern, Frank. Could you elaborate? How are the elements parameter, species and compartment different from variable type="pool/container/parameter"?|
10. Element to provide properties to elements such as species or compartments
Question by Nicolas
In the current specification, parameters are not associated with constructs such as compartments, species and reactions. Therefore, they cannot be used to store numerical properties explicitly associated with those constructs. Examples would be physical constants such as molecular weight, conductance, Gibbs energy etc. When a number must be linked to a construct, an attribute is created, such as the size of a compartment or a charge. This forces the generation of a new specification whenever we want to add or remove (cf. charge) properties. Would it not be much handier to have something like:
<construct id="x">
  <listOfProperties>
    <property id="SId" sboTerm="SBO:#######">
      <value>1</value> [*]
      <unit>MyUnit</unit> [*]
    </property>
  </listOfProperties>
</construct>
- If this were incorporated into SBML L3, these would be attributes.
|Comment by Chris: This seems reasonable too.|
11. Subset of MathML, or all MathML?
Question by Nicolas
At the moment, SBML re-uses a subset of MathML to describe the mathematical parts. The rationale for such a choice was that SBML-compliant software had to support all the mathematical operators, and the burden would otherwise be too heavy. The choice of the operators to support was historical. As a result, SBML supports "factorial" or "logbase", elements that have never been used in a model (I cannot even imagine a use in our communities), but does not support product, sum and matrices, cornerstones of chemical kinetics and systems biology.
People do not support the whole SBML subset anyway, so why don't we allow the whole MathML content markup? Or at least add elements on a regular basis upon demonstration of a use-case?
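To make the gap concrete, here is the content MathML for a simple sum over an indexed variable; every element used is standard MathML 2.0 content markup, but <sum/> and <selector/> are outside the current SBML subset:

```xml
<!-- Sum of x_i for i = 1..n, written in content MathML -->
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply>
    <sum/>
    <bvar><ci>i</ci></bvar>
    <lowlimit><cn>1</cn></lowlimit>
    <uplimit><ci>n</ci></uplimit>
    <apply><selector/><ci>x</ci><ci>i</ci></apply>
  </apply>
</math>
```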
| Comment by Lucian: Hey, I can answer this one, at least ;-)
The opposite approach was taken by CellML, whose spec claims support for all of MathML. In the beginning, before there were any CellML interpreters at all, Catherine and her cohorts created models with the <partialdiff> operator.
In time, when an actual CellML interpreter came to be, it did not support the <partialdiff> operator. And in fact, no CellML interpreter has ever supported the <partialdiff> operator. And there continue to be models (contrary to Catherine's wishes) in the CellML repository with the <partialdiff> operator that absolutely nobody can interpret.
SBML is intended to be an exchange format. As such, you should expect to be able to code your model in one tool and run it in another. If you open the doors to all MathML, I am afraid we would get balkanization of interpretation abilities, and you would no longer be assured that if you coded your model in one tool it would run in another.
I am not, however, opposed to your second suggestion of expanding the MathML subset that we allow. This has happened in the past, and we can certainly do it again. Demonstration of use-case would be good, but more important (to my mind) would be implementation by two or more tools. We could float a list of new MathML to include in 3.2, but I would not call 3.2 official until 2+ tools supported all the new MathML we wanted to include.
(Which is another reason not to open the floodgates to 'all MathML': we would need two tools to implement all of it before we could finalize the spec.)
|Comment by Chris: I agree with Lucian that we should not allow anything in the specification that is not supported by two tools. However, I'm strongly in favor of revisiting the MathML subset that we use. In particular, I would like us to consider adding arrays, for example.|
|Comment by Mike: To this discussion, I would like to add an additional question: should we stick with MathML 2.0 or go to 3.0? See http://www.w3.org/Math/|
|Comment by Frank: For me, the question is not so much *which subset should be allowed*; I am fine with expanding it as needed. My real concern is what dimensions my result will have. The big issue with products / sums / matrices is that they only make sense in certain contexts. Currently SBML identifiers stand for real-valued elements; there is no identifier that represents a vector or a matrix. Once we have those, and we make really clear when and in which contexts they are allowed to be used, then we will need the additional constructs.|
12. Should the design focus on XML, or be abstracted away from that level?
Question by Mike
Would it be worth avoiding the level of XML, and discussing instead the object model? A specification in terms of object model could allow mapping not only to XML, but to other formats like JSON, Protocol Buffers, and others. In terms of good software development practices and leaving our options open for the future, this would be desirable, but OTOH it's not clear what to do about things like MathML, which are heavily XML dependent.
In the existing SBML specifications, we did originally try to stay at the level of object models and avoid XML details as much as possible, but it progressively got worse and more XML-dependent over time.
Added 2012-05-20: another question related to this is whether to switch to (e.g.) RDF.
|Comment by Sarah: This issue is similar to the one I pose in Q13. We need to get the biologists on board; which means we need design decisions based on what the biologists want to be able to model and how they would like to model it. The technical bit comes later.|
|Comment by Frank: I prefer XML for the simple reason that we can transform it rather easily. There are JSON formatters for XML, so people can easily convert from one to the other. Personally I would stay away from lesser-known containers (such as the mentioned Protocol Buffers).|
13. How much do we concentrate on seamless upgrade from L3 ?
Question by Sarah
During the transition from L2 to L3, the editors made the conscious decision that L3 core would be basically L2V4. There were slight alterations, but nothing really major. This decision had a huge impact on any design issues that came up.
For SBML+ we should think this one through. Nicolas is arguing (rightly) that we need to consider our potential users to be biologists, not necessarily the software developers. In most commercial situations the software developer does what the client wants (with some room for discussion).
Obviously we do not want to make upgrading a complete re-write for software, but if we want biologists to come to the table with their needs, they must feel those needs will be addressed.
|Comment by Frank: This question deserves a classical 'it depends'. I think it is important that SBML keeps its identity; for me that means that compartments / species / reactions remain in the core. However, I would welcome other breaking changes, such as parameters becoming true parameters (i.e. constants) once the notion of variables is introduced.|
|Comment by Nicolas: This is not the question. Repeating the same opinion as a comment to every question will not increase its weight. It remains Frank=1 ;-)|
14. Should metadata stay in SBML?
Question by Nicolas
When we started with the annotations in 2005, the official stance of SBML was that we were supposed to use CellML metadata. Trying to make it work led to the current "controlled annotations". They are based on RDF, and this is good. However, since the RDF is embedded in SBML, which is itself not an RDF serialization, the metadata is not usable with RDF tools such as SPARQL query engines. In order to use those annotations, one needs to extract them. Actually, BioModels Database stores the annotations in a MySQL database, and not with the model itself.
If we are moving towards a multi-language archive in COMBINE, with SBML, SED-ML and SBGN-ML files, I wonder if it would not be better to have all the annotations in a separate valid RDF file. The description elements would still point to the SBML metaids, though through more complete URIs, similar to what we do with NuML, SED-ML and soon with SBGN-ML.
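As an illustration of the extraction step that is currently required, here is a minimal sketch (using only the Python standard library; the SBML snippet and its metaid are invented for the example) that pulls the embedded rdf:RDF blocks out of an annotation so they can be handed to RDF tooling as standalone documents:

```python
# Sketch: extracting the RDF embedded in an SBML <annotation> so that
# it becomes a standalone RDF/XML document usable by SPARQL engines.
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Invented minimal example of an SBML model with a controlled annotation
SBML = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core">
  <model metaid="meta_model1" id="model1">
    <annotation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
        <rdf:Description rdf:about="#meta_model1">
          <bqbiol:is>
            <rdf:Bag>
              <rdf:li rdf:resource="http://identifiers.org/GO:0006096"/>
            </rdf:Bag>
          </bqbiol:is>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
  </model>
</sbml>"""

root = ET.fromstring(SBML)
# Find every embedded rdf:RDF element, wherever it is nested
rdf_blocks = root.iter("{%s}RDF" % RDF_NS)
standalone = [ET.tostring(block, encoding="unicode") for block in rdf_blocks]
# Each entry in `standalone` is now a self-contained RDF/XML document
# that RDF tooling can load directly, without knowing anything about SBML
```

The point is not the code itself, but that this step has to happen at all: with the annotations shipped as a separate RDF file, the extraction would simply disappear.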
15. Should we keep the listOf?
Question by Nicolas
Grouping SBML elements of the same class into list elements has been a defining feature of SBML since the beginning. It is very unusual to find such lists in XML formats. Their presence, and the associated presence of annotation and notes elements that are ordered and unique, contrary to other child elements, is what caused the drop of XML Schema from L3. This is a significant hindrance to adoption, since XML Schema remains far more widely used to work automatically with XML languages than any other technique. It has been argued that it was important to keep the listOf elements in case one wants to annotate a whole list. But this has never been done in 12 years of SBML. It seems to me that the only real use of the listOf elements is to structure the document for human browsing. While this was important in the past, I think it is becoming less important nowadays.
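To make the difference concrete, here is the same fragment with and without the wrapper list (a sketch of the alternative, not a proposal for a concrete syntax):

```xml
<!-- Current style: a class of elements wrapped in a listOf -->
<listOfSpecies>
  <species id="s1"/>
  <species id="s2"/>
</listOfSpecies>

<!-- Without the wrapper, the elements are direct children of the
     model, as in most XML vocabularies -->
<species id="s1"/>
<species id="s2"/>
```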
16. Should we consider moving toward using only amounts for species?
Question by Nicolas and Frank
[Put some text here]