Issues To Address
The following are unresolved issues and questions that arose from the first day of videoconferencing and post-videoconference discussions at UCHC during the SBML Composition meeting, 9 Sep 2007.
Before listing the issues, we articulate one starting requirement: whatever scheme is developed, it should allow a software tool (e.g., libSBML) to unambiguously flatten an L3 "composed" model into a valid SBML L2v3 model. This requirement conflicts with some of the comments below, in which there are questions of user intervention. We leave the comments below as they are, in order to describe the space of possibilities, but ultimately we believe that being able to flatten a model down to L2v3 is a desirable feature and should serve as a design goal. It would enable composed models to be interpreted immediately and transparently by software tools that use libSBML, which in turn is important to enabling rapid adoption of the composition facility.
AF: This approach also simplifies the definition of some of the semantics of the proposal. We simply state that a composed model (in L3) is valid if the flattened L2 model is valid. This is not to say that actual implementations will report errors in this way :).
SBML Inclusion: Separating pieces of SBML into separate files and having a mechanism to include the external pieces into a given file.
SBML Model Composition: composing a model from submodels. AF suggests the following subdivisions within composition:
- Designed Composition -- previously known as 'Aggregation'
- Ad-Hoc Composition or Documentation of Merge -- previously known as 'Composition'
SBML Modularity: separating the different SBML extensions by related concepts, such as spatial layout, model composition, spatial geometry, etc.
Scope (use cases) supported—Added by AF Monday morning
We need to decide what use cases we are going to support with this extension or the scope of the extension.
I would suggest three levels of scope:
- Level A - Designed Composition - Modellers construct models by composing submodels that have been designed to be reused. Modellers modify submodels to fit the contexts in which those models are reused. Composed models do not record the modifications made to submodels to enable their reuse; instead, the submodels are modified directly.
- Level B - Ad-Hoc Composition (in context of reuse) - Modellers construct models by composing submodels that have been designed to be reused. However, modellers do not modify the submodels but instead record the transformations that are required to build a composed model. These transformations are recorded with the composed model.
- Level C - Documented Composition - Modellers build larger models by merging smaller models in a semi-automatic fashion. Modellers record the transformations required to complete the composition, both as documentation and so that the transformations can be 'replayed' on revised versions of the starting models.
An aside: Why model composition might be different from Software Engineering - when I have time
At the outset, we assume there will be directional links, and that these links can go from higher levels of a model/submodel hierarchy to a lower level. The link permits overloading any properties of the destination component. Example: a species definition at the higher level may have an initial amount, and when linked, this initial amount overrides the value given in the linked-to species in a submodel. (This is a general principle. If some SBML component in a submodel needs to be given different property values, then one would define a new version of that component at the higher level, and then link from that component to the component being overloaded in an instance of a submodel. The meaning is that the definition in the enclosing model overrides the values of the linked-to component of the submodel. An implication is that the initial conditions of a variable such as a species are determined by the "from" element of the link, or if no link is defined, then the initial conditions are determined by the values in the submodel.)
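The override rule described above can be sketched in Python. This is only an illustration under simplifying assumptions: a component is modelled as a plain dictionary of attributes, a value of `None` means "not set at this level", and the function name `apply_link` is hypothetical rather than part of any proposal or of libSBML.

```python
def apply_link(enclosing, submodel_component):
    """Return the effective attributes of a linked-to submodel component.

    Attributes set on the enclosing-model component (the "from" end of
    the link) override those of the submodel component (the "to" end).
    Attributes the enclosing component leaves unset are inherited.
    The submodel component keeps its own id.
    """
    effective = dict(submodel_component)
    effective.update(
        {k: v for k, v in enclosing.items() if k != "id" and v is not None}
    )
    return effective


# Species "A" inside a submodel, with its own initial amount.
sub_A = {"id": "A", "compartment": "cell", "initialAmount": 1.0}
# Species defined in the enclosing model and linked down to sub_A.
top_A = {"id": "A_top", "compartment": None, "initialAmount": 5.0}

linked = apply_link(top_A, sub_A)   # link defined: enclosing value wins
unlinked = dict(sub_A)              # no link: the submodel's values stand
```

With the link in place the effective initial amount comes from the enclosing model; without it, the submodel's own value is used.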
We assume that anything potentially may be linked: species, compartment, parameter, etc., and yes, even rules. It remains to be decided which components will be allowed to be linked in the first iteration of this SBML Composition proposal.
Now for some questions:
- Should we allow deletion of elements? (This was raised by Martin during the videocon.) It appears that deletions may be necessary so that a composed model can deliberately remove irrelevant constructs (such as rules) inside model instances.
AF: My view is yes. But this depends on the use cases we support.
MG: To make AdHocComposition fully reversible, with fully included original models, we will need Deletion.
- Do we allow direct replacement, or do we implement replacements simply by using a deletion followed by a new definition? (Schaff prefers if we don't have a separate construct for replacements and instead simply use deletion + new definition.)
- If the answer to the previous question is yes, we allow removal, and links are used as the mechanism for causing removal (as it is done in the Finney proposal), then the UCHC Saturday afternoon discussions led to a preference for defining an explicit attribute on a link to indicate that deletion is intended, rather than implying a deletion merely by omitting the "from" attribute of the link. The reason is that this helps establish deliberate intent ("I really mean to delete such-and-such") rather than leaving the user of the model guessing whether some piece of the link was accidentally omitted.
JS: if we allow top-down links and deletion, then we don't need to support overloading of reactions. One can produce the same result by (1) deleting a reaction, (2) defining a new replacement reaction at a higher model level, and (3) linking the species down to the relevant species in the submodels.
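The preference for an explicit deletion attribute can be illustrated with a small Python sketch. The link representation (dictionaries with `to`, `from` and `delete` keys) and the function name `process_links` are assumptions made for this example only. The point is that a link with neither a source nor a deletion flag is reported as ambiguous instead of being silently treated as a deletion.

```python
def process_links(submodel_ids, links):
    """Split links into deletions and overrides, rejecting ambiguous ones.

    Each link is a dict with the key "to" (an id in the submodel) and the
    optional keys "from" (an id in the enclosing model) and "delete".
    """
    deleted, overrides, errors = set(), {}, []
    for link in links:
        target = link["to"]
        if target not in submodel_ids:
            errors.append(f"unknown target: {target}")
        elif link.get("delete", False):
            deleted.add(target)
        elif link.get("from") is None:
            # No source and no explicit deletion flag: intent is unclear.
            errors.append(f"link to {target} has no 'from' and no delete flag")
        else:
            overrides[target] = link["from"]
    return deleted, overrides, errors


links = [
    {"to": "R1", "delete": True},   # deliberate removal of a reaction
    {"from": "ATP", "to": "S1"},    # top-down override of a species
    {"to": "S2"},                   # ambiguous: was "from" forgotten?
]
deleted, overrides, errors = process_links({"R1", "S1", "S2"}, links)
```

Under JS's recipe, the replacement reaction would simply be defined in the enclosing model, with links like the second one connecting its species to those in the submodel.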
- Should we limit what can be linked and removed? Maybe we should allow linking only species, compartments, reactions, and parameters? And maybe we should allow deletion of reactions and rules only? These restrictions might reduce complexity of both the specification and the implementation. What about compartment types, species types, and other things? Which of SBML's elements should be included in the set of things allowed to be linked and removed?
AF: I have created a table here that is a first attempt at trying to answer these questions: Overloading Semantics For SBML Objects By Type. And after some discussion we arrived at the simpler form: Overloading Semantics
- If peer-to-peer links (i.e., within the same hierarchical level) are allowed, what happens if you have 3 species "A" in 3 separate non-identical submodels, and you put links from two of the A's to the third A?
In other words, imagine having A1->A2 and also A3->A2, where A1, A2 and A3 are in their separate submodels. Now suppose they have different initial amounts (or other properties). Which values win: A1's or A3's?
MG: Suggestion: as a consistency rule, this is forbidden at the same sibling level. If you do it at different levels, then the outer model wins, since it instantiated the submodel and then changed the linking. Effectively copying all species to the top level causes a lot of redundancy. Is this copying useful? It causes useless redundancies. It also obscures the fact that in a DesignedComposition model you want to link X.A->Y.A (because you do A->X.A, A->Y.A).
AF: I agree. In fact another way to think about this is that the overloading applies to the submodels as if they are flattened already so that any overloading at the lower level has occurred already. Martin can you explain your last question more clearly?
Two options so far have come to mind:
1. Allow such peer-to-peer linking, and specify that the tool must detect such situations, tell the user there is a conflict, and let the user decide which value should be used.
MG: this can be done in the composition tool, when producing the composed model. This should not be done in the reading tool, that wants to flatten/analyze the model.
SH: How do you store how the user resolved the conflict in the resulting file? A tool may still opt to allow peer-to-peer linking in its interface. However, since this might lead to conflicts (e.g. which initial value wins, or circular links), the user-chosen conflict resolution must be saved in the SBML file. Hierarchical linking is a clean way to do this.
2. Don't allow "peer-to-peer" links. The UCHC subgroup puts forward the following postulate: such conflicts can be avoided by disallowing peer-to-peer links and only allowing top-down hierarchical links. This means that the value would be determined by a species definition in an enclosing model, with links downward to override the values in the submodels. Question: are there any cases where peer-to-peer connections are actually needed?
MG: For the case of Designed Composition: in the easiest case you would not naturally copy the elements for linking to the upper level, but only if you really have to modify them (the case where X.A and Y.A both don't have the correct attributes for the composed model). In a correctly designed model, this seldom happens. And you can't really tell where the copy of A comes from (was it X.A, Y.A or Z.A?). But this is something you want to know in DesignedComposition.
AF: I have 2 problems with the original idea
a) this is too restrictive
An example: I have two submodels containing ATP. I should be able to overload one ATP with the other. Otherwise I'd have to create ATP at the level above and overload that onto the submodel ATP species (that's more complex).
b) I don't think it removes the problem at all
An example: I have ATP1 and ATP2 at the top level and by accident overload them both onto a lower-level ATP. This is convergent overloading, and it's not peer-to-peer.
SH: This is a user error in the enclosing model, and as such it is in the user's control to change and fix it. A tool should of course indicate such problems.
MG: Converging links on the same level are in the same way user error!
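Whichever linking policy is chosen, converging links of the kind discussed above are straightforward for a tool to detect. The following Python sketch assumes links are given as `(source, target)` pairs of dotted path strings; both that representation and the function name `find_convergent_links` are illustrative assumptions, not part of the proposal.

```python
from collections import defaultdict


def find_convergent_links(links):
    """Return the targets that are overloaded by more than one source.

    `links` is an iterable of (source, target) pairs. The result maps
    each conflicting target to the sorted list of its sources.
    """
    sources_by_target = defaultdict(set)
    for source, target in links:
        sources_by_target[target].add(source)
    return {
        target: sorted(sources)
        for target, sources in sources_by_target.items()
        if len(sources) > 1
    }


links = [
    ("X.A1", "Y.A2"),   # A1 -> A2
    ("Z.A3", "Y.A2"),   # A3 -> A2: converges on the same target
    ("top.B", "X.B"),   # an unrelated, unambiguous link
]
conflicts = find_convergent_links(links)
```

The same check applies to AF's example of two top-level species overloaded onto one lower-level ATP, since it depends only on the targets and not on the hierarchy level of the sources.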
- If peer-to-peer connections are allowed, and the (conflicting) result is permitted and deemed something that a user should resolve, then the result could easily be an overdetermined model. Overdetermined models are invalid SBML in SBML Level 2. Will we have to remove the prohibition on overdetermined models in L3?
AF: Yes, the user needs to resolve the conflict, but the conflict must be resolved before it gets to SBML L3. Such a conflicted model is useless to another tool. Such conflicts are similar to, say, having 2 assignment rules for the same species in L2.
SH: I disagree; the user must be able to save his work even if it is not complete. A model might be broken or not suitable for simulation, but it should still be a valid SBML document.
- What is the scope of SId identifier values? Does each submodel establish a separate namespace for SId?
- SpeciesReference/ModifierSpeciesReference elements inside a reaction point to different SId values within a submodel. Now suppose you link to (overload) a reaction in the submodel. How do you fix up the species references?
- Must interfaces (ports/terminals) be provided in the first version of the composition proposal? (Current feeling at UCHC is yes, because it's something people want to use, and it's important for effective model reuse.)
(comment MG): Definitely yes. Without ports you can't export models from/to tools that support the Designed Composition use case internally.
AF: Yes but they must be optional
- If ports/terminals are provided, do we allow them to be ignored, in effect allowing users of a model to violate the contract implied by the interface definition? Why or why not?
AF: There is no need for SBML to enforce the use of the interface, so why impose it? SBML can support a wider range of methodologies if interfaces are not formally enforced in the language. Ports are there if tools want to enforce interfaces.
- Do we really need to define in/out/inout properties on terminals/ports? If we decide to allow only hierarchical links as described above, it seems we don't need in/out. Can we leave them out in the first version of the proposal?
- Result of afternoon discussions at UCHC: they're not needed. We are dealing here with biological entities, and the in/out concept is not a biological one. SBML should provide the ability to describe the intended biological model.
AF: In general I don't think there is any reason to have in/out in most cases. However, what you may need is a boolean flag "FullyDetermined" (say) which indicates that the species is determined by a rule internally and therefore can't be determined by a rule or reaction elsewhere. After discussion, we thought that 'Mutable' is a better name for this flag.
- How do you detect that what you are reading at the time of inclusion is (or is not) what was originally used by the original model's author? One idea is to provide the ability to put an optional mark of some sort that a tool could use to guess whether the linked-to file has changed since the time the linking model was formulated. Here are some ideas for what that mark could be:
- Time stamp. (Cons: very weak, issues of time granularity)
- Include an MD5 hash in the reference.
- Digital signature. (like PGP)
NLN: That could be stored (optionally) in the SBML element. Each file should contain an SBML container. This container states the Level and the Version of what is inside.
AF: I need a lot more convincing that there is a real-world requirement for this. This also applies to single models, so it could be considered a completely separate L3 'Signature' extension.
- What is the granularity of the inclusion? Must each file be a whole model, or can each file contain specific "groups" of SBML elements, or can each file contain anything (an annotation, a single attribute, anything at all)? Some options:
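For concreteness, the MD5 option listed above might work roughly as follows. This is a minimal sketch using Python's standard `hashlib` module; the function names and the idea of storing the digest as a hexadecimal string alongside the reference are assumptions for illustration. A check like this only detects that the included bytes differ from what the author saw; unlike a digital signature, it says nothing about who made the change.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given file content."""
    return hashlib.md5(data).hexdigest()


def include_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the current content against the digest stored in the reference."""
    return md5_of_bytes(data) == recorded_digest.lower()


# Digest recorded by the author of the including model (hypothetical content).
original = b"<sbml><model id='y'/></sbml>"
recorded = md5_of_bytes(original)

# Later, a tool re-reads the included file and checks it.
same = include_unchanged(original, recorded)
changed = include_unchanged(b"<sbml><model id='z'/></sbml>", recorded)
```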
- Require that the end result (i.e., after all references have been processed) should have valid SBML syntax, but allow file fragments to contain content that wouldn't be valid stand-alone SBML
- Each fragment must be valid SBML (stronger requirement, but maybe it's impossible to require)
AF: My preference: inclusion unit is a valid sbml model and the result is a valid sbml model. L2 allows for sufficiently sparse models IMHO.
- If a mark/signature/stamp is supported, and the answer to the previous question is that a file can contain anything at all, then will the marks be defined only for specific kinds of included files? For instance, would you include a mark if the file contained an ? How about an attribute value?
AF: see comments above on signatures
- In a valid SBML model, all SId values must be unique across the model and all metaid (of type XML ID) values must be unique across the whole XML document. What happens when file inclusion is used? It seems clear that id's will have to be changed -- so what are the rules for producing unique id's in the merged model? Whose responsibility is it to ensure id uniqueness?
AF: my preference is that objects in different inclusion units are in different namespaces
AF: So what's the mechanism for this? Perhaps:
<sbml>
  <model id="x">
    <listOfModels>
      <model id="y">
        <sbmlInclude href="y.sbml" parse="xml" encoding="utf8"/>
      </model>
    </listOfModels>
    <listOfInstances>
      <instance model="y" id="InstanceOfY"/>
    </listOfInstances>
  </model>
</sbml>
Note that this precludes the use of XPointer to point into the instance. The following allows the use of XPointer:
<sbml>
  <model id="x">
    <listOfInstances>
      <instance model="y" id="InstanceOfY">
        <sbmlInclude href="y.sbml" parse="xml" encoding="utf8"/>
      </instance>
    </listOfInstances>
  </model>
</sbml>
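As a rough illustration of how a tool might discover what needs to be included, the second form above can be read with Python's standard `xml.etree.ElementTree` module. The element names `instance` and `sbmlInclude` come from AF's sketch and are not part of any released SBML schema; the function name `list_includes` is hypothetical, and XML namespaces are ignored because the sketch declares none.

```python
import xml.etree.ElementTree as ET

DOC = """<sbml>
  <model id="x">
    <listOfInstances>
      <instance model="y" id="InstanceOfY">
        <sbmlInclude href="y.sbml" parse="xml" encoding="utf8"/>
      </instance>
    </listOfInstances>
  </model>
</sbml>"""


def list_includes(xml_text):
    """Return (instance id, href) pairs for every included instance."""
    root = ET.fromstring(xml_text)
    pairs = []
    for instance in root.iter("instance"):
        # Only direct children count: the include belongs to this instance.
        for include in instance.findall("sbmlInclude"):
            pairs.append((instance.get("id"), include.get("href")))
    return pairs


includes = list_includes(DOC)
```

Because each include sits inside the instance that uses it, the instance id is available as the namespace for everything the included file defines.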
Relating common components
- When two species (or indeed any components) are related in different (sub)models, they are supposedly the "same" -- they are equated with each other. However, and rather obviously, they may have different id's in the different submodels. How does a system determine that in fact they are the same thing? If you want to allow automatic "flattening" of a model into L2 format, how do you do it?
AF: The flattening processor can simply concatenate the internal identifier with the instance identifier as it goes, so that identifiers become paths through the hierarchy.
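AF's suggestion can be sketched as a short recursive function. The model representation (nested dictionaries with `components` and `instances` keys) and the `__` separator are assumptions made for this example. An underscore-based separator is used because SId values may contain only letters, digits and underscores, so a character such as "." would not yield a valid L2 identifier. A real flattener would also have to guard against local ids that already contain the separator.

```python
def flatten_ids(model, prefix=""):
    """Return the flattened ids of all components in a model hierarchy.

    `model` is a dict with "components" (a list of local ids) and
    "instances" (a dict mapping instance id to a submodel dict).
    Each component id is prefixed with the path of instance ids
    leading to it.
    """
    ids = [prefix + local_id for local_id in model.get("components", [])]
    for instance_id, submodel in model.get("instances", {}).items():
        ids.extend(flatten_ids(submodel, prefix + instance_id + "__"))
    return ids


# Submodel "y" is instantiated twice inside model "x".
y = {"components": ["A", "B"], "instances": {}}
x = {"components": ["A"], "instances": {"Y1": y, "Y2": y}}

flat = flatten_ids(x)
```

Here the three species named "A" end up as `A`, `Y1__A` and `Y2__A`, so each instance effectively has its own namespace while the flattened model still has unique ids.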
- (Michael R.) What if the units of an included component are different from those of the current model? Do we have to define conversion rules? Are they default conversion rules? (If so, we are back to having default unit conversions in SBML, something that was all but deliberately banished in L2v3.) If they are not default conversion rules, then don't we have to provide a way for a model to give explicit conversion rules? If so, what form will these take?
SH: I suggest that an explicit unit conversion is stored in the link which connects objects with different units. A good tool however may introduce this automatically to assist the user.
NLN: The different units should be allowed in the core of L3. It is not just a problem of composition. If we provide some kind of modules in the core, with their own sets of units, tools will have to provide unit conversion mechanisms. I may be wrong but I think some unit conversion capabilities are/will be provided by libSBML.
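One possible reading of SH's suggestion is that a link carries an explicit conversion, and that no conversion is ever assumed by default. The Python sketch below uses a single multiplicative factor, which is an assumption of this example (a real scheme might need offsets or full unit definitions); the function name `link_value` and the unit strings are likewise illustrative.

```python
def link_value(value, from_units, to_units, factor=None):
    """Pass a value through a link, converting units only when told how.

    `factor` is the explicit multiplier stored on the link. If the units
    differ and no factor is given, the link is rejected instead of
    applying a default conversion.
    """
    if from_units == to_units:
        return value
    if factor is None:
        raise ValueError(
            f"link connects {from_units} to {to_units} without a conversion"
        )
    return value * factor


# Enclosing model uses mole, submodel uses millimole: factor stored on the link.
converted = link_value(0.5, "mole", "millimole", factor=1000.0)
# Same units on both ends: no factor needed.
passthrough = link_value(3.0, "mole", "mole")
```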
- Are we going to attempt compatibility with CellML? If yes, to what degree? What would it take?
- If we don't use in/out (see above), can we just ignore CellML's in/out qualifiers? Will that work?
- What if we work only towards the ability to import CellML modules, without the ability for CellML to import SBML? Would that require dealing with CellML's in/out? Would it make things easier?
NLN: We have to increase the compatibility with CellML. In the long run, this is vital for SBML. Any form of modularity will improve the compatibility. I am not sure in/out are crucial. They can be inferred from the reaction and rule graphs.