| Author | Topic |
Posts: 961
Registered: October 2003
|
|
Data put into models
|
30 Mar '07 23:04
|
 |
|
[I'm cross-posting this to sbml-discuss as a way of moving
this SBML question there, since it's not really a
libsbml-discuss topic but rather an sbml-discuss topic. --MH]
txr24> since R users tend to start with data rather than
txr24> models, it seems to me that models ought to contain
txr24> (within reactions) the original data used to
txr24> estimate the rate law parameters. This would allow
txr24> parameter confidence interval estimates and
txr24> comparisons of rate law structures using metrics
txr24> like Akaike's Information Criterion. Is SBML moving
txr24> that way?
People ask about putting data into SBML (usually the results
of simulation, but sometimes experimental data). However, I
really don't think data should be put into the SBML itself,
and I think I'm not alone in this opinion. First, a given
model may have been constructed from a lot of different data
sources -- would you put them all in? Second, there are too
many different possible data formats in use. Third, putting
data into the model would inflate the size of the models
(which some people complain are already too big, being XML).
And fourth, if you really insist, you can put the data into
<annotation> elements in a model anyway; nothing is
preventing that, so the capability is technically there
already.
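To make that fourth point concrete, here is a minimal sketch of tucking a few data points into an <annotation> element with Python's standard library. The `data` namespace URI, the `points` element and the sample values are made up for illustration; nothing here is defined by SBML itself, and a real tool would more likely go through libSBML.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2"
# Hypothetical namespace for the embedded data; not part of any standard.
DATA_NS = "http://example.org/embedded-data"

# A deliberately empty model, so appending the annotation cannot
# violate the child ordering that a fuller SBML model would require.
MODEL = f"""<sbml xmlns="{SBML_NS}" level="2" version="1">
  <model id="demo"/>
</sbml>"""


def embed_points(sbml_text, points):
    """Return SBML text with (time, value) pairs stored in an annotation."""
    ET.register_namespace("", SBML_NS)
    ET.register_namespace("data", DATA_NS)
    root = ET.fromstring(sbml_text)
    model = root.find(f"{{{SBML_NS}}}model")
    annotation = ET.SubElement(model, f"{{{SBML_NS}}}annotation")
    holder = ET.SubElement(annotation, f"{{{DATA_NS}}}points")
    # One "time,value" token per data point, separated by spaces.
    holder.text = " ".join(f"{t},{v}" for t, v in points)
    return ET.tostring(root, encoding="unicode")


def read_points(sbml_text):
    """Recover the (time, value) pairs written by embed_points."""
    root = ET.fromstring(sbml_text)
    holder = root.find(f".//{{{DATA_NS}}}points")
    return [tuple(float(x) for x in pair.split(","))
            for pair in holder.text.split()]
```

Even this toy version shows the cost: every consumer has to know the private namespace and the ad hoc text encoding before the data is of any use.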
A better approach would be to put data into separate files,
and link to them from the SBML using (say) RDF annotations.
A meta format could easily be developed (like jar files,
which are really just zip files) that carries both the SBML
and the collateral data files. The end result would be that
a single file is still exchanged (not multiple files as
might be implied by stating the data should be separate).
As a bonus, a zip format would already support compression,
something else that people have asked for in SBML.
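As a rough sketch of the single-file idea, the following uses Python's standard zipfile module to pack a model and its data files into one compressed archive. The layout (`model.xml` at the top level, data under `data/`) is an assumption made for this example, not an agreed convention; the model would refer to the data by relative path.

```python
import io
import zipfile


def bundle(sbml_text, data_files):
    """Pack one SBML document plus named data files into zip bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w",
                         compression=zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical layout: the model at the top, data in a subfolder.
        archive.writestr("model.xml", sbml_text)
        for name, payload in data_files.items():
            archive.writestr(f"data/{name}", payload)
    return buffer.getvalue()


def unbundle(blob):
    """Return a dict mapping archive member names to their raw bytes."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

The model and the data stay separate documents, yet only one file changes hands, and the compression comes for free.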
I don't mean to come down hard against the put-the-data-
into-SBML idea; I'm just trying to explain the reasoning
plainly.
MH
____________________________________________________________
To manage your sbml-discuss list subscription, visit
https://utils.its.caltech.edu/mailman/listinfo/sbml-discuss
For a web interface to the sbml-discuss mailing list, visit
http://sbml.org/forums/
For questions or feedback about the sbml-discuss list,
contact sbml-team@caltech.edu.
|
|
|
Posts: 33
Registered: March 2007
|
|
Re: Data put into models
|
31 Mar '07 00:20

|
 |
|
Another alternative might be to move forward on an Experiment Markup
Language that would capture both experimental protocols and the resulting
experimental data. This would allow SBML to retain its identity as a model
exchange medium while beginning a separate and distinct "EML" that would
capture everything important about an experiment and its resulting data
sets. Lots of modelers start with data, not just R users, and all of them
would benefit from a standard format for experimental design/protocol and
experimental data exchange. I believe Upi Bhalla started a thread along
these lines last year on sbml-discuss, but even if the SBML editors tell us
it's off topic, there may be quite a few SBML folks who see the need for an
EML and would be willing to invest time and energy to make it happen.
R-based tools and anyone else interested in parameter estimation (and AIC)
could then build interesting tools that take both SBML and EML as inputs.
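For readers who have not used AIC, here is a minimal sketch of the comparison such a tool might make once it has residuals from fitting two candidate rate laws to the same data. It assumes least-squares fits with independent Gaussian errors, in which case AIC reduces (up to an additive constant) to n*ln(RSS/n) + 2k. The function names are illustrative only, and conventions differ on whether the error variance counts towards k.

```python
import math


def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit, dropping the constant term."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)  # residual sum of squares
    return n * math.log(rss / n) + 2 * n_params


def prefer(fits):
    """Pick the candidate with the lowest AIC.

    `fits` maps a rate law name to (residuals, number of fitted
    parameters); all candidates must be fitted to the same data set.
    """
    scores = {name: aic_least_squares(res, k)
              for name, (res, k) in fits.items()}
    return min(scores, key=scores.get), scores
```

The point for this thread is that the residuals, and therefore the original measurements, are needed to compute this at all; the fitted parameter values alone are not enough.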
RDP
Robert D Phair, PhD
Chief Scientific Officer
Integrative Bioinformatics Inc
www.integrativebioinformatics.com
ProcessDB
Test. Validate. Extend.
|
|
|
Posts: 5
Registered: March 2007
|
|
Re: Data put into models
|
31 Mar '07 03:33

|
 |
|
I agree with what Michael says, but I feel that scientists really do need
the data to be available alongside the model in some way. I am sure that
embedding data in models would be a very big mistake (even if other
communities working with similar data do that!), but keeping data separate
from the model while still linking the two is, in my experience, useful and
successful.
To explain further:
The mass spectrometry community has provided modellers and data analysts
with a standard format (mzData) that contains all the metadata relating to
any kind of mass spectrometry experiment (MALDI, SELDI, LC, etc.). It is
not a model, just a collection of metadata specifying methods, hardware,
software, sample sources, user and experimental data, all in one file.
I think this approach is wrong, because an MS experimental run is about as
'repeatable' as one of our stochastic simulations. It therefore does not
make sense to tie the data (as is) to the metadata.
What we have to keep in mind is that we are building a tool for biology
(and not just for computer scientists), so we have to ask and investigate
what would actually be useful for biologists.
My opinion (based on my current work) is that linking data could be a
winning choice, but:
1) Not raw data, but 'behavioural data': pieces of simulation that carry
behaviours, not just a group of points!
2) Not as a group of points, but as compressed strings (see the encoding
methods used for e-mail). This would make it simpler for distributed
web/grid-based architectures to work with SBML.
3) Not data in files, but in XML repositories.
4) Not just simulation data, but also information about the sources of
that simulation, more richly annotated (and more structured than
annotations).
Three separate tiers, then, to make the information complete.
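To illustrate point 2, here is a small sketch of what "compressed strings" could look like, using the same base64 text encoding that e-mail attachments rely on. The choice of little-endian doubles, zlib and base64 is an assumption made for this example; it is not taken from mzData or from any SBML proposal.

```python
import base64
import struct
import zlib


def encode_series(values):
    """Pack floats as binary, compress them, and return ASCII text."""
    raw = struct.pack(f"<{len(values)}d", *values)  # little-endian doubles
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_series(text):
    """Invert encode_series and return the list of floats."""
    raw = zlib.decompress(base64.b64decode(text))
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))
```

The resulting string is plain ASCII, so it can sit in an XML element or travel through a web service without any further escaping.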
Tommaso
--
Tommaso Mazza, Eng.
PhD Student in Computer Science and Biomedical Engineering
Web: http://www.hostingjava.it/-mazza
|
|
|
Posts: 15
Registered: December 2004
|
|
Re: Data put into models
|
31 Mar '07 05:17

|
 |
|
thanks Robert, I like your EML idea because it places the original data
up at a higher level, where it belongs. I'm currently reanalyzing some
data and am finding that the original analysis was off by over an order
of magnitude on several dissociation constant estimates merely because
the experimentalists (who did their own analysis) did not know how to
solve two quadratics in two unknowns (they manually iterated using the
quadratic formula for a single unknown, and since their model was highly
over-parameterized, they stopped, likely due to exhaustion, well before
convergence). My main conclusion from this is that we should trust
experimentalists to give us their measurements, but rate law structure
selection and parameter estimation should be left to people who have
some experience in data analysis. EML, or perhaps something like ESBML,
would create the intermediate needed to emphasize that a hand-off is
needed at that point. The bottom line is that, since parameter estimates
can be highly conditional on the selected rate law structure, the
original data should get much more attention than any parameter
estimates derived from it.
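For what it's worth, two quadratics in two unknowns yield readily to Newton's method. The sketch below is a generic 2x2 Newton iteration; the example system (x^2 + y^2 = 5, x*y = 2) is a made-up stand-in, not the binding equations from the data set described above.

```python
def newton_2x2(f, jac, x, y, tol=1e-12, max_iter=50):
    """Solve f(x, y) = (0, 0) by Newton's method with a 2x2 Jacobian."""
    for _ in range(max_iter):
        f1, f2 = f(x, y)
        if abs(f1) < tol and abs(f2) < tol:
            return x, y
        (a, b), (c, d) = jac(x, y)
        det = a * d - b * c
        if det == 0.0:
            raise ZeroDivisionError("singular Jacobian")
        # Solve J * (dx, dy) = -(f1, f2) by Cramer's rule.
        dx = (-f1 * d + f2 * b) / det
        dy = (-f2 * a + f1 * c) / det
        x, y = x + dx, y + dy
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical example system: x^2 + y^2 = 5 and x*y = 2.
def example_system(x, y):
    return x * x + y * y - 5.0, x * y - 2.0


def example_jacobian(x, y):
    return (2.0 * x, 2.0 * y), (y, x)
```

Starting from (0.5, 2.5), the iteration settles on the root at (1, 2) in a handful of steps. Like any Newton scheme it still depends on a reasonable starting guess, and which of the several roots it finds depends on where it starts.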
EML should be given high priority, I would think.
tom
--
Tomas Radivoyevitch
Assistant Professor
Department of Epidemiology and Biostatistics
Case Western Reserve University
10900 Euclid Avenue
Cleveland, OH 44106
Email: txr24@case.edu
phone: 216-368-1965
website: http://epbi-radivot.cwru.edu/
|
|
|
Posts: 469
Registered: October 2003
|
|
Re: Data put into models
|
31 Mar '07 06:31

|
 |
|
There are various efforts in that direction. Most of them stemmed
from microarray and protein-protein interaction formats (MAGE-ML and
PSI-MI). See in particular
http://fuge.sourceforge.net/index.php
--
Nicolas LE NOVERE, Computational Neurobiology,
EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074
http://www.ebi.ac.uk/~lenov, AIM: nlenovere, MSN: nlenovere@hotmail.com
|
|
|
Posts: 15
Registered: October 2003
|
|
Re: Data put into models
|
31 Mar '07 09:29

|
 |
|
I agree something like the data exchange standard GEML for expression
data would be useful for SB modeling. This is a separate standardization
effort, though, and several attempts have been made; maybe it's time to
try again.
BTW, I'd suggest calling it EDML for Experimental Data Markup Language,
in deference to our colleagues at the European Media Lab.
Best--
--Eric
|
|
|
Posts: 961
Registered: October 2003
|
|
Re: Data put into models
|
31 Mar '07 10:56

|
 |
|
Yes, I agree too. What I wrote before was not incompatible
with this, but this is a more sensible approach overall.
MH
>>>>> On 31 Mar 2007, "Robert Phair" <rphair@integrativebioinformatics.com> wrote:
rphair> Another alternative might be to move forward on an
rphair> Experiment Markup Language that would capture both
rphair> experimental protocols and the resulting
rphair> experimental data. This would allow SBML to retain
rphair> its identity as a model exchange medium while
rphair> beginning a separate and distinct "EML" that would
rphair> capture everything important about an experiment
rphair> and its resulting data sets. Lots of modelers
rphair> start with data, not just R users, and all of them
rphair> would benefit from a standard format for
rphair> experimental design/protocol and experimental data
rphair> exchange. I believe Upi Bhalla started a thread
rphair> along these lines last year on sbml-discuss, but
rphair> even if the SBML editors tell us it's off topic,
rphair> there may be quite a few SBML folks who see the
rphair> need for an EML and would be willing to invest
rphair> time and energy to make it happen.
rphair>
rphair> R-based tools and anyone else interested in
rphair> parameter estimation (and AIC) could then build
rphair> interesting tools that take both SBML and EML as
rphair> inputs.
|
|
|
Posts: 30
Registered: October 2003
|
|
Re: Data put into models
|
31 Mar '07 22:39

|
 |
|
We do have the beginnings of an XML-based specification _for chemical
kinetic simulations_ that does just what Robert says: specifies
experimental protocols and resulting data. The goal is to enable
simulators to pick an experiment, pick a model, and run it to see how well
the model fits.
I should stress that this project is limited at present to kinetic
simulations and that there are all sorts of other interesting kinds of
data that we haven't even thought about. EML or EDML (Experimental Data
Markup Language) would possibly be too broad a name, perhaps KDML for
Kinetic Data Markup Language may be more appropriate.
The specification is separate from any given model or SBML, as we want to
be able to compare several models against the same set of experiments.
This is a good time to circulate ideas and wish-list. It would be nice to
get input from the SBML community. Mike, if you feel this needs its
separate group let us know and we can discuss mechanisms off-line.
-- Upi Bhalla
NCBS Bangalore
|
|
|
Posts: 1
Registered: April 2007
|
|
Re: Data put into models
|
01 Apr '07 00:08

|
 |
|
We have a similar vision in building EUCLIS (an information system for
Circadian Systems Biology, http://www.bioinfo.mpg.de/euclis ) for the
submodules "Clock Experiments" and "Clock Models". We have just begun to
work on formats, metadata and SOPs (standard operating procedures) for such
experiments and would be very interested in collaborating in a community
effort.
Eduardo Mendoza
LMU Munich & University of the Philippines Diliman
P.S. You will have to register at EUCLIS to get "guest" access.
|
|
|
Posts: 123
Registered: September 2003
|
|
Re: Data put into models
|
01 Apr '07 09:29

|
 |
|
I'm glad to see people converging on a workable solution. Let me try
to summarize:
1- SBML should not contain data
2- data should be formatted in some data ML format(s)
2a- there is the suggestion of an EDML, to be created
2b- Upi has the beginnings of a Kinetic Data ML
2c- there are several omic data ML (MAGE-ML, mzData, etc.)
3- there should be a mechanism to group a certain SBML file with
several data ML files (probably some higher-level XML type that
contains these as entries, or a zip file, etc.)
I would add the point that once some data ML is adopted, we could also
format the results of simulations in that same format.
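The grouping mechanism in point 3 could be as simple as a zip archive with a small manifest, much like a jar file. The following is only a rough sketch under stated assumptions: the `manifest.json` name, its fields, and the helper functions are invented for this illustration and are not part of any SBML or data ML specification.

```python
import io
import json
import zipfile


def bundle_model_with_data(sbml_text, data_files):
    """Pack one SBML document and its data files into a zip archive (bytes).

    data_files maps an archive path to file content (str).
    The manifest layout is hypothetical, for illustration only.
    """
    manifest = {"model": "model.xml", "data": sorted(data_files)}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        archive.writestr("model.xml", sbml_text)
        for path, content in data_files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


def read_manifest(bundle_bytes):
    """Return the manifest dictionary stored in a bundle."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        return json.loads(archive.read("manifest.json"))
```

A single file is still exchanged, and the zip container provides compression without any change to the SBML itself.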
Personally I think point 2 above should be technology neutral. I
am more interested in generic data MLs that are the equivalent of
tab-delimited files where the column and row "titles" can be
specified. This will allow software to be independent of the horrible
intricacies of MAGE-ML, mzData, etc. All the modeling software
needs is a correspondence between columns and rows and elements of the
model (e.g. some variable, time, etc.).
In COPASI we have a similar structure already. Our COPASI files (which
are XML not so different from SBML...) have links to data files and a
translation table that specifies what each column of the data file
represents in the model. This is in the parameter estimation part of
COPASI. (So for our software we would just need to change the readers
from reading tab-delimited to reading XML...)
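To make the translation-table idea concrete, here is a minimal sketch of mapping the columns of a tab-delimited file onto model elements. It is not COPASI's actual file format or API; the column names, the model identifiers, and the `read_mapped_table` helper are all assumptions chosen for the example.

```python
import csv
import io

# Hypothetical translation table: data-file column -> (role, model element id).
COLUMN_MAP = {
    "t": ("time", None),
    "glc": ("dependent", "species_glucose"),
    "atp": ("dependent", "species_ATP"),
}


def read_mapped_table(text, column_map):
    """Read tab-delimited text and re-key each row by model element.

    Returns a list of (time, {model_id: value}) tuples; columns that are
    not listed in the translation table are ignored.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        time = None
        values = {}
        for column, cell in record.items():
            role, model_id = column_map.get(column, (None, None))
            if role == "time":
                time = float(cell)
            elif role == "dependent":
                values[model_id] = float(cell)
        rows.append((time, values))
    return rows
```

Only the reader would need to change if the data arrived as XML instead of tab-delimited text; the translation table itself stays the same.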
I believe there is already a generic XML data format that is
equivalent to the CDF specification (a binary format invented by
astronomers that is flexible and contains metadata at the top; also
used by mass spec vendors in the form of NetCDF ANDI protocol). If no
one else knows about it I will try to find out and post here.
Pedro Mendes
|
|
|
Posts: 287
Registered: September 2003
|
|
Re: Data put into models
|
01 Apr '07 15:49

|
 |
|
Hi Pedro,
NetCDF files have three equivalent representations, binary (CDF),
plain-text (CDL), and XML (NcML), which you can move among with the
ncdump and ncgen tools. For more information, see:
http://www.unidata.ucar.edu/software/netcdf/ncml/
Incidentally, the ANDI NetCDF format is still alive and well. I made
use of it recently for a project at JPL.
Ben
On Apr 1, 2007, at 9:29 AM, Pedro Mendes wrote:
> I believe there is already a generic XML data format that is
> equivalent to the CDF specification (a binary format invented by
> astronomers that is flexible and contains metadata at the top; also
> used by mass spec vendors in the form of NetCDF ANDI protocol). If no
> one else knows about it I will try to find out and post here.
|
|
|
Posts: 170
Registered: December 2006
|
|
Re: Data put into models
|
02 Apr '07 06:03

|
 |
|
Hello All,
It is sometimes good to not read email at the weekend. The benefit is
that you can step in when the dust has settled :)
I agree with Pedro's summary, which suggests that the data be kept
independent from the model, preferably in an already existing format.
NetCDF would be one of my preferred candidates, as it is agnostic to
the content. The connection between data and model should be made in a
third, XML-based format which allows for connecting data points with
model entities.
Unfortunately, this third XML format would depend on the chosen data and
model formats. We as the SBML community may want to restrict the model
format to SBML. Note that this is not really a restriction: as long as the
SBML model can be translated to another language such as CellML or
BioPAX, the mapping is translatable too. Still, it would require that we
have a set of SBML-to-data-format mappings. I suggest that all these
mappings be derived from one common format which abstracts the data
format to what we need in SBML, i.e., as Pedro observed, tables
containing independent (time, initial values, parameters, etc.) and
dependent values. In addition, we would probably need information about
the type of experiment that generated the data, such as steady state or
time course. This type of architecture would be easily extensible to new
data formats and even new experiment types.
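As a rough illustration of such a common abstraction, the sketch below holds independent and dependent columns together with an experiment type. The class name, field names, and the two experiment type labels are assumptions made for this example, not an agreed format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical labels for the experiment types mentioned above.
EXPERIMENT_TYPES = ("steady_state", "time_course")


@dataclass
class ExperimentTable:
    """Format-neutral view of one data set (all names are hypothetical)."""
    experiment_type: str
    independent: Dict[str, List[float]] = field(default_factory=dict)
    dependent: Dict[str, List[float]] = field(default_factory=dict)

    def validate(self):
        """Check the experiment type and that all columns have equal length."""
        if self.experiment_type not in EXPERIMENT_TYPES:
            raise ValueError(f"unknown experiment type: {self.experiment_type}")
        lengths = {len(col) for col in self.independent.values()}
        lengths |= {len(col) for col in self.dependent.values()}
        if len(lengths) > 1:
            raise ValueError("columns differ in length")
        return True
```

A reader for each data format (NetCDF, tab-delimited, and so on) would then only need to fill this one structure, and new experiment types would be added to the list of labels.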
Thanks,
Stefan
--
Stefan Hoops, Ph.D.
Senior Project Associate
Virginia Bioinformatics Institute - 0477
Virginia Tech
Bioinformatics Facility I
Blacksburg, Va 24061, USA
Phone: (540) 231-1799
Fax: (540) 231-2606
Email: shoops@vbi.vt.edu
|
|
|
Posts: 33
Registered: March 2007
|
|
Re: Data put into models
|
01 Apr '07 10:23

|
 |
|
There is an important point worth re-stating. Both Upi and I are suggesting
that the EML/EDML/KDML should capture TWO things:
1) the experimental protocol (probably as a time line of experimental
perturbations)
2) the resulting experimental data (as time / measurement-value pairs), with
time synchronized to the protocol time line.
There may, of course, be many perturbations and many different measurements
in the same experiment. The key point is that a data file without the
corresponding experimental protocol is not very helpful.
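One way to picture the two parts sharing a single clock is sketched below. The class and field names are hypothetical and chosen only for this example; they do not come from any existing EML/EDML/KDML draft.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Perturbation:
    time: float   # when the perturbation is applied
    target: str   # what is perturbed (hypothetical label)
    action: str   # free-text description, e.g. "add 10 nM"


@dataclass
class Experiment:
    protocol: List[Perturbation]
    measurements: List[Tuple[float, float]]  # (time, value) pairs on the same clock

    def conditions_at(self, time):
        """Return the perturbations already applied at the given time."""
        return [p for p in self.protocol if p.time <= time]

    def annotated(self):
        """Pair each measurement with the number of perturbations applied so far."""
        return [(t, v, len(self.conditions_at(t))) for t, v in self.measurements]
```

Because the measurements and the protocol use the same time axis, every data point can be interpreted in the light of what had been done to the system by then.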
Mones Berman and his colleagues learned this the hard way in the 1970s
during development of the SAAM (Simulation Analysis And Modeling) program,
and I think there is general agreement on this point.
So whatever the ML acronym I think it's important that it convey more than
just data. That's why I suggested EML.
RDP
Robert D Phair, PhD
Chief Scientific Officer
Integrative Bioinformatics Inc
301.437.0601
www.integrativebioinformatics.com
ProcessDB
Test. Validate. Extend.
|
|
|
Posts: 46
Registered: September 2003
|
|
Re: Data put into models
|
01 Apr '07 11:17

|
 |
|
Hello folks,
Someone on this list posted a link to FuGE, so I looked it up, and it
seems to me that your design requirements for an experimental markup
language are captured in large part by the Functional Genomics Experiment
(FuGE) project.
In particular:
A) Experimental Protocols:
"The Protocol package allows experimental procedures to be specified as
instances of Protocol (to capture standard protocols) and
ProtocolApplication (to capture specific instances of a protocol). The
package defines three general types of ProtocolApplication.
1. Protocols that take a physical substance as input and produce a new
type of substance as output (MaterialTreatment, discussed in
Section 2.2.1).
2. Protocols that take a physical substance as input and produce data,
either destroying the input substance or leaving the material to
undergo further assays (DataAcquisition, Section 2.1.6).
3. Protocols that take data as input and transform it to produce new
types of data, such as converting binary data (images) to discrete data
(mRNA or protein abundance values) (DataTransformation, Section 2.1.6)."
B) Resulting Experimental Data:
"Essentially a Data object represents a container for a set of
multidimensional data matrices, and the coordinate set found in each of
the dimensions. This type of flexible and self-describing data
structure has been used successfully by other compute and data
intensive domains [3, 7]."
Furthermore, other data standards, such as PSI-MI, have managed to
integrate their formats with FuGE, so a precedent for integrating FuGE
with SBML has already been established.
Check out (if you haven't already) http://fuge.sourceforge.net
Sincerely,
Jeremy
|
|
|
Posts: 961
Registered: October 2003
|
|
Re: Data put into models
|
02 Apr '07 00:39

|
 |
|
bhalla> This is a good time to circulate ideas and
bhalla> wish-list. It would be nice to get input from the
bhalla> SBML community. Mike, if you feel this needs its
bhalla> separate group let us know and we can discuss
bhalla> mechanisms off-line.
So far it's still reasonably on-topic for the SBML list,
though at some point if things go well and the details start
to get really fleshed out, it's going to be too off-topic.
A separate list at that point might be quite appropriate if
other people agree discussions will continue indefinitely.
Past efforts have eventually petered out, though. I think
what needs to happen is that someone level-headed and
unbiased (preferably someone with hands-on experience in the
matter) really has to champion the effort, start writing a
proposal to be distributed and discussed, etc. (No, I'm not
volunteering :-) and I'd be the wrong person anyway.)
MH
|
|
|
Posts: 469
Registered: October 2003
|
|
Re: Data put into models
|
02 Apr '07 00:51

|
 |
|
On Mon, 2 Apr 2007, Michael Hucka wrote:
> bhalla> This is a good time to circulate ideas and
> bhalla> wish-list. It would be nice to get input from the
> bhalla> SBML community. Mike, if you feel this needs its
> bhalla> separate group let us know and we can discuss
> bhalla> mechanisms off-line.
>
> So far it's still reasonably on-topic for the SBML list,
> though at some point if things go well and the details start
> to get really fleshed out, it's going to be too off-topic.
> A separate list at that point might be quite appropriate if
> other people agree discussions will continue indefinitely.
I think there are two completely different topics intermingled in the
current thread.
1) Difference between a model and a specific instance of a model;
ability to encode ranges of parameters; description of parameter
optimisation procedures, etc.
This is definitely related to SBML. We have talked about the problem
several times in the past, and it is probably going to be a hot debate
in the future (and that depends on other things, like the ability to use
matrices (arrays)). For instance, BioModels DB cannot continue much
further with the current situation: a) Many models require a parameter
scan to be curated, and the curation picture does not correspond to
the SBML distributed. b) When "identical" models with just a different
value in a parameter produce qualitatively different results, we
create different entries in the database.
2) How to encode data-sets used to derive a model; How to encode the
result of a simulation.
This is not an SBML problem. The first part belongs to the data
generators. There are many efforts out there, and I urge you not to
reinvent the wheel or, if you do, to base your effort on FuGE and Co. The
second part belongs to the simulators. It is much needed, and would be
invaluable for benchmarking simulators, curating models, etc. The outputs
of simulators being widely different, the format has to be carefully
crafted. The size will be a significant problem. We are running
mesoscopic simulations of a simple synapse at the moment. The output
is the position of each entity, its internal structure and its
orientation, at each timestep. That allows us to generate very nice
movies and traces. But each 1 s simulation generates a file of 300 GB...
--
Nicolas LE NOVERE, Computational Neurobiology,
EMBL-EBI, Wellcome-Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Tel: +44(0)1223494521, Fax: +44(0)1223494468, Mob: +44(0)7833147074
http://www.ebi.ac.uk/~lenov, AIM: nlenovere, MSN: nlenovere@hotmail.com
|
|
|
Posts: 60
Registered: September 2003
|
|
Re: Data put into models
|
02 Apr '07 12:39

|
 |
|
On Apr 2, 2007, at 12:51 AM, Nicolas Le Novere wrote:
>
> 2) How to encode data-sets used to derive a model; How to encode the
> result of a simulation.
I see these as two separate issues -
2a) Data sets to derive a model
- this depends widely on the technology used in the experiment, e.g.,
microarray vs. confocal imaging will have quite different data
- some of these formats are standardized already, e.g., FuGE, and are
way beyond the scope of SBML
2b) Encode results of simulation
- this is quite relevant to SBML for the reasons Nicolas and others
have already given, and it would be very convenient to have a
standardized format asap
- to a limited extent this problem is already approachable if we
limit ourselves to SBML L2-compatible models, but it will be far more
complex when L3 enhancements are included, e.g., spatial models, or if we
try to include things like Nicolas' simulations (which nevertheless
must be dealt with eventually). An L1 version of this might be
something as simple as an XML-encoded CSV table.
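For illustration, an XML-encoded CSV table could look roughly like what the sketch below produces. The element and attribute names (`table`, `columns`, `column`, `row`, `value`) are invented for this example and do not belong to any existing standard.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text):
    """Encode a headed CSV table as XML (element names are hypothetical)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    table = ET.Element("table")
    columns = ET.SubElement(table, "columns")
    for name in header:
        ET.SubElement(columns, "column", {"id": name})
    for record in reader:
        row = ET.SubElement(table, "row")
        for name, cell in zip(header, record):
            # Each value points back to its column by id.
            value = ET.SubElement(row, "value", {"column": name})
            value.text = cell
    return ET.tostring(table, encoding="unicode")
```

The column ids are the natural place to refer to model elements (for example species identifiers), which is what would tie simulation output back to the SBML model.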
|
|
|
Posts: 30
Registered: October 2003
|
|
Re: Data put into models
|
03 Apr '07 23:07

|
 |
|
To build on the interest of this group in the EDML/KDML idea,
we thought we would suggest some concrete specifications and
bring up the issues we have encountered in devising KDML.
At some point soon we should do a hands-up of people on the list who
are interested in bashing out details and coming up with a 0.1 version.
I would like to start out quickly with something really simple but
functional, and then by fooling around with it we will
have a better feel for how it should evolve. Sorry about the length
of this note!
I'd first like to acknowledge Niraj, Subhasis, Poorvi, Pragati and Harsha
in my lab who have contributed to these ideas.
Our initial goals (a subset of those expressed on this list) include:
- Define quantitative experiments performed on a biochemical system.
- Use this data to 'drive' simulators to run specific models to
replicate the experiment
- Compare simulated results with 'real' values.
- Do this in a model- and simulator-independent way.
Our current design (we have some preliminary XML specs for those
who like such things) includes the following:
1. Experiment types:
- Steady state measurements
(e.g., % in phosphorylated state in control or experimental cases)
- Time-series measurements
(e.g., add something at time T and watch time-evolution)
- Repeated protocols: repeat a certain sequence of procedures
(e.g. a dose response curve is a series of steady-state measurements)
2. Model entities:
This is the tricky bit. We want to be able to work on any model,
so we cannot use model-specific identifiers. But the alternative
assumes a certain amount of markup to help us track down the
molecules.
- Molecules: Use Gene Ontology Id.
Parameters are concentration and buffering status
- Reactions: Use SBO reaction category, and reactants
Parameters are rate constants, which in turn depend on
SBO categories.
3. Data:
This is essentially headed tables (cf Pedro's comment). The table
headings specify which entity and parameter is being measured.
The goal here is to provide the experimental values, and specify
how to extract the equivalent values from the simulation.
- Entity values, confidence intervals
- Time series: value, time.
- Dose-response data: value, value.
- Functions of the above (e.g., % phosphorylation w.r.t control)
A bit of MathML may be involved.
Note that this may get recursive: A % value may depend on
a prior experiment to get a control value. How do we
handle this?
The above are what we would like to do to get started. At this point the
main challenge is point 2: tying the model entities in the KDML to specific
entities in, say, an SBML model. If we hack this in (and we have!) we
have very preliminary implementations that can generate simulator control
scripts for GENESIS/MOOSE.
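A minimal sketch of the "headed table" idea in point 3, written in Python. It is not the KDML prototype: the class names, the `ENTITY_X` identifier and the comparison metric are all assumptions made for illustration. It shows a column heading naming the entity and parameter, a time series under it, a derived "% of control" function, and a comparison against simulated values.

```python
from dataclasses import dataclass


@dataclass
class Heading:
    """Says which entity and which parameter a column measures."""
    entity: str      # placeholder for a model-independent identifier
    parameter: str   # e.g. "concentration"


@dataclass
class TimeSeries:
    heading: Heading
    times: list      # time points
    values: list     # one value per time point


def percent_of_control(series, control_value):
    """Derived quantity: each value as a percentage of a control value."""
    return [100.0 * v / control_value for v in series.values]


def max_abs_difference(experiment, simulation):
    """Compare experiment and simulation sampled at the same time points."""
    if experiment.times != simulation.times:
        raise ValueError("time points must match")
    return max(abs(e - s) for e, s in zip(experiment.values, simulation.values))


# Illustrative numbers only, not real measurements.
heading = Heading(entity="ENTITY_X", parameter="concentration")
measured = TimeSeries(heading, [0, 10, 20], [0.0, 0.5, 1.0])
simulated = TimeSeries(heading, [0, 10, 20], [0.0, 0.4, 1.1])
```

Note that the control value is passed in explicitly here, which sidesteps the recursion question raised above rather than answering it.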
4. Conditions and meta-data
(Note the FuGE goals below).
Experiment details: system, experimenter, citation
Experiment conditions: Temperature, pH...
Sample sources: Human, primate, bovine, mouse
Reagent sources
Simulation details: Simulator, timestep, numerical method, tolerance...
5. Related and background work:
FuGE: (Functional Genomics Experiment) provides a framework
for standards development efforts in the life sciences. They
specify formats for reporting of proteomics, mass spec, gels
and so on. Their markup language, FuGE-ML, provides a framework
for capturing complete laboratory work-flows, enabling the
integration of pre-existing data formats.
Comment: They seem to expect that specific domains will provide
their own MLs within this framework. This is a role we could fill.
EUCLIS: This is a joint venture for creation of an information
system for circadian systems biology that will connect models and
their internal documentation to simulation results, experiment
annotation and experimental results.
Comment: I think there is lots of overlap here, and Eduardo has already
expressed interest in working on a community effort.
NetCDF: This is a standard portable data format good for high volume
data. As Pedro and Stefan mentioned, it has converters including to XML
formats.
Comment 1: I think it is overkill at this stage. Our experience is
that marking up experimental biochem data (time-series or dose-responses)
involves at most 10-20 data points. Our experience with NetCDF (we use
it for neuronal recording data) is that it gets tedious to do all the setup
functions. Does someone have a specific example where we really do need
to store masses of biochemical data?
Comment 2: Several people have said that they would like to separate
data and experiment as well as model specification. This is an
important discussion, but seems to span SBML model definitions as well
as the nascent KDML spec.
Various simulator-specific formats: These would be very
useful to establish a commonly used base set of requirements.
Cheers,
Upi
--
Upinder S. Bhalla National Centre for Biological Sciences,
bhalla@ncbs.res.in Tata Institute of Fundamental Research,
+91-80-23666130 Bellary Road,
Fax: +91-80-23636662 Bangalore 560065, INDIA
Web: http://www.ncbs.res.in/~bhalla/index.html
____________________________________________________________
To manage your sbml-discuss list subscription, visit
https://utils.its.caltech.edu/mailman/listinfo/sbml-discuss
For a web interface to the sbml-discuss mailing list, visit
http://sbml.org/forums/
For questions or feedback about the sbml-discuss list,
contact sbml-team@caltech.edu.
|
|
|
Posts: 30
Registered: October 2003
|
|
Re: Data put into models
|
09 Apr '07 23:30
|
 |
|
Since there has been deafening silence on this thread I'd like to ask
specific people if they would like to help frame some of the following
goals for an EDML/KDML:
- Suggest specific kinds of experiment/data that we may have missed out
(Pedro? Eduardo?)
- Have a look at our XML prototype (Nicolas? Mike?)
- Help set up a way of generically identifying entities in a
model-independent manner (Nicolas? Mike?)
- Work on a driver from the KDML to their specific simulator (Herbert?
Ion? Pedro?)
- Anything else!
We should have a working prototype in a very few weeks, so I do hope we
can get community input soon.
Thanks,
Upi
--
Upinder S. Bhalla National Centre for Biological Sciences,
bhalla@ncbs.res.in Tata Institute of Fundamental Research,
+91-80-23666130 Bellary Road,
Fax: +91-80-23636662 Bangalore 560065, INDIA
Web: http://www.ncbs.res.in/~bhalla/index.html
On Wed, April 4, 2007 11:37 am, Upinder S. Bhalla wrote:
> To build on the interest of this group in the EDML/KDML idea,
> we thought we would suggest some concrete specifications and
> bring up the issues we have encountered in devising KDML.
|
|
|
Posts: 16
Registered: September 2003
|
|
Re: Data put into models
|
31 Mar '07 12:32
|
 |
|
For many of the same reasons (and others given by others in this thread),
parameter values and initial conditions should be factored out of the SBML
model and placed in their own file. Particularly when you get into
parameter estimation, the concept of parameter values stored in the SBML
file gives nothing but headaches. The output of a parameter
estimation run is a new parameter set. Where does it go? Do we want to
create a new copy of the SBML file that copies the non-parameter data?
Probably not. We want it in its own place. And not as a second set of
parameters in the original SBML file, either!
Fundamentally, there can be multiple parameter sets associated with a given
SBML model, that a user can choose between. Conceivably, there might be
some parameter values that get shared across a collection of models, as
well. So there is no one-to-one mapping.
All of these various pieces ("model", parameter values, experimental
data, simulation outputs, objective functions that relate experimental
data to simulation output, to name a few) need their own separate
existence. Of course, this then begs for some mechanism for collecting all
of the pieces together. A good analogy is how programmers collect
all the various files (and not just source code files) that make up a
program into a "project".
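As a rough sketch of what "parameter sets in their own place" could look like, assuming (purely for illustration) that each set is a small JSON document keyed by the model's parameter ids. The ids `k1`, `k2` and `S0` and the function name are hypothetical, not part of SBML or any tool.

```python
import json

# Hypothetical model: it declares which parameters exist, but holds no values.
MODEL_PARAMETER_IDS = ["k1", "k2", "S0"]


def load_parameter_set(text, required_ids):
    """Parse one parameter set and check it covers every model parameter."""
    values = json.loads(text)
    missing = [p for p in required_ids if p not in values]
    if missing:
        raise KeyError("parameter set is missing: " + ", ".join(missing))
    return {p: float(values[p]) for p in required_ids}


# Two sets for the same model: the starting guess and the result of a fit.
# Neither overwrites the other, and the model file is never touched.
initial_guess = load_parameter_set(
    '{"k1": 1.0, "k2": 0.5, "S0": 10.0}', MODEL_PARAMETER_IDS)
fitted = load_parameter_set(
    '{"k1": 1.3, "k2": 0.42, "S0": 10.0}', MODEL_PARAMETER_IDS)
```

A new estimation run then just produces another small file next to the others.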
-- Cliff
--
Cliff Shaffer, Associate Professor Phone: (540) 231-4354
Department of Computer Science Email: shaffer@cs.vt.edu
Virginia Tech, Blacksburg, VA 24061-0106 WWW: www.cs.vt.edu/~shaffer
-------------------------------------------------------------------------
|
|
|
Posts: 33
Registered: March 2007
|
|
Re: Data put into models
|
01 Apr '07 11:48
|
 |
|
I agree with Cliff. We have the headaches he describes every time we do a
parameter optimization and want to save the optimized values back to SBML
without losing the starting values.
A more radical possibility might be to factor the Units out of the SBML file
along with the parameters. A model should probably be correct independent of
units, and parameter files with parameter-associated units would even permit
mixed units where this is convenient and correct.
I would argue for the relationship between models and parameter files to be
one-to-many. I think Cliff's idea of sharing parameter files among multiple
models will create more problems than it solves, but I'd be willing to be
convinced otherwise.
It might be helpful to reserve the phrase "objective function" for the
weighted sum of squares or whatever function is optimized during the fitting
process. Functions that relate model variables to measurements are widely
referred to as "Associations" although I admit I don't know what SBO has to
say about this. Associations are akin to the mappings that Mike Hucka
referred to in one of his posts in this thread. They map some function of
the model variables to the column headings that Pedro Mendes described in
his nice summary as identifying a particular set of experimental data. This
mapping is an art and needs to have a prominent place in any software tool
whose goal is parameter optimization or even direct comparison of model
simulations with experimental data.
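To make the distinction concrete, here is a small Python sketch under stated assumptions: the association (measured signal = free X plus bound XY) and all variable names are invented for illustration. The association maps model variables onto one data column; the objective function is the weighted sum of squares over that column.

```python
def association_total_x(state):
    """Hypothetical association: the assay sees free X plus X bound in XY."""
    return state["X"] + state["XY"]


def weighted_sum_of_squares(measured, predicted, weights):
    """Objective function: sum over points of w * (measured - predicted)^2."""
    return sum(w * (m - p) ** 2
               for m, p, w in zip(measured, predicted, weights))


# Illustrative numbers only: simulated states at two time points.
states = [{"X": 1.0, "XY": 0.5}, {"X": 2.0, "XY": 1.0}]
predicted = [association_total_x(s) for s in states]   # what the assay would see
measured = [1.0, 3.0]                                  # the data column
weights = [2.0, 1.0]
objective = weighted_sum_of_squares(measured, predicted, weights)
```

Keeping the two functions separate means the same objective can be reused with a different association when the measurement changes.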
As for a mechanism to assemble the files, it might be logical for each
software tool to import and assemble all the files it needs. I think, for
example, that the Virtual Cell team has a construct called an Application
that is close to Cliff's analogy to a software project.
RDP
Robert D Phair, PhD
Chief Scientific Officer
Integrative Bioinformatics Inc
www.integrativebioinformatics.com
ProcessDB
Test. Validate. Extend.
|
|
|
Posts: 16
Registered: September 2003
|
|
Re: Data put into models
|
02 Apr '07 08:32
|
 |
|
On Sun, 1 Apr 2007, Robert Phair wrote:
> It might be helpful to reserve the phrase "objective function" for the
> weighted sum of squares or whatever function is optimized during the fitting
> process. Functions that relate model variables to measurements are widely
> referred to as "Associations" although I admit I don't know what SBO has to
> say about this. Associations are akin to the mappings that Mike Hucka
> referred to in one of his posts in this thread. They map some function of
> the model variables to the column headings that Pedro Mendes described in
> his nice summary as identifying a particular set of experimental data. This
> mapping is an art and needs to have a prominent place in any software tool
> whose goal is parameter optimization or even direct comparison of model
> simulations with experimental data.
Yes, I was too rushed posting that message, and I goofed in referring to
"objective functions" in the way that I did. What you are describing is
what we call a "transform". Unfortunately, it is not clear that a
transform can be anything other than an executable for some computer
program.
We do need to store separate objective functions. One example of
where we do this is the comparison of the (transformed!) simulation
data with the experimental data. Our objective function defines the
"scoring" for the strength of the match between the experimental data and
the simulation output, and also the relative weighting of the various
experiments being matched (there is usually a battery of experiments). This
can be extremely model specific. As an example, in John Tyson's yeast cell
model, we have 100-150 "experiments" to match, one for each mutated form of
the cell. Each experiment has information tied to it, such as "viable" and
the mean mass at division, or "arrested in G1 phase". Our objective function
defines a scoring for the transformed simulation output. For example, if
the simulation agrees with the experimental data that this mutated form is
viable, but has a slightly different mass at division, it might be a minor
penalty. If both simulation and experiment say the mutation is not viable,
but give different phases in the cycle where the cell fails, that would be a
bigger penalty. If they disagree on the issue of viable vs. inviable, that
is a much bigger penalty. In our case, to first order the issue is how
many of the mutations the simulation gets "right", but then how far
off they are will affect the detailed scores. It is a big part of the
modeling job for our modelers to define this objective function correctly.
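A sketch of the kind of tiered scoring described above, in Python. The penalty magnitudes, field names and the relative-error rule are made up for illustration; they are not the values or the code used for the yeast cell model.

```python
VIABILITY_MISMATCH = 10.0     # made-up magnitude: viable vs. inviable disagreement
ARREST_PHASE_MISMATCH = 3.0   # made-up magnitude: both inviable, different phase


def score_mutant(experiment, simulation):
    """Penalty for one mutant; 0.0 means a perfect match."""
    if experiment["viable"] != simulation["viable"]:
        return VIABILITY_MISMATCH
    if not experiment["viable"]:
        same_phase = experiment["arrest_phase"] == simulation["arrest_phase"]
        return 0.0 if same_phase else ARREST_PHASE_MISMATCH
    # Both viable: minor penalty growing with the relative mass error, capped at 1.
    expected = experiment["mass_at_division"]
    relative_error = abs(simulation["mass_at_division"] - expected) / expected
    return min(1.0, relative_error)


def total_score(pairs, weights):
    """Weighted sum of penalties over a battery of (experiment, simulation) pairs."""
    return sum(w * score_mutant(e, s) for (e, s), w in zip(pairs, weights))
```

The ordering of the three tiers is the point here; the actual numbers would be a modeling decision.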
-- Cliff
--
Cliff Shaffer, Associate Professor Phone: (540) 231-4354
Department of Computer Science Email: shaffer@cs.vt.edu
Virginia Tech, Blacksburg, VA 24061-0106 WWW: www.cs.vt.edu/~shaffer
-------------------------------------------------------------------------
|
|
|
Posts: 349
Registered: September 2003
|
|
Re: Data put into models
|
01 Apr '07 16:06
|
 |
|
JSim uses the idea of a project to store many things about a model, including data sets and information on model runs; this is the closest I've seen to what people are talking about. Personally I don't like the idea of different files possibly scattered across a hard drive, because it is so easy for files to become detached from their semantic group. Cliff Shaffer has brought up on a number of occasions the idea of a kind of zip file that represents a given model and contains separate files internally. Zip files don't have to be unpacked to be accessible and could provide a means to store the many different kinds of information required to instantiate a complete model.
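A short Python sketch of the zip idea using the standard `zipfile` module. The member names (`model.xml`, `data/timecourse.csv`) and their contents are placeholders, not a proposed layout. It shows that a single member can be read straight out of the archive without unpacking the rest.

```python
import io
import zipfile


def build_archive(files):
    """Pack a {member name: text} mapping into one compressed zip, in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def read_member(archive_bytes, name):
    """Read one member directly from the archive, without extracting to disk."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name).decode("utf-8")


# Placeholder contents: a stub model and a tiny data table.
bundle = build_archive({
    "model.xml": "<sbml/>",
    "data/timecourse.csv": "time,value\n0,1.0\n",
})
model_text = read_member(bundle, "model.xml")
```

The same two functions work on a file on disk if the `BytesIO` buffers are replaced by a path.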
Herbert Sauro
|
|
|
Posts: 24
Registered: October 2006
|
|
Re: Data put into models
|
01 Apr '07 20:28
|
 |
|
VCell moved a few years ago to a paradigm of layered decomposition of the modeling/simulation process. We separate the (semi)abstract 'physiological' (biochemical/biophysical) model, which can have several 'applications' (init values, geometry, protocols etc.), which in turn can each have several 'simulations' (params, scans, optimization, time/space resolution). This has proven very valuable to our users (biologists/modelers), but has made it difficult to maintain compatibility with SBML, which we try, we really try hard... :) We are now working on including referential links to experimental data, metadata, comparisons, etc.
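The layering can be pictured with a few Python dataclasses. This is only an illustration of the one-to-many nesting described above; the class and field names are assumptions and do not reflect VCell's actual data model or schema.

```python
from dataclasses import dataclass, field


@dataclass
class Simulation:
    """Innermost layer: numerical settings and parameter overrides."""
    name: str
    parameter_overrides: dict = field(default_factory=dict)
    time_step: float = 0.01


@dataclass
class Application:
    """Middle layer: initial values and protocol for one use of the model."""
    name: str
    initial_values: dict = field(default_factory=dict)
    simulations: list = field(default_factory=list)


@dataclass
class PhysiologicalModel:
    """Outer layer: the abstract model, shared by all its applications."""
    name: str
    applications: list = field(default_factory=list)


def count_simulations(model):
    """Total number of simulations across every application of a model."""
    return sum(len(app.simulations) for app in model.applications)


# Hypothetical example: one model, two applications, three simulations.
model = PhysiologicalModel("example", [
    Application("control", {"X": 1.0}, [Simulation("baseline")]),
    Application("stimulated", {"X": 5.0},
                [Simulation("coarse", time_step=0.1),
                 Simulation("scan_k1", {"k1": 2.0})]),
])
```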
One thing that helped us, even before XML and SBML, was the use of a relational database for persistent storage at fine granularity for all entities. This is a very powerful way to do things like that from a software development perspective. But it does not help by itself with compatibility with the outside world. For that, we need standards, languages, and such. We now maintain a full dual representation of all things VCell: both XML-based, and database-based.
We are very eager to contribute to any community effort to develop new standards and languages (or modify existing ones - wink, wink...) that can help create a separation of abstractions between model, protocols, and experimental data, while allowing the integration of entities relevant to a given modeling "project".
Ion
-----Original Message-----
From: sbml-discuss-bounces@caltech.edu on behalf of Herbert Sauro
Sent: Sun 4/1/2007 7:06 PM
To: SBML Discussion List
Cc:
Subject: Re: [sbml-discuss] Data put into models
JSim uses the idea of a project to store many things about a model, including data sets and information on model runs, this is the closest I've seen to what people are taking about. Personally I don't like the idea of different files possibly scattered across a hard drive, it is so easy for files to become detached from their semantic group. Cliff Shaffer has brought up on a number of occasions the idea for a kind of zip file that represents a given model that includes internally separate files. Zip files don't have to be unpacked to be accessible and could provide a means to store the many different kinds of information required to instantiate a complete model.
Herbert Sauro
-----Original Message-----
From: sbml-discuss-bounces@caltech.edu on behalf of Robert Phair
Sent: Sun 4/1/2007 11:48 AM
To: 'SBML Discussion List'
Cc:
Subject: Re: [sbml-discuss] Data put into models
I agree with Cliff. We have the headaches he describes every time we do a
parameter optimization and want to save the optimized values back to SBML
without losing the starting values.
A more radical possibility might be to factor the Units out of the SBML file
along with the parameters. A model should probably be correct independent of
units, and parameter files with parameter-associated units would even permit
mixed units where this is convenient and correct.
I would argue for the relationship between models and parameter files to be
one-to-many. I think Cliff's idea of sharing parameter files among multiple
models will create more problems than it solves, but I'd be willing to be
convinced otherwise.
It might be helpful to reserve the phrase "objective function" for the
weighted sum of squares or whatever function is optimized during the fitting
process. Functions that relate model variables to measurements are widely
referred to as "Associations" although I admit I don't know what SBO has to
say about this. Associations are akin to the mappings that Mike Hucka
referred to in one of his posts in this thread. They map some function of
the model variables to the column headings that Pedro Mendes described in
his nice summary as identifying a particular set of experimental data. This
mapping is an art and needs to have a prominent place in any software tool
whose goal is parameter optimization or even direct comparison of model
simulations with experimental data.
As for a mechanism to assemble the files, it might be logical for each
software tool to import and assemble all the files it needs. I think, for
example, that the Virtual Cell team has a construct called an Application
that is close to Cliff's analogy to a software project.
RDP
Robert D Phair, PhD
Chief Scientific Officer
Integrative Bioinformatics Inc
www.integrativebioinformatics.com
ProcessDB
Test. Validate. Extend.
-----Original Message-----
From: sbml-discuss-bounces@caltech.edu
[mailto:sbml-discuss-bounces@caltech.edu] On Behalf Of Cliff Shaffer
Sent: Saturday, March 31, 2007 12:33 PM
To: SBML Discussion List
Cc: LibSBML Discussion List
Subject: Re: [sbml-discuss] Data put into models
For many of the same reasons (and others given by others in this thread),
parameter values and initial conditions should be factored out of the SBML
model and placed in their own file. Particularly when you get into
parameter estimation, the concept of parameter values stored in the SBML
file gives nothing but headaches. The output of a parameter
estimation run is a new parameter set. Where does it go? Do we want to
create a new copy of the SBML file that copies the non-parameter data?
Probably not. We want it in its own place. And not as a second set of
parameters in the original SBML file, either!
Fundamentally, there can be multiple parameter sets associated with a given
SBML model, that a user can choose between. Conceivably, there might be
some parameter values that get shared across a collection of models, as
well. So there is no one-to-one mapping.
All of these various pieces ("model", parameter values, experimental
data, simulation outputs, objective functions that relate experimental
data to simulation output, to name a few) need their own separate
existence. Of course, this then begs for some mechanism for collecting all
of the pieces together. A good analogy is how programmers collect
all the various files (and not just source code files) that make up a
program into a "project".
-- Cliff
____________________________________________________________
To manage your sbml-discuss list subscription, visit
https://utils.its.caltech.edu/mailman/listinfo/sbml-discuss
For a web interface to the sbml-discuss mailing list, visit
http://sbml.org/forums/
For questions or feedback about the sbml-discuss list,
contact sbml-team@caltech.edu.
|
|
|
|