This page provides a work-in-progress design sketch for the SBML Test Suite. Updated 2008-01-09 by M. Hucka.
Software components envisioned
The vision for the SBML Test Suite is to have both a standalone software tool that can be run by a person on their local computer, and a web system.
- The standalone software would consist of a test runner, the collection of test cases, and the collection of test results. The test runner would be designed to drive an external software package (such as MathSBML, SBML ODE Solver, COPASI, etc.) that can be controlled from the command line by starting a process with specific command-line flags. The user would configure the test runner to pass the correct arguments for the particular simulation package they want to test, then the runner would let the user start running the test cases. The runner would go through the hundreds or thousands of tests, controlling the simulation package to load and run each model in turn, gathering the output of each simulation run, comparing it to the expected/standard results, and finally tallying successes and failures.
- Not all software packages provide the means to control them via a command-line interface; some only have GUIs. Nevertheless, we need to provide some means for people to test those tools as well. Until we find some better technology that would also allow us to control GUIs in a portable fashion across Windows, MacOS and Linux, the fall-back approach can be the following. We give people the collection of test models for downloading and tell them "run these in your software tool however you want, gather the results in a specific way, and then upload the results to our website and we'll tally them for you". This is thus the purpose of the online version of the SBML Test Suite. It will provide a means of uploading a collection of test run outputs (collected into some format we define, perhaps a single zip file), then it will compare the results to the expected values, and finally tabulate the outcome of the comparison.
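The core loop of the standalone runner could be sketched roughly as follows. This is only an illustration of the idea, not a specification: the file names (`model.xml`, `expected.csv`), the assumption that the tool writes CSV results to standard output, and the comparison tolerance are all placeholders.

```python
import csv
import subprocess
from pathlib import Path

def read_csv(path):
    """Read a CSV results file into rows of floats, skipping the header row."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    return [[float(v) for v in row] for row in rows[1:]]

def results_match(expected, actual, tol=1e-6):
    """Compare two result tables value by value within an absolute tolerance."""
    if len(expected) != len(actual):
        return False
    return all(len(e) == len(a) and all(abs(x - y) <= tol for x, y in zip(e, a))
               for e, a in zip(expected, actual))

def run_case(tool_cmd, case_dir):
    """Drive an external simulator on one test case and check its output.
    Assumes (purely for illustration) that the tool prints CSV to stdout."""
    case = Path(case_dir)
    expected = read_csv(case / "expected.csv")          # file name is illustrative
    proc = subprocess.run(tool_cmd + [str(case / "model.xml")],
                          capture_output=True, text=True, check=True)
    actual = [[float(v) for v in line.split(",")]
              for line in proc.stdout.splitlines()[1:]]
    return results_match(expected, actual)
```

The runner would call `run_case` once per numbered case directory and color the result map green or red according to the returned value.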
The availability of an online system also provides us with opportunities to add value to the service, in particular by providing the ability for users to store the results of the tests and indicate that they are to be taken as official results for a given software tool. Once a few software providers do this, then we can provide a matrix on the website showing the capabilities of different software packages side-by-side, much like http://wikimatrix.org provides the ability for people to compare the features of wiki systems side-by-side.
Here is a mock-up of a screen from the standalone test runner. This is not meant to be a requirement, but a starting point for some of the features that I think we will need. The online system would share some of the same features (especially in the way it displays results) but would not have the tool-driving/interfacing capabilities of the standalone software (providing instead only a way to upload results).
- This shows the test run and display screen. At the top, it has buttons to start, stop and restart the testing, plus buttons to configure the system and one to visit the website if the user wishes.
- Next, there is a test result map. I'm envisioning this as a quick view of test results. It would have a small square for every test case. Hovering the cursor over a square would display the case number and summary description in the area below the map. When a run is started, the map clears, and then as each test case is completed, progress is indicated by the corresponding square being colored green or red depending on whether the test was successful or not. Gradually, the user would see the map filling out with small green and red squares. They could then click on the squares to find out the details of each test result. (Presumably they would mostly be interested in the red squares, and since we hope most tools will have many fewer red squares than green ones, this kind of map won't be completely unusable.)
- The "details" area under the map would show the test case number, the short description of what the case is about, and information about the test results. The results would include at least an indication of pass/fail, but I think we could also add a difference plot similar to what Frank Bergmann has done in his testing approach here: http://sys-bio.org/fbergman/compare/. The basic point is to compute the differences between the expected values and the actual values at each time point and plot them, so people can see trends in the differences (which might help them understand what's going on). Another failure case is that the tool simply can't process the test model (e.g., because it doesn't support a certain feature). In those cases, we would only indicate failure, without a difference plot.
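The computation behind such a difference plot is simple. A minimal sketch, assuming both time courses are sampled on the same time grid and represented as (time, value) pairs:

```python
def difference_series(expected, actual):
    """Pointwise differences between expected and actual values at each
    time point -- the data a difference plot would display.  Each input
    is a list of (time, value) pairs on the same time grid."""
    return [(t, a - e) for (t, e), (_, a) in zip(expected, actual)]
```

Plotting this series for each variable would let the user see whether errors grow over time, oscillate, or stay within tolerance.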
The configuration screen for the test runner is not shown here, but I imagine it would have two key features:
- First, it would allow someone to define the command-line flags needed to drive a particular tool. I imagine this could be handled as a fill-in-the-blanks form, where we ask the user for things like "what is the command path to the software", "what is the flag to tell the tool to load a model", "what is the flag to set the simulation start time to X", "what is the flag to start a simulation run", etc., and the user would type in the text strings that are the command-line flags required by the tool. The software could then combine these flags and issue a system call to start the tool with all the right flags concatenated together. This avoids the need for shell scripts to interface to the software.
- The test runner configuration page should allow people to save these interface definitions, reload them, and load new ones. The configurations would be stored as files that users could name. We will be able to distribute interface definitions for different software; the test runner would come with sample configurations for MathSBML, SBMLToolbox, SBML ODE Solver, etc.
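The two features above could be sketched as follows. The flag names, the program path, and the choice of JSON as the storage format are entirely hypothetical, chosen only to illustrate the fill-in-the-blanks idea:

```python
import json

# Hypothetical interface definition: each entry is a flag string the user
# typed into the fill-in-the-blanks configuration form.
definition = {
    "program": "/usr/local/bin/simulator",
    "load_flag": "--model",
    "start_flag": "--start",
    "duration_flag": "--duration",
}

def build_command(defn, model, start, duration):
    """Concatenate the user-supplied flags into the argument list for one run."""
    return [defn["program"],
            defn["load_flag"], model,
            defn["start_flag"], str(start),
            defn["duration_flag"], str(duration)]

def save_definition(defn, path):
    """Store an interface definition as a named file the user can reload."""
    with open(path, "w") as f:
        json.dump(defn, f, indent=2)

def load_definition(path):
    """Reload a previously saved interface definition."""
    with open(path) as f:
        return json.load(f)
```

Distributing a sample configuration for a given tool would then amount to shipping one small definition file per tool.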
Another component not shown in the mock-up screen above (except for the wording "test case catalog" near the top) is a separate pane or screen where people can view the list of test cases and documentation about them. Each test case will have a human-written description associated with it, in addition to the short one-line description, plus there may be other documentation and explanations we want to provide, so there will need to be a place for that. In the mock-up, I showed it as a separate pane of the main window, but there may be better ways of handling that. Because there are a lot of test cases, I think it would also make sense to provide another way of getting at the data: a list of the test cases. This is shown along the right-hand side of the screen mock-up. The idea is to provide a scrollable list of the case numbers, as well as a text search box, so people could search for specific cases and find the results quickly.
For the expected outputs of simulation runs for each test case, we may as well use the comma-separated value (CSV) format that Andrew Finney used in the SBML Semantic Test Suite (the first generation and precursor of the new SBML Test Suite). This means the data files are ASCII, containing floating point numbers, and can be read and parsed by pretty much anything (even a human).
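To show the flavor of such a file (the column names and values here are made up for illustration, not taken from the actual test suite), an expected-output CSV might look like:

```
time,S1,S2
0.0,1.0e-4,0.0
0.1,9.9e-5,1.0e-6
0.2,9.8e-5,2.0e-6
```

A header row names the time column and each variable, and every subsequent row gives the values at one time point.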
In addition to the expected outputs, we will also want to provide as part of the test suite documentation a plot of the simulation time-course produced by our reference software (MathSBML). These plots can be in an image format such as JPG.
Organization of the test cases
I have recently reworked the organization of the test files following an idea put forward by Akiya Jouraku and Akira Funahashi. The approach avoids putting all the files in a topical directory hierarchy (as Andrew Finney's original version had done), and avoids naming the files after the tests (e.g., "basicRules-assignmentInitialize-Species.xml"). Instead, it uses a flat file organization with test cases numbered from 0001, putting categorical and other information within the test case definitions. This is analogous to the modern web concept of tagging versus hierarchical organization.
So the organization I have worked out now is to have a subdirectory for each test case (0001, 0002, 0003, etc.), each containing case description files, SBML files, run parameters, and sample/expected output data. The descriptions are in LaTeX format. There is a set of top-level LaTeX files and makefiles that pull all the individual test descriptions together, along with their simulation plots in JPGs, and create a single document covering every test case.
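Schematically, the layout looks like this (the per-case entries are descriptive placeholders, not the actual file names):

```
cases/
  0001/
      case description      (LaTeX fragment documenting the case)
      SBML file(s)          (the model itself)
      run parameters        (simulation start, duration, step count)
      expected results      (CSV time-course data)
      simulation plot       (JPG produced by the reference software)
  0002/
  0003/
      ...
```

Because each case is self-contained, adding a new test means adding one new numbered directory, and the top-level makefiles pick it up automatically.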
Here's a sample of the nice PDF that is produced by this system.