RELAX NG schemas and validation for SBML Level 3

This page describes the use of RELAX NG schemas for validating SBML Level 3 files. Similar to W3C XML Schema, RELAX NG (RNG) provides the ability to define the valid syntax of an XML document. Importantly, RNG does not require the type of deterministic content that XML Schema requires; the latter requirement is what prevents us from writing a complete XML Schema for SBML Level 3, because Level 3 has certain features that violate the deterministic content restriction. Consequently, RELAX NG can be used to do schema-based validation of SBML Level 3 documents.

The use of SBML Level 3 packages unfortunately complicates the use of RNG schemas with existing schema processors. As discussed below, the Level 3 package schemas are not independent of the schema for SBML Level 3 Core, and they must be used together. Thus, note the following:

If you obtain an RNG schema for an SBML Level 3 package, you must also make sure to get the RNG schema for SBML Level 3 Core.

We describe a process below for creating a combined RNG schema that can be fed to an off-the-shelf RNG schema processor. Note as of 2012-12-08: As discussed below, getting a working version of a C/C++ native-code processor has been difficult. Xerces and Expat do not support RNG, and libxml2 has proven inconsistent. The Java-based RNG processors such as Jing have worked better in our tests.

Directory structure in the source repository

The RNG schemas for SBML Level 3 are available from the following subdirectory within the SBML project on GitHub:

sbml-specifications/RelaxNG/

This directory contains two template files, sbml.rng and pkg-math.rng, along with many subdirectories (one subdirectory per SBML Level 3 package). The file sbml.rng is the top-level schema template for Level 3, to be used in the manner described below. It assumes that it is physically located in the directory that contains the subdirectories sbml-core, sbml-mathml, etc. Similarly, the file pkg-math.rng is the top-level template for including any additional MathML that individual packages may have added; it should also be in this directory even if no additional math is to be included. When you use the schema, please be aware of this dependency on the subdirectory structure and files.

The schema for SBML L3V1 Core is located in the sbml-core subdirectory. The RNG schema for the MathML subset supported by SBML L3V1 is located in the sbml-mathml subdirectory. Note that the SBML Core schema references the MathML schema, so both subdirectories need to be present when validating any SBML L3 document.

The package subdirectories and the schemas contained within them use the following naming convention: the name begins with sbml- followed by the package label. Thus, for example, there is a subdirectory named sbml-comp containing the RNG schema for the Hierarchical Model Composition package. Within that subdirectory, there is a file named sbml-comp.rng that contains the RNG schema for the package.

Thus, to validate a document that uses the “comp” package, you would expect the directory structure shown below.

dir:sbml-comp

  file:sbml-comp.rng

dir:sbml-core

  file:sbml-core.rng
  file:sbml-simple-types.rng

dir:sbml-mathml

  (many additional schema files)
  file:sbml-mathml.rng

file:pkg-math.rng
file:sbml.rng
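
Because the schemas depend on this layout, it can help to check that the expected files are present before running a validator. The following is a minimal sketch, not part of the SBML distribution: the function names are hypothetical, and the list of required files is taken only from the layout shown above (the sbml-mathml directory contains further files that are not checked here).

```python
import os

def expected_schema_files(packages):
    """Return the relative paths that the layout above calls for.

    `packages` is a list of package labels such as ["comp", "qual"].
    """
    paths = [
        "sbml.rng",
        "pkg-math.rng",
        os.path.join("sbml-core", "sbml-core.rng"),
        os.path.join("sbml-core", "sbml-simple-types.rng"),
        os.path.join("sbml-mathml", "sbml-mathml.rng"),
    ]
    # Each package lives in sbml-<label>/sbml-<label>.rng.
    for label in packages:
        paths.append(os.path.join("sbml-" + label, "sbml-" + label + ".rng"))
    return paths

def missing_schema_files(root, packages):
    """Return the expected files that do not exist under `root`."""
    return [p for p in expected_schema_files(packages)
            if not os.path.isfile(os.path.join(root, p))]
```

For example, calling missing_schema_files(".", ["comp"]) from the RelaxNG directory should return an empty list if the layout is complete.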

Using a schema processor

As noted above, the package schemas are not independent of the SBML Level 3 Core schema, and must be combined with the Core schema to be used to validate an SBML file. The reason for this stems from the use of mandatory attributes in the schema definitions. The problem is as follows. When a mandatory attribute is required on a given element, RNG validators will complain if it is missing; however, unless a model uses all possible Level 3 packages, some attributes are going to be missing for some packages. Thus, one cannot feed a union-of-all-schemas to an RNG validator as a means of validating any possible combination of packages that may arise in a model: the schema file must contain references to only those packages that are actually present within the SBML document being validated. (We note in passing that we have tried many ways of avoiding this catch-22 situation. It has turned out to be an unanticipated and very frustrating repercussion of attempting to use RNG.)

The following example schema shows what the content of sbml.rng should be for validating a document containing constructs from the “comp” and “qual” packages.

<?xml version="1.0" encoding="UTF-8"?>

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
   datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">

   <!-- include the sbml data types -->

   <include href="sbml-core/sbml-simple-types.rng"/>

   <!-- include sbml core  -->

   <include href="sbml-core/sbml-core.rng"/>

   <!-- need to include any packages that are present in the document -->

   <include href="sbml-comp/sbml-comp.rng"/>
   <include href="sbml-qual/sbml-qual.rng"/>

</grammar>

The need to construct a schema for each combination of packages is obviously not an ideal solution. We have made several attempts to solve this issue, but so far have found that the only robust solution is to generate the appropriate sbml.rng schema document on the fly based on the SBML content being validated. This approach is described below.

Example code for generating an sbml.rng file on the fly

The following is a short Python program that, using libSBML, can be used to generate the appropriate sbml.rng file for a given document.

import sys
import libsbml

def writeOpening(rng):
  # Write the first part of the file: the XML declaration, the grammar
  # element, and the includes needed for every SBML Level 3 document.
  rng.write('<?xml version="1.0" encoding="UTF-8"?>\n\n')
  rng.write('<grammar xmlns="http://relaxng.org/ns/structure/1.0"\n')
  rng.write('\tdatatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">\n')
  rng.write('\n')
  rng.write('\t<!-- include the sbml data types -->\n')
  rng.write('\t<include href="sbml-core/sbml-simple-types.rng"/>\n\n')
  rng.write('\t<!-- include sbml core -->\n')
  rng.write('\t<include href="sbml-core/sbml-core.rng"/>\n\n')

def writeClosing(rng):
  # Close the grammar element.
  rng.write('\n</grammar>\n')

def writeOptional(rng, sbmlFile):
  # Write one include per package plugin attached to the document.
  doc = libsbml.readSBML(sbmlFile)
  rng.write('\t<!-- include the sbml package data types -->\n')
  for i in range(doc.getNumPlugins()):
    name = doc.getPlugin(i).getPackageName()
    rng.write('\t<include href="sbml-{0}/sbml-{0}.rng"/>\n'.format(name))

if len(sys.argv) != 2:
  print('Usage: createRNGfile.py sbmlFile')
else:
  # The with statement closes (and flushes) sbml.rng when writing is done.
  with open('sbml.rng', 'w') as rng:
    writeOpening(rng)
    writeOptional(rng, sys.argv[1])
    writeClosing(rng)
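
Where libSBML is not available, the packages used by a document can also be estimated from its XML namespace declarations. The following sketch uses only the Python standard library. It is an illustration rather than the approach used by the SBML tools, and it rests on two assumptions: that package namespace URIs follow the pattern http://www.sbml.org/sbml/level3/versionN/LABEL/versionM, and that every declared package namespace corresponds to a package actually used in the document. The names PACKAGE_NS, package_labels and include_lines are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed pattern for SBML Level 3 package namespace URIs, e.g.
# http://www.sbml.org/sbml/level3/version1/comp/version1
# The Core namespace (.../version1/core) does not match this pattern.
PACKAGE_NS = re.compile(
    r"^http://www\.sbml\.org/sbml/level3/version\d+/([A-Za-z0-9_]+)/version\d+$")

def package_labels(source):
    """Return the sorted package labels declared as namespaces in `source`.

    `source` is a file name or a binary file object containing SBML.
    """
    labels = set()
    # "start-ns" events report every namespace declaration as (prefix, uri).
    for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        match = PACKAGE_NS.match(uri)
        if match:
            labels.add(match.group(1))
    return sorted(labels)

def include_lines(labels):
    """Return one RNG include line per package label."""
    return ['\t<include href="sbml-{0}/sbml-{0}.rng"/>\n'.format(label)
            for label in labels]

# Small in-memory document that declares the comp and qual packages.
example = b'''<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:comp="http://www.sbml.org/sbml/level3/version1/comp/version1"
      xmlns:qual="http://www.sbml.org/sbml/level3/version1/qual/version1"
      level="3" version="1" comp:required="true" qual:required="true">
  <model/>
</sbml>'''

print(package_labels(io.BytesIO(example)))
```

The lines returned by include_lines could be written in place of the output of writeOptional in the script above.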

The pkg-math.rng file

The majority of Level 3 packages do not extend the MathML subset used by SBML, and thus the pkg-math.rng file need not be altered. The exceptions are the arrays and multi packages. In these cases, the relevant lines in pkg-math.rng need to be uncommented.

SBML validation using RELAX NG schemas

Once the appropriate sbml.rng file has been created, the resulting schema can be used to validate an SBML Level 3 document. Several alternative methods are available to accomplish this; the RELAX NG home page lists a variety of different off-the-shelf validators and software systems that can be used.
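
As one illustration, Jing can be run from Python as an external process. This is a minimal sketch under stated assumptions: Java is on the PATH, the location of jing.jar is supplied by the caller, Jing is invoked in its usual form (schema first, then the file to validate), and a non-zero exit status is taken to mean that validation failed. The function names are hypothetical.

```python
import subprocess

def jing_command(jing_jar, schema, sbml_file):
    """Build the command line: java -jar <jing.jar> <schema> <document>."""
    return ["java", "-jar", jing_jar, schema, sbml_file]

def validate_with_jing(jing_jar, schema, sbml_file):
    """Run Jing and return (is_valid, messages).

    Assumes Jing reports validation failure through a non-zero exit status.
    """
    result = subprocess.run(jing_command(jing_jar, schema, sbml_file),
                            capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

A call such as validate_with_jing("jing.jar", "sbml.rng", "model.xml") should be made from the directory containing sbml.rng, or with a path to it, so that the relative include paths in the schema resolve.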

Using libSBML

The libSBML validation code allows users to add validators to those currently called by the checkConsistency() function in libSBML. This facilitates the addition of RNG validation to the existing validation performed by libSBML.

LibSBML provides two approaches to performing RNG validation: calling an external validator, and using the RELAX NG schema validation provided by libxml2. The examples included with libSBML illustrate the use of these methods. Note that the alternative parsers supported by libSBML, Xerces and Expat, do not provide RNG schema validation functionality; the examples will only work with libxml2.

  1. The callExternalValidator example illustrates calling an external validation engine.
  2. The rngvalidator example illustrates calling the RNG schema validation provided by libxml2.

Our experience with libxml2's RNG validation is that it is extremely problematic. Thus, while it is supported in principle, we do not recommend using this approach in practice.

Using the online validator

Our official online validator will apply RNG schema validation for any SBML Level 3 packages that are still under development. The validator will detect any such packages that are present within an uploaded model and will apply the appropriate RNG schema validation. To avoid repeating reported errors, the online validator uses an adaptation of the approach detailed above.