BIDS Schema Tools Quick Start

bidsschematools is a Python package bundled with the BIDS specification. It is used to render components of the specification document from the BIDS schema.

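import os
from pprint import pprint
from upath import UPath  # provided by the universal_pathlib package
import bidsschematools as bst
import bidsschematools.schema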

Schema loading

Schemas are loaded with the load_schema() function. The default schema is the one that is bundled as part of the package:

schema = bst.schema.load_schema()

In the BIDS repository, the schema source is a subdirectory of YAML documents that need to be compiled before they can be used. Passing such a directory into load_schema() will perform the compilation:

# Use the `BIDS_SPEC_REPO` environment variable if it is set; otherwise
# assume the working directory is three levels below the repository root
spec = UPath(os.getenv('BIDS_SPEC_REPO', UPath(os.getcwd()).parent.parent.parent))
schema_from_directory = bst.schema.load_schema(spec / 'src' / 'schema')

load_schema() can also load a precompiled JSON schema document, such as the one published at https://bids-specification.readthedocs.io/en/latest/schema.json.

schema_path = UPath('https://bids-specification.readthedocs.io/en/latest/schema.json')
schema_from_json = bst.schema.load_schema(schema_path)
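
Because the precompiled schema is an ordinary JSON document, its top-level fields can also be read without bidsschematools. The following is a minimal sketch using only the standard library. `read_schema_versions` is a hypothetical helper, and the stand-in file written here holds only the two version fields rather than a real schema.

```python
import json
import tempfile
from pathlib import Path


def read_schema_versions(path):
    """Return (schema_version, bids_version) from a precompiled schema JSON file."""
    with open(path, encoding="utf-8") as f:
        doc = json.load(f)
    return doc["schema_version"], doc["bids_version"]


# Stand-in for a downloaded schema.json (assumption: a real file also
# contains the meta, objects and rules sections).
with tempfile.TemporaryDirectory() as tmp:
    stand_in = Path(tmp) / "schema.json"
    stand_in.write_text(
        json.dumps({"schema_version": "1.2.0-dev", "bids_version": "1.10.2-dev"}),
        encoding="utf-8",
    )
    versions = read_schema_versions(stand_in)

print(versions)
```

This gives plain dicts and strings only. The dot-notation access described in this guide comes from load_schema().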

Schema organization

The schema has three top-level subdivisions (objects, rules, and meta) and also contains its own version and the version of the BIDS standard that it encodes:

print(list(schema.keys()))
print(f'BIDS-Schema version: {schema.schema_version}')
print(f'BIDS version: {schema.bids_version}')
['meta', 'objects', 'rules', 'bids_version', 'schema_version']
BIDS-Schema version: 1.2.0-dev
BIDS version: 1.10.2-dev

Note that the schema provided by load_schema() is a Namespace object, which is able to access fields with dot notation (schema.bids_version) as well as index notation (schema['bids_version']).
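
To illustrate how one object can support both access styles, here is a toy wrapper around a nested dict. `MiniNamespace` is a hypothetical class written for this example. It is not the bidsschematools Namespace implementation, which may work differently.

```python
class MiniNamespace:
    """Toy wrapper giving a nested dict both attribute and index access."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        value = self._data[key]
        # Wrap nested dicts so that chained access keeps working.
        return MiniNamespace(value) if isinstance(value, dict) else value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


ns = MiniNamespace({
    "bids_version": "1.10.2-dev",
    "objects": {"suffixes": {"bold": {"value": "bold"}}},
})
print(ns.bids_version, ns["bids_version"])
print(ns.objects.suffixes.bold.value)
```

Dot notation only works for keys that are valid Python identifiers. Index notation works for any key.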

We can see the general structure of the schema by listing keys at the second level:

list(schema.keys(level=2))
['meta.associations',
 'meta.context',
 'meta.expression_tests',
 'meta.versions',
 'objects.columns',
 'objects.common_principles',
 'objects.datatypes',
 'objects.entities',
 'objects.enums',
 'objects.extensions',
 'objects.files',
 'objects.formats',
 'objects.metadata',
 'objects.metaentities',
 'objects.modalities',
 'objects.suffixes',
 'rules.checks',
 'rules.common_principles',
 'rules.dataset_metadata',
 'rules.directories',
 'rules.entities',
 'rules.errors',
 'rules.files',
 'rules.json',
 'rules.metaentities',
 'rules.modalities',
 'rules.sidecars',
 'rules.tabular_data']
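
One way such dotted keys could be produced from, and resolved against, a nested mapping is sketched below with plain dicts. `keys_at_level` and `lookup` are hypothetical helpers, not part of bidsschematools. Skipping scalar fields at deeper levels is an assumption chosen to match the listing above, where bids_version and schema_version do not appear.

```python
def keys_at_level(mapping, level=1):
    """Return dotted keys found exactly `level` mappings deep."""
    if level == 1:
        return list(mapping)
    found = []
    for key, value in mapping.items():
        # Scalars such as version strings have no deeper level and are skipped.
        if isinstance(value, dict):
            found.extend(f"{key}.{sub}" for sub in keys_at_level(value, level - 1))
    return found


def lookup(mapping, dotted_key):
    """Resolve a dotted key such as 'objects.suffixes' one segment at a time."""
    node = mapping
    for part in dotted_key.split("."):
        node = node[part]
    return node


# Toy stand-in for a schema; only the nesting matters here.
toy = {
    "meta": {"associations": {}, "context": {}},
    "objects": {"suffixes": {"bold": {"value": "bold"}}},
    "bids_version": "1.10.2-dev",
}
print(keys_at_level(toy, level=2))
print(lookup(toy, "objects.suffixes.bold.value"))
```

Every key returned by `keys_at_level` can be passed back to `lookup`, which keeps the two helpers consistent with each other.
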

Dotted keys can also be used for index lookups:

schema['meta.associations']
<Namespace {'events': {'selectors': ['"task" in entities', "extension != '.json'"], 'target': {'suffix': 'events', 'extension': '.tsv'}, 'inherit': True}, 'aslcontext': {'selectors': ["suffix == 'asl'", "match(extension, '\\.nii(\\.gz)?$')"], 'target': {'suffix': 'aslcontext', 'extension': '.tsv'}, 'inherit': True}, 'm0scan': {'selectors': ["suffix == 'asl'", "match(extension, '\\.nii(\\.gz)?$')"], 'target': {'suffix': 'm0scan', 'extension': ['.nii', '.nii.gz']}, 'inherit': False}, 'magnitude': {'selectors': ["suffix == 'fieldmap'", "match(extension, '\\.nii(\\.gz)?$')"], 'target': {'suffix': 'magnitude', 'extension': ['.nii', '.nii.gz']}, 'inherit': False}, 'magnitude1': {'selectors': ["match(suffix, 'phase(diff|1)$')", "match(extension, '\\.nii(\\.gz)?$')"], 'target': {'suffix': 'magnitude1', 'extension': ['.nii', '.nii.gz']}, 'inherit': False}, 'bval': {'selectors': ["intersects([suffix], ['dwi', 'epi'])", "match(extension, '\\.nii(\\.gz)?$')"], 'target': {'extension': '.bval'}, 'inherit': True}, 'bvec': {'selectors': ["intersects([suffix], ['dwi', 'epi'])", "match(extension, '\\.nii(\\.gz)?$')"], 'target': {'extension': '.bvec'}, 'inherit': True}, 'channels': {'selectors': ["intersects([suffix], ['eeg', 'ieeg', 'meg', 'nirs', 'motion', 'optodes'])", "extension != '.json'"], 'target': {'suffix': 'channels', 'extension': '.tsv'}, 'inherit': True}, 'coordsystem': {'selectors': ["intersects([suffix], ['eeg', 'ieeg', 'meg', 'nirs', 'motion', 'electrodes', 'optodes'])", "extension != '.json'"], 'target': {'suffix': 'coordsystem', 'extension': '.json'}, 'inherit': True}, 'electrodes': {'selectors': ["intersects([suffix], ['eeg', 'ieeg', 'meg'])", "extension != '.json'"], 'target': {'suffix': 'electrodes', 'extension': '.tsv', 'entities': ['space']}, 'inherit': True}}>
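
In the output above, each association has a target whose extension is either a single string or a list, and whose suffix may be absent. The sketch below shows one way to compare a candidate file against such a target. `matches_target` is a hypothetical helper, and its matching semantics are an assumption. It ignores the selectors, the entities key and the inherit flag, and it is not the validator's actual logic.

```python
def matches_target(target, suffix, extension):
    """Check a candidate file's suffix and extension against an association target."""
    if "suffix" in target and target["suffix"] != suffix:
        return False
    allowed = target["extension"]
    # Normalize the single-string form to a list before testing membership.
    if isinstance(allowed, str):
        allowed = [allowed]
    return extension in allowed


# Targets copied from the meta.associations output above.
events = {"suffix": "events", "extension": ".tsv"}
m0scan = {"suffix": "m0scan", "extension": [".nii", ".nii.gz"]}
bval = {"extension": ".bval"}

print(matches_target(events, "events", ".tsv"))     # True
print(matches_target(m0scan, "m0scan", ".nii.gz"))  # True
print(matches_target(bval, "dwi", ".bval"))         # True: this target has no suffix
print(matches_target(events, "bold", ".tsv"))       # False: the suffix differs
```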

The objects subschema contains definitions of data and metadata used throughout BIDS, while the rules subschema contains definitions of constraints that may be validated. The meta subschema contains definitions that are essential for validation.

To see how these sections interact, consider rules.files.raw.func.func, which contains the valid entities, datatypes, extensions and suffixes for functional neuroimaging files:

pprint(schema.rules.files.raw.func.func.to_dict())
{'datatypes': ['func'],
 'entities': {'acquisition': 'optional',
              'ceagent': 'optional',
              'chunk': 'optional',
              'direction': 'optional',
              'echo': 'optional',
              'part': 'optional',
              'reconstruction': 'optional',
              'run': 'optional',
              'session': 'optional',
              'subject': 'required',
              'task': 'required'},
 'extensions': ['.nii.gz', '.nii', '.json'],
 'suffixes': ['bold', 'cbv', 'sbref']}
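
To make the role of such a rule concrete, the sketch below checks the parts of a candidate file against the rule printed above, copied into a plain dict. `check_against_rule` is a hypothetical helper. It assumes the file name has already been split into entities (keyed by their long names, as in the rule), a suffix and an extension. It does not reproduce the BIDS validator's actual logic.

```python
# Copied from the rules.files.raw.func.func output above.
RULE = {
    "datatypes": ["func"],
    "entities": {
        "acquisition": "optional",
        "ceagent": "optional",
        "chunk": "optional",
        "direction": "optional",
        "echo": "optional",
        "part": "optional",
        "reconstruction": "optional",
        "run": "optional",
        "session": "optional",
        "subject": "required",
        "task": "required",
    },
    "extensions": [".nii.gz", ".nii", ".json"],
    "suffixes": ["bold", "cbv", "sbref"],
}


def check_against_rule(rule, entities, suffix, extension):
    """Return a list of problems; an empty list means the parts fit the rule."""
    problems = []
    if suffix not in rule["suffixes"]:
        problems.append(f"suffix {suffix!r} not allowed")
    if extension not in rule["extensions"]:
        problems.append(f"extension {extension!r} not allowed")
    for name, level in rule["entities"].items():
        if level == "required" and name not in entities:
            problems.append(f"missing required entity {name!r}")
    for name in entities:
        if name not in rule["entities"]:
            problems.append(f"unexpected entity {name!r}")
    return problems


print(check_against_rule(RULE, {"subject": "01", "task": "rest"}, "bold", ".nii.gz"))
print(check_against_rule(RULE, {"subject": "01"}, "bold", ".nii"))
```

Returning a list of problems rather than a boolean keeps every failed constraint visible at once.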

We can look up the suffix definitions in objects.suffixes:

pprint({
    suffix: schema.objects.suffixes[suffix].to_dict()
    for suffix in schema.rules.files.raw.func.func.suffixes
})
{'bold': {'description': 'Blood-Oxygen-Level Dependent contrast (specialized '
                         'T2\\* weighting)\n',
          'display_name': 'Blood-Oxygen-Level Dependent image',
          'value': 'bold'},
 'cbv': {'description': 'Cerebral Blood Volume contrast (specialized T2\\* '
                        'weighting or difference between T1 weighted images)\n',
         'display_name': 'Cerebral blood volume image',
         'value': 'cbv'},
 'sbref': {'description': 'Single-band reference for one or more multi-band '
                          '`dwi` images.\n',
           'display_name': 'Single-band reference image',
           'value': 'sbref'}}

Note that each suffix's key is repeated in its value field. Because suffixes are valid identifiers, the key and the value generally coincide, and objects.suffixes[suffix] works as a shorthand lookup.

This does not hold for every object type. For example, extensions such as .nii.gz are not valid identifiers, so they are stored under a different key:

schema.objects.extensions.nii_gz
<Namespace {'value': '.nii.gz', 'display_name': 'Compressed NIfTI', 'description': 'A compressed Neuroimaging Informatics Technology Initiative (NIfTI) data file.\n'}>

In this case, a lookup table needs to be built on the fly:

ext_lookup = {ext.value: ext for ext in schema.objects.extensions.values()}
pprint({
    ext: ext_lookup[ext].to_dict()
    for ext in schema.rules.files.raw.func.func.extensions
})
{'.json': {'description': 'A JSON file.\n'
                          '\n'
                          'In the BIDS specification, JSON files are primarily '
                          'used as "sidecar" files, in which metadata '
                          'describing "data"\n'
                          'files are encoded.\n'
                          'These sidecar files follow the inheritance '
                          'principle.\n'
                          '\n'
                          'There are also a few special cases of JSON files '
                          'being first-order data files, such as '
                          '`genetic_info.json`.\n',
           'display_name': 'JavaScript Object Notation',
           'value': '.json'},
 '.nii': {'description': 'A Neuroimaging Informatics Technology Initiative '
                         '(NIfTI) data file.\n',
          'display_name': 'NIfTI',
          'value': '.nii'},
 '.nii.gz': {'description': 'A compressed Neuroimaging Informatics Technology '
                            'Initiative (NIfTI) data file.\n',
             'display_name': 'Compressed NIfTI',
             'value': '.nii.gz'}}
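
The same pattern can be wrapped in a small reusable helper. The sketch below uses plain dicts in place of the schema. `index_by_value` is a hypothetical function, and raising on duplicate values is a design choice of this sketch rather than documented behavior of the schema.

```python
def index_by_value(objects):
    """Map each object's `value` field to its full definition."""
    table = {}
    for key, definition in objects.items():
        value = definition["value"]
        if value in table:
            # Silently overwriting would hide an ambiguous lookup.
            raise ValueError(f"duplicate value {value!r} (seen again under key {key!r})")
        table[value] = definition
    return table


# Stand-in for schema.objects.extensions. The nii_gz key appears in the
# output above; the other two keys are assumed for illustration.
extensions = {
    "nii_gz": {"value": ".nii.gz", "display_name": "Compressed NIfTI"},
    "nii": {"value": ".nii", "display_name": "NIfTI"},
    "json": {"value": ".json", "display_name": "JavaScript Object Notation"},
}

ext_table = index_by_value(extensions)
print(ext_table[".nii.gz"]["display_name"])
```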