Items

An item (also called data element in other contexts) describes a single piece of information, for example the date of symptom onset. Often, an item will correspond to a single question in a data collection form. An item has a short yet descriptive name that is unique among all items in this dataset, for example Symptom onset date. Items are indicated with a symbol. Note that the item name is not intended to serve as a complete definition of the item, as some details may be omitted for brevity. Complete definitions are included underneath each item name.

An item value or simply value is a piece of information related to a certain individual. For example 2016-05 is a (partial) date which could describe the symptom onset date for an individual. Its precise meaning, for example that it describes the onset of the first symptoms related to the individual's neuromuscular condition, is described by the item it is associated with. An item value may have various formats or representations; for example, this date could be stored in a registry’s database in the international format 2016-05, but entered into their data collection form by selecting the month “May” and the year “2016” in drop-down menus.

Inclusion

A mandatory item means that designated registries (see below) must collect it (that is, include it in their data collection forms and be able to store and provide the data). They are marked within the document as follows:

  • CR items are mandatory for clinician-reported registries
  • PR items are mandatory for patient-reported registries
  • Dual-reported registries should interpret this based on who reports each item in their registry.

Note that a mandatory item in the dataset does not always correspond to a mandatory field in a data collection form. It is always possible that the data provider does not know the value of a mandatory item. Although registries should make appropriate efforts to obtain the data for mandatory items, missing values generally do not preclude any individual's data from being submitted or considered in enquiries.

In addition to the appropriate mandatory items, each registry is encouraged to collect the other items in the dataset if they are relevant and feasible at the local level.

This dataset is not restrictive. Registries in the TREAT-NMD network are independent and as such are free to collect additional data according to their needs or priorities.

Item types

For each item, this dataset specifies an item type which is one of the following:

  • yes/no: An item with this type has two possible values: a positive value, often denoted as “yes” or “true”, and a negative value, often denoted as “no” or “false”. A registry may choose any labels for the positive and negative values that are appropriate in the respective context. Depending on the context, a yes/no item may be implemented in a form using radio buttons, a dropdown menu or a checkbox. As noted below, a selection option for unknown values may be included.
  • decimal: A number which may include a decimal point. A decimal item is generally implemented in forms using a text box together with the proper validation. Decimal items always specify a unit or scale, where the unit is generally an SI unit (International System of Units) such as kilograms or centimetres. A registry may collect values using different units, but must be prepared to convert them to the specified unit for data submission and analysis.
  • integer: A whole number. An integer item is generally implemented using a text box with proper validation.
  • date: A point in time that must be captured by registries with a minimum resolution of month and year. Note that TREAT-NMD will only ever request and accept partial dates consisting of a month and year for submission and enquiries. Depending on the item and context, registries may collect date items by asking for the individual's age at that time. In this case, either the age can be stored or the age converted to a date using the individual's date of birth. In any case, registries must be able to provide dates, but also be able to provide data for queries using ages. A data collection form could use an input box, dropdown menus or a calendar widget for inputting these data. If an input box is used, registries may use a localised date format depending on the language and location of the users, for example the format “day/month/year” for the UK. However, registries must be able to provide dates according to the international standard ISO 8601 in a format such as “year-month-day”. Whenever dates are exchanged with TREAT-NMD, this standard will be used.
  • single selection: For each single selection item, all possible values are listed in this specification. Exactly one of those values must be collected (or no value at all). In a form, such items can be implemented using radio buttons or dropdown menus. Not all possible values listed here must be offered to users; registries may restrict the options depending on the item and the context. As noted below, a selection option for unknown values may be included. Registries may choose any order of the values in their data collection forms. Furthermore, the value of a single selection item may also be derived and not explicitly asked for, depending on the structure of the form. For example, a form may include separate sections for each type of disease-modifying therapy; then the value of the item DMT would not need to be explicitly asked for, but would be determined by which section of the form is filled out.
  • multiple selection: For a multiple selection item, everything stated for single selection items also applies. However, multiple values may be collected. The order of the collected values is irrelevant, and each possible value may only be provided once. Multiple selection items can be implemented using checkboxes or listboxes.
  • free text: A string of characters which has no restrictions on its format. Free text fields are usually implemented using text boxes.
  • restricted text: A string of characters with some restrictions on the format, for example Country of residence where the value must be a two-letter country code. The implementation in a data collection form depends on the item and the technical possibilities.
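The date handling described for the date type can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names are hypothetical, the localised input is assumed to be the UK “day/month/year” format mentioned above, and the age conversion assumes the age is given in completed years, so it can only return the earliest possible month.

```python
from datetime import date, datetime


def parse_uk_date(text: str) -> date:
    """Parse a localised UK entry in day/month/year format (assumed input format)."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def to_year_month(value: date) -> str:
    """Format a date as an ISO 8601 partial date consisting of year and month."""
    return f"{value.year:04d}-{value.month:02d}"


def earliest_month_from_age(date_of_birth: date, age_years: int) -> str:
    """Convert an age in completed years to the earliest possible year-month.

    The true date lies somewhere in the twelve months starting at this month,
    so the result is an approximation.
    """
    return f"{date_of_birth.year + age_years:04d}-{date_of_birth.month:02d}"


print(to_year_month(parse_uk_date("17/05/2016")))
print(earliest_month_from_age(date(2004, 3, 15), 12))
```

Whether a registry stores the age or the derived date is a local design choice; the sketch only shows that either representation can be converted into the year-month form when needed.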

Unknown and missing values

For any item, the value may be unknown to the data provider or missing in a registry for another reason (for example, because the form was not yet completed). Therefore, this specification acknowledges that in a TREAT-NMD enquiry or data submission, there may be no value for any item, even if it is mandatory. The precise way to indicate a missing value in a TREAT-NMD enquiry or data submission (for example by leaving the cell in an Excel spreadsheet blank) will be specified for each submission.

Since this applies to all items, this dataset contains no explicit values to indicate unknown or otherwise missing values. For this reason, all Unknown values have been removed in version 2. However, a registry may add a selection option in their data collection forms for an unknown value wherever they deem appropriate. This allows a registry to track whether a data provider does not have a certain piece of information, or has merely not completed the form fully. This is important for data curation, but since curation is performed independently by the local registries, this distinction is not relevant for any analyses of data submitted to TREAT-NMD and is therefore outside the scope of this core dataset.

Wherever registries offer a selection option for an unknown value, any appropriate label may be chosen, for example “Unknown” or “I don't know”. To encourage registry users to return and complete any unknown values as soon as possible, registries may also choose to use “To be confirmed” where appropriate. Moreover, patient-reported registries may choose to include an option “I do not wish to disclose” for potentially sensitive questions in their data collection forms.
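As a rough sketch of how a registry might translate such form options into a submission, the following maps every “unknown”-type label to a missing value. The function name and the exact set of labels are illustrative assumptions; each registry chooses its own wording, and the way a missing value is encoded is specified per submission.

```python
from typing import Optional

# Example labels taken from the text above; a registry may use others.
NO_VALUE_LABELS = {
    "Unknown",
    "I don't know",
    "To be confirmed",
    "I do not wish to disclose",
}


def value_for_submission(form_response: Optional[str]) -> Optional[str]:
    """Return the value to submit, or None if there is no value.

    Unanswered questions and explicit "unknown"-type answers both end up
    as missing values; the distinction is kept only in the local registry.
    """
    if form_response is None or form_response.strip() == "":
        return None
    if form_response in NO_VALUE_LABELS:
        return None
    return form_response
```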

Enumerated values

For single selection and multiple selection items, all possible values are enumerated in the dataset specification.

Value ID

The value ID is a concise, stable and descriptive text that uniquely identifies a certain value for a certain item. Whenever providing data to TREAT-NMD, the value ID must be used. Like an item ID, a value ID is always in English and must not be translated into a local language. Furthermore, it is recommended that registries use the value IDs provided in the dataset specifications as internal identifiers in their registry platforms as well.

Description

The description of a value is sometimes the same as the value ID, but often it is longer in order to provide a precise and more comprehensible definition of the value. Registries may use the provided descriptions as labels in their data collection forms, but this is not required. In particular, registries which do not use English forms should use labels in their respective local language. Furthermore, the descriptions are generally worded for curators and clinicians, so patient-reported registries should adapt the wording where necessary to ensure that it is easy to understand for patients and their families.

Classification

For some values, the datasets provide a mapping to a classification such as the Human Phenotype Ontology (HPO) or ORPHAcodes. These mappings are provided as a convenience, as the classification may provide further information, synonyms that can be used as labels, and insight into why a certain definition was chosen in the dataset. Note that this mapping only applies in one direction; i.e., if a certain value applies to an individual, then the mapped term applies, but not necessarily the other way around, because the value in the dataset may be more specific than the term in the classification. For example, the item LGMD type in the LGMD dataset contains the value LGMD D5, which denotes a dominant Bethlem myopathy, as well as the value LGMD R22, which denotes a recessive Bethlem myopathy. However, there exists only one ORPHAcode (ORPHA:610) for Bethlem myopathy, so both values in the dataset are mapped to the same ORPHAcode. If one only knows that ORPHA:610 applies to a certain individual, the mapping in the dataset does not provide a unique value for the item LGMD type.
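The one-directional nature of the mapping can be illustrated with a small sketch that uses only the two values from the example above. The dictionary and function names are hypothetical, and the mapping shown is an excerpt, not the full list of values for LGMD type.

```python
from typing import List

# Value ID -> ORPHAcode, restricted to the two values named in the example.
VALUE_TO_ORPHACODE = {
    "LGMD D5": "ORPHA:610",
    "LGMD R22": "ORPHA:610",
}


def values_for_code(orphacode: str) -> List[str]:
    """Return all value IDs that map to the given ORPHAcode."""
    return sorted(
        value for value, code in VALUE_TO_ORPHACODE.items() if code == orphacode
    )


# Forward direction: each value determines exactly one code.
print(VALUE_TO_ORPHACODE["LGMD D5"])

# Reverse direction: the code alone does not determine a unique value.
print(values_for_code("ORPHA:610"))
```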

Deprecation

Some values are marked as “deprecated” since a certain version of the dataset. For example, version 1 of the SMA dataset specified Part-time as a possible value for ventilation duration, while version 2 provides the more fine-grained values Part-time awake and sleeping and Part-time sleeping. Registries who had implemented version 1 of the dataset will have the value Part-time recorded in many cases and this data should be used whenever required. But for collection of new data, it should no longer be provided as a possible option and is therefore marked as deprecated.
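One possible way to handle deprecated values in a registry platform is sketched below: deprecated values remain valid for existing data but are no longer offered in forms. The data structure and function names are assumptions for illustration, and only the three values mentioned above are listed; the actual item may define further values.

```python
from typing import List

# Excerpt of the values for ventilation duration, with a deprecation flag.
VENTILATION_DURATION_VALUES = {
    "Part-time": {"deprecated": True},
    "Part-time awake and sleeping": {"deprecated": False},
    "Part-time sleeping": {"deprecated": False},
}


def options_for_new_entries() -> List[str]:
    """Values to offer in a data collection form: deprecated ones are hidden."""
    return [
        value
        for value, info in VENTILATION_DURATION_VALUES.items()
        if not info["deprecated"]
    ]


def is_known_value(value: str) -> bool:
    """Existing data may still contain deprecated values, so they stay valid."""
    return value in VENTILATION_DURATION_VALUES
```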

Longitudinal items

Longitudinal items are marked with the “longitudinal” symbol. For a given patient, many values for a longitudinal item can be collected over time. For each of these values, a datestamp which denotes the date the value refers to (for example, the date of a measurement) must be saved. For use in TREAT-NMD enquiries and data submission, it is sufficient that the datestamp has only the month and the year.

An example of a typical longitudinal item is Weight, which should be collected at each visit or registry update. When analysing the data, it is important to know when each weight measurement was made. When performing a registry update, either by the individual themselves or by a clinician after an examination, the date of the measurement will be equal to or approximately equal to the date of the registry update. In such cases, in particular when a data collection form contains questions such as “What is your current weight?”, a registry may automatically set the datestamp of a value to the entry date. If all values of a certain item are known to refer to the date of entry, a registry may not explicitly store any value date for this item at all and instead use the entry date whenever a value date is asked for.

However, when historical data is added, in particular at the baseline registration, the date of the measurement may be significantly earlier than the date of the entry. In such cases, registries must allow for the date to be entered explicitly.

In an example registry, the weight values may be stored in a table similar to the following one:

Patient ID | Date    | Weight
1          | 2015-03 | 38.2
1          | 2016-04 | 43.8
2          | 2014-09 | 63.0
2          | 2016-09 | 61.5

For each patient, there exist multiple rows in this table where each row corresponds to one weight measurement.

Further examples of longitudinal items are Scoliosis diagnosis and Cobb angle. For Cobb angle the date of the measurement (that is, the radiology examination) will often be different from the entry date. In the example registry, the values for those items could be stored in the same table:

Patient ID | Date    | Weight | Scoliosis diagnosis | Cobb angle
1          | 2015-03 | 38.2   | no                  |
1          | 2016-01 |        |                     | 8
1          | 2016-04 | 43.8   | yes                 |
2          | 2014-09 | 63.0   | no                  |
2          | 2016-09 | 61.5   | no                  |

For patient 1, the positive scoliosis diagnosis was entered in April 2016. The radiology examination in which the Cobb angle was measured was performed in January 2016. Therefore, the row with the date 2016-01 only has the value of the Cobb angle, while the cells for the other items are blank; and similarly, the other rows have no value for the Cobb angle.
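The example table can be modelled in code roughly as follows. This is a sketch of one possible in-memory representation, not a prescribed storage format; the class, field and function names are hypothetical, and blank cells are represented as None.

```python
from typing import Any, List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    patient_id: int
    date: str  # datestamp as ISO 8601 year-month
    weight: Optional[float] = None
    scoliosis_diagnosis: Optional[str] = None
    cobb_angle: Optional[float] = None


# The rows of the example table above.
ROWS = [
    Row(1, "2015-03", weight=38.2, scoliosis_diagnosis="no"),
    Row(1, "2016-01", cobb_angle=8),
    Row(1, "2016-04", weight=43.8, scoliosis_diagnosis="yes"),
    Row(2, "2014-09", weight=63.0, scoliosis_diagnosis="no"),
    Row(2, "2016-09", weight=61.5, scoliosis_diagnosis="no"),
]


def history(rows: List[Row], patient_id: int, item: str) -> List[Tuple[str, Any]]:
    """Return the (datestamp, value) pairs of one item for one patient.

    Blank cells are skipped; year-month strings sort chronologically.
    """
    pairs = [
        (row.date, getattr(row, item))
        for row in rows
        if row.patient_id == patient_id and getattr(row, item) is not None
    ]
    return sorted(pairs, key=lambda pair: pair[0])


print(history(ROWS, 1, "weight"))
print(history(ROWS, 1, "cobb_angle"))
```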

For a clinician-reported registry, the datestamp of a longitudinal value should usually be the date when the clinical examination on which the entry is based was performed. Exceptions are noted in the description of some items: For example, the datestamp of the Cobb angle must be the date of the radiology examination on which it is based. This is particularly important for baseline entries, when older data may be entered. If the precise date for a value is not available but the year is known, it may be specified using only the year. But if no information about a date is known, it must be omitted.

For patient-reported registries, the datestamp of a longitudinal value generally should be the date of the registry entry, with the same exceptions as for clinician-reported registries.

Datestamped items

A small number of items require datestamps, but no historical values; for example Is family member affected. Although this item should be collected on every update, only the latest value is relevant. However, it is still important to know when that value was last updated. Such items are called datestamped and are marked with the “datestamped” symbol. For any such item, registries must save the date on which the value was last updated or marked as up-to-date. As with the datestamp for longitudinal items, only the month and the year are required for analysis.

Creation and modification timestamps

In addition to the datestamps explicitly required, registries should save the date and possibly time of each entry and value modification, ideally together with audit information such as which user performed the change. Registries should use a data collection platform which automatically provides this functionality. These timestamps support auditing data entries and ensuring data quality, but may also serve as a fallback if no explicit datestamp is available for a certain value. For example, if a certain value for the item Cobb angle has no date set, one at least knows that the date of the examination must be before the date this value was entered. However, this specification currently has no rules on when and how such dates would be used for TREAT-NMD enquiries and registries must aim to obtain explicit datestamps for any datestamped or longitudinal item whenever possible.

Past and present status

In many cases, both the past and present status is important. For example, registries should capture whether an individual is currently using a feeding tube, and also whether this has been the case in the past. While individuals will usually know both the present and previous status for a condition like this, clinicians may not always have the complete medical records. Furthermore, a registry may change a question from capturing only the current status (e.g. “Are you currently using a feeding tube?”) to asking about the past as well (e.g. “Have you ever used a feeding tube?” with the responses “Currently”, “Previously” and “No”). It should then remain possible to handle the responses to the previous question in a uniform way.

To cover all possible ways to model such an item in a data collection form, several items in this dataset specify the values listed below. Please note, the full list of values for this item is not intended for use in a data collection form and is for data mapping purposes only. Instead, registries generally should only present the options Currently, Previously and Never with suitable user-friendly wording.

Value ID      | Description
Currently     | This is currently the case
Previously    | This has previously been the case, but is not currently
Never         | This has never been the case, neither previously nor currently
Sometime      | This has been the case at some time, but it is unknown whether it is currently the case
Not currently | This is currently not the case, but it is unknown whether it has previously been the case

The following tables demonstrate how such an item can be implemented in a form and how the possible responses would be mapped to the values above.

In general, registries should use example A (with the wording adapted according to the context) for current data collection forms if possible, as it is concise and conveys the most information. Otherwise, examples B or C should be used as a basis. If neither of these variants is possible, example D may be used.

Example A: “Has this ever been the case?”

Response   | Value
Currently  | Currently
Previously | Previously
Never      | Never
Unknown    | [no value]

Example B: “Has this ever been the case?” and “Is this currently the case?”

In this variant, the item is collected using two separate yes/no questions which each have an “unknown” option.

Response 1 (ever?) | Response 2 (currently?) | Value         | Remark
Yes                | Yes                     | Currently     |
Yes                | No                      | Previously    |
Yes                | Unknown                 | Sometime      |
No                 | Yes                     |               | invalid
No                 | No                      | Never         |
No                 | Unknown                 | Never         | implausible
Unknown            | Yes                     | Currently     | implausible
Unknown            | No                      | Not currently |
Unknown            | Unknown                 | [no value]    |

Whenever “No” is selected for the first question, this must be mapped to the value Never and the second question should not be displayed, thus avoiding invalid or implausible entries. The combination of “Unknown” for the first question and “Yes” for the second is implausible, as knowing that something is currently the case implies knowing that it has ever been the case; this should be avoided by hiding or disabling the option “Yes” for the second question whenever “Unknown” is selected for the first.
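The mapping for example B can be written down as a simple lookup, sketched below. The names are hypothetical, and treating the combination marked as invalid (“No” followed by “Yes”) as an error is an assumption; a registry mapping previously collected data may prefer to flag such records for curation instead.

```python
from typing import Optional

# (response to "ever?", response to "currently?") -> value ID, or None for no value.
EXAMPLE_B_MAPPING = {
    ("Yes", "Yes"): "Currently",
    ("Yes", "No"): "Previously",
    ("Yes", "Unknown"): "Sometime",
    ("No", "No"): "Never",
    ("No", "Unknown"): "Never",        # implausible combination
    ("Unknown", "Yes"): "Currently",   # implausible combination
    ("Unknown", "No"): "Not currently",
    ("Unknown", "Unknown"): None,      # no value
}


def map_example_b(ever: str, currently: str) -> Optional[str]:
    """Map the two responses of example B to a value ID (None means no value)."""
    key = (ever, currently)
    if key not in EXAMPLE_B_MAPPING:
        # Covers the invalid combination ("No", "Yes") and unexpected responses.
        raise ValueError(f"invalid combination: ever={ever!r}, currently={currently!r}")
    return EXAMPLE_B_MAPPING[key]
```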

Example C: “Is this currently the case?” and “Has this previously been the case?”

This variant is essentially only a minor modification of example B in which “ever” is replaced by “previously” and the questions are swapped. The mapping is the same as above, but it is shown here in the modified order for convenience:

Response 1 (currently?) | Response 2 (previously?) | Value         | Remark
Yes                     | Yes                      | Currently     |
Yes                     | No                       | Currently     | implausible
Yes                     | Unknown                  | Currently     | implausible
No                      | Yes                      | Previously    |
No                      | No                       | Never         |
No                      | Unknown                  | Not currently |
Unknown                 | Yes                      | Sometime      |
Unknown                 | No                       | Never         | implausible
Unknown                 | Unknown                  | [no value]    |

The combinations that are considered implausible or invalid in example B are considered implausible here as well (while considering the change in order): The combinations “Yes” and “No” as well as “Yes” and “Unknown” are marked as implausible because the current state at a given moment will be the past state just a moment later. The combination “Unknown” and “No” is deemed implausible because if someone knows that a certain condition never held in the past, then this assessment applies to all moments right up to the current moment and should therefore also apply to the current moment. In electronic data collection forms, these combinations should be avoided by only displaying the second question if the reply to the first is not “Yes” and by hiding the option “No” in the second question if the reply to the first is “Unknown”.
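The display logic for an electronic form based on example C might look like the sketch below. The function names are hypothetical, and the code models a newly designed form only: it rejects the combinations that the form would never allow, so it is not suitable for mapping previously collected data that may contain implausible combinations.

```python
from typing import List, Optional

RESPONSES = ("Yes", "No", "Unknown")


def second_question_options(currently: str) -> List[str]:
    """Options to show for "previously?" given the reply to "currently?"."""
    if currently == "Yes":
        return []  # the second question is not displayed at all
    if currently == "Unknown":
        return ["Yes", "Unknown"]  # the option "No" is hidden
    return ["Yes", "No", "Unknown"]


def map_example_c(currently: str, previously: Optional[str] = None) -> Optional[str]:
    """Map the responses of example C to a value ID (None means no value)."""
    if currently not in RESPONSES:
        raise ValueError(f"unexpected response: {currently!r}")
    if currently == "Yes":
        return "Currently"
    if previously not in second_question_options(currently):
        raise ValueError(f"response {previously!r} is not offered by the form")
    if currently == "No":
        return {"Yes": "Previously", "No": "Never", "Unknown": "Not currently"}[previously]
    # currently == "Unknown": only "Yes" and "Unknown" can be selected.
    return "Sometime" if previously == "Yes" else None
```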

Example D: “Has this ever been the case?”

This alternative is similar to example B, but does not contain the question about the current status. As it does not capture as much data as the other variants, it should generally be avoided and used only to map previously collected data.

Response | Value
Yes      | Sometime
No       | Never
Unknown  | [no value]

Consistency rules

The aim of the datasets is to avoid redundancy wherever possible. That means that for any piece of information (such as whether the diagnosis of a person has been genetically confirmed) there should generally be only one item (e.g. the item Genetic confirmation). But often, a certain degree of redundancy cannot be avoided. For instance, when details of a genetic confirmation are provided (e.g. in the record Genetic report in the SMA and DMD datasets or in the record Variant in the LGMD dataset), this already implies that there is a genetic confirmation. But the item Genetic confirmation is still important because it should be possible to state that a diagnosis is confirmed even when no details are known. However, this means that the values could contradict each other: Suppose that for a certain individual, a registry submits an instance of the record Genetic report, but also No as value of Genetic confirmation. Does this mean that the genetic report does not actually confirm the diagnosis? Or is the value of Genetic confirmation incorrect?

To avoid such ambiguities, the dataset specifications contain consistency rules to exclude invalid data. For example, in the SMA and DMD datasets, the item Genetic confirmation has the following rule: "Must be Yes in case an instance of the record Genetic report is provided." In most cases, more than one item or record may be related to a rule (in this case, the item Genetic confirmation and the record Genetic report), but the rule is only specified in one of them. Registries should ensure that consistency rules are met at all times through form structure, conditional display rules and input validation (see below). When submitting data to TREAT-NMD, any data that violates any consistency rule may be rejected.
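A check for the rule quoted above could be sketched as follows. The function name and the representation of records as a list are assumptions, and the code follows a literal reading of the rule, under which a missing value for Genetic confirmation also counts as a violation when a report is present.

```python
from typing import List, Optional


def violates_genetic_confirmation_rule(
    genetic_confirmation: Optional[str], genetic_reports: List[dict]
) -> bool:
    """Check the rule: Genetic confirmation must be "Yes" if a report is provided.

    Returns True when at least one Genetic report record exists but the
    item Genetic confirmation is anything other than "Yes".
    """
    return len(genetic_reports) > 0 and genetic_confirmation != "Yes"


# A report together with "No" is contradictory and would be flagged.
print(violates_genetic_confirmation_rule("No", [{"report": "example"}]))
# "Yes" without any report details is consistent.
print(violates_genetic_confirmation_rule("Yes", []))
```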

Note that consistency rules are different from the following other rules that are used in registries:

  • Conditional display rules specify under what conditions certain input fields or parts of a data collection form should be displayed. The dataset specifications do not contain any such rules because they don't mandate any specific form structures. However, in most cases registries can use conditional display rules to enforce consistency rules. In the example above, the consistency rule would be met if the data provider is first asked whether or not there is genetic confirmation. Only if the reply is "Yes" are the further input fields for the genetic report displayed.
  • Input validation rules specify under which conditions an input from a user is rejected. They are related to a specific data collection form within a specific data collection process. Therefore, the dataset specifications again contain no such rules, but input validation rules can be used to enforce consistency rules.
  • Completeness rules can be used to determine whether a certain registry entry is complete. For example, when the genetic report details are mandatory (e.g. for a clinician-reported registry), they need to be collected if the diagnosis is genetically confirmed. A corresponding completeness rule would be "If Genetic confirmation is Yes, then an instance of the record Genetic report must be provided." Note that this is the converse of the consistency rule. While consistency rules mandate the exclusion of data, completeness rules require the inclusion of data. The dataset specifications do not contain any such rules as absolute requirements, but rather use the mandatory/non-mandatory status of items and records to indicate which information must be collected by which registries on a best-effort basis. It is always possible that certain information is not available for any reason, but this will generally not lead to other data on the same individual being rejected.
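To make the contrast concrete, the example completeness rule from the list above can be sketched as a local check. The function name is hypothetical, and such a check would typically only produce a reminder for curators, since incomplete entries are generally not rejected.

```python
from typing import List, Optional


def genetic_report_missing(
    genetic_confirmation: Optional[str], genetic_reports: List[dict]
) -> bool:
    """Completeness check: a confirmed diagnosis should come with a report.

    Returns True when Genetic confirmation is "Yes" but no Genetic report
    record has been provided, i.e. the entry is consistent but incomplete.
    """
    return genetic_confirmation == "Yes" and len(genetic_reports) == 0


# Consistent, but flagged as incomplete for follow-up.
print(genetic_report_missing("Yes", []))
```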

There exist previous versions of the SMA and DMD datasets which used item numbers (e.g. 15.10) instead of the descriptive item IDs used here. To aid the transition to the new datasets, all items and records contain references to the numbers of the items in the previous datasets. In many cases, there is a direct correspondence between the old and new items; i.e., they describe the same information. But in some cases, the structure has been changed, so one new item may be related to multiple previous items or vice versa.

The Excel spreadsheet which can be found on the downloads page of each dataset also contains a worksheet "Mapping of previous version". It lists all previous item numbers which have related items in the current dataset, together with their corresponding new item and record IDs.

Technical details on IDs

The ID of each item and record serves as a unique and stable identifier. It may only consist of ASCII letters, numbers, spaces and hyphens (to be precise, hyphen-minus signs). When implementing this dataset, registries should use the names of this dataset as identifiers in their datasets wherever possible. However, registries may also choose to use a variant of the names in which all letters are converted to lower case and spaces as well as hyphens are replaced with an underscore. For example, for the item Anti-AAV9 antibody test date, a registry may choose anti_aav9_antibody_test_date as a database column name or other identifier. Regarding potential future transfer of patient-level data, registries may expect that item names transformed in this way will be accepted just as the standard names used in this specification, as long as the transformation is consistent across all items for a given registry.
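The permitted transformation can be sketched in a few lines. The function names are hypothetical; the character check follows the rule stated above that an ID may only consist of ASCII letters, numbers, spaces and hyphens.

```python
import re


def is_valid_id(item_id: str) -> bool:
    """Check that an ID uses only ASCII letters, digits, spaces and hyphens."""
    return re.fullmatch(r"[A-Za-z0-9 -]+", item_id) is not None


def to_identifier(item_id: str) -> str:
    """Lower-case the ID and replace each space or hyphen with an underscore."""
    if not is_valid_id(item_id):
        raise ValueError(f"not a valid ID: {item_id!r}")
    return re.sub(r"[ -]", "_", item_id.lower())


print(to_identifier("Anti-AAV9 antibody test date"))
```

Applying the same function to every item keeps the transformation consistent across a registry, which is the condition stated above for transformed names to be accepted.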
