10 Interoperability Guide
10.1 On interoperability
A key goal of the DDMoRe project is to have an interoperability framework in which models are written in a consistent language, translated to PharmML and from there converted to target software code. Before the DDMoRe project, no language standard existed across the target software used in pharmacometrics modelling. While the underlying models could be expressed consistently in mathematical and statistical terms, the implementation of any given model varied by tool and by user according to their experience with a given target software tool.
There is some flexibility within MDL around how the user can express the mathematical and statistical models. Having flexibility allows the user to encode models quickly in a common language (MDL) which can then be shared with others and mutually understood. This flexibility also facilitates encoding in a given target when that language construct does not have a parallel in other tools. However, we STRONGLY encourage the user to encode the majority of models in a way that will facilitate interoperability. There are MDL constructs that facilitate interoperability – these generally appear as built-in functions which translate to specific constructs in PharmML and the target software. These constructs cover many typical models and are designed to allow the user to generate code quickly and have high confidence that it will be interoperable across tools.
The Model Description Language Interactive Development Environment (MDL-IDE) should assist the user in ensuring that the models encoded are valid MDL (and as a consequence, also valid PharmML). Not all models will result in code which can be readily converted to all target tools.
These interoperability constructs will be highlighted in the subsequent sections, but users should pay particular attention to the sections on the use of GROUP_VARIABLES, INDIVIDUAL_VARIABLES and MODEL_PREDICTION.
10.2 Dataset conventions
There are a number of conventions in preparing data for use in MDL and for target software.
• It is assumed that the SOURCE data file will be present as an ASCII comma-delimited text file (.csv).
• The data file should have a header row with names matching those in the DATA_INPUT_VARIABLES block.
• Data values should be numeric.
• Data columns with string or date:time values should have use is ignore for MDL.
• Null or missing values should be denoted by a single period (.).
Generally speaking, MDL follows NONMEM dataset conventions.
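The general conventions above can be checked before a data file is referenced from SOURCE. The following Python sketch is only an illustration and is not part of MDL or the MDL-IDE: the function name check_basic_conventions and its parameters are hypothetical, and header names are compared without regard to column order, which is an assumption.

```python
import csv
import io


def check_basic_conventions(csv_text, declared_names, ignored_names=()):
    """Return a list of problems found in comma-delimited data file text.

    declared_names: names declared in DATA_INPUT_VARIABLES (hypothetical input).
    ignored_names: columns declared with 'use is ignore' (not checked for numbers).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = [name.strip() for name in rows[0]]
    # Assumption: header names must match the declared names, in any order.
    for name in sorted(set(declared_names) - set(header)):
        problems.append(f"declared column {name} is missing from the header")
    for name in sorted(set(header) - set(declared_names)):
        problems.append(f"header column {name} is not declared")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, found {len(row)}")
            continue
        for name, value in zip(header, row):
            value = value.strip()
            if name in ignored_names or value == ".":
                continue  # ignored column, or the missing-value marker
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: column {name} has non-numeric value {value!r}")
    return problems
```

An empty list means that no problem was detected by these particular checks; it does not guarantee that the file is accepted by every target tool.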
In addition the following restrictions should be observed for interoperability reasons:
• A column with use is id is mandatory. Values should be positive, non-zero integers, unique and contiguous.
• A column with use is idv is mandatory. Values should be positive, real. When the model is expressed using a DEQ or COMPARTMENTS block, the values must be monotonically increasing within ID. This constraint does not apply to analytical models. The first idv value is taken as the initial time for the model. The initial value does not need to be the same for all individuals, but it must not be lower than that of the first individual. date:time format is not supported for time.
• A column with use is dv is mandatory. This column can take any real value. It must have a null value for dosing records.
• When modelling multiple outcomes, a column with use is dvid is required. Values should be positive integers. Values should not be null for OBSERVATION records.
• A column with use is mdv is optional. Valid values are 0 (observed) and 1 (missing). When the OBSERVATION is null or missing, this column should have the value 1. It can also take the value 1 when an OBSERVATION is present, if that OBSERVATION is to be ignored.
• A column with use is evid is optional. Valid values are 0 (OBSERVATION record), 1 (dosing record) and 4 (reset and dose record).
• A column with use is amt is optional. For dosing records this column must have a positive, real value.
• A column with use is rate is optional. This column must have positive, real values. This column can only be used in combination with a column with use is amt. If the value is zero then a bolus dose is assumed.
• A column with use is addl is optional. This column must have positive, real values. This column can only be used in combination with columns with use is amt and use is ii.
• A column with use is ii is optional. This column must have positive, real values. This column can only be used in combination with columns with use is amt and use is addl or use is ss.
• A column with use is ss is optional. Valid values are 0 (not at steady state) and 1 (at steady state). This column can only be used in combination with columns with use is amt and use is ii.
• A column with use is cmt is optional unless there is more than one route of administration. This column must have positive, integer values. Values in this column should start at 1 and correspond to the order of the ODEs specified in the DEQ block of the Model Object.
• Columns with use is covariate, use is catCov and use is variable are optional. These columns must not have missing values. Columns with use is catCov must have integer values. Covariate names in the DATA_INPUT_VARIABLES block must match the header names in the source .csv file exactly, including case. Please see sections 2.2.5, 4.3 and 4.9 for details on the use of these variables. If a column has use is covariate or use is catCov but the variable is not declared in the COVARIATES block of the Model Object, then it will be dropped and ignored.
• Columns with use is catCov can only have the values 0 and 1. This implies that categorical COVARIATES with k values should be converted to k-1 indicator variables.
• A column with use is varLevel is optional. This column should not have missing values. Columns with this use should not have an underscore in the column name. A change in the value of this variable denotes when to sample new values of the random variable.
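Some of the restrictions above can be checked mechanically. The Python sketch below is illustrative only: the function names, the default column names (ID, TIME, DV, MDV, EVID) and the problem messages are hypothetical. It also makes three assumptions that the text does not state explicitly: "contiguous" is read as "all records of one ID form a single block", "monotonic increasing" is read as "non-decreasing", and an idv value of zero is accepted. The idv column is assumed to be present and numeric on every record.

```python
import csv
import io

MISSING = "."


def check_records(csv_text, id_col="ID", idv_col="TIME", dv_col="DV",
                  mdv_col="MDV", evid_col="EVID", ordered_idv=True):
    """Return a list of problems for the id, idv, dv, mdv and evid restrictions.

    Set ordered_idv=False for analytical models, where the ordering
    constraint on idv does not apply.
    """
    problems = []
    seen_ids = set()
    prev_id, prev_time = None, None
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        subject = row[id_col].strip()
        if not subject.isdecimal() or int(subject) < 1:
            problems.append(f"line {line_no}: {id_col} must be a positive integer")
        if subject != prev_id:
            # Assumption: records of one individual must form a single block.
            if subject in seen_ids:
                problems.append(f"line {line_no}: records for {id_col} {subject} are not contiguous")
            seen_ids.add(subject)
            prev_id, prev_time = subject, None
        time = float(row[idv_col])
        if time < 0:
            problems.append(f"line {line_no}: {idv_col} must not be negative")
        if ordered_idv and prev_time is not None and time < prev_time:
            problems.append(f"line {line_no}: {idv_col} decreases within {id_col} {subject}")
        prev_time = time
        dv_missing = row[dv_col].strip() == MISSING
        # Dosing records (evid 1 or 4) must carry a null dv.
        if evid_col in row and row[evid_col].strip() in ("1", "4") and not dv_missing:
            problems.append(f"line {line_no}: dosing record must have a null {dv_col}")
        if mdv_col in row:
            mdv = row[mdv_col].strip()
            if mdv not in ("0", "1"):
                problems.append(f"line {line_no}: {mdv_col} must be 0 or 1")
            elif dv_missing and mdv != "1":
                problems.append(f"line {line_no}: {mdv_col} must be 1 when {dv_col} is missing")
    return problems


def to_indicators(values, reference):
    """Convert a k-level categorical column into k-1 0/1 indicator columns."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}
```

For example, a three-level covariate coded 1, 2, 3 with reference level 1 yields two indicator columns, one for level 2 and one for level 3, each of which can then be declared with use is catCov.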
10.3 Multiple uses of dataset columns
Dataset columns cannot have multiple uses defined in DATA_INPUT_VARIABLES. The DATA_DERIVED_VARIABLES block can be used to specify additional uses for dataset variables. In the current MDL, the scope for using DATA_DERIVED_VARIABLES is limited.
For example, if the user wants to specify different outcomes / OBSERVATIONs conditional on a dataset variable such as CMT (i.e. using CMT as DVID), they will need to create a dataset variable DVID that maps the CMT values appropriately.
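One way to prepare such a dataset is to add the DVID column before the file is used as SOURCE. The Python sketch below is a minimal illustration: the function name add_dvid, the column names and the example mapping of CMT values to DVID values are all hypothetical and must be adapted to the model at hand. Records whose CMT value is not in the mapping (typically dosing records) receive the missing-value marker.

```python
import csv
import io


def add_dvid(csv_text, cmt_to_dvid, cmt_col="CMT", dvid_col="DVID"):
    """Return csv_text with an extra DVID column derived from the CMT column.

    cmt_to_dvid maps CMT values (as they appear in the file, i.e. strings)
    to DVID values; unmapped CMT values are written as '.'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + [dvid_col],
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[dvid_col] = cmt_to_dvid.get(row[cmt_col].strip(), ".")
        writer.writerow(row)
    return out.getvalue()


# Hypothetical mapping: observations in compartment 2 are outcome 1,
# observations in compartment 3 are outcome 2.
example = "ID,TIME,DV,CMT\n1,0,.,1\n1,1,2.5,2\n1,1,40,3\n"
converted = add_dvid(example, {"2": 1, "3": 2})
```

The new column can then be declared with use is dvid in DATA_INPUT_VARIABLES, while CMT keeps its single use.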
10.4 Defining constants in the model
For interoperability, constant values in the model should be defined as STRUCTURAL_PARAMETERS and fixed to a value in the Parameter Object.
For models expressed as systems of differential equations (DEQ block), model parameters can be set to constant values in the MODEL_PREDICTION block, but this may be inefficient in the target software translation.
In the current SEE, to ensure interoperability, model parameters should not be assigned a constant value in the GROUP_VARIABLES or INDIVIDUAL_VARIABLES block.
10.5 Interoperability in the MODEL_PREDICTION block
To ensure Monolix interoperability, any variable used in the MODEL_PREDICTION block must be one of the following:
• the independent variable
• defined in MODEL_PREDICTION
• declared in INDIVIDUAL_VARIABLES using {type is linear, … }
• defined as use is variable in the DATA_INPUT_VARIABLES block of the Data Object
This implies in particular that STRUCTURAL_PARAMETERS, VARIABILITY_PARAMETERS, GROUP_VARIABLES and random variables defined in RANDOM_VARIABLE_DEFINITION cannot be used in MODEL_PREDICTION.
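The rule above amounts to a set-membership test on the names used in MODEL_PREDICTION. The Python sketch below illustrates that logic only: it does not parse MDL, the function name is hypothetical, and the lists of names are assumed to have been collected by hand or by some other tool.

```python
def disallowed_in_model_prediction(used, idv, defined_in_block,
                                   linear_individual, data_variables):
    """Return the names used in MODEL_PREDICTION that break the Monolix rule.

    used: names referenced in MODEL_PREDICTION
    idv: name of the independent variable
    defined_in_block: names defined in MODEL_PREDICTION itself
    linear_individual: INDIVIDUAL_VARIABLES declared with 'type is linear'
    data_variables: data columns declared with 'use is variable'
    """
    allowed = ({idv} | set(defined_in_block)
               | set(linear_individual) | set(data_variables))
    return sorted(set(used) - allowed)
```

For instance, with independent variable T, a variable CC defined in MODEL_PREDICTION and individual parameters CL and V, a reference to a population parameter such as POP_CL would be reported as disallowed.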
10.6 Interoperability in the OBSERVATION block
For Monolix interoperability, different OBSERVATIONs / outcomes must not share VARIABILITY_PARAMETERS and RANDOM_VARIABLE_DEFINITION.
For interoperability with Monolix, the residual error(s) \(\sigma_{i,j}^{2}\) must be Normal(0,1) random variables.