This guide is relevant for Simurg Environment version 0.12.9 and above.

About Simurg

The Simurg environment is an easily deployable and operable software platform that merges novel model-based techniques and algorithms with well-established software and workflows in a convenient GUI, supporting a broad spectrum of model-based analyses relevant to quantitative pharmacology.

Objectives

  1. Data handling and processing.
  2. Exploratory data analysis and quality check.
  3. Solving the direct problem for mathematical models based on various types of differential equations.
  4. Parameter estimation procedures for non-linear and linear systems with or without random effects.
  5. Development of regression models for various types of data (binary, categorical, etc.).
  6. Meta-analysis and meta-regression.
  7. Model development in Bayesian paradigm.
  8. Generation of reports based on the results of the analyses.

Simurg environment modules

  1. Data management module - semi-automatic data processing, visualization and quality check.
  2. NLME module - mathematical modeling of dynamical data using hierarchical modeling with frequentist and Bayesian approaches, suitable for both empirical and mechanistic models.
  3. MultiReg module - expands the range of data types and associated mathematical methods a modeler can use within Simurg environment.
  4. Reporting module - compile and update modeling reports in various formats.

Access to Simurg - internal servers

Requesting access

  1. Create a request using this form
  2. Wait for an e-mail - contains a link to the server and credentials

Accessing the environment

  1. Follow the link to the relevant server (see the list below)
  2. Fill in the user name
  3. Fill in the password
  4. Press "Sign in"
  5. Select "Simurg"
  6. Select number of cores and RAM

List of servers:

  1. M&S Decisions
  2. ER modeling school

Technical support

Use this form to provide feedback or report technical issues.

About Data management module

Background

Model-based analyses aim to establish quantitative relationships between different entities. These relationships are inherently data-driven, meaning they can only be as accurate and reliable as the underlying data allows. Consequently, a thorough evaluation of the data is essential before initiating any modeling efforts. Furthermore, data used in these analyses can come in various shapes and forms, following CDISC, software-specific or company-specific standards. Thus, a modeler should be equipped with a tool to perform convenient transitions from one type of data standard to another, visualize different types of data, and scan the data for potential errors and outliers.

Objectives

  • CDISC-compliant semi-automatic data processing.
  • Visualization of all types of data in different shapes and forms.
  • Quality check of the data.

Sections of the module

  • Data
  • Data quality check
  • Continuous data
  • Covariates
  • Dosing events
  • Tables

About NLME module

Background

Population PK/PD modeling and its variations are arguably among the most widely used types of model-based analyses in MIDD. The development of such models follows a rigid workflow that includes structural model selection, statistical model selection, covariate search, and forward simulations. Mechanistic (QSP) models rely on similar functionality, albeit with many nuances. For example, covariate search is typically not performed in QSP, as relevant covariates ought to be included as part of the structural model rather than as parameters. At the same time, the QSP approach demands an additional set of tools, such as sensitivity analyses, likelihood profiling, and parameter estimation via virtual population simulation.

Objectives

  • Implementation and modification of structural, statistical and covariate models.
  • Estimation of unknown parameters within the model using different algorithms and approaches.
  • Extensive model diagnostics and evaluation.
  • Automatic model development and assessment.
  • Model-based simulations.

Sections of the module

Data

One of the key functionalities of the NLME module is to estimate the parameter values of a model based on observed data, which is typically represented as time series measurements for each individual, study arm, animal, or other experimental setup. Additionally, the relevant data is often linked to drug administration and may include both time-varying and constant independent variables (covariates) that can be incorporated into the model.

Communication between the data and the model is facilitated by compiling a dataset with a predefined structure, which can be categorized into three types of elements: time series, dosing events, and covariates.

Standardized dataset structure

Standardized datasets in tabulated format accepted by Simurg software are inspired by CDISC guidelines [1] and are compatible with other conventional software, such as Monolix (Lixoft, France) and NONMEM (Icon, USA).
Each line of the dataset should correspond to one individual and one time point. A single line can describe a measurement, a dosing event, or both.

Time series

Mandatory columns:

  • ID - unique identifier of an individual/animal/study arm/experimental setup, typically characterized by a unique combination of observations, dosing events and covariates. Can be numeric or character.
  • TIME - observation time. Numeric.
  • DV - observed value of a dependent variable. Numeric.
  • DVID - natural number corresponding to the identifier of a dependent variable.

Optional columns:

  • DVNAME - character name of a dependent variable. Should have a single value per DVID.
  • MDV - missing dependent variable flag. Equals 0 by default. If it equals 1, the observation in the corresponding line is ignored by the software.
  • CENS - censoring flag; can be empty, 0, -1 (right censoring) or 1 (left censoring). The value in the DV column associated with a non-zero CENS serves as the lower limit of quantification for left censoring or the upper limit of quantification for right censoring (relevant for the M3 censoring method).
  • LIMIT - if the CENS column is present, the numerical value in the LIMIT column defines the lower or upper limit of the censored observations (relevant for the M4 censoring method).

Dosing events

  • EVID - identifier of a dosing event. Equals 0 by default, which corresponds to an observation without any associated events (AMT, etc. are ignored). Other possible values include:
    • 1 - dosing event.
    • 2 - reset of the whole system to initial conditions, with or without dosing event.
    • 3 - reset of the associated DVID to the value in DV column, with or without dosing event.
  • CMT - dosing compartment - a natural number corresponding to the running number of a differential equation within a model.
  • ADM - manually assigned administration ID. Replaces CMT if present. Natural number.
  • AMT - dosing amount. Numeric.
  • II - time interval between the doses. Numeric.
  • ADDL - number of additional doses. Natural number.
  • TINF or DUR - duration of infusion. Numeric.
  • RATE - infusion rate. Numeric. Replaces TINF or DUR if present.

Covariates

Any additional column in a dataset is treated as a continuous (if numeric) or categorical (if character) covariate, either constant (if the covariate value does not change over time within a single ID) or time-varying. Time-varying covariates are interpolated using the last-observation-carried-forward (LOCF) approach.
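To make the column definitions above concrete, a minimal hypothetical dataset might look as follows (all values are invented for illustration; missing cells are denoted with a dot, a common convention in Monolix/NONMEM-style datasets):

```
ID,TIME,DV,DVID,EVID,CMT,AMT,WT
1,0,.,1,1,1,100,70
1,1,5.2,1,0,.,.,70
1,8,1.3,1,0,.,.,70
2,0,.,1,1,1,100,55
2,1,6.0,1,0,.,.,55
```

The first line of each ID is a dosing event (EVID = 1, AMT = 100 into compartment 1); the remaining lines are observations of DVID 1, and WT is a constant continuous covariate.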

Initialization of the dataset

A dataset can be uploaded into the environment by pressing the button and selecting a file with one of the following extensions: .csv, .txt, .tsv, .xls, .xlsx, .sas7bdat, .xpt.

Once a dataset is uploaded, its content will appear in the form of a table on the main panel:

Modifications of the dataset are possible through the Simurg Data management module's Data tab.

Once uploaded, the dataset is recognized by the software and can be used for subsequent model development.

References

[1] https://www.cdisc.org/standards/foundational/adam/basic-data-structure-adam-poppk-implementation-guide-v1-0

Model editor

The Model editor tab allows a user to write de novo or modify existing structural model code. Simurg is capable of parsing various syntaxes, including MLXTRAN and rxode2, in addition to having its own flexible modeling language. An existing model can be imported from an external .txt file by pressing the button. A created or updated model can be saved to a file using the button.

Essential structural elements of Simurg syntax

The only two mandatory sections that need to be present in a structural model file when using Simurg modeling syntax are # [INPUT] and # [MODEL], as shown in the figure:

# [INPUT] contains the names and initial values of parameters to be estimated.
# [MODEL] contains the rest of the code, including fixed parameters, explicit functions, initial conditions, differential equations, etc.

Comments are introduced with the # symbol. Thus, sections like ### Explicit functions or ### Initial conditions do not affect parsing and are used only to organize the code. The end of each statement should be marked with ;.

Syntax for the functional elements

Initial conditions

X(0) = X0, where X is a dependent variable, and X0 can be a number, a parameter, or an explicit function.

Differential equations

d/dt(X) = RHS, where X is a dependent variable, and RHS is the right hand side of a differential equation.

Bioavailability

f(X) = Fbio, where X is a dependent variable, and Fbio can be a number, a parameter, or an explicit function.

Lag time

Tlag(X) = Tlag, where X is a dependent variable, and Tlag can be a number, a parameter, or an explicit function.

Handling of covariates

If an object exists within the model structure but is not designated in # [INPUT], nor defined as an explicit function, dependent variable or fixed parameter, it will automatically be treated as a covariate. Thus, model parsing at the Model tab will not fail as long as the modeling dataset contains a column whose name matches that of the object.

Example: 2-compartment PK model with first-order absorption
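Based on the syntax elements described above, such a model might be sketched as follows. This is an illustrative sketch, not a verbatim Simurg example: the parameter names (ka, CL, V1, Q, V2), state names (Ad, Ac, Ap) and all initial values are assumptions.

```
# [INPUT]
ka = 1;    # absorption rate constant, 1/h
CL = 5;    # clearance, L/h
V1 = 10;   # central volume, L
Q  = 2;    # inter-compartmental clearance, L/h
V2 = 20;   # peripheral volume, L

# [MODEL]
### Initial conditions
Ad(0) = 0;
Ac(0) = 0;
Ap(0) = 0;

### Explicit functions
Cc = Ac / V1;

### Differential equations
d/dt(Ad) = -ka * Ad;
d/dt(Ac) = ka * Ad - (CL/V1) * Ac - (Q/V1) * Ac + (Q/V2) * Ap;
d/dt(Ap) = (Q/V1) * Ac - (Q/V2) * Ap;
```

With this structure, a dose recorded with CMT = 1 would enter the depot compartment Ad, and the central concentration Cc would typically be mapped to the observed DVID.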

Model

Import of a structural model from a .txt file should be performed by pressing the button after a modeling dataset has been uploaded at the Data tab. A description of the modeling syntax is provided in the Model editor tab.

Once a model file is uploaded, its content will be shown on the main panel and additional fields will pop up to assign variables per DVID. The number of fields corresponds to the number of unique DVIDs in the dataset. The label for each field is formed as
DVID#[DVID number from the dataset] ([respective DVNAME from the dataset]).

A user should assign variables to the DVIDs by typing the variable name into the respective fields.
Then, the model should be initialized by pressing the button.

Initial estimates

Good initial estimates of model parameter values speed up the subsequent estimation. The Simurg platform provides the ability to visually evaluate model predictions with the selected initial values for the fixed effects. To use it, select the "Initial estimates" tab.

Note that the Data and Model tabs must be initialized beforehand.

Click the "Check initial estimates" button to explore the initial values from the model file.

Initial estimates screen 1

As Simurg/RxODE syntax allows parameter values to be set directly in the model file, the "Initial estimates" tab will return the values from the corresponding model file. For MLXTRAN syntax, the initial parameter values are set to 1 by default.

You can modify the values of the fixed-effect parameters on the right panel. To evaluate how well the proposed values correspond to the data, click the "Show plots" button - plots of the model's predictions and measurements for each ID in the dataset will be displayed. If the project has several outputs, you can select the output of interest for visualization. In addition, you can switch the axes to log scale and adjust the limits of each axis.

Initial estimates screen 2

Initial estimates screen 3

If you want to reset the parameter values to the initial ones, click the "Check initial estimates" button.

When you are confident in the initial values, you can proceed to the "Task" tab for statistical component initialization. The last values you entered will be used as the initial values for the subsequent estimation.

Task

The "Task" tab provides tools to initialize and manage the statistical components used during the model calibration process.

Work in this tab begins by clicking the button. This sets the path to a folder where the configuration of statistical components and the results of model fitting will be stored.

You may select either a new (empty) directory or one that contains results from a previous fitting session. If the directory already contains results, you can skip the earlier steps (e.g., Data, Model, or Initial estimates) and move directly to the task section.

After selecting the working directory, four main options become available:

  • loads previously saved fitting results from the selected directory. Once loaded, you can proceed to tabs like Results, GoF plots, or Simulations to evaluate or utilize the fitted model.

  • loads a previously saved configuration of the statistical components. After loading, select the fitting algorithm (e.g., Simurg, Monolix, or nlmixr) and proceed to .

  • deletes all contents from the selected directory, allowing you to start fresh with a new statistical component configuration.

  • cleans the directory if it contains files and opens a list of options for configuring the statistical components. This option requires that the Data, Model (or Model editor), and Initial estimates tabs have already been properly initialized.

Creating statistical model

The process of creating a statistical model in the Task tab is divided into three key components:

1. Residual error model
2. Parameter definition
3. Covariate model

1. Residual error model

In pharmacometric modeling, the residual error model captures the unexplained differences between observed data and model predictions — those not accounted for by the structural model or inter-individual variability.

Simurg offers several residual error model options for each specified DVID, including:

  • Constant error (independent of the predicted value): $$ y = f + \epsilon, \quad \epsilon \sim N(0, a^2)$$

  • Proportional error (increases proportionally with the predicted value): $$ y = f \cdot (1 + \epsilon), \quad \epsilon \sim N(0, b^2)$$

  • Combined1 error (constant + proportional): $$ y = f + \epsilon, \quad \epsilon \sim N(0, a^2 + (b \cdot f)^2)$$

Here, \(f \) is the predicted value, and \( a \) and \( b \) are estimated error parameters.
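As an illustration (a sketch, not Simurg's internal implementation), the standard deviation of the residual error implied by each option can be computed from the prediction f and the error parameters a and b:

```python
import math

def residual_sd(f, a=0.0, b=0.0, model="combined1"):
    """Standard deviation of the residual error for a prediction f,
    given the error parameters a (additive) and b (proportional)."""
    if model == "constant":      # y = f + e,       e ~ N(0, a^2)
        return a
    if model == "proportional":  # y = f * (1 + e), e ~ N(0, b^2) -> sd = b * f
        return b * f
    if model == "combined1":     # y = f + e,       e ~ N(0, a^2 + (b*f)^2)
        return math.sqrt(a ** 2 + (b * f) ** 2)
    raise ValueError("unknown model: " + model)
```

For example, with f = 10, a = 0.5 and b = 0.2, the combined1 model gives sd = sqrt(0.25 + 4). Note that the proportional form assumes a positive prediction f.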

Additionally, you can specify how BLOQ (Below Limit of Quantification) data are handled. Available methods include:

  • M3: BLOQ data points are treated as left-censored values.
  • M4: A hybrid method where:
    • BLOQ values before the first quantifiable observation are treated as censored
    • BLOQ values after are treated as missing (ignored)

These options are only available if your dataset (initialized in the Data tab) contains the required columns:

  • For M3: CENS
  • For M4: CENS and LIMIT

If these columns are not present, the default handling method is "none".

2. Parameter definition

This section allows you to define the characteristics of model parameters during the fitting process. Specifically, you can determine:

  • Whether a parameter is fixed or includes random effects

  • The distribution type used to model the random effects

Note: The distribution settings apply to random effects, not the fixed effect estimates themselves.

Available distributions in Simurg include:

Distribution | Formula
Normal | \( P_i=\theta +\eta_i, \quad \eta \sim N(0,\omega^2)\)
Lognormal | \( P_i=\theta \cdot \exp(\eta_i), \quad \eta \sim N(0,\omega^2)\)
Logit-normal | \( P_i=\frac{1}{1+\exp(-(\theta+\eta_i))}, \quad \eta \sim N(0,\omega^2)\)

where \(\theta\) is the typical value of a parameter, \(\eta_i\) the random effect for individual \(i\), and \(\omega\) is the standard deviation of \(\eta\).
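The three transformations can be sketched in a few lines (illustrative code mirroring the table above; not part of Simurg itself):

```python
import math

def individual_parameter(theta, eta, dist="lognormal"):
    """Individual parameter P_i from the typical value theta and a random
    effect eta ~ N(0, omega^2), for the three supported distributions."""
    if dist == "normal":         # P_i = theta + eta
        return theta + eta
    if dist == "lognormal":      # P_i = theta * exp(eta), keeps P_i > 0
        return theta * math.exp(eta)
    if dist == "logitnormal":    # P_i = 1 / (1 + exp(-(theta + eta))), keeps 0 < P_i < 1
        return 1.0 / (1.0 + math.exp(-(theta + eta)))
    raise ValueError("unknown distribution: " + dist)
```

The choice of distribution constrains the individual values: lognormal keeps parameters positive (typical for clearances and volumes), while logit-normal keeps them between 0 and 1 (e.g., bioavailability).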

In addition, you can specify initial values for random effects and their correlations using the matrix provided on the right-hand side of the interface. This matrix allows for the configuration of:

  • Variances – Initial guesses for \(\omega^2\), representing the variability of each random effect.
  • Correlations – Initial values for the correlations between random effects (typically set to 0 unless prior knowledge suggests otherwise).

Matrix structure:

  • Diagonal elements represent the initial values for the variances of the random effects (i.e., \(\omega^2\)).
  • Off-diagonal elements define the initial correlations between the corresponding random effects.

These initial values can influence the convergence behavior of the fitting algorithm, so it's recommended to use reasonable estimates when available.
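As a sketch of how the matrix entries relate to each other, the full covariance matrix of the random effects can be assembled from the diagonal variances and off-diagonal correlations (illustrative code; the function name is an assumption):

```python
import math

def omega_covariance(variances, correlations):
    """Build the random-effect covariance matrix from initial variances
    (diagonal, omega^2) and pairwise correlations (off-diagonal):
    cov_ij = corr_ij * omega_i * omega_j."""
    n = len(variances)
    sd = [math.sqrt(v) for v in variances]
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cov[i][j] = variances[i] if i == j else correlations[i][j] * sd[i] * sd[j]
    return cov
```

For example, variances of 0.09 and 0.04 with a correlation of 0.5 yield an off-diagonal covariance of 0.5 * 0.3 * 0.2 = 0.03.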

3. Covariate model

This section allows you to introduce covariate effects into the model, enabling more personalized and accurate parameter estimation based on individual-specific characteristics from the dataset.

To add a covariate effect:

1. Select the parameter you want the covariate to influence.

2. Choose the covariate from the list (the name must match a column in the initialized dataset).

3. Specify the covariate type:

  • Categorical: Define the reference category, which serves as the baseline level for comparison.

  • Continuous: Choose both a function to describe the covariate relationship and a central tendency transformation (mean or median) to normalize the covariate.

Functions for Continuous Covariates

Simurg provides several functional forms to model continuous covariate relationships:

  • Linear (lin): $$\theta_i = \theta_{ref} \cdot (1+\beta \cdot (x_i-x_{ref}))$$ A direct linear relationship between the covariate and the parameter.
  • Log-linear (loglin): $$\theta_i = \theta_{ref} \cdot \exp(\beta \cdot (x_i-x_{ref}))$$ A multiplicative effect, useful when the effect increases or decreases exponentially.
  • Power model: $$\theta_i = \theta_{ref} \cdot \left( \frac{x_i}{x_{ref}} \right) ^\beta $$ A flexible model that can capture nonlinear proportional effects, often used in allometric scaling.

Where \(\theta_i\) is the individualized parameter estimate, \(\theta_{ref}\) is the parameter value at the reference covariate value \(x_{ref}\), \(\beta\) is the estimated covariate effect, \(x_{i}\) is the individual's covariate value.

You can choose whether \(x_{ref}\) is based on the mean or median value of the covariate in the dataset.

4. Specify the initial value for the parameter associated with the reference category (for categorical covariates) or the normalized value (for continuous covariates).

5. Click "Set" to apply the covariate effect to the selected parameter.
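The three functional forms for continuous covariates can be illustrated with a short sketch (hypothetical values; the power form with beta = 0.75 mirrors common allometric weight scaling):

```python
import math

def covariate_effect(theta_ref, beta, x, x_ref, form="power"):
    """Individualized parameter theta_i from a continuous covariate x,
    given the reference value theta_ref at covariate value x_ref."""
    if form == "lin":       # theta_ref * (1 + beta * (x - x_ref))
        return theta_ref * (1 + beta * (x - x_ref))
    if form == "loglin":    # theta_ref * exp(beta * (x - x_ref))
        return theta_ref * math.exp(beta * (x - x_ref))
    if form == "power":     # theta_ref * (x / x_ref) ** beta
        return theta_ref * (x / x_ref) ** beta
    raise ValueError("unknown form: " + form)

# Hypothetical allometric scaling of clearance by body weight:
# CL_ref = 5 L/h at x_ref = 70 kg, beta = 0.75, individual weight 35 kg
cl_individual = covariate_effect(5.0, 0.75, 35.0, 70.0, "power")
```

At x = x_ref all three forms return theta_ref, which is why the choice of the mean or median as the reference value determines which "typical individual" the estimate of theta_ref describes.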

Finalizing the setup

Once all configurations are complete, click the button. This action saves the statistical model setup defined in the previous sections to the selected working directory.

After the control object has been successfully created:

Select the fitting algorithm you wish to use (e.g., Simurg, Monolix, or nlmixr).

Click to begin the model fitting process.

When the fitting is complete, you can move on to the Results tab to analyze the output and evaluate the model's performance.

Results

Essential output of a model calibration procedure includes several numerical characteristics and scores, such as:

  • point-estimates of population parameter values;
  • standard deviation (SD) of random effects;
  • eta-shrinkage;
  • standard errors (SE) for all parameters;
  • individual parameter values if random effects are present in the model;
  • correlation between parameters;
  • likelihood-based numerical criteria.

To extract this information from a modeling project, either a calibration procedure should be performed or the results of a previous calibration should be loaded following the instructions in the Task section. Once this is done, the Results section in NLME can be accessed:

and relevant output can be generated by pressing button.

Generated output is spread across four tabs:

  • Summary
  • Individual parameters
  • Correlations
  • Likelihood

After the "View model results" button is pressed, a button will appear below it. Pressing this button saves all figures and tables from all four tabs to the location of the current project within the Simurg environment.

In addition, a button available on the first three tabs allows exporting figures or tables from a tab to the local computer.

Summary

The Summary tab contains essential information, in the form of a summary table, on the model parameters obtained after a calibration procedure:

Parameter names are shown exactly as specified in the structural model.
Covariate coefficients are named using the following principle:
[parameter name]_[covariate name]_[transformation flag]

Residual error model parameters are assigned as follows:
[variable name]_[a - for additive component; b - for proportional component]

SE of the parameters is calculated in three steps.
First, the variance-covariance matrix for transformed, normally distributed parameters is calculated from the Fisher Information Matrix (FIM) as follows: $$ C(\theta)=I(\theta)^{-1} $$ Next, \( C(\theta) \) is forward-transformed to \( C^{tr}(\theta) \) using distribution-dependent formulas to compute the variance:

  • For normally distributed parameter: no transformation applied.
  • For log-normally distributed parameters: $$ SE(\theta_k)=\sqrt{( \exp(\sigma^2)−1) \cdot \exp(2\mu + \sigma^2)} \\ \mu = \ln(\theta_k) \\ \sigma^2 = \operatorname{var} (\ln (\theta_k)) $$
  • For logit-normally distributed parameters: a Monte Carlo sampling approach is used. \(100000\) samples are drawn from the covariance matrix in the Gaussian domain, the samples are transformed from the Gaussian to the non-Gaussian domain, and the empirical variance \( \sigma^2 \) over all transformed samples \( \theta_k \) is calculated.

Finally, SE of the estimated parameter values is calculated from the diagonal elements of the forward-transformed variance-covariance matrix: $$ SE(\theta_k) = \sqrt{C^{tr}_{kk}(\theta_k)} $$

Relative standard error (RSE) is calculated as \( \frac{SE}{Estimate} \cdot 100 \% \).
Cases with RSE \( > 50\% \) are highlighted in red, as RSE \( > \frac{1}{1.96} \cdot 100\% \approx 51\% \) corresponds to a \( 95\% \) confidence interval (estimate \( \pm 1.96 \cdot SE \)) that includes zero, making the respective parameter not statistically different from zero at \( p = 0.05 \).
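The forward transformation for a log-normally distributed parameter and the subsequent RSE calculation can be sketched as follows (illustrative code; function and variable names are assumptions):

```python
import math

def se_lognormal(theta_k, var_log):
    """Forward-transformed SE of a log-normally distributed parameter:
    SE = sqrt((exp(sigma^2) - 1) * exp(2*mu + sigma^2)),
    with mu = ln(theta_k) and sigma^2 = var(ln(theta_k)) taken from the
    covariance matrix in the Gaussian domain."""
    mu = math.log(theta_k)
    return math.sqrt((math.exp(var_log) - 1) * math.exp(2 * mu + var_log))

def rse_percent(se, estimate):
    """Relative standard error in percent; values above ~51% imply the
    95% confidence interval of the estimate includes zero."""
    return se / estimate * 100
```

For instance, an estimate of 2.0 with var(ln(theta)) = 0.04 gives an SE of about 0.41 and an RSE of about 21%, well below the red-flag threshold.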

Random effects column contains SD of the estimated random effects \( (\omega) \).

Eta-shrinkage is calculated based on the following equation: $$ \eta \space shrinkage = 1 - \frac{SD(\eta_i)}{\omega} $$ Eta-shrinkage exceeding \( 30 \% \) is indicative of unreliable individual parameter estimates and warrants the revision of a statistical model [1].
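The shrinkage formula can be computed directly from the individual random effects (an illustrative sketch; whether the sample or population SD of the etas is used is an assumption that may differ from Simurg's implementation):

```python
import statistics

def eta_shrinkage(etas, omega):
    """Eta-shrinkage = 1 - SD(eta_i) / omega, where etas are the individual
    random effects and omega is the estimated population SD of eta.
    The sample SD is used here."""
    return 1 - statistics.stdev(etas) / omega
```

If the individual etas spread much less than omega predicts (e.g., SD of 0.2 against omega = 0.4, giving 50% shrinkage), the individual estimates are pulled toward the population value and should be interpreted with caution.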

Individual parameters

This tab contains a single table with individual parameter values, defined as the mean of the conditional distribution for parameters with random effects and as the typical parameter values for parameters without random effects.

Correlations

The correlation matrix is derived from the variance-covariance matrix as: $$ \operatorname{corr}(\theta_i, \theta_j) = \frac{C^{tr}_{ij}}{SE(\theta_i) \cdot SE(\theta_j)} $$ and is represented visually as a heatmap, where the value and color in each cell represent Pearson's correlation coefficient (blue for negative values, red for positive values).
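This derivation amounts to normalizing each covariance by the corresponding standard errors, which can be sketched as (illustrative code, not Simurg's implementation):

```python
import math

def correlation_matrix(cov):
    """Pearson correlation matrix from a variance-covariance matrix:
    corr(i, j) = C_ij / (SE_i * SE_j), with SE_k = sqrt(C_kk)."""
    n = len(cov)
    se = [math.sqrt(cov[k][k]) for k in range(n)]
    return [[cov[i][j] / (se[i] * se[j]) for j in range(n)] for i in range(n)]
```

By construction the diagonal is 1, and a covariance of 1 between parameters with variances 4 and 9 yields a correlation of 1/6.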

Likelihood

This tab contains likelihood-based numerical scores used to benchmark models:

  • \( -2 \cdot \log(\operatorname{Likelihood}) = n \log(2\pi) + \sum_j \left( \log(\sigma_j^2) + \frac{(Y_j-Y^*_j (t,\Theta))^2}{\sigma_j^2} \right) \)
  • Akaike information criterion: \( AIC = -2LL + 2 \cdot P \)
  • Bayes information criterion: \( BIC = -2LL + P \cdot \log(N) \)
    where \( P \) is the number of estimated parameters within the model and \( N \) is the number of data points. N.B.: the likelihood cannot be computed in closed form if random effects are present in the model.
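For a model without random effects, the three criteria can be computed directly from the formulas above (an illustrative sketch assuming independent normal residuals; with random effects the likelihood must be approximated, as noted):

```python
import math

def minus_two_ll(obs, pred, sigma2):
    """-2*log(Likelihood) for independent normal residuals:
    n*log(2*pi) + sum_j(log(sigma_j^2) + (Y_j - Y*_j)^2 / sigma_j^2)."""
    n = len(obs)
    return n * math.log(2 * math.pi) + sum(
        math.log(s2) + (y - f) ** 2 / s2
        for y, f, s2 in zip(obs, pred, sigma2))

def aic(m2ll, p):
    """Akaike information criterion: AIC = -2LL + 2*P."""
    return m2ll + 2 * p

def bic(m2ll, p, n):
    """Bayes information criterion: BIC = -2LL + P*log(N)."""
    return m2ll + p * math.log(n)
```

Because BIC penalizes each parameter by log(N) rather than 2, it favors more parsimonious models than AIC once the dataset exceeds a handful of observations.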

Model comparison

The "Likelihood" tab allows performing semi-automatic model comparison across multiple projects located within the same folder as the currently active project by pressing the button, selecting the subset of projects to include in the analysis (optional), and pressing the button.

For example, running model comparison given the following folder structure:

  • parent-folder
    • Warfarin_PKPD_1
    • Warfarin_PKPD_2 - current project
    • Warfarin_PKPD_3
    • Warfarin_PKPD_4
    • Warfarin_PKPD_5
    • Warfarin_PKPD_6

where Warfarin_PKPD_1 ... Warfarin_PKPD_6 are successfully converged computational projects, will provide the user with the following table:
Each row of this table provides essential information on each project in the parent folder, including numerical criteria and information on identifiability and shrinkage.

Entering a character string in the field (for example, project1) filters the table to only those projects whose names contain that string.

References

[1] TBD

Goodness-of-fit (GoF plots)

The GoF plots tab provides a suite of graphical tools to assess how well the model fits the observed data. These diagnostic plots help visually evaluate model performance, detect systematic bias, identify outliers, and uncover potential model misspecification.

To use this section, the model must first be fitted or previously generated results must be loaded, following the steps outlined in the Task section. Once this is done, the GoF plots section in NLME becomes accessible:

Getting Started

To begin, click the button. This action loads the model results stored in the Task section and activates the available plot menus. From there, you can create diagnostic plots based on your chosen configuration. Once you’ve configured the desired settings, click the button to generate the plot. The resulting plot can be downloaded by clicking the button for further analysis or reporting.

Available Plot Types

This section offers eight types of diagnostic plots, organized into the following tabs:

1. Time Profiles
2. Observed vs. Predicted
3. Residuals
4. Distribution of Random Effects (RE)
5. Correlation between RE
6. Individual parameters vs. covariates
7. VPC (Visual Predictive Check)
8. Prediction distribution

1. Time Profiles

The Time Profiles tab provides tools for visually evaluating how well the model fits the observed data over time, both at the population level and the individual level, for the selected output type.

The available output types are determined by the DVIDs (Dependent Variable Identifiers) specified in the Model section.

Plot Configuration Options

You can customize the plot using the following options:

  • Fit type to display: Choose whether to show the population predictions, individual predictions, or both on the plot

  • Axis settings:

    • Manually adjust the x- and y-axis limits
    • Enable or disable logarithmic scaling for either axis

2. Observed vs. Predicted

The Observed vs. Predicted tab allows you to assess how well the model predicts the observed data by comparing predicted values against actual observations. This comparison can be made at both the individual and population levels.

The available outputs correspond to the DVIDs selected in the Model section.

Plot Configuration Options

You can customize the plot using the following settings:

  • Prediction Type: Choose to display Individual predictions, Population predictions, or both
  • Log Axes: Enable logarithmic scaling on the x- and/or y-axes for better visualization of wide value ranges
  • Spline Overlay: Optionally add a spline to the plot to highlight trends or deviations from the ideal fit line

3. Residuals

The Residuals tab provides diagnostic plots to evaluate the distribution and behavior of residuals, helping to detect model misspecification, bias, or heteroscedasticity.

The outputs available for plotting correspond to the DVIDs selected in the Model section. You can choose to visualize individual or population residuals.

Plot Types

This tab includes two types of plots:

3.1 Scatter Plot

This plot displays residuals versus time or predicted values to detect patterns or trends that may indicate issues with model fit.

Configuration options:

  • Log scale for time axis – Apply logarithmic transformation to the time axis.
  • Log scale for predicted values axis – Enable log scale for the x-axis when plotting residuals vs. predicted values
  • Spline – Overlay a spline curve to visualize trends or systematic bias
  • Axis limits – Manually define y-axis limits for better control over the plot view

3.2 Histogram

This plot shows the distribution of residuals to assess normality and variability.

Configuration options:

  • Density curve – Overlay a smoothed density curve on the histogram
  • Theoretical distribution – Compare the residuals to a theoretical normal distribution
  • Information – Include a p-value from a statistical test (e.g., Shapiro-Wilk) to assess the normality of residuals

4. Distribution of Random Effects (RE)

The Distribution of Random Effects (RE) tab allows you to explore the variability captured by the model’s random effects and individual parameter estimates. This helps assess the assumption of normality and the behavior of random components in the model.

Begin by selecting the type of output you want to visualize:

4.1 Individual Parameters – Estimated parameter values for each individual.

4.2 Random Effects – Deviations from the population parameters (i.e., the modeled random components).

4.1 Individual Parameters

For Individual Parameters, only histograms are available.

Plot options:

  • Select parameter names – This dropdown automatically lists all parameters associated with random effects. You can select all, or a subset, to include in the plot
  • Density Curve – Overlay a smooth density curve on the histogram
  • Information – Show the p-value from a normality test to assess the distribution

4.2 Random Effects

For Random Effects, you can choose between two plot types:

4.2.1 Histogram Visualizes the distribution of random effects for each selected parameter.

Options include:

  • Select parameter names – A list of available omega terms (random effects) is automatically populated
  • Density Curve – Add a smooth density overlay
  • Theoretical distribution – Compare the empirical distribution with a standard normal distribution
  • Information – Include p-value results of a normality test (e.g., Shapiro-Wilk)

4.2.2 Boxplot Displays the spread and central tendency of selected random effects using boxplots.

5. Correlation Between RE

The Correlation Between RE tab allows you to explore pairwise relationships between individual parameter estimates or random effects, helping to identify potential correlations or dependencies that might inform model refinement or covariate modeling.

Start by selecting the type of correlation plot you want to generate:

5.1 Individual Parameters – Scatter plots showing relationships between estimated parameters for each individual.

5.2 Random Effects – Scatter plots of the omega terms (random deviations from the population parameters).

5.1 Individual Parameters

Configuration options:

  • Select parameter names – A list of model parameters associated with random effects is automatically populated. Select two or more to include in the plot
  • Linear regression – Optionally overlay a regression line to visualize the trend
  • Information – Display the Pearson correlation coefficient (r) to quantify the strength and direction of the relationship

5.2 Random Effects

Configuration options are the same as for Individual Parameters:

  • Select parameter names – The dropdown provides a list of omega terms for parameters with random effects. Choose the ones you'd like to analyze
  • Linear regression – Add a regression line to the scatter plot
  • Information – Show the Pearson r value to assess correlation strength
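The Pearson r reported by the Information option can be reproduced from the raw value pairs. A minimal sketch in Python (illustrative only, not the application's internal code):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Values near +1 or -1 indicate a strong linear relationship between the two sets of individual estimates; values near 0 indicate little linear association.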

6. Individual parameters vs. covariates

The Individual Parameters vs. Covariates tab enables exploration of potential relationships between individual parameter estimates or random effects and covariates in the dataset. This analysis is useful for identifying covariate effects that could be included in future model refinements.

Start by choosing the type of output to visualize:

Individual Parameters – Displays estimated parameter values per individual against selected covariates.

Random Effects – Shows the corresponding omega values plotted against covariates.

6.1 Individual Parameters

Configuration options:

  • Select parameter names – Choose one or more individual parameters associated with random effects from the automatically populated list.
  • Select covariates names – Choose the covariate (column from your dataset) to plot against the selected parameters.
  • Linear regression – Optionally overlay a linear regression line to visualize potential trends.
  • Information – Display the Pearson correlation coefficient (r) to quantify the relationship with continuous covariates or p-value in case of categorical covariates.

6.2 Random Effects

Configuration is identical to the Individual Parameters option, with one difference:

Select Parameter Names – This dropdown lists omega terms corresponding to the random effects.

You can still:

  • Select a covariate,
  • Add a linear regression line,
  • And show the Pearson correlation coefficient or p-value.

7. VPC (Visual Predictive Check)

Graphical comparison of observed data with prediction intervals from simulations.

8. Prediction distribution

The Prediction distribution tab allows you to visualize the distribution of model predictions and assess how well they reflect the observed data across the selected output. This helps evaluate the spread and central tendency of predictions, and can be especially useful when exploring variability or stratification.

Getting Started

To begin, click the button. This step loads the prediction results from the fitted model into the tab.

Configuration Options

  • Select Output – Choose the output variable you wish to analyze. The available options are determined by the DVIDs selected in the Model section

  • Stratification by Dose – If the dataset contains a DOSE column, you can enable this option to generate separate prediction distributions for each dose group

Display Options:

  • Prediction Interval – Select the confidence interval to display around the predictions. Available options include: 50%, 80%, 90%, 95%

  • Legend – Include a legend for clarity when comparing multiple groups or overlays

  • Data – Overlay the observed data on top of the prediction distribution

  • Axis Labels – You can customize the x-axis and y-axis names to better describe your data and outputs

Covariate Search

Under construction

Simulations

Under construction

About MultiReg module

Background

Quantitative pharmacology analyses are represented by a wide range of mathematical methods, closely tied to the source data being analyzed. Among the most common data types we can distinguish time-to-event (TTE) data (e.g., overall survival data), as well as nominal data, which can be either binary (e.g., response to therapy, occurrence of adverse events), multinomial (e.g., tumor response by RECIST) or ordinal (e.g., severity of adverse events), and count data (e.g., frequency of a certain adverse event). Implementation of the associated mathematical methods in a user-friendly GUI is critical for performing efficient and timely model-based analyses.

Objectives

  • Extend Simurg syntax to support various types of regression modeling.
  • Provide functionality for model development, diagnostics, covariate search.
  • Allow joint modeling techniques to be applied.

Sections of the module

Creating a dataset for exposure-response analysis

On this page, you can create your own dataset for ER analysis.

1: Upload PK and Response data

First, navigate to the "Upload PK and Response data" tab.

1.1: Select the Working Directory

Here, you need to select the working directory by pressing the button. The working directory contains your project, which should include:

  • The PK model (ModFile.txt)
  • The dataset used for parameter fitting (DataTrans.csv)
  • The Results folder with individual parameter values (Results/indiv_parameters.csv)

Your dataset should include the following columns:

  • ID - subject ID, numeric
  • AMT - dosage of the drug, numeric

Once you select the working directory, the dataset will appear in the right panel (Figure 1).

Figure 1. Example of a PK dataset

Initialize your PK data by pressing the button.

After pressing the button, if any necessary files or directories are missing, you will see a notification in the lower right corner clarifying which file was not found.


1.2: Select the Exposure Data File

Next, select the file containing exposure data by pressing the button. Once selected, the dataset will appear in the right panel (Figure 2).

Figure 2. Example of a response dataset

You should then specify which columns in your file represent:

  • Name of ID column - subject ID, numeric
  • Name of TOFI column - the period of time from the moment of the first dosage to the first event, numeric
  • Name of endpoint column - any type of data
  • Name of response analysis value column - for binary endpoints: 0 or 1, numeric
  • Name of nominal dosing column - for example: QD, QW, BID, etc.; any type of data
  • Name of nominal frequency column - any type of data.
  • Select covariate columns - choose the covariates you will work with further, any type of data

All of the columns above, except the covariates, will become part of the re-created Response dataset.

We also recommend adding the EFFFL and SAFFL columns to your dataset. These columns should contain 1 for records corresponding to the respective endpoint type (efficacy or safety), and 0 otherwise. While these columns are not required, they allow you to save exposure–response datasets separately for each endpoint type.

After completing these steps, initialize your response data by pressing the button.


2: Exposure-response dataset generation

Now that all required data is loaded, go to the ER dataset generation tab.

2.1: Choose the time intervals for simulation

In this section, you can run simulations over different time intervals to calculate exposure parameters. Select one or more time intervals for your ER dataset using the dropdown list Choose variables:. Multiple simulations can be selected simultaneously.

Time intervals explained:

  1. First cycle - the time interval from the first dosing event to the end of the first cycle, based on the nominal dosing regimen.
  2. Single dose - the time interval from the first dosing event to the end of the first cycle, assuming only a single dose is administered.
  3. Scaled steady-state - the time interval equal to the length of one treatment cycle, starting from the time point at which the PK profile reaches steady-state. The dose used in the simulation is the average dose calculated over the period from the first dose up to the time of first incidence (TOFI).
  4. Steady-state - the time interval equal to the length of one treatment cycle, starting from the time point at which the PK profile reaches steady-state. The simulation uses the nominal dosing regimen.

For all simulations, you need to define Cycle duration, which should be entered in the Enter a Cycle duration field.

For Scaled steady-state and Steady-state simulations, you also need to define "Steady state cycle", which should be entered in the Enter a Steady state cycle field.

You also need to choose a variable of the model from the dropdown list Simulation output, for which you will make the simulations. In the dropdown list you will see all model variables that were taken from the control file.

Once all fields are filled and the simulation types are selected, start the simulations by pressing the button. After completion, the simulation results will be visualized in the right panel.

Figure 3. Example of obtained exposure simulation plots

You can save the generated plots to the Results folder in your working directory by pressing the button. The file will be saved as Results/exposure_simulation.png.


2.2: Select Exposure parameters

Now you can calculate exposure-response metrics based on the data obtained from the simulations. To do this, select the exposure parameters needed for further analysis.

Metrics can be selected from the dropdown list Choose exposure metrics:, with the option to choose multiple metrics at once.

Exposure parameters explained:

  1. Cmax - the maximum concentration of the drug
  2. Cmin - the minimum concentration of the drug
  3. Cavr - the average concentration of the drug
  4. AUC - the area under the drug concentration-time curve
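The four metrics can be derived from a simulated concentration-time profile. A rough Python sketch, assuming the linear trapezoidal rule for AUC (the module may use a different integration scheme):

```python
def exposure_metrics(times, conc):
    """Cmax, Cmin, Cavr and trapezoidal AUC from a concentration-time profile."""
    # Linear trapezoidal rule applied over consecutive time points
    auc = sum((t1 - t0) * (c0 + c1) / 2.0
              for (t0, c0), (t1, c1) in zip(zip(times, conc),
                                            zip(times[1:], conc[1:])))
    duration = times[-1] - times[0]
    return {"Cmax": max(conc), "Cmin": min(conc),
            "Cavr": auc / duration, "AUC": auc}
```

Note that Cavr is the AUC normalized by the length of the interval, so it depends on which time interval (first cycle, steady-state, etc.) was simulated.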

After choosing the exposure metrics, start the estimation by pressing the button.

The final exposure data table will appear in the right panel.

Figure 4. Example of an obtained exposure dataset

2.3: Save results

To save the SIMPC and SIMPP datasets, click the button. The files will be saved in the same folder as your exposure dataset, as simpc.csv and simpp.csv, respectively.

To generate and save the exposure–response dataset, select which endpoints you want to include from the dropdown menu ADER dataset type: all Efficacy or all Safety endpoints. Then click the button. The data will be saved in the same folder as your exposure dataset, as adereff.csv or adersaf.csv, respectively.

If your exposure dataset does not contain the EFFFL and SAFFL flags, the exposure–response dataset will include all endpoints and be saved as adereff.csv in the same folder.

After the exposure–response file is generated, a notification will appear in the bottom-right corner indicating which endpoints were included (Figure 5).

Figure 5. Example of a notification: adereff.csv contains the saved exposure–response data for the "INR", "VT", and "TEAE" endpoints.

Data initialization

On the Data initialization tab, the dataset is uploaded for subsequent Exposure-response analysis (ER analysis).

ER analysis evaluates the relationship between drug exposure (e.g., AUC, Cmax) and clinical response (e.g., efficacy or safety outcomes). It helps determine whether higher or lower drug exposures lead to different probabilities of a desired effect or adverse event.

The exposure-response dataset (ER dataset) structure should correspond to CDISC standards [1].

The dataset must include two types of variables used for analysis: independent variables (predictors) such as exposure metrics and covariates, and dependent variables — response metrics (endpoints).

A single dataset can contain multiple types of responses. The response type is identified in the PARAMCD column, while the values of the dependent variable are stored in the AVAL column. Exposure metrics and covariates are stored in separate columns with appropriate names (e.g. CAVESS, CMINFC, AGE).

Work on the Data initialization tab begins with selecting a dataset for exposure-response analysis. To do this, click button Select a file with ER dataset (csv).

In the opened window, select a csv file from the directory on the server. It can be a file with a dataset generated when working on the Dataset generation tab, or another dataset.

After the file with dataset is loaded, it appears in the preview on the right side of the screen.

Then one should select the names of the four mandatory columns from the dropdown lists:

  • Select ID column – the name of the column with the Subject Identifier (e.g. ID, USUBJID).
  • Select PARAMCD column – the name of the column with the Parameter Code (PARAMCD).
  • Select AVAL column – the name of the column with the Analysis Value (AVAL).
  • Select COHORT column – the name of the column with the Cohort values (e.g. DOSE, TRTP).

In the next block of drop-down lists, one can select the names of the response metric, exposure metrics and covariates that will be included in the analysis:

  • Select continuous response variables - the names of the variables from the PARAMCD column (for further work in the Continuous section).

  • Select binary response variables - the names of the variables from the PARAMCD column (for further work in the Binary section).

  • Select exposure variables - the names of all ER dataset columns.

  • Select continuous covariates - the names of all ER dataset columns.

  • Select categorical covariates - the names of all ER dataset columns.

Set working directory button - select a working directory - a folder on the server in which the results of further analysis will be saved. It can be an existing folder, or you can create a new one. Selecting a directory is mandatory.

After all the required fields are filled in, click Initialize. If the working directory and required fields are selected, the message “Dataset successfully initialized” will appear.

If the working directory or some required fields are not selected, a warning will appear.

After successful initialization of the dataset, one can proceed to analysis in the Binary or Continuous sections.

References

[1] https://www.cdisc.org/standards/foundational/adam

Binary

Exposure–response analysis for binary endpoints (e.g., response vs. no response) aims to evaluate how drug exposure affects the probability of a clinical outcome. This process includes several key steps:

  • Exploratory Data Analysis (EDA): Understanding the distribution of exposure and response across subgroups.

  • Base Model Development: Building a model that describes the probability of response as a function of exposure.

  • Covariate Search: Identifying patient factors that influence response probability.

  • Model Diagnostics: Assessing the fit and predictive performance of the model including visual predictive check and sensitivity analysis (evaluating the robustness of model predictions to changes in covariates).

  • Forward Simulations: Simulating response probabilities under various dosing or covariate scenarios.

This structured approach supports informed decision-making in dose selection and patient subgroup evaluation.

Exploratory data analysis (EDA)

Exploratory data analysis (EDA) is the process of examining and summarizing datasets to understand their main characteristics before applying formal modeling or hypothesis testing. EDA helps identify patterns, trends, outliers, missing values, and potential relationships within the data.

Before you start working on the EDA tab, make sure that exposure and response metrics are selected in the Data initialization tab. If this is not done, a warning will appear: "Please select exposure and/or response variables in Data Initialization section".

If the metrics are selected, the page will look like this:

The EDA section includes several types of exploratory analysis, each implemented on a separate tab:

  • "Exposure by Cohort" - contains Boxplots of exposure metrics stratified by cohort
  • "Exposure by Endpoint" - contains Boxplots of exposure metrics stratified by dichotomous (binary) endpoint
  • "Empirical logistic plots" – contains Empirical logistic plots
  • "E-R quantile plots" - contains Exposure-response quantile plots
  • "Number of occurences" - contains Table of distribution of exposure by response
  • "Table Exposure by Quartile" – contains Table of distribution of exposure by quartile
  • "Table Exposure by Cohort" - contains Table of distribution of exposure by cohort.

At the top of the EDA tab, there is a Run button and fields for selecting exposure and response metrics for exploratory analysis. If the fields are empty, all exposure and response metrics will be included in the analysis.

To include only specific metrics in the analysis, select them in the Select exposure metrics, Select response metrics fields:

Click Run to start the analysis.

The results of the exploratory analysis can be seen on the individual tabs.

After the results are generated, you can adjust the number of metrics for which plots are displayed on the current tab using the Draw plot (Render table) buttons. Choose specific metrics in the dropdown lists Select exposure variables and Select response variables at the top of the tab, click Draw plot (Render table), and the plots (tables) will change only on the current tab.

1. Exposure by Cohort

Boxplots of exposure metrics stratified by cohort visually compare the distribution of drug exposure (e.g., Cmax, AUC, Css) across different cohorts in a clinical study. These boxplots provide insights into the spread, central tendency, and variability of exposure within each cohort. They allow you to compare exposure levels across cohorts (e.g., different treatment groups, age categories, renal function groups), assess variability in drug exposure, identify potential outliers that might need further investigation, and check for dose proportionality or differences in drug metabolism between groups.

To compare the means of exposure metrics across different cohorts, the ANOVA method is used. It helps determine whether the differences in exposure distributions across dose levels are statistically significant.
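For reference, the one-way ANOVA F statistic behind these p-values can be computed from the between-group and within-group sums of squares. An illustrative Python sketch (not the application's implementation):

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of exposure values."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

A large F (relative to the F distribution with those degrees of freedom) yields a small p-value, i.e., evidence that mean exposure differs between cohorts.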

Order of operations on the tab

Each page displays up to six graphs. If there are more graphs, they are spread across multiple pages. Use the radio buttons to switch between pages. The response and exposure metrics corresponding to each boxplot are indicated in the graph header. The panel of plots appears as follows:

By default, boxplots are colored by cohort, individual points are overlaid, and p-values from the ANOVA method are displayed on the plots. You can customize these visualization parameters using the checkboxes in the left panel.

Click Draw plot to redraw the plot after changing the visualization settings.

Saving Results

Save current .png – saves the panel of plots from the current page to the "EDA" folder in the working directory as a PNG file.

Save all .png – saves all generated panels of plots to the "EDA" folder in the working directory as multiple PNG files.

2. Exposure by Endpoint

Boxplots of exposure metrics stratified by a dichotomous (binary) endpoint visually compare the distribution of drug exposure between two outcome groups, such as, responder vs. non-responder (e.g., efficacy endpoint) or adverse event present vs. absent (e.g., safety endpoint).

To compare the means of exposure metrics corresponding to different types of endpoints, the T-test is used. It helps determine whether the differences in exposure distributions across binary endpoints are statistically significant.
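The test statistic has the usual two-sample form. A minimal sketch, assuming Welch's unequal-variance version (the application may use the pooled-variance variant):

```python
import math

def welch_t(x, y):
    """Welch's t statistic comparing exposure between two endpoint groups."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # Unbiased sample variances of the two groups
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))
```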

Order of operations on the tab

Each page displays up to six graphs. If there are more graphs, they are spread across multiple pages. Use the radio buttons to switch between pages. The exposure metric corresponding to each boxplot is indicated in the graph header. The panel of plots appears as follows:

By default, boxplots are colored by cohort, individual points are overlaid, and p-values from the T-test are displayed on the plots. The display of individual data points and p-values can be configured from the side panel.

Click Draw plot to redraw the plot after changing the visualization settings.

Saving Results

Save current .png – saves the panel of plots from the current page to the "EDA" folder in the working directory as a PNG file.

Save all .png – saves all generated panels of plots to the "EDA" folder in the working directory as multiple PNG files.

3. Empirical logistic plots

Empirical logistic plots are graphical tools used in binary logistic regression to visualize the relationship between a continuous predictor and the probability of an outcome event. They are particularly useful for assessing the functional form of the predictor before fitting a formal logistic regression model, and help determine whether the relationship between the predictor and the outcome is linear on the logit scale (which is an assumption of logistic regression).
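A common way to construct such plots is to bin the predictor and compute the empirical logit per bin. A sketch using the standard 0.5 continuity correction (an assumption about the exact formula used here):

```python
import math

def empirical_logit(responders, n):
    """Empirical logit for a bin with `responders` events out of `n` subjects.

    The 0.5 continuity correction keeps the logit finite when a bin
    contains all responders or none.
    """
    return math.log((responders + 0.5) / (n - responders + 0.5))
```

Plotting the empirical logit of each exposure bin against the bin midpoint should give an approximately straight line if the linear-logit assumption holds.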

Each page displays up to six graphs. If there are more graphs, they are spread across multiple pages. Use the radio buttons to switch between pages. The response and exposure metrics corresponding to each plot are indicated in the graph header. The panel of plots appears as follows:

Saving Results

Save current .png – saves the panel of plots from the current page to the "EDA" folder in the working directory as a PNG file.

Save all .png – saves all generated panels of plots to the "EDA" folder in the working directory as multiple PNG files.

4. E-R quantile plots

Exposure-response quantile plots are graphical tools used to explore the relationship between a continuous exposure variable and a response variable. These plots help assess trends in exposure-response relationships without assuming a specific parametric model. The continuous exposure metric is divided into quantiles. For each quantile, the number of responders is calculated. The data is presented as bar plots, indicating the percentage and proportion of responders in the quartile. The x-axis shows the boundaries of the Quartile Groups for the given metric.
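The binning step described above can be sketched in a few lines of Python (illustrative only; assumes quartiles, i.e., four equal-sized groups):

```python
def responder_rate_by_quartile(exposure, response):
    """Proportion of responders (response == 1) within each exposure quartile."""
    # Sort subject indices by exposure, then split into four equal-sized groups
    order = sorted(range(len(exposure)), key=lambda i: exposure[i])
    n = len(order)
    quartiles = [order[i * n // 4:(i + 1) * n // 4] for i in range(4)]
    return [sum(response[i] for i in q) / len(q) for q in quartiles]
```

Each returned value corresponds to one bar of the plot; a monotone trend across the quartiles suggests an exposure-response relationship.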

Each page displays up to six graphs. If there are more graphs, they are spread across multiple pages. Use the radio buttons to switch between pages. The response and exposure metrics corresponding to each plot are indicated in the graph header. The panel of plots appears as follows:

Saving Results

Save current .png – saves the panel of plots from the current page to the "EDA" folder in the working directory as a PNG file.

Save all .png – saves all generated panels of plots to the "EDA" folder in the working directory as multiple PNG files.

5. Number of occurrences

This analysis examines whether drug exposure is associated with treatment outcomes. The table indicates the percentage of responders and non-responders for each endpoint.

Example of a table:

Saving Results

Save .csv – saves table as a CSV file.

6. Table Exposure by Quartile

A table of distribution of exposure by quartile summarizes how a continuous exposure metric is distributed across quartiles of the population. This table contains information about Quartile Groups (Q1–Q4). The dataset is divided into four equal-sized groups based on exposure levels.

Each table contains data for a single endpoint and all selected exposure metrics. Tables for different endpoints are displayed on separate pages.

Example of a table:

You can choose only some metrics in dropdown lists Select exposure variables and Select response variables, click Render Table and the output will change only on the current tab.

Saving Results

Save current .docx – saves current table as a DOCX file.

Save all .docx – saves all generated tables into a single DOCX file.

7. Table Exposure by Cohort

A table of distribution of exposure by cohort summarizes the distribution of a drug exposure metric across different cohorts in a clinical study. Cohorts are predefined groups of subjects, often based on characteristics such as treatment regimen, age group, disease severity, or other stratification criteria. Main purposes of this table are comparison of exposure levels between different study groups and assessing variability in drug exposure across patient populations.

Key Components:

  1. Cohort Groups: Subjects are grouped by predefined study cohorts (e.g., treatment groups, age categories).
  2. Sample Size (N): The number of subjects in each cohort.
  3. Exposure Range: The minimum and maximum exposure values in each cohort.
  4. Median and Mean Exposure: Measures of central tendency for exposure in each cohort.
  5. Standard Deviation (SD).

Each table contains data for a single endpoint and all selected exposure metrics. Tables for different endpoints are displayed on separate pages.

Example of a table:

Saving Results

Save current .docx – saves current table as a DOCX file.

Save all .docx – saves all generated tables into a single DOCX file.

Base Model

Logistic Model

On this tab you can build a logistic regression model to explore the relationship between exposure metrics and the probability of a response event — a key component in Exposure–Response (ER) analysis.

The logistic model is calculated using the following equation [1]:

$$ \log\left(\frac{p}{1 - p}\right) = aX + b $$

Where:

  • a – the slope (effect size of exposure)
  • b – the intercept of the model
  • X – the value of the exposure metric
  • p – the probability of the response event occurring

To directly calculate the probability p(x) based on the exposure level, use [1]:

$$ p(x) = \frac{1}{1 + e^{-(aX + b)}} = \frac{e^{aX + b}}{1 + e^{aX + b}} $$

This function returns values between 0 and 1, representing the likelihood of the event at a given exposure level.
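In Python, the inverse-logit above is a direct transcription of the equation (a sketch, not the application's internal code):

```python
import math

def response_probability(x, a, b):
    """Inverse-logit: p(x) = 1 / (1 + exp(-(a*x + b)))."""
    return 1.0 / (1.0 + math.exp(-(a * x + b)))
```

For any coefficients, the result is bounded between 0 and 1, and a positive coefficient on X means the probability of the event increases with exposure.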


Running the Model

To begin the modeling process, simply click the Run button. This will automatically initiate optimization of logistic models for all combinations of exposure metrics and response variables that you have chosen on the Data Initialization tab.

Once the computation is finished, you'll be presented with a detailed summary of the results in the table on the right side of the tab.


Output Table

After optimization is complete, a summary table will appear on the right panel of the screen.

Figure 1. Example of the final dataset after model optimization

The table includes the following columns:

  • Endpoint – name of the response variable
  • Exposure – exposure metric used in the model
  • AIC – Akaike Information Criterion (lower is better)
  • -2LogLikelihood – negative log-likelihood value
  • Intercept – estimated intercept
  • Intercept RSE (%) – relative standard error of the intercept
  • Intercept p-value – significance level of the intercept
  • Slope – estimated slope
  • Slope RSE (%) – relative standard error of the slope
  • Slope p-value – significance level of the slope
  • Intercept identifiability – whether the intercept can be reliably estimated
  • Slope identifiability – whether the slope can be reliably estimated

Saving Results

You have flexible options for exporting model results:

  • Save table .csv – download the full summary table in CSV format
  • Save list of all models .Rdata – save all model objects for further analysis in R
  • Save list of best model .Rdata – save only the model with the lowest AIC (i.e., best fit)

Logistic Plots

The Logistic Plots tab provides interactive visualizations to help interpret model predictions.

Figure 2. Example of logistic model predictions

Plot Content

Use the right-hand panel to select the specific exposure metric and response variable you want to visualize.

The plot includes the following components:

  • Y-axis: probability of the response event
  • X-axis: exposure metric value
  • Black curve: model-predicted probability across exposure values
  • Gray area: 95% confidence interval around the prediction
  • Red points: observed individual data points
  • Black dots with whiskers: observed proportions of events in exposure bins (with 95% CI)
  • Blue text: numerical proportions shown directly on the graph

At the bottom, you’ll find boxplots showing how exposure values are distributed across different treatment groups.


Plot Settings

In the left-hand panel, you can fully customize your plots:

  • Set axis titles and numeric limits
  • Toggle logarithmic scale on the axes
  • Display the model's AIC value on the plot
  • Adjust the colors and sizes of curves and points for better visibility

Once your settings are configured, click the Update Plot button to apply changes.


Saving Plots

Export your plots in PNG format with ease:

  • Save current .png – download the currently displayed plot
  • Save all .png – download plots for all exposure–response combinations in batch mode

References

[1] McCullagh, P. (1989). Generalized Linear Models (2nd ed.). Routledge. https://doi.org/10.1201/9780203753736

Covariate search

At this stage of model development, the covariate structure of the model is reconstructed in an automated way. This is the last step of the binary exposure-response model development.

There are two panels on this page:

  • General panel contains inputs for the necessary information for the covariate search
  • Options panel contains inputs with options of the covariate search algorithm

General panel

On the General panel, the path to the working directory should be provided to the interface. This is done via the Source input, which has two options. The first one is . If this option is chosen, the working directory will be the one chosen on the Data Initialization panel. Another option is . After choosing this option, the user can specify any project folder by pressing .

For the proper work of the algorithm, the chosen folder should contain the file LogitModelsList.RData with the list of base models for each response variable. The path to the chosen directory is printed in the interface. Also, response variables, exposure metrics and covariates should be specified on the Data Initialization panel.

After inputting all the necessary information, the user can press the button to start the covariate search algorithm. After the search is finished, the table with the best models for each of the provided base models will be printed in the interface. The table will contain the following information:

  1. Final Model Structure:

    • Response The endpoint variable described by the model.

    • Exposure The exposure metric that best characterizes the response variable.

    • Covariates Statistically significant covariates included in the model.

  2. Information Criteria Values:

    • LL The log-likelihood of the fitted model.

    • AIC Akaike Information Criterion values.

  3. Change in Information Criteria:

    The difference in LL and AIC values compared to the corresponding base model.

The user can save this table to the working directory by pressing the button. The user can also save the list of final models to the working directory by pressing the button.

General Tab Screen

Options panel

The covariate search is performed using a stepwise procedure consisting of two parts: forward selection and backward elimination. On the Options tab, the parameters of the algorithm can be adjusted. The user can change the metric used for model comparison via the Covariate evaluation method input. There are two options: model comparison with the log rank test ( option) and the Akaike information criterion ( option). In accordance with the chosen evaluation metric, the thresholds for forward selection and backward elimination can be changed.
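For the AIC option, a single forward-selection step reduces to comparing information criteria. A schematic sketch (the exact threshold semantics are an assumption; check the Options panel defaults):

```python
def aic(log_lik, n_params):
    """Akaike Information Criterion: lower values indicate a better fit."""
    return 2 * n_params - 2 * log_lik

def accept_forward(base_ll, base_k, cand_ll, cand_k, threshold=0.0):
    """Forward-selection step: accept the candidate covariate if adding it
    lowers the AIC by more than the threshold (illustrative sketch)."""
    return aic(base_ll, base_k) - aic(cand_ll, cand_k) > threshold
```

Backward elimination works in the opposite direction: a covariate is dropped if removing it does not worsen the criterion by more than the elimination threshold.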

Model diagnostics

On this page you can perform model diagnostics with Visual Predictive Check plot (VPC Plot), Covariate sensitivity plot and Table of model odds ratios.

VPC plot is used to evaluate the fit and predictive performance of a logistic regression model relating drug exposure to the probability of response. It visualizes the model-predicted curve alongside empirical summaries of observed responses.

The Covariate sensitivity plot is used to explore how covariates (both continuous and categorical) impact the odds of response across the range of drug exposure. Main purposes of this plot are to assess the sensitivity of predicted odds to changes in key covariates across the exposure range and to visualize whether covariate effects are constant, increasing, or decreasing with exposure.

Table of odds ratios presents the results of a logistic regression analysis, showing the estimated effects of exposure and covariates on the outcome, including regression coefficients, p-values, and odds ratios with confidence intervals for both unit-based and user-defined changes.
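The odds ratios in the table follow from exponentiating the logistic regression coefficients. A minimal sketch for a unit change in the predictor (the user-defined-change columns would scale the coefficient accordingly; the 1.96 quantile assumes a 95% Wald interval):

```python
import math

def odds_ratio(beta, se, z=1.96):
    """Odds ratio with a Wald confidence interval for a logistic coefficient.

    Returns (OR, lower bound, upper bound); the CI is computed on the
    log-odds scale and then exponentiated.
    """
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))
```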

1 Order of operation

1.1. Model selection

In the Model diagnostics tab, the models generated in the Covariate search section are used. Make sure that the file with models has been saved before working on this tab.

Click Select a file with model .RData, navigate to the working directory, and select the file FinalModels.RData.

After selecting the model file, click Confirm model. Then, from the drop-down list Select response metric, you can select a model by response. Each response corresponds to one model.

Next, click Go to diagnostics.

1.2. Table of continuous covariates

Before running diagnostics, one can fill in the tables with additional information about covariates. Using these tables, one can add user-friendly names to plot labels and rescale the model parameters. To edit a cell, double-click it with the left mouse button.

Note that transformed covariates can be used in the model. Two types of transformed covariates are available:

  • Log-transformed

  • Median-centered

A median-centered covariate is a continuous covariate that has been transformed by subtracting its median value from each individual value. This results in a covariate whose median is zero, while the distribution and range of values remain the same (only shifted).
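The two transformations can be illustrated with a short Python sketch; the body-weight values below are hypothetical, and the variable names mirror the BWT/MEDBWT example used later in this section:

```python
import math
import statistics

def median_center(values):
    """MED transformation: subtract the covariate's median from each value."""
    med = statistics.median(values)
    return [v - med for v in values]

def log_transform(values):
    """LOG transformation: natural log of a strictly positive covariate."""
    return [math.log(v) for v in values]

# Hypothetical body weights (kg): BWT -> MEDBWT
bwt = [55.0, 70.0, 85.0]
medbwt = median_center(bwt)   # [-15.0, 0.0, 15.0]
```

Note that the median of the transformed covariate is exactly zero, while the spread of the values is unchanged.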

The Table of continuous covariates contains the following columns:

  • COV (“Covariate”) – automatically filled in from the model file. The first row corresponds to the exposure metric. The following rows contain the names of model continuous covariates.

  • BTR (“Back Transformed”) – contains the name of the corresponding untransformed covariate from the dataset if the covariate listed in the “COV” column is transformed. Only filled in for transformed covariates. Example: If the COV column contains “LOGCAVG” (“Log-transformed C average”), which is obtained by log-transforming the values in the “CAVG” column, then “BTR” should be set to “CAVG”. If the COV column contains “MEDBWT” (“Median-centered Body Weight”), which is the “BWT” covariate transformed by subtracting its median value from each observation, then “BTR” should be set to “BWT”.

  • TRTYPE (“Transformation Type”) – two options are available: “LOG” – for log-transformed covariates and “MED” – for median-centered covariates. Only filled in for transformed covariates.

  • STEP – fill in to change the scale of the odds ratio. The odds ratio will be calculated per STEP units of the continuous covariate. By default, the odds ratio is calculated per one unit of the continuous covariate.

  • NICENAME – add a user-friendly name for the covariate that will appear in plot labels and tables.

1.3. Table of categorical covariates

The Table of categorical covariates contains the following columns:

  • COV (“Covariate”) – contains the names of model categorical covariates. Automatically filled in from the model file.

  • VAL (“Value”) - contains numeric codes of categories from the dataset. Filled in automatically.

  • NICENAME - add a user-friendly name for the category that will appear in plot labels and table.

  • REFFL ("Reference Flag") - value 1 indicates the reference category, while 0 corresponds to the other categories. Filled in automatically.

1.4. Running diagnostics and saving results

Visualization parameters can be modified from the side panel.

Click Run diagnostics button to start the analysis.

Click Save all to save plots and table. The results will be saved to the “Model diagnostics” folder in the working directory.

2. VPC plot

Plot Description

  • X-axis: Exposure
  • Y-axis: Predicted probability of response (in percent)
  • Line: Median predicted probability curve from the model
  • Shaded Area: Confidence Interval
  • Points: Median observed response probability within each quantile of exposure

Visualization Options

On the sidebar panel you can change the following parameters:

Number of replicas - number of simulated datasets generated using the model to estimate prediction intervals and assess the model's predictive performance.

Number of tiles - number of quantile-based exposure bins.

log x - add log-transformation of x-scale.
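A minimal sketch of the VPC logic described above, assuming a hypothetical fitted logistic model and a pure-Python simulation (the Number of replicas and Number of tiles inputs correspond to `n_replicas` and `n_tiles`; the model coefficients are illustrative only):

```python
import math
import random
import statistics

def predicted_prob(exposure, b0=-2.0, b1=0.05):
    """Hypothetical fitted model: logit(p) = b0 + b1 * exposure."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * exposure)))

def vpc_summary(exposures, responses, n_tiles=4, n_replicas=200, seed=1):
    """Observed response rate per exposure bin plus a simulated 90% interval."""
    rng = random.Random(seed)
    order = sorted(range(len(exposures)), key=lambda i: exposures[i])
    size = len(order) // n_tiles
    summary = []
    for t in range(n_tiles):
        idx = order[t * size:(t + 1) * size] if t < n_tiles - 1 else order[t * size:]
        observed = sum(responses[i] for i in idx) / len(idx)
        # Simulate replica datasets from the model within this exposure bin.
        sims = sorted(
            sum(1 for i in idx if rng.random() < predicted_prob(exposures[i])) / len(idx)
            for _ in range(n_replicas))
        summary.append({"observed": observed,
                        "sim_median": statistics.median(sims),
                        "sim_lo": sims[int(0.05 * n_replicas)],
                        "sim_hi": sims[int(0.95 * n_replicas)]})
    return summary

# Illustrative synthetic data
rng = random.Random(0)
exp_vals = [rng.uniform(0, 100) for _ in range(80)]
resp = [1 if rng.random() < predicted_prob(x) else 0 for x in exp_vals]
bins = vpc_summary(exp_vals, resp)
```

Each bin yields one observed point plotted against the simulated median and interval, which is what the VPC plot displays.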

3. Covariate Sensitivity Plot

Plot Description

  • X-axis: Odds ratios — representing the effect of each covariate on the probability of response
  • Y-axis: Continuous and categorical covariates
  • Points: Estimated odds ratios for each covariate at different exposure levels
  • Error Bars: Confidence intervals for the odds ratios

Visualization Options

On the sidebar panel you can change the following parameters:

Select CI of parameters - select confidence interval for odds ratio values (e.g. value 0.95 means 95% confidence interval)

The following fields refer to Predictor distribution:

Central tendency – used for transformed continuous covariates. Specify median for covariates centered on the median, and mean for those centered on the mean.

Sensitivity analysis for continuous covariates is performed using the extreme quantiles of the covariate (e.g., 0.05 and 0.95 by default). On the plot, two points are shown for each continuous covariate, corresponding to the left quantile and right quantile.

Left quantile - left quantile value of continuous predictor.

Right quantile - right quantile value of continuous predictor.

log y - apply a log transformation to the Y-axis scale.

add reference group - include the reference category on the plot.
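The quantile-based sensitivity analysis described above can be sketched as follows; the covariate values and the coefficient `beta` are hypothetical, and the reference point is the median of the covariate distribution (the central tendency for a median-centered covariate):

```python
import math

def quantile(values, q):
    """Linear-interpolation quantile of a numeric list (0 <= q <= 1)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def sensitivity_odds_ratios(values, beta, q_left=0.05, q_right=0.95):
    """Odds ratios at the left/right quantiles relative to the median value."""
    ref = quantile(values, 0.5)
    return (math.exp(beta * (quantile(values, q_left) - ref)),
            math.exp(beta * (quantile(values, q_right) - ref)))

# Hypothetical covariate distribution and coefficient
ages = [float(a) for a in range(20, 81)]   # ages 20..80
or_left, or_right = sensitivity_odds_ratios(ages, beta=0.02)
```

The two values correspond to the two points drawn per continuous covariate: an odds ratio below 1 at the left quantile and above 1 at the right quantile when the coefficient is positive.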

4. Table of odds ratios

The table presents numerical values of odds ratios and contains the following columns:

Term - names of the model terms (predictors). This includes the intercept, continuous covariates (e.g., age, weight), and categorical variables (e.g. sex, race) with their reference categories.

Estimate (CI) - estimated regression coefficient and its confidence interval (CI) from the logistic regression model. This value represents the change in the log-odds of the response per unit increase in the predictor.

p-value - statistical significance of the predictor. A small p-value (typically < 0.05) suggests that the predictor has a statistically significant effect on the response.

Odds ratio (CI) (per unit of measurement) - odds ratio and its CI for a one-unit increase in the predictor (e.g., 1 year for age, 1 kg for weight). For categorical variables, it represents the odds ratio relative to the reference category.

Odds ratio (CI) (per user-defined change) - odds ratio and its CI based on a user-specified change in the predictor value. For example, this might be a 10-year change in age or a defined change in drug concentration. This column allows users to interpret effect sizes more meaningfully in the context of practical changes.
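The per-unit and user-defined odds ratios in this table can be reproduced from a fitted coefficient and its standard error with a Wald-type calculation; the `beta` and `se` values below are hypothetical, and `step` plays the role of the user-defined change (the STEP column):

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal quantile

def odds_ratio_row(beta, se, step=1.0):
    """Wald odds ratio and 95% CI for a `step`-unit change in a predictor."""
    lo, hi = beta - Z95 * se, beta + Z95 * se
    return {"or": math.exp(beta * step),
            "ci": (math.exp(lo * step), math.exp(hi * step))}

# Hypothetical age effect: beta = 0.03 per year, SE = 0.01
per_unit = odds_ratio_row(0.03, 0.01)             # per 1-year change
per_decade = odds_ratio_row(0.03, 0.01, step=10)  # user-defined 10-year change
```

Because the odds ratio is exponential in the coefficient, a 10-unit change raises the per-unit odds ratio to the 10th power, which is why the user-defined column is often easier to interpret.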

Forward Simulations

In this tab, users can perform and visualize simulations using one of the fitted models.

There are two panels on this tab:

  • Simulation options allows users to configure simulation settings
  • Visualization options enables customization of the simulation visualization

Simulation options

Forward simulations represent the final step in the exposure-response (ER) analysis workflow. By this stage, it is assumed that the users have already completed all prior steps and have obtained a list of final models—one fitted model per clinical endpoint. This list is typically created during the covariate search step and saved as FinalModelsList.RData. Alternatively, the users may generate the list manually. In that case, the file must be named FinalModelsList.RData and formatted as a list of generalized linear models.

Selecting a Model for Simulations

To perform simulation, the users must first select the model they wish to use. This is done in two main steps within the interface:

  1. Specify the Model Directory

    The users must indicate the directory containing the model list. This is configured using the Source input, which provides two options:

    • Uses the working directory selected in the Data Initialization panel.

    • Allows users to specify a different folder by clicking the corresponding button

    The path to the selected directory is displayed below the Source input, allowing the users to confirm the correct directory before proceeding.

  2. Load and Select the Model

Once the working directory is set, the users should click the button to load the models from FinalModelsList.RData into the interface.

After loading, the users select a specific model from the list by choosing its serial number via the input. Once selected, the users can adjust covariate values within the interface and proceed to run simulations using the chosen model.

Adjusting Covariate Values for Simulations

This section explains how to change covariate values in the interface for simulations.

Continuous covariates

There are several ways to specify continuous covariate values to simulate with.

  1. Define a range using Minimum and Maximum inputs. Then, either:

    • Specify the number of points within this range using the Length input.
    • Define the interval between points using the By input.

    If a Length value is specified in the interface, the By value will be ignored. To use the By value, clear the Length input.

  2. Enter a list of comma-separated covariate values into the Random Sequence input. These values will be used for simulations. Note that this option works only when the Length and By inputs are left empty.
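The Minimum/Maximum/Length/By logic above can be sketched as follows; `covariate_grid` is a hypothetical helper, not part of the Simurg interface:

```python
def covariate_grid(minimum, maximum, length=None, by=None):
    """Grid of continuous covariate values; Length takes precedence over By."""
    if length is not None:
        if length == 1:
            return [float(minimum)]
        step = (maximum - minimum) / (length - 1)
        return [minimum + i * step for i in range(length)]
    if by is not None:
        out, v = [], float(minimum)
        while v <= maximum + 1e-9:   # small tolerance for float accumulation
            out.append(v)
            v += by
        return out
    raise ValueError("specify either Length or By")
```

For example, `covariate_grid(0, 10, length=5)` yields five evenly spaced points, while `covariate_grid(0, 10, by=2.5)` steps from the minimum by a fixed interval; specifying both falls back to Length, mirroring the GUI behavior.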

Categorical Covariates

Users can select specific values for categorical covariates to include in the simulation.

Selecting Output Type

Users can specify the Output Type for calculations, choosing between:

  • Response - the simulation results will be on the response scale. For example, for a binomial model the probability of response will be returned with this option
  • Link - the simulation results will be on the scale of the linear predictors, before the inverse link function is applied. Currently this option is not available in the interface
  • Terms - calculate a matrix giving the fitted value of each term in the model formula on the linear predictor scale. Currently this option is not available in the interface
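The difference between the Response and Link output types can be illustrated with a small sketch, assuming a binomial model with a logit link; `predict_glm` is a hypothetical helper, not a Simurg function:

```python
import math

def predict_glm(eta, output_type="Response"):
    """Map the linear predictor `eta` to the requested output scale
    (assumption: binomial model with a logit link)."""
    if output_type == "Link":
        return eta                           # linear-predictor scale
    if output_type == "Response":
        return 1.0 / (1.0 + math.exp(-eta))  # inverse logit: a probability
    raise ValueError(f"unsupported output type: {output_type}")
```

A linear predictor of 0 on the Link scale corresponds to a probability of 0.5 on the Response scale.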

Confidence Intervals for Predictions

Users can toggle the inclusion of confidence intervals in model predictions using the checkbox.

Running simulations

After configuring all simulation options, users must click the button to start the process. The resulting plot can then be customized using the Visualization Options panel.

Visualization Options

In this panel, the user can customize how simulation results are displayed.

Main visualization options

The Select plot type input allows users to choose the type of plot. The available plot types are:

  • Scatter displays the simulated response versus the exposure metric as individual points. If standard errors (SE) were calculated on the Simulations panel, they will be shown as a ribbon
  • Pointrange is similar to the Scatter plot, but the SE is displayed as an interval around each point
  • Boxplot represents the simulated response using boxplots

Additional settings include:

  • Select X-axis variable defines the variable to be used on the X-axis
  • Select color variable specifies the variable used to color different data groups
  • Select shape variable assigns different point shapes based on groupings defined by this variable
  • Select line type variable determines the line styles used for different groups
  • Select group variable defines the grouping variable
  • Select facet variable sets the variable used to split the data into facets (subplots)

Aggregation Options

To perform aggregation of simulation results, the users should specify the central tendency measure via the Select central tendency measure input. This input has three options:

  • None - no aggregation of the data will be performed
  • Mean - aggregation will be done by averaging the data
  • Median - the median will be used for aggregation

Similarly, the variability measure can be selected via the Select variance measure input, with options:

  • SD standard deviation
  • Range min-max range
  • IQR interquartile range
  • 80% CI 80% confidence interval
  • 90% CI 90% confidence interval
  • 95% CI 95% confidence interval
  • 99% CI 99% confidence interval
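A sketch of how the central-tendency and variance options above combine, using hypothetical helper functions (percentile computed by linear interpolation, as in common statistics packages; the "CI" options are interpreted as central percentile intervals of the simulated values):

```python
import statistics

def percentile(values, p):
    """Linear-interpolation percentile (0 <= p <= 1)."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def aggregate(values, central="Median", variance="90% CI"):
    """Central tendency plus a spread interval, mirroring the panel options."""
    if central == "None":
        return values                       # no aggregation
    center = statistics.mean(values) if central == "Mean" else statistics.median(values)
    if variance == "SD":
        sd = statistics.stdev(values)
        lo, hi = center - sd, center + sd
    elif variance == "Range":
        lo, hi = min(values), max(values)
    elif variance == "IQR":
        lo, hi = percentile(values, 0.25), percentile(values, 0.75)
    else:  # "80% CI", "90% CI", ...
        level = float(variance.split("%")[0]) / 100.0
        tail = (1.0 - level) / 2.0
        lo, hi = percentile(values, tail), percentile(values, 1.0 - tail)
    return center, lo, hi
```

For example, aggregating ten simulated values with Median and Range returns the median together with the min-max interval drawn around it.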

Plot Properties

If a Facet variable is defined, you can set the facet scaling using the Select facet scales input. Options include:

  • Free: both axes can vary across facets
  • Free_x: X-axis varies across facets; Y-axis remains fixed
  • Free_y: Y-axis varies across facets; X-axis remains fixed
  • Fixed: Both axes remain the same across all facets

Other options:

  • Round X-values: specify the number of decimal places for X-axis ticks' labels
  • X as factor: check this box to treat X-axis values as categorical factors
  • Add points: when enabled, observed data points will be added to boxplots

Cosmetics Settings

The title of the X axis can be customized via the X axis label input.

Saving and Rendering Results

  • button: click to render the plot using the current settings

  • button: saves the generated plot to the project directory. The name used to save the plot can be specified by Plot name input

  • button: saves the simulation results as a data table. The name used to save the simulation results can be specified by Table name input

About Reporting module

Background

Results of mathematical modeling analyses ought to be communicated to various audiences: fellow modelers, diverse teams of experts (clinical pharmacologists, biologists, etc.), regulatory authorities, and audiences in industry or academia. Communication might happen through various means and is typically associated with the compilation of HTML, PDF, MS PowerPoint or MS Word files containing said results. Arrangement of these files takes a significant portion of project time and human resources, with the primary challenge residing in the continuous adjustment of the content (often involving a large volume of tables, images, and cross-references) as the project progresses. As such, a tool that enhances the automation and reproducibility of reports is expected to shorten timelines and improve the quality of modeling analyses.

Objectives

  • Provide a GUI for automatic report generation.
  • Provide a library of MS Word and MS PowerPoint report templates.
  • Parse the MS Word XML structure.
  • Generate quick reports from active Simurg sessions.

Sections of the module

Report generation

The Report generation tab offers a user-friendly interface for managing objects within .docx or similar files. It is also equipped with a library of templates tailored for various types of modeling analyses, such as population PK, PK/PD, and exposure-response reports, compiled in accordance with FDA guidelines and current best practices within the industry.

Report initialization

Report generation can be started by one of two options:

  • Choosing a pre-made report template file from the drop-down list
  • Uploading an existing .docx file via button

Once the file is chosen or uploaded into the interface, it is parsed to determine the hierarchy of headers along with the corresponding objects located in each section. The file structure is then represented on the right panel.

Reporting of Simurg project

Generate quick reports from active Simurg sessions using the corresponding button. After the project directory path is defined, default directories for the report objects will be determined based on the standardized structure of Simurg project directories.

Managing file objects

Each file object is labeled with a caption and is automatically assigned a running number, facilitating easy cross-referencing of figures and tables throughout the text. An object can be added or removed by clicking the corresponding buttons associated with headers and individual objects. The object type (figure or table) can be specified using option buttons.

An object can be linked with a source file (.jpg, .png, .tiff for figures; .csv, .xls, .xlsx for tables) via the corresponding button. The path of the source file will then be displayed in red below the button. The reporting module stores relative paths in an .xml Control File, which can be visually reviewed and manually adjusted if needed. Upon report generation, objects will be inserted under the appropriate captions from the defined source file paths.

Updating of the report

If the Control File for the current report document exists in the source directory, the source file paths defined there can be assigned to the file objects. If an object has information about its source file path in the Control File, a checkbox will appear to the right of the buttons associated with that object. If the checkbox is checked, the corresponding object will be loaded from the path defined in the Control File.

Export

The generated report can be exported by clicking the corresponding button on the left panel. A window will then appear that allows choosing the directory and the name of the .docx report file. Along with the report, the .xml Control File created during report completion will be saved to the same directory. Successful document saving is accompanied by a message.