Describing your model in our yaml-based format
To make your model CHAP-compatible, your train and predict endpoints (as discussed here) need to be formally defined in a YAML format that follows the popular MLflow standard.
Your codebase needs to contain a file named MLproject that defines the following:
- An entry point in the MLproject file called `train` with parameters `train_data` and `model`
- An entry point in the MLproject file called `predict` with parameters `historic_data`, `future_data`, `model` and `out_file`
These entry points should contain commands that can be run to train a model and to predict the future using that model. The `model` parameter should be used to save a model in the train step that can then be loaded and used in the predict step. CHAP will provide all the data (the other parameters) when running a model.
Here is an example of a valid MLproject file (taken from our minimalist_example).
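A minimal file follows this shape (a sketch; the exact contents of minimalist_example may differ, and the script names here are placeholders):

```yaml
name: minimalist_example

entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "python train.py {train_data} {model}"
  predict:
    parameters:
      historic_data: str
      future_data: str
      model: str
      out_file: str
    command: "python predict.py {model} {historic_data} {future_data} {out_file}"
```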
The MLproject file can specify a Docker image, a Python virtual environment, a uv-managed environment, or an renv environment (for R models) that will be used when running the commands. An example of this is the MLproject file contained within our minimalist_example_r.
Environment options
Docker environment
Use docker_env to specify a Docker image:
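For example (the image name is a placeholder):

```yaml
docker_env:
  image: python:3.11
```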
MLflow/Conda environment
Use python_env to specify a conda/pip environment file (uses MLflow to manage):
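For example, pointing at an environment file in the model directory (the file name and package names are assumptions):

```yaml
python_env: python_env.yaml
```

where `python_env.yaml` follows MLflow's python_env format, e.g.:

```yaml
python: "3.11"
dependencies:
  - pandas
  - scikit-learn
```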
uv environment
Use uv_env to specify a pyproject.toml for uv-managed environments. This is useful for models that use uv for dependency management:
Commands will be executed via uv run, which automatically handles the virtual environment. Make sure your model directory contains a valid pyproject.toml with dependencies specified. See the example uv model for a complete example.
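A minimal pyproject.toml could look like this (the project name and dependencies are placeholders):

```toml
[project]
name = "my_model"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "pandas",
    "scikit-learn",
]
```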
Example MLproject file with uv:
```yaml
name: my_model
uv_env: pyproject.toml

entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "python main.py train {train_data} {model}"
  predict:
    parameters:
      model: str
      historic_data: str
      future_data: str
      out_file: str
    command: "python main.py predict {model} {historic_data} {future_data} {out_file}"
```
renv environment (for R models)
Use renv_env to specify an renv.lock file for R models that use renv for dependency management:
When CHAP runs your model, it will automatically:
- Look for the `renv.lock` file in your model directory
- Run `renv::restore(prompt = FALSE)` to install all required R packages
- Execute your R commands with the restored environment
Your model directory should contain:
- `renv.lock` - the lockfile specifying exact package versions (generated by `renv::snapshot()`)
- `renv/` directory - contains renv activation scripts
- `.Rprofile` - auto-activates renv when R starts (typically contains `source("renv/activate.R")`)
Example MLproject file with renv:
```yaml
name: my_r_model
renv_env: renv.lock

entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "Rscript main.R train --train_data {train_data} --model {model}"
  predict:
    parameters:
      historic_data: str
      future_data: str
      model: str
      out_file: str
    command: "Rscript main.R predict --model {model} --historic_data {historic_data} --future_data {future_data} --out_file {out_file}"
```
Setting up renv for your R model
- Initialize renv in your R project with `renv::init()`
- Install your required packages
- Create the lockfile with `renv::snapshot()`
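In R, the steps above can be sketched as follows (the package names are placeholders for your model's actual dependencies):

```r
# Run inside your model's project directory
renv::init()                           # sets up renv/ and .Rprofile
install.packages(c("dplyr", "readr"))  # placeholder dependencies
renv::snapshot()                       # writes renv.lock with exact versions
```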
This creates renv.lock with exact versions of all dependencies, ensuring reproducible environments.
See the minimalist R model example for a complete working example.
Specifying prediction length constraints
Include min_prediction_length and max_prediction_length in your model configuration to define how many time periods your model can predict ahead. When users need predictions beyond your max_prediction_length, CHAP automatically uses ExtendedPredictor to make iterative predictions (see supporting functionality).
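As a sketch, assuming these keys sit at the top level of your model configuration (check the supporting functionality documentation for the exact placement):

```yaml
min_prediction_length: 1
max_prediction_length: 3
```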
Model Configuration Options
You can define configurable parameters in your MLproject file using user_options. This allows users to customize model behavior when running your model, without modifying the model code itself.
Schema structure
Each option in user_options has the following fields:
- `title`: Display name for the parameter
- `type`: One of `string`, `integer`, `number`, `boolean`, or `array`
- `description`: What the parameter does
- `default`: Optional default value. If omitted, the parameter is required
Example MLproject with user_options
```yaml
name: my_model

docker_env:
  image: python:3.11

entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "python train.py {train_data} {model}"
  predict:
    parameters:
      historic_data: str
      future_data: str
      model: str
      out_file: str
    command: "python predict.py {model} {historic_data} {future_data} {out_file}"

user_options:
  n_lag_periods:
    title: n_lag_periods
    type: integer
    default: 3
    description: "Number of lag periods to include in the model"
  learning_rate:
    title: learning_rate
    type: number
    description: "Learning rate for training (required)"
```
Providing configuration values
Configuration values can be provided via the --model-configuration-yaml CLI flag when running eval or other commands:
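For example (the command name and the remaining arguments are assumptions about the CHAP CLI; only the `--model-configuration-yaml` flag is given above):

```shell
chap eval --model-configuration-yaml model_config.yaml ...
```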
The configuration YAML file should contain the parameter values:
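Continuing the user_options example above, such a file could contain:

```yaml
n_lag_periods: 5
learning_rate: 0.01
```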
Validation rules
- Options without a `default` value are required and must be provided
- Only options defined in `user_options` are allowed in the configuration file
- Values must match the specified type (e.g., integers for `integer` type)
Examples in the codebase
See the following examples that use user_options: