Monday Afternoon - 23 Feb
See the Evaluation Walkthrough for a hands-on, step-by-step guide through the evaluation pipeline.
Workshop: Evaluating Models on the Laos Dengue Dataset
In this hands-on exercise you will download a real dataset, run two different models through the Chap evaluation pipeline, and compare the results.
1. Download the dataset
Create a working directory and download the Laos dengue dataset (monthly, admin-1 level):
$ mkdir laos-workshop && cd laos-workshop
$ curl -sL -o chap_LAO_admin1_monthly.csv \
"https://raw.githubusercontent.com/dhis2/climate-health-data/main/lao/chap_LAO_admin1_monthly.csv"
The CSV contains about 2,800 rows covering 18 provinces from 1998 to 2010, with the columns time_period, location, disease_cases, population, location_name, rainfall, mean_temperature, and mean_relative_humidity.
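Before running any model, it can help to confirm that the download matches this description. The sketch below counts rows, distinct locations, and the first and last time period using only the Python standard library. It assumes that time_period values sort chronologically as plain strings (for example "1998-01"), which is not confirmed here; the helper name summarize and the sample rows are made up for illustration.

```python
import csv
import io


def summarize(csv_file):
    """Return row count, number of distinct locations, and first/last time period."""
    rows = 0
    locations = set()
    periods = set()
    for row in csv.DictReader(csv_file):
        rows += 1
        locations.add(row["location"])
        periods.add(row["time_period"])
    # Assumption: time_period strings sort chronologically (e.g. "1998-01").
    ordered = sorted(periods)
    return {
        "rows": rows,
        "locations": len(locations),
        "first_period": ordered[0] if ordered else None,
        "last_period": ordered[-1] if ordered else None,
    }


# Tiny in-memory sample with invented values, only to show the call.
sample = io.StringIO(
    "time_period,location,disease_cases\n"
    "1998-01,A,3\n"
    "1998-02,A,5\n"
    "1998-01,B,0\n"
)
print(summarize(sample))

# For the real file:
#   with open("chap_LAO_admin1_monthly.csv", newline="") as f:
#       print(summarize(f))
```

On the downloaded file, the result should show roughly 2,800 rows and 18 locations if the description above holds.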
2. Explore the dataset
Exploring the dataset opens an interactive plot in your browser that shows standardized disease cases and climate features across all locations:

3. Evaluate Model A -- Minimalist baseline
Run a backtest with 2 train/test splits and a 3-month forecast horizon using a simple baseline model:
$ chap eval https://github.com/knutdrand/minimalist_example_uv \
chap_LAO_admin1_monthly.csv \
minimalist_eval.nc \
--backtest-params.n-splits 2 \
--backtest-params.n-periods 3
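To make the two backtest parameters concrete: n-splits sets how many train/test pairs are evaluated, and n-periods sets how many months each test window spans. The sketch below shows one generic rolling-origin scheme. How Chap itself positions its splits is not stated here, so the one-period stride between splits and the function name rolling_splits are assumptions for illustration only.

```python
def rolling_splits(periods, n_splits, n_periods):
    """Return (train, test) pairs; the last test window ends at the final period.

    Assumption: consecutive splits are shifted by one period. Chap may
    place its splits differently.
    """
    splits = []
    for i in range(n_splits):
        test_end = len(periods) - (n_splits - 1 - i)
        test_start = test_end - n_periods
        if test_start <= 0:
            raise ValueError("not enough periods for the requested splits")
        # Train on everything before the test window, test on the window itself.
        splits.append((periods[:test_start], periods[test_start:test_end]))
    return splits


# One illustrative year of monthly period labels.
months = [f"2010-{m:02d}" for m in range(1, 13)]
for train, test in rolling_splits(months, n_splits=2, n_periods=3):
    print("train until", train[-1], "| test", test)
```

Each model is fitted only on the training part of a split, so the test window acts as unseen data.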
Generate the evaluation plot:
Evaluation plot -- observed vs. predicted cases per location and split:

4. Evaluate Model B -- Auto-EWARS
Now evaluate a more sophisticated model. The EWARS model is fetched directly from GitHub:
$ chap eval https://github.com/dhis2-chap/chap_auto_ewars \
chap_LAO_admin1_monthly.csv \
ewars_eval.nc \
--backtest-params.n-splits 2 \
--backtest-params.n-periods 3
Generate the evaluation plot:
Evaluation plot:

5. Compare models
Export aggregate metrics from both evaluations into a single CSV:
$ chap export-metrics \
--input-files minimalist_eval.nc \
--input-files ewars_eval.nc \
--output-file metrics_comparison.csv
The output CSV contains one row per model and one column per metric:
| model | mae | rmse | crps | coverage_10_90 |
|---|---|---|---|---|
| minimalist_example_uv | 124.6 | 282.1 | 124.6 | 0.0 |
| chap_auto_ewars | 97.8 | 227.4 | 67.2 | 0.781 |
Lower MAE, RMSE, and CRPS indicate better accuracy. The coverage_10_90 column is the fraction of observations that fall between the 10th and 90th percentiles of the forecast distribution; because that is a nominal 80% interval, the ideal value is 0.80.
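The metrics above can be written down in a few lines. This is a minimal sketch using textbook definitions and the Python standard library, not Chap's own implementation; how Chap aggregates over locations, splits, and horizons is not specified here, and all function names below are hypothetical.

```python
import math
import statistics


def mae(observed, point_forecasts):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, point_forecasts)) / len(observed)


def rmse(observed, point_forecasts):
    """Root mean squared error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, point_forecasts))
    return math.sqrt(squared / len(observed))


def crps_from_samples(observation, samples):
    """Empirical CRPS for one observation: E|X - y| - 0.5 * E|X - X'|."""
    n = len(samples)
    term1 = sum(abs(x - observation) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2


def coverage_10_90(observed, sample_sets):
    """Fraction of observations inside the 10th-90th percentile interval."""
    hits = 0
    for y, samples in zip(observed, sample_sets):
        deciles = statistics.quantiles(samples, n=10, method="inclusive")
        low, high = deciles[0], deciles[-1]  # 10th and 90th percentiles
        if low <= y <= high:
            hits += 1
    return hits / len(observed)


# Point forecast: all samples identical, so CRPS equals the absolute error, 3.0.
print(crps_from_samples(10, [7, 7, 7]))
# Spread-out samples around the observation give a lower CRPS.
print(crps_from_samples(10, [8, 10, 12]))
```

Note that when every forecast sample is identical, CRPS reduces to the absolute error and the 10th-90th percentile interval collapses to a single point.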
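In this comparison, the EWARS model outperforms the minimalist baseline on every metric: its MAE, RMSE, and CRPS are all lower, and its coverage of 0.781 is close to the ideal 0.80. The baseline's CRPS equals its MAE (124.6) and its coverage is 0.0, which is consistent with a model that issues point forecasts rather than a spread of predictions.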
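The same check can be scripted once more models are added. The sketch below assumes metrics_comparison.csv has exactly the columns shown in the table (model, mae, rmse, crps, coverage_10_90); the helper name best_per_metric is hypothetical. It picks the lowest value for the error metrics and the coverage closest to the nominal 0.80.

```python
import csv
import io

LOWER_IS_BETTER = ("mae", "rmse", "crps")


def best_per_metric(csv_file, nominal_coverage=0.80):
    """Map each metric name to the model that scores best on it."""
    rows = list(csv.DictReader(csv_file))
    best = {}
    for metric in LOWER_IS_BETTER:
        best[metric] = min(rows, key=lambda r: float(r[metric]))["model"]
    # Coverage is best when closest to the nominal level, not when lowest.
    best["coverage_10_90"] = min(
        rows, key=lambda r: abs(float(r["coverage_10_90"]) - nominal_coverage)
    )["model"]
    return best


# Values copied from the table above.
table = io.StringIO(
    "model,mae,rmse,crps,coverage_10_90\n"
    "minimalist_example_uv,124.6,282.1,124.6,0.0\n"
    "chap_auto_ewars,97.8,227.4,67.2,0.781\n"
)
print(best_per_metric(table))

# For the exported file:
#   with open("metrics_comparison.csv", newline="") as f:
#       print(best_per_metric(f))
```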