
# Evaluation Walkthrough

This walkthrough is for educational purposes. It breaks the evaluation pipeline into individual steps so you can see what happens at each stage. In practice, use the higher-level `Evaluation.create` (section 7) or the CLI `chap evaluate` command rather than calling the lower-level splitting and prediction functions directly.

For the conceptual overview and architecture diagrams, see Evaluation Pipeline.

## 1. Loading a Dataset

A `DataSet` is the central data structure in CHAP. It maps location names to typed time-series arrays. Load one from CSV:

```python exec="on" session="eval-walkthrough" source="above"
from chap_core.spatio_temporal_data.temporal_dataclass import DataSet

dataset = DataSet.from_csv("example_data/laos_subset.csv")
```

Inspect locations, time range, and available fields:

```python exec="on" session="eval-walkthrough" source="above" result="text"
import dataclasses

print(list(dataset.keys()))
print(dataset.period_range)
print(len(dataset.period_range))

location = list(dataset.keys())[0]
field_names = [f.name for f in dataclasses.fields(dataset[location])]
print(field_names)
```

Each location holds arrays for `time_period`, `rainfall`, `mean_temperature`, `disease_cases`, and `population`.
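
To pull out an individual series, index the dataset by location and read the field arrays directly. The lines below are an illustrative sketch that reuses the `dataset` and `location` variables from above and assumes the arrays are NumPy-backed, as the field list suggests:

```python
# Illustrative sketch: read one location's arrays directly.
# Field names are the ones listed above; each array spans the dataset's period range.
series = dataset[location]
print(series.disease_cases[:5])      # first few observed case counts
print(series.mean_temperature[:5])   # the matching climate covariate values
```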

## 2. Splitting the Data

The `train_test_generator` function implements expanding-window cross-validation. It returns a training set and an iterator of `(historic, masked_future, future_truth)` tuples.

```python exec="on" session="eval-walkthrough" source="above"
from chap_core.assessment.dataset_splitting import train_test_generator

train_set, splits = train_test_generator(dataset, prediction_length=3, n_test_sets=4, stride=1)
splits = list(splits)
```

The training set covers the earliest portion of the data:

```python exec="on" session="eval-walkthrough" source="above" result="text"
print(train_set.period_range)
print(len(train_set.period_range))
```

Each split provides three datasets per location:

- `historic_data` -- all data up to the split point (grows each split)
- `masked_future_data` -- future covariates without `disease_cases`
- `future_data` -- full future data including `disease_cases` (ground truth)

```python exec="on" session="eval-walkthrough" source="above" result="text"
for i, (historic, masked_future, future_truth) in enumerate(splits):
    print(
        f"Split {i}: historic periods={len(historic.period_range)}, "
        f"future range={future_truth.period_range}"
    )
```

## 3. How Test Instances Differ

The historic window expands by `stride` periods with each successive split, while
the future window slides forward:

```python exec="on" session="eval-walkthrough" source="above" result="text"
for i, (historic, masked_future, future_truth) in enumerate(splits):
    print(
        f"Split {i}: historic={len(historic.period_range)} periods, "
        f"future starts at {future_truth.period_range[0]}"
    )
```

The masked future data has climate features but no `disease_cases`, which is exactly what a model receives at prediction time:

```python exec="on" session="eval-walkthrough" source="above" result="text"
location = list(splits[0][1].keys())[0]
masked_fields = [f.name for f in dataclasses.fields(splits[0][1][location])]
print(masked_fields)
```

## 4. Running a Prediction on a Test Instance

Train the `NaiveEstimator` (which predicts Poisson samples around each location's
historical mean) and predict on one split:

```python exec="on" session="eval-walkthrough" source="above"
from chap_core.predictor.naive_estimator import NaiveEstimator

estimator = NaiveEstimator()
predictor = estimator.train(train_set)

historic, masked_future, future_truth = splits[0]
predictions = predictor.predict(historic, masked_future)
```

The result is a `DataSet[Samples]` -- each location holds a 2D array of shape `(n_periods, n_samples)`:

```python exec="on" session="eval-walkthrough" source="above" result="text"
location = list(predictions.keys())[0]
print(predictions[location].samples.shape)
```
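
The sample axis can be summarized directly. As a small sketch (assuming the `samples` arrays are NumPy arrays, consistent with the shape printed above), per-period prediction intervals can be taken with `numpy.quantile`:

```python
import numpy as np

# Sketch: 10%, 50%, and 90% quantiles across the sample axis (axis=1),
# giving one row per quantile and one column per predicted period.
samples = predictions[location].samples
print(np.quantile(samples, [0.1, 0.5, 0.9], axis=1))
```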

## 5. Comparing Predictions to Truth

Merge predictions with ground truth using `DataSet.merge`:

```python exec="on" session="eval-walkthrough" source="above" result="text"
from chap_core.datatypes import SamplesWithTruth
import numpy as np

merged = future_truth.merge(predictions, result_dataclass=SamplesWithTruth)

location = list(merged.keys())[0]
print("Observed:", merged[location].disease_cases)
print("Predicted median:", np.median(merged[location].samples, axis=1))

Each `SamplesWithTruth` entry pairs the observed `disease_cases` with the predicted `samples` array, enabling metric computation.
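
As an illustration, a simple error summary can be computed straight from the merged object. This is a minimal sketch using the mean absolute error of the median prediction, not one of CHAP's built-in metrics:

```python
import numpy as np

# Sketch: MAE of the median prediction, averaged over all locations in this split.
errors = []
for loc in merged.keys():
    entry = merged[loc]
    point_forecast = np.median(entry.samples, axis=1)  # one value per future period
    errors.append(np.mean(np.abs(entry.disease_cases - point_forecast)))
print(np.mean(errors))
```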

## 6. Running a Full Backtest

The `backtest` function ties sections 2-5 together: it splits the data, trains the model once, predicts for each split, and merges with ground truth.

```python exec="on" session="eval-walkthrough" source="above" result="text"
from chap_core.assessment.prediction_evaluator import backtest

results = list(backtest(estimator, dataset, prediction_length=3, n_test_sets=4, stride=1))
print(f"{len(results)} splits")

for i, result in enumerate(results):
    print(f"Split {i}: periods={result.period_range}")
```

Each result is a `DataSet[SamplesWithTruth]` covering all locations for one
test window.
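
Each entry can be inspected the same way as the merged dataset in section 5. A quick sketch, reusing only the fields already shown above:

```python
# Sketch: look at one location in the first backtest window.
first_window = results[0]
loc = list(first_window.keys())[0]
print(first_window[loc].disease_cases)   # observed cases in that window
print(first_window[loc].samples.shape)   # (n_periods, n_samples) predicted samples
```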

## 7. Creating an Evaluation Object

`Evaluation.create` wraps the full backtest workflow and produces an object that
supports export to flat DataFrames and NetCDF files.

The `NaiveEstimator` provides `model_template_db` and `configured_model_db` class attributes with the model metadata needed by the evaluation.
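
You can print both attributes to see what will be passed to `Evaluation.create` below. This is purely an inspection sketch; the exact printed representation depends on the installed CHAP version:

```python
# Inspection sketch: the two metadata attributes provided by NaiveEstimator.
print(estimator.model_template_db)
print(estimator.configured_model_db)
```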

Run the evaluation:

```python exec="on" session="eval-walkthrough" source="above"
from chap_core.api_types import BackTestParams
from chap_core.assessment.evaluation import Evaluation

backtest_params = BackTestParams(n_periods=3, n_splits=4, stride=1)
evaluation = Evaluation.create(estimator.configured_model_db, estimator, dataset, backtest_params)
```

Export to flat DataFrames for inspection:

```python exec="on" session="eval-walkthrough" source="above"
import pandas as pd

flat = evaluation.to_flat()

forecasts_df = pd.DataFrame(flat.forecasts)
observations_df = pd.DataFrame(flat.observations)

print(forecasts_df.head().to_markdown())
```

Export to a NetCDF file for sharing or later analysis:

```python exec="on" session="eval-walkthrough" source="above" result="text"
import tempfile

with tempfile.NamedTemporaryFile(suffix=".nc", delete=False) as f:
    evaluation.to_file(f.name)
    print(f"Saved to {f.name}")