# Evaluation Walkthrough
This walkthrough is for educational purposes. It breaks the evaluation pipeline
into individual steps so you can see what happens at each stage. In practice,
use the higher-level `Evaluation.create` (section 7) or the `chap evaluate` CLI
command rather than calling the lower-level splitting and prediction functions
directly.
For the conceptual overview and architecture diagrams, see Evaluation Pipeline.
## 1. Loading a Dataset
A `DataSet` is the central data structure in CHAP. It maps location names to
typed time-series arrays. Load one from CSV:
```python exec="on" session="eval-walkthrough" source="above"
from chap_core.spatio_temporal_data.temporal_dataclass import DataSet

dataset = DataSet.from_csv("example_data/laos_subset.csv")
```
Inspect locations, time range, and available fields:
```python exec="on" session="eval-walkthrough" source="above" result="text"
import dataclasses
print(list(dataset.keys()))
print(dataset.period_range)
print(len(dataset.period_range))
location = list(dataset.keys())[0]
field_names = [f.name for f in dataclasses.fields(dataset[location])]
print(field_names)
```
Each location holds arrays for `time_period`, `rainfall`, `mean_temperature`,
`disease_cases`, and `population`.
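As a quick check, the arrays for one location can be read as plain attributes (a
minimal sketch continuing the session above, using the field names just listed):

```python
# Peek at the first few values for a single location (illustrative)
print(dataset[location].disease_cases[:5])
print(dataset[location].rainfall[:5])
```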
## 2. Splitting the Data
The `train_test_generator` function implements expanding-window cross-validation.
It returns a training set and an iterator of `(historic, masked_future, future_truth)`
tuples.
```python exec="on" session="eval-walkthrough" source="above"
from chap_core.assessment.dataset_splitting import train_test_generator

train_set, splits = train_test_generator(
    dataset, prediction_length=3, n_test_sets=4, stride=1
)
splits = list(splits)
```
The training set covers the earliest portion of the data:
```python exec="on" session="eval-walkthrough" source="above" result="text"
print(train_set.period_range)
print(len(train_set.period_range))
```
Each split provides three datasets per location:
- `historic_data` -- all data up to the split point (grows each split)
- `masked_future_data` -- future covariates without `disease_cases`
- `future_data` -- full future data including `disease_cases` (ground truth)
```python exec="on" session="eval-walkthrough" source="above" result="text"
for i, (historic, masked_future, future_truth) in enumerate(splits):
    print(
        f"Split {i}: historic periods={len(historic.period_range)}, "
        f"future range={future_truth.period_range}"
    )
```
## 3. How Test Instances Differ
The historic window expands by `stride` periods with each successive split, while
the future window slides forward:
```python exec="on" session="eval-walkthrough" source="above" result="text"
for i, (historic, masked_future, future_truth) in enumerate(splits):
    print(
        f"Split {i}: historic={len(historic.period_range)} periods, "
        f"future starts at {future_truth.period_range[0]}"
    )
```
The masked future data has climate features but no `disease_cases`, which is
exactly what a model receives at prediction time:
```python exec="on" session="eval-walkthrough" source="above" result="text"
location = list(splits[0][1].keys())[0]
masked_fields = [f.name for f in dataclasses.fields(splits[0][1][location])]
print(masked_fields)
```
## 4. Running a Prediction on a Test Instance
Train the `NaiveEstimator` (which predicts Poisson samples around each location's
historical mean) and predict on one split:
```python exec="on" session="eval-walkthrough" source="above"
from chap_core.predictor.naive_estimator import NaiveEstimator
estimator = NaiveEstimator()
predictor = estimator.train(train_set)
historic, masked_future, future_truth = splits[0]
predictions = predictor.predict(historic, masked_future)
```
The result is a `DataSet[Samples]` -- each location holds a 2D array of shape
`(n_periods, n_samples)`:
```python exec="on" session="eval-walkthrough" source="above" result="text"
location = list(predictions.keys())[0]
print(predictions[location].samples.shape)
```
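Conceptually, each column is one sample. The naive model's samples are, roughly
speaking, Poisson draws around the location's historical mean -- the sketch below
illustrates the idea and is not the actual implementation:

```python
import numpy as np

# Simplified illustration of the naive idea: Poisson samples around the
# historical mean, one row per future period, one column per sample.
historical_mean = np.nanmean(historic[location].disease_cases)
sketch = np.random.poisson(historical_mean, size=(3, 100))
print(sketch.shape)  # (n_periods, n_samples)
```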
## 5. Comparing Predictions to Truth
Merge predictions with ground truth using `DataSet.merge`:
```python exec="on" session="eval-walkthrough" source="above" result="text"
from chap_core.datatypes import SamplesWithTruth
import numpy as np
merged = future_truth.merge(predictions, result_dataclass=SamplesWithTruth)
location = list(merged.keys())[0]
print("Observed:", merged[location].disease_cases)
print("Predicted median:", np.median(merged[location].samples, axis=1))
Each `SamplesWithTruth` entry pairs the observed `disease_cases` with the
predicted samples array, enabling metric computation.
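For example, a simple summary can be computed straight from the merged dataset
(an illustrative mean absolute error of the sample median, not one of CHAP's
built-in metrics):

```python
# Illustrative only: MAE of the per-period sample median vs. observed cases
for loc in merged.keys():
    entry = merged[loc]
    mae = np.mean(np.abs(np.median(entry.samples, axis=1) - entry.disease_cases))
    print(f"{loc}: MAE = {mae:.2f}")
```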
## 6. Running a Full Backtest
The `backtest` function ties sections 2-5 together: it splits the data, trains
the model once, predicts for each split, and merges with ground truth.
```python exec="on" session="eval-walkthrough" source="above" result="text"
from chap_core.assessment.prediction_evaluator import backtest

results = list(backtest(estimator, dataset, prediction_length=3, n_test_sets=4, stride=1))
print(f"{len(results)} splits")
for i, result in enumerate(results):
    print(f"Split {i}: periods={result.period_range}")
```
Each result is a `DataSet[SamplesWithTruth]` covering all locations for one
test window.
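Each entry has the same structure as the merged dataset from section 5, so it can
be inspected the same way (continuing the session):

```python
# One backtest result: all locations for a single test window
first = results[0]
loc = list(first.keys())[0]
print(list(first.keys()))
print(first[loc].samples.shape)   # (n_periods, n_samples)
print(first[loc].disease_cases)   # observed cases for that window
```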
## 7. Creating an Evaluation Object
`Evaluation.create` wraps the full backtest workflow and produces an object that
supports export to flat DataFrames and NetCDF files.
The `NaiveEstimator` provides `model_template_db` and `configured_model_db` class
attributes with the model metadata needed by the evaluation. You can inspect them
directly (the exact contents depend on the installed CHAP version):
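```python
# Model metadata used by Evaluation.create (a quick look; contents vary by version)
print(estimator.model_template_db)
print(estimator.configured_model_db)
```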
Run the evaluation:
```python exec="on" session="eval-walkthrough" source="above"
from chap_core.api_types import BackTestParams
from chap_core.assessment.evaluation import Evaluation
backtest_params = BackTestParams(n_periods=3, n_splits=4, stride=1)
evaluation = Evaluation.create(estimator.configured_model_db, estimator, dataset, backtest_params)
```
Export to flat DataFrames for inspection:
```python exec="on" session="eval-walkthrough" source="above"
import pandas as pd

flat = evaluation.to_flat()
forecasts_df = pd.DataFrame(flat.forecasts)
observations_df = pd.DataFrame(flat.observations)
print(forecasts_df.head().to_markdown())
```
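The observations DataFrame holds the corresponding ground truth and can be
inspected the same way (continuing the session above):

```python
# Ground-truth observations in flat (long) form
print(observations_df.head().to_markdown())
```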