Progressive Effects Walkthrough

This walkthrough shows how to progressively add modeling effects to a simple linear regression. Each step adds a new type of feature, and we measure the resulting improvement with backtesting.

By the end, you will have built a model with location-specific offsets, seasonal patterns, climate covariates, and lagged disease cases.

1. Loading the Data

[Code blocks not rendered. Recoverable lines: from chap_core.spatio_temporal_data.temporal_dataclass import DataSet; print("Locations:", list(dataset.keys()))]

2. A Basic Estimator

We define a BasicEstimator that takes a feature extraction function. Different feature functions produce different models, while the estimator handles the training and prediction boilerplate.

[Code block not rendered. Recoverable line: import numpy as np]

The predict method combines all locations' historic and future data into a single DataFrame before extracting features. This ensures feature columns (like location dummies) stay consistent between training and prediction, and allows lag-based features to look back into the historic window.
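The original BasicEstimator code is not available on this page, so the following is only a minimal sketch of the idea described above. It assumes flat pandas DataFrames with location and disease_cases columns rather than Chap's DataSet type, uses NumPy least squares as the regression, and the helper one_hot_location is a hypothetical feature function introduced for illustration.

```python
import numpy as np
import pandas as pd


class BasicEstimator:
    """Least-squares regression on top of a pluggable feature function (sketch)."""

    def __init__(self, feature_fn):
        self.feature_fn = feature_fn  # maps a DataFrame to a numeric feature DataFrame
        self.columns_ = None
        self.coef_ = None

    @staticmethod
    def _design(features):
        # Prepend an intercept column to the feature matrix.
        X = features.to_numpy(dtype=float)
        return np.column_stack([np.ones(len(X)), X])

    def train(self, df):
        features = self.feature_fn(df)
        self.columns_ = list(features.columns)
        y = df["disease_cases"].to_numpy(dtype=float)
        # lstsq returns the minimum-norm solution, so an intercept plus a
        # full set of dummy columns (which are collinear) is tolerated.
        self.coef_, *_ = np.linalg.lstsq(self._design(features), y, rcond=None)
        return self

    def predict(self, historic, future):
        # Extract features on the combined frame so that dummy columns match
        # training and lag features can look back into the historic window.
        combined = pd.concat([historic, future], ignore_index=True)
        features = self.feature_fn(combined).reindex(columns=self.columns_, fill_value=0)
        return self._design(features.tail(len(future))) @ self.coef_


def one_hot_location(df):
    # Hypothetical feature function: one indicator column per location.
    return pd.get_dummies(df["location"], dtype=float)


historic = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "disease_cases": [10.0, 12.0, 30.0, 32.0],
})
future = pd.DataFrame({"location": ["A", "B"]})

model = BasicEstimator(one_hot_location).train(historic)
predictions = model.predict(historic, future)  # one value per future row
```

With only location indicators as features, each prediction is that location's mean over the training rows. The sketch relies on the feature function preserving row order, so the last len(future) rows of the feature frame correspond to the future periods.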

3. Evaluation Helper

We use backtest to run expanding-window cross-validation and compute mean absolute error (MAE) for each model variant:

[Code block not rendered. Recoverable line: from chap_core.assessment.prediction_evaluator import backtest]
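To make the evaluation scheme concrete, here is a small stand-alone sketch of expanding-window cross-validation with MAE. It does not call chap_core's backtest, whose signature is not shown on this page; the function expanding_window_mae, its parameters, and the naive last_value baseline are all hypothetical names used only for illustration.

```python
import numpy as np


def expanding_window_mae(series, fit_predict, horizon=3, n_splits=4):
    """Mean absolute error over expanding-window splits (illustrative helper)."""
    errors = []
    for i in range(n_splits):
        # Each split trains on a longer prefix and tests on the next `horizon` points.
        split = len(series) - horizon * (n_splits - i)
        train, test = series[:split], series[split:split + horizon]
        forecast = fit_predict(train, horizon)
        errors.append(np.abs(np.asarray(forecast) - test))
    return float(np.mean(errors))


def last_value(train, horizon):
    # Naive baseline: repeat the last observed value for every future step.
    return np.repeat(train[-1], horizon)


cases = np.arange(24, dtype=float)  # synthetic monthly series 0, 1, ..., 23
mae = expanding_window_mae(cases, last_value)
```

On this synthetic series every split has absolute errors of 1, 2, and 3, so the helper returns an MAE of 2.0. A lower MAE for a richer feature set indicates that the added effect improves out-of-sample forecasts.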

4. Location-Specific Offset

The simplest region-aware feature: one indicator variable per location. This lets the model learn a different baseline for each region.

[Code block not rendered. Recoverable line: mae = evaluate(BasicEstimator(location_offset), dataset)]
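The name location_offset comes from the page, but its body is not shown. One plausible way to write such a feature function, assuming a pandas DataFrame with a location column, is with pd.get_dummies:

```python
import pandas as pd


def location_offset(df):
    # One 0/1 indicator column per location; the fitted coefficient of each
    # column acts as that location's baseline level.
    return pd.get_dummies(df["location"], prefix="loc", dtype=float)


df = pd.DataFrame({"location": ["Oslo", "Bergen", "Oslo"]})
features = location_offset(df)
```

Each row has exactly one indicator set to 1, so a linear model on these features simply predicts a separate constant for each location.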

5. Seasonal Effect

Disease incidence often follows seasonal patterns. Adding month-of-year indicators captures periodic variation:

[Code block not rendered. Recoverable line: mae = evaluate(BasicEstimator(location_and_season), dataset)]
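The body of location_and_season is not shown on this page. A sketch of what it might look like, assuming a time_period column that pandas can parse as dates and a location column, is given here. Fixing the month categories to 1 through 12 keeps the columns identical even when a window does not contain every month.

```python
import pandas as pd


def location_and_season(df):
    # Location indicators, as in the previous step.
    location = pd.get_dummies(df["location"], prefix="loc", dtype=float)
    # Month-of-year indicators with a fixed set of 12 categories, so that
    # training and prediction frames always produce the same columns.
    month = pd.to_datetime(df["time_period"]).dt.month
    month = month.astype(pd.CategoricalDtype(categories=list(range(1, 13))))
    season = pd.get_dummies(month, prefix="month", dtype=float)
    return pd.concat([location, season], axis=1)


df = pd.DataFrame({
    "location": ["A", "A", "B"],
    "time_period": ["2023-01-01", "2023-07-01", "2023-07-01"],
})
features = location_and_season(df)
```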

6. Climate Covariates

Chap provides future climate data (rainfall, temperature) at prediction time, so we can use these as features directly. This captures the relationship between climate conditions and disease incidence:

[Code block not rendered. Recoverable line: mae = evaluate(BasicEstimator(location_season_climate), dataset)]

In practice, climate effects on disease are often delayed (e.g. rainfall affects mosquito breeding over weeks). You can also add lagged climate features using df.groupby("location")["rainfall"].shift(lag), but with limited data, adding many lag features risks overfitting.
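The groupby/shift pattern mentioned above can be wrapped in a small helper. This is a sketch under the assumption that the data is a pandas DataFrame with location, time_period, and rainfall columns; the helper name add_climate_lags and the lag{n} column suffix are hypothetical.

```python
import pandas as pd


def add_climate_lags(df, columns=("rainfall",), lags=(1, 2)):
    # Sort so that shifting moves values backwards in time within each location.
    out = df.sort_values(["location", "time_period"]).copy()
    for column in columns:
        for lag in lags:
            # Shifting inside each group keeps values from leaking across locations.
            out[f"{column}_lag{lag}"] = out.groupby("location")[column].shift(lag)
    return out


df = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B", "B"],
    "time_period": ["2023-01", "2023-02", "2023-03"] * 2,
    "rainfall": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})
lagged = add_climate_lags(df)
```

The first lag rows of each location have no earlier value and come out as NaN, so they need to be dropped or imputed before fitting. Every extra lag adds one column per climate variable, which is where the overfitting risk with short series comes from.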

7. Lagged Target (Disease Cases)

Past disease cases are typically the strongest predictor of future cases. However, a lagged target introduces a technical difficulty: at prediction time, future disease cases are unknown.

The simplest solution is to use only lags at least as long as the forecast horizon. Since we predict 3 months ahead, lag 3 is the shortest usable lag: its value is always known at prediction time.

[Code block not rendered. Recoverable line: mae = evaluate(BasicEstimator(all_features), dataset)]
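The following sketch illustrates why a lag equal to the horizon is safe. It assumes a pandas DataFrame with location, time_period, and disease_cases columns, where future rows have missing disease_cases. The helper lagged_cases and the constant HORIZON are hypothetical names, not part of Chap.

```python
import pandas as pd

HORIZON = 3  # months ahead, as in the text


def lagged_cases(df, lag=HORIZON):
    if lag < HORIZON:
        raise ValueError("lag must be at least the forecast horizon")
    out = df.sort_values(["location", "time_period"])
    # Shift within each location so a row sees the cases from `lag` periods earlier.
    return out.groupby("location")["disease_cases"].shift(lag).rename(f"cases_lag{lag}")


historic = pd.DataFrame({
    "location": ["A"] * 4,
    "time_period": [1, 2, 3, 4],
    "disease_cases": [5.0, 7.0, 9.0, 11.0],
})
future = pd.DataFrame({
    "location": ["A"] * 3,
    "time_period": [5, 6, 7],
    "disease_cases": [float("nan")] * 3,  # unknown at prediction time
})

combined = pd.concat([historic, future], ignore_index=True)
lag3 = lagged_cases(combined)
future_lag3 = lag3[combined["disease_cases"].isna()]
```

For the three future periods, the lag-3 feature takes the observed values from periods 2, 3, and 4, so none of the future rows depend on an unknown case count.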

Using shorter lags (e.g. lag 1 or 2) would require recursive forecasting: predicting one step ahead, feeding that prediction back as input, then predicting the next step. This is more complex to implement and can accumulate errors across steps.
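As a rough illustration of recursive forecasting, the sketch below rolls a lag-1 linear model forward. The coefficients are made-up numbers rather than fitted values, and recursive_forecast is a hypothetical helper, not a Chap function.

```python
import numpy as np


def recursive_forecast(history, coef, intercept, horizon):
    """Roll a lag-1 linear model forward, feeding each prediction back in."""
    last = history[-1]
    forecasts = []
    for _ in range(horizon):
        # The prediction for this step becomes the lag-1 input for the next step.
        last = intercept + coef * last
        forecasts.append(last)
    return np.array(forecasts)


forecasts = recursive_forecast(history=[8.0, 10.0], coef=0.5, intercept=4.0, horizon=3)
```

Only the first step uses an observed value. Every later step is computed from an earlier prediction, so any error in that prediction is carried into the steps that follow.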