Progressive Effects Walkthrough
This walkthrough shows how to progressively add modeling effects to a simple linear regression. Each step adds a new type of feature, and backtesting measures how much it improves the forecasts.
By the end, you will have built a model with location-specific offsets, seasonal patterns, climate covariates, and lagged disease cases.
1. Loading the Data
The data is loaded as a DataSet (from chap_core.spatio_temporal_data.temporal_dataclass); list(dataset.keys()) gives the locations it contains.
2. A Basic Estimator
We define a BasicEstimator that takes a feature extraction function.
Different feature functions produce different models, while the estimator
handles the training and prediction boilerplate.
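As a rough illustration of that split of responsibilities, here is a minimal, dependency-free sketch. It is not the tutorial's BasicEstimator: the class name LinearFeatureEstimator, the row-dict data layout, the disease_cases target key, and the least-squares solver with a tiny ridge term are all assumptions made for this example. A feature function maps a list of rows to a list of numeric feature vectors, and the estimator only fits and applies linear coefficients.

```python
from typing import Callable

Row = dict  # hypothetical layout: one observation per dict, e.g. {"location": "A", "disease_cases": 12.0}
FeatureFn = Callable[[list[Row]], list[list[float]]]


def solve_least_squares(X: list[list[float]], y: list[float], ridge: float = 1e-8) -> list[float]:
    """Solve (X^T X + ridge * I) b = X^T y with Gaussian elimination and partial pivoting."""
    p = len(X[0])
    # Augmented normal equations: each row is [row of X^T X | entry of X^T y].
    M = [
        [sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for k in range(col, p + 1):
                M[r][k] -= factor * M[col][k]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (M[i][p] - sum(M[i][k] * coef[k] for k in range(i + 1, p))) / M[i][i]
    return coef


class LinearFeatureEstimator:
    """Hypothetical estimator: every modeling choice lives in feature_fn."""

    def __init__(self, feature_fn: FeatureFn, target: str = "disease_cases"):
        self.feature_fn = feature_fn
        self.target = target
        self.coef: list[float] = []

    def train(self, rows: list[Row]) -> "LinearFeatureEstimator":
        X = self.feature_fn(rows)
        y = [float(r[self.target]) for r in rows]
        self.coef = solve_least_squares(X, y)
        return self

    def predict(self, rows: list[Row]) -> list[float]:
        # Features are extracted from `rows` alone in this sketch.
        return [sum(c * x for c, x in zip(self.coef, feats)) for feats in self.feature_fn(rows)]


# Usage: an intercept plus one numeric column, fitted to y = 1 + 2x.
train_rows = [{"x": float(x), "disease_cases": 1.0 + 2.0 * x} for x in range(4)]
model = LinearFeatureEstimator(lambda rows: [[1.0, r["x"]] for r in rows]).train(train_rows)
print(model.predict([{"x": 10.0}]))  # approximately [21.0]
```

Swapping the lambda for a different feature function changes the model without touching train or predict. Because this sketch's predict extracts features from the new rows alone, columns such as location indicators could end up different from those seen in training.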
The predict method combines all locations' historic and future data into a
single DataFrame before extracting features. This ensures feature columns
(like location dummies) stay consistent between training and prediction, and
allows lag-based features to look back into the historic window.
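A minimal sketch of that idea, assuming rows are dicts with a location key (a hypothetical layout, not chap_core's DataSet): features are computed on the historic and future rows together, and only the future slice is kept. The helper names location_dummies and future_features are illustrative.

```python
Row = dict  # hypothetical layout, e.g. {"location": "A", "disease_cases": 12.0}


def location_dummies(rows: list[Row]) -> list[list[float]]:
    """One 0/1 column per location present in `rows`, in sorted order."""
    locations = sorted({r["location"] for r in rows})
    return [[1.0 if r["location"] == loc else 0.0 for loc in locations] for r in rows]


def future_features(historic: list[Row], future: list[Row], feature_fn) -> list[list[float]]:
    """Extract features on historic + future together, then keep only the future rows."""
    combined = historic + future
    return feature_fn(combined)[len(historic):]


historic = [{"location": "A"}, {"location": "B"}]
future = [{"location": "B"}]

print(location_dummies(future))                             # [[1.0]]: only one column
print(future_features(historic, future, location_dummies))  # [[0.0, 1.0]]: same columns as training
```

The same slicing lets a lag feature computed on the combined rows read its values from the end of the historic window.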
3. Evaluation Helper
We use backtest (from chap_core.assessment.prediction_evaluator) to run expanding-window cross-validation and compute the mean absolute error (MAE) for each model variant.
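The mechanics of expanding-window evaluation can be sketched without chap_core. This is not the backtest API: the function names, the single-series setup, and the fit_predict callback are assumptions chosen for illustration. Each fold trains on every period before a cutoff, forecasts the next horizon periods, and the fold MAEs are averaged.

```python
from typing import Callable, Iterator


def mae(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def expanding_window_splits(
    n_periods: int, initial_train: int, horizon: int, step: int = 1
) -> Iterator[tuple[int, int]]:
    """Yield (train_end, test_end): train on [0, train_end), test on [train_end, test_end)."""
    train_end = initial_train
    while train_end + horizon <= n_periods:
        yield train_end, train_end + horizon
        train_end += step  # the training window grows; its start never moves


def backtest_mae(
    series: list[float],
    fit_predict: Callable[[list[float], int], list[float]],
    initial_train: int,
    horizon: int = 3,
    step: int = 1,
) -> float:
    """Average the per-fold MAE over all expanding-window folds."""
    fold_errors = []
    for train_end, test_end in expanding_window_splits(len(series), initial_train, horizon, step):
        forecast = fit_predict(series[:train_end], horizon)
        fold_errors.append(mae(series[train_end:test_end], forecast))
    return sum(fold_errors) / len(fold_errors)


def last_value(train: list[float], horizon: int) -> list[float]:
    """Naive baseline: repeat the last observed value."""
    return [train[-1]] * horizon


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(list(expanding_window_splits(len(series), initial_train=3, horizon=2)))  # [(3, 5), (4, 6)]
print(backtest_mae(series, last_value, initial_train=3, horizon=2))            # 1.5
```

Because every fold only sees data from before its cutoff, the score reflects how the model would have performed when forecasting forward in time.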
4. Location-Specific Offset
The simplest region-aware feature: one indicator variable per location. This lets the model learn a different baseline for each region.
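To see what this feature buys, note that with one indicator per location and no separate intercept, the indicator columns never overlap, so least squares splits into one problem per location and each coefficient is simply that location's mean. The sketch below computes those offsets directly; the row layout, the disease_cases key, and the function names are assumptions.

```python
from collections import defaultdict

Row = dict  # hypothetical layout, e.g. {"location": "A", "disease_cases": 12.0}


def fit_location_offsets(rows: list[Row], target: str = "disease_cases") -> dict[str, float]:
    """Closed-form least squares for one indicator per location (no intercept):
    X^T X is diagonal with per-location counts, so each coefficient is a per-location mean."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for r in rows:
        totals[r["location"]] += r[target]
        counts[r["location"]] += 1
    return {loc: totals[loc] / counts[loc] for loc in totals}


def predict_location_offsets(offsets: dict[str, float], rows: list[Row]) -> list[float]:
    # Every prediction for a location is that location's learned baseline.
    return [offsets[r["location"]] for r in rows]


rows = [
    {"location": "A", "disease_cases": 10.0},
    {"location": "A", "disease_cases": 20.0},
    {"location": "B", "disease_cases": 5.0},
]
offsets = fit_location_offsets(rows)
print(offsets)  # {'A': 15.0, 'B': 5.0}
print(predict_location_offsets(offsets, [{"location": "B"}, {"location": "A"}]))  # [5.0, 15.0]
```

This also shows the limit of the offset model on its own: it predicts a constant level per region and cannot follow changes over time.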
5. Seasonal Effect
Disease incidence often follows seasonal patterns, so adding month-of-year indicators lets the model capture periodic variation.
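A sketch of location plus month-of-year indicators, assuming each row carries a time_period string in YYYY-MM form (an assumption; weekly data would need a different period index). One month, January here, is left out as the reference level: a full set of location indicators and a full set of month indicators each sum to one on every row, so keeping all of both would make the design matrix rank-deficient.

```python
Row = dict  # hypothetical layout, e.g. {"location": "A", "time_period": "2020-03"}


def month_of_year(period: str) -> int:
    """Month number from an assumed 'YYYY-MM' period string."""
    return int(period.split("-")[1])


def location_and_month_features(rows: list[Row]) -> list[list[float]]:
    locations = sorted({r["location"] for r in rows})
    features = []
    for r in rows:
        location_part = [1.0 if r["location"] == loc else 0.0 for loc in locations]
        month = month_of_year(r["time_period"])
        # Months 2..12 get indicators; January is absorbed into the location baselines.
        season_part = [1.0 if month == m else 0.0 for m in range(2, 13)]
        features.append(location_part + season_part)
    return features


rows = [{"location": "A", "time_period": "2020-01"}, {"location": "B", "time_period": "2020-03"}]
for feats in location_and_month_features(rows):
    print(feats)
```

Each month coefficient then reads as the average difference from January, shared by all locations.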
6. Climate Covariates
Chap provides future climate data (rainfall, temperature) at prediction time, so we can use these as features directly. This lets the model learn how climate conditions relate to disease incidence.
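One way to add climate covariates to an existing feature function is to wrap it and append the raw climate values as extra columns. The column names rainfall and mean_temperature and the with_climate helper are assumptions for this sketch; use whatever covariate names your dataset provides.

```python
from typing import Callable

Row = dict  # hypothetical layout, e.g. {"rainfall": 120.0, "mean_temperature": 27.5}
FeatureFn = Callable[[list[Row]], list[list[float]]]

CLIMATE_COLUMNS = ("rainfall", "mean_temperature")  # assumed covariate names


def with_climate(base_fn: FeatureFn, columns: tuple[str, ...] = CLIMATE_COLUMNS) -> FeatureFn:
    """Return a feature function that appends climate values to base_fn's features."""

    def feature_fn(rows: list[Row]) -> list[list[float]]:
        base = base_fn(rows)
        return [feats + [float(r[c]) for c in columns] for feats, r in zip(base, rows)]

    return feature_fn


def intercept_only(rows: list[Row]) -> list[list[float]]:
    return [[1.0] for _ in rows]


climate_features = with_climate(intercept_only)
print(climate_features([{"rainfall": 120, "mean_temperature": 27.5}]))  # [[1.0, 120.0, 27.5]]
```

In a linear model these columns share one coefficient each across all locations, so the sketch assumes the same climate effect everywhere.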
In practice, climate effects on disease are often delayed (e.g. rainfall
affects mosquito breeding over weeks). You can also add lagged climate
features using df.groupby("location")["rainfall"].shift(lag), but with
limited data, adding many lag features risks overfitting.
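For readers without pandas at hand, the per-location shift can be written directly. This sketch mimics df.groupby("location")[column].shift(lag) for a positive lag, assuming rows are already sorted by time within each location; missing lags come back as None (pandas gives NaN), and those rows must be dropped or filled before fitting. The function name grouped_shift is hypothetical.

```python
Row = dict  # hypothetical layout, e.g. {"location": "A", "rainfall": 80.0}


def grouped_shift(rows: list[Row], column: str, lag: int, group: str = "location") -> list:
    """Value of `column` from `lag` rows earlier in the same group, or None if there is none."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    seen: dict = {}
    shifted = []
    for r in rows:
        past = seen.setdefault(r[group], [])  # values seen so far for this location
        shifted.append(past[-lag] if len(past) >= lag else None)
        past.append(r[column])
    return shifted


rows = [
    {"location": "A", "rainfall": 1.0},
    {"location": "B", "rainfall": 10.0},
    {"location": "A", "rainfall": 2.0},
    {"location": "B", "rainfall": 20.0},
    {"location": "A", "rainfall": 3.0},
]
print(grouped_shift(rows, "rainfall", lag=1))  # [None, None, 1.0, 10.0, 2.0]
```

Each extra lag adds a coefficient and, once the None rows are dropped, removes lag rows per location from the training data, which is part of why many lags are costly on short series.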
7. Lagged Target (Disease Cases)
Past disease cases are typically the strongest predictor of future cases. However, a lagged target introduces a technical difficulty: at prediction time, future disease cases are unknown.
The simplest solution is to use only lags at least as long as the forecast horizon. Since we predict 3 months ahead, lag 3 is the shortest usable lag: for every month in the forecast window, the case count three months earlier has already been observed.
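The rule can be checked mechanically. In this sketch (the function name and the plain list-of-cases input are assumptions), history holds one location's observed cases, and for each of the horizon forecast periods we look up the value lag periods earlier; None marks a value that would itself lie in the unknown future.

```python
def lagged_target_for_forecast(history: list[float], horizon: int, lag: int) -> list:
    """Lag-`lag` target value for forecast steps 1..horizon, or None where it is not yet observed."""
    n = len(history)
    values = []
    for step in range(1, horizon + 1):
        source = (n - 1 + step) - lag  # index of the period `lag` steps before this forecast period
        values.append(history[source] if 0 <= source < n else None)
    return values


cases = [10.0, 20.0, 30.0, 40.0, 50.0]
print(lagged_target_for_forecast(cases, horizon=3, lag=3))  # [30.0, 40.0, 50.0]
print(lagged_target_for_forecast(cases, horizon=3, lag=1))  # [50.0, None, None]
```

With the lag equal to the horizon, every forecast step finds an observed value; with lag 1, only the first step does.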
Using shorter lags (e.g. lag 1 or 2) would require recursive forecasting: predicting one step ahead, feeding that prediction back as input, then predicting the next step. This is more complex to implement and can accumulate errors across steps.
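A minimal sketch of the recursive scheme, assuming a one-step model predict_next that reads the series so far (the function names and the toy lag-1 model are hypothetical). Each prediction is appended to the working series, so later steps condition on earlier predictions rather than observations, which is where errors can compound.

```python
from typing import Callable


def recursive_forecast(
    history: list[float], horizon: int, predict_next: Callable[[list[float]], float]
) -> list[float]:
    """Forecast `horizon` steps by repeatedly predicting one step and feeding it back."""
    values = list(history)  # copy so the caller's history is not modified
    forecasts = []
    for _ in range(horizon):
        nxt = predict_next(values)
        forecasts.append(nxt)
        values.append(nxt)  # the prediction now plays the role of an observed lag-1 value
    return forecasts


def toy_lag1_model(values: list[float]) -> float:
    # Hypothetical fitted model: cases_t = 0.8 * cases_{t-1} + 5
    return 0.8 * values[-1] + 5.0


print(recursive_forecast([10.0, 20.0], horizon=3, predict_next=toy_lag1_model))
```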