Multi-Region Strategies Walkthrough
This walkthrough compares different strategies for handling multiple geographic regions in a forecasting model. Each strategy represents a different trade-off between sharing information across regions and allowing region-specific behavior.
1. Setup
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/markdown_exec/_internal/formatters/python.py", line 71, in _run_python
exec_python(code, code_block_id, exec_globals)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/markdown_exec/_internal/formatters/_exec_python.py", line 8, in exec_python
exec(compiled, exec_globals) # noqa: S102
~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "<code block: session regions; n1>", line 1, in <module>
import numpy as np
ModuleNotFoundError: No module named 'numpy'
We define two estimator classes. GlobalEstimator trains a single model
on all regions combined (features can include location information).
PerRegionEstimator trains a separate model for each region independently.
class GlobalEstimator:
def __init__(self, extract_features):
self.extract_features = extract_features
def train(self, data):
df = data.to_pandas()
X = self.extract_features(df)
y = df["disease_cases"].values
mask = np.isfinite(y) & np.all(np.isfinite(X.values), axis=1)
self.model = LinearRegression().fit(X[mask], y[mask])
return self
def predict(self, historic_data, future_data):
parts, future_mask = [], []
for location in future_data.keys():
hist = historic_data[location].to_pandas().assign(location=location)
fut = future_data[location].to_pandas().assign(location=location)
if "disease_cases" not in fut.columns:
fut["disease_cases"] = np.nan
parts.append(pd.concat([hist, fut], ignore_index=True))
future_mask += [False] * len(hist) + [True] * len(fut)
combined = pd.concat(parts, ignore_index=True)
X = self.extract_features(combined).fillna(0)
pred = np.clip(self.model.predict(X[future_mask]), 0, None)
results, i = {}, 0
for location in future_data.keys():
n = len(future_data[location])
results[location] = Samples(
future_data[location].time_period, pred[i : i + n].reshape(-1, 1)
)
i += n
return DataSet(results)
class PerRegionEstimator:
def __init__(self, extract_features):
self.extract_features = extract_features
def train(self, data):
self.models = {}
for location in data.keys():
df = data[location].to_pandas().assign(location=location)
X = self.extract_features(df)
y = df["disease_cases"].values
mask = np.isfinite(y) & np.all(np.isfinite(X.values), axis=1)
self.models[location] = LinearRegression().fit(X[mask], y[mask])
return self
def predict(self, historic_data, future_data):
results = {}
for location in future_data.keys():
hist = historic_data[location].to_pandas().assign(location=location)
fut = future_data[location].to_pandas().assign(location=location)
if "disease_cases" not in fut.columns:
fut["disease_cases"] = np.nan
combined = pd.concat([hist, fut], ignore_index=True)
n = len(fut)
X = self.extract_features(combined).iloc[-n:].fillna(0)
pred = np.clip(self.models[location].predict(X), 0, None)
results[location] = Samples(
future_data[location].time_period, pred.reshape(-1, 1)
)
return DataSet(results)
def evaluate(estimator, dataset, prediction_length=3, n_test_sets=4):
results = list(backtest(
estimator, dataset,
prediction_length=prediction_length, n_test_sets=n_test_sets,
))
errors = []
for result in results:
for location in result.keys():
truth = result[location].disease_cases
predicted = result[location].samples[:, 0]
errors.extend(np.abs(truth - predicted))
return np.mean(errors)
2. Strategy: Global Model (Ignoring Regions)
A single model trained on all regions, using only season and climate as features. The model has no way to distinguish between regions, so it predicts similar levels for all of them:
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/markdown_exec/_internal/formatters/python.py", line 71, in _run_python
exec_python(code, code_block_id, exec_globals)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/markdown_exec/_internal/formatters/_exec_python.py", line 8, in exec_python
exec(compiled, exec_globals) # noqa: S102
~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "<code block: session regions; n4>", line 7, in <module>
mae = evaluate(GlobalEstimator(season_climate), dataset)
^^^^^^^
NameError: name 'dataset' is not defined
3. Strategy: Global Model with Location Offset
Adding location indicator variables gives the model a per-region intercept. All other effects (season, climate) are still shared across regions:
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/markdown_exec/_internal/formatters/python.py", line 71, in _run_python
exec_python(code, code_block_id, exec_globals)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/markdown_exec/_internal/formatters/_exec_python.py", line 8, in exec_python
exec(compiled, exec_globals) # noqa: S102
~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "<code block: session regions; n5>", line 8, in <module>
mae = evaluate(GlobalEstimator(location_season_climate), dataset)
^^^^^^^
NameError: name 'dataset' is not defined
4. Strategy: Separate Model Per Region
Each region gets its own independently fitted model. This allows each region to have completely different seasonal patterns and climate responses:
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/markdown_exec/_internal/formatters/python.py", line 71, in _run_python
exec_python(code, code_block_id, exec_globals)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/markdown_exec/_internal/formatters/_exec_python.py", line 8, in exec_python
exec(compiled, exec_globals) # noqa: S102
~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "<code block: session regions; n6>", line 1, in <module>
mae = evaluate(PerRegionEstimator(season_climate), dataset)
^^^^^^^
NameError: name 'dataset' is not defined
5. Strategy: Global Model with Location-Specific Seasonality
An intermediate approach: use a single global model, but create interaction features between location and month. This gives each region its own seasonal pattern while sharing climate effects:
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/markdown_exec/_internal/formatters/python.py", line 71, in _run_python
exec_python(code, code_block_id, exec_globals)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/hostedtoolcache/Python/3.14.3/x64/lib/python3.14/site-packages/markdown_exec/_internal/formatters/_exec_python.py", line 8, in exec_python
exec(compiled, exec_globals) # noqa: S102
~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "<code block: session regions; n7>", line 12, in <module>
mae = evaluate(GlobalEstimator(location_x_season_climate), dataset)
^^^^^^^
NameError: name 'dataset' is not defined
6. Discussion
With only 3 regions and 36 months of data, separate per-region models perform well -- each region has enough data to fit the simple model reliably. In datasets with many regions and less data per region, the balance shifts: separate models overfit, and shared approaches that "borrow strength" across regions become important.
Hierarchical (partial pooling) models offer a principled middle ground. Instead of fully sharing or fully separating parameters, they allow each region's parameters to deviate from a shared mean, with the amount of deviation learned from data. This is the primary use case for hierarchical Bayesian models, which Chap supports through frameworks like PyMC.