Run models in Chap with your own data
Column names
Rename your columns to match the expected names:
- Time column must be named time_period
- Location column must be named location
- Case count column must be named disease_cases
For a complete example of a valid Chap CSV, see the Laos dengue dataset. In addition to the required columns, this file also includes the optional covariate columns: population, location_name, rainfall, mean_temperature, and mean_relative_humidity.
Time period format
Convert your dates to the correct format:
- Monthly data: YYYY-MM (e.g. 2023-01, 2023-12)
- Weekly data: YYYY-Wnn (e.g. 2023-W01, 2023-W52)
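The pandas example later on this page covers the monthly case. For weekly data, a minimal sketch could look like the following. It assumes that the week number in YYYY-Wnn follows ISO 8601 week numbering, which this page does not state explicitly, so check that against your data before relying on it. The sample dates are made up for illustration.

```python
import pandas as pd

# Hypothetical sample dates (all Mondays), used only for illustration
dates = pd.Series(pd.to_datetime(["2023-01-02", "2023-01-09", "2023-12-25"]))

# isocalendar() returns the ISO year, week and day for each date.
# Assumption: Chap's YYYY-Wnn uses ISO 8601 week numbers.
iso = dates.dt.isocalendar()

# Build zero-padded labels such as 2023-W01
weekly = iso["year"].astype(str) + "-W" + iso["week"].astype(str).str.zfill(2)

print(weekly.tolist())
```

Using the ISO year rather than the calendar year matters around New Year, where the first or last days of a calendar year can belong to a week of the neighbouring ISO year.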
Consecutive periods
All time periods must be consecutive with no gaps. Every location must have data for every time period in the dataset.
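One way to check this rule for monthly data before running a model is sketched below. It is not Chap's own validation, only a rough pandas check under the assumption that the columns have already been renamed and that time_period holds YYYY-MM strings. The sample DataFrame is invented, and Region_B is deliberately missing 2023-02.

```python
import pandas as pd

# Hypothetical data: Region_B has no row for 2023-02
df = pd.DataFrame({
    "time_period": ["2023-01", "2023-02", "2023-03", "2023-01", "2023-03"],
    "location": ["Region_A", "Region_A", "Region_A", "Region_B", "Region_B"],
    "disease_cases": [12, 8, 5, 30, 22],
})

# Every month between the first and last period in the dataset
periods = pd.PeriodIndex(df["time_period"], freq="M")
expected = pd.period_range(periods.min(), periods.max(), freq="M")
expected_labels = expected.strftime("%Y-%m").tolist()

# (location, time_period) pairs that actually occur in the data
present = set(zip(df["location"], df["time_period"]))

# Pairs that should exist but do not
missing = [
    (loc, label)
    for loc in sorted(df["location"].unique())
    for label in expected_labels
    if (loc, label) not in present
]

print(missing)
```

An empty list means every location has a row for every month in the range. Any pair that is listed needs to be filled in or the dataset trimmed before it satisfies the rule above.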
GeoJSON file
If you want spatial visualizations, create a GeoJSON file where each feature's identifier matches the location values in your CSV. Name the GeoJSON file with the same base name as your CSV (e.g. my_data.csv and my_data.geojson) for automatic discovery.
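The matching requirement can be checked with a few lines of Python. The sketch below assumes that the identifier is stored in each feature's top-level "id" member. This page does not say which field Chap reads, so adjust the lookup if your file keeps the identifier elsewhere (for example under properties). The GeoJSON content and location names are invented for illustration.

```python
import json

# Hypothetical GeoJSON content; in practice this would be read from my_data.geojson
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "Region_A", "properties": {},
         "geometry": {"type": "Point", "coordinates": [102.6, 17.9]}},
        {"type": "Feature", "id": "Region_C", "properties": {},
         "geometry": {"type": "Point", "coordinates": [104.8, 16.6]}},
    ],
})

# Location values as they appear in the CSV's location column
csv_locations = {"Region_A", "Region_B"}

collection = json.loads(geojson_text)

# Assumption: the identifier is the feature's top-level "id" member
feature_ids = {feature.get("id") for feature in collection["features"]}

# Locations in the CSV that have no matching feature
unmatched = sorted(csv_locations - feature_ids)

print(unmatched)
```

Any location listed in unmatched has no geometry to draw, so either rename the feature identifier or the location value until the two agree.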
Example: transforming a pandas DataFrame
import pandas as pd
# Suppose you have a DataFrame with different column names
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-02-01", "2023-01-01", "2023-02-01"],
    "region": ["Region_A", "Region_A", "Region_B", "Region_B"],
    "cases": [12, 8, 30, 22],
    "rain_mm": [37.9, 8.5, 55.3, 12.1],
})
# Rename columns to match Chap format
df = df.rename(columns={
    "region": "location",
    "cases": "disease_cases",
    "rain_mm": "rainfall",
})
# Convert dates to YYYY-MM format
df["time_period"] = pd.to_datetime(df["date"]).dt.strftime("%Y-%m")
df = df.drop(columns=["date"])
# Reorder columns
df = df[["time_period", "rainfall", "disease_cases", "location"]]
An example of how to do this with climate tools is here.
Next: Validate your data