This document describes the interface contract for ForecastLabAI forecasting models.
All forecasting models implement the BaseForecaster abstract base class, which follows
scikit-learn conventions for estimators.
Fit the model on historical data.
Parameters:
y (np.ndarray): Target values, 1D array of shape [n_samples]
X (np.ndarray | None): Optional exogenous features, 2D array of shape [n_samples, n_features]
Returns:
self: For method chaining
Raises:
ValueError: If y is empty or has insufficient observations for the model
Example:
model = NaiveForecaster()
model.fit(y=np.array([1, 2, 3, 4, 5]))

Generate forecasts for the specified horizon.
Parameters:
horizon (int): Number of steps to forecast
X (np.ndarray | None): Optional exogenous features for forecast period
Returns:
np.ndarray: Array of forecasts with shape [horizon]
Raises:
RuntimeError: If model has not been fitted
Example:
forecasts = model.predict(horizon=7)
# Returns: array([5., 5., 5., 5., 5., 5., 5.])

Get model parameters (scikit-learn convention).
Returns:
dict: Dictionary of parameter names to values
Example:
params = model.get_params()
# Returns: {"random_state": 42}

Set model parameters (scikit-learn convention).
Parameters:
**params: Parameter names and values to set
Returns:
self: For method chaining
Example:
model.set_params(random_state=99)

Check if the model has been fitted.
Returns:
True if fit() has been called successfully

requires_features is a class attribute that is True when fit()/predict()
require a non-None X feature frame. Baseline (target-only) models leave it
False; feature-aware models (e.g. the regression forecaster) override it to
True. The forecasting service branches on this flag instead of an isinstance
check or a model_type string comparison.
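
The contract described so far can be sketched roughly as follows. This is a minimal illustration, not the project's actual source: the private `_fitted` flag, the choice to expose `is_fitted` as a property, the hypothetical `run_forecast` helper, and the helper's own error message are all assumptions. The naive model's two error messages are taken from the error table in this document.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any

import numpy as np


class BaseForecaster(ABC):
    """Sketch of the forecaster contract (assumed layout)."""

    # Feature-aware subclasses override this to True.
    requires_features: bool = False

    def __init__(self, random_state: int = 42) -> None:
        self.random_state = random_state
        self._fitted = False  # hypothetical private flag

    @abstractmethod
    def fit(self, y: np.ndarray, X: np.ndarray | None = None) -> BaseForecaster: ...

    @abstractmethod
    def predict(self, horizon: int, X: np.ndarray | None = None) -> np.ndarray: ...

    def get_params(self) -> dict[str, Any]:
        return {"random_state": self.random_state}

    def set_params(self, **params: Any) -> BaseForecaster:
        for name, value in params.items():
            setattr(self, name, value)
        return self  # allows method chaining

    @property
    def is_fitted(self) -> bool:  # assumption: a property rather than a method
        return self._fitted


class NaiveForecaster(BaseForecaster):
    """Target-only baseline: repeats the last observed value."""

    def fit(self, y, X=None):
        y = np.asarray(y, dtype=float)
        if y.size == 0:
            raise ValueError("Cannot fit on empty array")
        self._last = float(y[-1])
        self._fitted = True
        return self

    def predict(self, horizon, X=None):
        if not self._fitted:
            raise RuntimeError("Model must be fitted before predict")
        return np.full(horizon, self._last)


def run_forecast(model, y, horizon, X_hist=None, X_future=None):
    """Hypothetical service helper: branch on the flag, not on the type."""
    if model.requires_features:
        if X_hist is None or X_future is None:
            raise ValueError("This model requires a feature frame X")
        return model.fit(y, X_hist).predict(horizon, X_future)
    return model.fit(y).predict(horizon)
```

Branching on `requires_features` means that adding a new feature-aware model needs no change to the calling code, as long as the subclass sets the flag.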
Each model type has a corresponding configuration schema:
{
"schema_version": "1.0",
"model_type": "naive"
}

{
"schema_version": "1.0",
"model_type": "seasonal_naive",
"season_length": 7 # 1-365
}

{
"schema_version": "1.0",
"model_type": "moving_average",
"window_size": 7 # 1-90
}

{
"schema_version": "1.0",
"model_type": "regression",
"max_iter": 200, # 10-1000 (boosting iterations)
"learning_rate": 0.05, # 0.001-1.0
"max_depth": 6 # 1-20
}

A feature-aware model (requires_features = True): it wraps scikit-learn's
HistGradientBoostingRegressor and consumes a per-day exogenous feature frame.
The feature-frame contract — the canonical column set, the historical vs future
frame shapes, and the leakage taxonomy — is documented in
feature_frame_contract.md.
{
"schema_version": "1.0",
"model_type": "lightgbm",
"n_estimators": 100, # 10-1000 (boosting rounds)
"max_depth": 6, # 1-20
"learning_rate": 0.1 # 0.001-1.0
}

A feature-aware model (requires_features = True) wrapping
lightgbm.LGBMRegressor — the first advanced model in the MLZOO sequence
(PRP-30 / MLZOO-B). LightGBM is an optional dependency: install the
ml-lightgbm extra (uv sync --extra dev --extra ml-lightgbm) and enable
forecast_enable_lightgbm=true. It consumes the same canonical feature frame as
regression — see feature_frame_contract.md.
{
"schema_version": "1.0",
"model_type": "xgboost",
"n_estimators": 100, # 10-1000 (boosting rounds)
"max_depth": 6, # 1-20
"learning_rate": 0.1 # 0.001-1.0
}

A feature-aware model (requires_features = True) wrapping
xgboost.XGBRegressor — the second advanced tree model in the MLZOO sequence
(PRP-MLZOO-C1). XGBoost is an optional dependency: install the
ml-xgboost extra (uv sync --extra dev --extra ml-xgboost) and enable
forecast_enable_xgboost=true. It consumes the same canonical feature frame as
regression and lightgbm — see feature_frame_contract.md.
{
"schema_version": "1.0",
"model_type": "prophet_like",
"alpha": 1.0 # 0.0-10000.0 (Ridge L2 regularization strength)
}

A feature-aware model (requires_features = True): a deterministic,
regularized additive linear model (MLZOO-C2). It is a scikit-learn
Pipeline of a SimpleImputer(median) + a Ridge(solver="cholesky") over the
same canonical 14-column feature frame as regression. Unlike the tree models
it ships always-enabled: pure scikit-learn, no optional extra, no feature
flag. It exposes a model-specific decompose() method that splits any forecast
into its additive trend / seasonality / holiday-regressor contributions.
It is "Prophet-like", not Prophet: it approximates Prophet's additive shape
with a linear model over engineered features. It does not add the real
prophet/Stan dependency and does not model changepoint trend, posterior
uncertainty intervals, or automatic seasonality discovery.
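
The parameter ranges noted in the schema comments can be checked before a model is built. The sketch below is illustrative only: `PARAM_RANGES` and `validate_config` are hypothetical names, and the project may well use a schema library instead. The ranges are copied from the comments in the schemas, and the unknown-type message is the one in this document's error table.

```python
from __future__ import annotations

from typing import Any

# Inclusive (low, high) bounds per model type, copied from the schema comments.
_TREE_RANGES = {"n_estimators": (10, 1000), "max_depth": (1, 20), "learning_rate": (0.001, 1.0)}
PARAM_RANGES: dict[str, dict[str, tuple[float, float]]] = {
    "naive": {},
    "seasonal_naive": {"season_length": (1, 365)},
    "moving_average": {"window_size": (1, 90)},
    "regression": {"max_iter": (10, 1000), "learning_rate": (0.001, 1.0), "max_depth": (1, 20)},
    "lightgbm": dict(_TREE_RANGES),
    "xgboost": dict(_TREE_RANGES),
    "prophet_like": {"alpha": (0.0, 10000.0)},
}


def validate_config(config: dict[str, Any]) -> dict[str, Any]:
    """Return the config unchanged if every supplied parameter is in range."""
    model_type = config.get("model_type")
    if model_type not in PARAM_RANGES:
        raise ValueError(f"Unknown model type: {model_type}")
    for name, (low, high) in PARAM_RANGES[model_type].items():
        if name not in config:
            continue  # assumption: defaults are filled in elsewhere
        value = config[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return config
```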
ŷ[t+h] = y[t] for all h ∈ [1, horizon]
The naive forecaster predicts the last observed value for all future horizons.
ŷ[t+h] = y[t + h - m] where m = season_length
The seasonal naive forecaster predicts the value from the same position in the previous seasonal cycle.
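
As a rough illustration of the seasonal rule, the function below reuses the last observed cycle. Two points are assumptions: `seasonal_naive_forecast` is a hypothetical standalone helper, not the project's class, and horizons longer than one season are handled by repeating the last cycle, which the formula above does not spell out.

```python
import numpy as np


def seasonal_naive_forecast(y, season_length, horizon):
    """Repeat the last observed seasonal cycle over the forecast horizon."""
    y = np.asarray(y, dtype=float)
    if y.size < season_length:
        raise ValueError(f"Need at least {season_length} observations")
    last_cycle = y[-season_length:]
    # Step h (1-based) maps to position (h - 1) mod m of the last cycle,
    # which equals y[t + h - m] whenever h <= m.
    positions = np.arange(horizon) % season_length
    return last_cycle[positions]


forecasts = seasonal_naive_forecast([1, 2, 3, 4, 5, 6], season_length=3, horizon=5)
# forecasts is array([4., 5., 6., 4., 5.])
```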
ŷ[t+h] = mean(y[t-window+1:t+1]) for all h ∈ [1, horizon]
The moving average forecaster predicts the average of the last window_size observations.
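
A minimal sketch of the moving-average rule, again as a hypothetical standalone helper (`moving_average_forecast`) rather than the project's class:

```python
import numpy as np


def moving_average_forecast(y, window_size, horizon):
    """Forecast every step as the mean of the last `window_size` observations."""
    y = np.asarray(y, dtype=float)
    if y.size < window_size:
        raise ValueError(f"Need at least {window_size} observations")
    window_mean = y[-window_size:].mean()
    return np.full(horizon, window_mean)


forecasts = moving_average_forecast([1, 2, 3, 4, 5], window_size=3, horizon=3)
# forecasts is array([4., 4., 4.]): the mean of 3, 4 and 5
```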
ŷ[t+h] = HistGradientBoostingRegressor.predict(X[t+h])
The regression forecaster predicts each horizon day from its exogenous feature row X[t+h] (target
long-lags, calendar, and posited price/promotion inputs). Unlike the baselines
it REQUIRES a feature frame — see feature_frame_contract.md.
ŷ[t+h] = LGBMRegressor.predict(X[t+h])
Same exogenous-feature contract as the regression forecaster, but the estimator
is lightgbm.LGBMRegressor — gradient-boosted leaf-wise trees. Feature-aware
(requires_features = True), deterministic (n_jobs=1, deterministic=True,
force_col_wise=True, fixed random_state), and NaN-tolerant. Optional —
behind the ml-lightgbm extra and the forecast_enable_lightgbm flag.
ŷ[t+h] = XGBRegressor.predict(X[t+h])
Same exogenous-feature contract as the regression and LightGBM forecasters, but
the estimator is xgboost.XGBRegressor — gradient-boosted trees. Feature-aware
(requires_features = True), deterministic (n_jobs=1, tree_method="hist",
fixed random_state, no stochastic subsampling), and NaN-tolerant
(missing=np.nan). Optional — behind the ml-xgboost extra and the
forecast_enable_xgboost flag.
ŷ[t+h] = intercept + trend[t+h] + seasonality[t+h] + holiday_regressor[t+h]
An additive linear forecast: a Ridge fit gives ŷ = intercept + Σ coefᵢ·xᵢ,
and that sum is grouped into three Prophet-style components, each the partial
sum over its columns of the canonical 14-column frame:
| Component | Canonical columns |
|---|---|
| trend | lag_1, lag_7, lag_14, lag_28, days_since_launch |
| seasonality | dow_sin, dow_cos, month_sin, month_cos, is_weekend, is_month_end |
| holiday_regressor | price_factor, promo_active, is_holiday |
The three column sets partition all 14 columns exactly, so the additive
invariant holds: decompose(X)'s four parts sum (within float tolerance) to
predict(...). Feature-aware (requires_features = True), deterministic
(Ridge(solver="cholesky") closed-form, SimpleImputer(median)), and
NaN-tolerant via the imputer. Pure scikit-learn — always available, no extra,
no flag. The decompose() method (model-specific, not on BaseForecaster)
returns the four-way breakdown.
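
The additive invariant can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the project's implementation: it solves the ridge problem in closed form directly instead of using the scikit-learn Pipeline, skips the median imputer, uses random data, and assumes a column order (`COLUMNS`) that may differ from the canonical frame. `fit_ridge` and this `decompose` are hypothetical helpers.

```python
import numpy as np

GROUPS = {
    "trend": ["lag_1", "lag_7", "lag_14", "lag_28", "days_since_launch"],
    "seasonality": ["dow_sin", "dow_cos", "month_sin", "month_cos",
                    "is_weekend", "is_month_end"],
    "holiday_regressor": ["price_factor", "promo_active", "is_holiday"],
}
# Assumed column order; the real frame's order is defined by the feature contract.
COLUMNS = [col for cols in GROUPS.values() for col in cols]


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def decompose(X, coef, intercept):
    """Split the linear prediction into intercept plus three partial sums."""
    parts = {"intercept": np.full(X.shape[0], intercept)}
    for name, cols in GROUPS.items():
        idx = [COLUMNS.index(col) for col in cols]
        parts[name] = X[:, idx] @ coef[idx]
    return parts


rng = np.random.default_rng(0)
X_hist, y_hist = rng.normal(size=(60, 14)), rng.normal(size=60)
coef, intercept = fit_ridge(X_hist, y_hist)

X_future = rng.normal(size=(7, 14))
parts = decompose(X_future, coef, intercept)
prediction = intercept + X_future @ coef

# The four parts sum back to the full prediction within float tolerance.
assert np.allclose(sum(parts.values()), prediction)
```

Because the three groups cover each of the 14 columns exactly once, no coefficient is counted twice or dropped, which is why the sum matches.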
Models are persisted using ModelBundle which includes:
@dataclass
class ModelBundle:
    model: BaseForecaster          # Fitted model
    config: ModelConfig            # Configuration used
    metadata: dict[str, Any]       # Custom metadata (store_id, dates, etc.)
    created_at: datetime           # Save timestamp
    python_version: str            # Python version
    sklearn_version: str           # Scikit-learn version
    lightgbm_version: str | None   # LightGBM version (None if extra not installed)
    xgboost_version: str | None    # XGBoost version (None if extra not installed)
    bundle_hash: str               # Deterministic hash

from app.features.forecasting.persistence import save_model_bundle, load_model_bundle
# Save
path = save_model_bundle(bundle, "./artifacts/models/my_model")
# Load
bundle = load_model_bundle(path)
forecasts = bundle.model.predict(horizon=7)

All models must be deterministic given the same:
- Input data (y, X)
- Configuration parameters
- random_state
This ensures reproducibility in experiments and backtesting.
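
Reproducibility also depends on recording what produced a saved artifact. The sketch below shows one way the bundle_hash and the optional version fields of ModelBundle might be populated. Both helpers are hypothetical, and the fields that the real bundle_hash covers are not stated in this document; hashing the config and metadata as canonical JSON is an assumption.

```python
from __future__ import annotations

import hashlib
import json
from importlib import metadata as importlib_metadata
from typing import Any


def optional_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return importlib_metadata.version(dist_name)
    except importlib_metadata.PackageNotFoundError:
        return None


def compute_bundle_hash(config: dict[str, Any], meta: dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    payload = json.dumps(
        {"config": config, "metadata": meta},
        sort_keys=True,
        separators=(",", ":"),
        default=str,  # dates and similar values are stringified
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


versions = {
    "lightgbm_version": optional_version("lightgbm"),
    "xgboost_version": optional_version("xgboost"),
}
```

Sorting keys and fixing the separators makes the encoding independent of dict insertion order, so identical inputs always yield the same hash.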
| Method | Input Shape | Output Shape |
|---|---|---|
| fit(y) | [n_samples] | self |
| fit(y, X) | y: [n_samples], X: [n_samples, n_features] | self |
| predict(horizon) | int | [horizon] |
| predict(horizon, X) | int, X: [horizon, n_features] | [horizon] |
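
The shape rules in the table can be enforced with a pair of small checks. These helpers are hypothetical and their shape-error messages are invented for illustration; only the empty-array message comes from this document.

```python
import numpy as np


def check_fit_inputs(y, X=None):
    """Validate y: [n_samples] and, if given, X: [n_samples, n_features]."""
    y = np.asarray(y)
    if y.ndim != 1:
        raise ValueError(f"y must be 1D, got shape {y.shape}")
    if y.size == 0:
        raise ValueError("Cannot fit on empty array")
    if X is not None:
        X = np.asarray(X)
        if X.ndim != 2 or X.shape[0] != y.shape[0]:
            raise ValueError(
                f"X must have shape [{y.shape[0]}, n_features], got {X.shape}"
            )
    return y, X


def check_predict_inputs(horizon, X=None):
    """Validate horizon and, if given, X: [horizon, n_features]."""
    if horizon < 1:
        raise ValueError(f"horizon must be >= 1, got {horizon}")
    if X is not None:
        X = np.asarray(X)
        if X.ndim != 2 or X.shape[0] != horizon:
            raise ValueError(
                f"X must have shape [{horizon}, n_features], got {X.shape}"
            )
    return X
```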
| Scenario | Exception | Message |
|---|---|---|
| Empty training data | ValueError | "Cannot fit on empty array" |
| Insufficient data for seasonal | ValueError | "Need at least {season_length} observations" |
| Insufficient data for MA | ValueError | "Need at least {window_size} observations" |
| Predict before fit | RuntimeError | "Model must be fitted before predict" |
| Unknown model type | ValueError | "Unknown model type: {type}" |