diff --git a/_freeze/docs/reference/forecaster.recursive._forecaster_recursive_multiseries/execute-results/html.json b/_freeze/docs/reference/forecaster.recursive._forecaster_recursive_multiseries/execute-results/html.json new file mode 100644 index 00000000..d5c53df0 --- /dev/null +++ b/_freeze/docs/reference/forecaster.recursive._forecaster_recursive_multiseries/execute-results/html.json @@ -0,0 +1,16 @@ +{ + "hash": "11b50ecaec69a399d0d38f219b22edb0", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: forecaster.recursive._forecaster_recursive_multiseries\n---\n\n\n\n`forecaster.recursive._forecaster_recursive_multiseries`\n\n\n\n## Classes\n\n| Name | Description |\n| --- | --- |\n| [ForecasterRecursiveMultiSeries](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries) | Recursive autoregressive forecaster for multiple time series. |\n\n### ForecasterRecursiveMultiSeries { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries(\n estimator=None,\n lags=None,\n window_features=None,\n encoding='ordinal',\n transformer_series=None,\n transformer_exog=None,\n weight_func=None,\n series_weights=None,\n differentiation=None,\n dropna_from_series=False,\n fit_kwargs=None,\n binner_kwargs=None,\n forecaster_id=None,\n regressor=None,\n)\n```\n\nRecursive autoregressive forecaster for multiple time series.\n\nThis class turns any estimator compatible with the scikit-learn API into a\nrecursive autoregressive (multi-step) forecaster for multiple series.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default 
|\n|--------------------|--------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|\n| estimator | [object](`object`) | An instance of an estimator or pipeline compatible with the scikit-learn API. | `None` |\n| lags | ([int](`int`), [list](`list`), [np](`numpy`).[ndarray](`numpy.ndarray`), [range](`range`), None) | Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. - `int`: include lags from 1 to `lags` (included). - `list`, `1d numpy ndarray` or `range`: include only lags present in `lags`, all elements must be int. - `None`: no lags are included as predictors. | `None` |\n| window_features | ([object](`object`), [list](`list`), None) | Instance or list of instances used to create window features from the original time series. | `None` |\n| encoding | ([str](`str`), None) | Encoding used to identify the different series. - `'ordinal'`: single column with integer values [0, n_series - 1]. - `'ordinal_category'`: same as 'ordinal' but with pandas.category dtype. - `'onehot'`: binary column for each series. - `None`: no series identification column is created. Defaults to 'ordinal'. | `'ordinal'` |\n| transformer_series | ([transformer](`transformer`), [dict](`dict`), None) | Preprocessor compatible with scikit-learn (fit/transform/inverse_transform). Applied to each series before training. ColumnTransformers are not allowed. - Single transformer: cloned and applied to all series. - Dict: different transformer per series. | `None` |\n| transformer_exog | ([transformer](`transformer`), None) | Preprocessor for exogenous variables. 
`inverse_transform` is not available for ColumnTransformers. | `None` |\n| weight_func | ([Callable](`typing.Callable`), [dict](`dict`), None) | Function defining weights for each sample based on index. Only used if estimator supports `sample_weight`. - Single function: applied to all series. - Dict: `{'series_name': Callable}`, others get weight 1. | `None` |\n| series_weights | ([dict](`dict`), None) | Weights per series `{'series_name': float}`. Applied if estimator supports `sample_weight`. Others get weight 1. If `None`, all levels have weight 1. | `None` |\n| differentiation | ([int](`int`), [dict](`dict`), None) | Order of differencing applied to series. - `int`: same order for all series. - `dict`: different order per series (keys are series names). - `None`: no differencing. | `None` |\n| dropna_from_series | [bool](`bool`) | If `True`, drops rows with NaNs in X_train/y_train. If `False`, leaves NaNs and warns. Defaults to `False`. | `False` |\n| fit_kwargs | ([dict](`dict`), None) | Additional arguments for the estimator's `fit` method. | `None` |\n| binner_kwargs | ([dict](`dict`), None) | Arguments for `QuantileBinner` used for residuals. Includes `n_bins`, `method`, `subsample`, `random_state`, `dtype`. | `None` |\n| forecaster_id | ([str](`str`), [int](`int`), None) | Identifier for the forecaster. | `None` |\n| regressor | [object](`object`) | **Deprecated**, alias for `estimator`. | `None` |\n\n#### Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|------------------------------------|-----------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|\n| estimator | [object](`object`) | Estimator or pipeline compatible with scikit-learn. |\n| lags | [np](`numpy`).[ndarray](`numpy.ndarray`) | Lags used as predictors. 
|\n| lags_names | [list](`list`) | Names of the lags used as predictors. |\n| max_lag | [int](`int`) | Maximum lag included in `lags`. |\n| window_features | [list](`list`) | Classes used to create window features. |\n| window_features_names | [list](`list`) | Names of the window features in `X_train`. |\n| window_features_class_names | [list](`list`) | Names of classes for window features. |\n| max_size_window_features | [int](`int`) | Maximum window size for window features. |\n| window_size | [int](`int`) | Required window size for predictors. Max of `max_lag` and `max_size_window_features`, plus differentiation order if applicable. |\n| encoding | [str](`str`) | Encoding used to identify the different series. |\n| encoder | [sklearn](`sklearn`).[preprocessing](`sklearn.preprocessing`) | Scikit-learn encoder for series. |\n| encoding_mapping_ | [dict](`dict`) | Mapping of the encoding for series identities. |\n| transformer_series | ([transformer](`transformer`), [dict](`dict`)) | Transformer(s) for the series. |\n| transformer_series_ | [dict](`dict`) | Internal dictionary of series transformers. |\n| transformer_exog | [transformer](`transformer`) | Transformer for exogenous variables. |\n| weight_func | ([Callable](`typing.Callable`), [dict](`dict`)) | Weighting function(s). |\n| weight_func_ | [dict](`dict`) | Internal dictionary of weighting functions. |\n| source_code_weight_func | ([str](`str`), [dict](`dict`)) | Source code of weighting functions. |\n| series_weights | [dict](`dict`) | Weights associated with each series. |\n| series_weights_ | [dict](`dict`) | Internal dictionary of series weights. |\n| differentiation | ([int](`int`), [dict](`dict`)) | Differencing order applied to series. |\n| differentiation_max | [int](`int`) | Maximum order of differentiation. |\n| differentiator | ([TimeSeriesDifferentiator](`spotforecast2_safe.preprocessing.TimeSeriesDifferentiator`), [dict](`dict`)) | Differentiation objects. 
|\n| differentiator_ | [dict](`dict`) | Internal dictionary of differentiators. |\n| dropna_from_series | [bool](`bool`) | Whether to drop NaNs from training matrices. |\n| last_window_ | [dict](`dict`) | Last training window per series in original scale. |\n| index_type_ | [type](`type`) | Type of training index. |\n| index_freq_ | [str](`str`) | Frequency of training index. |\n| training_range_ | [dict](`dict`) | First/last training index values per series. |\n| series_names_in_ | [list](`list`) | Series names provided during training. |\n| exog_in_ | [bool](`bool`) | `True` if trained with exogenous variables. |\n| exog_names_in_ | [list](`list`) | Names of exogenous variables used. |\n| exog_type_in_ | [type](`type`) | Type of exogenous data used. |\n| exog_dtypes_in_ | [dict](`dict`) | Exogenous data types before transformation. |\n| exog_dtypes_out_ | [dict](`dict`) | Exogenous data types after transformation. |\n| X_train_series_names_in_ | [list](`list`) | Series names in the internal `X_train`. |\n| X_train_window_features_names_out_ | [list](`list`) | Window feature names in `X_train`. |\n| X_train_exog_names_out_ | [list](`list`) | Exogenous variable names in `X_train`. |\n| X_train_features_names_out_ | [list](`list`) | All feature column names in `X_train`. |\n| fit_kwargs | [dict](`dict`) | Arguments passed to estimator's `fit`. |\n| in_sample_residuals_ | [dict](`dict`) | Training residuals (up to 10k per series). |\n| in_sample_residuals_by_bin_ | [dict](`dict`) | Binned in-sample residuals. |\n| out_sample_residuals_ | [dict](`dict`) | Non-training residuals (up to 10k per series). |\n| out_sample_residuals_by_bin_ | [dict](`dict`) | Binned out-of-sample residuals. |\n| binner | [dict](`dict`) | `QuantileBinner` objects per series. |\n| binner_intervals_ | [dict](`dict`) | Binning intervals per series. |\n| binner_kwargs | [dict](`dict`) | Arguments used for `QuantileBinner`. |\n| creation_date | [str](`str`) | Forecaster creation date. 
|\n| is_fitted | [bool](`bool`) | `True` if the forecaster has been fitted. |\n| fit_date | [str](`str`) | Date of last fit. |\n| spotforecast_version | [str](`str`) | Version of the library used. |\n| python_version | [str](`str`) | Python version used. |\n| forecaster_id | ([str](`str`), [int](`int`)) | Forecaster identifier. |\n\n#### Notes {.doc-section .doc-section-notes}\n\nWeights are used to control the influence of each observation.\n`ForecasterRecursiveMultiSeries` accepts two types of weights:\n- `series_weights`: Controls relative importance of each series.\n- `weight_func`: Controls relative importance based on index values.\nIf both are provided, they are multiplied. Negative weights are not allowed.\n\n#### Examples {.doc-section .doc-section-examples}\n\n\n::: {#162183d1 .cell execution_count=1}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(\n estimator=Ridge(),\n lags=3,\n forecaster_id=\"demo\",\n)\nforecaster.fit(series, suppress_warnings=True)\npredictions = forecaster.predict(steps=3, suppress_warnings=True)\nprint(predictions)\nassert predictions.shape == (6, 2)\nassert set(predictions[\"level\"].unique()) == {\"A\", \"B\"}\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n level pred\n60 A -0.115562\n60 B 0.202789\n61 A 0.037961\n61 B 0.158784\n62 A 0.075896\n62 B 0.061138\n```\n:::\n:::\n\n\n#### Methods\n\n| Name | Description |\n| --- | --- |\n| [create_predict_X](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.create_predict_X) | Create the 
predictors needed to predict `steps` ahead. |\n| [create_sample_weights](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.create_sample_weights) | Create weights for each observation according to the forecaster's attributes |\n| [create_train_X_y](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.create_train_X_y) | Create training matrices from multiple time series and exogenous variables. |\n| [fit](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.fit) | Training Forecaster. |\n| [get_feature_importances](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.get_feature_importances) | Return feature importances of the estimator. |\n| [predict](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict) | Predict n steps ahead. |\n| [predict_bootstrapping](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_bootstrapping) | Generate multiple forecasting predictions using a bootstrapping process. |\n| [predict_dist](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_dist) | Fit a given probability distribution for each step. |\n| [predict_interval](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_interval) | Predict n steps ahead and estimate prediction intervals. |\n| [predict_quantiles](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_quantiles) | Calculate the specified quantiles for each step. 
|\n| [set_fit_kwargs](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_fit_kwargs) | Set new values for the additional keyword arguments passed to the `fit` method. |\n| [set_in_sample_residuals](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_in_sample_residuals) | Set in-sample residuals in case they were not calculated during training. |\n| [set_lags](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_lags) | Set new value to the attribute `lags`. |\n| [set_out_sample_residuals](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_out_sample_residuals) | Set new values to the attribute `out_sample_residuals_`. |\n| [set_params](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_params) | Set new values to the parameters of the scikit-learn model. |\n| [set_window_features](#spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_window_features) | Set new value to the attribute `window_features`. 
|\n\n##### create_predict_X { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.create_predict_X }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.create_predict_X(\n steps,\n levels=None,\n last_window=None,\n exog=None,\n suppress_warnings=False,\n check_inputs=True,\n)\n```\n\nCreate the predictors needed to predict `steps` ahead.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------------|------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|------------|\n| steps | [int](`int`) | Number of steps to predict. | _required_ |\n| levels | ([str](`str`), [list](`list`), None) | Time series to be predicted. Defaults to `None`. | `None` |\n| last_window | ([pd](`pandas`).[DataFrame](`pandas.DataFrame`), None) | Series values for predictors. Defaults to `None`. | `None` |\n| exog | ([pd](`pandas`).[Series](`pandas.Series`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`), None) | Exogenous variable/s included as predictor/s. Defaults to `None`. | `None` |\n| suppress_warnings | [bool](`bool`) | If `True`, suppress warnings. Defaults to `False`. | `False` |\n| check_inputs | [bool](`bool`) | If `True`, check input for warnings/errors. Defaults to `True`. | `True` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|----------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: Long-format DataFrame with the predictors. 
|\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#0a7c054d .cell execution_count=2}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\nforecaster.fit(series, suppress_warnings=True)\nX_predict = forecaster.create_predict_X(steps=2, suppress_warnings=True)\nprint(\"X_predict shape:\", X_predict.shape)\nprint(X_predict.head(4))\nassert \"level\" in X_predict.columns\nassert \"lag_1\" in X_predict.columns\nassert X_predict.shape[0] == 2 * len(forecaster.series_names_in_)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nX_predict shape: (4, 5)\n level lag_1 lag_2 lag_3 _level_skforecast\n60 A -0.661703 -1.184118 0.696043 0.0\n60 B 1.164864 0.843733 0.726094 1.0\n61 A -0.115562 -0.661703 -1.184118 0.0\n61 B 0.202789 1.164864 0.843733 1.0\n```\n:::\n:::\n\n\n##### create_sample_weights { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.create_sample_weights }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.create_sample_weights(\n series_names_in_,\n X_train,\n)\n```\n\nCreate weights for each observation according to the forecaster's attributes\n`series_weights` and `weight_func`. 
The resulting weights are the product of both\ntypes of weights.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------------|------------------------------------------------|-------------------------------------------------------|------------|\n| series_names_in_ | [list](`list`) | Names of the series (levels) used during training. | _required_ |\n| X_train | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | DataFrame created with the `create_train_X_y` method. | _required_ |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------|---------------------------------------------|\n| | [np](`numpy`).[ndarray](`numpy.ndarray`) | np.ndarray: Weights to use in `fit` method. |\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#32be58f1 .cell execution_count=3}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\n# Series A gets twice the weight of series B during training.\nforecaster = ForecasterRecursiveMultiSeries(\n estimator=Ridge(),\n lags=3,\n series_weights={\"A\": 2.0, \"B\": 1.0},\n)\nX_train, y_train = forecaster.create_train_X_y(series, suppress_warnings=True)\nweights = forecaster.create_sample_weights(\n series_names_in_=[\"A\", \"B\"], X_train=X_train\n)\nprint(\"weights shape:\", weights.shape)\nprint(\"unique weight values:\", np.unique(weights))\nassert weights is not None\nassert set(np.unique(weights)).issubset({1.0, 2.0})\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nweights shape: (114,)\nunique weight 
values: [1. 2.]\n```\n:::\n:::\n\n\n##### create_train_X_y { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.create_train_X_y }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.create_train_X_y(\n series,\n exog=None,\n suppress_warnings=False,\n)\n```\n\nCreate training matrices from multiple time series and exogenous variables.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------------|------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|------------|\n| series | ([pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`)) | Training time series. | _required_ |\n| exog | ([pd](`pandas`).[Series](`pandas.Series`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`), None) | Exogenous variable/s included as predictor/s. Defaults to `None`. | `None` |\n| suppress_warnings | [bool](`bool`) | If `True`, skforecast warnings will be suppressed during creation. Defaults to `False`. | `False` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|--------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| tuple | [tuple](`tuple`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`), [pd](`pandas`).[Series](`pandas.Series`)\\] | A tuple containing: - X_train (pd.DataFrame): Training values (predictors). - y_train (pd.Series): Values (target) of the time series related to each row of `X_train`. 
|\n\n###### Notes {.doc-section .doc-section-notes}\n\nSee `_create_train_X_y` for detailed information on input types.\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#68840a36 .cell execution_count=4}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\nX_train, y_train = forecaster.create_train_X_y(\n series, suppress_warnings=True\n)\nprint(\"X_train shape:\", X_train.shape)\nprint(\"y_train shape:\", y_train.shape)\nprint(X_train.head(3))\nassert X_train.shape[0] == y_train.shape[0]\nassert \"lag_1\" in X_train.columns\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nX_train shape: (114, 4)\ny_train shape: (114,)\n lag_1 lag_2 lag_3 _level_skforecast\n3 0.640423 -0.132105 0.125730 0\n4 0.104900 0.640423 -0.132105 0\n5 -0.535669 0.104900 0.640423 0\n```\n:::\n:::\n\n\n##### fit { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.fit }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.fit(\n series,\n exog=None,\n store_last_window=True,\n store_in_sample_residuals=False,\n random_state=123,\n suppress_warnings=False,\n)\n```\n\nTraining Forecaster.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default 
|\n|---------------------------|------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|------------|\n| series | ([pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`)) | Training time series. | _required_ |\n| exog | ([pd](`pandas`).[Series](`pandas.Series`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`), None) | Exogenous variable/s included as predictor/s. Defaults to `None`. | `None` |\n| store_last_window | ([bool](`bool`), [list](`list`)) | Whether or not to store the last window (`last_window_`) of training data. Defaults to `True`. | `True` |\n| store_in_sample_residuals | [bool](`bool`) | If `True`, in-sample residuals will be stored in the forecaster object after fitting. Defaults to `False`. | `False` |\n| random_state | [int](`int`) | Set a seed for the random generator so that the stored sample residuals are always deterministic. Defaults to `123`. | `123` |\n| suppress_warnings | [bool](`bool`) | If `True`, skforecast warnings will be suppressed during training. Defaults to `False`. 
| `False` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|--------|---------------|\n| | None | None |\n\n###### Notes {.doc-section .doc-section-notes}\n\n- If `series` is a wide-format pandas DataFrame, each column represents\n a different time series.\n- If `series` is a long-format pandas DataFrame with a MultiIndex, the\n first level of the index must contain the series IDs.\n- If series is a dictionary, each key must be a series ID.\n- If `exog` is a wide-format pandas DataFrame, it must share the same\n index type as series.\n- If `exog` is a long-format pandas Series or DataFrame with a MultiIndex,\n the first level contains the series IDs.\n- If `exog` is a dictionary, each key must correspond to a series ID.\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#a23b726a .cell execution_count=5}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\nforecaster.fit(\n series,\n store_in_sample_residuals=True,\n suppress_warnings=True,\n)\nprint(\"Fitted:\", forecaster.is_fitted)\nprint(\"Series seen:\", forecaster.series_names_in_)\nassert forecaster.is_fitted\nassert set(forecaster.series_names_in_) == {\"A\", \"B\"}\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nFitted: True\nSeries seen: ['A', 'B']\n```\n:::\n:::\n\n\n##### get_feature_importances { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.get_feature_importances 
}\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.get_feature_importances(\n sort_importance=True,\n)\n```\n\nReturn feature importances of the estimator.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------------|----------------|-----------------------------------------------------------------------------------|-----------|\n| sort_importance | [bool](`bool`) | If `True`, sorts the feature importances in descending order. Defaults to `True`. | `True` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|-------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: Feature importances associated with each predictor. |\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#849f0f39 .cell execution_count=6}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.ensemble import RandomForestRegressor\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(\n estimator=RandomForestRegressor(n_estimators=10, random_state=0),\n lags=3,\n)\nforecaster.fit(series, suppress_warnings=True)\nimportances = forecaster.get_feature_importances()\nprint(importances)\nassert \"feature\" in importances.columns\nassert \"importance\" in importances.columns\nassert abs(importances[\"importance\"].sum() - 1.0) < 1e-10\nassert len(importances) == len(forecaster.X_train_features_names_out_)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n feature 
importance\n2 lag_3 0.396469\n1 lag_2 0.381419\n0 lag_1 0.177297\n3 _level_skforecast 0.044816\n```\n:::\n:::\n\n\n##### predict { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict(\n steps,\n levels=None,\n last_window=None,\n exog=None,\n suppress_warnings=False,\n check_inputs=True,\n)\n```\n\nPredict n steps ahead.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------------|------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|------------|\n| steps | [int](`int`) | Number of steps to predict. | _required_ |\n| levels | ([str](`str`), [list](`list`), None) | Time series to be predicted. Defaults to `None`. | `None` |\n| last_window | ([pd](`pandas`).[DataFrame](`pandas.DataFrame`), None) | Series values for predictors. Defaults to `None`. | `None` |\n| exog | ([pd](`pandas`).[Series](`pandas.Series`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`), None) | Exogenous variable/s included as predictor/s. Defaults to `None`. | `None` |\n| suppress_warnings | [bool](`bool`) | If `True`, suppress warnings. Defaults to `False`. | `False` |\n| check_inputs | [bool](`bool`) | If `True`, check input for warnings/errors. Defaults to `True`. | `True` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|-----------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: Long-format DataFrame with the predictions. 
|\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#08fd55b2 .cell execution_count=7}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\nforecaster.fit(series, suppress_warnings=True)\npredictions = forecaster.predict(steps=4, suppress_warnings=True)\nprint(predictions)\nassert predictions.shape == (8, 2)\nassert \"level\" in predictions.columns\nassert \"pred\" in predictions.columns\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n level pred\n60 A -0.115562\n60 B 0.202789\n61 A 0.037961\n61 B 0.158784\n62 A 0.075896\n62 B 0.061138\n63 A 0.071201\n63 B 0.086680\n```\n:::\n:::\n\n\n##### predict_bootstrapping { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_bootstrapping }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_bootstrapping(\n steps,\n levels=None,\n last_window=None,\n exog=None,\n n_boot=250,\n use_in_sample_residuals=True,\n use_binned_residuals=True,\n random_state=123,\n suppress_warnings=False,\n)\n```\n\nGenerate multiple forecasting predictions using a bootstrapping process.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------------------|------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|------------|\n| steps | [int](`int`) | Number of 
steps to predict. | _required_ |\n| levels | ([str](`str`), [list](`list`), None) | Time series to be predicted. Defaults to `None`. | `None` |\n| last_window | ([pd](`pandas`).[DataFrame](`pandas.DataFrame`), None) | Series values used to create predictors. Defaults to `None`. | `None` |\n| exog | ([pd](`pandas`).[Series](`pandas.Series`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`), None) | Exogenous variable/s included as predictor/s. Defaults to `None`. | `None` |\n| n_boot | [int](`int`) | Number of bootstrapping iterations. Defaults to `250`. | `250` |\n| use_in_sample_residuals | [bool](`bool`) | If `True`, use residuals from training data. Defaults to `True`. | `True` |\n| use_binned_residuals | [bool](`bool`) | If `True`, residuals are selected based on predicted values. Defaults to `True`. | `True` |\n| random_state | [int](`int`) | Seed for reproducibility. Defaults to `123`. | `123` |\n| suppress_warnings | [bool](`bool`) | If `True`, suppress warnings. Defaults to `False`. | `False` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|-------------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: Long-format DataFrame with the bootstrapping predictions. 
|\n\n###### References {.doc-section .doc-section-references}\n\n[1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos.\n https://otexts.com/fpp3/prediction-intervals.html\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#67c9f2b1 .cell execution_count=8}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\nforecaster.fit(\n series, store_in_sample_residuals=True, suppress_warnings=True\n)\nboot_preds = forecaster.predict_bootstrapping(\n steps=2, n_boot=5, random_state=0, suppress_warnings=True\n)\nprint(boot_preds)\nassert boot_preds.shape == (4, 6) # 2 steps × 2 levels, 1 level col + 5 boot cols\nassert \"level\" in boot_preds.columns\nassert \"pred_boot_0\" in boot_preds.columns\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n level pred_boot_0 pred_boot_1 pred_boot_2 pred_boot_3 pred_boot_4\n60 A -0.499982 -1.203549 -1.203549 -0.006950 -0.006950\n60 B -0.618267 -0.724733 -0.724733 -0.724733 0.166372\n61 A 0.815271 0.002330 0.891047 0.519212 -2.334749\n61 B -0.713870 1.822347 1.822347 -0.274258 1.973761\n```\n:::\n:::\n\n\n##### predict_dist { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_dist }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_dist(\n steps,\n distribution,\n levels=None,\n last_window=None,\n exog=None,\n n_boot=250,\n use_in_sample_residuals=True,\n use_binned_residuals=True,\n random_state=123,\n 
suppress_warnings=False,\n)\n```\n\nFit a given probability distribution for each step.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------------------|------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|------------|\n| steps | [int](`int`) | Number of steps to predict. | _required_ |\n| distribution | [object](`object`) | A distribution object from scipy.stats. | _required_ |\n| levels | ([str](`str`), [list](`list`), None) | Time series to be predicted. Defaults to `None`. | `None` |\n| last_window | ([pd](`pandas`).[DataFrame](`pandas.DataFrame`), None) | Series values used to create predictors. Defaults to `None`. | `None` |\n| exog | ([pd](`pandas`).[Series](`pandas.Series`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`), None) | Exogenous variable/s included as predictor/s. Defaults to `None`. | `None` |\n| n_boot | [int](`int`) | Number of bootstrapping iterations. Defaults to `250`. | `250` |\n| use_in_sample_residuals | [bool](`bool`) | If `True`, use residuals from training data. Defaults to `True`. | `True` |\n| use_binned_residuals | [bool](`bool`) | If `True`, residuals are selected based on predicted values. Defaults to `True`. | `True` |\n| random_state | [int](`int`) | Seed for reproducibility. Defaults to `123`. | `123` |\n| suppress_warnings | [bool](`bool`) | If `True`, suppress warnings. Defaults to `False`. | `False` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|---------------------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: Long-format DataFrame with parameters of the fitted distribution. 
|\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#50e3cc3d .cell execution_count=9}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom scipy.stats import norm\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\nforecaster.fit(\n series, store_in_sample_residuals=True, suppress_warnings=True\n)\ndist_preds = forecaster.predict_dist(\n steps=2,\n distribution=norm,\n n_boot=5,\n random_state=0,\n suppress_warnings=True,\n)\nprint(dist_preds)\nassert \"loc\" in dist_preds.columns\nassert \"scale\" in dist_preds.columns\nassert dist_preds.shape[0] == 4 # 2 steps × 2 levels\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n level loc scale\n60 A -0.584196 0.536789\n60 B -0.525219 0.348245\n61 A -0.021378 1.198080\n61 B 0.926065 1.169143\n```\n:::\n:::\n\n\n##### predict_interval { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_interval }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_interval(\n steps,\n levels=None,\n last_window=None,\n exog=None,\n method='conformal',\n interval=[5, 95],\n n_boot=250,\n use_in_sample_residuals=True,\n use_binned_residuals=True,\n random_state=123,\n suppress_warnings=False,\n)\n```\n\nPredict n steps ahead and estimate prediction intervals.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default 
|\n|-------------------------|------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|---------------|\n| steps | [int](`int`) | Number of steps to predict. | _required_ |\n| levels | ([str](`str`), [list](`list`), None) | Time series to be predicted. Defaults to `None`. | `None` |\n| last_window | ([pd](`pandas`).[DataFrame](`pandas.DataFrame`), None) | Series values used to create predictors. Defaults to `None`. | `None` |\n| exog | ([pd](`pandas`).[Series](`pandas.Series`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`), None) | Exogenous variable/s included as predictor/s. Defaults to `None`. | `None` |\n| method | [str](`str`) | Technique used to estimate prediction intervals. Options: 'bootstrapping', 'conformal'. Defaults to `'conformal'`. | `'conformal'` |\n| interval | ([float](`float`), [list](`list`), [tuple](`tuple`)) | Confidence level of the prediction interval. Defaults to `[5, 95]`. | `[5, 95]` |\n| n_boot | [int](`int`) | Number of bootstrapping iterations. Defaults to `250`. | `250` |\n| use_in_sample_residuals | [bool](`bool`) | If `True`, use residuals from training data. Defaults to `True`. | `True` |\n| use_binned_residuals | [bool](`bool`) | If `True`, residuals are selected based on predicted values. Defaults to `True`. | `True` |\n| random_state | [int](`int`) | Seed for reproducibility. Defaults to `123`. | `123` |\n| suppress_warnings | [bool](`bool`) | If `True`, suppress warnings. Defaults to `False`. | `False` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|---------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: Long-format DataFrame with predictions and intervals. 
|\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#e2fc35af .cell execution_count=10}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\nforecaster.fit(\n series, store_in_sample_residuals=True, suppress_warnings=True\n)\ninterval_preds = forecaster.predict_interval(\n steps=2, method=\"conformal\", suppress_warnings=True\n)\nprint(interval_preds)\nassert \"pred\" in interval_preds.columns\nassert \"lower_bound\" in interval_preds.columns\nassert \"upper_bound\" in interval_preds.columns\nassert interval_preds.shape[0] == 4 # 2 steps × 2 levels\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n level pred lower_bound upper_bound\n60 A -0.115562 -0.908236 0.677112\n60 B 0.202789 -0.808742 1.214321\n61 A 0.037961 -1.912966 1.988888\n61 B 0.158784 -1.994959 2.312527\n```\n:::\n:::\n\n\n##### predict_quantiles { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_quantiles }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.predict_quantiles(\n steps,\n levels=None,\n last_window=None,\n exog=None,\n quantiles=[0.05, 0.5, 0.95],\n n_boot=250,\n use_in_sample_residuals=True,\n use_binned_residuals=True,\n random_state=123,\n suppress_warnings=False,\n)\n```\n\nCalculate the specified quantiles for each step.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default 
|\n|-------------------------|------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|---------------------|\n| steps | [int](`int`) | Number of steps to predict. | _required_ |\n| levels | ([str](`str`), [list](`list`), None) | Time series to be predicted. Defaults to `None`. | `None` |\n| last_window | ([pd](`pandas`).[DataFrame](`pandas.DataFrame`), None) | Series values used to create predictors. Defaults to `None`. | `None` |\n| exog | ([pd](`pandas`).[Series](`pandas.Series`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`), None) | Exogenous variable/s included as predictor/s. Defaults to `None`. | `None` |\n| quantiles | ([list](`list`), [tuple](`tuple`)) | Sequence of quantiles to compute (0 to 1). Defaults to `[0.05, 0.5, 0.95]`. | `[0.05, 0.5, 0.95]` |\n| n_boot | [int](`int`) | Number of bootstrapping iterations. Defaults to `250`. | `250` |\n| use_in_sample_residuals | [bool](`bool`) | If `True`, use residuals from training data. Defaults to `True`. | `True` |\n| use_binned_residuals | [bool](`bool`) | If `True`, residuals are selected based on predicted values. Defaults to `True`. | `True` |\n| random_state | [int](`int`) | Seed for reproducibility. Defaults to `123`. | `123` |\n| suppress_warnings | [bool](`bool`) | If `True`, suppress warnings. Defaults to `False`. | `False` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|-------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: Long-format DataFrame with the predicted quantiles. 
|\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#335194ce .cell execution_count=11}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\nforecaster.fit(\n series, store_in_sample_residuals=True, suppress_warnings=True\n)\nquantile_preds = forecaster.predict_quantiles(\n steps=2,\n quantiles=[0.1, 0.5, 0.9],\n n_boot=5,\n random_state=0,\n suppress_warnings=True,\n)\nprint(quantile_preds)\nassert \"q_0.1\" in quantile_preds.columns\nassert \"q_0.9\" in quantile_preds.columns\nassert quantile_preds.shape[0] == 4 # 2 steps × 2 levels\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n level q_0.1 q_0.5 q_0.9\n60 A -1.203549 -0.499982 -0.006950\n60 B -0.724733 -0.724733 -0.147483\n61 A -1.399917 0.519212 0.860737\n61 B -0.538025 1.822347 1.913195\n```\n:::\n:::\n\n\n##### set_fit_kwargs { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_fit_kwargs }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_fit_kwargs(\n fit_kwargs,\n)\n```\n\nSet new values for the additional keyword arguments passed to the `fit` method.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------|----------------|-------------------------------------------|------------|\n| fit_kwargs | [dict](`dict`) | Dict of the form {\"argument\": new_value}. 
| _required_ |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|--------|---------------|\n| | None | None |\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#379ada9f .cell execution_count=12}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.ensemble import RandomForestRegressor\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nforecaster = ForecasterRecursiveMultiSeries(\n estimator=RandomForestRegressor(n_estimators=10, random_state=0),\n lags=3,\n)\nforecaster.set_fit_kwargs({})\nprint(\"fit_kwargs:\", forecaster.fit_kwargs)\nassert isinstance(forecaster.fit_kwargs, dict)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nfit_kwargs: {}\n```\n:::\n:::\n\n\n##### set_in_sample_residuals { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_in_sample_residuals }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_in_sample_residuals(\n series,\n exog=None,\n random_state=123,\n suppress_warnings=False,\n)\n```\n\nSet in-sample residuals in case they were not calculated during training.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------------|------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|------------|\n| series | ([pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`)) | Training time series. | _required_ |\n| exog | ([pd](`pandas`).[Series](`pandas.Series`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`), [dict](`dict`), None) | Exogenous variable/s included as predictor/s. Defaults to `None`. 
| `None` |\n| random_state | [int](`int`) | Seed for reproducibility. Defaults to `123`. | `123` |\n| suppress_warnings | [bool](`bool`) | If `True`, suppress warnings. Defaults to `False`. | `False` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|--------|---------------|\n| | None | None |\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#253b0cbb .cell execution_count=13}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\n# Fit without storing in-sample residuals, then compute them afterwards.\nforecaster.fit(series, store_in_sample_residuals=False, suppress_warnings=True)\nforecaster.set_in_sample_residuals(\n series, random_state=0, suppress_warnings=True\n)\nprint(\"in_sample_residuals_ keys:\", list(forecaster.in_sample_residuals_.keys()))\nassert \"A\" in forecaster.in_sample_residuals_\nassert \"B\" in forecaster.in_sample_residuals_\nassert \"_unknown_level\" in forecaster.in_sample_residuals_\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nin_sample_residuals_ keys: ['A', 'B', '_unknown_level']\n```\n:::\n:::\n\n\n##### set_lags { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_lags }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_lags(\n lags=None,\n)\n```\n\nSet new value to the attribute `lags`.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default 
|\n|--------|--------------------------------------------------------------------------------------------------|----------------------------------------------|-----------|\n| lags | ([int](`int`), [list](`list`), [np](`numpy`).[ndarray](`numpy.ndarray`), [range](`range`), None) | Lags used as predictors. Defaults to `None`. | `None` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|--------|---------------|\n| | None | None |\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#0fb072ee .cell execution_count=14}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\nprint(\"lags before:\", forecaster.lags)\nforecaster.set_lags([1, 2, 5])\nprint(\"lags after:\", forecaster.lags)\nassert list(forecaster.lags) == [1, 2, 5]\nassert forecaster.max_lag == 5\nassert forecaster.window_size == 5\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nlags before: [1 2 3]\nlags after: [1 2 5]\n```\n:::\n:::\n\n\n##### set_out_sample_residuals { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_out_sample_residuals }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_out_sample_residuals(\n y_true,\n y_pred,\n append=False,\n random_state=123,\n)\n```\n\nSet new values to the attribute `out_sample_residuals_`.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------------|----------------|-------------------------------------------------------------------------------|------------|\n| y_true | [dict](`dict`) | Dictionary with the true values of the 
time series. | _required_ |\n| y_pred | [dict](`dict`) | Dictionary with the predicted values of the time series. | _required_ |\n| append | [bool](`bool`) | If `True`, new residuals are added to the existing ones. Defaults to `False`. | `False` |\n| random_state | [int](`int`) | Seed for reproducibility. Defaults to `123`. | `123` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|--------|---------------|\n| | None | None |\n\n###### Notes {.doc-section .doc-section-notes}\n\nOut-of-sample residuals can only be stored for series seen during fit.\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#dc52f498 .cell execution_count=15}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nrng = np.random.default_rng(0)\nseries = pd.DataFrame(\n {\"A\": rng.standard_normal(60), \"B\": rng.standard_normal(60)},\n index=pd.RangeIndex(60),\n)\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\nforecaster.fit(series, store_in_sample_residuals=True, suppress_warnings=True)\n\n# Provide held-out true vs. predicted values for both series.\ny_true = {\"A\": rng.standard_normal(15), \"B\": rng.standard_normal(15)}\ny_pred = {\"A\": rng.standard_normal(15), \"B\": rng.standard_normal(15)}\nforecaster.set_out_sample_residuals(\n y_true=y_true, y_pred=y_pred, random_state=0\n)\nprint(\"out_sample_residuals_ keys:\", list(forecaster.out_sample_residuals_.keys()))\nassert \"A\" in forecaster.out_sample_residuals_\nassert \"_unknown_level\" in forecaster.out_sample_residuals_\n```\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n\n::: {.cell-output .cell-output-display}\n```{=html}\n
\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nout_sample_residuals_ keys: ['A', 'B', '_unknown_level']\n```\n:::\n:::\n\n\n##### set_params { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_params }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_params(\n params,\n)\n```\n\nSet new values to the parameters of the scikit-learn model.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|----------------|-------------------|------------|\n| params | [dict](`dict`) | Parameter values. | _required_ |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|--------|---------------|\n| | None | None |\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#22761fc1 .cell execution_count=16}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\n\nwarnings.simplefilter(\"ignore\")\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(alpha=1.0), lags=3)\nprint(\"alpha before:\", forecaster.estimator.get_params()[\"alpha\"])\nforecaster.set_params({\"alpha\": 0.5})\nprint(\"alpha after:\", forecaster.estimator.get_params()[\"alpha\"])\nassert forecaster.estimator.get_params()[\"alpha\"] == 0.5\nassert not forecaster.is_fitted\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nalpha before: 1.0\nalpha after: 0.5\n```\n:::\n:::\n\n\n##### set_window_features { #spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_window_features }\n\n```python\nforecaster.recursive._forecaster_recursive_multiseries.ForecasterRecursiveMultiSeries.set_window_features(\n 
window_features=None,\n)\n```\n\nSet new value to the attribute `window_features`.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------------|--------------------------------------------|-----------------------------------------------------------------------------------|-----------|\n| window_features | ([object](`object`), [list](`list`), None) | Instance or list of instances used to create window features. Defaults to `None`. | `None` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|--------|---------------|\n| | None | None |\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#6bb6196d .cell execution_count=17}\n``` {.python .cell-code}\nimport warnings\nimport numpy as np\nimport pandas as pd\nfrom sklearn.linear_model import Ridge\n\nfrom spotforecast2_safe.forecaster.recursive._forecaster_recursive_multiseries import (\n ForecasterRecursiveMultiSeries,\n)\nfrom spotforecast2_safe.preprocessing import RollingFeatures\n\nwarnings.simplefilter(\"ignore\")\nforecaster = ForecasterRecursiveMultiSeries(estimator=Ridge(), lags=3)\nprint(\"window_features before:\", forecaster.window_features)\nforecaster.set_window_features(RollingFeatures(stats=\"mean\", window_sizes=4))\nprint(\"window_features after:\", forecaster.window_features_names)\nassert forecaster.window_features is not None\nassert \"roll_mean_4\" in forecaster.window_features_names\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nwindow_features before: None\nwindow_features after: ['roll_mean_4']\n```\n:::\n:::\n\n\n", + "supporting": [ + "forecaster.recursive._forecaster_recursive_multiseries_files" + ], + "filters": [], + "includes": { + "include-in-header": [ + "\n\n\n" + ] + } + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/manager.demo_metrics.calculate_metrics/execute-results/html.json 
b/_freeze/docs/reference/manager.demo_metrics.calculate_metrics/execute-results/html.json new file mode 100644 index 00000000..ec6e2031 --- /dev/null +++ b/_freeze/docs/reference/manager.demo_metrics.calculate_metrics/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "275de27a383338eb229b788a42e8ac50", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: manager.demo_metrics.calculate_metrics\n---\n\n\n\n```python\nmanager.demo_metrics.calculate_metrics(actual, predicted)\n```\n\nCalculate MAE and MSE for numeric evaluation.\n\nComputes Mean Absolute Error (MAE) and Mean Squared Error (MSE) between\nactual and predicted values. These metrics are essential for evaluating\nforecasting model performance in safety-critical applications.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------|------------------------------------------|---------------------------------------------------------------|------------|\n| actual | [pd](`pandas`).[Series](`pandas.Series`) | Series of actual observed values. | _required_ |\n| predicted | [pd](`pandas`).[Series](`pandas.Series`) | Series of predicted values (must have same length as actual). | _required_ |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------------------------------------------|---------------------------------------------------------------------------------------------------|\n| | [Dict](`typing.Dict`)\\[[str](`str`), [float](`float`)\\] | Dict[str, float]: Dictionary containing: - 'MAE': Mean Absolute Error - 'MSE': Mean Squared Error |\n\n## Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------------|---------------------------------------------------------|\n| | [ValueError](`ValueError`) | If series have different lengths or contain NaN values. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#51aa8de6 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.manager.demo_metrics import calculate_metrics\n\n# Perfect predictions: both MAE and MSE should be zero\nactual = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])\npredicted = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])\nmetrics = calculate_metrics(actual, predicted)\nprint(f\"MAE: {metrics['MAE']:.4f}\")\nprint(f\"MSE: {metrics['MSE']:.4f}\")\nassert metrics[\"MAE\"] == 0.0\nassert metrics[\"MSE\"] == 0.0\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nMAE: 0.0000\nMSE: 0.0000\n```\n:::\n:::\n\n\n::: {#9a1b7136 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.manager.demo_metrics import calculate_metrics\n\n# Small symmetric errors: MAE == MSE == 1.0\nactual = pd.Series([10.0, 20.0, 30.0, 40.0])\npredicted = pd.Series([11.0, 19.0, 31.0, 39.0])\nmetrics = calculate_metrics(actual, predicted)\nprint(f\"MAE: {metrics['MAE']:.4f}\")\nprint(f\"MSE: {metrics['MSE']:.4f}\")\nassert metrics[\"MAE\"] == 1.0\nassert metrics[\"MSE\"] == 1.0\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nMAE: 1.0000\nMSE: 1.0000\n```\n:::\n:::\n\n\n::: {#8439531d .cell execution_count=3}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.manager.demo_metrics import calculate_metrics\n\n# Larger asymmetric errors\nactual = pd.Series([100.0, 200.0, 300.0])\npredicted = pd.Series([95.0, 210.0, 290.0])\nmetrics = calculate_metrics(actual, predicted)\nprint(f\"MAE: {metrics['MAE']:.4f}\")\nprint(f\"MSE: {metrics['MSE']:.4f}\")\nassert abs(metrics[\"MAE\"] - 25 / 3) < 1e-9\nassert abs(metrics[\"MSE\"] - 75.0) < 1e-9\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nMAE: 8.3333\nMSE: 75.0000\n```\n:::\n:::\n\n\n::: {#4271bd79 .cell execution_count=4}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.manager.demo_metrics import 
calculate_metrics\n\n# Safety-critical: validate metrics stay within acceptable bounds\nactual = pd.Series([50.0, 55.0, 60.0, 65.0, 70.0])\npredicted = pd.Series([51.0, 54.0, 61.0, 64.0, 71.0])\nmetrics = calculate_metrics(actual, predicted)\nassert metrics[\"MAE\"] < 2.0, \"MAE exceeds safety threshold\"\nassert metrics[\"MSE\"] < 5.0, \"MSE exceeds safety threshold\"\nprint(\"Safety validation passed\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nSafety validation passed\n```\n:::\n:::\n\n\n::: {#cd27106b .cell execution_count=5}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.manager.demo_metrics import calculate_metrics\n\n# Time series with a datetime index\ndates = pd.date_range(\"2024-01-01\", periods=5, freq=\"D\")\nactual = pd.Series([10.5, 11.2, 10.8, 11.5, 12.0], index=dates)\npredicted = pd.Series([10.3, 11.4, 10.9, 11.3, 12.1], index=dates)\nmetrics = calculate_metrics(actual, predicted)\nprint(f\"MAE: {metrics['MAE']:.4f}\")\nprint(f\"MSE: {metrics['MSE']:.4f}\")\nassert abs(metrics[\"MAE\"] - 0.16) < 1e-9\nassert abs(metrics[\"MSE\"] - 0.028) < 1e-9\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nMAE: 0.1600\nMSE: 0.0280\n```\n:::\n:::\n\n\n::: {#b5316029 .cell execution_count=6}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.manager.demo_metrics import calculate_metrics\n\n# Compare two models: lower MAE wins\nactual = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])\npred_model_a = pd.Series([1.1, 2.1, 2.9, 4.2, 4.8])\npred_model_b = pd.Series([1.5, 2.5, 3.5, 4.5, 5.5])\nmetrics_a = calculate_metrics(actual, pred_model_a)\nmetrics_b = calculate_metrics(actual, pred_model_b)\nwinner = \"A\" if metrics_a[\"MAE\"] < metrics_b[\"MAE\"] else \"B\"\nprint(f\"Model A MAE: {metrics_a['MAE']:.4f}\")\nprint(f\"Model B MAE: {metrics_b['MAE']:.4f}\")\nprint(f\"Model {winner} has better MAE\")\nassert winner == \"A\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nModel A MAE: 0.1400\nModel B MAE: 
0.5000\nModel A has better MAE\n```\n:::\n:::\n\n\n", + "supporting": [ + "manager.demo_metrics.calculate_metrics_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/manager.predictor.get_model_prediction/execute-results/html.json b/_freeze/docs/reference/manager.predictor.get_model_prediction/execute-results/html.json new file mode 100644 index 00000000..8a5f3610 --- /dev/null +++ b/_freeze/docs/reference/manager.predictor.get_model_prediction/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "58c242eba1427e390e9e505d5a3555db", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: manager.predictor.get_model_prediction\n---\n\n\n\n```python\nmanager.predictor.get_model_prediction(\n model_name,\n model_dir=None,\n predict_size=None,\n)\n```\n\nGet the prediction package from the latest trained model.\n\nThis function retrieves the latest iteration of a specified model from the\ncache and calls its `package_prediction` method to obtain a comprehensive\nset of predictions and metrics.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------------|--------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|------------|\n| model_name | [str](`str`) | Name of the model to use (e.g., 'lgbm', 'xgb'). | _required_ |\n| model_dir | [Optional](`typing.Optional`)\\[[Union](`typing.Union`)\\[[str](`str`), [Path](`pathlib.Path`)\\]\\] | Directory where models are stored. If None, defaults to the library's cache home. | `None` |\n| predict_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Optional override for the prediction horizon. 
| `None` |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------|-------------------------------------------------------------|\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | A dictionary containing predictions and metrics produced by `package_prediction()`. |\n\n## Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|\n| | [FileNotFoundError](`FileNotFoundError`) | If no trained model is found for `model_name` in `model_dir`. |\n| | [AttributeError](`AttributeError`) | If the loaded model does not implement a `package_prediction` method. |\n| | [PredictionPackageError](`PredictionPackageError`) | If `package_prediction()` itself fails; see `forecaster.wrappers.model.ForecasterRecursiveModel.package_prediction`. |\n| | [OSError](`OSError`) | If the on-disk model file exists but cannot be deserialised (corrupt joblib). 
|\n\n## Notes {.doc-section .doc-section-notes}\n\n`predict_size` is accepted by `get_model_prediction()` but only has effect if the concrete model's `package_prediction()` accepts it.\nThe original `ForecasterRecursiveModel.package_prediction()` does not — so this parameter is currently forward-looking API design, not yet wired end-to-end.\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#bffee86b .cell execution_count=1}\n``` {.python .cell-code}\nimport tempfile\nfrom spotforecast2_safe.manager.predictor import get_model_prediction\n\n# When no model has been trained, get_model_prediction raises FileNotFoundError.\nwith tempfile.TemporaryDirectory() as tmpdir:\n try:\n get_model_prediction(\"lgbm\", model_dir=tmpdir)\n except FileNotFoundError as e:\n print(type(e).__name__)\n assert \"lgbm\" in str(e)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nFileNotFoundError\n```\n:::\n:::\n\n\n", + "supporting": [ + "manager.predictor.get_model_prediction_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/manager.trainer.get_last_model/execute-results/html.json b/_freeze/docs/reference/manager.trainer.get_last_model/execute-results/html.json new file mode 100644 index 00000000..1d5f654b --- /dev/null +++ b/_freeze/docs/reference/manager.trainer.get_last_model/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "97674d67521cfda53f1804f0ce987a1c", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: manager.trainer.get_last_model\n---\n\n\n\n```python\nmanager.trainer.get_last_model(model_name, model_dir=None)\n```\n\nGet the latest trained model from the cache.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------|--------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|------------|\n| 
model_name | [str](`str`) | Name of the model (e.g., 'lgbm', 'xgb'). | _required_ |\n| model_dir | [Optional](`typing.Optional`)\\[[Union](`typing.Union`)\\[[str](`str`), [Path](`pathlib.Path`)\\]\\] | Directory where models are stored. If None, defaults to the library's cache home. | `None` |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|-------------------------------------------------------|-------------------------------------------------------------------|\n| | [tuple](`tuple`)\\[[int](`int`), [Any](`typing.Any`)\\] | A tuple (iteration, model_instance). If no model is found on disk (cache directory missing, no matching files, or no parseable iteration numbers), returns (-1, None), a legitimate \"fresh install, no model yet\" state. |\n\n## Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------|--------------------------------------------------------------------------------------------------------------|\n| | [OSError](`OSError`) | If the latest model file exists on disk but cannot be deserialised (corrupt joblib, version mismatch, etc.). 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#cf7a6f88 .cell execution_count=1}\n``` {.python .cell-code}\nimport tempfile\nfrom spotforecast2_safe.manager.trainer import get_last_model\n\n# Empty directory: no model files yet.\nwith tempfile.TemporaryDirectory() as tmpdir:\n iteration, model = get_last_model(\"lgbm\", model_dir=tmpdir)\n print(iteration, model)\n assert iteration == -1\n assert model is None\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n-1 None\n```\n:::\n:::\n\n\n", + "supporting": [ + "manager.trainer.get_last_model_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/manager.trainer.get_path_model/execute-results/html.json b/_freeze/docs/reference/manager.trainer.get_path_model/execute-results/html.json new file mode 100644 index 00000000..3f19d501 --- /dev/null +++ b/_freeze/docs/reference/manager.trainer.get_path_model/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "71b72677e4f2cc9c44b60c46bb2350c2", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: manager.trainer.get_path_model\n---\n\n\n\n```python\nmanager.trainer.get_path_model(name, iteration, model_dir=None)\n```\n\nReturn the path to a model file for a given iteration and model name.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------|--------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|------------|\n| name | [str](`str`) | Model name (e.g. ``\"lgbm\"``, ``\"xgb\"``). | _required_ |\n| iteration | [int](`int`) | Iteration of the model. | _required_ |\n| model_dir | [Optional](`typing.Optional`)\\[[Union](`typing.Union`)\\[[str](`str`), [Path](`pathlib.Path`)\\]\\] | Directory where models are stored. If *None*, defaults to `get_cache_home()`. 
| `None` |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------|--------------------------------------------------|\n| Path | [Path](`pathlib.Path`) | Full path where the model file should be stored. |\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#dc5148a6 .cell execution_count=1}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.manager.trainer import get_path_model\n\nwith tempfile.TemporaryDirectory() as tmpdir:\n p = get_path_model(\"lgbm\", 3, model_dir=tmpdir)\n print(p.name)\n assert p.name == \"lgbm_forecaster_3.joblib\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nlgbm_forecaster_3.joblib\n```\n:::\n:::\n\n\n", + "supporting": [ + "manager.trainer.get_path_model_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/manager.trainer.load_iteration/execute-results/html.json b/_freeze/docs/reference/manager.trainer.load_iteration/execute-results/html.json new file mode 100644 index 00000000..09750d65 --- /dev/null +++ b/_freeze/docs/reference/manager.trainer.load_iteration/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "065499f6b41024b08be56dbce57779b4", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: manager.trainer.load_iteration\n---\n\n\n\n```python\nmanager.trainer.load_iteration(name, iteration, model_dir=None)\n```\n\nLoad a saved model at a given iteration.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------|--------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|------------|\n| name | [str](`str`) | Model name (e.g. ``\"lgbm\"``). | _required_ |\n| iteration | [int](`int`) | Iteration of the model. 
| _required_ |\n| model_dir | [Optional](`typing.Optional`)\\[[Union](`typing.Union`)\\[[str](`str`), [Path](`pathlib.Path`)\\]\\] | Directory where models are stored. If *None*, defaults to `get_cache_home()`. | `None` |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------|------------------------------------------------------------------|\n| | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | The loaded model instance, or *None* if the file does not exist. |\n\n## Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| | [OSError](`OSError`) | If the model file exists on disk but cannot be deserialised (corrupt joblib, version mismatch, etc.). A missing file returns `None` instead — that's a legitimate \"no model yet\" state. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#7151a213 .cell execution_count=1}\n``` {.python .cell-code}\nimport tempfile\nfrom spotforecast2_safe.manager.trainer import load_iteration\n\nwith tempfile.TemporaryDirectory() as tmpdir:\n result = load_iteration(\"lgbm\", 99, model_dir=tmpdir)\n print(result)\n assert result is None\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nIteration 99 does not exist at /var/folders/dw/pvtj6mt91znd0hftcztqb0k00000gn/T/tmp8kneavsb/lgbm_forecaster_99.joblib!\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nNone\n```\n:::\n:::\n\n\n", + "supporting": [ + "manager.trainer.load_iteration_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/multitask.base.agg_predictor/execute-results/html.json b/_freeze/docs/reference/multitask.base.agg_predictor/execute-results/html.json new file mode 100644 index 00000000..418002e2 --- /dev/null +++ b/_freeze/docs/reference/multitask.base.agg_predictor/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "f9b29fb18af67df099e942e8dbd95ad6", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: multitask.base.agg_predictor\n---\n\n\n\n```python\nmultitask.base.agg_predictor(results, targets, weights)\n```\n\nAggregate per-target prediction packages into a weighted forecast.\n\nCombines future predictions, training predictions, and training actuals\nfrom per-target prediction packages into an aggregated package compatible\nwith downstream consumers. 
This is a module-level convenience function;\nthe same logic is available as ``BaseTask.agg_predictor``.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|---------|---------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|------------|\n| results | [Dict](`typing.Dict`)\\[[str](`str`), [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | Mapping of target name to prediction package (as returned by ``build_prediction_package``). | _required_ |\n| targets | [List](`typing.List`)\\[[str](`str`)\\] | Ordered list of target names to aggregate. | _required_ |\n| weights | [List](`typing.List`)\\[[float](`float`)\\] | Per-target aggregation weights aligned with ``targets``. | _required_ |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------|--------------------------------------------------------------------|\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | Aggregated prediction package with keys ``train_actual``, ``train_pred``, ``future_pred``, ``future_actual``, ``metrics_train``, ``metrics_future``, ``metrics_future_one_day``, ``validation_passed``, and (when present in all sources) ``test_actual``. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#3aa1221f .cell execution_count=1}\n``` {.python .cell-code}\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask.base import agg_predictor\n\nrng = np.random.default_rng(0)\nidx_train = pd.date_range(\"2023-01-01\", periods=100, freq=\"h\", tz=\"UTC\")\nidx_future = pd.date_range(\"2023-01-05 04:00\", periods=6, freq=\"h\", tz=\"UTC\")\n\ndef _pkg(train_val, future_val):\n return {\n \"train_actual\": pd.Series(np.full(100, train_val), index=idx_train),\n \"train_pred\": pd.Series(np.full(100, train_val * 0.99), index=idx_train),\n \"future_pred\": pd.Series(np.full(6, future_val), index=idx_future),\n \"future_actual\": pd.Series(dtype=\"float64\"),\n }\n\nresults = {\"wind\": _pkg(100.0, 110.0), \"solar\": _pkg(200.0, 210.0)}\nagg = agg_predictor(results, targets=[\"wind\", \"solar\"], weights=[0.5, 0.5])\nprint(f\"future_pred (weighted mean): {agg['future_pred'].iloc[0]:.1f}\")\nassert set(agg.keys()) >= {\"train_actual\", \"train_pred\", \"future_pred\", \"validation_passed\"}\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nfuture_pred (weighted mean): 160.0\n```\n:::\n:::\n\n\n", + "supporting": [ + "multitask.base.agg_predictor_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/multitask.factories.default_lgbm_forecaster_factory/execute-results/html.json b/_freeze/docs/reference/multitask.factories.default_lgbm_forecaster_factory/execute-results/html.json new file mode 100644 index 00000000..b3b8720a --- /dev/null +++ b/_freeze/docs/reference/multitask.factories.default_lgbm_forecaster_factory/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "caf17ac043d5315131ffdce0c712f2f6", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: multitask.factories.default_lgbm_forecaster_factory\n---\n\n\n\n```python\nmultitask.factories.default_lgbm_forecaster_factory(\n config,\n *,\n 
weight_func=None,\n target=None,\n)\n```\n\nReturn a fresh, unfitted LightGBM ``ForecasterRecursive``.\n\nMirrors the construction previously inlined in\n``BaseTask.create_forecaster``. ``target`` is accepted (and ignored by\nthis default) so that custom factories can specialise per target without\na signature change.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------|------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| config | [Any](`typing.Any`) | Any object satisfying the ``PipelineConfig`` protocol from ``spotforecast2_safe.multitask.base``. Reads ``random_state``, ``lags_consider``, and ``window_size``. | _required_ |\n| weight_func | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Optional per-sample weight function produced by the imputation step (``apply_imputation``). | `None` |\n| target | [Optional](`typing.Optional`)\\[[str](`str`)\\] | Target column name. Ignored by this default factory; provided for the benefit of custom factories that need it. | `None` |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|--------------------------------------------------------------------------------------|------------------------------------------------|\n| | [ForecasterRecursive](`spotforecast2_safe.forecaster.recursive.ForecasterRecursive`) | A new ``ForecasterRecursive`` ready to be fit. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#3f15170d .cell execution_count=1}\n``` {.python .cell-code}\nimport types\nfrom spotforecast2_safe.multitask.factories import default_lgbm_forecaster_factory\nfrom spotforecast2_safe.forecaster.recursive import ForecasterRecursive\n\n# Build a minimal config-like object that satisfies the PipelineConfig\n# protocol (random_state, lags_consider, window_size).\nconfig = types.SimpleNamespace(\n random_state=42,\n lags_consider=[1, 2, 3],\n window_size=3,\n)\n\nforecaster = default_lgbm_forecaster_factory(config, target=\"power\")\nassert isinstance(forecaster, ForecasterRecursive)\nassert list(forecaster.lags) == [1, 2, 3]\nprint(f\"type: {type(forecaster).__name__}\")\nprint(f\"lags: {list(forecaster.lags)}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ntype: ForecasterRecursive\nlags: [np.int64(1), np.int64(2), np.int64(3)]\n```\n:::\n:::\n\n\n", + "supporting": [ + "multitask.factories.default_lgbm_forecaster_factory_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/multitask.strategies.DefaultsStrategy/execute-results/html.json b/_freeze/docs/reference/multitask.strategies.DefaultsStrategy/execute-results/html.json new file mode 100644 index 00000000..937b6c12 --- /dev/null +++ b/_freeze/docs/reference/multitask.strategies.DefaultsStrategy/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "889fbc7d8273d90ef2394674d98143b3", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: multitask.strategies.DefaultsStrategy\n---\n\n\n\n```python\nmultitask.strategies.DefaultsStrategy()\n```\n\nApproach 2 — Train with defaults, no tuning, no cached params.\n\nThe simplest possible training strategy: leave the forecaster at the\nparameters produced by the factory and hand it back to\n``_train_and_predict_target`` for the explicit fit. 
Use this when the\ncaller wants a deterministic baseline that does not benefit from any\ncached tuning results — useful for ENTSO-E \"Approach 2: Training without\nTuning\" and for regression benchmarking.\n\nFunctionally equivalent to ``LazyStrategy(use_tuned_params=False)``;\nkept as a distinct class so the ``task=\"defaults\"`` routing reads\nintent at the call site (no implicit cache lookup).\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#7cb9b5af .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom sklearn.linear_model import LinearRegression\nfrom spotforecast2_safe.forecaster.recursive import ForecasterRecursive\nfrom spotforecast2_safe.multitask.strategies import DefaultsStrategy\n\nstrategy = DefaultsStrategy()\nassert strategy.name == \"defaults\"\nprint(f\"strategy.name={strategy.name!r}\")\n\nforecaster = ForecasterRecursive(estimator=LinearRegression(), lags=5)\ny_train = pd.Series(range(30), dtype=float, name=\"target_0\")\n\nclass _NullTask:\n pass\n\nresult = strategy.prepare_forecaster(_NullTask(), \"target_0\", forecaster, y_train)\nassert result is forecaster\nassert list(result.lags) == [1, 2, 3, 4, 5], f\"Unexpected lags: {list(result.lags)}\"\nprint(f\"Forecaster returned unchanged: lags={list(result.lags)}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nstrategy.name='defaults'\nForecaster returned unchanged: lags=[np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5)]\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [prepare_forecaster](#spotforecast2_safe.multitask.strategies.DefaultsStrategy.prepare_forecaster) | Return the forecaster unchanged — no tuning, no cached params. 
|\n\n### prepare_forecaster { #spotforecast2_safe.multitask.strategies.DefaultsStrategy.prepare_forecaster }\n\n```python\nmultitask.strategies.DefaultsStrategy.prepare_forecaster(\n task,\n target,\n forecaster,\n y_train,\n exog_train=None,\n)\n```\n\nReturn the forecaster unchanged — no tuning, no cached params.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------|---------------------------------------------------------------------------------|--------------------------------------------------|------------|\n| task | [Any](`typing.Any`) | Ignored; accepted for protocol compatibility. | _required_ |\n| target | [str](`str`) | Ignored; accepted for protocol compatibility. | _required_ |\n| forecaster | [Any](`typing.Any`) | The unfitted forecaster returned by the factory. | _required_ |\n| y_train | [pd](`pandas`).[Series](`pandas.Series`) | Ignored; accepted for protocol compatibility. | _required_ |\n| exog_train | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Ignored; accepted for protocol compatibility. | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------|-----------------------------------------|\n| | [Any](`typing.Any`) | The same forecaster object, unmodified. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#dbc1b9ee .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nfrom sklearn.linear_model import LinearRegression\nfrom spotforecast2_safe.forecaster.recursive import ForecasterRecursive\nfrom spotforecast2_safe.multitask.strategies import DefaultsStrategy\n\nforecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)\ny_train = pd.Series(range(30), dtype=float, name=\"target_0\")\n\nclass _NullTask:\n pass\n\nstrategy = DefaultsStrategy()\nresult = strategy.prepare_forecaster(_NullTask(), \"target_0\", forecaster, y_train)\nassert result is forecaster\nprint(f\"Returned same forecaster object: {result is forecaster}\")\nprint(f\"Lags unchanged: {list(result.lags)}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nReturned same forecaster object: True\nLags unchanged: [np.int64(1), np.int64(2), np.int64(3)]\n```\n:::\n:::\n\n\n", + "supporting": [ + "multitask.strategies.DefaultsStrategy_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/multitask.strategies.LazyStrategy/execute-results/html.json b/_freeze/docs/reference/multitask.strategies.LazyStrategy/execute-results/html.json new file mode 100644 index 00000000..de95d598 --- /dev/null +++ b/_freeze/docs/reference/multitask.strategies.LazyStrategy/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "075042b187afa2abee72688e287cc0a1", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: multitask.strategies.LazyStrategy\n---\n\n\n\n```python\nmultitask.strategies.LazyStrategy(use_tuned_params=True, max_age_days=None)\n```\n\nApproach 1 — Lazy fitting with optional cached tuning.\n\nMirrors the body of ``execute_lazy`` between ``create_forecaster()`` and\n``_train_and_predict_target()``.\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#7adbbd25 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom 
sklearn.linear_model import LinearRegression\nfrom spotforecast2_safe.forecaster.recursive import ForecasterRecursive\nfrom spotforecast2_safe.multitask.strategies import LazyStrategy\n\n# Construct with default settings (cache lookup enabled).\nstrategy = LazyStrategy(use_tuned_params=True, max_age_days=30.0)\nassert strategy.name == \"lazy\"\nassert strategy.use_tuned_params is True\nassert strategy.max_age_days == 30.0\nprint(f\"strategy.name={strategy.name!r}, use_tuned_params={strategy.use_tuned_params}\")\n\n# When no cached results exist, the forecaster is returned unchanged.\nclass _NullTask:\n class logger:\n @staticmethod\n def info(*a, **kw):\n pass\n def load_tuning_results(self, target, max_age_days=None):\n return None\n\nforecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)\ny_train = pd.Series(range(30), dtype=float, name=\"target_0\")\nresult = strategy.prepare_forecaster(_NullTask(), \"target_0\", forecaster, y_train)\nassert result is forecaster\nprint(f\"Forecaster returned unchanged when cache miss: {result is forecaster}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nstrategy.name='lazy', use_tuned_params=True\nForecaster returned unchanged when cache miss: True\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [prepare_forecaster](#spotforecast2_safe.multitask.strategies.LazyStrategy.prepare_forecaster) | Optionally apply cached tuning results and return the forecaster. 
|\n\n### prepare_forecaster { #spotforecast2_safe.multitask.strategies.LazyStrategy.prepare_forecaster }\n\n```python\nmultitask.strategies.LazyStrategy.prepare_forecaster(\n task,\n target,\n forecaster,\n y_train,\n exog_train=None,\n)\n```\n\nOptionally apply cached tuning results and return the forecaster.\n\nWhen ``use_tuned_params`` is ``False``, or when no cached results are\nfound for ``target``, the forecaster is returned without modification.\nOtherwise, ``best_params`` are applied via ``forecaster.set_params``\nand ``best_lags`` via ``forecaster.set_lags`` (when available).\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------|------------|\n| task | [Any](`typing.Any`) | A ``BaseTask`` instance that exposes ``load_tuning_results`` and a ``logger``. | _required_ |\n| target | [str](`str`) | The target name used as key when loading cached results. | _required_ |\n| forecaster | [Any](`typing.Any`) | The unfitted forecaster returned by the factory. | _required_ |\n| y_train | [pd](`pandas`).[Series](`pandas.Series`) | Training series (unused by this strategy; kept for protocol compatibility). | _required_ |\n| exog_train | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Exogenous training frame (unused by this strategy). | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------|-------------------------------------------------------------------|\n| | [Any](`typing.Any`) | The same forecaster object, with parameters updated in-place when cached tuning results were found and applied. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#dbe076e4 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nfrom sklearn.linear_model import LinearRegression\nfrom spotforecast2_safe.forecaster.recursive import ForecasterRecursive\nfrom spotforecast2_safe.multitask.strategies import LazyStrategy\n\nforecaster = ForecasterRecursive(estimator=LinearRegression(), lags=4)\ny_train = pd.Series(range(30), dtype=float, name=\"target_0\")\n\n# --- Path 1: use_tuned_params=False — forecaster returned as-is ---\nstrategy_no_cache = LazyStrategy(use_tuned_params=False)\n\nclass _NullTask:\n class logger:\n @staticmethod\n def info(*a, **kw):\n pass\n def load_tuning_results(self, target, max_age_days=None):\n return None\n\nresult = strategy_no_cache.prepare_forecaster(\n _NullTask(), \"target_0\", forecaster, y_train\n)\nassert result is forecaster\nprint(f\"Path 1 (no cache): lags unchanged = {list(result.lags)}\")\n\n# --- Path 2: cached results present — lags and params applied ---\nclass _CachedTask:\n class logger:\n @staticmethod\n def info(*a, **kw):\n pass\n def load_tuning_results(self, target, max_age_days=None):\n return {\n \"task_name\": \"lazy\",\n \"timestamp\": \"2026-01-01T00:00:00\",\n \"best_params\": {},\n \"best_lags\": [1, 2, 3],\n }\n\nstrategy_with_cache = LazyStrategy(use_tuned_params=True)\nresult2 = strategy_with_cache.prepare_forecaster(\n _CachedTask(), \"target_0\", forecaster, y_train\n)\nassert list(result2.lags) == [1, 2, 3]\nprint(f\"Path 2 (cache hit): lags updated to {list(result2.lags)}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nPath 1 (no cache): lags unchanged = [np.int64(1), np.int64(2), np.int64(3), np.int64(4)]\nPath 2 (cache hit): lags updated to [np.int64(1), np.int64(2), np.int64(3)]\n```\n:::\n:::\n\n\n", + "supporting": [ + "multitask.strategies.LazyStrategy_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git 
a/_freeze/docs/reference/multitask.strategies.TrainingStrategy/execute-results/html.json b/_freeze/docs/reference/multitask.strategies.TrainingStrategy/execute-results/html.json new file mode 100644 index 00000000..f7ed40fd --- /dev/null +++ b/_freeze/docs/reference/multitask.strategies.TrainingStrategy/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "f4e5c15843478fb72cf983dd5c067fc0", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: multitask.strategies.TrainingStrategy\n---\n\n\n\n```python\nmultitask.strategies.TrainingStrategy()\n```\n\nStrategy interface for preparing a forecaster before the final fit.\n\nImplementations return a forecaster with any tuning/parameter changes\napplied. The final ``forecaster.fit(...)`` and prediction packaging are\nperformed by ``BaseTask._train_and_predict_target`` after this call.\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#73d1dd61 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.multitask.strategies import (\n TrainingStrategy,\n LazyStrategy,\n DefaultsStrategy,\n)\n\n# Both concrete strategies satisfy the TrainingStrategy protocol:\n# they expose a `name` attribute and a `prepare_forecaster` method.\nfor cls in (LazyStrategy, DefaultsStrategy):\n s = cls()\n assert hasattr(s, \"name\"), f\"{cls.__name__} missing .name\"\n assert callable(s.prepare_forecaster), f\"{cls.__name__} missing .prepare_forecaster\"\n print(f\"{cls.__name__}.name = {s.name!r}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLazyStrategy.name = 'lazy'\nDefaultsStrategy.name = 'defaults'\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [prepare_forecaster](#spotforecast2_safe.multitask.strategies.TrainingStrategy.prepare_forecaster) | Return a forecaster ready for the final fit step. 
|\n\n### prepare_forecaster { #spotforecast2_safe.multitask.strategies.TrainingStrategy.prepare_forecaster }\n\n```python\nmultitask.strategies.TrainingStrategy.prepare_forecaster(\n task,\n target,\n forecaster,\n y_train,\n exog_train=None,\n)\n```\n\nReturn a forecaster ready for the final fit step.\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#334b5245 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nfrom sklearn.linear_model import LinearRegression\nfrom spotforecast2_safe.forecaster.recursive import ForecasterRecursive\nfrom spotforecast2_safe.multitask.strategies import DefaultsStrategy\n\n# Demonstrate prepare_forecaster via a concrete implementation.\n# DefaultsStrategy is the simplest: it returns the forecaster unchanged.\nforecaster = ForecasterRecursive(estimator=LinearRegression(), lags=3)\ny_train = pd.Series(range(30), dtype=float, name=\"target_0\")\n\nclass _NullTask:\n pass\n\nstrategy = DefaultsStrategy()\nresult = strategy.prepare_forecaster(_NullTask(), \"target_0\", forecaster, y_train)\nassert result is forecaster\nprint(f\"prepare_forecaster returned the same object: {result is forecaster}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nprepare_forecaster returned the same object: True\n```\n:::\n:::\n\n\n", + "supporting": [ + "multitask.strategies.TrainingStrategy_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/preprocessing.exog_builder.ExogBuilder/execute-results/html.json b/_freeze/docs/reference/preprocessing.exog_builder.ExogBuilder/execute-results/html.json new file mode 100644 index 00000000..9d44e8ad --- /dev/null +++ b/_freeze/docs/reference/preprocessing.exog_builder.ExogBuilder/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "a2609746d66adb8d426bcdbd2732a956", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: 
preprocessing.exog_builder.ExogBuilder\n---\n\n\n\n```python\npreprocessing.exog_builder.ExogBuilder(\n periods=None,\n country_code=None,\n providers=None,\n on_provider_failure='raise',\n)\n```\n\nBuilds a set of exogenous features for a given date range.\n\nThis builder combines temporal features (day of year, day of week, hour, etc.)\nwith cyclical features encoded via RepeatingBasisFunctions and optional\nholiday indicators.\n\nOptional `ExogFeatureProvider` objects extend the built frame with additional\ndrivers (e.g. ENTSO-E day-ahead forecasts, COVID infection rate). Each\nprovider returns numeric, NaN-free columns aligned to the same hourly index;\nif a provider cannot cover the range, its error is either re-raised or the\nprovider is skipped, according to *on_provider_failure*.\n\n## Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|---------------------|-----------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|\n| periods | [List](`typing.List`)\\[[Period](`spotforecast2_safe.data.data_classes.Period`)\\] | List of periodic features to encode. |\n| country_code | [Optional](`typing.Optional`)\\[[str](`str`)\\] | Country code for holiday lookups. |\n| holidays_list | [Optional](`typing.Optional`)\\[[holidays](`holidays`).[HolidayBase](`holidays.HolidayBase`)\\] | List of holidays for the specified country. |\n| providers | [List](`typing.List`)\\[[ExogFeatureProvider](`spotforecast2_safe.preprocessing.exog_providers.ExogFeatureProvider`)\\] | Extra exogenous-feature providers appended to every built frame. |\n| on_provider_failure | [str](`str`) | ``\"raise\"`` (default) to propagate an ``ExogProviderError`` from a provider, or ``\"skip\"`` to log and omit that provider's columns. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#580157e2 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.data.data_classes import Period\nfrom spotforecast2_safe.preprocessing.exog_builder import ExogBuilder\n\nperiods = [Period(name=\"hour\", n_periods=24, column=\"hour\", input_range=(0, 23))]\nbuilder = ExogBuilder(periods=periods, country_code=\"DE\")\nstart = pd.Timestamp(\"2025-01-01\", tz=\"UTC\")\nend = pd.Timestamp(\"2025-01-02\", tz=\"UTC\")\nexog = builder.build(start, end)\nprint(f\"shape: {exog.shape}\")\nassert exog.shape[1] > 0\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nshape: (25, 26)\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [build](#spotforecast2_safe.preprocessing.exog_builder.ExogBuilder.build) | Build the exogenous feature DataFrame for a date range. |\n\n### build { #spotforecast2_safe.preprocessing.exog_builder.ExogBuilder.build }\n\n```python\npreprocessing.exog_builder.ExogBuilder.build(start_date, end_date)\n```\n\nBuild the exogenous feature DataFrame for a date range.\n\nThe generated DataFrame has an hourly frequency.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------|------------------------------------------------|--------------------------------------|------------|\n| start_date | [pd](`pandas`).[Timestamp](`pandas.Timestamp`) | Start of the date range (inclusive). | _required_ |\n| end_date | [pd](`pandas`).[Timestamp](`pandas.Timestamp`) | End of the date range (inclusive). | _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|--------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: DataFrame containing exogenous features. 
|\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------------|-------------------------------|\n| | [ValueError](`ValueError`) | If the date range is invalid. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#0129ebb5 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.data.data_classes import Period\nfrom spotforecast2_safe.preprocessing.exog_builder import ExogBuilder\n\nperiods = [Period(name=\"hour\", n_periods=24, column=\"hour\", input_range=(0, 23))]\nbuilder = ExogBuilder(periods=periods, country_code=\"DE\")\nstart = pd.Timestamp(\"2025-01-01\", tz=\"UTC\")\nend = pd.Timestamp(\"2025-01-02\", tz=\"UTC\")\nexog = builder.build(start, end)\nprint(f\"shape: {exog.shape}, columns: {list(exog.columns[:4])}\")\nassert exog.shape == (25, 26)\nassert \"holidays\" in exog.columns\nassert \"is_weekend\" in exog.columns\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nshape: (25, 26), columns: ['hour_0', 'hour_1', 'hour_2', 'hour_3']\n```\n:::\n:::\n\n\n", + "supporting": [ + "preprocessing.exog_builder.ExogBuilder_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/preprocessing.exog_providers.EntsoeDayAheadPriceProvider/execute-results/html.json b/_freeze/docs/reference/preprocessing.exog_providers.EntsoeDayAheadPriceProvider/execute-results/html.json new file mode 100644 index 00000000..af557e93 --- /dev/null +++ b/_freeze/docs/reference/preprocessing.exog_providers.EntsoeDayAheadPriceProvider/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "89ffe028b305883a79ac56870928a0ab", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: preprocessing.exog_providers.EntsoeDayAheadPriceProvider\n---\n\n\n\n```python\npreprocessing.exog_providers.EntsoeDayAheadPriceProvider(\n data_home=None,\n max_gap=0,\n max_tail_gap=0,\n provider_window=None,\n)\n```\n\nENTSO-E 
day-ahead spot price (DE/LU) as an exogenous input.\n\nReads ``interim/day_ahead_price.csv`` via\n`spotforecast2_safe.data.fetch_data.load_day_ahead_price`. The day-ahead\nauction price is published on D-1 and is leakage-clean at forecast time;\nthe realised price must never be used.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------------|-----------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------|-----------|\n| data_home | [DataHome](`spotforecast2_safe.preprocessing.exog_providers.DataHome`) | Root data directory forwarded to the loader. | `None` |\n| max_gap | [int](`int`) | Maximum contiguous missing-value run healed by ``_align_to_index``. See `_align_to_index` for full semantics. Defaults to ``0``. | `0` |\n| max_tail_gap | [int](`int`) | Extended healing budget for the trailing-edge NaN run. See `_align_to_index`. Defaults to ``0``. | `0` |\n| provider_window | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DatetimeIndex](`pandas.DatetimeIndex`)\\] | Validation index passed to ``_align_to_index`` as *validate_index*. See `_align_to_index`. Defaults to ``None``. 
| `None` |\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#a27af96d .cell execution_count=1}\n``` {.python .cell-code}\nimport os\nimport shutil\nimport tempfile\n\nimport pandas as pd\n\nfrom spotforecast2_safe.preprocessing.exog_providers import (\n EntsoeDayAheadPriceProvider,\n)\n\ntmp = tempfile.mkdtemp()\nos.environ[\"SPOTFORECAST2_DATA\"] = tmp\nos.makedirs(os.path.join(tmp, \"interim\"), exist_ok=True)\nidx = pd.date_range(\"2023-06-01\", periods=24, freq=\"h\", tz=\"UTC\")\npd.DataFrame(\n {\"Day-ahead Price\": 95.0}, index=idx\n).rename_axis(\"Time (UTC)\").to_csv(\n os.path.join(tmp, \"interim\", \"day_ahead_price.csv\")\n)\n\nprovider = EntsoeDayAheadPriceProvider()\nout = provider.build(idx)\nprint(out.columns.tolist(), out.shape, out.dtypes.iloc[0].name)\nassert out.shape == (24, 1)\nassert not out.isna().any().any()\n\nshutil.rmtree(tmp)\ndel os.environ[\"SPOTFORECAST2_DATA\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n['entsoe_day_ahead_price'] (24, 1) float32\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [build](#spotforecast2_safe.preprocessing.exog_providers.EntsoeDayAheadPriceProvider.build) | Return the day-ahead price series aligned to *index*. |\n\n### build { #spotforecast2_safe.preprocessing.exog_providers.EntsoeDayAheadPriceProvider.build }\n\n```python\npreprocessing.exog_providers.EntsoeDayAheadPriceProvider.build(index)\n```\n\nReturn the day-ahead price series aligned to *index*.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|--------------------------------------------------------|------------------------------------------------------------------|------------|\n| index | [pd](`pandas`).[DatetimeIndex](`pandas.DatetimeIndex`) | Hourly ``DatetimeIndex`` (tz-aware UTC) for the forecast window. 
| _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|----------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: Single column ``entsoe_day_ahead_price``, ``float32``. |\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|\n| | [ExogProviderError](`spotforecast2_safe.preprocessing.exog_providers.ExogProviderError`) | If ``interim/day_ahead_price.csv`` is missing or the ``Day-ahead Price`` column is absent. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#24dfd9f7 .cell execution_count=2}\n``` {.python .cell-code}\nimport os\nimport shutil\nimport tempfile\n\nimport pandas as pd\n\nfrom spotforecast2_safe.preprocessing.exog_providers import (\n EntsoeDayAheadPriceProvider,\n)\n\ntmp = tempfile.mkdtemp()\nos.environ[\"SPOTFORECAST2_DATA\"] = tmp\nos.makedirs(os.path.join(tmp, \"interim\"), exist_ok=True)\nidx = pd.date_range(\"2023-06-01\", periods=12, freq=\"h\", tz=\"UTC\")\npd.DataFrame(\n {\"Day-ahead Price\": 88.5}, index=idx\n).rename_axis(\"Time (UTC)\").to_csv(\n os.path.join(tmp, \"interim\", \"day_ahead_price.csv\")\n)\n\nout = EntsoeDayAheadPriceProvider().build(idx)\nprint(out.columns.tolist(), out.shape, float(out.iloc[0, 0]))\nassert out.shape == (12, 1)\n\nshutil.rmtree(tmp)\ndel os.environ[\"SPOTFORECAST2_DATA\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n['entsoe_day_ahead_price'] (12, 1) 88.5\n```\n:::\n:::\n\n\n", + "supporting": [ + "preprocessing.exog_providers.EntsoeDayAheadPriceProvider_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git 
a/_freeze/docs/reference/preprocessing.exog_providers.EntsoeNetLoadProvider/execute-results/html.json b/_freeze/docs/reference/preprocessing.exog_providers.EntsoeNetLoadProvider/execute-results/html.json new file mode 100644 index 00000000..97ed95d3 --- /dev/null +++ b/_freeze/docs/reference/preprocessing.exog_providers.EntsoeNetLoadProvider/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "e7bb6becaa648c9488235c281af54bfa", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: preprocessing.exog_providers.EntsoeNetLoadProvider\n---\n\n\n\n```python\npreprocessing.exog_providers.EntsoeNetLoadProvider(\n data_home=None,\n max_gap=0,\n max_tail_gap=0,\n provider_window=None,\n)\n```\n\nENTSO-E day-ahead net load = Forecasted Load − (wind + solar) forecast.\n\nCombines the day-ahead Forecasted Load with the day-ahead renewable\nforecast to form the net-load prior the residual is often modelled against.\nBoth inputs are day-ahead (leakage-clean). Raises\n`ExogProviderError` if either input is unavailable.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------------|-----------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------|-----------|\n| data_home | [DataHome](`spotforecast2_safe.preprocessing.exog_providers.DataHome`) | Root data directory forwarded to the loaders. | `None` |\n| max_gap | [int](`int`) | Maximum contiguous missing-value run healed by ``_align_to_index``. See `_align_to_index` for full semantics. Defaults to ``0``. | `0` |\n| max_tail_gap | [int](`int`) | Extended healing budget for the trailing-edge NaN run. See `_align_to_index`. Defaults to ``0``. 
| `0` |\n| provider_window | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DatetimeIndex](`pandas.DatetimeIndex`)\\] | Validation index passed to ``_align_to_index`` as *validate_index*. See `_align_to_index`. Defaults to ``None``. | `None` |\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#66800281 .cell execution_count=1}\n``` {.python .cell-code}\nimport os\nimport shutil\nimport tempfile\n\nimport pandas as pd\n\nfrom spotforecast2_safe.preprocessing.exog_providers import (\n EntsoeNetLoadProvider,\n)\n\ntmp = tempfile.mkdtemp()\nos.environ[\"SPOTFORECAST2_DATA\"] = tmp\nos.makedirs(os.path.join(tmp, \"interim\"), exist_ok=True)\nidx = pd.date_range(\"2023-06-01\", periods=24, freq=\"h\", tz=\"UTC\")\npd.DataFrame(\n {\"Actual Load\": 100.0, \"Forecasted Load\": 90.0}, index=idx\n).rename_axis(\"Time (UTC)\").to_csv(\n os.path.join(tmp, \"interim\", \"energy_load.csv\")\n)\npd.DataFrame(\n {\"Solar\": 3.0, \"Wind Onshore\": 5.0}, index=idx\n).rename_axis(\"Time (UTC)\").to_csv(\n os.path.join(tmp, \"interim\", \"renewable_forecast.csv\")\n)\n\nprovider = EntsoeNetLoadProvider()\nout = provider.build(idx)\nprint(out.columns.tolist(), out.shape, float(out.iloc[0, 0]))\nassert out.shape == (24, 1)\nassert abs(float(out.iloc[0, 0]) - 82.0) < 0.1 # 90 - (3 + 5)\n\nshutil.rmtree(tmp)\ndel os.environ[\"SPOTFORECAST2_DATA\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n['entsoe_net_load'] (24, 1) 82.0\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [build](#spotforecast2_safe.preprocessing.exog_providers.EntsoeNetLoadProvider.build) | Return the day-ahead net load (Forecasted Load minus renewables). 
|\n\n### build { #spotforecast2_safe.preprocessing.exog_providers.EntsoeNetLoadProvider.build }\n\n```python\npreprocessing.exog_providers.EntsoeNetLoadProvider.build(index)\n```\n\nReturn the day-ahead net load (Forecasted Load minus renewables).\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|--------------------------------------------------------|------------------------------------------------------------------|------------|\n| index | [pd](`pandas`).[DatetimeIndex](`pandas.DatetimeIndex`) | Hourly ``DatetimeIndex`` (tz-aware UTC) for the forecast window. | _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|---------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: Single column ``entsoe_net_load``, ``float32``. |\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|\n| | [ExogProviderError](`spotforecast2_safe.preprocessing.exog_providers.ExogProviderError`) | If either ``energy_load.csv`` or ``renewable_forecast.csv`` is missing. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#68d861c6 .cell execution_count=2}\n``` {.python .cell-code}\nimport os\nimport shutil\nimport tempfile\n\nimport pandas as pd\n\nfrom spotforecast2_safe.preprocessing.exog_providers import (\n EntsoeNetLoadProvider,\n)\n\ntmp = tempfile.mkdtemp()\nos.environ[\"SPOTFORECAST2_DATA\"] = tmp\nos.makedirs(os.path.join(tmp, \"interim\"), exist_ok=True)\nidx = pd.date_range(\"2023-06-01\", periods=12, freq=\"h\", tz=\"UTC\")\npd.DataFrame(\n {\"Actual Load\": 100.0, \"Forecasted Load\": 80.0}, index=idx\n).rename_axis(\"Time (UTC)\").to_csv(\n os.path.join(tmp, \"interim\", \"energy_load.csv\")\n)\npd.DataFrame(\n {\"Solar\": 2.0, \"Wind Onshore\": 6.0}, index=idx\n).rename_axis(\"Time (UTC)\").to_csv(\n os.path.join(tmp, \"interim\", \"renewable_forecast.csv\")\n)\n\nout = EntsoeNetLoadProvider().build(idx)\nprint(out.columns.tolist(), out.shape, float(out.iloc[0, 0]))\nassert out.shape == (12, 1)\n\nshutil.rmtree(tmp)\ndel os.environ[\"SPOTFORECAST2_DATA\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n['entsoe_net_load'] (12, 1) 72.0\n```\n:::\n:::\n\n\n", + "supporting": [ + "preprocessing.exog_providers.EntsoeNetLoadProvider_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/preprocessing.exog_providers.EntsoeRenewableForecastProvider/execute-results/html.json b/_freeze/docs/reference/preprocessing.exog_providers.EntsoeRenewableForecastProvider/execute-results/html.json new file mode 100644 index 00000000..9c74b602 --- /dev/null +++ b/_freeze/docs/reference/preprocessing.exog_providers.EntsoeRenewableForecastProvider/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "a5c379f70fcf7b890f775f25a43eab1f", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: preprocessing.exog_providers.EntsoeRenewableForecastProvider\n---\n\n\n\n```python\npreprocessing.exog_providers.EntsoeRenewableForecastProvider(\n 
data_home=None,\n max_gap=0,\n max_tail_gap=0,\n provider_window=None,\n)\n```\n\nENTSO-E day-ahead wind and solar generation forecast.\n\nReads ``interim/renewable_forecast.csv`` via\n`spotforecast2_safe.data.fetch_data.load_renewable_forecast` and emits two\ncolumns: ``entsoe_wind_forecast`` (sum of all wind columns) and\n``entsoe_solar_forecast`` (sum of all solar columns). Day-ahead forecasts are\nleakage-clean; the realised generation must never be used.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------------|-----------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------|-----------|\n| data_home | [DataHome](`spotforecast2_safe.preprocessing.exog_providers.DataHome`) | Root data directory forwarded to the loader. | `None` |\n| max_gap | [int](`int`) | Maximum contiguous missing-value run healed by ``_align_to_index``. See `_align_to_index` for full semantics. Defaults to ``0``. | `0` |\n| max_tail_gap | [int](`int`) | Extended healing budget for the trailing-edge NaN run. See `_align_to_index`. Defaults to ``0``. | `0` |\n| provider_window | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DatetimeIndex](`pandas.DatetimeIndex`)\\] | Validation index passed to ``_align_to_index`` as *validate_index*. See `_align_to_index`. Defaults to ``None``. 
| `None` |\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#4bc4dca0 .cell execution_count=1}\n``` {.python .cell-code}\nimport os\nimport shutil\nimport tempfile\n\nimport pandas as pd\n\nfrom spotforecast2_safe.preprocessing.exog_providers import (\n EntsoeRenewableForecastProvider,\n)\n\ntmp = tempfile.mkdtemp()\nos.environ[\"SPOTFORECAST2_DATA\"] = tmp\nos.makedirs(os.path.join(tmp, \"interim\"), exist_ok=True)\nidx = pd.date_range(\"2023-06-01\", periods=24, freq=\"h\", tz=\"UTC\")\npd.DataFrame(\n {\"Solar\": 3.0, \"Wind Onshore\": 5.0}, index=idx\n).rename_axis(\"Time (UTC)\").to_csv(\n os.path.join(tmp, \"interim\", \"renewable_forecast.csv\")\n)\n\nprovider = EntsoeRenewableForecastProvider()\nout = provider.build(idx)\nprint(out.columns.tolist(), out.shape, out.dtypes.iloc[0].name)\nassert set(out.columns) == {\"entsoe_wind_forecast\", \"entsoe_solar_forecast\"}\nassert out.shape == (24, 2)\n\nshutil.rmtree(tmp)\ndel os.environ[\"SPOTFORECAST2_DATA\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n['entsoe_wind_forecast', 'entsoe_solar_forecast'] (24, 2) float32\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [build](#spotforecast2_safe.preprocessing.exog_providers.EntsoeRenewableForecastProvider.build) | Return wind and solar forecast columns aligned to *index*. |\n\n### build { #spotforecast2_safe.preprocessing.exog_providers.EntsoeRenewableForecastProvider.build }\n\n```python\npreprocessing.exog_providers.EntsoeRenewableForecastProvider.build(index)\n```\n\nReturn wind and solar forecast columns aligned to *index*.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|--------------------------------------------------------|------------------------------------------------------------------|------------|\n| index | [pd](`pandas`).[DatetimeIndex](`pandas.DatetimeIndex`) | Hourly ``DatetimeIndex`` (tz-aware UTC) for the forecast window. 
| _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|------------------------------------------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: Two columns — ``entsoe_wind_forecast`` and ``entsoe_solar_forecast`` — as ``float32``. |\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|\n| | [ExogProviderError](`spotforecast2_safe.preprocessing.exog_providers.ExogProviderError`) | If ``interim/renewable_forecast.csv`` is missing or contains no wind or solar columns. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#5a61b9a6 .cell execution_count=2}\n``` {.python .cell-code}\nimport os\nimport shutil\nimport tempfile\n\nimport pandas as pd\n\nfrom spotforecast2_safe.preprocessing.exog_providers import (\n EntsoeRenewableForecastProvider,\n)\n\ntmp = tempfile.mkdtemp()\nos.environ[\"SPOTFORECAST2_DATA\"] = tmp\nos.makedirs(os.path.join(tmp, \"interim\"), exist_ok=True)\nidx = pd.date_range(\"2023-06-01\", periods=12, freq=\"h\", tz=\"UTC\")\npd.DataFrame(\n {\"Solar\": 2.0, \"Wind Onshore\": 4.0}, index=idx\n).rename_axis(\"Time (UTC)\").to_csv(\n os.path.join(tmp, \"interim\", \"renewable_forecast.csv\")\n)\n\nout = EntsoeRenewableForecastProvider().build(idx)\nprint(out.columns.tolist(), out.shape)\nassert not out.isna().any().any()\n\nshutil.rmtree(tmp)\ndel os.environ[\"SPOTFORECAST2_DATA\"]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n['entsoe_wind_forecast', 'entsoe_solar_forecast'] (12, 2)\n```\n:::\n:::\n\n\n", + "supporting": [ + "preprocessing.exog_providers.EntsoeRenewableForecastProvider_files" + ], + "filters": [], + "includes": {} 
+ } +} \ No newline at end of file diff --git a/_freeze/docs/reference/preprocessing.exog_providers.ExogFeatureProvider/execute-results/html.json b/_freeze/docs/reference/preprocessing.exog_providers.ExogFeatureProvider/execute-results/html.json new file mode 100644 index 00000000..0301baab --- /dev/null +++ b/_freeze/docs/reference/preprocessing.exog_providers.ExogFeatureProvider/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "128a1093f1a101e6a0e5a0894c35e229", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: preprocessing.exog_providers.ExogFeatureProvider\n---\n\n\n\n```python\npreprocessing.exog_providers.ExogFeatureProvider()\n```\n\nContract for a pluggable exogenous-feature source.\n\nA provider maps the hourly target index to a numeric feature\nframe on that exact index. Subclasses set `name` (a short identifier\nused in logs and as the default column name) and implement `build`.\n\nImplementations should load their backing data lazily inside `build`\nand raise `ExogProviderError` when the data is missing or cannot\ncover the requested range, so the fail-safe policy lives in one place.\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#13b1770f .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.preprocessing.exog_providers import (\n ExogFeatureProvider,\n ExogProviderError,\n)\n\nclass ConstantProvider(ExogFeatureProvider):\n name = \"constant\"\n\n def build(self, index: pd.DatetimeIndex) -> pd.DataFrame:\n return pd.DataFrame({\"constant\": 1.0}, index=index).astype(\"float32\")\n\nidx = pd.date_range(\"2023-06-01\", periods=6, freq=\"h\", tz=\"UTC\")\np = ConstantProvider()\nout = p.build(idx)\nprint(p.name, out.shape, out.dtypes[\"constant\"].name)\nassert out.shape == (6, 1)\nassert not out.isna().any().any()\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nconstant (6, 1) float32\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| 
[build](#spotforecast2_safe.preprocessing.exog_providers.ExogFeatureProvider.build) | Return features aligned to *index*. |\n\n### build { #spotforecast2_safe.preprocessing.exog_providers.ExogFeatureProvider.build }\n\n```python\npreprocessing.exog_providers.ExogFeatureProvider.build(index)\n```\n\nReturn features aligned to *index*.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|--------------------------------------------------------|----------------------------------------------------------------------------------------------------|------------|\n| index | [pd](`pandas`).[DatetimeIndex](`pandas.DatetimeIndex`) | Hourly ``DatetimeIndex`` (typically tz-aware UTC) covering the full training-plus-forecast window. | _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: Numeric columns indexed exactly by *index*, NaN-free within the validated window (the full *index* unless a ``provider_window`` was set at construction). |\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------------------------------------|---------------------------------------|\n| | [ExogProviderError](`spotforecast2_safe.preprocessing.exog_providers.ExogProviderError`) | If the provider cannot cover *index*. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#d7bf0390 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.preprocessing.exog_providers import (\n ExogFeatureProvider,\n)\n\nclass LinearProvider(ExogFeatureProvider):\n name = \"linear\"\n\n def build(self, index: pd.DatetimeIndex) -> pd.DataFrame:\n vals = range(len(index))\n return pd.DataFrame({\"linear\": list(vals)}, index=index).astype(\"float32\")\n\nidx = pd.date_range(\"2023-06-01\", periods=4, freq=\"h\", tz=\"UTC\")\nout = LinearProvider().build(idx)\nprint(out.shape, out[\"linear\"].tolist())\nassert out.shape == (4, 1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n(4, 1) [0.0, 1.0, 2.0, 3.0]\n```\n:::\n:::\n\n\n", + "supporting": [ + "preprocessing.exog_providers.ExogFeatureProvider_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/preprocessing.exog_providers.build_providers_from_config/execute-results/html.json b/_freeze/docs/reference/preprocessing.exog_providers.build_providers_from_config/execute-results/html.json new file mode 100644 index 00000000..3d746c58 --- /dev/null +++ b/_freeze/docs/reference/preprocessing.exog_providers.build_providers_from_config/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "1b5b8aad8f36e0041c96442af9190fdf", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: preprocessing.exog_providers.build_providers_from_config\n---\n\n\n\n```python\npreprocessing.exog_providers.build_providers_from_config(\n config,\n *,\n data_home=None,\n provider_window=None,\n)\n```\n\nConstruct providers by reading the registry flags off a config object.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default 
|\n|-----------------|-----------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------|------------|\n| config | [Union](`typing.Union`)\\[\\'ConfigEntsoe\\', \\'ConfigMulti\\', [object](`object`)\\] | A config object (e.g. ``ConfigEntsoe`` / ``ConfigMulti``) whose attributes include the `EXOG_PROVIDER_REGISTRY` flag names. | _required_ |\n| data_home | [DataHome](`spotforecast2_safe.preprocessing.exog_providers.DataHome`) | Root data directory forwarded to each provider. | `None` |\n| provider_window | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DatetimeIndex](`pandas.DatetimeIndex`)\\] | Validation index forwarded to each provider. Overrides the per-provider window; ``None`` uses the full request index. | `None` |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|-----------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|\n| | [List](`typing.List`)\\[[ExogFeatureProvider](`spotforecast2_safe.preprocessing.exog_providers.ExogFeatureProvider`)\\] | List[ExogFeatureProvider]: Providers for the flags set to ``True`` on |\n| | [List](`typing.List`)\\[[ExogFeatureProvider](`spotforecast2_safe.preprocessing.exog_providers.ExogFeatureProvider`)\\] | *config*. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#4be15918 .cell execution_count=1}\n``` {.python .cell-code}\nfrom spotforecast2_safe.preprocessing.exog_providers import (\n build_providers_from_config,\n)\n\nclass SimpleConfig:\n include_covid_infection_rate = True\n include_entsoe_forecast_load = False\n include_entsoe_renewable_forecast = False\n include_entsoe_net_load = False\n include_entsoe_day_ahead_price = False\n exog_max_gap_hours = 0\n exog_max_tail_gap_hours = 0\n\nproviders = build_providers_from_config(SimpleConfig())\nprint([p.name for p in providers])\nassert len(providers) == 1\nassert providers[0].name == \"covid_infection_rate\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n['covid_infection_rate']\n```\n:::\n:::\n\n\n", + "supporting": [ + "preprocessing.exog_providers.build_providers_from_config_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/preprocessing.repeating_basis_function/execute-results/html.json b/_freeze/docs/reference/preprocessing.repeating_basis_function/execute-results/html.json new file mode 100644 index 00000000..7924bcae --- /dev/null +++ b/_freeze/docs/reference/preprocessing.repeating_basis_function/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "138304ef71290cd6ccbd425b54e5de63", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: preprocessing.repeating_basis_function\n---\n\n\n\n`preprocessing.repeating_basis_function`\n\nRepeating Basis Function transformer for cyclical features.\n\n## Classes\n\n| Name | Description |\n| --- | --- |\n| [RepeatingBasisFunction](#spotforecast2_safe.preprocessing.repeating_basis_function.RepeatingBasisFunction) | Transformer that encodes cyclical features using repeating radial basis functions. 
|\n\n### RepeatingBasisFunction { #spotforecast2_safe.preprocessing.repeating_basis_function.RepeatingBasisFunction }\n\n```python\npreprocessing.repeating_basis_function.RepeatingBasisFunction(\n n_periods,\n column,\n input_range,\n remainder='drop',\n)\n```\n\nTransformer that encodes cyclical features using repeating radial basis functions.\n\nThis transformer places Gaussian basis functions across the specified input range\nand wraps them around to handle periodicity (e.g., day of year, hour of day).\nIt is a simplified implementation to avoid external dependencies like scikit-lego.\n\n#### Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|-------------|-------------------------------------------------------|--------------------------------------------------------------------|\n| n_periods | [int](`int`) | Number of basis functions to place. |\n| column | [str](`str`) | Name of the column in the input DataFrame/Series to transform. |\n| input_range | [Tuple](`typing.Tuple`)\\[[int](`int`), [int](`int`)\\] | The range of the input values (min, max). |\n| remainder | [str](`str`) | Policy for remaining columns (currently only 'drop' is supported). |\n\n#### Examples {.doc-section .doc-section-examples}\n\n\n::: {#a4e52ae6 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.preprocessing.repeating_basis_function import RepeatingBasisFunction\n\nX = pd.DataFrame({\"hour\": [0, 6, 12, 18, 23]})\nrbf = RepeatingBasisFunction(n_periods=4, column=\"hour\", input_range=(0, 23))\nfeatures = rbf.fit_transform(X)\nprint(features.shape)\nassert features.shape == (5, 4)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n(5, 4)\n```\n:::\n:::\n\n\n#### Methods\n\n| Name | Description |\n| --- | --- |\n| [fit](#spotforecast2_safe.preprocessing.repeating_basis_function.RepeatingBasisFunction.fit) | Fitted transformer (no-op). 
|\n| [transform](#spotforecast2_safe.preprocessing.repeating_basis_function.RepeatingBasisFunction.transform) | Transform the input data into RBF features. |\n\n##### fit { #spotforecast2_safe.preprocessing.repeating_basis_function.RepeatingBasisFunction.fit }\n\n```python\npreprocessing.repeating_basis_function.RepeatingBasisFunction.fit(X, y=None)\n```\n\nFitted transformer (no-op).\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|---------------------|---------------|------------|\n| X | [Any](`typing.Any`) | Input data. | _required_ |\n| y | [Any](`typing.Any`) | Ignored. | `None` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|--------------------------------------------------------------------------------------------------------------|-------------------------|\n| self | [RepeatingBasisFunction](`spotforecast2_safe.preprocessing.repeating_basis_function.RepeatingBasisFunction`) | The fitted transformer. 
|\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#67a5fe16 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.preprocessing.repeating_basis_function import RepeatingBasisFunction\n\nX = pd.DataFrame({\"month\": [1, 3, 6, 9, 12]})\nrbf = RepeatingBasisFunction(n_periods=6, column=\"month\", input_range=(1, 12))\nfitted = rbf.fit(X)\nprint(type(fitted))\nassert fitted is rbf\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n<class 'spotforecast2_safe.preprocessing.repeating_basis_function.RepeatingBasisFunction'>\n```\n:::\n:::\n\n\n##### transform { #spotforecast2_safe.preprocessing.repeating_basis_function.RepeatingBasisFunction.transform }\n\n```python\npreprocessing.repeating_basis_function.RepeatingBasisFunction.transform(X)\n```\n\nTransform the input data into RBF features.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|---------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|------------|\n| X | [Union](`typing.Union`)\\[[pd](`pandas`).[Series](`pandas.Series`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Input DataFrame or Series containing the column to transform. | _required_ |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------|------------------------------------------------------------------------------|\n| | [np](`numpy`).[ndarray](`numpy.ndarray`) | np.ndarray: Array of transformed features with shape (n_samples, n_periods). |\n\n###### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------------|----------------------------------------------------|\n| | [ValueError](`ValueError`) | If the specified column is not found in the input. 
|\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#3c6c34f3 .cell execution_count=3}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.preprocessing.repeating_basis_function import RepeatingBasisFunction\n\nX = pd.DataFrame({\"hour\": list(range(0, 24, 4))})\nrbf = RepeatingBasisFunction(n_periods=4, column=\"hour\", input_range=(0, 23))\nrbf.fit(X)\nfeatures = rbf.transform(X)\nprint(f\"shape: {features.shape}, dtype: {features.dtype}\")\nassert features.shape == (6, 4)\nassert features.dtype == np.float64\nassert np.all((features >= 0) & (features <= 1))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nshape: (6, 4), dtype: float64\n```\n:::\n:::\n\n\n", + "supporting": [ + "preprocessing.repeating_basis_function_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/splitter.split_one_step/execute-results/html.json b/_freeze/docs/reference/splitter.split_one_step/execute-results/html.json new file mode 100644 index 00000000..149fcc6f --- /dev/null +++ b/_freeze/docs/reference/splitter.split_one_step/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "6534a6c6a192b4edbc33b137333dedf0", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: splitter.split_one_step\n---\n\n\n\n`splitter.split_one_step`\n\nOne step ahead cross-validation splitting.\n\n## Classes\n\n| Name | Description |\n| --- | --- |\n| [OneStepAheadFold](#spotforecast2_safe.splitter.split_one_step.OneStepAheadFold) | Class to split time series data into train and test folds for one-step-ahead |\n\n### OneStepAheadFold { #spotforecast2_safe.splitter.split_one_step.OneStepAheadFold }\n\n```python\nsplitter.split_one_step.OneStepAheadFold(\n initial_train_size,\n window_size=None,\n differentiation=None,\n return_all_indexes=False,\n verbose=True,\n)\n```\n\nClass to split time series data into train and test folds for one-step-ahead\nforecasting.\n\n#### 
Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------------------|--------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| initial_train_size | [int](`int`) \\| [str](`str`) \\| [pd](`pandas`).[Timestamp](`pandas.Timestamp`) | Number of observations used for initial training. - If an integer, the number of observations used for initial training. - If a date string or pandas Timestamp, it is the last date included in the initial training set. | _required_ |\n| window_size | [int](`int`) | Number of observations needed to generate the autoregressive predictors. Defaults to None. | `None` |\n| differentiation | [int](`int`) | Number of observations to use for differentiation. This is used to extend the `last_window` as many observations as the differentiation order. Defaults to None. | `None` |\n| return_all_indexes | [bool](`bool`) | Whether to return all indexes or only the start and end indexes of each fold. Defaults to False. | `False` |\n| verbose | [bool](`bool`) | Whether to print information about generated folds. Defaults to True. | `True` |\n\n#### Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|--------------------|----------------|------------------------------------------------------------------------------------------------------------------------------------------------|\n| initial_train_size | [int](`int`) | Number of observations used for initial training. |\n| window_size | [int](`int`) | Number of observations needed to generate the autoregressive predictors. |\n| differentiation | [int](`int`) | Number of observations to use for differentiation. 
This is used to extend the `last_window` as many observations as the differentiation order. |\n| return_all_indexes | [bool](`bool`) | Whether to return all indexes or only the start and end indexes of each fold. |\n| verbose | [bool](`bool`) | Whether to print information about generated folds. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n\n::: {#b367b573 .cell execution_count=1}\n``` {.python .cell-code}\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.splitter.split_one_step import OneStepAheadFold\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2025-01-01\", periods=120, freq=\"h\", tz=\"UTC\")\ny = pd.Series(\n 50 + 10 * np.sin(np.arange(120) / 12) + rng.normal(0, 1, 120),\n index=idx,\n name=\"load\",\n)\n\ncv = OneStepAheadFold(initial_train_size=96, verbose=False)\nfold = cv.split(y, as_pandas=True)\nprint(fold)\nassert fold[\"train_end\"].iloc[0] == 96\nassert fold[\"test_start\"].iloc[0] == 96\nassert fold[\"test_end\"].iloc[0] == 120\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n fold train_start train_end test_start test_end fit_forecaster\n0 0 0 96 96 120 True\n```\n:::\n:::\n\n\n#### Methods\n\n| Name | Description |\n| --- | --- |\n| [split](#spotforecast2_safe.splitter.split_one_step.OneStepAheadFold.split) | Split the time series data into train and test folds. 
|\n\n##### split { #spotforecast2_safe.splitter.split_one_step.OneStepAheadFold.split }\n\n```python\nsplitter.split_one_step.OneStepAheadFold.split(\n X,\n as_pandas=False,\n externally_fitted=None,\n)\n```\n\nSplit the time series data into train and test folds.\n\n###### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|------------|\n| X | [pd](`pandas`).[Series](`pandas.Series`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) \\| [pd](`pandas`).[Index](`pandas.Index`) \\| [dict](`dict`) | Time series data or index to split. | _required_ |\n| as_pandas | [bool](`bool`) | If True, the folds are returned as a DataFrame. This is useful to visualize the folds in a more interpretable way. Defaults to False. | `False` |\n| externally_fitted | [Any](`typing.Any`) | This argument is not used in this class. It is included for API consistency. Defaults to None. | `None` |\n\n###### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | list \\| pd.DataFrame: A list of lists containing the indices (position) of |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | the fold. The list contains 2 lists with the following information: |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | - fold: fold number. 
|\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | - [train_start, train_end]: list with the start and end positions of the training set. |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | - [test_start, test_end]: list with the start and end positions of the test set. These are the observations used to evaluate the forecaster. |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | - fit_forecaster: boolean indicating whether the forecaster should be fitted in this fold. |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | It is important to note that the returned values are the positions of the |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | observations and not the actual values of the index, so they can be used to |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | slice the data directly using iloc. |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | If `as_pandas` is `True`, the folds are returned as a DataFrame with the |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | following columns: 'fold', 'train_start', 'train_end', 'test_start', |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | 'test_end', 'fit_forecaster'. |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Following the python convention, the start index is inclusive and the end |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | index is exclusive. This means that the last index is not included in the |\n| | [list](`list`) \\| [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | slice. 
|\n\n###### Examples {.doc-section .doc-section-examples}\n\n::: {#b76700fe .cell execution_count=2}\n``` {.python .cell-code}\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.splitter.split_one_step import OneStepAheadFold\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2025-01-01\", periods=100, freq=\"h\", tz=\"UTC\")\ny = pd.Series(\n 50 + 10 * np.sin(np.arange(100) / 12) + rng.normal(0, 1, 100),\n index=idx,\n name=\"load\",\n)\n\ncv = OneStepAheadFold(initial_train_size=80, verbose=False)\n\n# List form: [fold_id, [train_start, train_end], [test_start, test_end], fit]\nfold_list = cv.split(y)\nprint(\"fold list:\", fold_list)\nassert fold_list[1] == [0, 80]\nassert fold_list[2] == [80, 100]\n\n# DataFrame form for human-readable inspection\nfold_df = cv.split(y, as_pandas=True)\nprint(fold_df)\nassert fold_df.shape == (1, 6)\nassert int(fold_df[\"train_end\"].iloc[0]) == 80\nassert int(fold_df[\"test_end\"].iloc[0]) == 100\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nfold list: [0, [0, 80], [80, 100], True]\n fold train_start train_end test_start test_end fit_forecaster\n0 0 0 80 80 100 True\n```\n:::\n:::\n\n\n", + "supporting": [ + "splitter.split_one_step_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/stats.spectral.PeriodogramResult/execute-results/html.json b/_freeze/docs/reference/stats.spectral.PeriodogramResult/execute-results/html.json new file mode 100644 index 00000000..a77a77b6 --- /dev/null +++ b/_freeze/docs/reference/stats.spectral.PeriodogramResult/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "e1cc9a695238b319db14c18a95c14a48", + "result": { + "engine": "jupyter", + "markdown": "---\ntitle: stats.spectral.PeriodogramResult\n---\n\n\n\n```python\nstats.spectral.PeriodogramResult(spectrum, top_periods)\n```\n\nContainer for the output of `compute_periodogram()`.\n\n## Attributes {.doc-section .doc-section-attributes}\n\n| Name | 
Type | Description |\n|-------------|------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| spectrum | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | ``pd.DataFrame`` indexed by frequency with one column ``\"power\"``. Frequency is expressed in units of \"cycles per sample\" (the convention used by `scipy.signal.periodogram`). |\n| top_periods | [pd](`pandas`).[Series](`pandas.Series`) | ``pd.Series`` of the top-k periods (1 / frequency) indexed by their spectral power, sorted from strongest to weakest peak. |\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#37acbe0f .cell execution_count=1}\n``` {.python .cell-code}\nimport numpy as np\nimport pandas as pd\n\nfrom spotforecast2_safe.stats.spectral import compute_periodogram\n\nrng = np.random.default_rng(0)\nt = np.arange(256)\ny = pd.Series(np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(256))\nresult = compute_periodogram(y, max_peaks=3)\nprint(type(result).__name__)\nprint(result.spectrum.columns.tolist())\nassert isinstance(result.top_periods, pd.Series)\nassert len(result.top_periods) == 3\nprint(f\"Dominant period: {result.top_periods.index[0]:.1f} samples\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nPeriodogramResult\n['power']\nDominant period: 23.3 samples\n```\n:::\n:::\n\n\n", + "supporting": [ + "stats.spectral.PeriodogramResult_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_freeze/docs/reference/tasks.task_safe_demo/execute-results/html.json b/_freeze/docs/reference/tasks.task_safe_demo/execute-results/html.json new file mode 100644 index 00000000..9c6ae01a --- /dev/null +++ b/_freeze/docs/reference/tasks.task_safe_demo/execute-results/html.json @@ -0,0 +1,12 @@ +{ + "hash": "ef025a76ddeda93850186c5ea4878a4f", + "result": { + "engine": "jupyter", + 
"markdown": "---\ntitle: tasks.task_safe_demo\n---\n\n\n\n`tasks.task_safe_demo`\n\nTask demo: compare baseline, covariate, and custom LightGBM forecasts against ground truth.\n\nThis script executes the baseline N-to-1 task, the covariate-enhanced N-to-1\npipeline, and a custom LightGBM model with optimized hyperparameters, then loads\nthe ground truth from a specified data directory.\nLogging Mechanism: This script uses a dual-handler logging system designed for safety-critical MLOps:\n\n* Console Handler:\n Provides real-time progress updates to `stdout`.\n* File Handler:\n Persists all log messages (including debug/tracebacks) to a timestamped file in `{model_root}/logs/`.\n* Log File Location:\n By default, logs are saved to `~/spotforecast2_safe_models/logs/task_safe_demo_YYYYMMDD_HHMMSS.log`.\n* Safety-Critical Features:\n * Persistent file-based logging for auditability.\n * Path management using pathlib for cross-platform reliability.\n * Explicit input validation and existence checks.\n * Comprehensive error handling with traceback logging.\n * Deterministic random seeding where applicable.\n * Minimal dependency footprint (no plotting libraries).\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#c6a58df9 .cell execution_count=1}\n``` {.python .cell-code}\n# These are shell commands; they cannot run inside a Python kernel.\n# Run with default settings (force training):\n# uv run spotforecast-safe-demo\n# Skip training (use cached models if available):\n# uv run spotforecast-safe-demo --force_train false\n# Specify a custom data path:\n# uv run spotforecast-safe-demo --data_path /path/to/data.csv\n# Enable logging:\n# uv run spotforecast-safe-demo --logging true\n```\n:::\n\n\n## Functions\n\n| Name | Description |\n| --- | --- |\n| [main](#spotforecast2_safe.tasks.task_safe_demo.main) | Main execution entry point. 
|\n\n### main { #spotforecast2_safe.tasks.task_safe_demo.main }\n\n```python\ntasks.task_safe_demo.main(\n force_train=True,\n data_path=None,\n logging_enabled=False,\n)\n```\n\nMain execution entry point.\nReturns 0 on success, non-zero on failure.\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#9738489a .cell execution_count=2}\n``` {.python .cell-code}\nfrom pathlib import Path\nfrom spotforecast2_safe.tasks.task_safe_demo import main\n\n# Fail-fast path: when the ground truth file does not exist,\n# main() returns 1 immediately without attempting any training.\nresult = main(\n force_train=False,\n data_path=Path(\"/nonexistent/path/data.csv\"),\n logging_enabled=False,\n)\nprint(f\"Return code (missing data): {result}\")\nassert result == 1, f\"Expected 1, got {result}\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nReturn code (missing data): 1\n```\n:::\n:::\n\n\n", + "supporting": [ + "tasks.task_safe_demo_files" + ], + "filters": [], + "includes": {} + } +} \ No newline at end of file diff --git a/_quarto.yml b/_quarto.yml index 175d4403..e9326b4f 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -745,11 +745,23 @@ quartodoc: # ── Weather ─────────────────────────────────────────────────────────────── - title: "Weather" desc: | - Weather data integration using the Open-Meteo API. + Weather data integration using the Open-Meteo API, derived weather + features (degree-hours, apparent temperature, dew point), and + population-weighted multi-city spatial aggregation. 
contents: - weather.client.WeatherClient - weather.client.WeatherService - weather.features.get_weather_features + - weather.derived.heating_degree_hours + - weather.derived.cooling_degree_hours + - weather.derived.dew_point + - weather.derived.apparent_temperature + - weather.derived.population_weighted_average + - weather.derived.add_derived_weather_features + - weather.locations.WeatherLocation + - weather.locations.default_german_locations + - weather.locations.coordinates + - weather.locations.weights # ── Downloader ──────────────────────────────────────────────────────────── - title: "Downloader" diff --git a/docs/reference/configurator.config_multi.ConfigMulti.qmd b/docs/reference/configurator.config_multi.ConfigMulti.qmd index 59b90762..55abb55d 100644 --- a/docs/reference/configurator.config_multi.ConfigMulti.qmd +++ b/docs/reference/configurator.config_multi.ConfigMulti.qmd @@ -28,6 +28,11 @@ configurator.config_multi.ConfigMulti( include_weather_windows=False, include_holiday_features=False, include_holiday_adjacency_features=False, + use_population_weighted_weather=False, + include_degree_hours=False, + include_apparent_temperature=False, + degree_hours_base_heating=15.0, + degree_hours_base_cooling=22.0, poly_features_degree=1, max_poly_features=10, poly_mi_n_jobs=-1, diff --git a/docs/reference/index.qmd b/docs/reference/index.qmd index 40c2ba06..4b994e00 100644 --- a/docs/reference/index.qmd +++ b/docs/reference/index.qmd @@ -238,7 +238,9 @@ construction. ## Weather -Weather data integration using the Open-Meteo API. +Weather data integration using the Open-Meteo API, derived weather +features (degree-hours, apparent temperature, dew point), and +population-weighted multi-city spatial aggregation. | | | @@ -246,6 +248,16 @@ Weather data integration using the Open-Meteo API. | [weather.client.WeatherClient](weather.client.WeatherClient.qmd#spotforecast2_safe.weather.client.WeatherClient) | Client for fetching weather data from Open-Meteo API. 
| | [weather.client.WeatherService](weather.client.WeatherService.qmd#spotforecast2_safe.weather.client.WeatherService) | High-level service for weather data generation. | | [weather.features.get_weather_features](weather.features.get_weather_features.qmd#spotforecast2_safe.weather.features.get_weather_features) | Fetch weather data and compute rolling-window features. | +| [weather.derived.heating_degree_hours](weather.derived.heating_degree_hours.qmd#spotforecast2_safe.weather.derived.heating_degree_hours) | Heating degree-hours :math:`\max(base - T, 0)`. | +| [weather.derived.cooling_degree_hours](weather.derived.cooling_degree_hours.qmd#spotforecast2_safe.weather.derived.cooling_degree_hours) | Cooling degree-hours :math:`\max(T - base, 0)`. | +| [weather.derived.dew_point](weather.derived.dew_point.qmd#spotforecast2_safe.weather.derived.dew_point) | Dew-point temperature via the Magnus-Tetens approximation. | +| [weather.derived.apparent_temperature](weather.derived.apparent_temperature.qmd#spotforecast2_safe.weather.derived.apparent_temperature) | Steadman apparent ("feels-like") temperature. | +| [weather.derived.population_weighted_average](weather.derived.population_weighted_average.qmd#spotforecast2_safe.weather.derived.population_weighted_average) | Combine per-location weather frames into one demand-weighted index. | +| [weather.derived.add_derived_weather_features](weather.derived.add_derived_weather_features.qmd#spotforecast2_safe.weather.derived.add_derived_weather_features) | Append the requested derived columns to a raw weather frame (fail-safe). | +| [weather.locations.WeatherLocation](weather.locations.WeatherLocation.qmd#spotforecast2_safe.weather.locations.WeatherLocation) | A single weather sampling location with a population weight. 
| +| [weather.locations.default_german_locations](weather.locations.default_german_locations.qmd#spotforecast2_safe.weather.locations.default_german_locations) | Return the default population-weighted German load-centre registry. | +| [weather.locations.coordinates](weather.locations.coordinates.qmd#spotforecast2_safe.weather.locations.coordinates) | Extract ``(latitude, longitude)`` pairs in order. | +| [weather.locations.weights](weather.locations.weights.qmd#spotforecast2_safe.weather.locations.weights) | Extract the raw (un-normalised) weights in order. | ## Downloader diff --git a/docs/reference/weather.derived.add_derived_weather_features.qmd b/docs/reference/weather.derived.add_derived_weather_features.qmd new file mode 100644 index 00000000..9f9ca893 --- /dev/null +++ b/docs/reference/weather.derived.add_derived_weather_features.qmd @@ -0,0 +1,72 @@ +# weather.derived.add_derived_weather_features { #spotforecast2_safe.weather.derived.add_derived_weather_features } + +```python +weather.derived.add_derived_weather_features( + weather, + features, + *, + hdh_base=DEFAULT_HDH_BASE_C, + cdh_base=DEFAULT_CDH_BASE_C, + temp_col=TEMP_COL, + humidity_col=HUMIDITY_COL, + wind_col=WIND_COL, + wind_speed_unit='kmh', +) +``` + +Append the requested derived columns to a raw weather frame (fail-safe). + +A thin, deterministic orchestrator over :func:`heating_degree_hours`, +:func:`cooling_degree_hours`, :func:`dew_point`, and +:func:`apparent_temperature`. The original columns are preserved; the +requested derived columns are appended in the fixed order of +:data:`DERIVED_FEATURE_KEYS` (stable column ordering). The default +*wind_speed_unit* is ``"kmh"`` to match the Open-Meteo payload consumed by +:func:`spotforecast2_safe.weather.get_weather_features`. 
+ +## Parameters {.doc-section .doc-section-parameters} + +| Name | Type | Description | Default | +|-----------------|------------------------------------------------|-----------------------------------------------------------------------------------------------------|----------------------| +| weather | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Raw weather frame containing the source columns the requested features need. | _required_ | +| features | [Sequence](`typing.Sequence`)\[[str](`str`)\] | Any subset of :data:`DERIVED_FEATURE_KEYS`. An empty sequence returns *weather* unchanged (a copy). | _required_ | +| hdh_base | [float](`float`) | Heating base temperature in °C. | `DEFAULT_HDH_BASE_C` | +| cdh_base | [float](`float`) | Cooling base temperature in °C. | `DEFAULT_CDH_BASE_C` | +| temp_col | [str](`str`) | Name of the temperature column. | `TEMP_COL` | +| humidity_col | [str](`str`) | Name of the relative-humidity column. | `HUMIDITY_COL` | +| wind_col | [str](`str`) | Name of the wind-speed column. | `WIND_COL` | +| wind_speed_unit | [str](`str`) | Unit of *wind_col*, ``"ms"`` or ``"kmh"`` (default). | `'kmh'` | + +## Returns {.doc-section .doc-section-returns} + +| Name | Type | Description | +|--------|------------------------------------------------|----------------------------------------------------------------------| +| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: A copy of *weather* with the requested derived columns | +| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | appended. | + +## Raises {.doc-section .doc-section-raises} + +| Name | Type | Description | +|--------|----------------------------|------------------------------------------------------------------------------------------| +| | [ValueError](`ValueError`) | If a requested feature is unknown, or a source column it needs is absent from *weather*. 
| + +## Examples {.doc-section .doc-section-examples} + +```{python} +import pandas as pd +from spotforecast2_safe.weather.derived import add_derived_weather_features + +idx = pd.date_range("2023-07-01", periods=2, freq="h", tz="UTC") +weather = pd.DataFrame( + { + "temperature_2m": [28.0, 30.0], + "relative_humidity_2m": [60.0, 55.0], + "wind_speed_10m": [7.2, 10.8], # km/h + }, + index=idx, +) +out = add_derived_weather_features( + weather, ["hdh", "cdh", "apparent_temperature", "dew_point"] +) +print(sorted(set(out.columns) - set(weather.columns))) +``` \ No newline at end of file diff --git a/docs/reference/weather.derived.apparent_temperature.qmd b/docs/reference/weather.derived.apparent_temperature.qmd new file mode 100644 index 00000000..8442b1fd --- /dev/null +++ b/docs/reference/weather.derived.apparent_temperature.qmd @@ -0,0 +1,53 @@ +# weather.derived.apparent_temperature { #spotforecast2_safe.weather.derived.apparent_temperature } + +```python +weather.derived.apparent_temperature( + temperature, + relative_humidity, + wind_speed, + *, + wind_speed_unit='ms', +) +``` + +Steadman apparent ("feels-like") temperature. + +:math:`AT = T + 0.33 e - 0.70 w - 4.00`, where the water-vapour pressure +:math:`e = (rh/100)\,6.105\,\exp(17.27 T/(237.7+T))` in hPa and ``w`` is +the 10 m wind speed in m/s. This is the Australian Bureau of Meteorology +formulation and captures the humidity load-driver that dry-bulb temperature +misses (Maia-Silva et al. 2020, ``maia20a``). + +## Parameters {.doc-section .doc-section-parameters} + +| Name | Type | Description | Default | +|-------------------|------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|------------| +| temperature | [pd](`pandas`).[Series](`pandas.Series`) | Hourly air temperature in °C. 
| _required_ | +| relative_humidity | [pd](`pandas`).[Series](`pandas.Series`) | Relative humidity in percent, ``0 ≤ rh ≤ 100``. | _required_ | +| wind_speed | [pd](`pandas`).[Series](`pandas.Series`) | 10 m wind speed; unit set by *wind_speed_unit*. | _required_ | +| wind_speed_unit | [str](`str`) | ``"ms"`` (metres per second, the formula's native unit, default) or ``"kmh"`` (kilometres per hour — the Open-Meteo default, converted internally). | `'ms'` | + +## Returns {.doc-section .doc-section-returns} + +| Name | Type | Description | +|--------|------------------------------------------|-----------------------------------------------------------------------| +| | [pd](`pandas`).[Series](`pandas.Series`) | pd.Series: ``apparent_temperature`` in °C, ``float64``, same index as | +| | [pd](`pandas`).[Series](`pandas.Series`) | the inputs. | + +## Raises {.doc-section .doc-section-raises} + +| Name | Type | Description | +|--------|----------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------| +| | [ValueError](`ValueError`) | If any input is not a NaN-free numeric Series, the indices differ, humidity is outside ``[0, 100]``, or *wind_speed_unit* is not ``"ms"`` or ``"kmh"``. 
| + +## Examples {.doc-section .doc-section-examples} + +```{python} +import pandas as pd +from spotforecast2_safe.weather.derived import apparent_temperature + +t = pd.Series([30.0]) +rh = pd.Series([70.0]) +w = pd.Series([2.0]) # m/s +print(round(apparent_temperature(t, rh, w).iloc[0], 2)) +``` \ No newline at end of file diff --git a/docs/reference/weather.derived.cooling_degree_hours.qmd b/docs/reference/weather.derived.cooling_degree_hours.qmd new file mode 100644 index 00000000..ab22e2b7 --- /dev/null +++ b/docs/reference/weather.derived.cooling_degree_hours.qmd @@ -0,0 +1,37 @@ +# weather.derived.cooling_degree_hours { #spotforecast2_safe.weather.derived.cooling_degree_hours } + +```python +weather.derived.cooling_degree_hours(temperature, base=DEFAULT_CDH_BASE_C) +``` + +Cooling degree-hours :math:`\max(T - base, 0)`. + +## Parameters {.doc-section .doc-section-parameters} + +| Name | Type | Description | Default | +|-------------|------------------------------------------|-------------------------------------------------------|----------------------| +| temperature | [pd](`pandas`).[Series](`pandas.Series`) | Hourly air temperature in °C. | _required_ | +| base | [float](`float`) | Cooling base temperature in °C. Defaults to ``22.0``. | `DEFAULT_CDH_BASE_C` | + +## Returns {.doc-section .doc-section-returns} + +| Name | Type | Description | +|--------|------------------------------------------|--------------------------------------------------------------| +| | [pd](`pandas`).[Series](`pandas.Series`) | pd.Series: ``cdh`` per timestamp, ``float64``, same index as | +| | [pd](`pandas`).[Series](`pandas.Series`) | *temperature*. Zero whenever it is cooler than *base*. | + +## Raises {.doc-section .doc-section-raises} + +| Name | Type | Description | +|--------|----------------------------|----------------------------------------------------| +| | [ValueError](`ValueError`) | If *temperature* is not a NaN-free numeric Series. 
| + +## Examples {.doc-section .doc-section-examples} + +```{python} +import pandas as pd +from spotforecast2_safe.weather.derived import cooling_degree_hours + +t = pd.Series([5.0, 22.0, 30.0]) +print(cooling_degree_hours(t, base=22.0).tolist()) +``` \ No newline at end of file diff --git a/docs/reference/weather.derived.dew_point.qmd b/docs/reference/weather.derived.dew_point.qmd new file mode 100644 index 00000000..2ecd8cb2 --- /dev/null +++ b/docs/reference/weather.derived.dew_point.qmd @@ -0,0 +1,43 @@ +# weather.derived.dew_point { #spotforecast2_safe.weather.derived.dew_point } + +```python +weather.derived.dew_point(temperature, relative_humidity) +``` + +Dew-point temperature via the Magnus-Tetens approximation. + +Uses :math:`\gamma = \ln(rh/100) + aT/(b+T)` and +:math:`T_d = b\gamma/(a-\gamma)` with ``a = 17.625``, ``b = 243.04`` °C, +valid for ``-40 °C ≤ T ≤ 60 °C``. The humidity ratio is floored at a tiny +epsilon purely to keep the logarithm finite at ``rh = 0`` (a numerical +domain guard, not measurement imputation). + +## Parameters {.doc-section .doc-section-parameters} + +| Name | Type | Description | Default | +|-------------------|------------------------------------------|-------------------------------------------------|------------| +| temperature | [pd](`pandas`).[Series](`pandas.Series`) | Hourly air temperature in °C. | _required_ | +| relative_humidity | [pd](`pandas`).[Series](`pandas.Series`) | Relative humidity in percent, ``0 ≤ rh ≤ 100``. | _required_ | + +## Returns {.doc-section .doc-section-returns} + +| Name | Type | Description | +|--------|------------------------------------------|------------------------------------------------------------------------| +| | [pd](`pandas`).[Series](`pandas.Series`) | pd.Series: ``dew_point`` in °C, ``float64``, same index as the inputs. 
| + +## Raises {.doc-section .doc-section-raises} + +| Name | Type | Description | +|--------|----------------------------|--------------------------------------------------------------------------------------------------------------------------| +| | [ValueError](`ValueError`) | If either input is not a NaN-free numeric Series, their indices differ, or any humidity value lies outside ``[0, 100]``. | + +## Examples {.doc-section .doc-section-examples} + +```{python} +import pandas as pd +from spotforecast2_safe.weather.derived import dew_point + +t = pd.Series([20.0, 20.0]) +rh = pd.Series([100.0, 50.0]) +print([round(v, 2) for v in dew_point(t, rh).tolist()]) +``` \ No newline at end of file diff --git a/docs/reference/weather.derived.heating_degree_hours.qmd b/docs/reference/weather.derived.heating_degree_hours.qmd new file mode 100644 index 00000000..13e61d57 --- /dev/null +++ b/docs/reference/weather.derived.heating_degree_hours.qmd @@ -0,0 +1,37 @@ +# weather.derived.heating_degree_hours { #spotforecast2_safe.weather.derived.heating_degree_hours } + +```python +weather.derived.heating_degree_hours(temperature, base=DEFAULT_HDH_BASE_C) +``` + +Heating degree-hours :math:`\max(base - T, 0)`. + +## Parameters {.doc-section .doc-section-parameters} + +| Name | Type | Description | Default | +|-------------|------------------------------------------|-------------------------------------------------------|----------------------| +| temperature | [pd](`pandas`).[Series](`pandas.Series`) | Hourly air temperature in °C. | _required_ | +| base | [float](`float`) | Heating base temperature in °C. Defaults to ``15.0``. 
| `DEFAULT_HDH_BASE_C` | + +## Returns {.doc-section .doc-section-returns} + +| Name | Type | Description | +|--------|------------------------------------------|--------------------------------------------------------------| +| | [pd](`pandas`).[Series](`pandas.Series`) | pd.Series: ``hdh`` per timestamp, ``float64``, same index as | +| | [pd](`pandas`).[Series](`pandas.Series`) | *temperature*. Zero whenever it is warmer than *base*. | + +## Raises {.doc-section .doc-section-raises} + +| Name | Type | Description | +|--------|----------------------------|----------------------------------------------------| +| | [ValueError](`ValueError`) | If *temperature* is not a NaN-free numeric Series. | + +## Examples {.doc-section .doc-section-examples} + +```{python} +import pandas as pd +from spotforecast2_safe.weather.derived import heating_degree_hours + +t = pd.Series([5.0, 15.0, 25.0]) +print(heating_degree_hours(t, base=15.0).tolist()) +``` \ No newline at end of file diff --git a/docs/reference/weather.derived.population_weighted_average.qmd b/docs/reference/weather.derived.population_weighted_average.qmd new file mode 100644 index 00000000..324500ae --- /dev/null +++ b/docs/reference/weather.derived.population_weighted_average.qmd @@ -0,0 +1,46 @@ +# weather.derived.population_weighted_average { #spotforecast2_safe.weather.derived.population_weighted_average } + +```python +weather.derived.population_weighted_average(frames, weights) +``` + +Combine per-location weather frames into one demand-weighted index. + +Forms :math:`\sum_i \tilde{w}_i X_i` where :math:`\tilde{w}` are the +*weights* normalised to sum to one. All frames must share the same index and +the same columns (the identical Open-Meteo schema fetched per location), so +the result is a single frame with that schema representing a national, +population-weighted weather signal (Zimmermann & Ziel 2025, ``zimm25a``). 
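The weighted sum described above can be sketched in plain pandas. This is a minimal sketch under stated assumptions, not the library's implementation: the helper name `weighted_frame_average` is hypothetical, and it skips the validation (empty input, negative weights, mismatched index or columns) that the documented function performs.

```python
import pandas as pd


def weighted_frame_average(frames, weights):
    # Hypothetical helper (not part of spotforecast2_safe): normalise the
    # weights to sum to one, then accumulate the weighted frames.
    total = float(sum(weights))
    result = frames[0] * (weights[0] / total)
    for frame, weight in zip(frames[1:], weights[1:]):
        result = result + frame * (weight / total)
    return result


idx = pd.date_range("2023-06-01", periods=2, freq="h", tz="UTC")
a = pd.DataFrame({"temperature_2m": [10.0, 10.0]}, index=idx)
b = pd.DataFrame({"temperature_2m": [20.0, 20.0]}, index=idx)
sketch = weighted_frame_average([a, b], [3.0, 1.0])
print(sketch["temperature_2m"].tolist())
```

With weights 3 and 1 the normalised weights are 0.75 and 0.25, so each hour evaluates to 0.75 * 10 + 0.25 * 20.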
+ +## Parameters {.doc-section .doc-section-parameters} + +| Name | Type | Description | Default | +|---------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|------------| +| frames | [Sequence](`typing.Sequence`)\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\] | One weather frame per location, all with the same ``DatetimeIndex`` and the same columns. | _required_ | +| weights | [Sequence](`typing.Sequence`)\[[float](`float`)\] | One non-negative weight per frame (e.g. city population). They are normalised internally; their absolute scale is irrelevant. | _required_ | + +## Returns {.doc-section .doc-section-returns} + +| Name | Type | Description | +|--------|------------------------------------------------|-------------------------------------------------------------------------| +| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | pd.DataFrame: The weighted-average frame, same index and columns as the | +| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | inputs (column order taken from ``frames[0]``). | + +## Raises {.doc-section .doc-section-raises} + +| Name | Type | Description | +|--------|----------------------------|--------------------------------------------------------------------------------------------------------------------------| +| | [ValueError](`ValueError`) | If *frames* is empty, lengths mismatch, weights are negative or sum to zero, or the frames disagree on index or columns. 
| + +## Examples {.doc-section .doc-section-examples} + +```{python} +import pandas as pd +from spotforecast2_safe.weather.derived import population_weighted_average + +idx = pd.date_range("2023-06-01", periods=3, freq="h", tz="UTC") +a = pd.DataFrame({"temperature_2m": [10.0, 10.0, 10.0]}, index=idx) +b = pd.DataFrame({"temperature_2m": [20.0, 20.0, 20.0]}, index=idx) +out = population_weighted_average([a, b], [3.0, 1.0]) +print(out["temperature_2m"].tolist()) # (3*10 + 1*20)/4 == 12.5 +``` \ No newline at end of file diff --git a/docs/reference/weather.features.get_weather_features.qmd b/docs/reference/weather.features.get_weather_features.qmd index 50f4c655..2b18e46b 100644 --- a/docs/reference/weather.features.get_weather_features.qmd +++ b/docs/reference/weather.features.get_weather_features.qmd @@ -15,6 +15,12 @@ weather.features.get_weather_features( fallback_on_failure=True, cache_home=None, verbose=False, + locations=None, + location_weights=None, + derived_features=None, + hdh_base=DEFAULT_HDH_BASE_C, + cdh_base=DEFAULT_CDH_BASE_C, + wind_speed_unit='kmh', ) ``` @@ -28,21 +34,27 @@ windows. ## Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|---------------------|--------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| -| data | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Reference time series DataFrame used only for validation (shape / temporal coverage checks via `curate_weather()`). | _required_ | -| start | [Union](`typing.Union`)\[[str](`str`), [pd](`pandas`).[Timestamp](`pandas.Timestamp`)\] | Start of the feature window. String values are parsed with ``utc=True``. 
| _required_ | -| cov_end | [Union](`typing.Union`)\[[str](`str`), [pd](`pandas`).[Timestamp](`pandas.Timestamp`)\] | Inclusive end of the feature window (must cover the full forecast horizon beyond ``end``). String values are parsed with ``utc=True``. | _required_ | -| forecast_horizon | [int](`int`) | Number of forecast steps; passed to `curate_weather()` for validation. | _required_ | -| latitude | [float](`float`) | Latitude of the target location in decimal degrees. Defaults to ``51.5136`` (Dortmund, Germany). | `51.5136` | -| longitude | [float](`float`) | Longitude of the target location in decimal degrees. Defaults to ``7.4653`` (Dortmund, Germany). | `7.4653` | -| timezone | [str](`str`) | Timezone label applied to the generated index. Defaults to ``"UTC"``. | `'UTC'` | -| freq | [str](`str`) | Pandas-compatible frequency string for the output index. Defaults to ``"h"`` (hourly). | `'h'` | -| window_periods | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[str](`str`)\]\] | Rolling window sizes passed to `WindowFeatures`. Defaults to ``["1D", "7D"]``. | `None` | -| window_functions | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[str](`str`)\]\] | Aggregation functions applied over each window. Defaults to ``["mean", "max", "min"]``. | `None` | -| fallback_on_failure | [bool](`bool`) | If ``True``, use locally cached fallback data when the weather API is unavailable. Defaults to ``True``. | `True` | -| cache_home | [Optional](`typing.Optional`)\[[Union](`typing.Union`)\[[str](`str`), [Path](`pathlib.Path`)\]\] | Optional path to cache directory. When provided, fetched weather data is cached in ``/weather_cache.parquet``. When None (default), no caching is performed. | `None` | -| verbose | [bool](`bool`) | If ``True``, print progress messages to stdout. Defaults to ``False``. 
| `False` | +| Name | Type | Description | Default | +|---------------------|---------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------| +| data | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Reference time series DataFrame used only for validation (shape / temporal coverage checks via `curate_weather()`). | _required_ | +| start | [Union](`typing.Union`)\[[str](`str`), [pd](`pandas`).[Timestamp](`pandas.Timestamp`)\] | Start of the feature window. String values are parsed with ``utc=True``. | _required_ | +| cov_end | [Union](`typing.Union`)\[[str](`str`), [pd](`pandas`).[Timestamp](`pandas.Timestamp`)\] | Inclusive end of the feature window (must cover the full forecast horizon beyond ``end``). String values are parsed with ``utc=True``. | _required_ | +| forecast_horizon | [int](`int`) | Number of forecast steps; passed to `curate_weather()` for validation. | _required_ | +| latitude | [float](`float`) | Latitude of the target location in decimal degrees. Defaults to ``51.5136`` (Dortmund, Germany). | `51.5136` | +| longitude | [float](`float`) | Longitude of the target location in decimal degrees. Defaults to ``7.4653`` (Dortmund, Germany). | `7.4653` | +| timezone | [str](`str`) | Timezone label applied to the generated index. Defaults to ``"UTC"``. | `'UTC'` | +| freq | [str](`str`) | Pandas-compatible frequency string for the output index. Defaults to ``"h"`` (hourly). 
| `'h'` | +| window_periods | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[str](`str`)\]\] | Rolling window sizes passed to `WindowFeatures`. Defaults to ``["1D", "7D"]``. | `None` | +| window_functions | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[str](`str`)\]\] | Aggregation functions applied over each window. Defaults to ``["mean", "max", "min"]``. | `None` | +| fallback_on_failure | [bool](`bool`) | If ``True``, use locally cached fallback data when the weather API is unavailable. Defaults to ``True``. | `True` | +| cache_home | [Optional](`typing.Optional`)\[[Union](`typing.Union`)\[[str](`str`), [Path](`pathlib.Path`)\]\] | Optional path to cache directory. When provided, fetched weather data is cached in ``/weather_cache.parquet``. When None (default), no caching is performed. | `None` | +| verbose | [bool](`bool`) | If ``True``, print progress messages to stdout. Defaults to ``False``. | `False` | +| locations | [Optional](`typing.Optional`)\[[Sequence](`typing.Sequence`)\[[Tuple](`typing.Tuple`)\[[float](`float`), [float](`float`)\]\]\] | Optional sequence of ``(latitude, longitude)`` pairs for a **population-weighted multi-city** weather index. When ``None`` (default) the single ``latitude``/``longitude`` point is used, preserving prior behaviour exactly. When given, each location is fetched and the raw frames are combined via `population_weighted_average` using *location_weights*. See `spotforecast2_safe.weather.locations`. | `None` | +| location_weights | [Optional](`typing.Optional`)\[[Sequence](`typing.Sequence`)\[[float](`float`)\]\] | Non-negative weight per entry in *locations* (e.g. city population). Required when *locations* is given; normalised internally. | `None` | +| derived_features | [Optional](`typing.Optional`)\[[Sequence](`typing.Sequence`)\[[str](`str`)\]\] | Optional subset of ``{"hdh", "cdh", "apparent_temperature", "dew_point"}``. 
When given, those columns are derived from the (weighted) weather and rolled up alongside the raw fields. ``None`` (default) adds nothing. See `add_derived_weather_features`. | `None` | +| hdh_base | [float](`float`) | Heating base temperature (°C) for ``hdh``. Defaults to ``15.0``. | `DEFAULT_HDH_BASE_C` | +| cdh_base | [float](`float`) | Cooling base temperature (°C) for ``cdh``. Defaults to ``22.0``. | `DEFAULT_CDH_BASE_C` | +| wind_speed_unit | [str](`str`) | Unit of the fetched ``wind_speed_10m`` column for apparent-temperature, ``"ms"`` or ``"kmh"``. Defaults to ``"kmh"`` (the Open-Meteo default). | `'kmh'` | ## Returns {.doc-section .doc-section-returns} diff --git a/docs/reference/weather.locations.WeatherLocation.qmd b/docs/reference/weather.locations.WeatherLocation.qmd new file mode 100644 index 00000000..e8631af3 --- /dev/null +++ b/docs/reference/weather.locations.WeatherLocation.qmd @@ -0,0 +1,25 @@ +# weather.locations.WeatherLocation { #spotforecast2_safe.weather.locations.WeatherLocation } + +```python +weather.locations.WeatherLocation(name, latitude, longitude, weight) +``` + +A single weather sampling location with a population weight. + +## Attributes {.doc-section .doc-section-attributes} + +| Name | Type | Description | +|-----------|------------------|--------------------------------------------------------------------------------------------------------------------------------------------------| +| name | [str](`str`) | Human-readable city name (used for logs/inspection only). | +| latitude | [float](`float`) | Latitude in decimal degrees. | +| longitude | [float](`float`) | Longitude in decimal degrees. | +| weight | [float](`float`) | Non-negative relative weight (here, approximate population in thousands). Absolute scale is irrelevant — weights are normalised by the consumer. 
| + +## Examples {.doc-section .doc-section-examples} + +```{python} +from spotforecast2_safe.weather.locations import WeatherLocation + +loc = WeatherLocation("Köln", 50.9375, 6.9603, 1073.0) +print(loc.name, loc.latitude, loc.weight) +``` \ No newline at end of file diff --git a/docs/reference/weather.locations.coordinates.qmd b/docs/reference/weather.locations.coordinates.qmd new file mode 100644 index 00000000..d3c5ce0f --- /dev/null +++ b/docs/reference/weather.locations.coordinates.qmd @@ -0,0 +1,31 @@ +# weather.locations.coordinates { #spotforecast2_safe.weather.locations.coordinates } + +```python +weather.locations.coordinates(locations) +``` + +Extract ``(latitude, longitude)`` pairs in order. + +## Parameters {.doc-section .doc-section-parameters} + +| Name | Type | Description | Default | +|-----------|------------------------------------------------------------------------------------------------------------|------------------------|------------| +| locations | [Sequence](`typing.Sequence`)\[[WeatherLocation](`spotforecast2_safe.weather.locations.WeatherLocation`)\] | The locations to read. | _required_ | + +## Returns {.doc-section .doc-section-returns} + +| Name | Type | Description | +|--------|----------------------------------------------------------------------------------------|-------------------------------------------------------------------------| +| | [List](`typing.List`)\[[Tuple](`typing.Tuple`)\[[float](`float`), [float](`float`)\]\] | List[Tuple[float, float]]: One ``(lat, lon)`` pair per location, in the | +| | [List](`typing.List`)\[[Tuple](`typing.Tuple`)\[[float](`float`), [float](`float`)\]\] | same order. 
| + +## Examples {.doc-section .doc-section-examples} + +```{python} +from spotforecast2_safe.weather.locations import ( + coordinates, + default_german_locations, +) + +print(coordinates(default_german_locations())[:2]) +``` \ No newline at end of file diff --git a/docs/reference/weather.locations.default_german_locations.qmd b/docs/reference/weather.locations.default_german_locations.qmd new file mode 100644 index 00000000..c2d7a2a6 --- /dev/null +++ b/docs/reference/weather.locations.default_german_locations.qmd @@ -0,0 +1,23 @@ +# weather.locations.default_german_locations { #spotforecast2_safe.weather.locations.default_german_locations } + +```python +weather.locations.default_german_locations() +``` + +Return the default population-weighted German load-centre registry. + +## Returns {.doc-section .doc-section-returns} + +| Name | Type | Description | +|--------|---------------------------------------------------------------------------|-----------------------------------------------------------------------| +| | [WeatherLocation](`spotforecast2_safe.weather.locations.WeatherLocation`) | Tuple[WeatherLocation, ...]: :data:`GERMAN_LOAD_CENTERS`, in a fixed, | +| | ... | deterministic order. | + +## Examples {.doc-section .doc-section-examples} + +```{python} +from spotforecast2_safe.weather.locations import default_german_locations + +locs = default_german_locations() +print(len(locs), locs[0].name) +``` \ No newline at end of file diff --git a/docs/reference/weather.locations.weights.qmd b/docs/reference/weather.locations.weights.qmd new file mode 100644 index 00000000..8f405db0 --- /dev/null +++ b/docs/reference/weather.locations.weights.qmd @@ -0,0 +1,33 @@ +# weather.locations.weights { #spotforecast2_safe.weather.locations.weights } + +```python +weather.locations.weights(locations) +``` + +Extract the raw (un-normalised) weights in order. 
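Because consumers normalise the raw weights, only their ratios matter. The sketch below illustrates this with made-up population figures; `normalise` is a hypothetical helper written for this example and is not part of the package.

```python
def normalise(raw_weights):
    # Hypothetical helper: scale non-negative weights so they sum to one.
    total = float(sum(raw_weights))
    return [w / total for w in raw_weights]


# Made-up populations, once in thousands and once in absolute numbers.
in_thousands = normalise([3000.0, 1000.0])
absolute = normalise([3_000_000.0, 1_000_000.0])
print(in_thousands, absolute)
```

Both calls give the same normalised weights, which is why the registry can store approximate populations in thousands.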
+ +## Parameters {.doc-section .doc-section-parameters} + +| Name | Type | Description | Default | +|-----------|------------------------------------------------------------------------------------------------------------|------------------------|------------| +| locations | [Sequence](`typing.Sequence`)\[[WeatherLocation](`spotforecast2_safe.weather.locations.WeatherLocation`)\] | The locations to read. | _required_ | + +## Returns {.doc-section .doc-section-returns} + +| Name | Type | Description | +|--------|-------------------------------------------|-------------------------------------------------------------------------| +| | [List](`typing.List`)\[[float](`float`)\] | List[float]: One weight per location, in the same order. Consumers | +| | [List](`typing.List`)\[[float](`float`)\] | normalise these (e.g. | +| | [List](`typing.List`)\[[float](`float`)\] | func:`spotforecast2_safe.weather.derived.population_weighted_average`). | + +## Examples {.doc-section .doc-section-examples} + +```{python} +from spotforecast2_safe.weather.locations import ( + default_german_locations, + weights, +) + +w = weights(default_german_locations()) +print(len(w), w[0]) +``` \ No newline at end of file diff --git a/src/spotforecast2_safe/configurator/config_multi.py b/src/spotforecast2_safe/configurator/config_multi.py index e360e2c0..cd704709 100644 --- a/src/spotforecast2_safe/configurator/config_multi.py +++ b/src/spotforecast2_safe/configurator/config_multi.py @@ -311,6 +311,20 @@ class ConfigMulti: include_weather_windows: bool = False include_holiday_features: bool = False include_holiday_adjacency_features: bool = False + # Global / population-weighted weather and derived weather features + # (consumed by spotforecast2.multitask.base.build_exogenous_features via + # spotforecast2_safe.weather.get_weather_features). All default off so the + # pipeline stays byte-identical to the single-point baseline. 
+ # ``use_population_weighted_weather`` fetches the fixed German load-centre + # registry (spotforecast2_safe.weather.locations) and combines the cities + # by population weight instead of sampling the single latitude/longitude. + # ``include_degree_hours`` adds heating/cooling degree-hours (hdh/cdh) and + # ``include_apparent_temperature`` adds apparent temperature + dew point. + use_population_weighted_weather: bool = False + include_degree_hours: bool = False + include_apparent_temperature: bool = False + degree_hours_base_heating: float = 15.0 + degree_hours_base_cooling: float = 22.0 poly_features_degree: int = 1 max_poly_features: int = 10 poly_mi_n_jobs: Optional[int] = -1 diff --git a/src/spotforecast2_safe/multitask/base.py b/src/spotforecast2_safe/multitask/base.py index 8640be22..b69c42c9 100644 --- a/src/spotforecast2_safe/multitask/base.py +++ b/src/spotforecast2_safe/multitask/base.py @@ -67,6 +67,11 @@ from spotforecast2_safe.processing.agg_predict import agg_predict from spotforecast2_safe.splitter.split_ts_cv import TimeSeriesFold from spotforecast2_safe.weather import WeatherFetchError, get_weather_features +from spotforecast2_safe.weather.locations import coordinates as _weather_coordinates +from spotforecast2_safe.weather.locations import ( + default_german_locations as _default_german_locations, +) +from spotforecast2_safe.weather.locations import weights as _weather_weights logger = logging.getLogger(__name__) @@ -1034,6 +1039,20 @@ def build_exogenous_features(self) -> "BaseTask": self.logger.info("Building exogenous features...") # 4a. Weather (with opt-in fail-safe handling for Open-Meteo failures) + # Optional global, population-weighted multi-city sampling and derived + # weather features (degree-hours / apparent temperature / dew point). + # getattr defaults keep configs that predate these fields working. 
+ weather_locations = None + weather_location_weights = None + if getattr(self.config, "use_population_weighted_weather", False): + centers = _default_german_locations() + weather_locations = _weather_coordinates(centers) + weather_location_weights = _weather_weights(centers) + weather_derived: list[str] = [] + if getattr(self.config, "include_degree_hours", False): + weather_derived.extend(["hdh", "cdh"]) + if getattr(self.config, "include_apparent_temperature", False): + weather_derived.extend(["apparent_temperature", "dew_point"]) try: weather_features, self.weather_aligned = get_weather_features( data=self.df_pipeline, @@ -1046,6 +1065,11 @@ def build_exogenous_features(self) -> "BaseTask": freq="h", cache_home=self.config.cache_home, verbose=self.config.verbose, + locations=weather_locations, + location_weights=weather_location_weights, + derived_features=weather_derived or None, + hdh_base=getattr(self.config, "degree_hours_base_heating", 15.0), + cdh_base=getattr(self.config, "degree_hours_base_cooling", 22.0), ) except WeatherFetchError as exc: if self.config.on_weather_failure == "raise": diff --git a/src/spotforecast2_safe/weather/__init__.py b/src/spotforecast2_safe/weather/__init__.py index 7c24ff28..bac44f2a 100644 --- a/src/spotforecast2_safe/weather/__init__.py +++ b/src/spotforecast2_safe/weather/__init__.py @@ -59,11 +59,40 @@ """ from .client import WeatherClient, WeatherFetchError, WeatherService +from .derived import ( + add_derived_weather_features, + apparent_temperature, + cooling_degree_hours, + dew_point, + heating_degree_hours, + population_weighted_average, +) from .features import get_weather_features +from .locations import ( + GERMAN_LOAD_CENTERS, + WeatherLocation, + coordinates, + default_german_locations, + weights, +) __all__ = [ "WeatherClient", "WeatherFetchError", "WeatherService", "get_weather_features", + # Derived features (degree-hours, apparent temperature, dew point) and + # population-weighted spatial aggregation. 
+ "add_derived_weather_features", + "apparent_temperature", + "cooling_degree_hours", + "dew_point", + "heating_degree_hours", + "population_weighted_average", + # German population-weighted load-centre registry. + "GERMAN_LOAD_CENTERS", + "WeatherLocation", + "coordinates", + "default_german_locations", + "weights", ] diff --git a/src/spotforecast2_safe/weather/derived.py b/src/spotforecast2_safe/weather/derived.py new file mode 100644 index 00000000..cdbeeb54 --- /dev/null +++ b/src/spotforecast2_safe/weather/derived.py @@ -0,0 +1,390 @@ +# SPDX-FileCopyrightText: 2026 bartzbeielstein +# SPDX-License-Identifier: AGPL-3.0-or-later + +"""Derived weather features and population-weighted spatial aggregation. + +This module turns the raw Open-Meteo fields (``temperature_2m``, +``relative_humidity_2m``, ``wind_speed_10m``, …) into the cheapest, most +reliable accuracy levers in load forecasting — the "feature-matrix" gains that +live in :math:`x_t`, not in the model :math:`f(\\cdot)`: + +- **Heating / cooling degree-hours** (``hdh`` / ``cdh``) split the U-shaped + temperature–load response into its two arms so a tree model need not + rediscover the kink (Rubattu et al. 2023, ``ruba23a``). +- **Apparent temperature** and **dew point** fold in the humidity effect that + dry-bulb temperature misses; ignoring it under-counts summer cooling load by + ten to fifteen percent (Maia-Silva et al. 2020, ``maia20a``). +- **Population-weighted spatial aggregation** combines several German load + centres into one demand-weighted weather index that tracks national load far + better than a single station (Zimmermann & Ziel 2025, ``zimm25a``); see + :mod:`spotforecast2_safe.weather.locations` for the default centres. 
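To make the first bullet concrete, the two degree-hour arms can be sketched with plain pandas clipping. This is only an illustrative sketch that assumes the default bases of 15 °C and 22 °C; the variable names and temperatures are made up for the example, and the functions in this module additionally validate their input.

```python
import pandas as pd

# Illustrative temperatures in °C (made-up values).
temperature = pd.Series([5.0, 18.0, 30.0])

# Heating arm: positive only below the 15 °C heating base.
hdh = (15.0 - temperature).clip(lower=0.0)
# Cooling arm: positive only above the 22 °C cooling base.
cdh = (temperature - 22.0).clip(lower=0.0)

print(hdh.tolist())
print(cdh.tolist())
```

For a temperature between the two bases (18 °C here) both arms are zero, so no single row carries both a heating and a cooling signal.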
+ +Every function here is **pure, deterministic, and fail-safe** in line with the +safety-critical contract: identical inputs produce identical outputs (no RNG, no +wall-clock, stable column order), and an input that cannot be transformed (a +missing source column, an out-of-range humidity, mismatched indices) raises an +explicit ``ValueError`` rather than being silently repaired or imputed. +""" + +from __future__ import annotations + +from typing import Sequence + +import numpy as np +import pandas as pd + +# Default base temperatures (°C). 15 °C is the long-standing European heating +# threshold; 22 °C is a common cooling threshold. Both are overridable. +DEFAULT_HDH_BASE_C = 15.0 +DEFAULT_CDH_BASE_C = 22.0 + +# Open-Meteo default field names this module consumes. +TEMP_COL = "temperature_2m" +HUMIDITY_COL = "relative_humidity_2m" +WIND_COL = "wind_speed_10m" + +# Recognised derived-feature keys for :func:`add_derived_weather_features`. +DERIVED_FEATURE_KEYS = ("hdh", "cdh", "apparent_temperature", "dew_point") + + +def _require_series(values: pd.Series, name: str) -> pd.Series: + """Validate *values* is a numeric, NaN-free ``Series`` (fail-safe).""" + if not isinstance(values, pd.Series): + raise ValueError( + f"{name} must be a pandas Series; got {type(values).__name__}." + ) + if values.isna().any(): + raise ValueError(f"{name} contains NaN; refusing to fabricate values.") + return values.astype("float64") + + +def heating_degree_hours( + temperature: pd.Series, base: float = DEFAULT_HDH_BASE_C +) -> pd.Series: + """Heating degree-hours :math:`\\max(base - T, 0)`. + + Args: + temperature: Hourly air temperature in °C. + base: Heating base temperature in °C. Defaults to ``15.0``. + + Returns: + pd.Series: ``hdh`` per timestamp, ``float64``, same index as + *temperature*. Zero whenever it is warmer than *base*. + + Raises: + ValueError: If *temperature* is not a NaN-free numeric Series. 
+ + Examples: + ```{python} + import pandas as pd + from spotforecast2_safe.weather.derived import heating_degree_hours + + t = pd.Series([5.0, 15.0, 25.0]) + print(heating_degree_hours(t, base=15.0).tolist()) + ``` + """ + t = _require_series(temperature, "temperature") + return (base - t).clip(lower=0.0).rename("hdh") + + +def cooling_degree_hours( + temperature: pd.Series, base: float = DEFAULT_CDH_BASE_C +) -> pd.Series: + """Cooling degree-hours :math:`\\max(T - base, 0)`. + + Args: + temperature: Hourly air temperature in °C. + base: Cooling base temperature in °C. Defaults to ``22.0``. + + Returns: + pd.Series: ``cdh`` per timestamp, ``float64``, same index as + *temperature*. Zero whenever it is cooler than *base*. + + Raises: + ValueError: If *temperature* is not a NaN-free numeric Series. + + Examples: + ```{python} + import pandas as pd + from spotforecast2_safe.weather.derived import cooling_degree_hours + + t = pd.Series([5.0, 22.0, 30.0]) + print(cooling_degree_hours(t, base=22.0).tolist()) + ``` + """ + t = _require_series(temperature, "temperature") + return (t - base).clip(lower=0.0).rename("cdh") + + +def dew_point(temperature: pd.Series, relative_humidity: pd.Series) -> pd.Series: + """Dew-point temperature via the Magnus-Tetens approximation. + + Uses :math:`\\gamma = \\ln(rh/100) + aT/(b+T)` and + :math:`T_d = b\\gamma/(a-\\gamma)` with ``a = 17.625``, ``b = 243.04`` °C, + valid for ``-40 °C ≤ T ≤ 60 °C``. The humidity ratio is floored at a tiny + epsilon purely to keep the logarithm finite at ``rh = 0`` (a numerical + domain guard, not measurement imputation). + + Args: + temperature: Hourly air temperature in °C. + relative_humidity: Relative humidity in percent, ``0 ≤ rh ≤ 100``. + + Returns: + pd.Series: ``dew_point`` in °C, ``float64``, same index as the inputs. + + Raises: + ValueError: If either input is not a NaN-free numeric Series, their + indices differ, or any humidity value lies outside ``[0, 100]``. 
+ + Examples: + ```{python} + import pandas as pd + from spotforecast2_safe.weather.derived import dew_point + + t = pd.Series([20.0, 20.0]) + rh = pd.Series([100.0, 50.0]) + print([round(v, 2) for v in dew_point(t, rh).tolist()]) + ``` + """ + t = _require_series(temperature, "temperature") + rh = _require_series(relative_humidity, "relative_humidity") + if not t.index.equals(rh.index): + raise ValueError("temperature and relative_humidity must share an index.") + if (rh < 0).any() or (rh > 100).any(): + raise ValueError("relative_humidity must lie within [0, 100] percent.") + + a, b = 17.625, 243.04 + ratio = np.clip(rh.to_numpy() / 100.0, 1e-6, 1.0) + gamma = np.log(ratio) + (a * t.to_numpy()) / (b + t.to_numpy()) + td = (b * gamma) / (a - gamma) + return pd.Series(td, index=t.index, name="dew_point").astype("float64") + + +def apparent_temperature( + temperature: pd.Series, + relative_humidity: pd.Series, + wind_speed: pd.Series, + *, + wind_speed_unit: str = "ms", +) -> pd.Series: + """Steadman apparent ("feels-like") temperature. + + :math:`AT = T + 0.33 e - 0.70 w - 4.00`, where the water-vapour pressure + :math:`e = (rh/100)\\,6.105\\,\\exp(17.27 T/(237.7+T))` in hPa and ``w`` is + the 10 m wind speed in m/s. This is the Australian Bureau of Meteorology + formulation and captures the humidity load-driver that dry-bulb temperature + misses (Maia-Silva et al. 2020, ``maia20a``). + + Args: + temperature: Hourly air temperature in °C. + relative_humidity: Relative humidity in percent, ``0 ≤ rh ≤ 100``. + wind_speed: 10 m wind speed; unit set by *wind_speed_unit*. + wind_speed_unit: ``"ms"`` (metres per second, the formula's native unit, + default) or ``"kmh"`` (kilometres per hour — the Open-Meteo + default, converted internally). + + Returns: + pd.Series: ``apparent_temperature`` in °C, ``float64``, same index as + the inputs. 
+ + Raises: + ValueError: If any input is not a NaN-free numeric Series, the indices + differ, humidity is outside ``[0, 100]``, or *wind_speed_unit* is + not ``"ms"`` or ``"kmh"``. + + Examples: + ```{python} + import pandas as pd + from spotforecast2_safe.weather.derived import apparent_temperature + + t = pd.Series([30.0]) + rh = pd.Series([70.0]) + w = pd.Series([2.0]) # m/s + print(round(apparent_temperature(t, rh, w).iloc[0], 2)) + ``` + """ + if wind_speed_unit not in ("ms", "kmh"): + raise ValueError( + f"wind_speed_unit must be 'ms' or 'kmh'; got {wind_speed_unit!r}." + ) + t = _require_series(temperature, "temperature") + rh = _require_series(relative_humidity, "relative_humidity") + w = _require_series(wind_speed, "wind_speed") + if not (t.index.equals(rh.index) and t.index.equals(w.index)): + raise ValueError( + "temperature, relative_humidity and wind_speed must share an index." + ) + if (rh < 0).any() or (rh > 100).any(): + raise ValueError("relative_humidity must lie within [0, 100] percent.") + + w_ms = w / 3.6 if wind_speed_unit == "kmh" else w + e = (rh / 100.0) * 6.105 * np.exp(17.27 * t / (237.7 + t)) + at = t + 0.33 * e - 0.70 * w_ms - 4.00 + return at.rename("apparent_temperature").astype("float64") + + +def population_weighted_average( + frames: Sequence[pd.DataFrame], weights: Sequence[float] +) -> pd.DataFrame: + """Combine per-location weather frames into one demand-weighted index. + + Forms :math:`\\sum_i \\tilde{w}_i X_i` where :math:`\\tilde{w}` are the + *weights* normalised to sum to one. All frames must share the same index and + the same columns (the identical Open-Meteo schema fetched per location), so + the result is a single frame with that schema representing a national, + population-weighted weather signal (Zimmermann & Ziel 2025, ``zimm25a``). + + Args: + frames: One weather frame per location, all with the same + ``DatetimeIndex`` and the same columns. + weights: One non-negative weight per frame (e.g. city population). 
They + are normalised internally; their absolute scale is irrelevant. + + Returns: + pd.DataFrame: The weighted-average frame, same index and columns as the + inputs (column order taken from ``frames[0]``). + + Raises: + ValueError: If *frames* is empty, lengths mismatch, weights are + negative or sum to zero, or the frames disagree on index or columns. + + Examples: + ```{python} + import pandas as pd + from spotforecast2_safe.weather.derived import population_weighted_average + + idx = pd.date_range("2023-06-01", periods=3, freq="h", tz="UTC") + a = pd.DataFrame({"temperature_2m": [10.0, 10.0, 10.0]}, index=idx) + b = pd.DataFrame({"temperature_2m": [20.0, 20.0, 20.0]}, index=idx) + out = population_weighted_average([a, b], [3.0, 1.0]) + print(out["temperature_2m"].tolist()) # (3*10 + 1*20)/4 == 12.5 + ``` + """ + frames = list(frames) + weights = list(weights) + if not frames: + raise ValueError("population_weighted_average requires at least one frame.") + if len(frames) != len(weights): + raise ValueError( + f"frames ({len(frames)}) and weights ({len(weights)}) length mismatch." 
+ ) + w = np.asarray(weights, dtype="float64") + if (w < 0).any(): + raise ValueError("weights must be non-negative.") + total = float(w.sum()) + if total <= 0.0: + raise ValueError("weights must sum to a positive value.") + + base_index = frames[0].index + base_columns = list(frames[0].columns) + for i, f in enumerate(frames[1:], start=1): + if not f.index.equals(base_index): + raise ValueError(f"frame {i} has a different index from frame 0.") + if list(f.columns) != base_columns: + raise ValueError(f"frame {i} has different columns from frame 0.") + + norm = w / total + acc = frames[0][base_columns].astype("float64") * norm[0] + for i in range(1, len(frames)): + acc = acc + frames[i][base_columns].astype("float64") * norm[i] + return acc + + +def add_derived_weather_features( + weather: pd.DataFrame, + features: Sequence[str], + *, + hdh_base: float = DEFAULT_HDH_BASE_C, + cdh_base: float = DEFAULT_CDH_BASE_C, + temp_col: str = TEMP_COL, + humidity_col: str = HUMIDITY_COL, + wind_col: str = WIND_COL, + wind_speed_unit: str = "kmh", +) -> pd.DataFrame: + """Append the requested derived columns to a raw weather frame (fail-safe). + + A thin, deterministic orchestrator over :func:`heating_degree_hours`, + :func:`cooling_degree_hours`, :func:`dew_point`, and + :func:`apparent_temperature`. The original columns are preserved; the + requested derived columns are appended in the fixed order of + :data:`DERIVED_FEATURE_KEYS` (stable column ordering). The default + *wind_speed_unit* is ``"kmh"`` to match the Open-Meteo payload consumed by + :func:`spotforecast2_safe.weather.get_weather_features`. + + Args: + weather: Raw weather frame containing the source columns the requested + features need. + features: Any subset of :data:`DERIVED_FEATURE_KEYS`. An empty sequence + returns *weather* unchanged (a copy). + hdh_base: Heating base temperature in °C. + cdh_base: Cooling base temperature in °C. + temp_col: Name of the temperature column. 
+ humidity_col: Name of the relative-humidity column. + wind_col: Name of the wind-speed column. + wind_speed_unit: Unit of *wind_col*, ``"ms"`` or ``"kmh"`` (default). + + Returns: + pd.DataFrame: A copy of *weather* with the requested derived columns + appended. + + Raises: + ValueError: If a requested feature is unknown, or a source column it + needs is absent from *weather*. + + Examples: + ```{python} + import pandas as pd + from spotforecast2_safe.weather.derived import add_derived_weather_features + + idx = pd.date_range("2023-07-01", periods=2, freq="h", tz="UTC") + weather = pd.DataFrame( + { + "temperature_2m": [28.0, 30.0], + "relative_humidity_2m": [60.0, 55.0], + "wind_speed_10m": [7.2, 10.8], # km/h + }, + index=idx, + ) + out = add_derived_weather_features( + weather, ["hdh", "cdh", "apparent_temperature", "dew_point"] + ) + print(sorted(set(out.columns) - set(weather.columns))) + ``` + """ + unknown = [f for f in features if f not in DERIVED_FEATURE_KEYS] + if unknown: + raise ValueError( + f"unknown derived feature(s) {unknown}; " + f"valid keys are {list(DERIVED_FEATURE_KEYS)}." + ) + + def _col(name: str, needed_for: str) -> pd.Series: + if name not in weather.columns: + raise ValueError( + f"derived feature '{needed_for}' needs column '{name}', " + f"absent from weather frame (has {list(weather.columns)})." + ) + return weather[name] + + out = weather.copy() + # Iterate in the canonical order so column order is deterministic regardless + # of the order the caller listed the features in. 
+ for key in DERIVED_FEATURE_KEYS: + if key not in features: + continue + if key == "hdh": + out["hdh"] = heating_degree_hours(_col(temp_col, "hdh"), base=hdh_base) + elif key == "cdh": + out["cdh"] = cooling_degree_hours(_col(temp_col, "cdh"), base=cdh_base) + elif key == "dew_point": + out["dew_point"] = dew_point( + _col(temp_col, "dew_point"), _col(humidity_col, "dew_point") + ) + elif key == "apparent_temperature": + out["apparent_temperature"] = apparent_temperature( + _col(temp_col, "apparent_temperature"), + _col(humidity_col, "apparent_temperature"), + _col(wind_col, "apparent_temperature"), + wind_speed_unit=wind_speed_unit, + ) + return out diff --git a/src/spotforecast2_safe/weather/features.py b/src/spotforecast2_safe/weather/features.py index eb401e25..c7a3eaf1 100644 --- a/src/spotforecast2_safe/weather/features.py +++ b/src/spotforecast2_safe/weather/features.py @@ -10,7 +10,7 @@ """ from pathlib import Path -from typing import List, Optional, Tuple, Union +from typing import List, Optional, Sequence, Tuple, Union import numpy as np import pandas as pd @@ -19,6 +19,12 @@ from spotforecast2_safe.preprocessing.curate_data import curate_weather from spotforecast2_safe.utils.convert_to_utc import to_utc_timestamp from spotforecast2_safe.weather.client import WeatherFetchError +from spotforecast2_safe.weather.derived import ( + DEFAULT_CDH_BASE_C, + DEFAULT_HDH_BASE_C, + add_derived_weather_features, + population_weighted_average, +) # Longest run of consecutive missing ``freq`` steps that ``get_weather_features`` # will forward-fill during alignment. A longer gap (e.g. 
the recent window the @@ -56,6 +62,12 @@ def get_weather_features( fallback_on_failure: bool = True, cache_home: Optional[Union[str, Path]] = None, verbose: bool = False, + locations: Optional[Sequence[Tuple[float, float]]] = None, + location_weights: Optional[Sequence[float]] = None, + derived_features: Optional[Sequence[str]] = None, + hdh_base: float = DEFAULT_HDH_BASE_C, + cdh_base: float = DEFAULT_CDH_BASE_C, + wind_speed_unit: str = "kmh", ) -> Tuple[pd.DataFrame, pd.DataFrame]: """Fetch weather data and compute rolling-window features. @@ -99,6 +111,29 @@ def get_weather_features( no caching is performed. verbose: If ``True``, print progress messages to stdout. Defaults to ``False``. + locations: Optional sequence of ``(latitude, longitude)`` pairs for a + **population-weighted multi-city** weather index. When ``None`` + (default) the single ``latitude``/``longitude`` point is used, + preserving prior behaviour exactly. When given, each location is + fetched and the raw frames are combined via + `population_weighted_average` + using *location_weights*. See + `spotforecast2_safe.weather.locations`. + location_weights: Non-negative weight per entry in *locations* (e.g. + city population). Required when *locations* is given; normalised + internally. + derived_features: Optional subset of ``{"hdh", "cdh", + "apparent_temperature", "dew_point"}``. When given, those columns + are derived from the (weighted) weather and rolled up alongside the + raw fields. ``None`` (default) adds nothing. See + `add_derived_weather_features`. + hdh_base: Heating base temperature (°C) for ``hdh``. Defaults to + ``15.0``. + cdh_base: Cooling base temperature (°C) for ``cdh``. Defaults to + ``22.0``. + wind_speed_unit: Unit of the fetched ``wind_speed_10m`` column for + apparent-temperature, ``"ms"`` or ``"kmh"``. Defaults to ``"kmh"`` + (the Open-Meteo default). 
Returns: tuple[pd.DataFrame, pd.DataFrame]: A two-element tuple: @@ -165,24 +200,40 @@ def get_weather_features( if verbose: print("Fetching weather data...") - weather_df = fetch_weather_data( - cov_start=start, - cov_end=cov_end, - latitude=latitude, - longitude=longitude, - timezone=timezone, - freq=freq, - fallback_on_failure=fallback_on_failure, - cache_home=cache_home, - ) + def _fetch_one(lat: float, lon: float) -> pd.DataFrame: + """Fetch one location and trim to the exact ``[start, cov_end]`` window. - # Open-Meteo returns whole-day forecast blocks aligned to its own clock, - # so the raw payload can extend a few hours past ``cov_end``. Trim to the - # exact ``[start, cov_end]`` window before ``curate_weather`` so the - # row-count assertion compares against the window actually used by the - # downstream reindex; otherwise it prints a spurious "wrong shape" - # warning even though alignment will repair the count. - weather_df = weather_df.loc[start:cov_end] + Open-Meteo returns whole-day forecast blocks aligned to its own clock, + so the raw payload can extend a few hours past ``cov_end``. Trimming + here keeps the row-count assertion in ``curate_weather`` honest and + guarantees every location shares one index before any weighted combine. + """ + frame = fetch_weather_data( + cov_start=start, + cov_end=cov_end, + latitude=lat, + longitude=lon, + timezone=timezone, + freq=freq, + fallback_on_failure=fallback_on_failure, + cache_home=cache_home, + ) + return frame.loc[start:cov_end] + + if locations is not None: + if location_weights is None: + raise ValueError("location_weights is required when locations is supplied.") + if len(locations) != len(location_weights): + raise ValueError( + f"locations ({len(locations)}) and location_weights " + f"({len(location_weights)}) must have equal length." 
+ ) + frames = [_fetch_one(lat, lon) for (lat, lon) in locations] + # population_weighted_average is fail-safe: it raises if the per-city + # frames disagree on index or columns rather than silently aligning. + weather_df = population_weighted_average(frames, list(location_weights)) + else: + weather_df = _fetch_one(latitude, longitude) curate_weather(weather_df, data, forecast_horizon=forecast_horizon) @@ -224,6 +275,19 @@ def get_weather_features( if weather_aligned_filled.isnull().any().any(): raise ValueError("Missing values in weather data could not be filled") + # Derive degree-hours / apparent-temperature / dew-point on the NaN-free + # frame (fail-safe: raises if a source column is missing) so they are rolled + # up alongside the raw fields by WindowFeatures below. + if derived_features: + weather_aligned_filled = add_derived_weather_features( + weather_aligned_filled, + list(derived_features), + hdh_base=hdh_base, + cdh_base=cdh_base, + wind_speed_unit=wind_speed_unit, + ) + weather_columns = weather_aligned_filled.columns.tolist() + wf_transformer = WindowFeatures( variables=weather_columns, window=window_periods, diff --git a/src/spotforecast2_safe/weather/locations.py b/src/spotforecast2_safe/weather/locations.py new file mode 100644 index 00000000..0155ed76 --- /dev/null +++ b/src/spotforecast2_safe/weather/locations.py @@ -0,0 +1,149 @@ +# SPDX-FileCopyrightText: 2026 bartzbeielstein +# SPDX-License-Identifier: AGPL-3.0-or-later + +"""Population-weighted weather-location registry for Germany. + +A single weather station (the historical default, Dortmund) cannot represent +national electricity load: demand is concentrated in the large cities, while the +climate signal varies across the country. The remedy recommended in the +forecasting literature is a **population-weighted multi-city temperature index** +(Zimmermann & Ziel 2025, ``zimm25a``): sample several load centres and combine +them with fixed weights proportional to population. 
+ +This module supplies that fixed, deterministic registry. The default set +:data:`GERMAN_LOAD_CENTERS` is chosen "smartly" along two axes: + +- **Population weighting** — each centre's weight is its approximate city + population (in thousands), so the index leans toward where the load actually + is (cities) rather than treating every point equally. +- **Geographic spread** — the centres deliberately span the north (Hamburg, + Bremen, Hannover), south (München, Stuttgart, Nürnberg), east (Berlin, + Leipzig, Dresden) and west (Köln, Düsseldorf, Dortmund, Frankfurt), so the + rural/peripheral climate diversity between the big cities is still captured + instead of collapsing onto one urban heat profile. + +The data are static published figures, so the registry is fully deterministic +and key-free, and the weights flow into +:func:`spotforecast2_safe.weather.derived.population_weighted_average`. +""" + +from __future__ import annotations + +from dataclasses import dataclass +from typing import List, Sequence, Tuple + + +@dataclass(frozen=True) +class WeatherLocation: + """A single weather sampling location with a population weight. + + Attributes: + name: Human-readable city name (used for logs/inspection only). + latitude: Latitude in decimal degrees. + longitude: Longitude in decimal degrees. + weight: Non-negative relative weight (here, approximate population in + thousands). Absolute scale is irrelevant — weights are normalised + by the consumer. + + Examples: + ```{python} + from spotforecast2_safe.weather.locations import WeatherLocation + + loc = WeatherLocation("Köln", 50.9375, 6.9603, 1073.0) + print(loc.name, loc.latitude, loc.weight) + ``` + """ + + name: str + latitude: float + longitude: float + weight: float + + +# Thirteen German load centres spanning all regions, weighted by approximate +# city population (thousands, rounded published figures). 
Geographic spread is +# intentional: the set is not simply "the N largest cities" but a regionally +# balanced sample so the national index is not dominated by one climate zone. +GERMAN_LOAD_CENTERS: Tuple[WeatherLocation, ...] = ( + WeatherLocation("Berlin", 52.5200, 13.4050, 3677.0), + WeatherLocation("Hamburg", 53.5511, 9.9937, 1906.0), + WeatherLocation("München", 48.1351, 11.5820, 1512.0), + WeatherLocation("Köln", 50.9375, 6.9603, 1073.0), + WeatherLocation("Frankfurt", 50.1109, 8.6821, 773.0), + WeatherLocation("Stuttgart", 48.7758, 9.1829, 632.0), + WeatherLocation("Düsseldorf", 51.2277, 6.7735, 620.0), + WeatherLocation("Leipzig", 51.3397, 12.3731, 617.0), + WeatherLocation("Dortmund", 51.5136, 7.4653, 588.0), + WeatherLocation("Bremen", 53.0793, 8.8017, 567.0), + WeatherLocation("Dresden", 51.0504, 13.7373, 556.0), + WeatherLocation("Hannover", 52.3759, 9.7320, 535.0), + WeatherLocation("Nürnberg", 49.4521, 11.0767, 523.0), +) + + +def default_german_locations() -> Tuple[WeatherLocation, ...]: + """Return the default population-weighted German load-centre registry. + + Returns: + Tuple[WeatherLocation, ...]: :data:`GERMAN_LOAD_CENTERS`, in a fixed, + deterministic order. + + Examples: + ```{python} + from spotforecast2_safe.weather.locations import default_german_locations + + locs = default_german_locations() + print(len(locs), locs[0].name) + ``` + """ + return GERMAN_LOAD_CENTERS + + +def coordinates( + locations: Sequence[WeatherLocation], +) -> List[Tuple[float, float]]: + """Extract ``(latitude, longitude)`` pairs in order. + + Args: + locations: The locations to read. + + Returns: + List[Tuple[float, float]]: One ``(lat, lon)`` pair per location, in the + same order. 
+ + Examples: + ```{python} + from spotforecast2_safe.weather.locations import ( + coordinates, + default_german_locations, + ) + + print(coordinates(default_german_locations())[:2]) + ``` + """ + return [(loc.latitude, loc.longitude) for loc in locations] + + +def weights(locations: Sequence[WeatherLocation]) -> List[float]: + """Extract the raw (un-normalised) weights in order. + + Args: + locations: The locations to read. + + Returns: + List[float]: One weight per location, in the same order. Consumers + normalise these (e.g. + :func:`spotforecast2_safe.weather.derived.population_weighted_average`). + + Examples: + ```{python} + from spotforecast2_safe.weather.locations import ( + default_german_locations, + weights, + ) + + w = weights(default_german_locations()) + print(len(w), w[0]) + ``` + """ + return [loc.weight for loc in locations] diff --git a/tests/test_weather_derived.py b/tests/test_weather_derived.py new file mode 100644 index 00000000..11312431 --- /dev/null +++ b/tests/test_weather_derived.py @@ -0,0 +1,263 @@ +# SPDX-FileCopyrightText: 2026 bartzbeielstein +# SPDX-License-Identifier: AGPL-3.0-or-later + +"""Tests for derived weather features and population-weighted aggregation. + +Covers the pure-function contract of +``spotforecast2_safe.weather.derived``: numerical correctness, determinism +(identical input → identical output), and the fail-safe rule (an input that +cannot be transformed raises ``ValueError`` rather than being silently +repaired). 
+""" + +import numpy as np +import pandas as pd +import pytest + +from spotforecast2_safe.weather.derived import ( + DERIVED_FEATURE_KEYS, + add_derived_weather_features, + apparent_temperature, + cooling_degree_hours, + dew_point, + heating_degree_hours, + population_weighted_average, +) + + +def _idx(n: int) -> pd.DatetimeIndex: + return pd.date_range("2023-06-01", periods=n, freq="h", tz="UTC") + + +class TestDegreeHours: + """Heating/cooling degree-hours split the U-shaped response into two arms.""" + + def test_heating_degree_hours_values_and_base(self): + t = pd.Series([5.0, 15.0, 25.0]) + assert heating_degree_hours(t, base=15.0).tolist() == [10.0, 0.0, 0.0] + # A higher base lifts the heating demand everywhere below it. + assert heating_degree_hours(t, base=18.0).tolist() == [13.0, 3.0, 0.0] + + def test_cooling_degree_hours_values_and_base(self): + t = pd.Series([5.0, 22.0, 30.0]) + assert cooling_degree_hours(t, base=22.0).tolist() == [0.0, 0.0, 8.0] + + def test_names(self): + t = pd.Series([1.0]) + assert heating_degree_hours(t).name == "hdh" + assert cooling_degree_hours(t).name == "cdh" + + def test_nan_input_raises(self): + t = pd.Series([1.0, np.nan, 3.0]) + with pytest.raises(ValueError, match="NaN"): + heating_degree_hours(t) + with pytest.raises(ValueError, match="NaN"): + cooling_degree_hours(t) + + def test_non_series_raises(self): + with pytest.raises(ValueError, match="Series"): + heating_degree_hours([1.0, 2.0]) # type: ignore[arg-type] + + def test_deterministic(self): + t = pd.Series([3.0, 17.0, 28.0]) + pd.testing.assert_series_equal(heating_degree_hours(t), heating_degree_hours(t)) + + +class TestDewPoint: + """Magnus-Tetens dew point folds humidity into a temperature.""" + + def test_saturation_equals_temperature(self): + t = pd.Series([20.0, 0.0]) + rh = pd.Series([100.0, 100.0]) + out = dew_point(t, rh) + # At 100% RH, dew point ≈ air temperature. 
+ assert abs(out.iloc[0] - 20.0) < 0.05 + assert abs(out.iloc[1] - 0.0) < 0.05 + + def test_half_humidity_known_value(self): + out = dew_point(pd.Series([20.0]), pd.Series([50.0])) + assert abs(out.iloc[0] - 9.26) < 0.1 + + def test_index_mismatch_raises(self): + t = pd.Series([20.0], index=[pd.Timestamp("2023-01-01", tz="UTC")]) + rh = pd.Series([50.0], index=[pd.Timestamp("2023-01-02", tz="UTC")]) + with pytest.raises(ValueError, match="share an index"): + dew_point(t, rh) + + def test_humidity_out_of_range_raises(self): + with pytest.raises(ValueError, match=r"\[0, 100\]"): + dew_point(pd.Series([20.0]), pd.Series([150.0])) + + def test_zero_humidity_is_finite(self): + out = dew_point(pd.Series([20.0]), pd.Series([0.0])) + assert np.isfinite(out.iloc[0]) + + +class TestApparentTemperature: + """Steadman apparent temperature captures the humidity load driver.""" + + def test_known_value_ms(self): + out = apparent_temperature( + pd.Series([30.0]), pd.Series([70.0]), pd.Series([2.0]) + ) + assert abs(out.iloc[0] - 34.37) < 0.1 + + def test_wind_unit_conversion(self): + # 7.2 km/h == 2.0 m/s, so the two calls must agree. 
+ ms = apparent_temperature( + pd.Series([30.0]), pd.Series([70.0]), pd.Series([2.0]), wind_speed_unit="ms" + ) + kmh = apparent_temperature( + pd.Series([30.0]), + pd.Series([70.0]), + pd.Series([7.2]), + wind_speed_unit="kmh", + ) + assert abs(ms.iloc[0] - kmh.iloc[0]) < 1e-9 + + def test_bad_unit_raises(self): + with pytest.raises(ValueError, match="wind_speed_unit"): + apparent_temperature( + pd.Series([30.0]), + pd.Series([70.0]), + pd.Series([2.0]), + wind_speed_unit="mph", + ) + + def test_index_mismatch_raises(self): + t = pd.Series([30.0], index=[0]) + rh = pd.Series([70.0], index=[0]) + w = pd.Series([2.0], index=[1]) + with pytest.raises(ValueError, match="share an index"): + apparent_temperature(t, rh, w) + + +class TestPopulationWeightedAverage: + """Combine per-city frames into one demand-weighted national index.""" + + def test_weighted_value(self): + idx = _idx(3) + a = pd.DataFrame({"temperature_2m": [10.0] * 3}, index=idx) + b = pd.DataFrame({"temperature_2m": [20.0] * 3}, index=idx) + out = population_weighted_average([a, b], [3.0, 1.0]) + # (3*10 + 1*20) / 4 == 12.5 + assert out["temperature_2m"].tolist() == [12.5, 12.5, 12.5] + + def test_scale_invariance(self): + idx = _idx(2) + a = pd.DataFrame({"x": [10.0, 10.0]}, index=idx) + b = pd.DataFrame({"x": [20.0, 20.0]}, index=idx) + out1 = population_weighted_average([a, b], [3.0, 1.0]) + out2 = population_weighted_average([a, b], [300.0, 100.0]) + pd.testing.assert_frame_equal(out1, out2) + + def test_preserves_column_order(self): + idx = _idx(2) + cols = ["temperature_2m", "wind_speed_10m", "relative_humidity_2m"] + a = pd.DataFrame({c: [1.0, 1.0] for c in cols}, index=idx)[cols] + b = pd.DataFrame({c: [3.0, 3.0] for c in cols}, index=idx)[cols] + out = population_weighted_average([a, b], [1.0, 1.0]) + assert list(out.columns) == cols + + def test_empty_raises(self): + with pytest.raises(ValueError, match="at least one frame"): + population_weighted_average([], []) + + def 
test_length_mismatch_raises(self): + idx = _idx(2) + a = pd.DataFrame({"x": [1.0, 1.0]}, index=idx) + with pytest.raises(ValueError, match="length mismatch"): + population_weighted_average([a], [1.0, 2.0]) + + def test_negative_weight_raises(self): + idx = _idx(2) + a = pd.DataFrame({"x": [1.0, 1.0]}, index=idx) + b = pd.DataFrame({"x": [2.0, 2.0]}, index=idx) + with pytest.raises(ValueError, match="non-negative"): + population_weighted_average([a, b], [1.0, -1.0]) + + def test_zero_sum_raises(self): + idx = _idx(2) + a = pd.DataFrame({"x": [1.0, 1.0]}, index=idx) + with pytest.raises(ValueError, match="positive value"): + population_weighted_average([a], [0.0]) + + def test_index_mismatch_raises(self): + a = pd.DataFrame({"x": [1.0]}, index=_idx(1)) + b = pd.DataFrame( + {"x": [2.0]}, + index=pd.date_range("2024-01-01", periods=1, freq="h", tz="UTC"), + ) + with pytest.raises(ValueError, match="different index"): + population_weighted_average([a, b], [1.0, 1.0]) + + def test_column_mismatch_raises(self): + idx = _idx(1) + a = pd.DataFrame({"x": [1.0]}, index=idx) + b = pd.DataFrame({"y": [2.0]}, index=idx) + with pytest.raises(ValueError, match="different columns"): + population_weighted_average([a, b], [1.0, 1.0]) + + +class TestAddDerivedWeatherFeatures: + """The orchestrator appends derived columns deterministically, fail-safe.""" + + def _weather(self) -> pd.DataFrame: + idx = _idx(4) + return pd.DataFrame( + { + "temperature_2m": [28.0, 30.0, 12.0, 8.0], + "relative_humidity_2m": [60.0, 55.0, 80.0, 90.0], + "wind_speed_10m": [7.2, 10.8, 3.6, 1.8], # km/h + }, + index=idx, + ) + + def test_adds_all_four_columns(self): + out = add_derived_weather_features( + self._weather(), ["hdh", "cdh", "apparent_temperature", "dew_point"] + ) + added = set(out.columns) - set(self._weather().columns) + assert added == {"hdh", "cdh", "apparent_temperature", "dew_point"} + + def test_original_columns_preserved(self): + w = self._weather() + out = 
add_derived_weather_features(w, ["hdh"]) + for c in w.columns: + assert c in out.columns + pd.testing.assert_series_equal(out["temperature_2m"], w["temperature_2m"]) + + def test_canonical_column_order_independent_of_request_order(self): + w = self._weather() + out_a = add_derived_weather_features(w, ["dew_point", "hdh"]) + out_b = add_derived_weather_features(w, ["hdh", "dew_point"]) + # Derived columns always appear in DERIVED_FEATURE_KEYS order. + derived_a = [c for c in out_a.columns if c in DERIVED_FEATURE_KEYS] + derived_b = [c for c in out_b.columns if c in DERIVED_FEATURE_KEYS] + assert derived_a == derived_b == ["hdh", "dew_point"] + + def test_empty_features_returns_copy(self): + w = self._weather() + out = add_derived_weather_features(w, []) + pd.testing.assert_frame_equal(out, w) + assert out is not w # a copy, not the same object + + def test_unknown_feature_raises(self): + with pytest.raises(ValueError, match="unknown derived feature"): + add_derived_weather_features(self._weather(), ["humidex"]) + + def test_missing_source_column_raises(self): + only_temp = self._weather()[["temperature_2m"]] + with pytest.raises(ValueError, match="apparent_temperature.*needs column"): + add_derived_weather_features(only_temp, ["apparent_temperature"]) + + def test_deterministic(self): + w = self._weather() + pd.testing.assert_frame_equal( + add_derived_weather_features(w, ["hdh", "cdh"]), + add_derived_weather_features(w, ["hdh", "cdh"]), + ) + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/tests/test_weather_locations.py b/tests/test_weather_locations.py new file mode 100644 index 00000000..9fbb7967 --- /dev/null +++ b/tests/test_weather_locations.py @@ -0,0 +1,65 @@ +# SPDX-FileCopyrightText: 2026 bartzbeielstein +# SPDX-License-Identifier: AGPL-3.0-or-later + +"""Tests for the German population-weighted weather-location registry.""" + +import pytest + +from spotforecast2_safe.weather.locations import ( + GERMAN_LOAD_CENTERS, + 
WeatherLocation, + coordinates, + default_german_locations, + weights, +) + + +class TestRegistry: + def test_default_returns_registry(self): + assert default_german_locations() is GERMAN_LOAD_CENTERS + assert len(GERMAN_LOAD_CENTERS) == 13 + + def test_all_weights_positive(self): + assert all(loc.weight > 0 for loc in GERMAN_LOAD_CENTERS) + + def test_names_unique(self): + names = [loc.name for loc in GERMAN_LOAD_CENTERS] + assert len(names) == len(set(names)) + + def test_coordinates_within_germany(self): + # Germany spans roughly 47–55 °N and 5–15 °E. + for loc in GERMAN_LOAD_CENTERS: + assert 47.0 <= loc.latitude <= 55.5, loc.name + assert 5.0 <= loc.longitude <= 15.5, loc.name + + def test_geographic_spread(self): + # The set must not collapse onto one climate zone: it spans north + # (>53 °N) and south (<49 °N), east (>12 °E) and west (<8 °E). + lats = [loc.latitude for loc in GERMAN_LOAD_CENTERS] + lons = [loc.longitude for loc in GERMAN_LOAD_CENTERS] + assert max(lats) > 53.0 and min(lats) < 49.0 + assert max(lons) > 12.0 and min(lons) < 8.0 + + def test_coordinates_and_weights_align(self): + coords = coordinates(GERMAN_LOAD_CENTERS) + w = weights(GERMAN_LOAD_CENTERS) + assert len(coords) == len(w) == len(GERMAN_LOAD_CENTERS) + assert coords[0] == ( + GERMAN_LOAD_CENTERS[0].latitude, + GERMAN_LOAD_CENTERS[0].longitude, + ) + assert w[0] == GERMAN_LOAD_CENTERS[0].weight + + def test_deterministic_order(self): + assert coordinates(default_german_locations()) == coordinates( + default_german_locations() + ) + + def test_frozen_dataclass(self): + loc = WeatherLocation("X", 50.0, 8.0, 1.0) + with pytest.raises(Exception): + loc.weight = 2.0 # type: ignore[misc] + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) diff --git a/tests/test_weather_population_weighting.py b/tests/test_weather_population_weighting.py new file mode 100644 index 00000000..97c1c512 --- /dev/null +++ b/tests/test_weather_population_weighting.py @@ -0,0 +1,175 @@ +# 
SPDX-FileCopyrightText: 2026 bartzbeielstein +# SPDX-License-Identifier: AGPL-3.0-or-later + +"""Wired-path tests for population-weighted and derived weather features. + +The Open-Meteo fetch is monkeypatched (mirroring +``tests/test_weather_features_trim.py``) so these tests are fully offline and +deterministic. They check that ``get_weather_features`` combines multiple +locations by weight and derives the requested feature columns. +""" + +from unittest.mock import patch + +import pandas as pd +import pytest + +from spotforecast2_safe.weather import get_weather_features + +START = pd.Timestamp("2024-01-01 00:00", tz="UTC") +COV_END = pd.Timestamp("2024-01-03 23:00", tz="UTC") +FORECAST_HORIZON = 24 + + +def _data() -> pd.DataFrame: + idx = pd.date_range(START, periods=48, freq="h", tz="UTC") + return pd.DataFrame({"load": range(48)}, index=idx) + + +def _raw(temperature: float, humidity: float = 50.0, wind: float = 5.0) -> pd.DataFrame: + idx = pd.date_range(START, COV_END, freq="h", tz="UTC") + return pd.DataFrame( + { + "temperature_2m": [temperature] * len(idx), + "relative_humidity_2m": [humidity] * len(idx), + "wind_speed_10m": [wind] * len(idx), + }, + index=idx, + ) + + +class TestPopulationWeightedPath: + def test_two_cities_combined_by_weight(self): + lat_a, lat_b = 52.52, 48.135 + + def fake_fetch(*, latitude, **kwargs): + return _raw(10.0 if latitude == lat_a else 20.0) + + with patch( + "spotforecast2_safe.data.fetch_data.fetch_weather_data", + side_effect=fake_fetch, + ): + _, aligned = get_weather_features( + data=_data(), + start=START, + cov_end=COV_END, + forecast_horizon=FORECAST_HORIZON, + locations=[(lat_a, 13.405), (lat_b, 11.582)], + location_weights=[3.0, 1.0], + ) + + # (3*10 + 1*20) / 4 == 12.5 at every hour. 
+ assert (aligned["temperature_2m"].round(6) == 12.5).all() + + def test_locations_without_weights_raises(self): + with patch( + "spotforecast2_safe.data.fetch_data.fetch_weather_data", + side_effect=lambda **kw: _raw(10.0), + ): + with pytest.raises(ValueError, match="location_weights is required"): + get_weather_features( + data=_data(), + start=START, + cov_end=COV_END, + forecast_horizon=FORECAST_HORIZON, + locations=[(52.52, 13.405)], + ) + + def test_length_mismatch_raises(self): + with patch( + "spotforecast2_safe.data.fetch_data.fetch_weather_data", + side_effect=lambda **kw: _raw(10.0), + ): + with pytest.raises(ValueError, match="equal length"): + get_weather_features( + data=_data(), + start=START, + cov_end=COV_END, + forecast_horizon=FORECAST_HORIZON, + locations=[(52.52, 13.405), (48.135, 11.582)], + location_weights=[1.0], + ) + + +class TestDerivedFeaturePath: + def test_derived_columns_present(self): + with patch( + "spotforecast2_safe.data.fetch_data.fetch_weather_data", + return_value=_raw(temperature=25.0, humidity=60.0, wind=7.2), + ): + features, _ = get_weather_features( + data=_data(), + start=START, + cov_end=COV_END, + forecast_horizon=FORECAST_HORIZON, + derived_features=["hdh", "cdh", "apparent_temperature", "dew_point"], + ) + + for col in ("hdh", "cdh", "apparent_temperature", "dew_point"): + assert col in features.columns + assert ( + not features[["hdh", "cdh", "apparent_temperature", "dew_point"]] + .isna() + .any() + .any() + ) + + def test_no_derived_by_default(self): + with patch( + "spotforecast2_safe.data.fetch_data.fetch_weather_data", + return_value=_raw(temperature=25.0), + ): + features, _ = get_weather_features( + data=_data(), + start=START, + cov_end=COV_END, + forecast_horizon=FORECAST_HORIZON, + ) + for col in ("hdh", "cdh", "apparent_temperature", "dew_point"): + assert col not in features.columns + + def test_degree_hours_respects_base(self): + # temperature 25 °C, heating base 30 → hdh = 5 everywhere. 
+ with patch( + "spotforecast2_safe.data.fetch_data.fetch_weather_data", + return_value=_raw(temperature=25.0), + ): + features, _ = get_weather_features( + data=_data(), + start=START, + cov_end=COV_END, + forecast_horizon=FORECAST_HORIZON, + derived_features=["hdh"], + hdh_base=30.0, + ) + assert (features["hdh"].round(6) == 5.0).all() + + +class TestCombinedPath: + def test_population_weighted_plus_derived(self): + lat_a, lat_b = 52.52, 48.135 + + def fake_fetch(*, latitude, **kwargs): + return _raw(10.0 if latitude == lat_a else 30.0, humidity=70.0, wind=3.6) + + with patch( + "spotforecast2_safe.data.fetch_data.fetch_weather_data", + side_effect=fake_fetch, + ): + features, aligned = get_weather_features( + data=_data(), + start=START, + cov_end=COV_END, + forecast_horizon=FORECAST_HORIZON, + locations=[(lat_a, 13.405), (lat_b, 11.582)], + location_weights=[1.0, 1.0], + derived_features=["cdh"], + cdh_base=18.0, + ) + # weighted temp = (10 + 30) / 2 == 20; cdh with base 18 == 2. + assert (aligned["temperature_2m"].round(6) == 20.0).all() + assert (features["cdh"].round(6) == 2.0).all() + + +if __name__ == "__main__": + pytest.main([__file__, "-v"])