From 95d45d2ee3a6f53b3f17716987f4f32579105897 Mon Sep 17 00:00:00 2001
From: bartzbeielstein <32470350+bartzbeielstein@users.noreply.github.com>
Date: Sun, 7 Jun 2026 14:38:10 +0200
Subject: [PATCH 1/2] feat(preprocessing): add deviation rule to target-corruption detector
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Third detector rule: flag hours where target − reference falls below
-deviation_mw for at least deviation_slots consecutive native-cadence
slots, with the published day-ahead forecast as the reference column.

Dropout-only by design: the 2026-06 incident class is exclusively
negative (observed to −16.6 GW under the forecast) while staying below
the dynamics thresholds (2026-06-07 frontier: 5.6 GW steps under the
6 GW limit at Actual − Forecast = −11.6 GW), so level-vs-reference is
the discriminator the range/step rules cannot see.

Semantics:
- inert-by-default: new knobs target_qc_deviation_mw/_ref/_slots default
  to None/None/2; pipeline stays byte-identical when unset
- frontier-safe: NaN in either column compares False, so the
  publication-lag tail never flags and a gap breaks a consecutive run
- missing reference column skips the rule silently (mirrors absent
  target handling); reference is never NaN-ed by heal/truncate when
  targets are scoped to the actuals

Plumbed through ConfigEntsoe/ConfigMulti, validate_config, and the
prepare_data call site (detector activation now includes deviation-only
configurations).

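Illustration of the semantics above, as a minimal pandas sketch rather than
the code in target_corruption.py. The helper name flag_deviation and its
signature are hypothetical, and it flags individual slots instead of
aggregating to hours. It assumes the difference is compared with a strict
"less than" against -deviation_mw.

```python
import numpy as np
import pandas as pd


def flag_deviation(df, target, reference, deviation_mw, deviation_slots=2):
    """Hypothetical sketch of the deviation rule (not the package API)."""
    # Inert when the knob is unset or either column is absent.
    if deviation_mw is None or target not in df or reference not in df:
        return pd.Series(False, index=df.index)
    # NaN in either column yields a NaN difference, and NaN < x is False.
    below = (df[target] - df[reference]) < -deviation_mw
    # Start a new run id whenever the boolean value changes.
    run_id = below.ne(below.shift(fill_value=False)).cumsum()
    # Length of each True run (False runs sum to 0).
    run_len = below.astype(int).groupby(run_id).transform("sum")
    return below & (run_len >= deviation_slots)


# Toy data in MW: a two-slot dropout, then a dropout split by a gap.
df = pd.DataFrame(
    {
        "Actual Load": [50e3, 38e3, 38e3, 50e3, 38e3, np.nan, 38e3, 50e3],
        "Forecasted Load": [50e3] * 8,
    }
)
flags = flag_deviation(df, "Actual Load", "Forecasted Load", deviation_mw=10e3)
print(flags.tolist())
```

In this toy series only the first two-slot dropout is flagged; the later
dropout is interrupted by a NaN, so neither of its single slots reaches
deviation_slots.
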
Co-Authored-By: Claude Opus 4.8 (1M context) --- .../execute-results/html.json | 4 +- .../execute-results/html.json | 4 +- .../execute-results/html.json | 6 +- .../execute-results/html.json | 6 +- ...onfigurator.config_entsoe.ConfigEntsoe.qmd | 3 + .../configurator.config_multi.ConfigMulti.qmd | 3 + ...ruption.apply_target_corruption_policy.qmd | 30 ++- ...et_corruption.detect_target_corruption.qmd | 77 +++++- .../configurator/_base_config.py | 21 ++ .../configurator/config_entsoe.py | 18 +- .../configurator/config_multi.py | 18 +- src/spotforecast2_safe/multitask/base.py | 8 +- .../preprocessing/target_corruption.py | 152 ++++++++++- tests/preprocessing/test_target_corruption.py | 237 ++++++++++++++++++ tests/test_config_target_corruption_knobs.py | 49 ++++ 15 files changed, 592 insertions(+), 44 deletions(-) diff --git a/_freeze/docs/reference/configurator.config_entsoe.ConfigEntsoe/execute-results/html.json b/_freeze/docs/reference/configurator.config_entsoe.ConfigEntsoe/execute-results/html.json index da417fa7..7092228f 100644 --- a/_freeze/docs/reference/configurator.config_entsoe.ConfigEntsoe/execute-results/html.json +++ b/_freeze/docs/reference/configurator.config_entsoe.ConfigEntsoe/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "26c31af260d1c9d5c1d758af0ff5c81d", + "hash": "59c932afe666a648007b9ced2b524519", "result": { "engine": "jupyter", - "markdown": "---\ntitle: configurator.config_entsoe.ConfigEntsoe\n---\n\n\n\n```python\nconfigurator.config_entsoe.ConfigEntsoe(\n country_code='DE',\n periods=None,\n lags_consider=None,\n train_size=None,\n end_train_default='2025-12-31 00:00+00:00',\n delta_val=None,\n predict_size=24,\n cv_block_size=None,\n refit_size=7,\n random_state=314159,\n n_hyperparameters_trials=20,\n data_filename='interim/energy_load.csv',\n targets=None,\n use_outlier_detection=True,\n contamination=0.01,\n imputation_method='weighted',\n window_size=72,\n imputation_window_size=None,\n use_exogenous_features=True,\n 
latitude=51.5136,\n longitude=7.4653,\n timezone='UTC',\n state='NW',\n include_weather_windows=False,\n include_holiday_features=False,\n include_holiday_adjacency_features=False,\n poly_features_degree=1,\n max_poly_features=10,\n poly_mi_n_jobs=-1,\n poly_mi_sample_size=4000,\n include_covid_infection_rate=False,\n include_entsoe_forecast_load=False,\n include_entsoe_renewable_forecast=False,\n include_entsoe_net_load=False,\n include_entsoe_day_ahead_price=False,\n index_name='Time (UTC)',\n bounds=None,\n verbose=False,\n cache_home=None,\n n_trials_optuna=15,\n n_trials_spotoptim=10,\n n_initial_spotoptim=5,\n n_jobs_spotoptim=None,\n warm_start_lags=False,\n task='lazy',\n agg_weights=None,\n forecaster_factory=None,\n data_loader=None,\n test_data_loader=None,\n auto_save_models=True,\n data_frame_name='default',\n number_folds=10,\n on_weather_failure='raise',\n on_exog_provider_failure='raise',\n exog_max_gap_hours=0,\n exog_max_tail_gap_hours=0,\n exog_provider_window='full',\n retrain_max_age=None,\n target_qc_range_mw=None,\n target_qc_step_mw=None,\n target_qc_window_days=None,\n target_corruption_policy='abort',\n target_max_heal_hours=0,\n target_anchor_zone_hours=168,\n)\n```\n\nConfiguration for the ENTSO-E forecasting pipeline.\n\nSingle-target counterpart to ``ConfigMulti``. 
Used by the ENTSO-E CLI\n(``spotforecast2.tasks.task_entsoe``) and any other single-target pipeline\nrouted through ``spotforecast2.multitask.runner.run(config_cls=ConfigEntsoe)``.\n\n``country_code`` is the canonical ISO 3166-1 alpha-2 country-code\nattribute used by both API queries and the multitask ``PipelineConfig``\nprotocol.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------------------------------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|\n| country_code | [str](`str`) | ISO 3166-1 alpha-2 country code (e.g. ``\"DE\"``). | `'DE'` |\n| periods | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[Period](`spotforecast2_safe.data.Period`)\\]\\] | Cyclical feature encodings. | `None` |\n| lags_consider | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[int](`int`)\\]\\] | Lag values for autoregressive features. | `None` |\n| train_size | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Training window. | `None` |\n| end_train_default | [str](`str`) | Default end-of-training timestamp (ISO). | `'2025-12-31 00:00+00:00'` |\n| delta_val | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Validation window. 
| `None` |\n| predict_size | [int](`int`) | Prediction horizon in hours. | `24` |\n| cv_block_size | [int](`int`) \\| None | Cross-validation test-block width in hours. Defaults to ``None``, meaning the CV uses ``predict_size``. Set to a fixed value (e.g. ``24``) to decouple the cross-validation horizon from a render-dependent live ``predict_size``. | `None` |\n| refit_size | [int](`int`) | Refit cadence in days. | `7` |\n| random_state | [int](`int`) | Random seed. | `314159` |\n| n_hyperparameters_trials | [int](`int`) | Hyperparameter-tuning trial budget. | `20` |\n| data_filename | [str](`str`) | Path to the merged interim CSV. | `'interim/energy_load.csv'` |\n| targets | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | Active target column names. ``None`` until set after data loading. For ENTSO-E this is typically ``[\"Actual Load\"]``. | `None` |\n| use_outlier_detection | [bool](`bool`) | Apply IsolationForest-based outlier removal. Defaults to ``True``. | `True` |\n| contamination | [float](`float`) | IsolationForest contamination fraction. | `0.01` |\n| imputation_method | [str](`str`) | Gap-filling strategy. | `'weighted'` |\n| window_size | [int](`int`) | Rolling window for weighted imputation. Also the LightGBM rolling-mean feature window in the ENTSO-E factories. | `72` |\n| imputation_window_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Width of the gap-penalty zone (in hours) around each imputed value for the ``\"weighted\"`` strategy. When ``None`` (default), falls back to ``window_size``, so existing behaviour is unchanged. Set this to decouple the imputation penalty zone from the rolling-feature window. | `None` |\n| use_exogenous_features | [bool](`bool`) | Build weather/calendar/holiday features. | `True` |\n| latitude | [float](`float`) | Location latitude. | `51.5136` |\n| longitude | [float](`float`) | Location longitude. | `7.4653` |\n| timezone | [str](`str`) | IANA timezone string. 
| `'UTC'` |\n| state | [str](`str`) | Subdivision code for regional holidays. | `'NW'` |\n| include_weather_windows | [bool](`bool`) | Weather-window feature toggle. | `False` |\n| include_holiday_features | [bool](`bool`) | Holiday feature toggle. | `False` |\n| include_holiday_adjacency_features | [bool](`bool`) | Brückentag and before/after-holiday indicator toggle. Defaults to ``False``. | `False` |\n| poly_features_degree | [int](`int`) | Polynomial-interaction degree passed to the feature builder. ``1`` (default) generates no interactions; ``2`` adds pairwise bilinear terms; ``3+`` higher order. | `1` |\n| max_poly_features | [int](`int`) | Cap on polynomial interaction columns. When more than this many ``poly_*`` columns are generated, only the top ``max_poly_features`` ranked by mutual information with the target are kept (``<= 0`` disables the cap). Defaults to ``10``. | `10` |\n| poly_mi_n_jobs | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Parallel jobs for the mutual-information ranking that enforces ``max_poly_features``. ``-1`` (default) uses all cores; ``None`` runs single-threaded. Parallelism does not change the selection. | `-1` |\n| poly_mi_sample_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Row cap for that ranking; longer series are scored on a reproducible random subsample of this size (seeded by ``random_state``), which can change which borderline columns make the top K. ``None`` scores every row (the pre-15.8 behaviour). Defaults to ``4000``. | `4000` |\n| include_covid_infection_rate | [bool](`bool`) | Append the bundled German national COVID-19 7-day incidence (RKI) as an exogenous level regressor. Defaults to ``False``. | `False` |\n| include_entsoe_forecast_load | [bool](`bool`) | Append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior. Defaults to ``False``. | `False` |\n| include_entsoe_renewable_forecast | [bool](`bool`) | Append the ENTSO-E day-ahead wind and solar generation forecast. 
Defaults to ``False``. | `False` |\n| include_entsoe_net_load | [bool](`bool`) | Append the ENTSO-E day-ahead net load (Forecasted Load minus wind/solar forecast). Defaults to ``False``. | `False` |\n| include_entsoe_day_ahead_price | [bool](`bool`) | Append the ENTSO-E day-ahead spot price (DE/LU). Defaults to ``False``. | `False` |\n| index_name | [str](`str`) | Datetime column name when the DataFrame index is reset. ENTSO-E CSVs use ``\"Time (UTC)\"``; defaults to that. | `'Time (UTC)'` |\n| bounds | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[tuple](`tuple`)\\]\\] | Per-column outlier bounds. For single-target ENTSO-E this is typically ``None`` or a single ``[(lower, upper)]`` entry. | `None` |\n| verbose | [bool](`bool`) | Verbose pipeline output. | `False` |\n| cache_home | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Cache directory override. | `None` |\n| n_trials_optuna | [int](`int`) | Optuna Bayesian-search trial budget. | `15` |\n| n_trials_spotoptim | [int](`int`) | SpotOptim surrogate-search trial budget. | `10` |\n| n_initial_spotoptim | [int](`int`) | SpotOptim initial random evaluations. | `5` |\n| n_jobs_spotoptim | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Worker count for SpotOptim's parallel (steady-state) evaluation. ``None`` (default) runs sequentially; ``-1`` uses all CPU cores; a positive integer pins the worker count. Parallel tuning is faster but, being steady-state, changes the search trajectory, so the tuned result is not bit-identical to a sequential run even with a fixed ``random_state``. | `None` |\n| warm_start_lags | [bool](`bool`) | Seed the SpotOptim search with ``lags_consider``. | `False` |\n| task | [str](`str`) | Active prediction task name. | `'lazy'` |\n| agg_weights | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[float](`float`)\\]\\] | Per-target aggregation weights. For single-target use this is typically ``[1.0]`` or ``None``. 
| `None` |\n| forecaster_factory | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Callable ``factory(config, *, weight_func, target) -> forecaster`` consumed by ``BaseTask.create_forecaster``. ``None`` falls back to the default LightGBM factory. | `None` |\n| data_loader | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Callable ``data_loader(config)`` returning a pandas DataFrame. Invoked by ``BaseTask.prepare_data`` when no DataFrame is supplied — the ENTSO-E pipeline hook for ``download_new_data`` / ``merge_build_manual``. | `None` |\n| test_data_loader | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Callable ``test_data_loader(config)`` returning a pandas DataFrame with ground-truth values for the prediction horizon. Invoked by ``BaseTask.prepare_data`` when no test DataFrame is supplied; the returned frame populates ``test_actual`` and ``metrics_future`` in the prediction package. | `None` |\n| auto_save_models | [bool](`bool`) | Whether ``BaseTask._run_strategy`` should persist fitted forecasters to ``/models/`` after every training run. Defaults to ``True``. | `True` |\n| data_frame_name | [str](`str`) | Identifier for the active dataset. Used by ``BaseTask`` to name cache subdirectories, model files, and the per-dataset log file. Defaults to ``\"default\"``. | `'default'` |\n| number_folds | [int](`int`) | Number of folds used by ``BaseTask.cv_ts`` when building the ``TimeSeriesSplit`` cross-validation splitter for tuning tasks. Defaults to ``10``. | `10` |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Policy for handling Open-Meteo fetch failures inside ``BaseTask.build_exogenous_features``. ``\"raise\"`` (default) aborts the pipeline with a ``WeatherFetchError`` and preserves the safety-critical fail-safe semantics. ``\"skip\"`` logs a warning and continues with empty weather features so the rest of the pipeline can run without the Open-Meteo dependency. 
| `'raise'` |\n| on_exog_provider_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Policy for an exogenous-provider failure inside ``ExogBuilder.build``. ``\"raise\"`` (default) propagates the ``ExogProviderError`` (fail-safe); ``\"skip\"`` logs a warning and omits that provider's columns. | `'raise'` |\n| exog_max_gap_hours | [int](`int`) | Maximum length, in hours, of a contiguous run of missing exogenous-provider values healed before the provider is rejected. Interior gaps are time-interpolated; leading/trailing edge gaps are back-/forward-filled. ``0`` (default) keeps the strict fail-safe (any gap raises). Healed runs are logged with count and span. Only already-published day-ahead vintages are involved, so healing is leakage-clean (CR-3). | `0` |\n| exog_max_tail_gap_hours | [int](`int`) | Extended healing budget, in hours, applied exclusively to the trailing-edge NaN run (the run containing the last index timestamp). The effective tail budget is ``max(exog_max_gap_hours, exog_max_tail_gap_hours)``. The canonical use case is the ENTSO-E day-ahead publication frontier: the last published vintage is zero-order-held forward to the forecast horizon without touching interior gaps (CR-3-clean). When ``exog_max_tail_gap_hours <= exog_max_gap_hours`` the parameter is inert (the interior budget already covers the tail) and a warning is logged. Defaults to ``0``. | `0` |\n| exog_provider_window | [Literal](`typing.Literal`)\\[\\'full\\', \\'train\\'\\] | Span the exogenous providers are validated against. ``\"full\"`` (default) requires coverage of the entire ``data_start``→``cov_end`` request, matching prior behaviour. ``\"train\"`` validates only the consumed window ``[start_train_ts, cov_end]``, tolerating missing values before the training window. Honoured by the MultiTask pipeline; the forecaster-wrapper path currently always validates the full span. 
| `'full'` |\n| retrain_max_age | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Maximum age of a previously trained model before retraining is required. Consumed by ``spotforecast2_safe.manager.trainer.should_retrain`` to gate scheduled retraining workflows. Defaults to ``Timedelta(days=7)``. | `None` |\n\n## Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|--------------------|----------------------------------------------------|--------------------------------------------------------------------------------------------|\n| country_code | [str](`str`) | ISO country code used for API queries and holiday feature generation. |\n| auto_save_models | [bool](`bool`) | Whether to auto-persist fitted forecasters after each training run. |\n| data_frame_name | [str](`str`) | Active-dataset identifier used for cache and log-file naming. |\n| number_folds | [int](`int`) | Cross-validation fold count for tuning tasks. |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Open-Meteo fetch-failure policy: ``\"raise\"`` aborts, ``\"skip\"`` continues without weather. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#baeed703 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\n\nfrom spotforecast2_safe.configurator.config_entsoe import ConfigEntsoe\n\n# Use default configuration\nconfig = ConfigEntsoe()\nprint(config.country_code)\nprint(config.predict_size)\nprint(config.random_state)\n\n# Create custom configuration\ncustom_config = ConfigEntsoe(\n country_code=\"FR\",\n predict_size=48,\n cv_block_size=24,\n random_state=42,\n)\nprint(custom_config.country_code)\nprint(custom_config.predict_size)\nprint(custom_config.cv_block_size)\n\n# Verify training window\nassert config.train_size == pd.Timedelta(days=3 * 365)\n\n# Check default periods\nprint(len(config.periods))\nprint(config.periods[0].name)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nDE\n24\n314159\nFR\n48\n24\n5\ndaily\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [get_params](#spotforecast2_safe.configurator.config_entsoe.ConfigEntsoe.get_params) | Get parameters for this configuration object. |\n| [set_params](#spotforecast2_safe.configurator.config_entsoe.ConfigEntsoe.set_params) | Set the parameters of this configuration object. |\n\n### get_params { #spotforecast2_safe.configurator.config_entsoe.ConfigEntsoe.get_params }\n\n```python\nconfigurator.config_entsoe.ConfigEntsoe.get_params(deep=True)\n```\n\nGet parameters for this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|----------------|-----------------------------------------------------------------------------------------------------------|-----------|\n| deep | [bool](`bool`) | If True, will return the parameters for this configuration and contained sub-objects that are estimators. 
| `True` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|-----------------------------------------------------------|-------------------------------------------------------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Dictionary of parameter names mapped to their values. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#5db3e393 .cell execution_count=2}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_entsoe import ConfigEntsoe\n\nconfig = ConfigEntsoe(country_code=\"FR\")\np = config.get_params()\nprint(p[\"country_code\"])\nprint(p[\"predict_size\"])\nassert p[\"country_code\"] == \"FR\"\nassert p[\"predict_size\"] == 24\nassert p[\"cv_block_size\"] is None\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nFR\n24\n```\n:::\n:::\n\n\n### set_params { #spotforecast2_safe.configurator.config_entsoe.ConfigEntsoe.set_params }\n\n```python\nconfigurator.config_entsoe.ConfigEntsoe.set_params(params=None, **kwargs)\n```\n\nSet the parameters of this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|----------|-----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Optional dictionary of parameter names mapped to their new values. | `None` |\n| **kwargs | [object](`object`) | Additional parameter names mapped to their new values. It supports configuring nested 'Period' objects using the `periods____` notation. 
| `{}` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------------|------------------------------------------------------------------------------|--------------------------------------------------------------------------------|\n| ConfigEntsoe | [ConfigEntsoe](`spotforecast2_safe.configurator.config_entsoe.ConfigEntsoe`) | The configuration instance with updated parameters (supports method chaining). |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#55088f71 .cell execution_count=3}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_entsoe import ConfigEntsoe\n\nconfig = ConfigEntsoe()\n\n# Flat parameter setting\nconfig.set_params(country_code=\"FR\", predict_size=48)\nprint(config.country_code)\nprint(config.predict_size)\nassert config.country_code == \"FR\"\nassert config.predict_size == 48\n\n# Deep parameter setting for nested Period objects\nconfig.set_params(periods__daily__n_periods=24)\ndaily_n = next(p.n_periods for p in config.periods if p.name == \"daily\")\nprint(daily_n)\nassert daily_n == 24\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nFR\n48\n24\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: configurator.config_entsoe.ConfigEntsoe\n---\n\n\n\n```python\nconfigurator.config_entsoe.ConfigEntsoe(\n country_code='DE',\n periods=None,\n lags_consider=None,\n train_size=None,\n end_train_default='2025-12-31 00:00+00:00',\n delta_val=None,\n predict_size=24,\n cv_block_size=None,\n refit_size=7,\n random_state=314159,\n n_hyperparameters_trials=20,\n data_filename='interim/energy_load.csv',\n targets=None,\n use_outlier_detection=True,\n contamination=0.01,\n imputation_method='weighted',\n window_size=72,\n imputation_window_size=None,\n use_exogenous_features=True,\n latitude=51.5136,\n longitude=7.4653,\n timezone='UTC',\n state='NW',\n include_weather_windows=False,\n include_holiday_features=False,\n include_holiday_adjacency_features=False,\n 
poly_features_degree=1,\n max_poly_features=10,\n poly_mi_n_jobs=-1,\n poly_mi_sample_size=4000,\n include_covid_infection_rate=False,\n include_entsoe_forecast_load=False,\n include_entsoe_renewable_forecast=False,\n include_entsoe_net_load=False,\n include_entsoe_day_ahead_price=False,\n index_name='Time (UTC)',\n bounds=None,\n verbose=False,\n cache_home=None,\n n_trials_optuna=15,\n n_trials_spotoptim=10,\n n_initial_spotoptim=5,\n n_jobs_spotoptim=None,\n warm_start_lags=False,\n task='lazy',\n agg_weights=None,\n forecaster_factory=None,\n data_loader=None,\n test_data_loader=None,\n auto_save_models=True,\n data_frame_name='default',\n number_folds=10,\n on_weather_failure='raise',\n on_exog_provider_failure='raise',\n exog_max_gap_hours=0,\n exog_max_tail_gap_hours=0,\n exog_provider_window='full',\n retrain_max_age=None,\n target_qc_range_mw=None,\n target_qc_step_mw=None,\n target_qc_window_days=None,\n target_corruption_policy='abort',\n target_max_heal_hours=0,\n target_anchor_zone_hours=168,\n target_qc_deviation_mw=None,\n target_qc_deviation_ref=None,\n target_qc_deviation_slots=2,\n)\n```\n\nConfiguration for the ENTSO-E forecasting pipeline.\n\nSingle-target counterpart to ``ConfigMulti``. 
Used by the ENTSO-E CLI\n(``spotforecast2.tasks.task_entsoe``) and any other single-target pipeline\nrouted through ``spotforecast2.multitask.runner.run(config_cls=ConfigEntsoe)``.\n\n``country_code`` is the canonical ISO 3166-1 alpha-2 country-code\nattribute used by both API queries and the multitask ``PipelineConfig``\nprotocol.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------------------------------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|\n| country_code | [str](`str`) | ISO 3166-1 alpha-2 country code (e.g. ``\"DE\"``). | `'DE'` |\n| periods | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[Period](`spotforecast2_safe.data.Period`)\\]\\] | Cyclical feature encodings. | `None` |\n| lags_consider | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[int](`int`)\\]\\] | Lag values for autoregressive features. | `None` |\n| train_size | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Training window. | `None` |\n| end_train_default | [str](`str`) | Default end-of-training timestamp (ISO). | `'2025-12-31 00:00+00:00'` |\n| delta_val | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Validation window. 
| `None` |\n| predict_size | [int](`int`) | Prediction horizon in hours. | `24` |\n| cv_block_size | [int](`int`) \\| None | Cross-validation test-block width in hours. Defaults to ``None``, meaning the CV uses ``predict_size``. Set to a fixed value (e.g. ``24``) to decouple the cross-validation horizon from a render-dependent live ``predict_size``. | `None` |\n| refit_size | [int](`int`) | Refit cadence in days. | `7` |\n| random_state | [int](`int`) | Random seed. | `314159` |\n| n_hyperparameters_trials | [int](`int`) | Hyperparameter-tuning trial budget. | `20` |\n| data_filename | [str](`str`) | Path to the merged interim CSV. | `'interim/energy_load.csv'` |\n| targets | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | Active target column names. ``None`` until set after data loading. For ENTSO-E this is typically ``[\"Actual Load\"]``. | `None` |\n| use_outlier_detection | [bool](`bool`) | Apply IsolationForest-based outlier removal. Defaults to ``True``. | `True` |\n| contamination | [float](`float`) | IsolationForest contamination fraction. | `0.01` |\n| imputation_method | [str](`str`) | Gap-filling strategy. | `'weighted'` |\n| window_size | [int](`int`) | Rolling window for weighted imputation. Also the LightGBM rolling-mean feature window in the ENTSO-E factories. | `72` |\n| imputation_window_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Width of the gap-penalty zone (in hours) around each imputed value for the ``\"weighted\"`` strategy. When ``None`` (default), falls back to ``window_size``, so existing behaviour is unchanged. Set this to decouple the imputation penalty zone from the rolling-feature window. | `None` |\n| use_exogenous_features | [bool](`bool`) | Build weather/calendar/holiday features. | `True` |\n| latitude | [float](`float`) | Location latitude. | `51.5136` |\n| longitude | [float](`float`) | Location longitude. | `7.4653` |\n| timezone | [str](`str`) | IANA timezone string. 
| `'UTC'` |\n| state | [str](`str`) | Subdivision code for regional holidays. | `'NW'` |\n| include_weather_windows | [bool](`bool`) | Weather-window feature toggle. | `False` |\n| include_holiday_features | [bool](`bool`) | Holiday feature toggle. | `False` |\n| include_holiday_adjacency_features | [bool](`bool`) | Brückentag and before/after-holiday indicator toggle. Defaults to ``False``. | `False` |\n| poly_features_degree | [int](`int`) | Polynomial-interaction degree passed to the feature builder. ``1`` (default) generates no interactions; ``2`` adds pairwise bilinear terms; ``3+`` higher order. | `1` |\n| max_poly_features | [int](`int`) | Cap on polynomial interaction columns. When more than this many ``poly_*`` columns are generated, only the top ``max_poly_features`` ranked by mutual information with the target are kept (``<= 0`` disables the cap). Defaults to ``10``. | `10` |\n| poly_mi_n_jobs | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Parallel jobs for the mutual-information ranking that enforces ``max_poly_features``. ``-1`` (default) uses all cores; ``None`` runs single-threaded. Parallelism does not change the selection. | `-1` |\n| poly_mi_sample_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Row cap for that ranking; longer series are scored on a reproducible random subsample of this size (seeded by ``random_state``), which can change which borderline columns make the top K. ``None`` scores every row (the pre-15.8 behaviour). Defaults to ``4000``. | `4000` |\n| include_covid_infection_rate | [bool](`bool`) | Append the bundled German national COVID-19 7-day incidence (RKI) as an exogenous level regressor. Defaults to ``False``. | `False` |\n| include_entsoe_forecast_load | [bool](`bool`) | Append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior. Defaults to ``False``. | `False` |\n| include_entsoe_renewable_forecast | [bool](`bool`) | Append the ENTSO-E day-ahead wind and solar generation forecast. 
Defaults to ``False``. | `False` |\n| include_entsoe_net_load | [bool](`bool`) | Append the ENTSO-E day-ahead net load (Forecasted Load minus wind/solar forecast). Defaults to ``False``. | `False` |\n| include_entsoe_day_ahead_price | [bool](`bool`) | Append the ENTSO-E day-ahead spot price (DE/LU). Defaults to ``False``. | `False` |\n| index_name | [str](`str`) | Datetime column name when the DataFrame index is reset. ENTSO-E CSVs use ``\"Time (UTC)\"``; defaults to that. | `'Time (UTC)'` |\n| bounds | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[tuple](`tuple`)\\]\\] | Per-column outlier bounds. For single-target ENTSO-E this is typically ``None`` or a single ``[(lower, upper)]`` entry. | `None` |\n| verbose | [bool](`bool`) | Verbose pipeline output. | `False` |\n| cache_home | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Cache directory override. | `None` |\n| n_trials_optuna | [int](`int`) | Optuna Bayesian-search trial budget. | `15` |\n| n_trials_spotoptim | [int](`int`) | SpotOptim surrogate-search trial budget. | `10` |\n| n_initial_spotoptim | [int](`int`) | SpotOptim initial random evaluations. | `5` |\n| n_jobs_spotoptim | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Worker count for SpotOptim's parallel (steady-state) evaluation. ``None`` (default) runs sequentially; ``-1`` uses all CPU cores; a positive integer pins the worker count. Parallel tuning is faster but, being steady-state, changes the search trajectory, so the tuned result is not bit-identical to a sequential run even with a fixed ``random_state``. | `None` |\n| warm_start_lags | [bool](`bool`) | Seed the SpotOptim search with ``lags_consider``. | `False` |\n| task | [str](`str`) | Active prediction task name. | `'lazy'` |\n| agg_weights | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[float](`float`)\\]\\] | Per-target aggregation weights. For single-target use this is typically ``[1.0]`` or ``None``. 
| `None` |\n| forecaster_factory | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Callable ``factory(config, *, weight_func, target) -> forecaster`` consumed by ``BaseTask.create_forecaster``. ``None`` falls back to the default LightGBM factory. | `None` |\n| data_loader | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Callable ``data_loader(config)`` returning a pandas DataFrame. Invoked by ``BaseTask.prepare_data`` when no DataFrame is supplied — the ENTSO-E pipeline hook for ``download_new_data`` / ``merge_build_manual``. | `None` |\n| test_data_loader | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Callable ``test_data_loader(config)`` returning a pandas DataFrame with ground-truth values for the prediction horizon. Invoked by ``BaseTask.prepare_data`` when no test DataFrame is supplied; the returned frame populates ``test_actual`` and ``metrics_future`` in the prediction package. | `None` |\n| auto_save_models | [bool](`bool`) | Whether ``BaseTask._run_strategy`` should persist fitted forecasters to ``/models/`` after every training run. Defaults to ``True``. | `True` |\n| data_frame_name | [str](`str`) | Identifier for the active dataset. Used by ``BaseTask`` to name cache subdirectories, model files, and the per-dataset log file. Defaults to ``\"default\"``. | `'default'` |\n| number_folds | [int](`int`) | Number of folds used by ``BaseTask.cv_ts`` when building the ``TimeSeriesSplit`` cross-validation splitter for tuning tasks. Defaults to ``10``. | `10` |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Policy for handling Open-Meteo fetch failures inside ``BaseTask.build_exogenous_features``. ``\"raise\"`` (default) aborts the pipeline with a ``WeatherFetchError`` and preserves the safety-critical fail-safe semantics. ``\"skip\"`` logs a warning and continues with empty weather features so the rest of the pipeline can run without the Open-Meteo dependency. 
| `'raise'` |\n| on_exog_provider_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Policy for an exogenous-provider failure inside ``ExogBuilder.build``. ``\"raise\"`` (default) propagates the ``ExogProviderError`` (fail-safe); ``\"skip\"`` logs a warning and omits that provider's columns. | `'raise'` |\n| exog_max_gap_hours | [int](`int`) | Maximum length, in hours, of a contiguous run of missing exogenous-provider values healed before the provider is rejected. Interior gaps are time-interpolated; leading/trailing edge gaps are back-/forward-filled. ``0`` (default) keeps the strict fail-safe (any gap raises). Healed runs are logged with count and span. Only already-published day-ahead vintages are involved, so healing is leakage-clean (CR-3). | `0` |\n| exog_max_tail_gap_hours | [int](`int`) | Extended healing budget, in hours, applied exclusively to the trailing-edge NaN run (the run containing the last index timestamp). The effective tail budget is ``max(exog_max_gap_hours, exog_max_tail_gap_hours)``. The canonical use case is the ENTSO-E day-ahead publication frontier: the last published vintage is zero-order-held forward to the forecast horizon without touching interior gaps (CR-3-clean). When ``exog_max_tail_gap_hours <= exog_max_gap_hours`` the parameter is inert (the interior budget already covers the tail) and a warning is logged. Defaults to ``0``. | `0` |\n| exog_provider_window | [Literal](`typing.Literal`)\\[\\'full\\', \\'train\\'\\] | Span the exogenous providers are validated against. ``\"full\"`` (default) requires coverage of the entire ``data_start``→``cov_end`` request, matching prior behaviour. ``\"train\"`` validates only the consumed window ``[start_train_ts, cov_end]``, tolerating missing values before the training window. Honoured by the MultiTask pipeline; the forecaster-wrapper path currently always validates the full span. 
| `'full'` |\n| retrain_max_age | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Maximum age of a previously trained model before retraining is required. Consumed by ``spotforecast2_safe.manager.trainer.should_retrain`` to gate scheduled retraining workflows. Defaults to ``Timedelta(days=7)``. | `None` |\n\n## Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|--------------------|----------------------------------------------------|--------------------------------------------------------------------------------------------|\n| country_code | [str](`str`) | ISO country code used for API queries and holiday feature generation. |\n| auto_save_models | [bool](`bool`) | Whether to auto-persist fitted forecasters after each training run. |\n| data_frame_name | [str](`str`) | Active-dataset identifier used for cache and log-file naming. |\n| number_folds | [int](`int`) | Cross-validation fold count for tuning tasks. |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Open-Meteo fetch-failure policy: ``\"raise\"`` aborts, ``\"skip\"`` continues without weather. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#3838177c .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\n\nfrom spotforecast2_safe.configurator.config_entsoe import ConfigEntsoe\n\n# Use default configuration\nconfig = ConfigEntsoe()\nprint(config.country_code)\nprint(config.predict_size)\nprint(config.random_state)\n\n# Create custom configuration\ncustom_config = ConfigEntsoe(\n country_code=\"FR\",\n predict_size=48,\n cv_block_size=24,\n random_state=42,\n)\nprint(custom_config.country_code)\nprint(custom_config.predict_size)\nprint(custom_config.cv_block_size)\n\n# Verify training window\nassert config.train_size == pd.Timedelta(days=3 * 365)\n\n# Check default periods\nprint(len(config.periods))\nprint(config.periods[0].name)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nDE\n24\n314159\nFR\n48\n24\n5\ndaily\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [get_params](#spotforecast2_safe.configurator.config_entsoe.ConfigEntsoe.get_params) | Get parameters for this configuration object. |\n| [set_params](#spotforecast2_safe.configurator.config_entsoe.ConfigEntsoe.set_params) | Set the parameters of this configuration object. |\n\n### get_params { #spotforecast2_safe.configurator.config_entsoe.ConfigEntsoe.get_params }\n\n```python\nconfigurator.config_entsoe.ConfigEntsoe.get_params(deep=True)\n```\n\nGet parameters for this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|----------------|-----------------------------------------------------------------------------------------------------------|-----------|\n| deep | [bool](`bool`) | If True, will return the parameters for this configuration and contained sub-objects that are estimators. 
| `True` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|-----------------------------------------------------------|-------------------------------------------------------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Dictionary of parameter names mapped to their values. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#87521d47 .cell execution_count=2}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_entsoe import ConfigEntsoe\n\nconfig = ConfigEntsoe(country_code=\"FR\")\np = config.get_params()\nprint(p[\"country_code\"])\nprint(p[\"predict_size\"])\nassert p[\"country_code\"] == \"FR\"\nassert p[\"predict_size\"] == 24\nassert p[\"cv_block_size\"] is None\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nFR\n24\n```\n:::\n:::\n\n\n### set_params { #spotforecast2_safe.configurator.config_entsoe.ConfigEntsoe.set_params }\n\n```python\nconfigurator.config_entsoe.ConfigEntsoe.set_params(params=None, **kwargs)\n```\n\nSet the parameters of this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|----------|-----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Optional dictionary of parameter names mapped to their new values. | `None` |\n| **kwargs | [object](`object`) | Additional parameter names mapped to their new values. It supports configuring nested 'Period' objects using the `periods____` notation. 
| `{}` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------------|------------------------------------------------------------------------------|--------------------------------------------------------------------------------|\n| ConfigEntsoe | [ConfigEntsoe](`spotforecast2_safe.configurator.config_entsoe.ConfigEntsoe`) | The configuration instance with updated parameters (supports method chaining). |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#a6fe9c15 .cell execution_count=3}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_entsoe import ConfigEntsoe\n\nconfig = ConfigEntsoe()\n\n# Flat parameter setting\nconfig.set_params(country_code=\"FR\", predict_size=48)\nprint(config.country_code)\nprint(config.predict_size)\nassert config.country_code == \"FR\"\nassert config.predict_size == 48\n\n# Deep parameter setting for nested Period objects\nconfig.set_params(periods__daily__n_periods=24)\ndaily_n = next(p.n_periods for p in config.periods if p.name == \"daily\")\nprint(daily_n)\nassert daily_n == 24\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nFR\n48\n24\n```\n:::\n:::\n\n\n", "supporting": [ "configurator.config_entsoe.ConfigEntsoe_files/figure-html" ], diff --git a/_freeze/docs/reference/configurator.config_multi.ConfigMulti/execute-results/html.json b/_freeze/docs/reference/configurator.config_multi.ConfigMulti/execute-results/html.json index cb41eb73..ea4db239 100644 --- a/_freeze/docs/reference/configurator.config_multi.ConfigMulti/execute-results/html.json +++ b/_freeze/docs/reference/configurator.config_multi.ConfigMulti/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "fc052fd934deeae4b42dfd8e50c81d82", + "hash": "255430576026657649ea871c9f95dbc8", "result": { "engine": "jupyter", - "markdown": "---\ntitle: configurator.config_multi.ConfigMulti\n---\n\n\n\n```python\nconfigurator.config_multi.ConfigMulti(\n country_code='DE',\n periods=None,\n 
lags_consider=None,\n train_size=None,\n end_train_default='2025-12-31 00:00+00:00',\n delta_val=None,\n predict_size=24,\n cv_block_size=None,\n refit_size=7,\n random_state=314159,\n n_hyperparameters_trials=20,\n data_filename='interim/energy_load.csv',\n targets=None,\n use_outlier_detection=True,\n contamination=0.01,\n imputation_method='weighted',\n window_size=72,\n imputation_window_size=None,\n use_exogenous_features=True,\n latitude=51.5136,\n longitude=7.4653,\n timezone='UTC',\n state='NW',\n include_weather_windows=False,\n include_holiday_features=False,\n include_holiday_adjacency_features=False,\n poly_features_degree=1,\n max_poly_features=10,\n poly_mi_n_jobs=-1,\n poly_mi_sample_size=4000,\n include_covid_infection_rate=False,\n include_entsoe_forecast_load=False,\n include_entsoe_renewable_forecast=False,\n include_entsoe_net_load=False,\n include_entsoe_day_ahead_price=False,\n index_name='DateTime',\n bounds=None,\n verbose=False,\n cache_home=None,\n n_trials_optuna=15,\n n_trials_spotoptim=10,\n n_initial_spotoptim=5,\n n_jobs_spotoptim=None,\n warm_start_lags=False,\n task='lazy',\n agg_weights=None,\n forecaster_factory=None,\n data_loader=None,\n test_data_loader=None,\n auto_save_models=True,\n data_frame_name='default',\n number_folds=10,\n on_weather_failure='raise',\n on_exog_provider_failure='raise',\n exog_max_gap_hours=0,\n exog_max_tail_gap_hours=0,\n exog_provider_window='full',\n target_qc_range_mw=None,\n target_qc_step_mw=None,\n target_qc_window_days=None,\n target_corruption_policy='abort',\n target_max_heal_hours=0,\n target_anchor_zone_hours=168,\n)\n```\n\nConfiguration for the multi-input forecasting pipeline.\n\nThis class manages all configuration parameters for the multi-input task,\nincluding training/prediction intervals, data sources, and feature\nengineering specifications. 
All parameters can be customized during\ninitialization or used with sensible defaults.\n\n``country_code`` serves as the single ISO country code used for both\nAPI queries and holiday feature generation.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------------------------------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|\n| country_code | [str](`str`) | ISO 3166-1 alpha-2 country code (e.g. ``\"DE\"``). Used for both API queries and holiday feature generation. | `'DE'` |\n| periods | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[Period](`spotforecast2_safe.data.Period`)\\]\\] | List of Period objects defining cyclical feature encodings. | `None` |\n| lags_consider | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[int](`int`)\\]\\] | List of lag values to consider for feature selection. | `None` |\n| train_size | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Time window for training data. | `None` |\n| end_train_default | [str](`str`) | Default end date for training period (ISO format with timezone). | `'2025-12-31 00:00+00:00'` |\n| delta_val | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Validation window size. 
| `None` |\n| predict_size | [int](`int`) | Number of hours to predict ahead. | `24` |\n| cv_block_size | [int](`int`) \\| None | Cross-validation test-block width in hours. Defaults to ``None``, meaning the CV uses ``predict_size``. Set to a fixed value (e.g. ``24``) to decouple the cross-validation horizon from a render-dependent live ``predict_size``. | `None` |\n| refit_size | [int](`int`) | Number of days between model refits. | `7` |\n| random_state | [int](`int`) | Random seed for reproducibility. | `314159` |\n| n_hyperparameters_trials | [int](`int`) | Number of trials for hyperparameter optimization. | `20` |\n| data_filename | [str](`str`) | Path to the interim merged data file. | `'interim/energy_load.csv'` |\n| targets | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | List of target column names to train models for. When ``None`` (default), no targets are pre-selected; set this attribute after loading the dataset (e.g. ``config.targets = df.columns.tolist()``). Replaces standalone ``TARGETS`` and ``target_columns`` variables in pipeline scripts, providing a single source of truth for the active target set. | `None` |\n| use_outlier_detection | [bool](`bool`) | If True, apply IsolationForest-based outlier removal. | `True` |\n| contamination | [float](`float`) | Proportion of outliers for IsolationForest (0 < contamination < 0.5). | `0.01` |\n| imputation_method | [str](`str`) | Gap-filling strategy — ``\"weighted\"`` (n2n-style rolling weights) or ``\"linear\"`` (linear interpolation). | `'weighted'` |\n| window_size | [int](`int`) | Rolling window size in hours for gap detection (weighted imputation). | `72` |\n| use_exogenous_features | [bool](`bool`) | If True, build weather/calendar/day-night/holiday features. | `True` |\n| latitude | [float](`float`) | Latitude of the target location in decimal degrees. | `51.5136` |\n| longitude | [float](`float`) | Longitude of the target location in decimal degrees. 
| `7.4653` |\n| timezone | [str](`str`) | IANA timezone string for the target location (e.g. ``\"Europe/Berlin\"``). | `'UTC'` |\n| state | [str](`str`) | ISO 3166-2 subdivision code for regional holidays (e.g. ``\"NW\"``). | `'NW'` |\n| include_weather_windows | [bool](`bool`) | If True, include rolling weather-window features. | `False` |\n| include_holiday_features | [bool](`bool`) | If True, include public-holiday indicator features. | `False` |\n| include_holiday_adjacency_features | [bool](`bool`) | If True, include Brückentag and before/after-holiday indicators (``is_brueckentag``, ``is_before_holiday``, ``is_after_holiday``). Defaults to ``False``. | `False` |\n| poly_features_degree | [int](`int`) | Polynomial-interaction degree. ``1`` (default) generates no interactions; ``2`` adds pairwise bilinear terms; ``3+`` higher order. | `1` |\n| max_poly_features | [int](`int`) | Cap on polynomial interaction columns; only the top ``max_poly_features`` ranked by mutual information with the target are kept (``<= 0`` disables). Defaults to ``10``. | `10` |\n| poly_mi_n_jobs | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Parallel jobs for the mutual-information ranking that enforces ``max_poly_features``. ``-1`` (default) uses all cores; ``None`` runs single-threaded. Parallelism does not change the selection. | `-1` |\n| poly_mi_sample_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Row cap for that ranking; longer series are scored on a reproducible random subsample of this size (seeded by ``random_state``), which can change which borderline columns make the top K. ``None`` scores every row (the pre-15.8 behaviour). Defaults to ``4000``. | `4000` |\n| index_name | [str](`str`) | Name assigned to the datetime column when the index is reset. Defaults to ``\"DateTime\"``. 
| `'DateTime'` |\n| bounds | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[tuple](`tuple`)\\]\\] | Per-column outlier bounds as a list of ``(lower, upper)`` tuples, one entry per target column. ``None`` until set. | `None` |\n| verbose | [bool](`bool`) | If ``True``, enable verbose output for pipeline steps. Defaults to ``False``. | `False` |\n| cache_home | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Path to the cache directory. ``None`` means the library default (``~/spotforecast2_cache/``) is used. | `None` |\n| n_trials_optuna | [int](`int`) | Number of Optuna Bayesian-search trials for hyperparameter optimization (task 3). Defaults to ``15``. | `15` |\n| n_trials_spotoptim | [int](`int`) | Number of SpotOptim surrogate-search trials (task 4). Defaults to ``10``. | `10` |\n| n_initial_spotoptim | [int](`int`) | Number of initial random evaluations for SpotOptim (task 4). Defaults to ``5``. | `5` |\n| n_jobs_spotoptim | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Worker count for SpotOptim's parallel (steady-state) evaluation. ``None`` (default) runs sequentially; ``-1`` uses all CPU cores; a positive integer pins the worker count. Parallel tuning is faster but, being steady-state, changes the search trajectory, so the tuned result is not bit-identical to a sequential run even with a fixed ``random_state``. | `None` |\n| warm_start_lags | [bool](`bool`) | When True, the SpotOptim task injects ``lags_consider`` as a candidate lag set and seeds the optimizer's first evaluation with it. Defaults to ``False``. | `False` |\n| task | [str](`str`) | Active prediction task — one of ``\"lazy\"``, ``\"training\"``, ``\"optuna\"``, or ``\"spotoptim\"``. Defaults to ``\"lazy\"``. | `'lazy'` |\n| agg_weights | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[float](`float`)\\]\\] | Per-target aggregation weights used when combining individual target forecasts into a single weighted sum. 
The list must contain one weight per entry in ``targets`` (in the same order). Positive values add the target's contribution; negative values invert it. Slice the list to ``agg_weights[:len(targets)]`` when only a subset of targets is active. Defaults to ``None`` (no weights pre-defined; set after loading the dataset). | `None` |\n| auto_save_models | [bool](`bool`) | Whether ``BaseTask._run_strategy`` should persist fitted forecasters to ``/models/`` after every training run. Defaults to ``True`` so that saved models are immediately available for ``PredictTask`` without an explicit ``save_models()`` call. | `True` |\n| data_frame_name | [str](`str`) | Identifier for the active dataset. Used by ``BaseTask`` to name cache subdirectories, model files, and the per-dataset log file. Defaults to ``\"default\"``. | `'default'` |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Policy for handling Open-Meteo fetch failures inside ``BaseTask.build_exogenous_features``. ``\"raise\"`` (default) aborts the pipeline with a ``WeatherFetchError`` and preserves the safety-critical fail-safe semantics. ``\"skip\"`` logs a warning and continues with empty weather features so the rest of the pipeline can run without the Open-Meteo dependency. | `'raise'` |\n| exog_max_gap_hours | [int](`int`) | Maximum length, in hours, of a contiguous run of missing exogenous-provider values healed before the provider is rejected. Interior gaps are time-interpolated; leading/trailing edge gaps are back-/forward-filled. ``0`` (default) keeps the strict fail-safe (any gap raises). Healed runs are logged with count and span. Only already-published day-ahead vintages are involved, so healing is leakage-clean (CR-3). | `0` |\n| exog_max_tail_gap_hours | [int](`int`) | Extended healing budget, in hours, applied exclusively to the trailing-edge NaN run (the run containing the last index timestamp). 
The effective tail budget is ``max(exog_max_gap_hours, exog_max_tail_gap_hours)``. The canonical use case is the ENTSO-E day-ahead publication frontier: the last published vintage is zero-order-held forward to the forecast horizon without touching interior gaps (CR-3-clean). When ``exog_max_tail_gap_hours <= exog_max_gap_hours`` the parameter is inert (the interior budget already covers the tail) and a warning is logged. Defaults to ``0``. | `0` |\n| exog_provider_window | [Literal](`typing.Literal`)\\[\\'full\\', \\'train\\'\\] | Span the exogenous providers are validated against. ``\"full\"`` (default) requires coverage of the entire ``data_start``→``cov_end`` request, matching prior behaviour. ``\"train\"`` validates only the consumed window ``[start_train_ts, cov_end]``, tolerating missing values before the training window. Honoured by the MultiTask pipeline; the forecaster-wrapper path currently always validates the full span. | `'full'` |\n\n## Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|------------------------------------|----------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| country_code | [str](`str`) | ISO country code for API queries and holiday generation. |\n| periods | [List](`typing.List`)\\[[Period](`spotforecast2_safe.data.Period`)\\] | Cyclical feature encoding specifications. |\n| lags_consider | [List](`typing.List`)\\[[int](`int`)\\] | Lag values for autoregressive features. |\n| train_size | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Training data window. |\n| end_train_default | [str](`str`) | Default training end date. |\n| delta_val | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Validation window. |\n| predict_size | [int](`int`) | Prediction horizon in hours. 
|\n| refit_size | [int](`int`) | Refit interval in days. |\n| random_state | [int](`int`) | Random seed. |\n| n_hyperparameters_trials | [int](`int`) | Hyperparameter tuning trials. |\n| targets | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | Active target column names. ``None`` until explicitly set from the loaded dataset. |\n| use_outlier_detection | [bool](`bool`) | IsolationForest outlier removal toggle. |\n| contamination | [float](`float`) | IsolationForest contamination fraction. |\n| imputation_method | [str](`str`) | Gap-filling strategy (``\"weighted\"`` or ``\"linear\"``). |\n| window_size | [int](`int`) | Rolling window size for weighted imputation. |\n| use_exogenous_features | [bool](`bool`) | Exogenous feature construction toggle. |\n| latitude | [float](`float`) | Location latitude. |\n| longitude | [float](`float`) | Location longitude. |\n| timezone | [str](`str`) | IANA timezone string. |\n| state | [str](`str`) | Subdivision code for regional holidays. |\n| include_weather_windows | [bool](`bool`) | Weather-window feature toggle. |\n| include_holiday_features | [bool](`bool`) | Holiday feature toggle. |\n| include_holiday_adjacency_features | [bool](`bool`) | Brückentag and before/after-holiday indicator toggle. Defaults to ``False``. |\n| poly_features_degree | [int](`int`) | Polynomial-interaction degree (1 = off). |\n| max_poly_features | [int](`int`) | Cap on kept ``poly_*`` columns (top-K by MI). |\n| poly_mi_n_jobs | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Parallel jobs for the MI ranking (``-1`` = all cores; selection-invariant). |\n| poly_mi_sample_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Row cap for the MI ranking (``None`` = score every row). |\n| include_covid_infection_rate | [bool](`bool`) | Append the bundled RKI German national COVID-19 7-day incidence as an exogenous regressor. 
|\n| include_entsoe_forecast_load | [bool](`bool`) | Append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior. |\n| include_entsoe_renewable_forecast | [bool](`bool`) | Append the ENTSO-E day-ahead wind/solar generation forecast. |\n| include_entsoe_net_load | [bool](`bool`) | Append the ENTSO-E day-ahead net load (Forecasted Load minus wind/solar forecast). |\n| include_entsoe_day_ahead_price | [bool](`bool`) | Append the ENTSO-E day-ahead spot price (DE/LU). |\n| index_name | [str](`str`) | Datetime column name used when resetting the index. |\n| bounds | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[tuple](`tuple`)\\]\\] | Per-column outlier bounds ``(lower, upper)``. |\n| verbose | [bool](`bool`) | Verbose output toggle. |\n| cache_home | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Path to the cache directory. |\n| n_trials_optuna | [int](`int`) | Number of Optuna hyperparameter-search trials. |\n| n_trials_spotoptim | [int](`int`) | Number of SpotOptim search trials. |\n| n_initial_spotoptim | [int](`int`) | Number of initial SpotOptim evaluations. |\n| n_jobs_spotoptim | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Worker count for SpotOptim's parallel (steady-state) evaluation (``None`` = sequential, ``-1`` = all cores). |\n| warm_start_lags | [bool](`bool`) | Seed the SpotOptim search with ``lags_consider``. |\n| task | [str](`str`) | Active prediction task (``\"lazy\"``, ``\"training\"``, ``\"optuna\"``, or ``\"spotoptim\"``). |\n| agg_weights | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[float](`float`)\\]\\] | Per-target aggregation weights. One weight per entry in ``targets``; positive values add, negative values invert the target's contribution. ``None`` until set. |\n| auto_save_models | [bool](`bool`) | Whether to auto-persist fitted forecasters after each training run. |\n| data_frame_name | [str](`str`) | Active-dataset identifier used for cache and log-file naming. 
|\n| number_folds | [int](`int`) | Cross-validation fold count for tuning tasks. |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Open-Meteo fetch-failure policy: ``\"raise\"`` aborts, ``\"skip\"`` continues without weather. |\n| on_exog_provider_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Exog-provider failure policy in ``ExogBuilder.build``: ``\"raise\"`` (default) propagates the ``ExogProviderError``; ``\"skip\"`` logs and omits the failing provider's columns. |\n| exog_max_gap_hours | [int](`int`) | Maximum contiguous gap in hours that providers will heal before raising (0 = strict fail-safe). |\n| exog_provider_window | [Literal](`typing.Literal`)\\[\\'full\\', \\'train\\'\\] | Validation window for exog providers: ``\"full\"`` (default) or ``\"train\"``. |\n\n## Notes {.doc-section .doc-section-notes}\n\nThe default period configurations use specific `n_periods` to balance resolution and smoothing:\n- **Daily**: `n_periods=12` (24h) provides ~2h resolution, smoothing hourly noise and halving dimensionality.\n- **Weekly**: `n_periods` typically matches range (1:1) to distinguish day-of-week patterns.\n- **Yearly**: `n_periods=12` (365d) provides ~1 month resolution, capturing broad seasonal trends without overfitting.\n\nSee `docs/PERIOD_CONFIGURATION_RATIONALE.md` for a detailed analysis.\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#78a37d25 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti()\nprint(f\"country_code: {config.country_code}\")\nprint(f\"Predict size: {config.predict_size}\")\nprint(f\"Random state: {config.random_state}\")\nprint(f\"Targets (default): {config.targets}\")\nprint(f\"agg_weights (default): {config.agg_weights}\")\nprint(f\"index_name: {config.index_name}\")\nprint(f\"bounds: {config.bounds}\")\n\n# Set targets and bounds (user input that 
stays on the config)\nconfig.targets = [\"A\", \"B\", \"C\"]\nconfig.bounds = [(-2500, 4500), (-10, 3000)]\nprint(f\"Targets (after setting): {config.targets}\")\nprint(f\"bounds: {config.bounds}\")\n\n# Create custom configuration — country_code serves both API and holiday purposes\ncustom_config = ConfigMulti(\n country_code='FR',\n predict_size=48,\n random_state=42,\n targets=[\"A\", \"B\"],\n index_name=\"DateTime\",\n)\nprint(f\"country_code: {custom_config.country_code}\")\nprint(f\"Predict size: {custom_config.predict_size}\")\nprint(f\"Random state: {custom_config.random_state}\")\nprint(f\"Targets: {custom_config.targets}\")\n\n# Verify training window\nprint(f\"Training window: {config.train_size == pd.Timedelta(days=3 * 365)}\")\n\n# Check default periods\nprint(f\"Number of periods: {len(config.periods)}\")\nprint(f\"First period name: {config.periods[0].name}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: DE\nPredict size: 24\nRandom state: 314159\nTargets (default): None\nagg_weights (default): None\nindex_name: DateTime\nbounds: None\nTargets (after setting): ['A', 'B', 'C']\nbounds: [(-2500, 4500), (-10, 3000)]\ncountry_code: FR\nPredict size: 48\nRandom state: 42\nTargets: ['A', 'B']\nTraining window: True\nNumber of periods: 5\nFirst period name: daily\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [get_params](#spotforecast2_safe.configurator.config_multi.ConfigMulti.get_params) | Get parameters for this configuration object. |\n| [set_params](#spotforecast2_safe.configurator.config_multi.ConfigMulti.set_params) | Set the parameters of this configuration object. 
|\n\n### get_params { #spotforecast2_safe.configurator.config_multi.ConfigMulti.get_params }\n\n```python\nconfigurator.config_multi.ConfigMulti.get_params(deep=True)\n```\n\nGet parameters for this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|----------------|-----------------------------------------------------------------------------------------------------------|-----------|\n| deep | [bool](`bool`) | If True, will return the parameters for this configuration and contained sub-objects that are estimators. | `True` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|-----------------------------------------------------------|-------------------------------------------------------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Dictionary of parameter names mapped to their values. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#84f01597 .cell execution_count=2}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti(country_code=\"FR\")\np = config.get_params()\nprint(f\"country_code: {p['country_code']}\")\nprint(f\"Predict size: {p['predict_size']}\")\nprint(f\"Random state: {p['random_state']}\")\nprint(f\"index_name: {p['index_name']}\")\nprint(f\"bounds: {p['bounds']}\")\nprint(f\"agg_weights: {p['agg_weights']}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: FR\nPredict size: 24\nRandom state: 314159\nindex_name: DateTime\nbounds: None\nagg_weights: None\n```\n:::\n:::\n\n\n### set_params { #spotforecast2_safe.configurator.config_multi.ConfigMulti.set_params }\n\n```python\nconfigurator.config_multi.ConfigMulti.set_params(params=None, **kwargs)\n```\n\nSet the parameters of this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default 
|\n|----------|-----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Optional dictionary of parameter names mapped to their new values. | `None` |\n| **kwargs | [object](`object`) | Additional parameter names mapped to their new values. It supports configuring nested 'Period' objects using the `periods____` notation. | `{}` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|-------------|---------------------------------------------------------------------------|--------------------------------------------------------------------------------|\n| ConfigMulti | [ConfigMulti](`spotforecast2_safe.configurator.config_multi.ConfigMulti`) | The configuration instance with updated parameters (supports method chaining). 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#43f6aa06 .cell execution_count=3}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti()\n_ = config.set_params(country_code=\"FR\", predict_size=48)\nprint(f\"country_code: {config.country_code}\")\nprint(f\"Predict size: {config.predict_size}\")\nprint(f\"Random state: {config.random_state}\")\n\n# Deep parameter setting\n_ = config.set_params(periods__daily__n_periods=24)\nprint(next(p.n_periods for p in config.periods if p.name == \"daily\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: FR\nPredict size: 48\nRandom state: 314159\n24\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: configurator.config_multi.ConfigMulti\n---\n\n\n\n```python\nconfigurator.config_multi.ConfigMulti(\n country_code='DE',\n periods=None,\n lags_consider=None,\n train_size=None,\n end_train_default='2025-12-31 00:00+00:00',\n delta_val=None,\n predict_size=24,\n cv_block_size=None,\n refit_size=7,\n random_state=314159,\n n_hyperparameters_trials=20,\n data_filename='interim/energy_load.csv',\n targets=None,\n use_outlier_detection=True,\n contamination=0.01,\n imputation_method='weighted',\n window_size=72,\n imputation_window_size=None,\n use_exogenous_features=True,\n latitude=51.5136,\n longitude=7.4653,\n timezone='UTC',\n state='NW',\n include_weather_windows=False,\n include_holiday_features=False,\n include_holiday_adjacency_features=False,\n poly_features_degree=1,\n max_poly_features=10,\n poly_mi_n_jobs=-1,\n poly_mi_sample_size=4000,\n include_covid_infection_rate=False,\n include_entsoe_forecast_load=False,\n include_entsoe_renewable_forecast=False,\n include_entsoe_net_load=False,\n include_entsoe_day_ahead_price=False,\n index_name='DateTime',\n bounds=None,\n verbose=False,\n cache_home=None,\n n_trials_optuna=15,\n n_trials_spotoptim=10,\n n_initial_spotoptim=5,\n n_jobs_spotoptim=None,\n warm_start_lags=False,\n 
task='lazy',\n agg_weights=None,\n forecaster_factory=None,\n data_loader=None,\n test_data_loader=None,\n auto_save_models=True,\n data_frame_name='default',\n number_folds=10,\n on_weather_failure='raise',\n on_exog_provider_failure='raise',\n exog_max_gap_hours=0,\n exog_max_tail_gap_hours=0,\n exog_provider_window='full',\n target_qc_range_mw=None,\n target_qc_step_mw=None,\n target_qc_window_days=None,\n target_corruption_policy='abort',\n target_max_heal_hours=0,\n target_anchor_zone_hours=168,\n target_qc_deviation_mw=None,\n target_qc_deviation_ref=None,\n target_qc_deviation_slots=2,\n)\n```\n\nConfiguration for the multi-input forecasting pipeline.\n\nThis class manages all configuration parameters for the multi-input task,\nincluding training/prediction intervals, data sources, and feature\nengineering specifications. All parameters can be customized during\ninitialization or used with sensible defaults.\n\n``country_code`` serves as the single ISO country code used for both\nAPI queries and holiday feature generation.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------------------------------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|\n| country_code | [str](`str`) | ISO 3166-1 alpha-2 country code (e.g. ``\"DE\"``). 
Used for both API queries and holiday feature generation. | `'DE'` |\n| periods | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[Period](`spotforecast2_safe.data.Period`)\\]\\] | List of Period objects defining cyclical feature encodings. | `None` |\n| lags_consider | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[int](`int`)\\]\\] | List of lag values to consider for feature selection. | `None` |\n| train_size | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Time window for training data. | `None` |\n| end_train_default | [str](`str`) | Default end date for training period (ISO format with timezone). | `'2025-12-31 00:00+00:00'` |\n| delta_val | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Validation window size. | `None` |\n| predict_size | [int](`int`) | Number of hours to predict ahead. | `24` |\n| cv_block_size | [int](`int`) \\| None | Cross-validation test-block width in hours. Defaults to ``None``, meaning the CV uses ``predict_size``. Set to a fixed value (e.g. ``24``) to decouple the cross-validation horizon from a render-dependent live ``predict_size``. | `None` |\n| refit_size | [int](`int`) | Number of days between model refits. | `7` |\n| random_state | [int](`int`) | Random seed for reproducibility. | `314159` |\n| n_hyperparameters_trials | [int](`int`) | Number of trials for hyperparameter optimization. | `20` |\n| data_filename | [str](`str`) | Path to the interim merged data file. | `'interim/energy_load.csv'` |\n| targets | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | List of target column names to train models for. When ``None`` (default), no targets are pre-selected; set this attribute after loading the dataset (e.g. ``config.targets = df.columns.tolist()``). Replaces standalone ``TARGETS`` and ``target_columns`` variables in pipeline scripts, providing a single source of truth for the active target set. 
| `None` |\n| use_outlier_detection | [bool](`bool`) | If True, apply IsolationForest-based outlier removal. | `True` |\n| contamination | [float](`float`) | Proportion of outliers for IsolationForest (0 < contamination < 0.5). | `0.01` |\n| imputation_method | [str](`str`) | Gap-filling strategy — ``\"weighted\"`` (n2n-style rolling weights) or ``\"linear\"`` (linear interpolation). | `'weighted'` |\n| window_size | [int](`int`) | Rolling window size in hours for gap detection (weighted imputation). | `72` |\n| use_exogenous_features | [bool](`bool`) | If True, build weather/calendar/day-night/holiday features. | `True` |\n| latitude | [float](`float`) | Latitude of the target location in decimal degrees. | `51.5136` |\n| longitude | [float](`float`) | Longitude of the target location in decimal degrees. | `7.4653` |\n| timezone | [str](`str`) | IANA timezone string for the target location (e.g. ``\"Europe/Berlin\"``). | `'UTC'` |\n| state | [str](`str`) | ISO 3166-2 subdivision code for regional holidays (e.g. ``\"NW\"``). | `'NW'` |\n| include_weather_windows | [bool](`bool`) | If True, include rolling weather-window features. | `False` |\n| include_holiday_features | [bool](`bool`) | If True, include public-holiday indicator features. | `False` |\n| include_holiday_adjacency_features | [bool](`bool`) | If True, include Brückentag and before/after-holiday indicators (``is_brueckentag``, ``is_before_holiday``, ``is_after_holiday``). Defaults to ``False``. | `False` |\n| poly_features_degree | [int](`int`) | Polynomial-interaction degree. ``1`` (default) generates no interactions; ``2`` adds pairwise bilinear terms; ``3+`` higher order. | `1` |\n| max_poly_features | [int](`int`) | Cap on polynomial interaction columns; only the top ``max_poly_features`` ranked by mutual information with the target are kept (``<= 0`` disables). Defaults to ``10``. 
| `10` |\n| poly_mi_n_jobs | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Parallel jobs for the mutual-information ranking that enforces ``max_poly_features``. ``-1`` (default) uses all cores; ``None`` runs single-threaded. Parallelism does not change the selection. | `-1` |\n| poly_mi_sample_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Row cap for that ranking; longer series are scored on a reproducible random subsample of this size (seeded by ``random_state``), which can change which borderline columns make the top K. ``None`` scores every row (the pre-15.8 behaviour). Defaults to ``4000``. | `4000` |\n| index_name | [str](`str`) | Name assigned to the datetime column when the index is reset. Defaults to ``\"DateTime\"``. | `'DateTime'` |\n| bounds | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[tuple](`tuple`)\\]\\] | Per-column outlier bounds as a list of ``(lower, upper)`` tuples, one entry per target column. ``None`` until set. | `None` |\n| verbose | [bool](`bool`) | If ``True``, enable verbose output for pipeline steps. Defaults to ``False``. | `False` |\n| cache_home | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Path to the cache directory. ``None`` means the library default (``~/spotforecast2_cache/``) is used. | `None` |\n| n_trials_optuna | [int](`int`) | Number of Optuna Bayesian-search trials for hyperparameter optimization (task 3). Defaults to ``15``. | `15` |\n| n_trials_spotoptim | [int](`int`) | Number of SpotOptim surrogate-search trials (task 4). Defaults to ``10``. | `10` |\n| n_initial_spotoptim | [int](`int`) | Number of initial random evaluations for SpotOptim (task 4). Defaults to ``5``. | `5` |\n| n_jobs_spotoptim | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Worker count for SpotOptim's parallel (steady-state) evaluation. ``None`` (default) runs sequentially; ``-1`` uses all CPU cores; a positive integer pins the worker count. 
Parallel tuning is faster but, being steady-state, changes the search trajectory, so the tuned result is not bit-identical to a sequential run even with a fixed ``random_state``. | `None` |\n| warm_start_lags | [bool](`bool`) | When True, the SpotOptim task injects ``lags_consider`` as a candidate lag set and seeds the optimizer's first evaluation with it. Defaults to ``False``. | `False` |\n| task | [str](`str`) | Active prediction task — one of ``\"lazy\"``, ``\"training\"``, ``\"optuna\"``, or ``\"spotoptim\"``. Defaults to ``\"lazy\"``. | `'lazy'` |\n| agg_weights | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[float](`float`)\\]\\] | Per-target aggregation weights used when combining individual target forecasts into a single weighted sum. The list must contain one weight per entry in ``targets`` (in the same order). Positive values add the target's contribution; negative values invert it. Slice the list to ``agg_weights[:len(targets)]`` when only a subset of targets is active. Defaults to ``None`` (no weights pre-defined; set after loading the dataset). | `None` |\n| auto_save_models | [bool](`bool`) | Whether ``BaseTask._run_strategy`` should persist fitted forecasters to ``/models/`` after every training run. Defaults to ``True`` so that saved models are immediately available for ``PredictTask`` without an explicit ``save_models()`` call. | `True` |\n| data_frame_name | [str](`str`) | Identifier for the active dataset. Used by ``BaseTask`` to name cache subdirectories, model files, and the per-dataset log file. Defaults to ``\"default\"``. | `'default'` |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Policy for handling Open-Meteo fetch failures inside ``BaseTask.build_exogenous_features``. ``\"raise\"`` (default) aborts the pipeline with a ``WeatherFetchError`` and preserves the safety-critical fail-safe semantics. 
``\"skip\"`` logs a warning and continues with empty weather features so the rest of the pipeline can run without the Open-Meteo dependency. | `'raise'` |\n| exog_max_gap_hours | [int](`int`) | Maximum length, in hours, of a contiguous run of missing exogenous-provider values healed before the provider is rejected. Interior gaps are time-interpolated; leading/trailing edge gaps are back-/forward-filled. ``0`` (default) keeps the strict fail-safe (any gap raises). Healed runs are logged with count and span. Only already-published day-ahead vintages are involved, so healing is leakage-clean (CR-3). | `0` |\n| exog_max_tail_gap_hours | [int](`int`) | Extended healing budget, in hours, applied exclusively to the trailing-edge NaN run (the run containing the last index timestamp). The effective tail budget is ``max(exog_max_gap_hours, exog_max_tail_gap_hours)``. The canonical use case is the ENTSO-E day-ahead publication frontier: the last published vintage is zero-order-held forward to the forecast horizon without touching interior gaps (CR-3-clean). When ``exog_max_tail_gap_hours <= exog_max_gap_hours`` the parameter is inert (the interior budget already covers the tail) and a warning is logged. Defaults to ``0``. | `0` |\n| exog_provider_window | [Literal](`typing.Literal`)\\[\\'full\\', \\'train\\'\\] | Span the exogenous providers are validated against. ``\"full\"`` (default) requires coverage of the entire ``data_start``→``cov_end`` request, matching prior behaviour. ``\"train\"`` validates only the consumed window ``[start_train_ts, cov_end]``, tolerating missing values before the training window. Honoured by the MultiTask pipeline; the forecaster-wrapper path currently always validates the full span. 
| `'full'` |\n\n## Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|------------------------------------|----------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| country_code | [str](`str`) | ISO country code for API queries and holiday generation. |\n| periods | [List](`typing.List`)\\[[Period](`spotforecast2_safe.data.Period`)\\] | Cyclical feature encoding specifications. |\n| lags_consider | [List](`typing.List`)\\[[int](`int`)\\] | Lag values for autoregressive features. |\n| train_size | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Training data window. |\n| end_train_default | [str](`str`) | Default training end date. |\n| delta_val | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Validation window. |\n| predict_size | [int](`int`) | Prediction horizon in hours. |\n| refit_size | [int](`int`) | Refit interval in days. |\n| random_state | [int](`int`) | Random seed. |\n| n_hyperparameters_trials | [int](`int`) | Hyperparameter tuning trials. |\n| targets | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | Active target column names. ``None`` until explicitly set from the loaded dataset. |\n| use_outlier_detection | [bool](`bool`) | IsolationForest outlier removal toggle. |\n| contamination | [float](`float`) | IsolationForest contamination fraction. |\n| imputation_method | [str](`str`) | Gap-filling strategy (``\"weighted\"`` or ``\"linear\"``). |\n| window_size | [int](`int`) | Rolling window size for weighted imputation. |\n| use_exogenous_features | [bool](`bool`) | Exogenous feature construction toggle. |\n| latitude | [float](`float`) | Location latitude. |\n| longitude | [float](`float`) | Location longitude. |\n| timezone | [str](`str`) | IANA timezone string. 
|\n| state | [str](`str`) | Subdivision code for regional holidays. |\n| include_weather_windows | [bool](`bool`) | Weather-window feature toggle. |\n| include_holiday_features | [bool](`bool`) | Holiday feature toggle. |\n| include_holiday_adjacency_features | [bool](`bool`) | Brückentag and before/after-holiday indicator toggle. Defaults to ``False``. |\n| poly_features_degree | [int](`int`) | Polynomial-interaction degree (1 = off). |\n| max_poly_features | [int](`int`) | Cap on kept ``poly_*`` columns (top-K by MI). |\n| poly_mi_n_jobs | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Parallel jobs for the MI ranking (``-1`` = all cores; selection-invariant). |\n| poly_mi_sample_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Row cap for the MI ranking (``None`` = score every row). |\n| include_covid_infection_rate | [bool](`bool`) | Append the bundled RKI German national COVID-19 7-day incidence as an exogenous regressor. |\n| include_entsoe_forecast_load | [bool](`bool`) | Append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior. |\n| include_entsoe_renewable_forecast | [bool](`bool`) | Append the ENTSO-E day-ahead wind/solar generation forecast. |\n| include_entsoe_net_load | [bool](`bool`) | Append the ENTSO-E day-ahead net load (Forecasted Load minus wind/solar forecast). |\n| include_entsoe_day_ahead_price | [bool](`bool`) | Append the ENTSO-E day-ahead spot price (DE/LU). |\n| index_name | [str](`str`) | Datetime column name used when resetting the index. |\n| bounds | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[tuple](`tuple`)\\]\\] | Per-column outlier bounds ``(lower, upper)``. |\n| verbose | [bool](`bool`) | Verbose output toggle. |\n| cache_home | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Path to the cache directory. |\n| n_trials_optuna | [int](`int`) | Number of Optuna hyperparameter-search trials. |\n| n_trials_spotoptim | [int](`int`) | Number of SpotOptim search trials. 
|\n| n_initial_spotoptim | [int](`int`) | Number of initial SpotOptim evaluations. |\n| n_jobs_spotoptim | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Worker count for SpotOptim's parallel (steady-state) evaluation (``None`` = sequential, ``-1`` = all cores). |\n| warm_start_lags | [bool](`bool`) | Seed the SpotOptim search with ``lags_consider``. |\n| task | [str](`str`) | Active prediction task (``\"lazy\"``, ``\"training\"``, ``\"optuna\"``, or ``\"spotoptim\"``). |\n| agg_weights | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[float](`float`)\\]\\] | Per-target aggregation weights. One weight per entry in ``targets``; positive values add, negative values invert the target's contribution. ``None`` until set. |\n| auto_save_models | [bool](`bool`) | Whether to auto-persist fitted forecasters after each training run. |\n| data_frame_name | [str](`str`) | Active-dataset identifier used for cache and log-file naming. |\n| number_folds | [int](`int`) | Cross-validation fold count for tuning tasks. |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Open-Meteo fetch-failure policy: ``\"raise\"`` aborts, ``\"skip\"`` continues without weather. |\n| on_exog_provider_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Exog-provider failure policy in ``ExogBuilder.build``: ``\"raise\"`` (default) propagates the ``ExogProviderError``; ``\"skip\"`` logs and omits the failing provider's columns. |\n| exog_max_gap_hours | [int](`int`) | Maximum contiguous gap in hours that providers will heal before raising (0 = strict fail-safe). |\n| exog_provider_window | [Literal](`typing.Literal`)\\[\\'full\\', \\'train\\'\\] | Validation window for exog providers: ``\"full\"`` (default) or ``\"train\"``. 
|\n\n## Notes {.doc-section .doc-section-notes}\n\nThe default period configurations use specific `n_periods` to balance resolution and smoothing:\n- **Daily**: `n_periods=12` (24h) provides ~2h resolution, smoothing hourly noise and halving dimensionality.\n- **Weekly**: `n_periods` typically matches range (1:1) to distinguish day-of-week patterns.\n- **Yearly**: `n_periods=12` (365d) provides ~1 month resolution, capturing broad seasonal trends without overfitting.\n\nSee `docs/PERIOD_CONFIGURATION_RATIONALE.md` for a detailed analysis.\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#c56e3366 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti()\nprint(f\"country_code: {config.country_code}\")\nprint(f\"Predict size: {config.predict_size}\")\nprint(f\"Random state: {config.random_state}\")\nprint(f\"Targets (default): {config.targets}\")\nprint(f\"agg_weights (default): {config.agg_weights}\")\nprint(f\"index_name: {config.index_name}\")\nprint(f\"bounds: {config.bounds}\")\n\n# Set targets and bounds (user input that stays on the config)\nconfig.targets = [\"A\", \"B\", \"C\"]\nconfig.bounds = [(-2500, 4500), (-10, 3000)]\nprint(f\"Targets (after setting): {config.targets}\")\nprint(f\"bounds: {config.bounds}\")\n\n# Create custom configuration — country_code serves both API and holiday purposes\ncustom_config = ConfigMulti(\n country_code='FR',\n predict_size=48,\n random_state=42,\n targets=[\"A\", \"B\"],\n index_name=\"DateTime\",\n)\nprint(f\"country_code: {custom_config.country_code}\")\nprint(f\"Predict size: {custom_config.predict_size}\")\nprint(f\"Random state: {custom_config.random_state}\")\nprint(f\"Targets: {custom_config.targets}\")\n\n# Verify training window\nprint(f\"Training window: {config.train_size == pd.Timedelta(days=3 * 365)}\")\n\n# Check default periods\nprint(f\"Number of periods: 
{len(config.periods)}\")\nprint(f\"First period name: {config.periods[0].name}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: DE\nPredict size: 24\nRandom state: 314159\nTargets (default): None\nagg_weights (default): None\nindex_name: DateTime\nbounds: None\nTargets (after setting): ['A', 'B', 'C']\nbounds: [(-2500, 4500), (-10, 3000)]\ncountry_code: FR\nPredict size: 48\nRandom state: 42\nTargets: ['A', 'B']\nTraining window: True\nNumber of periods: 5\nFirst period name: daily\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [get_params](#spotforecast2_safe.configurator.config_multi.ConfigMulti.get_params) | Get parameters for this configuration object. |\n| [set_params](#spotforecast2_safe.configurator.config_multi.ConfigMulti.set_params) | Set the parameters of this configuration object. |\n\n### get_params { #spotforecast2_safe.configurator.config_multi.ConfigMulti.get_params }\n\n```python\nconfigurator.config_multi.ConfigMulti.get_params(deep=True)\n```\n\nGet parameters for this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|----------------|-----------------------------------------------------------------------------------------------------------|-----------|\n| deep | [bool](`bool`) | If True, will return the parameters for this configuration and contained sub-objects that are estimators. | `True` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|-----------------------------------------------------------|-------------------------------------------------------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Dictionary of parameter names mapped to their values. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#0734b515 .cell execution_count=2}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti(country_code=\"FR\")\np = config.get_params()\nprint(f\"country_code: {p['country_code']}\")\nprint(f\"Predict size: {p['predict_size']}\")\nprint(f\"Random state: {p['random_state']}\")\nprint(f\"index_name: {p['index_name']}\")\nprint(f\"bounds: {p['bounds']}\")\nprint(f\"agg_weights: {p['agg_weights']}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: FR\nPredict size: 24\nRandom state: 314159\nindex_name: DateTime\nbounds: None\nagg_weights: None\n```\n:::\n:::\n\n\n### set_params { #spotforecast2_safe.configurator.config_multi.ConfigMulti.set_params }\n\n```python\nconfigurator.config_multi.ConfigMulti.set_params(params=None, **kwargs)\n```\n\nSet the parameters of this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|----------|-----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Optional dictionary of parameter names mapped to their new values. | `None` |\n| **kwargs | [object](`object`) | Additional parameter names mapped to their new values. It supports configuring nested 'Period' objects using the `periods____` notation. 
| `{}` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|-------------|---------------------------------------------------------------------------|--------------------------------------------------------------------------------|\n| ConfigMulti | [ConfigMulti](`spotforecast2_safe.configurator.config_multi.ConfigMulti`) | The configuration instance with updated parameters (supports method chaining). |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#80d62ef1 .cell execution_count=3}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti()\n_ = config.set_params(country_code=\"FR\", predict_size=48)\nprint(f\"country_code: {config.country_code}\")\nprint(f\"Predict size: {config.predict_size}\")\nprint(f\"Random state: {config.random_state}\")\n\n# Deep parameter setting\n_ = config.set_params(periods__daily__n_periods=24)\nprint(next(p.n_periods for p in config.periods if p.name == \"daily\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: FR\nPredict size: 48\nRandom state: 314159\n24\n```\n:::\n:::\n\n\n", "supporting": [ "configurator.config_multi.ConfigMulti_files/figure-html" ], diff --git a/_freeze/docs/reference/preprocessing.target_corruption.apply_target_corruption_policy/execute-results/html.json b/_freeze/docs/reference/preprocessing.target_corruption.apply_target_corruption_policy/execute-results/html.json index e6678094..42c9bdff 100644 --- a/_freeze/docs/reference/preprocessing.target_corruption.apply_target_corruption_policy/execute-results/html.json +++ b/_freeze/docs/reference/preprocessing.target_corruption.apply_target_corruption_policy/execute-results/html.json @@ -1,10 +1,10 @@ { - "hash": "c4552001740d4891f73965c8f9d72558", + "hash": "e3c446fcf6322a2af4d37e462a9d11a9", "result": { "engine": "jupyter", - "markdown": "---\ntitle: 
preprocessing.target_corruption.apply_target_corruption_policy\n---\n\n\n\n```python\npreprocessing.target_corruption.apply_target_corruption_policy(\n df,\n *,\n targets,\n policy,\n range_mw,\n step_mw,\n window_days,\n max_heal_hours,\n anchor_zone_hours,\n cutoff,\n logger,\n)\n```\n\nApply the configured corruption policy to the native-cadence frame.\n\nThis is the single entry point for the target-corruption sub-pipeline.\nIt runs :func:`detect_target_corruption` and then dispatches to the\n``policy`` branch. The function always acts only on target columns;\nexogenous columns are never modified.\n\nPolicy semantics:\n\n- **noop** (detector inert or no flags): returns the **same** ``df``\n object (no copy) and a report with ``action=\"noop\"``.\n- **abort**: raises :class:`~spotforecast2_safe.exceptions.TargetCorruptionError`\n with the span list and operator guidance.\n- **heal**: refuses and raises when (a) any flagged slot lies within\n ``anchor_zone_hours`` of ``cutoff``, or (b) ``n_flagged_hours >\n max_heal_hours``. Otherwise sets flagged target slots to ``NaN`` so\n that ``apply_imputation(\"weighted_interp\")`` can interpolate and\n zero-weight them.\n- **truncate**: sets ALL target columns to ``NaN`` from\n ``first_flagged_hour`` onward. The existing 16.3.1 trailing-clamp in\n ``prepare_data`` then retracts ``data_end`` to the new last observed\n target.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------------|---------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| df | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Native-cadence ``DataFrame`` indexed by a ``DatetimeIndex``. | _required_ |\n| targets | [Sequence](`typing.Sequence`)\\[[str](`str`)\\] | Target column names. 
| _required_ |\n| policy | [str](`str`) | One of ``\"abort\"``, ``\"heal\"``, ``\"truncate\"``. | _required_ |\n| range_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Range-rule threshold (MW); ``None`` skips that rule. | _required_ |\n| step_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Step-rule threshold (MW); ``None`` skips that rule. | _required_ |\n| window_days | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Look-back window for the detector (days); ``None`` makes the detector inert. | _required_ |\n| max_heal_hours | [int](`int`) | Maximum number of flagged hours the ``\"heal\"`` policy will accept. ``0`` effectively disables healing. | _required_ |\n| anchor_zone_hours | [int](`int`) | Hours before ``cutoff`` that are protected from healing. Flagged slots inside this zone force a refusal. Default is ``168`` (one week). | _required_ |\n| cutoff | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timestamp](`pandas.Timestamp`)\\] | The effective training cutoff timestamp used for the anchor-zone check. ``None`` disables the zone check. | _required_ |\n| logger | [logging](`logging`).[Logger](`logging.Logger`) | Standard-library logger for WARNING/INFO messages. 
| _required_ |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Tuple of ``(df_out, report)`` where ``df_out`` is either the |\n| | [TargetCorruptionReport](`spotforecast2_safe.preprocessing.target_corruption.TargetCorruptionReport`) | original ``df`` object (noop) or a mutated copy (heal/truncate), |\n| | [Tuple](`typing.Tuple`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`), [TargetCorruptionReport](`spotforecast2_safe.preprocessing.target_corruption.TargetCorruptionReport`)\\] | and ``report`` is a :class:`TargetCorruptionReport`. |\n\n## Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|--------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|\n| | [TargetCorruptionError](`spotforecast2_safe.exceptions.TargetCorruptionError`) | On ``policy=\"abort\"`` when flags are found, or on ``policy=\"heal\"`` when the heal guard refuses. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#924efd88 .cell execution_count=1}\n``` {.python .cell-code}\nimport logging\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.preprocessing.target_corruption import (\n apply_target_corruption_policy,\n)\n\nlog = logging.getLogger(\"demo\")\nidx = pd.date_range(\"2026-06-03\", periods=96, freq=\"15min\", tz=\"UTC\")\nvals = [55_000.0] * 96\nvals[5] = 44_000.0 # 11 GW step -> flags the 00:00 hour\ndf = pd.DataFrame({\"load\": vals}, index=idx)\n\ndf_out, report = apply_target_corruption_policy(\n df,\n targets=[\"load\"],\n policy=\"truncate\",\n range_mw=5_000,\n step_mw=8_000,\n window_days=3,\n max_heal_hours=0,\n anchor_zone_hours=168,\n cutoff=None,\n logger=log,\n)\nassert report.fired\nassert report.action == \"truncate\"\n# All target slots from first_flagged_hour onward are NaN\nnanned = df_out.loc[report.first_flagged_hour:, \"load\"]\nassert nanned.isna().all()\nprint(\"action:\", report.action, \"flagged hours:\", report.n_flagged_hours)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\ntarget_corruption[truncate]: corrupt target tail from 2026-06-03 01:00:00+00:00; NaN-ed 1 hour(s), trailing clamp will retract data_end\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\naction: truncate flagged hours: 1\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: preprocessing.target_corruption.apply_target_corruption_policy\n---\n\n\n\n```python\npreprocessing.target_corruption.apply_target_corruption_policy(\n df,\n *,\n targets,\n policy,\n range_mw,\n step_mw,\n window_days,\n max_heal_hours,\n anchor_zone_hours,\n cutoff,\n logger,\n deviation_mw=None,\n deviation_ref=None,\n deviation_slots=2,\n)\n```\n\nApply the configured corruption policy to the native-cadence frame.\n\nThis is the single entry point for the target-corruption sub-pipeline.\nIt runs :func:`detect_target_corruption` and then dispatches to the\n``policy`` branch. 
The function always acts only on target columns;\nexogenous columns are never modified.\n\nPolicy semantics:\n\n- **noop** (detector inert or no flags): returns the **same** ``df``\n object (no copy) and a report with ``action=\"noop\"``.\n- **abort**: raises :class:`~spotforecast2_safe.exceptions.TargetCorruptionError`\n with the span list and operator guidance.\n- **heal**: refuses and raises when (a) any flagged slot lies within\n ``anchor_zone_hours`` of ``cutoff``, or (b) ``n_flagged_hours >\n max_heal_hours``. Otherwise sets flagged target slots to ``NaN`` so\n that ``apply_imputation(\"weighted_interp\")`` can interpolate and\n zero-weight them.\n- **truncate**: sets ALL target columns to ``NaN`` from\n ``first_flagged_hour`` onward. The existing 16.3.1 trailing-clamp in\n ``prepare_data`` then retracts ``data_end`` to the new last observed\n target.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------------|---------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| df | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Native-cadence ``DataFrame`` indexed by a ``DatetimeIndex``. | _required_ |\n| targets | [Sequence](`typing.Sequence`)\\[[str](`str`)\\] | Target column names. | _required_ |\n| policy | [str](`str`) | One of ``\"abort\"``, ``\"heal\"``, ``\"truncate\"``. | _required_ |\n| range_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Range-rule threshold (MW); ``None`` skips that rule. | _required_ |\n| step_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Step-rule threshold (MW); ``None`` skips that rule. 
| _required_ |\n| window_days | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Look-back window for the detector (days); ``None`` makes the detector inert. | _required_ |\n| max_heal_hours | [int](`int`) | Maximum number of flagged hours the ``\"heal\"`` policy will accept. ``0`` effectively disables healing. | _required_ |\n| anchor_zone_hours | [int](`int`) | Hours before ``cutoff`` that are protected from healing. Flagged slots inside this zone force a refusal. Default is ``168`` (one week). | _required_ |\n| cutoff | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timestamp](`pandas.Timestamp`)\\] | The effective training cutoff timestamp used for the anchor-zone check. ``None`` disables the zone check. | _required_ |\n| logger | [logging](`logging`).[Logger](`logging.Logger`) | Standard-library logger for WARNING/INFO messages. | _required_ |\n| deviation_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Deviation-rule threshold (MW, positive magnitude): flags sustained dropouts ``target − reference < -deviation_mw``. ``None`` skips that rule. See `detect_target_corruption`. | `None` |\n| deviation_ref | [Optional](`typing.Optional`)\\[[str](`str`)\\] | Reference column name for the deviation rule (e.g. ``\"Forecasted Load\"``). When enabling this rule, scope ``targets`` to the actuals only (e.g. ``[\"Actual Load\"]``) so that ``heal``/``truncate`` NaN only the actual and the reference survives as an exogenous prior. | `None` |\n| deviation_slots | [int](`int`) | Minimum consecutive sub-hourly slots for the deviation rule (default ``2``). 
| `2` |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Tuple of ``(df_out, report)`` where ``df_out`` is either the |\n| | [TargetCorruptionReport](`spotforecast2_safe.preprocessing.target_corruption.TargetCorruptionReport`) | original ``df`` object (noop) or a mutated copy (heal/truncate), |\n| | [Tuple](`typing.Tuple`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`), [TargetCorruptionReport](`spotforecast2_safe.preprocessing.target_corruption.TargetCorruptionReport`)\\] | and ``report`` is a :class:`TargetCorruptionReport`. |\n\n## Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|--------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|\n| | [TargetCorruptionError](`spotforecast2_safe.exceptions.TargetCorruptionError`) | On ``policy=\"abort\"`` when flags are found, or on ``policy=\"heal\"`` when the heal guard refuses. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#501d3ebe .cell execution_count=1}\n``` {.python .cell-code}\nimport logging\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.preprocessing.target_corruption import (\n apply_target_corruption_policy,\n)\n\nlog = logging.getLogger(\"demo\")\nidx = pd.date_range(\"2026-06-03\", periods=96, freq=\"15min\", tz=\"UTC\")\nvals = [55_000.0] * 96\nvals[5] = 44_000.0 # 11 GW step at 01:15 -> flags the 01:00 hour\ndf = pd.DataFrame({\"load\": vals}, index=idx)\n\ndf_out, report = apply_target_corruption_policy(\n df,\n targets=[\"load\"],\n policy=\"truncate\",\n range_mw=5_000,\n step_mw=8_000,\n window_days=3,\n max_heal_hours=0,\n anchor_zone_hours=168,\n cutoff=None,\n logger=log,\n)\nassert report.fired\nassert report.action == \"truncate\"\n# All target slots from first_flagged_hour onward are NaN\nnanned = df_out.loc[report.first_flagged_hour:, \"load\"]\nassert nanned.isna().all()\nprint(\"action:\", report.action, \"flagged hours:\", report.n_flagged_hours)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\ntarget_corruption[truncate]: corrupt target tail from 2026-06-03 01:00:00+00:00; NaN-ed 1 hour(s), trailing clamp will retract data_end\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\naction: truncate flagged hours: 1\n```\n:::\n:::\n\n\n", "supporting": [ - "preprocessing.target_corruption.apply_target_corruption_policy_files" + "preprocessing.target_corruption.apply_target_corruption_policy_files/figure-html" ], "filters": [], "includes": {} diff --git a/_freeze/docs/reference/preprocessing.target_corruption.detect_target_corruption/execute-results/html.json b/_freeze/docs/reference/preprocessing.target_corruption.detect_target_corruption/execute-results/html.json index c73a6158..8f383817 100644 --- a/_freeze/docs/reference/preprocessing.target_corruption.detect_target_corruption/execute-results/html.json +++ 
b/_freeze/docs/reference/preprocessing.target_corruption.detect_target_corruption/execute-results/html.json @@ -1,10 +1,10 @@ { - "hash": "7dd9705960361ff99aefd892719fe3e3", + "hash": "4c054038f0742ab143305463c939dea7", "result": { "engine": "jupyter", - "markdown": "---\ntitle: preprocessing.target_corruption.detect_target_corruption\n---\n\n\n\n```python\npreprocessing.target_corruption.detect_target_corruption(\n df,\n *,\n targets,\n range_mw,\n step_mw,\n window_days,\n)\n```\n\nDetect physically-impossible target-column corruption in the native frame.\n\nApplies two independent rules on the native-cadence (e.g. 15-min) series\nwithin a rolling look-back window ending at the last observed target\ntimestamp:\n\n- **Range rule** (sub-hourly cadence only): an hour is flagged when\n ``intra-hour max - intra-hour min > range_mw`` for any target column.\n Vacuously skipped for hourly-or-coarser cadence (intra-hour range is\n undefined on a single slot per hour).\n- **Step rule**: an hour is flagged when any ``|adjacent-slot diff|``\n that *touches* that hour exceeds ``step_mw`` for any target column.\n Applies to all cadences.\n\nFlags are OR-ed across target columns. ALL native-cadence slots of a\nflagged calendar hour are marked ``True`` in the returned boolean\n``Series``, so downstream NaN-ing operates on full hours rather than\nindividual sub-hourly slots.\n\nThe detector is **inert** (returns all-``False``) unless ``window_days``\nis set AND at least one of ``range_mw`` / ``step_mw`` is set. 
If the\ndata is shorter than ``window_days``, the window is clamped to\n``df.index.min()`` without raising.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------|---------------------------------------------------|--------------------------------------------------------------------------------------------------------------|------------|\n| df | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Native-cadence ``DataFrame`` indexed by a ``DatetimeIndex``. Must contain all columns listed in ``targets``. | _required_ |\n| targets | [Sequence](`typing.Sequence`)\\[[str](`str`)\\] | Sequence of target column names to inspect. | _required_ |\n| range_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum allowed intra-hour range (MW). ``None`` skips the range rule. | _required_ |\n| step_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum allowed absolute adjacent-slot difference (MW). ``None`` skips the step rule. | _required_ |\n| window_days | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Number of days before the last observed target to include in the scan. ``None`` makes the detector inert. | _required_ |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------|--------------------------------------------------------------------|\n| | [pd](`pandas`).[Series](`pandas.Series`) | Boolean ``pd.Series`` aligned to ``df.index``. ``True`` means the |\n| | [pd](`pandas`).[Series](`pandas.Series`) | slot belongs to a flagged calendar hour. All-``False`` when the |\n| | [pd](`pandas`).[Series](`pandas.Series`) | detector is inert or no corruption is found. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#851d62c6 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.preprocessing.target_corruption import (\n detect_target_corruption,\n)\n\n# 15-min cadence; one GW dropout at 12:15 inside the window\nidx = pd.date_range(\"2026-06-03\", periods=48, freq=\"15min\", tz=\"UTC\")\nvals = [55_000.0] * 48\nvals[5] = 44_000.0 # 11 GW step drop -> flags 12:00 hour\ndf = pd.DataFrame({\"load\": vals}, index=idx)\n\nmask = detect_target_corruption(\n df, targets=[\"load\"], range_mw=5_000, step_mw=8_000, window_days=3\n)\n# Slots in the 12:00 hour (index 4-7) are flagged\nassert mask.iloc[4:8].all(), \"Slots in the flagged hour must be True\"\nassert not mask.iloc[8:].any(), \"Subsequent clean slots must be False\"\nprint(\"flagged:\", mask.sum(), \"slots\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nflagged: 4 slots\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: preprocessing.target_corruption.detect_target_corruption\n---\n\n\n\n```python\npreprocessing.target_corruption.detect_target_corruption(\n df,\n *,\n targets,\n range_mw,\n step_mw,\n window_days,\n deviation_mw=None,\n deviation_ref=None,\n deviation_slots=2,\n)\n```\n\nDetect physically-impossible target-column corruption in the native frame.\n\nApplies three independent rules on the native-cadence (e.g. 
15-min) series\nwithin a rolling look-back window ending at the last observed target\ntimestamp:\n\n- **Range rule** (sub-hourly cadence only): an hour is flagged when\n ``intra-hour max - intra-hour min > range_mw`` for any target column.\n Vacuously skipped for hourly-or-coarser cadence (intra-hour range is\n undefined on a single slot per hour).\n- **Step rule**: an hour is flagged when any ``|adjacent-slot diff|``\n that *touches* that hour exceeds ``step_mw`` for any target column.\n Applies to all cadences.\n- **Deviation rule** (dropout-only, all cadences): an hour is flagged\n when ``target − reference < -deviation_mw`` holds for at least\n ``deviation_slots`` *consecutive* native-cadence slots within the\n scan window, where the reference is a published companion column\n such as the ENTSO-E day-ahead ``\"Forecasted Load\"``. The rule is\n asymmetric by design: the known corruption class is exclusively a\n dropout *below* the day-ahead forecast, while actuals above the\n forecast are ordinary under-forecasting. ``NaN`` in either column\n yields a ``NaN`` difference, which compares ``False`` — so the\n publication-lag frontier (forecast published, actual not yet) never\n flags, and a data gap breaks a consecutive run. On hourly-or-coarser\n cadence the sustained requirement collapses to a single slot. The\n rule is silently skipped when ``deviation_ref`` is missing from the\n frame (mirroring how absent target columns are skipped).\n\nFlags are OR-ed across target columns. ALL native-cadence slots of a\nflagged calendar hour are marked ``True`` in the returned boolean\n``Series``, so downstream NaN-ing operates on full hours rather than\nindividual sub-hourly slots.\n\nThe detector is **inert** (returns all-``False``) unless ``window_days``\nis set AND at least one of ``range_mw`` / ``step_mw`` / ``deviation_mw``\nis set. 
If the data is shorter than ``window_days``, the window is\nclamped to ``df.index.min()`` without raising.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------------|---------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| df | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Native-cadence ``DataFrame`` indexed by a ``DatetimeIndex``. Must contain all columns listed in ``targets``. | _required_ |\n| targets | [Sequence](`typing.Sequence`)\\[[str](`str`)\\] | Sequence of target column names to inspect. | _required_ |\n| range_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum allowed intra-hour range (MW). ``None`` skips the range rule. | _required_ |\n| step_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum allowed absolute adjacent-slot difference (MW). ``None`` skips the step rule. | _required_ |\n| window_days | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Number of days before the last observed target to include in the scan. ``None`` makes the detector inert. | _required_ |\n| deviation_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum allowed dropout below the reference column (MW, positive magnitude): slots with ``target − reference < -deviation_mw`` are candidates. ``None`` skips the deviation rule. | `None` |\n| deviation_ref | [Optional](`typing.Optional`)\\[[str](`str`)\\] | Name of the reference column (e.g. ``\"Forecasted Load\"``). The rule is skipped when ``None`` or when the column is absent from ``df``. The reference column itself is never checked as a target by this rule. 
| `None` |\n| deviation_slots | [int](`int`) | Minimum number of *consecutive* sub-hourly slots the dropout must sustain before any hour is flagged (default ``2`` — a single-slot blip is more likely a metering glitch than the oscillating dropout class). Clamped to ``1`` on hourly-or-coarser cadence. | `2` |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------|--------------------------------------------------------------------|\n| | [pd](`pandas`).[Series](`pandas.Series`) | Boolean ``pd.Series`` aligned to ``df.index``. ``True`` means the |\n| | [pd](`pandas`).[Series](`pandas.Series`) | slot belongs to a flagged calendar hour. All-``False`` when the |\n| | [pd](`pandas`).[Series](`pandas.Series`) | detector is inert or no corruption is found. |\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#ed42fb8e .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.preprocessing.target_corruption import (\n detect_target_corruption,\n)\n\n# 15-min cadence; one 11 GW dropout at 01:15 inside the window\nidx = pd.date_range(\"2026-06-03\", periods=48, freq=\"15min\", tz=\"UTC\")\nvals = [55_000.0] * 48\nvals[5] = 44_000.0 # 11 GW step drop -> flags the 01:00 hour\ndf = pd.DataFrame({\"load\": vals}, index=idx)\n\nmask = detect_target_corruption(\n df, targets=[\"load\"], range_mw=5_000, step_mw=8_000, window_days=3\n)\n# Slots in the 01:00 hour (index 4-7) are flagged\nassert mask.iloc[4:8].all(), \"Slots in the flagged hour must be True\"\nassert not mask.iloc[8:].any(), \"Subsequent clean slots must be False\"\nprint(\"flagged:\", mask.sum(), \"slots\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nflagged: 4 slots\n```\n:::\n:::\n\n\n::: {#643afacd .cell execution_count=2}\n``` {.python .cell-code}\n# Deviation rule: a sub-threshold dropout the dynamics rules miss.\nimport pandas as pd\nimport numpy as np\nfrom 
spotforecast2_safe.preprocessing.target_corruption import (\n detect_target_corruption,\n)\n\nidx = pd.date_range(\"2026-06-07\", periods=16, freq=\"15min\", tz=\"UTC\")\nforecast = pd.Series(48_000.0, index=idx)\nactual = forecast.copy()\n# Two consecutive slots 11.6 GW below the forecast, stepping by\n# only 5.8 GW per slot — below a 6 GW step rule, no range breach.\nactual.iloc[4] = forecast.iloc[4] - 5_800.0\nactual.iloc[5] = forecast.iloc[5] - 11_600.0\nactual.iloc[6] = forecast.iloc[6] - 11_600.0\nactual.iloc[7] = forecast.iloc[7] - 5_800.0\n# Publication-lag frontier: forecast published, actual not yet.\nactual.iloc[12:] = np.nan\ndf = pd.DataFrame({\"Actual Load\": actual, \"Forecasted Load\": forecast})\n\ndyn_only = detect_target_corruption(\n df, targets=[\"Actual Load\"],\n range_mw=15_000, step_mw=6_000, window_days=3,\n)\nwith_dev = detect_target_corruption(\n df, targets=[\"Actual Load\"],\n range_mw=15_000, step_mw=6_000, window_days=3,\n deviation_mw=8_000, deviation_ref=\"Forecasted Load\",\n)\nassert not dyn_only.any(), \"dynamics rules miss the dropout\"\nassert with_dev.iloc[4:8].any(), \"deviation rule catches it\"\nassert not with_dev.iloc[12:].any(), \"NaN frontier never flags\"\nprint(\"dynamics-only:\", int(dyn_only.sum()), \"| with deviation:\",\n int(with_dev.sum()))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ndynamics-only: 0 | with deviation: 4\n```\n:::\n:::\n\n\n", "supporting": [ - "preprocessing.target_corruption.detect_target_corruption_files" + "preprocessing.target_corruption.detect_target_corruption_files/figure-html" ], "filters": [], "includes": {} diff --git a/docs/reference/configurator.config_entsoe.ConfigEntsoe.qmd b/docs/reference/configurator.config_entsoe.ConfigEntsoe.qmd index ecca62bf..ea56a48e 100644 --- a/docs/reference/configurator.config_entsoe.ConfigEntsoe.qmd +++ b/docs/reference/configurator.config_entsoe.ConfigEntsoe.qmd @@ -66,6 +66,9 @@ configurator.config_entsoe.ConfigEntsoe( 
target_corruption_policy='abort', target_max_heal_hours=0, target_anchor_zone_hours=168, + target_qc_deviation_mw=None, + target_qc_deviation_ref=None, + target_qc_deviation_slots=2, ) ``` diff --git a/docs/reference/configurator.config_multi.ConfigMulti.qmd b/docs/reference/configurator.config_multi.ConfigMulti.qmd index 56c47f46..6c657ad1 100644 --- a/docs/reference/configurator.config_multi.ConfigMulti.qmd +++ b/docs/reference/configurator.config_multi.ConfigMulti.qmd @@ -65,6 +65,9 @@ configurator.config_multi.ConfigMulti( target_corruption_policy='abort', target_max_heal_hours=0, target_anchor_zone_hours=168, + target_qc_deviation_mw=None, + target_qc_deviation_ref=None, + target_qc_deviation_slots=2, ) ``` diff --git a/docs/reference/preprocessing.target_corruption.apply_target_corruption_policy.qmd b/docs/reference/preprocessing.target_corruption.apply_target_corruption_policy.qmd index 5d22ff02..869372f8 100644 --- a/docs/reference/preprocessing.target_corruption.apply_target_corruption_policy.qmd +++ b/docs/reference/preprocessing.target_corruption.apply_target_corruption_policy.qmd @@ -13,6 +13,9 @@ preprocessing.target_corruption.apply_target_corruption_policy( anchor_zone_hours, cutoff, logger, + deviation_mw=None, + deviation_ref=None, + deviation_slots=2, ) ``` @@ -41,18 +44,21 @@ Policy semantics: ## Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|-------------------|---------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|------------| -| df | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Native-cadence ``DataFrame`` indexed by a ``DatetimeIndex``. | _required_ | -| targets | [Sequence](`typing.Sequence`)\[[str](`str`)\] | Target column names. | _required_ | -| policy | [str](`str`) | One of ``"abort"``, ``"heal"``, ``"truncate"``. 
| _required_ | -| range_mw | [Optional](`typing.Optional`)\[[float](`float`)\] | Range-rule threshold (MW); ``None`` skips that rule. | _required_ | -| step_mw | [Optional](`typing.Optional`)\[[float](`float`)\] | Step-rule threshold (MW); ``None`` skips that rule. | _required_ | -| window_days | [Optional](`typing.Optional`)\[[int](`int`)\] | Look-back window for the detector (days); ``None`` makes the detector inert. | _required_ | -| max_heal_hours | [int](`int`) | Maximum number of flagged hours the ``"heal"`` policy will accept. ``0`` effectively disables healing. | _required_ | -| anchor_zone_hours | [int](`int`) | Hours before ``cutoff`` that are protected from healing. Flagged slots inside this zone force a refusal. Default is ``168`` (one week). | _required_ | -| cutoff | [Optional](`typing.Optional`)\[[pd](`pandas`).[Timestamp](`pandas.Timestamp`)\] | The effective training cutoff timestamp used for the anchor-zone check. ``None`` disables the zone check. | _required_ | -| logger | [logging](`logging`).[Logger](`logging.Logger`) | Standard-library logger for WARNING/INFO messages. | _required_ | +| Name | Type | Description | Default | +|-------------------|---------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| +| df | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Native-cadence ``DataFrame`` indexed by a ``DatetimeIndex``. | _required_ | +| targets | [Sequence](`typing.Sequence`)\[[str](`str`)\] | Target column names. | _required_ | +| policy | [str](`str`) | One of ``"abort"``, ``"heal"``, ``"truncate"``. | _required_ | +| range_mw | [Optional](`typing.Optional`)\[[float](`float`)\] | Range-rule threshold (MW); ``None`` skips that rule. 
| _required_ | +| step_mw | [Optional](`typing.Optional`)\[[float](`float`)\] | Step-rule threshold (MW); ``None`` skips that rule. | _required_ | +| window_days | [Optional](`typing.Optional`)\[[int](`int`)\] | Look-back window for the detector (days); ``None`` makes the detector inert. | _required_ | +| max_heal_hours | [int](`int`) | Maximum number of flagged hours the ``"heal"`` policy will accept. ``0`` effectively disables healing. | _required_ | +| anchor_zone_hours | [int](`int`) | Hours before ``cutoff`` that are protected from healing. Flagged slots inside this zone force a refusal. Default is ``168`` (one week). | _required_ | +| cutoff | [Optional](`typing.Optional`)\[[pd](`pandas`).[Timestamp](`pandas.Timestamp`)\] | The effective training cutoff timestamp used for the anchor-zone check. ``None`` disables the zone check. | _required_ | +| logger | [logging](`logging`).[Logger](`logging.Logger`) | Standard-library logger for WARNING/INFO messages. | _required_ | +| deviation_mw | [Optional](`typing.Optional`)\[[float](`float`)\] | Deviation-rule threshold (MW, positive magnitude): flags sustained dropouts ``target − reference < -deviation_mw``. ``None`` skips that rule. See `detect_target_corruption`. | `None` | +| deviation_ref | [Optional](`typing.Optional`)\[[str](`str`)\] | Reference column name for the deviation rule (e.g. ``"Forecasted Load"``). When enabling this rule, scope ``targets`` to the actuals only (e.g. ``["Actual Load"]``) so that ``heal``/``truncate`` NaN only the actual and the reference survives as an exogenous prior. | `None` | +| deviation_slots | [int](`int`) | Minimum consecutive sub-hourly slots for the deviation rule (default ``2``). 
| `2` | ## Returns {.doc-section .doc-section-returns} diff --git a/docs/reference/preprocessing.target_corruption.detect_target_corruption.qmd b/docs/reference/preprocessing.target_corruption.detect_target_corruption.qmd index 8d8fdd67..07c58ab2 100644 --- a/docs/reference/preprocessing.target_corruption.detect_target_corruption.qmd +++ b/docs/reference/preprocessing.target_corruption.detect_target_corruption.qmd @@ -8,6 +8,9 @@ preprocessing.target_corruption.detect_target_corruption( range_mw, step_mw, window_days, + deviation_mw=None, + deviation_ref=None, + deviation_slots=2, ) ``` @@ -24,6 +27,20 @@ timestamp: - **Step rule**: an hour is flagged when any ``|adjacent-slot diff|`` that *touches* that hour exceeds ``step_mw`` for any target column. Applies to all cadences. +- **Deviation rule** (dropout-only, all cadences): an hour is flagged + when ``target − reference < -deviation_mw`` holds for at least + ``deviation_slots`` *consecutive* native-cadence slots within the + scan window, where the reference is a published companion column + such as the ENTSO-E day-ahead ``"Forecasted Load"``. The rule is + asymmetric by design: the known corruption class is exclusively a + dropout *below* the day-ahead forecast, while actuals above the + forecast are ordinary under-forecasting. ``NaN`` in either column + yields a ``NaN`` difference, which compares ``False`` — so the + publication-lag frontier (forecast published, actual not yet) never + flags, and a data gap breaks a consecutive run. On hourly-or-coarser + cadence the sustained requirement collapses to a single slot. The + rule is silently skipped when ``deviation_ref`` is missing from the + frame (mirroring how absent target columns are skipped). Flags are OR-ed across target columns. ALL native-cadence slots of a flagged calendar hour are marked ``True`` in the returned boolean @@ -31,19 +48,22 @@ flagged calendar hour are marked ``True`` in the returned boolean individual sub-hourly slots. 
The detector is **inert** (returns all-``False``) unless ``window_days`` -is set AND at least one of ``range_mw`` / ``step_mw`` is set. If the -data is shorter than ``window_days``, the window is clamped to -``df.index.min()`` without raising. +is set AND at least one of ``range_mw`` / ``step_mw`` / ``deviation_mw`` +is set. If the data is shorter than ``window_days``, the window is +clamped to ``df.index.min()`` without raising. ## Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|-------------|---------------------------------------------------|--------------------------------------------------------------------------------------------------------------|------------| -| df | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Native-cadence ``DataFrame`` indexed by a ``DatetimeIndex``. Must contain all columns listed in ``targets``. | _required_ | -| targets | [Sequence](`typing.Sequence`)\[[str](`str`)\] | Sequence of target column names to inspect. | _required_ | -| range_mw | [Optional](`typing.Optional`)\[[float](`float`)\] | Maximum allowed intra-hour range (MW). ``None`` skips the range rule. | _required_ | -| step_mw | [Optional](`typing.Optional`)\[[float](`float`)\] | Maximum allowed absolute adjacent-slot difference (MW). ``None`` skips the step rule. | _required_ | -| window_days | [Optional](`typing.Optional`)\[[int](`int`)\] | Number of days before the last observed target to include in the scan. ``None`` makes the detector inert. 
| _required_ | +| Name | Type | Description | Default | +|-----------------|---------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| +| df | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Native-cadence ``DataFrame`` indexed by a ``DatetimeIndex``. Must contain all columns listed in ``targets``. | _required_ | +| targets | [Sequence](`typing.Sequence`)\[[str](`str`)\] | Sequence of target column names to inspect. | _required_ | +| range_mw | [Optional](`typing.Optional`)\[[float](`float`)\] | Maximum allowed intra-hour range (MW). ``None`` skips the range rule. | _required_ | +| step_mw | [Optional](`typing.Optional`)\[[float](`float`)\] | Maximum allowed absolute adjacent-slot difference (MW). ``None`` skips the step rule. | _required_ | +| window_days | [Optional](`typing.Optional`)\[[int](`int`)\] | Number of days before the last observed target to include in the scan. ``None`` makes the detector inert. | _required_ | +| deviation_mw | [Optional](`typing.Optional`)\[[float](`float`)\] | Maximum allowed dropout below the reference column (MW, positive magnitude): slots with ``target − reference < -deviation_mw`` are candidates. ``None`` skips the deviation rule. | `None` | +| deviation_ref | [Optional](`typing.Optional`)\[[str](`str`)\] | Name of the reference column (e.g. ``"Forecasted Load"``). The rule is skipped when ``None`` or when the column is absent from ``df``. The reference column itself is never checked as a target by this rule. | `None` | +| deviation_slots | [int](`int`) | Minimum number of *consecutive* sub-hourly slots the dropout must sustain before any hour is flagged (default ``2`` — a single-slot blip is more likely a metering glitch than the oscillating dropout class). 
Clamped to ``1`` on hourly-or-coarser cadence. | `2` | ## Returns {.doc-section .doc-section-returns} @@ -75,4 +95,41 @@ mask = detect_target_corruption( assert mask.iloc[4:8].all(), "Slots in the flagged hour must be True" assert not mask.iloc[8:].any(), "Subsequent clean slots must be False" print("flagged:", mask.sum(), "slots") +``` + +```{python} +# Deviation rule: a sub-threshold dropout the dynamics rules miss. +import pandas as pd +import numpy as np +from spotforecast2_safe.preprocessing.target_corruption import ( + detect_target_corruption, +) + +idx = pd.date_range("2026-06-07", periods=16, freq="15min", tz="UTC") +forecast = pd.Series(48_000.0, index=idx) +actual = forecast.copy() +# Two consecutive slots 11.6 GW below the forecast, stepping by +# only 5.8 GW per slot — below a 6 GW step rule, no range breach. +actual.iloc[4] = forecast.iloc[4] - 5_800.0 +actual.iloc[5] = forecast.iloc[5] - 11_600.0 +actual.iloc[6] = forecast.iloc[6] - 11_600.0 +actual.iloc[7] = forecast.iloc[7] - 5_800.0 +# Publication-lag frontier: forecast published, actual not yet. 
+actual.iloc[12:] = np.nan +df = pd.DataFrame({"Actual Load": actual, "Forecasted Load": forecast}) + +dyn_only = detect_target_corruption( + df, targets=["Actual Load"], + range_mw=15_000, step_mw=6_000, window_days=3, +) +with_dev = detect_target_corruption( + df, targets=["Actual Load"], + range_mw=15_000, step_mw=6_000, window_days=3, + deviation_mw=8_000, deviation_ref="Forecasted Load", +) +assert not dyn_only.any(), "dynamics rules miss the dropout" +assert with_dev.iloc[4:8].any(), "deviation rule catches it" +assert not with_dev.iloc[12:].any(), "NaN frontier never flags" +print("dynamics-only:", int(dyn_only.sum()), "| with deviation:", + int(with_dev.sum())) ``` \ No newline at end of file diff --git a/src/spotforecast2_safe/configurator/_base_config.py b/src/spotforecast2_safe/configurator/_base_config.py index 5fac009c..b2a17dd9 100644 --- a/src/spotforecast2_safe/configurator/_base_config.py +++ b/src/spotforecast2_safe/configurator/_base_config.py @@ -355,3 +355,24 @@ def validate_config(config: object) -> None: raise ValueError( f"target_anchor_zone_hours must be >= 0; got {target_anchor_zone_hours}." ) + + target_qc_deviation_mw = getattr(config, "target_qc_deviation_mw", None) + if target_qc_deviation_mw is not None and target_qc_deviation_mw < 0: + raise ValueError( + f"target_qc_deviation_mw must be >= 0 if set; got {target_qc_deviation_mw}." + ) + + target_qc_deviation_ref = getattr(config, "target_qc_deviation_ref", None) + if target_qc_deviation_ref is not None and not isinstance( + target_qc_deviation_ref, str + ): + raise ValueError( + "target_qc_deviation_ref must be a column name (str) if set; " + f"got {target_qc_deviation_ref!r}." + ) + + target_qc_deviation_slots = getattr(config, "target_qc_deviation_slots", None) + if target_qc_deviation_slots is not None and target_qc_deviation_slots < 1: + raise ValueError( + f"target_qc_deviation_slots must be >= 1; got {target_qc_deviation_slots}." 
+ ) diff --git a/src/spotforecast2_safe/configurator/config_entsoe.py b/src/spotforecast2_safe/configurator/config_entsoe.py index 17fbb8bb..b5dcb483 100644 --- a/src/spotforecast2_safe/configurator/config_entsoe.py +++ b/src/spotforecast2_safe/configurator/config_entsoe.py @@ -285,6 +285,9 @@ class ConfigEntsoe: "target_corruption_policy", "target_max_heal_hours", "target_anchor_zone_hours", + "target_qc_deviation_mw", + "target_qc_deviation_ref", + "target_qc_deviation_slots", ) def __init__( @@ -375,18 +378,26 @@ def __init__( retrain_max_age: Optional[pd.Timedelta] = None, # Target-side corruption detector knobs. # Detector active only when target_qc_window_days AND at least one of - # target_qc_range_mw / target_qc_step_mw are set. Defaults are all - # None / off, so the pipeline is byte-identical to the pre-feature baseline. + # target_qc_range_mw / target_qc_step_mw / target_qc_deviation_mw are + # set. Defaults are all None / off, so the pipeline is byte-identical + # to the pre-feature baseline. # Recommended episode policy: "truncate" (auto-extends predict_size). # "heal" under the default anchor_zone_hours=168 with a <=7-day QC window # never engages (refusal by design — lowering the zone is a deliberate # operator decision). + # The deviation rule (dropout-only, vs a published reference column + # such as "Forecasted Load") catches corruption that stays below the + # dynamics thresholds; when enabling it, scope `targets` to the + # actuals so heal/truncate leave the reference column intact. 
target_qc_range_mw: Optional[float] = None, target_qc_step_mw: Optional[float] = None, target_qc_window_days: Optional[int] = None, target_corruption_policy: str = "abort", target_max_heal_hours: int = 0, target_anchor_zone_hours: int = 168, + target_qc_deviation_mw: Optional[float] = None, + target_qc_deviation_ref: Optional[str] = None, + target_qc_deviation_slots: int = 2, ): """Initialize ConfigEntsoe with specified or default parameters.""" self.country_code = country_code @@ -510,6 +521,9 @@ def __init__( self.target_corruption_policy = target_corruption_policy self.target_max_heal_hours = target_max_heal_hours self.target_anchor_zone_hours = target_anchor_zone_hours + self.target_qc_deviation_mw = target_qc_deviation_mw + self.target_qc_deviation_ref = target_qc_deviation_ref + self.target_qc_deviation_slots = target_qc_deviation_slots validate_config(self) def get_params(self, deep: bool = True) -> Dict[str, object]: diff --git a/src/spotforecast2_safe/configurator/config_multi.py b/src/spotforecast2_safe/configurator/config_multi.py index 5c8c4cb7..9657e354 100644 --- a/src/spotforecast2_safe/configurator/config_multi.py +++ b/src/spotforecast2_safe/configurator/config_multi.py @@ -338,6 +338,9 @@ class ConfigMulti: "target_corruption_policy", "target_max_heal_hours", "target_anchor_zone_hours", + "target_qc_deviation_mw", + "target_qc_deviation_ref", + "target_qc_deviation_slots", ) def __init__( @@ -426,18 +429,26 @@ def __init__( exog_provider_window: Literal["full", "train"] = "full", # Target-side corruption detector knobs. # Detector active only when target_qc_window_days AND at least one of - # target_qc_range_mw / target_qc_step_mw are set. Defaults are all - # None / off, so the pipeline is byte-identical to the pre-feature baseline. + # target_qc_range_mw / target_qc_step_mw / target_qc_deviation_mw are + # set. Defaults are all None / off, so the pipeline is byte-identical + # to the pre-feature baseline. 
# Recommended episode policy: "truncate" (auto-extends predict_size). # "heal" under the default anchor_zone_hours=168 with a <=7-day QC window # never engages (refusal by design — lowering the zone is a deliberate # operator decision). + # The deviation rule (dropout-only, vs a published reference column + # such as "Forecasted Load") catches corruption that stays below the + # dynamics thresholds; when enabling it, scope `targets` to the + # actuals so heal/truncate leave the reference column intact. target_qc_range_mw: Optional[float] = None, target_qc_step_mw: Optional[float] = None, target_qc_window_days: Optional[int] = None, target_corruption_policy: str = "abort", target_max_heal_hours: int = 0, target_anchor_zone_hours: int = 168, + target_qc_deviation_mw: Optional[float] = None, + target_qc_deviation_ref: Optional[str] = None, + target_qc_deviation_slots: int = 2, ): """Initialize ConfigMulti with specified or default parameters.""" self.country_code = country_code @@ -563,6 +574,9 @@ def __init__( self.target_corruption_policy = target_corruption_policy self.target_max_heal_hours = target_max_heal_hours self.target_anchor_zone_hours = target_anchor_zone_hours + self.target_qc_deviation_mw = target_qc_deviation_mw + self.target_qc_deviation_ref = target_qc_deviation_ref + self.target_qc_deviation_slots = target_qc_deviation_slots validate_config(self) def get_params(self, deep: bool = True) -> Dict[str, object]: diff --git a/src/spotforecast2_safe/multitask/base.py b/src/spotforecast2_safe/multitask/base.py index b669f31b..4a7c9580 100644 --- a/src/spotforecast2_safe/multitask/base.py +++ b/src/spotforecast2_safe/multitask/base.py @@ -575,6 +575,9 @@ def prepare_data( _tc_window = getattr(self.config, "target_qc_window_days", None) _tc_max_heal = getattr(self.config, "target_max_heal_hours", 0) _tc_anchor = getattr(self.config, "target_anchor_zone_hours", 168) + _tc_dev = getattr(self.config, "target_qc_deviation_mw", None) + _tc_dev_ref = 
getattr(self.config, "target_qc_deviation_ref", None) + _tc_dev_slots = getattr(self.config, "target_qc_deviation_slots", 2) # Derive the effective cutoff for the anchor-zone check: mirror the # end_train_default / last_ts logic above (ADR §2 step 1). @@ -605,7 +608,7 @@ def prepare_data( # imputation-forcing side-effect — is never entered for a # half-configured detector. _tc_configured = _tc_window is not None and ( - _tc_range is not None or _tc_step is not None + _tc_range is not None or _tc_step is not None or _tc_dev is not None ) if _tc_targets_present and _tc_configured: # The heal policy needs apply_imputation's "weighted_interp" @@ -626,6 +629,9 @@ def prepare_data( anchor_zone_hours=_tc_anchor, cutoff=_tc_cutoff, logger=self.logger, + deviation_mw=_tc_dev, + deviation_ref=_tc_dev_ref, + deviation_slots=_tc_dev_slots, ) else: self._tc_force_weighted_interp = False diff --git a/src/spotforecast2_safe/preprocessing/target_corruption.py b/src/spotforecast2_safe/preprocessing/target_corruption.py index aa1a1611..cd4ccda9 100644 --- a/src/spotforecast2_safe/preprocessing/target_corruption.py +++ b/src/spotforecast2_safe/preprocessing/target_corruption.py @@ -6,16 +6,23 @@ ENTSO-E 15-min Actual Load occasionally contains physically-impossible multi-GW intra-hour dropouts (e.g. 2026-06-03..05 incident) that poison the recursive forecaster's anchor and lag features. This module provides -a shared dynamics detector over the native-cadence (15-min) series and a +a shared detector over the native-cadence (15-min) series — two dynamics +rules (intra-hour range, adjacent-slot step) plus a deviation rule against +a published reference column such as the day-ahead load forecast — and a policy knob to handle detected corruptions. +The deviation rule exists because the dropout class can stay *below* the +dynamics thresholds while sitting many GW under the day-ahead forecast +(observed 2026-06-07: 5.6 GW steps under a 6 GW limit at Actual − Forecast += −11.6 GW). 
Level-vs-reference is the discriminator dynamics cannot see. + Empirical gating: ENTSO-E does not retroactively correct gross intra-hour dropouts of this class. The recommended episode policy is ``"truncate"``; ``"heal"`` is safe only for short interior spans far from the anchor zone (and is refused by design when either condition is violated); ``"abort"`` surfaces the error immediately for manual review. -By default all six knobs are off / at safe defaults, so the pipeline is +By default all knobs are off / at safe defaults, so the pipeline is byte-identical to the pre-feature baseline. Public API (re-exported from ``preprocessing.__init__``): @@ -126,6 +133,9 @@ def detect_target_corruption( range_mw: Optional[float], step_mw: Optional[float], window_days: Optional[int], + deviation_mw: Optional[float] = None, + deviation_ref: Optional[str] = None, + deviation_slots: int = 2, ) -> pd.Series: """Detect physically-impossible target-column corruption in the native frame. @@ -140,6 +150,20 @@ def detect_target_corruption( - **Step rule**: an hour is flagged when any ``|adjacent-slot diff|`` that *touches* that hour exceeds ``step_mw`` for any target column. Applies to all cadences. + - **Deviation rule** (dropout-only, all cadences): an hour is flagged + when ``target − reference < -deviation_mw`` holds for at least + ``deviation_slots`` *consecutive* native-cadence slots within the + scan window, where the reference is a published companion column + such as the ENTSO-E day-ahead ``"Forecasted Load"``. The rule is + asymmetric by design: the known corruption class is exclusively a + dropout *below* the day-ahead forecast, while actuals above the + forecast are ordinary under-forecasting. ``NaN`` in either column + yields a ``NaN`` difference, which compares ``False`` — so the + publication-lag frontier (forecast published, actual not yet) never + flags, and a data gap breaks a consecutive run. 
On hourly-or-coarser + cadence the sustained requirement collapses to a single slot. The + rule is silently skipped when ``deviation_ref`` is missing from the + frame (mirroring how absent target columns are skipped). Flags are OR-ed across target columns. ALL native-cadence slots of a flagged calendar hour are marked ``True`` in the returned boolean @@ -147,9 +171,9 @@ def detect_target_corruption( individual sub-hourly slots. The detector is **inert** (returns all-``False``) unless ``window_days`` - is set AND at least one of ``range_mw`` / ``step_mw`` is set. If the - data is shorter than ``window_days``, the window is clamped to - ``df.index.min()`` without raising. + is set AND at least one of ``range_mw`` / ``step_mw`` / ``deviation_mw`` + is set. If the data is shorter than ``window_days``, the window is + clamped to ``df.index.min()`` without raising. Args: df: Native-cadence ``DataFrame`` indexed by a ``DatetimeIndex``. @@ -161,6 +185,19 @@ def detect_target_corruption( ``None`` skips the step rule. window_days: Number of days before the last observed target to include in the scan. ``None`` makes the detector inert. + deviation_mw: Maximum allowed dropout below the reference column + (MW, positive magnitude): slots with + ``target − reference < -deviation_mw`` are candidates. + ``None`` skips the deviation rule. + deviation_ref: Name of the reference column (e.g. + ``"Forecasted Load"``). The rule is skipped when ``None`` or + when the column is absent from ``df``. The reference column + itself is never checked as a target by this rule. + deviation_slots: Minimum number of *consecutive* sub-hourly slots + the dropout must sustain before any hour is flagged (default + ``2`` — a single-slot blip is more likely a metering glitch + than the oscillating dropout class). Clamped to ``1`` on + hourly-or-coarser cadence. Returns: Boolean ``pd.Series`` aligned to ``df.index``. 
``True`` means the @@ -189,9 +226,48 @@ def detect_target_corruption( assert not mask.iloc[8:].any(), "Subsequent clean slots must be False" print("flagged:", mask.sum(), "slots") ``` + + ```{python} + # Deviation rule: a sub-threshold dropout the dynamics rules miss. + import pandas as pd + import numpy as np + from spotforecast2_safe.preprocessing.target_corruption import ( + detect_target_corruption, + ) + + idx = pd.date_range("2026-06-07", periods=16, freq="15min", tz="UTC") + forecast = pd.Series(48_000.0, index=idx) + actual = forecast.copy() + # Two consecutive slots 11.6 GW below the forecast, stepping by + # only 5.8 GW per slot — below a 6 GW step rule, no range breach. + actual.iloc[4] = forecast.iloc[4] - 5_800.0 + actual.iloc[5] = forecast.iloc[5] - 11_600.0 + actual.iloc[6] = forecast.iloc[6] - 11_600.0 + actual.iloc[7] = forecast.iloc[7] - 5_800.0 + # Publication-lag frontier: forecast published, actual not yet. + actual.iloc[12:] = np.nan + df = pd.DataFrame({"Actual Load": actual, "Forecasted Load": forecast}) + + dyn_only = detect_target_corruption( + df, targets=["Actual Load"], + range_mw=15_000, step_mw=6_000, window_days=3, + ) + with_dev = detect_target_corruption( + df, targets=["Actual Load"], + range_mw=15_000, step_mw=6_000, window_days=3, + deviation_mw=8_000, deviation_ref="Forecasted Load", + ) + assert not dyn_only.any(), "dynamics rules miss the dropout" + assert with_dev.iloc[4:8].any(), "deviation rule catches it" + assert not with_dev.iloc[12:].any(), "NaN frontier never flags" + print("dynamics-only:", int(dyn_only.sum()), "| with deviation:", + int(with_dev.sum())) + ``` """ # Detector is inert when the caller has not configured it. - if window_days is None or (range_mw is None and step_mw is None): + if window_days is None or ( + range_mw is None and step_mw is None and deviation_mw is None + ): return pd.Series(False, index=df.index) # Derive cadence via mode of consecutive diffs (robust to irregular index). 
@@ -207,7 +283,17 @@ def detect_target_corruption( return pd.Series(False, index=df.index) window_start = max(df.index.min(), last_target_ts - pd.Timedelta(days=window_days)) - scan = df.loc[window_start:last_target_ts, list(targets)] + # The deviation rule needs its reference column inside the scan slice + # (same window, same UTC view) even when it is not a target. + scan_cols = list(targets) + if ( + deviation_mw is not None + and deviation_ref is not None + and deviation_ref in df.columns + and deviation_ref not in scan_cols + ): + scan_cols.append(deviation_ref) + scan = df.loc[window_start:last_target_ts, scan_cols] # DST safety: floor("h")/resample("h") on a tz-aware non-UTC index can # raise on ambiguous wall times (fall-back hour). All hour bucketing is # therefore done on a UTC view; positions are unchanged, and for the @@ -250,6 +336,42 @@ def detect_target_corruption( flagged_hours.add(ts.floor("h")) flagged_hours.add((ts - cadence).floor("h")) + # --- deviation rule (dropout vs reference, all cadences) --- + # Silently skipped when the ref column is absent, mirroring the + # `col not in scan.columns: continue` treatment of target columns. + if ( + deviation_mw is not None + and deviation_ref is not None + and deviation_ref in scan.columns + ): + ref = scan[deviation_ref] + # A single hourly slot already aggregates the dropout; the + # consecutive-slots requirement is meaningful sub-hourly only. + eff_slots = max(1, int(deviation_slots)) if is_sub_hourly else 1 + for col in targets: + if col not in scan.columns or col == deviation_ref: + continue + # NaN in either column -> NaN difference -> compares False, so + # the publication-lag frontier (forecast published, actual not + # yet) never flags and a gap breaks a consecutive run. + below = (scan[col] - ref < -float(deviation_mw)).to_numpy() + if eff_slots > 1: + # sustained[i]: slots i-eff_slots+1 .. i are all below. 
+ sustained = below.copy() + for k in range(1, eff_slots): + sustained[k:] &= below[:-k] + sustained[: eff_slots - 1] = False + # Flag every slot of each qualifying run, not just its end. + run_mask = sustained.copy() + for k in range(1, eff_slots): + run_mask[:-k] |= sustained[k:] + else: + run_mask = below + # A deviation is a level property of the slot itself — no + # predecessor-hour semantics (unlike the step rule). + for ts in scan.index[run_mask]: + flagged_hours.add(ts.floor("h")) + if not flagged_hours: return pd.Series(False, index=df.index) @@ -279,6 +401,9 @@ def apply_target_corruption_policy( anchor_zone_hours: int, cutoff: Optional[pd.Timestamp], logger: logging.Logger, + deviation_mw: Optional[float] = None, + deviation_ref: Optional[str] = None, + deviation_slots: int = 2, ) -> Tuple[pd.DataFrame, TargetCorruptionReport]: """Apply the configured corruption policy to the native-cadence frame. @@ -319,6 +444,16 @@ def apply_target_corruption_policy( cutoff: The effective training cutoff timestamp used for the anchor-zone check. ``None`` disables the zone check. logger: Standard-library logger for WARNING/INFO messages. + deviation_mw: Deviation-rule threshold (MW, positive magnitude): + flags sustained dropouts ``target − reference < -deviation_mw``. + ``None`` skips that rule. See `detect_target_corruption`. + deviation_ref: Reference column name for the deviation rule (e.g. + ``"Forecasted Load"``). When enabling this rule, scope + ``targets`` to the actuals only (e.g. ``["Actual Load"]``) so + that ``heal``/``truncate`` NaN only the actual and the + reference survives as an exogenous prior. + deviation_slots: Minimum consecutive sub-hourly slots for the + deviation rule (default ``2``). 
Returns: Tuple of ``(df_out, report)`` where ``df_out`` is either the @@ -371,6 +506,9 @@ def apply_target_corruption_policy( range_mw=range_mw, step_mw=step_mw, window_days=window_days, + deviation_mw=deviation_mw, + deviation_ref=deviation_ref, + deviation_slots=deviation_slots, ) if not flag_mask.any(): diff --git a/tests/preprocessing/test_target_corruption.py b/tests/preprocessing/test_target_corruption.py index 000f9709..911e3bc1 100644 --- a/tests/preprocessing/test_target_corruption.py +++ b/tests/preprocessing/test_target_corruption.py @@ -744,3 +744,240 @@ def test_step_at_first_slot_flags_both_hours(self): assert mask.loc[ slots_h01 ].all(), "All slots in the step's own hour (01:00) must be flagged." + + +# --------------------------------------------------------------------------- +# Deviation rule (dropout vs reference column) +# --------------------------------------------------------------------------- + +# Chapter-style thresholds: the injected dropout is deliberately +# SUB-THRESHOLD for the dynamics rules (steps 5.8 GW < 6 GW, intra-hour +# range 5.8 GW < 8 GW) while sitting 11.6 GW under the reference — the +# 2026-06-07 frontier pattern that motivated the rule. +DEV_RANGE_MW = 8_000 +DEV_STEP_MW = 6_000 +DEV_MW = 8_000 + + +def _make_deviation_frame( + *, + offsets=(5_800, 11_600, 11_600, 5_800), + dropout_day: int = 2, + dropout_hour: int = 7, + nan_tail_slots: int = 8, +) -> pd.DataFrame: + """Two-column frame: constant forecast, actual with a dropout + NaN tail. + + Constant base values keep the injected dynamics exact (deterministic + sub-threshold steps); ``offsets`` are subtracted from the forecast at + the four slots of ``dropout_hour`` on ``dropout_day``. The last + ``nan_tail_slots`` actual slots are NaN while the forecast continues — + the ENTSO-E publication-lag frontier. 
+ """ + idx = pd.date_range( + "2026-06-05", periods=3 * 24 * SLOTS_PER_HOUR, freq=CADENCE, tz="UTC" + ) + forecast = pd.Series(BASE_MW, index=idx) + actual = forecast.copy() + start = dropout_day * 24 * SLOTS_PER_HOUR + dropout_hour * SLOTS_PER_HOUR + for i, off in enumerate(offsets): + actual.iloc[start + i] = BASE_MW - off + if nan_tail_slots: + actual.iloc[-nan_tail_slots:] = np.nan + return pd.DataFrame({"Actual Load": actual, "Forecasted Load": forecast}) + + +class TestDetectorDeviation: + """Deviation rule: sustained dropout below a published reference.""" + + def _detect(self, df, **kwargs): + params = dict( + targets=["Actual Load"], + range_mw=DEV_RANGE_MW, + step_mw=DEV_STEP_MW, + window_days=3, + deviation_mw=DEV_MW, + deviation_ref="Forecasted Load", + ) + params.update(kwargs) + return detect_target_corruption(df, **params) + + def test_dynamics_rules_miss_the_dropout(self): + """Control: range/step alone must NOT flag the sub-threshold dropout.""" + df = _make_deviation_frame() + mask = self._detect(df, deviation_mw=None, deviation_ref=None) + assert not mask.any(), "sub-threshold dropout must evade dynamics rules" + + def test_deviation_rule_flags_sustained_dropout(self): + df = _make_deviation_frame() + mask = self._detect(df) + dropout_hour = pd.Timestamp("2026-06-07 07:00:00", tz="UTC") + slots = df.index[ + (df.index >= dropout_hour) + & (df.index < dropout_hour + pd.Timedelta(hours=1)) + ] + assert mask.loc[slots].all(), "deviation rule must flag the dropout hour" + # Nothing else flags: the surrounding clean hours stay clean. 
+ assert mask.sum() == SLOTS_PER_HOUR + + def test_detector_inert_with_only_deviation_and_no_window(self): + df = _make_deviation_frame() + mask = detect_target_corruption( + df, + targets=["Actual Load"], + range_mw=None, + step_mw=None, + window_days=None, + deviation_mw=DEV_MW, + deviation_ref="Forecasted Load", + ) + assert not mask.any(), "window_days=None must keep the detector inert" + + def test_deviation_only_configuration_activates_detector(self): + """deviation_mw alone (range/step None) must activate the detector.""" + df = _make_deviation_frame() + mask = self._detect(df, range_mw=None, step_mw=None) + assert mask.any() + + def test_single_slot_dip_not_flagged_with_slots_2(self): + df = _make_deviation_frame(offsets=(11_600,)) + # One isolated slot 11.6 GW below the forecast: the single offset + # creates 11.6 GW steps and an 11.6 GW intra-hour range, so both + # dynamics rules are disabled to isolate the deviation_slots semantics. + mask = self._detect(df, range_mw=None, step_mw=None, deviation_slots=2) + assert not mask.any(), "single-slot blip must not flag at deviation_slots=2" + + def test_single_slot_dip_flagged_with_slots_1(self): + df = _make_deviation_frame(offsets=(11_600,)) + mask = self._detect(df, range_mw=None, step_mw=None, deviation_slots=1) + assert mask.any(), "deviation_slots=1 must flag a single-slot dropout" + + def test_missing_ref_column_is_inert(self): + df = _make_deviation_frame() + mask = self._detect(df, deviation_ref="NoSuchColumn") + assert not mask.any(), "absent reference column must skip the rule silently" + + def test_ref_none_is_inert(self): + df = _make_deviation_frame() + mask = self._detect(df, deviation_ref=None) + assert not mask.any() + + def test_frontier_nan_not_flagged(self): + """Publication-lag tail (forecast published, actual NaN) never flags.""" + df = _make_deviation_frame() + mask = self._detect(df) + tail = df.index[df["Actual Load"].isna()] + assert len(tail) > 0 + assert not mask.loc[ + tail + ].any(), "NaN-actual frontier
slots must never be flagged" + + def test_nan_breaks_consecutive_run(self): + """A NaN between two below-threshold slots must break the run.""" + df = _make_deviation_frame(offsets=(11_600, 11_600)) + start = 2 * 24 * SLOTS_PER_HOUR + 7 * SLOTS_PER_HOUR + df.iloc[start + 1, df.columns.get_loc("Actual Load")] = np.nan + mask = self._detect(df, range_mw=None, step_mw=None, deviation_slots=2) + assert not mask.any(), "a NaN gap must break the consecutive-slot requirement" + + def test_positive_deviation_not_flagged(self): + """Actuals far ABOVE the forecast are under-forecasting, not corruption.""" + df = _make_deviation_frame(offsets=(-12_000, -12_000, -12_000, -12_000)) + mask = self._detect(df, range_mw=None, step_mw=None) + assert not mask.any(), "the deviation rule is dropout-only by design" + + def test_hourly_cadence_clamps_slots_to_one(self): + """On hourly data the sustained requirement collapses to one slot.""" + idx = pd.date_range("2026-06-05", periods=3 * 24, freq="h", tz="UTC") + forecast = pd.Series(BASE_MW, index=idx) + actual = forecast.copy() + actual.iloc[60] = BASE_MW - 12_000 # one corrupt hour + df = pd.DataFrame({"Actual Load": actual, "Forecasted Load": forecast}) + mask = detect_target_corruption( + df, + targets=["Actual Load"], + range_mw=None, + step_mw=None, + window_days=3, + deviation_mw=DEV_MW, + deviation_ref="Forecasted Load", + deviation_slots=2, + ) + assert mask.any(), "hourly cadence: a single corrupt hour must flag" + + def test_truncate_keeps_reference_column(self): + """policy='truncate' with scoped targets NaNs the actual only.""" + df = _make_deviation_frame() + n_forecast_obs = int(df["Forecasted Load"].notna().sum()) + df_out, report = apply_target_corruption_policy( + df, + targets=["Actual Load"], + policy="truncate", + range_mw=DEV_RANGE_MW, + step_mw=DEV_STEP_MW, + window_days=3, + max_heal_hours=0, + anchor_zone_hours=168, + cutoff=None, + logger=logging.getLogger("test-deviation"), + deviation_mw=DEV_MW, + 
deviation_ref="Forecasted Load", + ) + assert report.fired + assert report.action == "truncate" + assert report.first_flagged_hour == pd.Timestamp( + "2026-06-07 07:00:00", tz="UTC" + ) + assert ( + df_out.loc[report.first_flagged_hour :, "Actual Load"].isna().all() + ), "actual must be NaN from the first flagged hour onward" + assert ( + int(df_out["Forecasted Load"].notna().sum()) == n_forecast_obs + ), "the reference column must survive truncate untouched" + + def test_abort_on_deviation(self): + df = _make_deviation_frame() + with pytest.raises(TargetCorruptionError, match="corrupt hour"): + apply_target_corruption_policy( + df, + targets=["Actual Load"], + policy="abort", + range_mw=DEV_RANGE_MW, + step_mw=DEV_STEP_MW, + window_days=3, + max_heal_hours=0, + anchor_zone_hours=168, + cutoff=None, + logger=logging.getLogger("test-deviation"), + deviation_mw=DEV_MW, + deviation_ref="Forecasted Load", + ) + + def test_dst_fall_back_deviation_flagged(self): + """Deviation rule works on a Europe/Berlin frame across fall-back.""" + idx = pd.date_range( + "2025-10-24 00:00", + periods=4 * 24 * 4, + freq=CADENCE, + tz="Europe/Berlin", + ) + forecast = pd.Series(BASE_MW, index=idx) + actual = forecast.copy() + # Sustained 12 GW dropout in the 01:00 UTC hour of the fall-back day. 
+ idx_utc = idx.tz_convert("UTC") + target_hour_utc = pd.Timestamp("2025-10-26 01:00:00", tz="UTC") + slots = [i for i, ts in enumerate(idx_utc) if ts.floor("h") == target_hour_utc] + assert len(slots) >= 2 + for i in slots[:2]: + actual.iloc[i] = BASE_MW - 12_000 + df = pd.DataFrame({"Actual Load": actual, "Forecasted Load": forecast}) + mask = detect_target_corruption( + df, + targets=["Actual Load"], + range_mw=None, + step_mw=None, + window_days=7, + deviation_mw=DEV_MW, + deviation_ref="Forecasted Load", + ) + assert mask.any(), "deviation dropout on the DST day must be flagged" diff --git a/tests/test_config_target_corruption_knobs.py b/tests/test_config_target_corruption_knobs.py index 6ef2f500..1fe73d24 100644 --- a/tests/test_config_target_corruption_knobs.py +++ b/tests/test_config_target_corruption_knobs.py @@ -34,6 +34,9 @@ def test_defaults(factory): assert cfg.target_corruption_policy == "abort" assert cfg.target_max_heal_hours == 0 assert cfg.target_anchor_zone_hours == 168 + assert cfg.target_qc_deviation_mw is None + assert cfg.target_qc_deviation_ref is None + assert cfg.target_qc_deviation_slots == 2 # --------------------------------------------------------------------------- @@ -50,6 +53,9 @@ def test_explicit_values_round_trip(factory): target_corruption_policy="truncate", target_max_heal_hours=24, target_anchor_zone_hours=48, + target_qc_deviation_mw=5_000.0, + target_qc_deviation_ref="Forecasted Load", + target_qc_deviation_slots=3, ) assert cfg.target_qc_range_mw == 5_000.0 assert cfg.target_qc_step_mw == 8_000.0 @@ -57,6 +63,9 @@ def test_explicit_values_round_trip(factory): assert cfg.target_corruption_policy == "truncate" assert cfg.target_max_heal_hours == 24 assert cfg.target_anchor_zone_hours == 48 + assert cfg.target_qc_deviation_mw == 5_000.0 + assert cfg.target_qc_deviation_ref == "Forecasted Load" + assert cfg.target_qc_deviation_slots == 3 # --------------------------------------------------------------------------- @@ -74,6 
+83,9 @@ def test_set_params_round_trip(factory): target_corruption_policy="heal", target_max_heal_hours=12, target_anchor_zone_hours=72, + target_qc_deviation_mw=4_000.0, + target_qc_deviation_ref="Forecasted Load", + target_qc_deviation_slots=1, ) assert cfg.target_qc_range_mw == 3_000.0 assert cfg.target_qc_step_mw == 6_000.0 @@ -81,6 +93,9 @@ def test_set_params_round_trip(factory): assert cfg.target_corruption_policy == "heal" assert cfg.target_max_heal_hours == 12 assert cfg.target_anchor_zone_hours == 72 + assert cfg.target_qc_deviation_mw == 4_000.0 + assert cfg.target_qc_deviation_ref == "Forecasted Load" + assert cfg.target_qc_deviation_slots == 1 # --------------------------------------------------------------------------- @@ -119,6 +134,18 @@ def test_zero_anchor_zone_accepted(factory): assert cfg.target_anchor_zone_hours == 0 +@pytest.mark.parametrize("factory", [_entsoe, _multi]) +def test_zero_deviation_mw_accepted(factory): + cfg = factory(target_qc_deviation_mw=0.0) + assert cfg.target_qc_deviation_mw == 0.0 + + +@pytest.mark.parametrize("factory", [_entsoe, _multi]) +def test_deviation_slots_1_accepted(factory): + cfg = factory(target_qc_deviation_slots=1) + assert cfg.target_qc_deviation_slots == 1 + + # --------------------------------------------------------------------------- # Validation: reject invalid values # --------------------------------------------------------------------------- @@ -161,6 +188,25 @@ def test_negative_anchor_zone_rejected(factory): factory(target_anchor_zone_hours=-5) +@pytest.mark.parametrize("factory", [_entsoe, _multi]) +def test_negative_deviation_mw_rejected(factory): + with pytest.raises(ValueError, match="target_qc_deviation_mw"): + factory(target_qc_deviation_mw=-1.0) + + +@pytest.mark.parametrize("factory", [_entsoe, _multi]) +def test_non_string_deviation_ref_rejected(factory): + with pytest.raises(ValueError, match="target_qc_deviation_ref"): + factory(target_qc_deviation_ref=123) + + 
+@pytest.mark.parametrize("factory", [_entsoe, _multi]) +@pytest.mark.parametrize("bad", [0, -1]) +def test_non_positive_deviation_slots_rejected(factory, bad): + with pytest.raises(ValueError, match="target_qc_deviation_slots"): + factory(target_qc_deviation_slots=bad) + + # --------------------------------------------------------------------------- # get_params includes new knobs # --------------------------------------------------------------------------- @@ -176,4 +222,7 @@ def test_get_params_includes_knobs(factory): assert "target_corruption_policy" in p assert "target_max_heal_hours" in p assert "target_anchor_zone_hours" in p + assert "target_qc_deviation_mw" in p + assert "target_qc_deviation_ref" in p + assert "target_qc_deviation_slots" in p assert p["target_qc_range_mw"] == 4_000.0 From ea33040c8a11c99fce5f193c9deb7b2957346203 Mon Sep 17 00:00:00 2001 From: semantic-release-bot Date: Sun, 7 Jun 2026 12:47:41 +0000 Subject: [PATCH 2/2] chore(release): 18.1.0-rc.1 [skip ci] ## [18.1.0-rc.1](https://github.com/sequential-parameter-optimization/spotforecast2-safe/compare/v18.0.1...v18.1.0-rc.1) (2026-06-07) ### Features * **preprocessing:** add deviation rule to target-corruption detector ([95d45d2](https://github.com/sequential-parameter-optimization/spotforecast2-safe/commit/95d45d2ee3a6f53b3f17716987f4f32579105897)) ### Documentation * add live {python} Examples to all public symbols missing them ([5fac4ca](https://github.com/sequential-parameter-optimization/spotforecast2-safe/commit/5fac4cada521448aa51413d575013a90abf478f0)) --- CHANGELOG.md | 12 ++++++++++++ MODEL_CARD.md | 8 ++++---- pyproject.toml | 2 +- 3 files changed, 17 insertions(+), 5 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index b61406d8..cdab74f7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,15 @@ +## [18.1.0-rc.1](https://github.com/sequential-parameter-optimization/spotforecast2-safe/compare/v18.0.1...v18.1.0-rc.1) (2026-06-07) + + +### Features + +* 
**preprocessing:** add deviation rule to target-corruption detector ([95d45d2](https://github.com/sequential-parameter-optimization/spotforecast2-safe/commit/95d45d2ee3a6f53b3f17716987f4f32579105897)) + + +### Documentation + +* add live {python} Examples to all public symbols missing them ([5fac4ca](https://github.com/sequential-parameter-optimization/spotforecast2-safe/commit/5fac4cada521448aa51413d575013a90abf478f0)) + ## [18.0.1](https://github.com/sequential-parameter-optimization/spotforecast2-safe/compare/v18.0.0...v18.0.1) (2026-06-07) diff --git a/MODEL_CARD.md b/MODEL_CARD.md index e9a7628c..61d39efa 100644 --- a/MODEL_CARD.md +++ b/MODEL_CARD.md @@ -7,7 +7,7 @@ This card describes what spotforecast2-safe is, how to use it safely, the condit | Field | Value | | --- | --- | | Name | spotforecast2-safe | -| Version | 18.0.1 | +| Version | 18.1.0-rc.1 | | Type | Deterministic Python library for time series feature engineering and recursive multi-step forecasting. It performs no training of its own. | | Developed by | Thomas Bartz-Beielstein, ORCID [0000-0002-5938-5158](https://orcid.org/0000-0002-5938-5158) | | Distributed by | the `sequential-parameter-optimization` GitHub organization | @@ -18,7 +18,7 @@ This card describes what spotforecast2-safe is, how to use it safely, the condit The library depends only on numpy, pandas, scikit-learn, lightgbm, numba, pyarrow, requests, feature-engine, holidays, astral, and tqdm. It deliberately excludes plotly, matplotlib, spotoptim, optuna, torch, and tensorflow, so no plotting or automated-tuning code ships in this package. -Two Common Platform Enumeration (CPE) identifiers let vulnerability-tracking and software bill of materials (SBOM) tools recognize the package. The wildcard identifier `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:*:*:*:*:*:*:*:*` matches any release; the current release is `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:18.0.1:*:*:*:*:*:*:*`. 
+Two Common Platform Enumeration (CPE) identifiers let vulnerability-tracking and software bill of materials (SBOM) tools recognize the package. The wildcard identifier `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:*:*:*:*:*:*:*:*` matches any release; the current release is `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:18.1.0-rc.1:*:*:*:*:*:*:*`.
 
 The library itself is a low-risk component: it is deterministic, its source is fully inspectable, and it fails safe on invalid input. It is built to support high-risk AI systems in the sense of the EU AI Act, but it is not itself such a system. When it is embedded in a high-risk deployment, the duties that attach to that system fall on the integrator, not on the library.
 
@@ -30,7 +30,7 @@ Responsibilities are divided as follows.
 | Distribution | sequential-parameter-optimization on GitHub | repository issue tracker |
 | Deployment, operation, and audit | the system integrator | defined per deployment |
 
-The current release is 18.0.1, with a stable public interface pinned in `spotforecast2_safe.__init__.__all__`. The full version history, including release dates, is recorded in `CHANGELOG.md` and on the GitHub Releases page; it is maintained automatically by the release pipeline and is not repeated here.
+The current release is 18.1.0-rc.1, with a stable public interface pinned in `spotforecast2_safe.__init__.__all__`. The full version history, including release dates, is recorded in `CHANGELOG.md` and on the GitHub Releases page; it is maintained automatically by the release pipeline and is not repeated here.
 
 ## 2. Intended Use and Scope
 
@@ -216,7 +216,7 @@ Maintainer: Thomas Bartz-Beielstein, ORCID [0000-0002-5938-5158](https://orcid.o
 }
 ```
 
-Or as a formatted reference: Bartz-Beielstein, T. (2026). *spotforecast2-safe: Safety-critical subset of spotforecast2* (Version 18.0.1) [Computer software]. https://github.com/sequential-parameter-optimization/spotforecast2-safe
+Or as a formatted reference: Bartz-Beielstein, T. (2026). *spotforecast2-safe: Safety-critical subset of spotforecast2* (Version 18.1.0-rc.1) [Computer software]. https://github.com/sequential-parameter-optimization/spotforecast2-safe
 
 The technical report (`bart26h/index.qmd`) is the long-form reference for design rationale, compliance mapping, and evaluation protocol.
 
diff --git a/pyproject.toml b/pyproject.toml
index ac805c4c..d175ec70 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "spotforecast2-safe"
-version = "18.0.1"
+version = "18.1.0-rc.1"
 description = "spotforecast2-safe (Core): Safety-critical time series forecasting for production"
 readme = "README.md"
 license = { text = "AGPL-3.0-or-later" }