12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,15 @@
## [18.1.0-rc.1](https://github.com/sequential-parameter-optimization/spotforecast2-safe/compare/v18.0.1...v18.1.0-rc.1) (2026-06-07)


### Features

* **preprocessing:** add deviation rule to target-corruption detector ([95d45d2](https://github.com/sequential-parameter-optimization/spotforecast2-safe/commit/95d45d2ee3a6f53b3f17716987f4f32579105897))


### Documentation

* add live {python} Examples to all public symbols missing them ([5fac4ca](https://github.com/sequential-parameter-optimization/spotforecast2-safe/commit/5fac4cada521448aa51413d575013a90abf478f0))

## [18.0.1](https://github.com/sequential-parameter-optimization/spotforecast2-safe/compare/v18.0.0...v18.0.1) (2026-06-07)


8 changes: 4 additions & 4 deletions MODEL_CARD.md
@@ -7,7 +7,7 @@ This card describes what spotforecast2-safe is, how to use it safely, the condit
| Field | Value |
| --- | --- |
| Name | spotforecast2-safe |
| Version | 18.0.1 |
| Version | 18.1.0-rc.1 |
| Type | Deterministic Python library for time series feature engineering and recursive multi-step forecasting. It performs no training of its own. |
| Developed by | Thomas Bartz-Beielstein, ORCID [0000-0002-5938-5158](https://orcid.org/0000-0002-5938-5158) |
| Distributed by | the `sequential-parameter-optimization` GitHub organization |
@@ -18,7 +18,7 @@ This card describes what spotforecast2-safe is, how to use it safely, the condit

The library depends only on numpy, pandas, scikit-learn, lightgbm, numba, pyarrow, requests, feature-engine, holidays, astral, and tqdm. It deliberately excludes plotly, matplotlib, spotoptim, optuna, torch, and tensorflow, so no plotting or automated-tuning code ships in this package.

Two Common Platform Enumeration (CPE) identifiers let vulnerability-tracking and software bill of materials (SBOM) tools recognize the package. The wildcard identifier `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:*:*:*:*:*:*:*:*` matches any release; the current release is `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:18.0.1:*:*:*:*:*:*:*`.
Two Common Platform Enumeration (CPE) identifiers let vulnerability-tracking and software bill of materials (SBOM) tools recognize the package. The wildcard identifier `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:*:*:*:*:*:*:*:*` matches any release; the current release is `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:18.1.0-rc.1:*:*:*:*:*:*:*`.
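
How an SBOM or vulnerability-tracking tool might compare the two identifiers can be sketched as below. This is a minimal illustration, not part of the library: `parse_cpe23` and `cpe_matches` are hypothetical helper names, the split is naive (it assumes no escaped colons inside a field), and only the `*` wildcard is handled.

```python
# Field names of a CPE 2.3 formatted string, in order, after the "cpe:2.3" prefix.
CPE23_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(cpe: str) -> dict[str, str]:
    """Split a CPE 2.3 string into named fields (naive: no escaped colons)."""
    prefix, spec, *values = cpe.split(":")
    if (prefix, spec) != ("cpe", "2.3") or len(values) != len(CPE23_FIELDS):
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_FIELDS, values))


def cpe_matches(pattern: str, candidate: str) -> bool:
    """True when every non-wildcard field of `pattern` equals the candidate's."""
    wanted, actual = parse_cpe23(pattern), parse_cpe23(candidate)
    return all(v == "*" or v == actual[name] for name, v in wanted.items())


ANY_RELEASE = (
    "cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe"
    ":*:*:*:*:*:*:*:*"
)
RC_RELEASE = (
    "cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe"
    ":18.1.0-rc.1:*:*:*:*:*:*:*"
)

print(parse_cpe23(RC_RELEASE)["version"])    # the version field of the release CPE
print(cpe_matches(ANY_RELEASE, RC_RELEASE))  # wildcard identifier vs. this release
```

Both strings carry eleven fields after the `cpe:2.3` prefix, which is why the wildcard identifier has one more `*` than the release-specific one.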

The library itself is a low-risk component: it is deterministic, its source is fully inspectable, and it fails safe on invalid input. It is built to support high-risk AI systems in the sense of the EU AI Act, but it is not itself such a system. When it is embedded in a high-risk deployment, the duties that attach to that system fall on the integrator, not on the library.

@@ -30,7 +30,7 @@ Responsibilities are divided as follows.
| Distribution | sequential-parameter-optimization on GitHub | repository issue tracker |
| Deployment, operation, and audit | the system integrator | defined per deployment |

The current release is 18.0.1, with a stable public interface pinned in `spotforecast2_safe.__init__.__all__`. The full version history, including release dates, is recorded in `CHANGELOG.md` and on the GitHub Releases page; it is maintained automatically by the release pipeline and is not repeated here.
The current release is 18.1.0-rc.1, with a stable public interface pinned in `spotforecast2_safe.__init__.__all__`. The full version history, including release dates, is recorded in `CHANGELOG.md` and on the GitHub Releases page; it is maintained automatically by the release pipeline and is not repeated here.

## 2. Intended Use and Scope

@@ -216,7 +216,7 @@ Maintainer: Thomas Bartz-Beielstein, ORCID [0000-0002-5938-5158](https://orcid.o
}
```

Or as a formatted reference: Bartz-Beielstein, T. (2026). *spotforecast2-safe: Safety-critical subset of spotforecast2* (Version 18.0.1) [Computer software]. https://github.com/sequential-parameter-optimization/spotforecast2-safe
Or as a formatted reference: Bartz-Beielstein, T. (2026). *spotforecast2-safe: Safety-critical subset of spotforecast2* (Version 18.1.0-rc.1) [Computer software]. https://github.com/sequential-parameter-optimization/spotforecast2-safe

The technical report (`bart26h/index.qmd`) is the long-form reference for design rationale, compliance mapping, and evaluation protocol.


Large diffs are not rendered by default.


@@ -1,10 +1,10 @@
{
"hash": "7dd9705960361ff99aefd892719fe3e3",
"hash": "4c054038f0742ab143305463c939dea7",
"result": {
"engine": "jupyter",
"markdown": "---\ntitle: preprocessing.target_corruption.detect_target_corruption\n---\n\n\n\n```python\npreprocessing.target_corruption.detect_target_corruption(\n df,\n *,\n targets,\n range_mw,\n step_mw,\n window_days,\n)\n```\n\nDetect physically-impossible target-column corruption in the native frame.\n\nApplies two independent rules on the native-cadence (e.g. 15-min) series\nwithin a rolling look-back window ending at the last observed target\ntimestamp:\n\n- **Range rule** (sub-hourly cadence only): an hour is flagged when\n ``intra-hour max - intra-hour min > range_mw`` for any target column.\n Vacuously skipped for hourly-or-coarser cadence (intra-hour range is\n undefined on a single slot per hour).\n- **Step rule**: an hour is flagged when any ``|adjacent-slot diff|``\n that *touches* that hour exceeds ``step_mw`` for any target column.\n Applies to all cadences.\n\nFlags are OR-ed across target columns. ALL native-cadence slots of a\nflagged calendar hour are marked ``True`` in the returned boolean\n``Series``, so downstream NaN-ing operates on full hours rather than\nindividual sub-hourly slots.\n\nThe detector is **inert** (returns all-``False``) unless ``window_days``\nis set AND at least one of ``range_mw`` / ``step_mw`` is set. If the\ndata is shorter than ``window_days``, the window is clamped to\n``df.index.min()`` without raising.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------|---------------------------------------------------|--------------------------------------------------------------------------------------------------------------|------------|\n| df | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Native-cadence ``DataFrame`` indexed by a ``DatetimeIndex``. Must contain all columns listed in ``targets``. | _required_ |\n| targets | [Sequence](`typing.Sequence`)\\[[str](`str`)\\] | Sequence of target column names to inspect. 
| _required_ |\n| range_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum allowed intra-hour range (MW). ``None`` skips the range rule. | _required_ |\n| step_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum allowed absolute adjacent-slot difference (MW). ``None`` skips the step rule. | _required_ |\n| window_days | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Number of days before the last observed target to include in the scan. ``None`` makes the detector inert. | _required_ |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------|--------------------------------------------------------------------|\n| | [pd](`pandas`).[Series](`pandas.Series`) | Boolean ``pd.Series`` aligned to ``df.index``. ``True`` means the |\n| | [pd](`pandas`).[Series](`pandas.Series`) | slot belongs to a flagged calendar hour. All-``False`` when the |\n| | [pd](`pandas`).[Series](`pandas.Series`) | detector is inert or no corruption is found. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#851d62c6 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.preprocessing.target_corruption import (\n detect_target_corruption,\n)\n\n# 15-min cadence; one 11 GW dropout at 01:15 inside the window\nidx = pd.date_range(\"2026-06-03\", periods=48, freq=\"15min\", tz=\"UTC\")\nvals = [55_000.0] * 48\nvals[5] = 44_000.0 # 11 GW step drop -> flags 01:00 hour\ndf = pd.DataFrame({\"load\": vals}, index=idx)\n\nmask = detect_target_corruption(\n df, targets=[\"load\"], range_mw=5_000, step_mw=8_000, window_days=3\n)\n# Slots in the 01:00 hour (index 4-7) are flagged\nassert mask.iloc[4:8].all(), \"Slots in the flagged hour must be True\"\nassert not mask.iloc[8:].any(), \"Subsequent clean slots must be False\"\nprint(\"flagged:\", mask.sum(), \"slots\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nflagged: 4 slots\n```\n:::\n:::\n\n\n",
"markdown": "---\ntitle: preprocessing.target_corruption.detect_target_corruption\n---\n\n\n\n```python\npreprocessing.target_corruption.detect_target_corruption(\n df,\n *,\n targets,\n range_mw,\n step_mw,\n window_days,\n deviation_mw=None,\n deviation_ref=None,\n deviation_slots=2,\n)\n```\n\nDetect physically-impossible target-column corruption in the native frame.\n\nApplies two independent rules on the native-cadence (e.g. 15-min) series\nwithin a rolling look-back window ending at the last observed target\ntimestamp:\n\n- **Range rule** (sub-hourly cadence only): an hour is flagged when\n ``intra-hour max - intra-hour min > range_mw`` for any target column.\n Vacuously skipped for hourly-or-coarser cadence (intra-hour range is\n undefined on a single slot per hour).\n- **Step rule**: an hour is flagged when any ``|adjacent-slot diff|``\n that *touches* that hour exceeds ``step_mw`` for any target column.\n Applies to all cadences.\n- **Deviation rule** (dropout-only, all cadences): an hour is flagged\n when ``target − reference < -deviation_mw`` holds for at least\n ``deviation_slots`` *consecutive* native-cadence slots within the\n scan window, where the reference is a published companion column\n such as the ENTSO-E day-ahead ``\"Forecasted Load\"``. The rule is\n asymmetric by design: the known corruption class is exclusively a\n dropout *below* the day-ahead forecast, while actuals above the\n forecast are ordinary under-forecasting. ``NaN`` in either column\n yields a ``NaN`` difference, which compares ``False`` — so the\n publication-lag frontier (forecast published, actual not yet) never\n flags, and a data gap breaks a consecutive run. On hourly-or-coarser\n cadence the sustained requirement collapses to a single slot. The\n rule is silently skipped when ``deviation_ref`` is missing from the\n frame (mirroring how absent target columns are skipped).\n\nFlags are OR-ed across target columns. 
ALL native-cadence slots of a\nflagged calendar hour are marked ``True`` in the returned boolean\n``Series``, so downstream NaN-ing operates on full hours rather than\nindividual sub-hourly slots.\n\nThe detector is **inert** (returns all-``False``) unless ``window_days``\nis set AND at least one of ``range_mw`` / ``step_mw`` / ``deviation_mw``\nis set. If the data is shorter than ``window_days``, the window is\nclamped to ``df.index.min()`` without raising.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------------|---------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| df | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Native-cadence ``DataFrame`` indexed by a ``DatetimeIndex``. Must contain all columns listed in ``targets``. | _required_ |\n| targets | [Sequence](`typing.Sequence`)\\[[str](`str`)\\] | Sequence of target column names to inspect. | _required_ |\n| range_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum allowed intra-hour range (MW). ``None`` skips the range rule. | _required_ |\n| step_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum allowed absolute adjacent-slot difference (MW). ``None`` skips the step rule. | _required_ |\n| window_days | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Number of days before the last observed target to include in the scan. ``None`` makes the detector inert. | _required_ |\n| deviation_mw | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum allowed dropout below the reference column (MW, positive magnitude): slots with ``target − reference < -deviation_mw`` are candidates. ``None`` skips the deviation rule. 
| `None` |\n| deviation_ref | [Optional](`typing.Optional`)\\[[str](`str`)\\] | Name of the reference column (e.g. ``\"Forecasted Load\"``). The rule is skipped when ``None`` or when the column is absent from ``df``. The reference column itself is never checked as a target by this rule. | `None` |\n| deviation_slots | [int](`int`) | Minimum number of *consecutive* sub-hourly slots the dropout must sustain before any hour is flagged (default ``2`` — a single-slot blip is more likely a metering glitch than the oscillating dropout class). Clamped to ``1`` on hourly-or-coarser cadence. | `2` |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------|--------------------------------------------------------------------|\n| | [pd](`pandas`).[Series](`pandas.Series`) | Boolean ``pd.Series`` aligned to ``df.index``. ``True`` means the |\n| | [pd](`pandas`).[Series](`pandas.Series`) | slot belongs to a flagged calendar hour. All-``False`` when the |\n| | [pd](`pandas`).[Series](`pandas.Series`) | detector is inert or no corruption is found. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#ed42fb8e .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.preprocessing.target_corruption import (\n detect_target_corruption,\n)\n\n# 15-min cadence; one 11 GW dropout at 01:15 inside the window\nidx = pd.date_range(\"2026-06-03\", periods=48, freq=\"15min\", tz=\"UTC\")\nvals = [55_000.0] * 48\nvals[5] = 44_000.0 # 11 GW step drop -> flags 01:00 hour\ndf = pd.DataFrame({\"load\": vals}, index=idx)\n\nmask = detect_target_corruption(\n df, targets=[\"load\"], range_mw=5_000, step_mw=8_000, window_days=3\n)\n# Slots in the 01:00 hour (index 4-7) are flagged\nassert mask.iloc[4:8].all(), \"Slots in the flagged hour must be True\"\nassert not mask.iloc[8:].any(), \"Subsequent clean slots must be False\"\nprint(\"flagged:\", mask.sum(), \"slots\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nflagged: 4 slots\n```\n:::\n:::\n\n\n::: {#643afacd .cell execution_count=2}\n``` {.python .cell-code}\n# Deviation rule: a sub-threshold dropout the dynamics rules miss.\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.preprocessing.target_corruption import (\n detect_target_corruption,\n)\n\nidx = pd.date_range(\"2026-06-07\", periods=16, freq=\"15min\", tz=\"UTC\")\nforecast = pd.Series(48_000.0, index=idx)\nactual = forecast.copy()\n# Two consecutive slots 11.6 GW below the forecast, stepping by\n# only 5.8 GW per slot, below a 6 GW step rule, no range breach.\nactual.iloc[4] = forecast.iloc[4] - 5_800.0\nactual.iloc[5] = forecast.iloc[5] - 11_600.0\nactual.iloc[6] = forecast.iloc[6] - 11_600.0\nactual.iloc[7] = forecast.iloc[7] - 5_800.0\n# Publication-lag frontier: forecast published, actual not yet.\nactual.iloc[12:] = np.nan\ndf = pd.DataFrame({\"Actual Load\": actual, \"Forecasted Load\": forecast})\n\ndyn_only = detect_target_corruption(\n df, targets=[\"Actual Load\"],\n range_mw=15_000, step_mw=6_000, 
window_days=3,\n)\nwith_dev = detect_target_corruption(\n df, targets=[\"Actual Load\"],\n range_mw=15_000, step_mw=6_000, window_days=3,\n deviation_mw=8_000, deviation_ref=\"Forecasted Load\",\n)\nassert not dyn_only.any(), \"dynamics rules miss the dropout\"\nassert with_dev.iloc[4:8].any(), \"deviation rule catches it\"\nassert not with_dev.iloc[12:].any(), \"NaN frontier never flags\"\nprint(\"dynamics-only:\", int(dyn_only.sum()), \"| with deviation:\",\n int(with_dev.sum()))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ndynamics-only: 0 | with deviation: 4\n```\n:::\n:::\n\n\n",
"supporting": [
"preprocessing.target_corruption.detect_target_corruption_files"
"preprocessing.target_corruption.detect_target_corruption_files/figure-html"
],
"filters": [],
"includes": {}
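
The deviation rule documented in the cached page above can be restated in a few lines of plain pandas. The following is a sketch under stated assumptions, not the library's implementation: `deviation_flags` is a hypothetical name, it handles a single target column, and it omits the `window_days` scan window and the clamping of `deviation_slots` to 1 on hourly-or-coarser cadence.

```python
import numpy as np
import pandas as pd


def deviation_flags(df, target, reference, deviation_mw, deviation_slots=2):
    """Hypothetical restatement of the deviation rule for one target column."""
    # A NaN difference compares False, so data gaps and the publication-lag
    # frontier can never qualify and always break a consecutive run.
    below = (df[target] - df[reference]) < -deviation_mw
    # Give every run of identical consecutive values its own id ...
    run_id = (below != below.shift()).cumsum()
    # ... and count the True slots per run (0 for runs of False).
    run_len = below.astype(int).groupby(run_id).transform("sum")
    sustained = run_len >= deviation_slots
    # Widen slot-level hits to every slot of the same calendar hour.
    hour = df.index.floor("60min")
    return sustained.groupby(hour).transform("any")


# Same synthetic data as the second documented example.
idx = pd.date_range("2026-06-07", periods=16, freq="15min", tz="UTC")
forecast = pd.Series(48_000.0, index=idx)
actual = forecast.copy()
actual.iloc[4:8] -= np.array([5_800.0, 11_600.0, 11_600.0, 5_800.0])
actual.iloc[12:] = np.nan  # forecast published, actual not yet
df = pd.DataFrame({"Actual Load": actual, "Forecasted Load": forecast})

flags = deviation_flags(df, "Actual Load", "Forecasted Load", deviation_mw=8_000)
print("flagged slots:", int(flags.sum()))
```

With an 8 GW threshold only the two slots that sit 11.6 GW below the forecast qualify. They are consecutive, so the run meets `deviation_slots=2` and the whole 01:00 hour (indices 4 to 7) is flagged, which agrees with the `with deviation: 4` output recorded in the cached page.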
3 changes: 3 additions & 0 deletions docs/reference/configurator.config_entsoe.ConfigEntsoe.qmd
@@ -66,6 +66,9 @@ configurator.config_entsoe.ConfigEntsoe(
target_corruption_policy='abort',
target_max_heal_hours=0,
target_anchor_zone_hours=168,
target_qc_deviation_mw=None,
target_qc_deviation_ref=None,
target_qc_deviation_slots=2,
)
```

3 changes: 3 additions & 0 deletions docs/reference/configurator.config_multi.ConfigMulti.qmd
@@ -65,6 +65,9 @@ configurator.config_multi.ConfigMulti(
target_corruption_policy='abort',
target_max_heal_hours=0,
target_anchor_zone_hours=168,
target_qc_deviation_mw=None,
target_qc_deviation_ref=None,
target_qc_deviation_slots=2,
)
```
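
The three new `target_qc_deviation_*` fields on `ConfigEntsoe` and `ConfigMulti` share names and defaults with the `deviation_*` keyword arguments of `detect_target_corruption`. How the configurators actually forward them is not shown in this diff. The sketch below assumes a simple prefix-stripping mapping, and `DeviationQcSettings` and `detector_kwargs` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class DeviationQcSettings:
    """Hypothetical stand-in for the three new configuration fields."""

    target_qc_deviation_mw: Optional[float] = None
    target_qc_deviation_ref: Optional[str] = None
    target_qc_deviation_slots: int = 2

    def detector_kwargs(self) -> Dict[str, Any]:
        """Map the fields to detector keywords (assumed: drop `target_qc_`)."""
        return {
            "deviation_mw": self.target_qc_deviation_mw,
            "deviation_ref": self.target_qc_deviation_ref,
            "deviation_slots": self.target_qc_deviation_slots,
        }


# Defaults leave the deviation rule switched off (threshold and reference are None).
defaults = DeviationQcSettings()
# Opting in requires both a threshold and a reference column name.
enabled = DeviationQcSettings(
    target_qc_deviation_mw=8_000.0,
    target_qc_deviation_ref="Forecasted Load",
)
print(defaults.detector_kwargs())
print(enabled.detector_kwargs())
```

Because both the threshold and the reference column default to `None`, existing configurations keep their previous behavior unless both values are set explicitly.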
