From b9251b972822e6f3cc4cf17cf80d55234ea49ce3 Mon Sep 17 00:00:00 2001 From: Ian Tse Date: Tue, 21 Apr 2026 11:38:59 -0700 Subject: [PATCH] remove md files --- sdk_general_feedback.md | 296 ----------------------------- sdk_round2_feedback.md | 93 --------- sdk_weather_file_feedback.md | 352 ----------------------------------- 3 files changed, 741 deletions(-) delete mode 100644 sdk_general_feedback.md delete mode 100644 sdk_round2_feedback.md delete mode 100644 sdk_weather_file_feedback.md diff --git a/sdk_general_feedback.md b/sdk_general_feedback.md deleted file mode 100644 index 964015d..0000000 --- a/sdk_general_feedback.md +++ /dev/null @@ -1,296 +0,0 @@ -# SolarFarmer Python SDK — General Usability Feedback - -## Context - -This document captures non-weather-file observations from building a batch -energy-yield runner on top of `dnv-solarfarmer>=0.2.0rc1`. Weather file -feedback is covered separately in `sdk_weather_file_feedback.md`. - -The implementation was driven by an AI coding agent (Claude) working -interactively with a human engineer. Both perspectives inform the -observations below. - ---- - -## 1 Error reporting from the API - -### 1.1 `run_energy_calculation` returns `None` on failure - -When the API returns an HTTP 400 or other error, `run_energy_calculation()` -returns `None` with no exception raised and no error message surfaced to the -caller. The HTTP response body often contains useful validation messages, -but these are swallowed. - -**How debugging actually worked in practice:** - -1. `run_energy_calculation()` returned `None`. No exception, no stderr, no - log output — nothing to indicate what went wrong. -2. To make progress, we had to enable the SDK's logger at `ERROR` level. - This revealed HTTP 400 status codes with some error detail — but only - the subset that `_log_api_failure` manages to extract (see §1.3 below). -3. 
For errors where the logged detail was just `"Something went wrong."`, - we had to monkey-patch the HTTP client to intercept raw responses. -4. The error bodies contained clear, actionable validation messages — e.g. - *"The rack height specified in Mounting Type Specifications 'sat_mount' - is smaller than the module specifications"* and *"The mounting - specification is missing one or several of the required bifacial - properties."* These were immediately fixable once visible. - -The API's validation layer is doing good work — the messages it produces are -specific and actionable. The problem is entirely on the SDK side: those -messages are discarded and the caller gets `None`. - -**Suggestion:** On non-2xx responses, raise a dedicated exception (e.g. -`SolarFarmerAPIError`) that includes the status code, error message, and -the full `ProblemDetails` body. The `Response` object already contains all -of this information (`code`, `exception`, `problem_details_json`) — it just -needs to be surfaced to the caller rather than logged and discarded. - -This is the highest-impact change in this document — every issue described -in §§2–4 and §§7–8 below was ultimately diagnosed through the API's own -error messages, but only after manually bypassing the SDK to read them. - -### 1.2 `PVSystem.run_energy_calculation` return type mismatch - -`PVSystem.run_energy_calculation()` has a return annotation of `-> None` -and always executes `return None`, storing the result in `self.results` -instead. However, its docstring says: - -> Returns: CalculationResults - -This should either return the result (matching the docstring) or the -docstring should be corrected and the `self.results` pattern documented. -Returning `None` while the docstring promises `CalculationResults` will -mislead both human developers and AI agents. 
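The exception suggested in §1.1 could look roughly like the following. This is a minimal sketch — `SolarFarmerAPIError`, `raise_for_failure`, and the `ProblemDetails` shape are hypothetical names, not existing SDK symbols:

```python
# Hypothetical sketch of the exception suggested in §1.1 — the class name
# and helper are assumptions, not the SDK's real implementation.
class SolarFarmerAPIError(Exception):
    """Raised when the API returns a non-2xx response."""

    def __init__(self, status_code, message, problem_details=None):
        self.status_code = status_code
        self.problem_details = problem_details or {}
        super().__init__(f"HTTP {status_code}: {message}")


def raise_for_failure(status_code, body):
    """Surface the API's validation message instead of returning None."""
    if 200 <= status_code < 300:
        return
    # The ProblemDetails body carries the actionable message today;
    # attach it whole rather than extracting (and possibly losing) fields.
    detail = body.get("detail", "Unknown error") if isinstance(body, dict) else str(body)
    raise SolarFarmerAPIError(status_code, detail, body if isinstance(body, dict) else None)
```

With something like this in place, the rack-height and bifacial-property messages quoted above would reach the caller's traceback directly instead of requiring a monkey-patched HTTP client.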
- -### 1.3 `_log_api_failure` silently swallows parsing errors - -`_log_api_failure` (in `endpoint_modelchains.py`) wraps its detail- -extraction logic in a bare `except Exception: pass`. This means that if -`problem_details_json` isn't in the expected shape, the error detail is -silently lost — the developer sees only the generic `Failure message` line, -not the title, errors, or detail fields. - -This is compounded by a type mismatch: `Response.problem_details_json` is -annotated as `str | None`, but `Client._make_request` assigns it the return -value of `response.json()` (a `dict`) on success, or `""` (empty string) on -failure. When it's an empty string, `json_response.get("errors")` raises -`AttributeError`, which is caught and silently discarded. - -**Suggestion:** Fix the type annotation to `dict | None`, handle the empty- -string fallback as `None`, and replace the bare `except` with specific -exception handling (or at least log the parsing error at `DEBUG` level). - ---- - -## 2 Spec ID ↔ filename coupling - -The `moduleSpecificationID` and `inverterSpecID` fields in `Layout` and -`Inverter` must exactly match the **filename stems** (without extension) of -the uploaded PAN/OND files. For example, uploading -`Trina_TSM-DEG19C.20-550_APP.PAN` means the spec ID must be -`Trina_TSM-DEG19C.20-550_APP`. - -This coupling is not documented in the field docstrings or in -`run_energy_calculation`. A developer's natural instinct is to use a -descriptive ID (e.g. `trina_550`), which produces an opaque validation error. - -**Suggestion:** Document the filename-stem convention in the -`moduleSpecificationID` and `inverterSpecID` field docstrings. Alternatively, -the SDK could infer spec IDs from filenames when the user doesn't explicitly -set them, or validate the match client-side before uploading. 
- ---- - -## 3 Bifacial module auto-detection - -When a PAN file contains a non-zero `BifacialityFactor`, the API requires -three additional properties on the `MountingTypeSpecification`: - -- `transmissionFactor` -- `bifacialShadeLossFactor` -- `bifacialMismatchLossFactor` - -If these are omitted, the API returns a validation error. However: - -1. The SDK does not document this dependency between PAN file contents and - mounting spec fields. -2. The `MountingTypeSpecification` model does not indicate which fields - become required for bifacial modules. -3. There is no client-side validation or warning. - -**Suggestion:** Add a note in the `MountingTypeSpecification` docstring -explaining that these three fields are required when the module is bifacial. -The PAN file's `BifacialityFactor` value could also be mentioned as the -trigger. - ---- - -## 4 Rack height validation - -The `rackHeight` field on `MountingTypeSpecification` must be strictly -greater than the physical height of the module (as specified in the PAN file). -For a portrait-oriented single module high, this means `rackHeight` > -module long dimension (e.g. 2.384 m for the Trina TSM-DEG19C.20-550). - -Setting `rackHeight` equal to the module height triggers: -*"The rack height specified in Mounting Type Specifications 'sat_mount' is -smaller than the module specifications."* - -This is reasonable validation, but the relationship between `rackHeight`, -`numberOfModulesHigh`, `modulesAreLandscape`, and the PAN file's physical -dimensions is not documented. A formula or note in the docstring would help: - -``` -Minimum rackHeight = numberOfModulesHigh × module_dimension - + (numberOfModulesHigh - 1) × ySpacingBetweenModules -where module_dimension = module height if portrait, width if landscape -``` - ---- - -## 5 `MonthlyAlbedo` constructor - -The `MonthlyAlbedo` model accepts a `values` parameter (a list of 12 floats), -not a `monthlyAlbedo` parameter. 
This is discoverable via `help()` but the -parameter name is unintuitive — a developer might expect `monthlyAlbedo` or -`albedo_values` given the class name. - -Minor point, but mentioning it because an AI agent tried `monthlyAlbedo=` -first (matching the class name pattern) and had to fall back to inspecting -the signature. - ---- - -## 6 `EnergyCalculationOptions.calculationYear` - -This field defaults to `1990`. For TMY data (which contains mixed years by -definition), the relationship between `calculationYear` and the timestamps -in the weather file is unclear. Does the API remap all timestamps to this -year? Does it filter? Does it ignore the year in TMY timestamps? - -**Suggestion:** A one-line docstring note explaining how `calculationYear` -interacts with TMY data would prevent confusion. - ---- - -## 7 `gridConnectionLimit` units: docstring says MW, API expects Watts - -The `PVPlant.grid_connection_limit` field docstring reads: - -> Maximum power that can be exported from the transformers in MW - -However, the API expects this value in **Watts**. Setting -`gridConnectionLimit=17.6` (intending 17.6 MW) causes the API to return -an opaque HTTP 400 with `"detail": "Something went wrong."` — no -indication that the value is being interpreted as 17.6 W. - -The `PVSystem` builder correctly converts with `grid_limit_MW * 1e6` -(line 1111 of `pvsystem.py`), confirming the API expects Watts. - -**Suggestion:** Fix the docstring to say "in Watts" (or "in W"), not "in -MW". - ---- - -## 8 `TransformerSpecification` is required but marked optional - -Both `Transformer.transformer_spec_id` (type `str | None`) and -`PVPlant.transformer_specifications` (type `dict | None`) are marked as -optional in the Pydantic models. 
Their docstrings give no indication -that the API requires them: - -- `Transformer.transformer_spec_id`: *"Reference to a transformer - specification"* -- `PVPlant.transformer_specifications`: *"Transformer specs keyed by ID"* - -Omitting both fields causes the API to return HTTP 400 with -`"detail": "Something went wrong."` — the same opaque error as the -grid-limit issue. There is no validation error message indicating -what's missing. - -The `PVSystem` builder always creates a `NoLoadAndOhmic` transformer -specification, confirming it's required in practice. - -**Suggestion:** Either: -1. Change the field types from `Optional` to required, or add a - client-side validator on `PVPlant` that raises a clear error when - a transformer references a spec ID not present in - `transformer_specifications`. -2. At minimum, update the docstrings to say these fields are - required by the API despite being structurally optional in the model. - ---- - -## 9 `PVSystem` builder uses PAN-internal module name, not filename stem - -The `PVSystem` builder reads the module name from inside the PAN file -content (e.g. `Trina_TSM-DEG19C` from the PVsyst model identifier) and -uses it as the `moduleSpecificationID`. However, the API requires this -ID to match the **filename stem** of the uploaded PAN file (e.g. -`Trina_TSM-DEG19C.20-550_APP`). - -This means `PVSystem.run_energy_calculation()` fails with: - -> *The given key 'Trina_TSM-DEG19C' was not present in the dictionary.* - -The workaround is to rename the PAN file so its stem matches the internal -model name, but this is fragile and non-obvious. - -**Suggestion:** The builder should use the filename stem (from the -`pan_files` dict key or the file path) as the spec ID, not the PAN file's -internal model identifier. - ---- - -## 10 `calculationYear` silently discards TMY data from non-matching years - -`EnergyCalculationOptions.calculationYear` defaults to `1990`. 
When the -weather file contains TMY data with mixed years (as produced by NSRDB, -PVGIS, and other TMY generators), the API processes **only** rows whose -timestamp year matches `calculationYear`. For NSRDB PSM4 TMY data, which -uses different years per month (e.g. Jan=2000, Feb=2003, …), this results -in partial-year calculations with dramatically wrong results — and no -warning or error. - -In our case, the API silently processed 744 hours (January 2000 only) -instead of 8760, producing −9.18 kWh/kWp and a Performance Ratio of -−0.12. There was no error message indicating that 92% of the data was -discarded. - -**Workaround:** Remap all TMY timestamps to year 1990 before writing the -weather file. - -**Suggestion:** -1. Document the interaction between `calculationYear` and TMY timestamps - prominently — this is not a minor detail, it determines whether the - calculation produces valid results. -2. Consider adding a warning when the number of processed timesteps is - significantly less than expected (e.g. <8000 for hourly data). -3. Consider a `TMY` mode or `auto` setting for `calculationYear` that - either ignores the year component or remaps automatically. 
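The remapping workaround can be sketched in pandas. Assumptions: an hourly `DatetimeIndex` with no Feb 29 rows (TMY data normally has none; leap-day rows would need dropping before remapping to a non-leap year), and `remap_tmy_year` is a hypothetical helper, not part of the SDK:

```python
import pandas as pd


def remap_tmy_year(df, year=1990):
    """Rewrite every timestamp onto a single calendar year so the API
    does not silently discard rows whose year differs from
    calculationYear (hypothetical helper)."""
    out = df.copy()
    out.index = pd.DatetimeIndex([ts.replace(year=year) for ts in df.index])
    return out
```

Applied to an NSRDB PSM4 TMY frame before writing the weather file, this restores the full 8760 processed hours.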
- ---- - -## 11 Summary of priorities - -| # | Issue | Effort | Impact | -|---|-------|--------|--------| -| 1.1 | Raise exception on API failure instead of returning `None` | Small–Medium | **Critical** — affects every misconfiguration scenario | -| 1.3 | Fix `_log_api_failure` type mismatch and bare `except` | Trivial | **Critical** — causes silent loss of error details | -| 1.2 | Fix `PVSystem.run_energy_calculation` return type | Trivial | Medium — docstring contradicts code | -| 7 | Fix `gridConnectionLimit` docstring (MW → W) | Trivial | **Critical** — causes opaque failure with no diagnostic | -| 8 | Document or require `TransformerSpecification` | Trivial–Small | **High** — another opaque 400 error | -| 10 | Document/fix `calculationYear` + TMY year handling | Small | **High** — silently produces wrong results | -| 9 | Fix `PVSystem` module spec ID (PAN name vs filename) | Small | **High** — `PVSystem` builder fails on common PAN filenames | -| 2 | Document spec ID ↔ filename stem convention | Trivial | High — prevents a common first-use error | -| 3 | Document bifacial mounting property requirements | Trivial | High — prevents a non-obvious validation failure | -| 4 | Document rack height formula / constraints | Trivial | Medium — the error message is helpful but prevention is better | -| 5 | `MonthlyAlbedo` parameter naming | Trivial | Low — minor discoverability improvement | -| 6 | Document `calculationYear` + TMY interaction | Trivial | Medium — subsumed by §10 above | - -Items 7, 8, and 10 were the **actual blockers** that prevented a working -API call. Items 7 and 8 both produce the same opaque `"Something went -wrong."` error, making them indistinguishable without intercepting raw HTTP -responses. Item 10 produces silently wrong results with no error at all — -arguably worse than a failure. 
diff --git a/sdk_round2_feedback.md b/sdk_round2_feedback.md deleted file mode 100644 index 53458cc..0000000 --- a/sdk_round2_feedback.md +++ /dev/null @@ -1,93 +0,0 @@ -# SolarFarmer Python SDK — Feedback from Agent-Driven Integration Test - -**Date:** 2025-04-07 -**SDK version:** git commit `4f13c6e` (github.com/dnv-opensource/solarfarmer-python-sdk) -**Test:** ~20 MW bifacial plants (tracker + fixed tilt) at 7 NOAA SURFRAD locations using PSM4 TMY data via pvlib - -## Context - -An AI coding agent was given a one-paragraph prompt to build and run SolarFarmer energy calculations for all SURFRAD sites. This document summarizes the friction points encountered, ranked by debugging cost, with suggested improvements. - ---- - -## Immediate (docs + bug fixes) - -### 1. PAN/OND Filename Parsing Inconsistency — Bug Fix - -**Problem:** `pvsystem.py:1737` (PAN) and `pvsystem.py:1710` (OND) use `filename.split(".")[0]` to derive spec IDs, producing `Trina_TSM-DEG19C` from `Trina_TSM-DEG19C.20-550_APP.PAN`. But the client-side validation in `endpoint_modelchains.py:58` uses `Path(f.name).stem`, producing `Trina_TSM-DEG19C.20-550_APP`. These don't match, causing a `ValueError` before the API call is even made. Both PAN and OND paths are affected. - -**Cost:** ~25% of debugging time. Required source-diving into both files to understand the mismatch. Workaround was copying the PAN file with periods replaced by underscores. - -**Fix:** Use `Path.stem` (or `rsplit(".", 1)[0]`) consistently in both locations. - -### 2. TMY Year Requirement Not Discoverable from `PVSystem` Workflow — Docs - -**Problem:** TMY data from NSRDB contains timestamps from multiple source years (e.g. 2003, 2005, 2010, 2016). Submitting this as a TSV file to the API returns HTTP 400 with `{"detail": "Something went wrong."}` — no indication that timestamps are the issue. 
- -**Existing documentation:** The `calculation_year` docstring in `EnergyCalculationOptions` correctly warns about TMY mixed-year data and the need to remap timestamps. However, `PVSystem` users never interact with `EnergyCalculationOptions` directly, and the `calculation_year` docs state that for TSV format the value is "Ignored — timestamps in the file are used as-is", which could be read as implying raw TMY timestamps are acceptable. The agent found `weather.py` and `PVSystem` docs but had no reason to look at `EnergyCalculationOptions`. - -**Cost:** ~40% of debugging time. Required ~15 rounds of payload inspection, monkey-patching the response handler, and trial-and-error before discovering the root cause. - -**Fix:** - -- Add a note to the `PVSystem.weather_file` property docstring warning that TMY data with mixed source years must be remapped to a single calendar year. Reference `EnergyCalculationOptions.calculation_year` for details. -- Add a brief note to the `weather.py` module docstring (not in `TSV_COLUMNS`, which is a format spec — the year issue is TMY-specific, not a general format constraint). - -### 3. Weather Format Conversion — Docs - -**Problem:** Converting pvlib DataFrames to SolarFarmer TSV format requires knowing: column name mapping, datetime format (`YYYY-MM-DDThh:mm+OO:OO`), tab delimiter, year normalization, and pressure units. This information is spread across `weather.py` and internal source code. - -**Cost:** ~20% of debugging time. - -**Fix:** - -- Add a pvlib-to-SolarFarmer column mapping reference to the `weather.py` module docstring (e.g. `temp_air` → `TAmb`, `wind_speed` → `WS`, `pressure` → `Pressure`). -- Add a minimal conversion code example in `weather.py` or the `PVSystem` class docstring. - -### 4. `MountingType` Import Path — Docs - -**Problem:** `sf.MountingType` raises `AttributeError`. Must import from `solarfarmer.models.pvsystem.pvsystem`. - -**Cost:** ~5%. 
- -**Fix:** Document the import path in the `PVSystem` class docstring, since `mounting` is a required parameter. - -### 5. PAN/OND Dict Key Semantics — Docs - -**Problem:** `PVSystem.pan_files` accepts `dict[str, Path]`, but the dict keys are ignored — spec IDs are derived from filenames via `filename.split(".")[0]`. The dict interface implies the keys are meaningful. - -**Cost:** ~5%. - -**Fix:** Document in the `pan_files` and `ond_files` property docstrings that keys are not used as spec IDs. - -### 6. `CalculationResults` Usage — Docs - -**Problem:** No `results.net_energy_MWh` property. Must use `results.get_performance()['net_energy']`. - -**Cost:** ~5%. - -**Fix:** Add a usage example in the `CalculationResults` class docstring showing `get_performance()` and available keys. - ---- - -## Future Improvements - -- **Server-side validation:** Return field-level error details instead of generic `{"detail": "Something went wrong."}` for 400 responses. This is the single highest-impact improvement but is an API-side change. -- **Client-side TMY validation:** Detect non-contiguous years in TSV weather files before upload and raise a clear `ValueError` with remediation guidance. -- **Weather conversion utility:** `sf.weather.from_pvlib(df)` or `sf.weather.from_dataframe(df)` to handle column renaming, timestamp formatting, and year normalization for TMY data. -- **Top-level enum exports:** Export `MountingType`, `InverterType`, `OrientationType` from `solarfarmer.__init__`. -- **`pan_files` / `ond_files` as list:** Accept `list[Path]` in addition to `dict[str, Path]`, since the keys are unused. -- **Convenience properties on `CalculationResults`:** `net_energy_MWh`, `performance_ratio`, `energy_yield_kWh_per_kWp`. 
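The client-side TMY validation proposed above could be as small as the following sketch — `check_single_year` is a hypothetical helper, not an SDK symbol:

```python
import pandas as pd


def check_single_year(timestamps):
    """Raise before upload if the weather index spans multiple calendar
    years (sketch of the proposed client-side TMY validation)."""
    years = sorted(set(pd.DatetimeIndex(timestamps).year))
    if len(years) > 1:
        raise ValueError(
            f"Weather data spans years {years}; the API only processes "
            "rows matching calculation_year. Remap TMY timestamps to a "
            "single year before uploading."
        )
```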
- ---- - -## Summary of Estimated Debugging Cost - -| Issue | Relative Cost | Category | -|---|---|---| -| TMY year requirement not discoverable | ~40% | Docs | -| PAN filename `split(".")` vs `Path.stem` | ~25% | Bug fix | -| Weather format conversion | ~20% | Docs | -| Missing enum import path | ~5% | Docs | -| Dict key confusion | ~5% | Docs | -| No convenience accessors | ~5% | Docs | diff --git a/sdk_weather_file_feedback.md b/sdk_weather_file_feedback.md deleted file mode 100644 index 3e17304..0000000 --- a/sdk_weather_file_feedback.md +++ /dev/null @@ -1,352 +0,0 @@ -# SolarFarmer Python SDK — Weather File Discoverability Feedback - -## Context - -This document captures observations from building a batch energy-yield runner -on top of `dnv-solarfarmer>=0.2.0rc1`. The task involved constructing -`EnergyCalculationInputs` payloads for 14 project variants and sourcing -NSRDB PSM4 TMY weather data formatted for the SolarFarmer API. - -An AI coding agent (Claude) drove the implementation. The agent's workflow — -calling `help()`, reading docstrings, inspecting signatures — mirrors how -human developers explore an unfamiliar SDK, and exposes where the discoverable -documentation runs short. The recommendations below target both audiences; -agents (especially lower-capability ones) are more sensitive to these gaps -because they cannot fall back on domain intuition the way a human might. 
- ---- - -## 1 What the SDK tells you about weather files today - -The table below is the **complete set** of weather-file guidance reachable -from `help()` / docstrings without leaving the REPL: - -| Symbol | Documentation | -|--------|---------------| -| `MeteoFileFormat` enum | `"Meteorological file format."` — four values (`dat`, `tsv`, `PvSystStandardFormat`, `ProtobufGz`), no per-value descriptions | -| `EnergyCalculationInputsWithFiles.meteo_file_path_or_contents` | `"Meteorological file path or inline contents"` | -| `EnergyCalculationInputsWithFiles.meteo_file_format` | `"Format of the meteorological file"` | -| `run_energy_calculation(meteorological_data_file_path=...)` | `"Accepts Meteonorm .dat, TSV, a SolarFarmer desktop export (e.g., MeteorologicalConditionsDatasetDto_Protobuf.gz), or PVsyst standard format .csv files"` | -| `parse_files_from_paths()` | `"Accepted extensions are .tsv, .dat, .csv (PVsyst format), and .gz (protobuf transfer file)"` | -| `MissingMetDataMethod` enum | `FAIL_ON_VALIDATION`, `REMOVE_TIMESTAMP` — no documentation of what constitutes "missing" data | - -None of these locations specify column names, column ordering, expected units, -timestamp format, timezone convention, or missing-data sentinel values. - ---- - -## 2 What a developer actually needs to produce a valid weather file - -To write the TSV conversion function (`fetch_tmy._to_sf_tsv()`), the -following had to be sourced from the **web-hosted user guide** — not from the -SDK: - -| Requirement | What we hard-coded | SDK source | -|-------------|-------------------|------------| -| Column names | `GHI`, `DHI`, `TAmb`, `WS`, `Pressure`, `PW`, `RH` | None | -| Column header aliases | API also accepts e.g. 
`Glob`/`SolarTotal` for GHI, `diff`/`diffuse`/`DIF` for DHI, `Temp`/`T` for temperature, `Wind`/`WindSpeed` for wind, `Water` for PW, `AP` for pressure, `Humidity` for RH — discovered only by reading API source code (`ReadMetDataFromTSV.cs`) | None | -| Column order matters | Yes — required columns first, optional after | None | -| Timestamp format | `YYYY-MM-DDThh:mm+OO:OO` — **not** `YYYY-MM-DD HH:MM:SS` (see §2.1 below) | None | -| Timezone convention | UTC offset embedded in each timestamp; local standard time with offset | None | -| Units: irradiance | W/m² | None | -| Units: temperature | °C | None | -| Units: wind speed | m/s | None | -| Units: pressure | mbar (not Pa, not hPa) | None | -| Units: precipitable water | cm | None | -| Units: relative humidity | % | None | -| Missing-data sentinel | `9999` or `-9999` | None | -| Value ranges | GHI 0–1300, DHI 0–1000, TAmb −35–60, WS 0–50, Pressure 750–1100, RH 0–100 | None | - -And TSV is only one of the four format families the API accepts. The same -information gap exists for `.dat` (Meteonorm), `.csv` (PVsyst standard -format, NREL SAM, NSRDB TMY3, SolarAnywhere, SolarGIS, Solcast, Vaisala), -and `.gz` (protobuf). The web user guide lists **11 top-level format -variants**; the SDK enum exposes 4 multipart field types that cover them. - -### 2.1 Timestamp format details - -The exact timestamp format required by the TSV parser is: - - YYYY-MM-DDThh:mm+OO:OO - -Key requirements: - -- `T` separator between date and time (not a space) -- No seconds component -- Mandatory UTC offset (e.g. `-06:00`, `+00:00`) — `Z` notation is not - accepted, and omitting the offset is not accepted -- The API parses timestamps via `DateTimeOffset.ParseExact` with format - `yyyy-MM-dd'T'HH:mmzzz` - -The line-detection regex in the TSV parser (`ReadMetDataFromTSV.cs`) requires -the `T` separator — timestamps without it are not recognized as data lines, -causing the file to appear empty. 
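Producing this format from Python needs a small workaround, because `strftime("%z")` emits the offset without a colon (`-0600`, which the parser rejects). A sketch — `sf_timestamp` is a hypothetical helper, not an SDK function:

```python
from datetime import datetime, timedelta, timezone


def sf_timestamp(ts):
    """Format a tz-aware datetime as the TSV parser requires:
    T separator, no seconds, mandatory offset with a colon.
    The colon is spliced into the %z output by hand."""
    offset = ts.strftime("%z")  # e.g. '-0600'
    return ts.strftime("%Y-%m-%dT%H:%M") + offset[:3] + ":" + offset[3:]


# e.g. sf_timestamp(datetime(2001, 1, 1, tzinfo=timezone(timedelta(hours=-6))))
# → '2001-01-01T00:00-06:00'
```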
- -**Example valid timestamps:** -``` -2001-01-01T00:00-06:00 -2001-06-15T12:30+00:00 -2019-12-31T23:00-07:00 -``` - -### 2.2 Accepted column header aliases - -The TSV parser accepts multiple names for each variable. These aliases are -defined in `ReadMetDataFromTSV.cs` but not documented in the SDK: - -| Variable | Accepted headers | -|----------|-----------------| -| GHI | `GHI`, `ghi`, `Glob`, `SolarTotal` | -| DHI | `DHI`, `diff`, `diffuse`, `DIF` | -| Temperature | `Temp`, `T`, `TAmb` | -| Wind speed | `WS`, `Wind`, `Speed`, `WindSpeed` | -| Precipitable water | `Water`, `PW` | -| Air pressure | `Pressure`, `AP`, `pressure` | -| Relative humidity | `Humidity`, `RH`, `humidity` | -| Albedo | `Surface Albedo`, `Albedo`, `albedo` | -| Soiling | `SL`, `Soiling`, `SoilingLoss`, `Slg` | -| Date/time | `Date`, `DateTime`, `Time` | - -Exposing these alias lists in the SDK (e.g. as a dict constant) would let -both humans and agents validate their header names before uploading. - ---- - -## 3 Files that bypass the Pydantic layer - -The weather file is not the only input that passes through the SDK as an -opaque path or byte stream. The full inventory: - -| File type | SDK treatment | Pydantic-modelled? 
| Notes | -|-----------|---------------|-------------------|-------| -| **Weather** (`.dat`, `.tsv`, `.csv`, `.gz`) | `str` path → opened as `rb` → uploaded as multipart `tmyFile` / `pvSystStandardFormatFile` / `metDataTransferFile` | **No** | No column/unit/format schema in SDK | -| **PAN** (module spec) | `str` path → opened as `rb` → uploaded as multipart `panFiles` | **No** — but `PanFileSupplements` models overrides (quality factor, LID, IAM, bifaciality) | PAN format is a PVsyst standard; modelling it is out of scope | -| **OND** (inverter spec) | `str` path → opened as `rb` → uploaded as multipart `ondFiles` | **No** — but `OndFileSupplements` models overrides (over-power mode, voltage derate) | OND format is a PVsyst standard; same reasoning | -| **HOR** (horizon profile) | `str` path → opened as `rb` → uploaded as multipart `horFile` | **Partially** — `EnergyCalculationInputs` has `horizon_azimuths` / `horizon_angles` list fields; `HorizonType` enum describes azimuth conventions | Inline numeric arrays are modelled; the file format is not | - -The PAN and OND file formats are owned by PVsyst and widely standardised -across solar modelling tools. Documenting their internal structure in this -SDK would be out of scope — the SDK's role is to pass them through to the API -unchanged, and the `*Supplements` models appropriately handle the subset of -parameters the user may want to override. - -The horizon file is partially covered: if you have the data as arrays, you -can bypass the file entirely via the `horizon_azimuths` / `horizon_angles` -fields. The `HorizonType` enum documents the azimuth conventions clearly. - -**The weather file is the outlier** — not because it's the only file that -bypasses Pydantic, but because: - -1. It's the only one whose format is **defined by SolarFarmer itself** (the - TSV variant in particular). -2. 
It's the only one where the **user commonly needs to generate or convert** - the file rather than use an existing one from a data provider. -3. Its format details (columns, units, timestamp handling) are already - documented in the web user guide — they just aren't surfaced in the SDK. - ---- - -## 4 Recommendations - -### 4.1 Enrich the `MeteoFileFormat` enum (minimal cost, immediate impact) - -Add per-value docstrings and a link to the full specification: - -```python -class MeteoFileFormat(str, Enum): - """Meteorological file format. - - The API accepts weather data in several formats. The SDK maps file - extensions to the correct multipart upload field automatically: - - - ``.dat`` and ``.tsv`` → ``tmyFile`` - - ``.csv`` (PVsyst standard) → ``pvSystStandardFormatFile`` - - ``.gz`` (protobuf) → ``metDataTransferFile`` - - For TSV format details (column names, units, timestamp convention), - see the user guide: - https://mysoftware.dnv.com/.../DefineClimate/SolarResources.html - """ - - DAT = "dat" - """Meteonorm PVsyst Hourly TMY format.""" - TSV = "tsv" - """SolarFarmer tab-separated values — see class docstring for column spec.""" - PVSYST_STANDARD_FORMAT = "PvSystStandardFormat" - """PVsyst standard CSV export format.""" - PROTOBUF_GZ = "ProtobufGz" - """SolarFarmer desktop binary transfer format (protobuf, gzip-compressed).""" -``` - -This is a documentation-only change; no new code, no API surface change. - -### 4.2 Add a TSV format specification as a module-level docstring or constants - -The TSV format is SolarFarmer's own. A module (e.g. 
-`solarfarmer.models.weather` or constants in `solarfarmer.config`) could -expose the spec as discoverable Python objects: - -```python -# solarfarmer/models/weather.py - -TSV_COLUMNS = { - "required": [ - {"name": "DateTime", "unit": None, "format": "YYYY-MM-DDThh:mm+OO:OO", "note": "ISO 8601 with mandatory UTC offset, no seconds, T separator"}, - {"name": "GHI", "unit": "W/m²", "range": (0, 1300), "aliases": ["ghi", "Glob", "SolarTotal"]}, - {"name": "TAmb", "unit": "°C", "range": (-35, 60), "aliases": ["Temp", "T"]}, - ], - "optional": [ - {"name": "DHI", "unit": "W/m²", "range": (0, 1000), "aliases": ["diff", "diffuse", "DIF"], "note": "omit only if unavailable; engine can decompose from GHI (calculateDHI=True) but measured DHI is strongly preferred for accuracy"}, - {"name": "WS", "unit": "m/s", "range": (0, 50), "aliases": ["Wind", "Speed", "WindSpeed"]}, - {"name": "Pressure", "unit": "mbar", "range": (750, 1100), "aliases": ["AP", "pressure"]}, - {"name": "PW", "unit": "cm", "range": (0, 100), "aliases": ["Water"]}, - {"name": "RH", "unit": "%", "range": (0, 100), "aliases": ["Humidity", "humidity"]}, - {"name": "Albedo", "unit": "—", "range": (0, 1), "aliases": ["Surface Albedo", "albedo"]}, - {"name": "Soiling", "unit": "—", "range": (0, 1), "aliases": ["SL", "SoilingLoss", "Slg"]}, - ], - "missing_value_sentinel": 9999, - "delimiter": "\t", -} -``` - -This doubles as documentation and as a machine-readable contract that tools -(and AI agents) can consume programmatically. No Pydantic model overhead, no -validation logic — just a discoverable data structure. 
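As an illustration of the "machine-readable contract" point, header validation against such a structure is only a few lines. The trimmed `TSV_SPEC` below mirrors the sketch above but is entirely hypothetical, as is `unknown_headers`:

```python
# Trimmed stand-in for the proposed TSV_COLUMNS constant (hypothetical).
TSV_SPEC = {
    "required": [
        {"name": "DateTime", "aliases": ["Date", "Time"]},
        {"name": "GHI", "aliases": ["ghi", "Glob", "SolarTotal"]},
        {"name": "TAmb", "aliases": ["Temp", "T"]},
    ],
    "optional": [
        {"name": "DHI", "aliases": ["diff", "diffuse", "DIF"]},
        {"name": "WS", "aliases": ["Wind", "Speed", "WindSpeed"]},
    ],
}


def unknown_headers(headers, spec=TSV_SPEC):
    """Return the header names the TSV parser would not recognise."""
    accepted = set()
    for col in spec["required"] + spec["optional"]:
        accepted.add(col["name"])
        accepted.update(col["aliases"])
    return [h for h in headers if h not in accepted]
```

A pre-upload call such as `unknown_headers(df.columns)` would catch a stray pvlib-style `temp_air` column before the API ever sees it.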
- -### 4.3 Cross-reference weather-dependent options - -Several fields in `EnergyCalculationOptions` have implicit dependencies on -the weather file contents, but nothing connects them: - -| Option | Dependency | Current documentation | -|--------|-----------|----------------------| -| `calculate_dhi` | If `True`, DHI column is not required in the weather file — but providing measured DHI is **strongly preferred** for accuracy; the engine's GHI→DHI decomposition is a fallback, not a replacement for source data | `"Whether to calculate DHI from GHI"` — no mention of weather file or accuracy trade-off | -| `missing_met_data_handling` | Controls behaviour when weather data contains sentinel values (`9999`) | `"Behaviour when required meteorological variables have missing data"` — no mention of what "missing" means in the file | -| `use_albedo_from_met_data_when_available` | Reads `Albedo` column from weather file if present | Field exists but no cross-reference to weather column name | -| `use_soiling_from_met_data_when_available` | Reads `Soiling` column from weather file if present | Same | -| `calculation_year` | **Only** processes weather file rows whose timestamp year matches this value (default 1990). TMY files with mixed years (NSRDB, PVGIS, etc.) will have most rows silently discarded. | `"Year to use for the calculation"` — no mention of the filtering behaviour or its interaction with TMY data | - -Adding a one-line cross-reference in each field's docstring (e.g. -*"When True, the DHI column may be omitted from the weather file"*) would -close the gap without adding code. - -The `calculation_year` / TMY interaction is particularly dangerous because -it fails silently — the API returns results (not an error), but those -results are based on a fraction of the intended data. See -`sdk_general_feedback.md` §10 for a detailed description of the problem -and suggestions. 

### 4.4 `write_tsv()` with a companion pvlib column-name mapping (medium effort, high value)

pvlib's `map_variables=True` convention — supported in **30+ reader
functions** — normalizes every data source to a standard set of column names
(`ghi`, `dhi`, `temp_air`, `wind_speed`, `pressure`, `precipitable_water`,
`relative_humidity`, `albedo`, …). Any DataFrame from pvlib has these names
regardless of whether the source was NSRDB, Solcast, SolarGIS, ERA5, PVGIS,
SolarAnywhere, or a local TMY3/EPW file.

The SDK could ship a writer function alongside a provided mapping constant:

```python
# Provided as a convenience constant (not a default — user data may have
# arbitrary column names). Covers the pvlib `map_variables=True` convention.
PVLIB_TO_SF_TSV = {
    "ghi": "GHI",
    "dhi": "DHI",
    "temp_air": "TAmb",
    "wind_speed": "WS",
    "pressure": "Pressure",  # pvlib: mbar; SF: mbar ✓
    "precipitable_water": "PW",
    "relative_humidity": "RH",
    "albedo": "Albedo",
}

def write_tsv(
    df: pd.DataFrame,
    path: str | Path,
    *,
    column_map: dict[str, str],
    missing_value: float = 9999,
) -> None:
    """Write a SolarFarmer-compatible TSV weather file.

    Parameters
    ----------
    df : DataFrame
        Must have a DatetimeIndex. Column names are mapped using
        ``column_map``; unrecognised columns are dropped.
    path : str or Path
        Output file path (.tsv).
    column_map : dict
        Mapping from DataFrame column names to SolarFarmer TSV column
        names. Keys are the source column names; values must be valid
        TSV headers (see ``TSV_COLUMNS``).
    missing_value : float
        Sentinel value for NaN. Default is 9999.

    See Also
    --------
    PVLIB_TO_SF_TSV : Pre-built mapping for DataFrames from any pvlib
        reader called with ``map_variables=True``.

    Examples
    --------
    >>> import pvlib.iotools
    >>> from solarfarmer.weather import write_tsv, PVLIB_TO_SF_TSV
    >>> df, meta = pvlib.iotools.get_nsrdb_psm4_tmy(lat, lon, api_key, email)
    >>> write_tsv(df, "site.tsv", column_map=PVLIB_TO_SF_TSV)
    """
```

Making `column_map` a required argument (with no default) avoids the
assumption that user data will always match pvlib conventions — many datasets
arrive from non-pvlib sources with arbitrary column names. The
`PVLIB_TO_SF_TSV` constant is there for users who *do* work in the pvlib
ecosystem, and the `See Also` / `Examples` sections in the docstring make it
discoverable for both humans and agents without hard-wiring it into the
function signature.

The `PVLIB_TO_SF_TSV` dict also documents the relationship between the two
ecosystems in a machine-readable way — an AI agent inspecting the constant
immediately understands both the pvlib input names and the SolarFarmer output
names.

### 4.5 Example data: documentation snippets over bundled files

Bundling full 8760-row weather files would expand the 864 KB package by
several MB — a ~10× size increase for data that is used only at documentation
time. Alternatives that provide the same discoverability without bloating the
distribution:

1. **Inline docstring examples** showing 3–5 rows of a valid TSV:

   ```
   DateTime	GHI	DHI	TAmb	WS
   2001-01-01T00:00-06:00	0.0	0.0	-5.20	3.40
   2001-01-01T01:00-06:00	0.0	0.0	-5.80	2.90
   2001-01-01T06:00-06:00	12.3	8.1	-4.10	3.10
   ```

2. **Links to a companion `solarfarmer-examples` repository or
   documentation site** for full-size files.

Option 1 is nearly free and solves the AI-agent use case directly (the agent
reads docstrings, sees the format, and can produce a conformant file without
any external fetch).
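For instance, producing a file matching the inline snippet above takes only a few lines of pandas. A sketch (the DataFrame contents, the timezone, and the `site.tsv` path are illustrative; the format details follow the TSV spec described earlier):

```python
import pandas as pd

# Illustrative data; Etc/GMT+6 is UTC-06:00 (note the POSIX sign convention).
idx = pd.DatetimeIndex(
    ["2001-01-01 00:00", "2001-01-01 01:00", "2001-01-01 06:00"],
    tz="Etc/GMT+6",
)
df = pd.DataFrame(
    {"GHI": [0.0, 0.0, 12.3], "DHI": [0.0, 0.0, 8.1], "TAmb": [-5.2, -5.8, -4.1]},
    index=idx,
)

def iso_no_seconds(ts: pd.Timestamp) -> str:
    """Format as ISO 8601 with UTC offset and no seconds: 2001-01-01T00:00-06:00."""
    s = ts.strftime("%Y-%m-%dT%H:%M%z")  # offset comes out without a colon: -0600
    return s[:-2] + ":" + s[-2:]         # insert the colon: -06:00

out = df.fillna(9999)                    # 9999 is the missing-value sentinel
out.insert(0, "DateTime", [iso_no_seconds(t) for t in out.index])
out.to_csv("site.tsv", sep="\t", index=False)
```

An agent that has read the format from a docstring can write exactly this without fetching any example file, which is why the inline snippet is the highest-leverage option.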

---

## 5 Summary of priorities

| # | Change | Effort | Impact | Scope |
|---|--------|--------|--------|-------|
| 1 | Enrich `MeteoFileFormat` enum docstrings + doc link | Trivial | High for discoverability | Docs only |
| 2 | TSV column spec as module-level constants | Small | High for both humans and agents | New module, no API change |
| 3 | Cross-reference weather-dependent fields in `EnergyCalculationOptions` — especially `calculation_year` + TMY | Trivial | **High** — `calculation_year` silently discards non-matching years | Docstring edits |
| 4 | `write_tsv()` helper + `PVLIB_TO_SF_TSV` mapping constant | Medium | High — eliminates the #1 user-written boilerplate; mapping constant bridges the pvlib ecosystem | New public function + constant |
| 5 | Inline docstring example (3–5 TSV rows) | Trivial | High for agents, good for humans | Docs only |

Recommendations 1, 3, and 5 are pure documentation changes that could ship in
a patch release. Recommendation 2 provides the machine-readable contract that
makes the SDK self-describing for weather data. Recommendation 4 is the
functional complement to the existing file-reading utilities.