MRB-665 Support for global models and remove meteodata-lab#90
Should I produce a global forecaster model to review this? Or can an interpolator work?

@frazane can you also have a look pls? Supporting tp from the global GRIB files has been a bit of a pain and some of these fixes are really unpleasant ....
params_sel = list(
    {p for p in params} | {_COSMO_TO_IFS[p] for p in params if p in _COSMO_TO_IFS}
)
# Precipitation params don't have a step=0 field (accumulation is zero at
I suppose this was previously handled by meteodata-lab?
I guess this didn't occur previously because we haven't read from the global files at all. All our lam cutouts use COSMO names.
prec_params = [p for p in params_sel if p in _PREC_PARAMS]
other_params = [p for p in params_sel if p not in _PREC_PARAMS]
fieldlist = ekd.from_source("file", files)
datasets = []
if other_params:
    datasets.append(fieldlist.sel(param=other_params, step=steps).to_xarray(profile=profile))
if prec_params:
    prec_steps = [s for s in steps if s > 0]
    datasets.append(fieldlist.sel(param=prec_params, step=prec_steps).to_xarray(profile=profile))
In data_extract_baseline.py we use the following pattern:

    for lt in lead_times:
        lh = lt % 24
        ld = lt // 24
        filepath = file / "grib" / f"{gribname}{ld:02}{lh:02}0000_{run_id}"
        LOG.info(f"Extracting {filepath}.")
        fields = ekd.from_source("file", filepath)
        for field in fields:
            if field.metadata("shortName") in params:
                out.append(field)

That one looks slow to me, but I haven't profiled.
The above approach has the advantage that it does not hard-code expectations: if there were a precipitation field at time 0 (containing zeros or NA), you could still read it.
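A rough sketch of how the hard-coded `step > 0` rule could be avoided, under assumptions: the selection is derived from the (param, step) pairs that actually exist. The function name `plan_selection` and the `available` set are hypothetical stand-ins in plain Python; in practice the pairs would have to be read from the GRIB metadata (for instance `shortName` and `step`), which is not shown here.

```python
from collections import defaultdict


def plan_selection(available, params, steps):
    """Group requested params by the steps that actually exist for them.

    `available` is a set of (param, step) tuples. It is a plain-Python
    stand-in for an index built from the GRIB metadata.
    """
    wanted = defaultdict(list)
    for param in params:
        for step in steps:
            if (param, step) in available:
                wanted[param].append(step)
    # Params that share the same list of steps can be read in one selection.
    groups = defaultdict(list)
    for param, param_steps in wanted.items():
        groups[tuple(param_steps)].append(param)
    return {steps_key: sorted(names) for steps_key, names in groups.items()}


# Hypothetical index: "tp" has no step=0 field, "2t" and "10u" do.
available = {("2t", s) for s in (0, 6, 12)} | {("10u", s) for s in (0, 6, 12)}
available |= {("tp", s) for s in (6, 12)}

plan = plan_selection(available, ["2t", "10u", "tp"], [0, 6, 12])
print(plan)
```

Each key of the returned dict is a tuple of steps and each value the params to read at those steps, so a precipitation field at step 0 would be picked up automatically if it were present.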
…with _earthkit field
…allowed to be the globe
These changes have not been reconciled with main. Therefore another PR is necessary.

Co-authored-by: Francesco Zanetta <62377868+frazane@users.noreply.github.com>
Co-authored-by: Claire Merker <34312518+clairemerker@users.noreply.github.com>
When running experiments with many initialization times, individual initializations sometimes fail due to data problems. To deal with this, a blacklist of initialization times can be provided in the config to exclude them from experiments.

## Summary of Changes

* adapted config schema and updated generation of the list of initialization times
* provided one example config with a blacklisted initialization
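As an illustration only, the generation of initialization times with a blacklist could look roughly like the sketch below. The function name `init_times`, its parameters, and the dates are hypothetical; the actual config keys and schema are not shown in this PR text.

```python
from datetime import datetime, timedelta


def init_times(start, end, frequency, blacklist=()):
    """Return initialization times in [start, end], skipping blacklisted ones."""
    excluded = set(blacklist)
    times = []
    current = start
    while current <= end:
        if current not in excluded:
            times.append(current)
        current += frequency
    return times


# Hypothetical values standing in for what a config would provide.
start = datetime(2025, 3, 1, 0)
end = datetime(2025, 3, 2, 0)
blacklist = [datetime(2025, 3, 1, 12)]

selected = init_times(start, end, timedelta(hours=6), blacklist)
for t in selected:
    print(t.isoformat())
```

Filtering while generating keeps the remaining times in order and makes a blacklisted time that falls outside the range a harmless no-op.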
In order to use earthkit-data v1.0 release candidates, we need up-to-date @cosunae are there any updates from DWD?

@frazane the issue with the

DWD eccodes released on open data version 2.44 http://opendata.dwd.de/
This PR adds the option to read ICON-CH1/2-EPS surface GRIB files directly from the operational archive. It also removes the legacy zarr reader for baselines and consequently all cosmo-based config files.

### Results

A quick test shows no difference in results between the existing zarr and the new GRIB readers.

<img width="1000" height="600" alt="image" src="https://github.com/user-attachments/assets/dfa1e1b2-f6d1-4795-ac00-c181ea32b68d" />

Performance-wise, it doesn't seem to make a big difference, which I find a bit odd, so I'll need to have a closer look.

```
2026-05-13 23:12:00,289 - data_input - INFO - Loading baseline forecasts from ICON GRIB archive...
2026-05-13 23:12:00,292 - data_input - INFO - Reading ICON archive from /store_new/mch/msopr/osm/ICON-CH1-EPS/FCST25/25030100_638
2026-05-13 23:12:39,291 - __main__ - INFO - Loaded forecast data in 39.007409 seconds
```

```
2026-05-13 23:12:00,284 - data_input - INFO - Loading baseline forecasts from zarr dataset...
2026-05-13 23:13:17,642 - __main__ - INFO - Loaded forecast data in 77.357972 seconds
```

<img width="648" height="236" alt="image" src="https://github.com/user-attachments/assets/509c4cd3-4849-4e92-be8c-9cf939e397c5" />

### Open questions

- ~should we deprecate the baselines zarr instead?~ Done in 8619fea
- ~should we switch to earthkit v1?~ Out of scope
- ~should we deprecate the dependency on meteodata-lab?~ Out of scope, already part of #90
- extend method to read extra variables, including from vertical levels?

### Follow-up PRs

- extend method to read any arbitrary member
- extend method to read the pre-computed median
- extend method to compute ensemble mean


This PR introduces minimal support for global models in evalml, as we might want to evaluate small experiments on coarse global grids (e.g. o48 or o96).
Summary of changes
`ECCODES_DEFINITION_PATH` environment variable when running global models
TODO:
`_earthkit` added to the xarray object by earthkit. New versions of xarray crash on `to_netcdf` when the object carries fields that are not allowed. The solution is to always `pop` it away. A temporary solution is to pin xarray, but that's not sustainable.
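A minimal sketch of the "always pop it away" idea, under assumptions: the helper name `strip_attr` is hypothetical, and it is assumed that the `_earthkit` entry sits in the `attrs` of the dataset and/or its variables (where exactly earthkit stores it is not stated here). The helper only touches `ds.attrs` and `ds.variables`, which `xarray.Dataset` also exposes, and is demonstrated on a stand-in object so that it runs without xarray.

```python
from types import SimpleNamespace


def strip_attr(ds, key="_earthkit"):
    """Remove `key` from the attrs of a dataset-like object and its variables.

    Relies only on `ds.attrs` and `ds.variables`. Where earthkit actually
    stores the entry is an assumption of this sketch.
    """
    ds.attrs.pop(key, None)
    for var in ds.variables.values():
        var.attrs.pop(key, None)
    return ds


# Stand-in object so the sketch runs without xarray installed.
fake_ds = SimpleNamespace(
    attrs={"_earthkit": object(), "title": "demo"},
    variables={"tp": SimpleNamespace(attrs={"_earthkit": object(), "units": "m"})},
)
strip_attr(fake_ds)
print(fake_ds.attrs, fake_ds.variables["tp"].attrs)
```

With a real dataset, the call would go right before writing, e.g. `strip_attr(ds).to_netcdf(path)`, which would make the xarray pin unnecessary if the assumption about where the entry lives holds.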