diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
index ddde1836..63a3aa89 100644
--- a/.github/pull_request_template.md
+++ b/.github/pull_request_template.md
@@ -18,11 +18,11 @@ Please review your pull request for the following steps. Feel free to delete any
- [ ] If no generative AI was used, then tick this box
Alternatively, if generative AI was used, then confirm you have:
-
+
- [ ] Attributed any generative AI (such as GitHub Copilot) that was used in this PR. See [our contributing guide](https://github.com/ACCESS-Community-Hub/PyEarthTools?tab=contributing-ov-file#generative-ai-usage) for more information.
- [ ] Included the name and version of the tool or system in the pull request
- [ ] Described the scope of that use
-Finally,
+Finally,
- [ ] Mark the PR as ready for review. Note: we encourage you to ask for feedback at the outset or at any time during the work.
diff --git a/README.md b/README.md
index 801eccd6..11299325 100644
--- a/README.md
+++ b/README.md
@@ -12,10 +12,10 @@
|A weather prediction from a model trained with PyEarthTools.|A data processing flow composed for working with climate data.|
|:-:|:-:|
-Source Code: [github.com/ACCESS-Community-Hub/PyEarthTools](https://github.com/ACCESS-Community-Hub/PyEarthTools)
-Documentation: [pyearthtools.readthedocs.io](https://pyearthtools.readthedocs.io)
-Tutorial Gallery: [available here](https://pyearthtools.readthedocs.io/en/latest/notebooks/Gallery.html)
-New Users Guide: [available here](https://pyearthtools.readthedocs.io/en/latest/newuser.html)
+Source Code: [github.com/ACCESS-Community-Hub/PyEarthTools](https://github.com/ACCESS-Community-Hub/PyEarthTools)
+Documentation: [pyearthtools.readthedocs.io](https://pyearthtools.readthedocs.io)
+Tutorial Gallery: [available here](https://pyearthtools.readthedocs.io/en/latest/notebooks/Gallery.html)
+New Users Guide: [available here](https://pyearthtools.readthedocs.io/en/latest/newuser.html)
**If you use `PyEarthTools` for your work or a publication, [please cite our work](https://pyearthtools.readthedocs.io/en/latest/#citing-pyearthtools).**
@@ -59,8 +59,8 @@ PyEarthTools is a Python framework containing modules for:
- training ML models and managing experiments;
- performing inference with ML models;
- and evaluating ML models (coming soon).
-
- PyEarthTools runs effectively on HPC (supercomputers), cloud, workstations and laptops.
+
+ PyEarthTools runs effectively on HPC (supercomputers), cloud, workstations and laptops.
## Overview of the Packages within PyEarthTools
diff --git a/docs/api/bundled_models/bundled_index.md b/docs/api/bundled_models/bundled_index.md
index 883187c0..a423850c 100644
--- a/docs/api/bundled_models/bundled_index.md
+++ b/docs/api/bundled_models/bundled_index.md
@@ -1,9 +1,9 @@
# Bundled Models Index
-Unlike the other directories in the 'packages' directory of PyEarthTools, the "bundled_models" directory does not itself contain a "bundled models" Python package. Rather, it contains multiple model packages in separate directories. Each of these bundled models **is** a Python package. As such, "bundled_models" is not itself installable. This page will provide an index table for each bundled model.
+Unlike the other directories in the `packages` directory of PyEarthTools, the `bundled_models` directory does not itself contain a "bundled models" Python package. Rather, it contains multiple model packages in separate directories. Each of these bundled models **is** a Python package. As such, `bundled_models` is not itself installable. This page provides an index table for each bundled model.
At the current time, the following bundled models are available:
- - [FourCastNeXt by Guo et al. (2024).](https://doi.org/10.48550/arXiv.2401.05584)
+ - [FourCastNeXt by Guo et al. (2024).](https://doi.org/10.48550/arXiv.2401.05584)
- [LUCIE by Guan et al. (2025).](https://doi.org/10.48550/arXiv.2405.16297)
Bundled models also have configuration files in addition to the Python code. Each YAML file is also included in the table for the bundled model.
diff --git a/docs/api/data/data_how_to.md b/docs/api/data/data_how_to.md
index 6c1b27fa..6c818967 100644
--- a/docs/api/data/data_how_to.md
+++ b/docs/api/data/data_how_to.md
@@ -4,7 +4,7 @@ The PyEarthTools [Data API](/api/data/data_index.md) provides Data Accessors, wh
They handle the nuances of how the data set is stored and organised, such as how to walk the filesystem, how to match a user query to the files on disk, and how to subset the requested variables out of the data structure. They may also handle any transformations that the raw data needs, such as file compression.
-A more detailed how-to guide will be written in future describing how to use the various classes in the data module.
+A more detailed how-to guide will be written in the future, describing how to use the various classes in the data module.
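+
+In the meantime, a minimal, framework-free sketch of the responsibilities described above may help orient new readers. This is **not** the `pyearthtools.data` API; the class, file layout and variable names below are purely illustrative:
+
+```python
+# Illustrative sketch only -- NOT the pyearthtools.data API.
+# It shows the responsibilities a data accessor takes on: knowing the
+# on-disk layout, matching a query to files, and subsetting variables.
+from pathlib import Path
+
+import xarray as xr
+
+
+class ToyAccessor:
+    """Maps a date query onto a hypothetical layout: root/YYYY/MM/data_YYYYMMDD.nc."""
+
+    def __init__(self, root: str, variables: list[str]):
+        self.root = Path(root)
+        self.variables = variables
+
+    def __getitem__(self, date: str) -> xr.Dataset:
+        year, month, day = date[:4], date[5:7], date[8:10]
+        path = self.root / year / month / f"data_{year}{month}{day}.nc"
+        return xr.open_dataset(path)[self.variables]  # subset the requested variables
+
+
+# usage (hypothetical paths): ToyAccessor("/data/archive", ["t2m"])["2020-01-01"]
+```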
For a general overview and examples of how to make use of some of the data module's functionality, see:
diff --git a/docs/api/pipeline/pipeline_how_to.md b/docs/api/pipeline/pipeline_how_to.md
index f1c38d29..40daf344 100644
--- a/docs/api/pipeline/pipeline_how_to.md
+++ b/docs/api/pipeline/pipeline_how_to.md
@@ -10,7 +10,7 @@ It is somewhat similar to an IterableDataset in PyTorch, or a DataLoader in PyTo
For more information, please see:
-- [Introduction to data pipelines](project:/notebooks/tutorial/Data_Pipelines.ipynb)
+- [Introduction to data pipelines](project:/notebooks/tutorial/Data_Pipelines.ipynb)
- [Working with Multiple Data Sources](project:/notebooks/tutorial/MultipleSources.ipynb)
- [The pipeline API tutorials](project:/notebooks/Gallery.md#Deep-Dive---The-Pipeline-Module) in the tutorial gallery
- The [pyearthtools.pipeline](pipeline_index.md) API documentation index
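+
+To ground the PyTorch analogy mentioned above, a pipeline plays a role loosely similar to the stream of samples below. This is plain PyTorch, not `pyearthtools` code, and the shapes are arbitrary:
+
+```python
+# Plain PyTorch, for orientation only: a lazily-evaluated stream of samples,
+# which is roughly the role a pyearthtools pipeline plays for training.
+import torch
+from torch.utils.data import DataLoader, IterableDataset
+
+
+class TimeStepStream(IterableDataset):
+    """Yields (input, target) pairs one time step at a time."""
+
+    def __init__(self, num_steps: int):
+        self.num_steps = num_steps
+
+    def __iter__(self):
+        for _ in range(self.num_steps):
+            x = torch.randn(4, 8, 8)  # stand-in for a processed input sample
+            y = torch.randn(4, 8, 8)  # stand-in for the matching target
+            yield x, y
+
+
+loader = DataLoader(TimeStepStream(num_steps=100), batch_size=8)
+first_batch = next(iter(loader))  # two tensors, each shaped (8, 4, 8, 8)
+```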
diff --git a/docs/contributing.md b/docs/contributing.md
index 0ab61671..fe829313 100644
--- a/docs/contributing.md
+++ b/docs/contributing.md
@@ -41,10 +41,10 @@ Generative AI tools can be helpful, but contributors must be transparent about u
6. All contributions must adhere to the [code of conduct](https://github.com/ACCESS-Community-Hub/PyEarthTools?tab=coc-ov-file).
7. Given that generative tools are evolving rapidly, this policy will likely be adjusted over time.
-[1] https://joss.readthedocs.io/en/latest/policies.html#ai-usage-policy
-[2] https://www.ametsoc.org/ams/publications/ethical-guidelines-and-ams-policies/author-disclosure-and-obligations/
-[3] https://www.egu.eu/news/1031/statement-on-the-use-of-ai-based-tools-for-the-presentation-and-publication-of-research-results-in-earth-planetary-and-space-science/
-[4] https://rmets.onlinelibrary.wiley.com/hub/ai-policy
+[1] https://joss.readthedocs.io/en/latest/policies.html#ai-usage-policy
+[2] https://www.ametsoc.org/ams/publications/ethical-guidelines-and-ams-policies/author-disclosure-and-obligations/
+[3] https://www.egu.eu/news/1031/statement-on-the-use-of-ai-based-tools-for-the-presentation-and-publication-of-research-results-in-earth-planetary-and-space-science/
+[4] https://rmets.onlinelibrary.wiley.com/hub/ai-policy
## Contributor Recognition in Zenodo
diff --git a/docs/data.md b/docs/data.md
index d5542048..f5a3cbb8 100644
--- a/docs/data.md
+++ b/docs/data.md
@@ -13,10 +13,10 @@ Data is not provided by PyEarthTools (either directly or as a cloud service), it
PyEarthTools can efficiently access large, multi-terabyte data sets. These data sets are typically held on-disk at dedicated computing facilities.
-At the moment, PyEarthTools has existing integrations with the data holdings at three HPC facilities:
-- NCI (Australia).
+At the moment, PyEarthTools has integrations with the data holdings at three HPC facilities:
+- NCI (Australia).
- Met Office (UK).
-- Earth Sciences New Zealand (formerly NIWA).
+- Earth Sciences New Zealand (formerly NIWA).
If you are working at another HPC facility, feel free to get in touch to discuss how to most effectively utilise PyEarthTools in your environment.
@@ -42,7 +42,7 @@ Additionally, you can explore the [geonetwork](https://geonetwork.nci.org.au/geo
### Connecting PyEarthTools to a new dataset in any HPC facility by coding a new data accessor
-For on-disk data access, you will need to create a new accessor based on the [`pyearthtools.data.ArchiveIndex`](project:./api/data/data_api.md#pyearthtools.data.indexes.ArchiveIndex) class (or [`pyearthtools.data.Index`](project:./api/data/data_api.md#pyearthtools.data.indexes.Index) for some use cases). Additional instructions for this still need to be written. In the meantime, refer to the [NCI site archive source code](https://github.com/ACCESS-Community-Hub/PyEarthTools/tree/develop/packages/nci_site_archive) for examples.
+For on-disk data access, you will need to create a new accessor based on the [`pyearthtools.data.ArchiveIndex`](project:./api/data/data_api.md#pyearthtools.data.indexes.ArchiveIndex) class (or [`pyearthtools.data.Index`](project:./api/data/data_api.md#pyearthtools.data.indexes.Index) for some use cases). Additional instructions for this still need to be written. In the meantime, refer to the [NCI site archive source code](https://github.com/ACCESS-Community-Hub/PyEarthTools/tree/develop/packages/nci_site_archive) for examples.
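+
+As a very loose orientation only (the overridden method name and its signature below are assumptions, not the documented interface; the NCI source linked above is the authoritative reference), such an accessor pairs the base class with logic that resolves a query to files on disk:
+
+```python
+# Hypothetical skeleton only: the method name and signature are assumptions,
+# not the documented ArchiveIndex interface. Consult the NCI site archive
+# source code linked above for a real, working implementation.
+import pyearthtools.data
+
+
+class MyArchive(pyearthtools.data.ArchiveIndex):
+    def filesystem(self, query_time):  # name/signature assumed for illustration
+        # Translate a query time into the path(s) holding that data.
+        return f"/path/to/archive/{query_time:%Y/%m}/data_{query_time:%Y%m%d}.nc"
+```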
The [HadISD tutorials](project:./notebooks/Gallery.md#Working-with-Station-Data-(Medium-Hardware-Requirements)) also demonstrate the process of creating a new data accessor. While these tutorials focus on connecting to the HadISD dataset, the patterns in these tutorials are repeatable and can be used for other datasets.
@@ -50,12 +50,12 @@ Additional considerations and rules-of-thumb for HPC environments are:
- Use large files rather than many small files. This makes formats like GRIB and NetCDF more appropriate than Zarr in many cases.
- If using dask for chunking data, use largeish chunks, aligned to the time dimension (or primary index dimension).
-- Do not zip up large datasets. Use internal zip compression. Zip is inherently single-threaded, so can require a long, slow, bottlenecked decompression step before data subsets can be read from a large file.
+- Do not zip up large datasets; use the data format's internal compression instead. Zip is inherently single-threaded, so it can require a long, slow, bottlenecked decompression step before data subsets can be read from a large file (see the sketch after this list).
- Use a format like Parquet for point clouds, station data or other irregularly-spaced, sparse data.
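+
+A sketch of the chunking and compression rules of thumb above, using plain `xarray` and `dask` (the file names, chunk size and compression level are illustrative and should be tuned to the data):
+
+```python
+# Illustrative values only -- tune chunk sizes and compression to your data.
+import xarray as xr
+
+# Large-ish dask chunks, aligned to the time dimension.
+ds = xr.open_dataset("archive_2020.nc", chunks={"time": 240})
+
+# Internal, per-variable compression inside the NetCDF file,
+# rather than zipping the whole file after the fact.
+encoding = {var: {"zlib": True, "complevel": 4} for var in ds.data_vars}
+ds.to_netcdf("archive_2020_compressed.nc", encoding=encoding)
+```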
## Using PyEarthTools on a Workstation or Laptop
-You can use PyEarthTools successfully on a workstation or laptop with data you download yourself.
+You can use PyEarthTools successfully on a workstation or laptop with data you download yourself.
While many geoscience datasets are so large (e.g. hundreds of terabytes) that they can only be used effectively in HPC environments, there are also many smaller datasets of interest which can be downloaded on a workstation or laptop.
@@ -63,11 +63,11 @@ While many geoscience datasets are so large (e.g. hundreds of terabytes) that th
The [Quick Start](project:./notebooks/Gallery.md#Quick-Start-(Low-Hardware-Requirements)) tutorials can run on a 4GB GPU, and include the download step for fetching around 3-10GB of data. They will also work in HPC environments.
-The [station data](project:./notebooks/Gallery.md#Working-with-Station-Data-(Medium-Hardware-Requirements)) tutorials do not need a GPU, but require more data. They have been tested on a laptop with 36GB of RAM and as well as an HPC node with over 100GB of RAM. 29GB of station data will be downloaded. Additional disk space is needed for reprocessing the data, although intermediate files can later be deleted. These notebooks may require user modification to run with less than 36GB of RAM but it should be possible with at least 16G of RAM.
+The [station data](project:./notebooks/Gallery.md#Working-with-Station-Data-(Medium-Hardware-Requirements)) tutorials do not need a GPU, but require more data. They have been tested on a laptop with 36GB of RAM as well as on an HPC node with over 100GB of RAM. 29GB of station data will be downloaded. Additional disk space is needed for reprocessing the data, although intermediate files can later be deleted. These notebooks may require user modification to run with less than 36GB of RAM, but it should be possible with at least 16GB of RAM.
### Connecting PyEarthTools to a new dataset on a workstation or laptop
-For on-disk data access, you will need to create a new accessor based on the [`pyearthtools.data.ArchiveIndex`](project:./api/data/data_api.md#pyearthtools.data.indexes.ArchiveIndex) class (or [`pyearthtools.data.Index`](project:./api/data/data_api.md#pyearthtools.data.indexes.Index) for some use cases). Additional instructions for this still need to be written. In the meantime, refer to the [NCI site archive source code](https://github.com/ACCESS-Community-Hub/PyEarthTools/tree/develop/packages/nci_site_archive) for examples.
+For on-disk data access, you will need to create a new accessor based on the [`pyearthtools.data.ArchiveIndex`](project:./api/data/data_api.md#pyearthtools.data.indexes.ArchiveIndex) class (or [`pyearthtools.data.Index`](project:./api/data/data_api.md#pyearthtools.data.indexes.Index) for some use cases). Additional instructions for this still need to be written. In the meantime, refer to the [NCI site archive source code](https://github.com/ACCESS-Community-Hub/PyEarthTools/tree/develop/packages/nci_site_archive) for examples.
The [HadISD tutorials](project:./notebooks/Gallery.md#Working-with-Station-Data-(Medium-Hardware-Requirements)) also demonstrate the process of creating a new data accessor. While these tutorials focus on connecting to the HadISD dataset, the patterns in these tutorials are repeatable and can be used for other datasets.
@@ -75,5 +75,5 @@ Additional considerations and rules-of-thumb for working on workstations or lapt
- Using a storage format like Zarr is suitable, because it can efficiently use small files to index the data
- RAM is likely to be more constrained, so consider limiting the number of workers for tools like dask or PyTorch (see the sketch after this list)
-- Some datasets have so-called ARCO (analysis-ready, cloud-optimised) versions available. Downloading an entire dataset in this fashion may be cost-prohibitive and inefficient for model training, but may be suitable for occasional access or to download model initial conditions for a single model run.
+- Some datasets have so-called ARCO (analysis-ready, cloud-optimised) versions available. Downloading an entire dataset in this fashion may be cost-prohibitive and inefficient for model training, but may be suitable for occasional access or to download model initial conditions for a single model run.
- There are also versions of some datasets which have been heavily compressed using lossy compression techniques, but are still close analogues of the original data. These could be used for model training, but there will be caveats as to the accuracy of such models due to the lossy compression of the training data.
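+
+A sketch of capping worker counts on a RAM-constrained machine (the worker counts and memory limit are illustrative):
+
+```python
+# Illustrative values -- pick worker counts to fit the machine's RAM.
+from dask.distributed import Client
+
+# A local dask cluster with a small, bounded footprint.
+client = Client(n_workers=2, threads_per_worker=2, memory_limit="4GB")
+
+# The analogous PyTorch knob is DataLoader(..., num_workers=2).
+```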
diff --git a/docs/index.md b/docs/index.md
index 8cfe0dcd..c7e4f595 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -19,10 +19,10 @@
A data processing flow composed for working with climate data.
-Source Code: [github.com/ACCESS-Community-Hub/PyEarthTools](https://github.com/ACCESS-Community-Hub/PyEarthTools)
-Documentation: [pyearthtools.readthedocs.io](https://pyearthtools.readthedocs.io)
-Tutorial Gallery: [available here](./notebooks/Gallery)
-New Users Guide: [available here](newuser.md)
+Source Code: [github.com/ACCESS-Community-Hub/PyEarthTools](https://github.com/ACCESS-Community-Hub/PyEarthTools)
+Documentation: [pyearthtools.readthedocs.io](https://pyearthtools.readthedocs.io)
+Tutorial Gallery: [available here](./notebooks/Gallery)
+New Users Guide: [available here](newuser.md)
**If you use `PyEarthTools` for your work or a publication, [please cite our work](https://pyearthtools.readthedocs.io/en/latest/#citing-pyearthtools).**
@@ -72,8 +72,8 @@ PyEarthTools is a Python framework containing modules for:
- training ML models and managing experiments;
- performing inference with ML models;
- and evaluating ML models (coming soon).
-
-PyEarthTools runs effectively on HPC (supercomputers), cloud, workstations and laptops.
+
+PyEarthTools runs effectively on HPC (supercomputers), cloud, workstations and laptops.
## Overview of the Packages within PyEarthTools
diff --git a/docs/notebooks/tutorial/Working_with_Climate_Data.ipynb b/docs/notebooks/tutorial/Working_with_Climate_Data.ipynb
index c6c71f7f..0875cf8e 100644
--- a/docs/notebooks/tutorial/Working_with_Climate_Data.ipynb
+++ b/docs/notebooks/tutorial/Working_with_Climate_Data.ipynb
@@ -6285,5 +6285,3 @@
"nbformat": 4,
"nbformat_minor": 5
}
-
-
diff --git a/packages/nci_site_archive/src/site_archive_nci/structure/BARPA.struc b/packages/nci_site_archive/src/site_archive_nci/structure/BARPA.struc
index 5b10dea0..8b61cc39 100644
--- a/packages/nci_site_archive/src/site_archive_nci/structure/BARPA.struc
+++ b/packages/nci_site_archive/src/site_archive_nci/structure/BARPA.struc
@@ -18677,4 +18677,4 @@ output:
- CAPE
- CAPEmax
- CIN
- - CINmax
\ No newline at end of file
+ - CINmax
diff --git a/packages/pipeline/tests/operations/xarray/test_xarray_reshape.py b/packages/pipeline/tests/operations/xarray/test_xarray_reshape.py
index fc2dce49..3c275da9 100644
--- a/packages/pipeline/tests/operations/xarray/test_xarray_reshape.py
+++ b/packages/pipeline/tests/operations/xarray/test_xarray_reshape.py
@@ -120,6 +120,7 @@ def test_CoordinateFlatten_skip_missing():
def test_undo_CoordinateFlatten():
import sys
+
print(f"Recursion limit set to {str(sys.getrecursionlimit())}")
f = reshape.CoordinateFlatten(["height"])
diff --git a/packages/utils/NOTICE.md b/packages/utils/NOTICE.md
index 12c31b0f..5232782a 100644
--- a/packages/utils/NOTICE.md
+++ b/packages/utils/NOTICE.md
@@ -5,4 +5,3 @@ The file config.py in various modules contains code taken from https://github.co
The file packages/utils/src/pyearthtools/utils/initialisation/init_parsing.py contains code from https://github.com/Lightning-AI/pytorch-lightning, released under the Apache 2.0 license, with copyright attributed to the Lightning AI team.
The file packages/utils/src/pyearthtools/utils/parsing/init_parsing.py contains code from https://github.com/Lightning-AI/pytorch-lightning, released under the Apache 2.0 license, with copyright attributed to the Lightning AI team.
-