Skip to content

PR suggestions for from_znap_dataset#106

Open
rogerkuou wants to merge 16 commits into
84-support-snapfrom
84-support-snap-ou
Open

PR suggestions for from_znap_dataset#106
rogerkuou wants to merge 16 commits into
84-support-snapfrom
84-support-snap-ou

Conversation

@rogerkuou

@rogerkuou rogerkuou commented Jun 26, 2026

Copy link
Copy Markdown
Member
  • Mannually hacked the metadata of 20230319 to make the time reading correct
  • Changed the code structure to build ds_stack along looping the zarr files
  • Use ds.datetime as time coords
  • Set RE patterns in config for naming patterns of zarr data variables (motivation: let's be specific on renaming behavior per variable, but not just remove the time and polarization pattern.)
  • Added a check for multiple mothers (only one zarr storage can have mother specific variables)

Still need to be implemented:

  • make data variable names flexible (e.g. not enforcing adding h2ph). This can be done by having an input arg for from_snap_dataset

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors SNAP ZNAP ingestion to better support “SNAP OU”/ZNAP-style archives by introducing explicit datalayer name patterns and a helper for reading/normalizing a single ZNAP archive, then building a stacked xr.Dataset with complex/amplitude/phase outputs.

Changes:

  • Add RE_PATTERNS_SNAP_DATALAYER plus ZNAP variable-name constants in conf.py.
  • Refactor from_snap_dataset to use _read_one_znap_archive, build a time stack, and derive complex/amplitude/phase.
  • Add _read_one_znap_archive helper to drop tiepoint dims and rename ZNAP datalayers into canonical variable names.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
sarxarray/conf.py Adds regex patterns/constants for canonicalizing SNAP ZNAP datalayer names and lists of expected ZNAP variables.
sarxarray/_io.py Refactors SNAP ZNAP reading/stacking logic and adds a helper to normalize each archive before concatenation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread sarxarray/_io.py
Comment thread sarxarray/_io.py
Comment thread sarxarray/_io.py Outdated
Comment thread sarxarray/_io.py
rogerkuou and others added 8 commits June 29, 2026 10:35
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread sarxarray/_io.py
@rogerkuou rogerkuou marked this pull request as ready for review June 29, 2026 09:17
@rogerkuou

Copy link
Copy Markdown
Member Author

Hi @Simon-van-Diepen , I finished my implementations. These changes are only for from_znap_dataset.

As we discussed the flexible data var names still need to be implemented.

@rogerkuou rogerkuou changed the title 84 support snap ou PR suggestions for from_znap_dataset Jun 29, 2026
@rogerkuou rogerkuou mentioned this pull request Jun 29, 2026

@Simon-van-Diepen Simon-van-Diepen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments below, it's mostly related to creating a more generic implementation that the original facilitated but this version does not. It would work for TU Delft-specific processing but likely not for others

Comment thread sarxarray/conf.py

# SNAP Znap file data variable names
ZNAP_DATA_VAR_MOTHER = ["longitude", "latitude", "elevation"]
ZNAP_DATA_VAR_DAUGHTERS = ["i", "q", "h2ph"]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation does not allow for any other layers to be present. h2ph is afaik a TU Delft-specific layer, and given that sarxarray is not TU Delft-specific I don't think we should limit it to just h2ph

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, ZNAP_DATA_VAR_DAUGHTERS is not used?

"product_scene_raster_height" : 84,
"time_coverage_start" : "2023-03-31T05:50:35.812250Z",
"time_coverage_end" : "2023-03-31T05:50:35.983086Z",
"time_coverage_start" : "2023-03-19T05:50:35.812250Z",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this a little bit. In our GPT graphs we read both the metadata from the mother and the daughter, where the daughter is read in second. Apparently this means that SNAP assigns all metadata based on the mother instead of the daughter. I think the cleanest way would be to read this epoch from the metadata since I don't know if we can assume these attributes to be always present.

I don't know what the best way forward is: either changing our SNAP parser so that the metadata of the daughter is present, or simply adding an extra argument to from_snap_dataset with the corresponding dates. Let's discuss

Comment thread sarxarray/conf.py

# Regular expressions for reading SNAP Zarr file (ZNAP) datalayers.
RE_PATTERNS_SNAP_DATALAYER = {
"i": r"^i_(VV|VH|HH|HV)_\d{1,2}[A-Za-z]{3}\d{4}$", # e.g., i_VV_19Mar2023

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if these regex patterns are always true. Specifically, I am doubting the case of multiple polarisations. Though I don't know what exactly the format would be then (i_VV_HH_19Mar2023 or i_VVVH_19Mar2023)?

Comment thread sarxarray/conf.py
RE_PATTERNS_SNAP_DATALAYER = {
"i": r"^i_(VV|VH|HH|HV)_\d{1,2}[A-Za-z]{3}\d{4}$", # e.g., i_VV_19Mar2023
"q": r"^q_(VV|VH|HH|HV)_\d{1,2}[A-Za-z]{3}\d{4}$", # e.g., q_VV_19Mar2023
"h2ph": r"^h2ph_(VV|VH|HH|HV)_\d{1,2}[A-Za-z]{3}\d{4}$", # e.g., h2ph_VV_19Mar2023

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be more generic. It is possible that other users may have a similar layer for all daughters which cannot be detected now sicne we only detect h2ph

Comment thread sarxarray/_io.py
for layer in list(data.data_vars):
for key, pattern in RE_PATTERNS_SNAP_DATALAYER.items():
if re.match(pattern, layer):
data = data.rename({layer: key})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we only rename the hardcoded layers i, q, h2ph, latitude, longitude, and elevation. In my original implementation I scanned the current datalayer name for presence of polarisation and a date instead of hardcoding it, and stripped them when found. This means that it is not limited to these six layers, and can be used by anyone in a generic sense. I am in favour of reverting to a more generic implementation to facilitate other users as well

Comment thread sarxarray/_io.py
data, is_mother = _read_one_znap_archive(file)

# Get current epoch
cur_epoch = data.attrs["time_coverage_start"]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to discuss this. I agree my original implementation of retrieving this from the data layer names is not ideal, but currently this metadata field always corresponds to the mother, so I do not think we can use this. There are three questions here:

  • Can we assume this field is always present?
  • If so, can we assume that this field always refers to the daughter image?
  • If both are yes, is this just a TU Delft-specific config related to our processing infrastructure, or is this generically true?

We could also consider the user to define the epochs of each znap archive as a second variable or a dictionary-like structure. Let's discuss

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion with @rogerkuou : we cannot assume this field to be correct by default since our best attempt at a sarxarray workflow apparently produces incorrect metadata in this field. Let's therefore use the second best option: in the metadata there is a field called:

    "elements" : [ {
      "name" : "sources",
      "attributes" : [ {
        "name" : "sourceProduct",
        "readOnly" : false,
        "data" : {
          "type" : "ascii",
          "elems" : "product:S1A_IW_SLC__1SDV_20230506T055033_20230506T055100_048409_05D2A0_F5DE"
        }
      } ]

The timestamp in the product file is always correct since this is configured by ESA. Let's read it from there

Comment thread sarxarray/_io.py
(metadata["number_of_lines"], metadata["number_of_pixels"], 1)
data_mother = data[ZNAP_DATA_VAR_MOTHER]
data = data.drop_vars(ZNAP_DATA_VAR_MOTHER)
data = data.assign(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be generic, not just h2ph. I propose to move this outside the loop when we can be sure that data is fully formed, and we can simply add all the missing fields present in data as zero layers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants