PR suggestions for from_znap_dataset by rogerkuou · Pull Request #106 · TUDelftGeodesy/sarxarray

rogerkuou · 2026-06-26T13:42:39Z

Mannually hacked the metadata of 20230319 to make the time reading correct
Changed the code structure to build ds_stack along looping the zarr files
Use ds.datetime as time coords
Set RE patterns in config for naming patterns of zarr data variables (motivation: let's be specific on renaming behavior per variable, but not just remove the time and polarization pattern.)
Added a check for multiple mothers (only one zarr storage can have mother specific variables)

Still need to be implemented:

make data variable names flexible (e.g. not enforcing adding h2ph). This can be done by having an input arg for from_snap_dataset

Copilot

Pull request overview

This PR refactors SNAP ZNAP ingestion to better support “SNAP OU”/ZNAP-style archives by introducing explicit datalayer name patterns and a helper for reading/normalizing a single ZNAP archive, then building a stacked xr.Dataset with complex/amplitude/phase outputs.

Changes:

Add RE_PATTERNS_SNAP_DATALAYER plus ZNAP variable-name constants in conf.py.
Refactor from_snap_dataset to use _read_one_znap_archive, build a time stack, and derive complex/amplitude/phase.
Add _read_one_znap_archive helper to drop tiepoint dims and rename ZNAP datalayers into canonical variable names.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
`sarxarray/conf.py`	Adds regex patterns/constants for canonicalizing SNAP ZNAP datalayer names and lists of expected ZNAP variables.
`sarxarray/_io.py`	Refactors SNAP ZNAP reading/stacking logic and adds a helper to normalize each archive before concatenation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

rogerkuou · 2026-06-29T09:19:56Z

Hi @Simon-van-Diepen , I finished my implementations. These changes are only for from_znap_dataset.

As we discussed the flexible data var names still need to be implemented.

Simon-van-Diepen

See comments below, it's mostly related to creating a more generic implementation that the original facilitated but this version does not. It would work for TU Delft-specific processing but likely not for others

Simon-van-Diepen · 2026-07-02T09:42:14Z

+
+# SNAP Znap file data variable names
+ZNAP_DATA_VAR_MOTHER = ["longitude", "latitude", "elevation"]
+ZNAP_DATA_VAR_DAUGHTERS = ["i", "q", "h2ph"]


This implementation does not allow for any other layers to be present. h2ph is afaik a TU Delft-specific layer, and given that sarxarray is not TU Delft-specific I don't think we should limit it to just h2ph

Also, ZNAP_DATA_VAR_DAUGHTERS is not used?

Simon-van-Diepen · 2026-07-02T09:49:39Z

  "product_scene_raster_height" : 84,
-  "time_coverage_start" : "2023-03-31T05:50:35.812250Z",
-  "time_coverage_end" : "2023-03-31T05:50:35.983086Z",
+  "time_coverage_start" : "2023-03-19T05:50:35.812250Z",


I looked into this a little bit. In our GPT graphs we read both the metadata from the mother and the daughter, where the daughter is read in second. Apparently this means that SNAP assigns all metadata based on the mother instead of the daughter. I think the cleanest way would be to read this epoch from the metadata since I don't know if we can assume these attributes to be always present.

I don't know what the best way forward is: either changing our SNAP parser so that the metadata of the daughter is present, or simply adding an extra argument to from_snap_dataset with the corresponding dates. Let's discuss

Simon-van-Diepen · 2026-07-02T09:50:59Z

+
+# Regular expressions for reading SNAP Zarr file (ZNAP) datalayers.
+RE_PATTERNS_SNAP_DATALAYER = {
+    "i": r"^i_(VV|VH|HH|HV)_\d{1,2}[A-Za-z]{3}\d{4}$",  # e.g., i_VV_19Mar2023


I don't know if these regex patterns are always true. Specifically, I am doubting the case of multiple polarisations. Though I don't know what exactly the format would be then (i_VV_HH_19Mar2023 or i_VVVH_19Mar2023)?

Simon-van-Diepen · 2026-07-02T09:52:02Z

+RE_PATTERNS_SNAP_DATALAYER = {
+    "i": r"^i_(VV|VH|HH|HV)_\d{1,2}[A-Za-z]{3}\d{4}$",  # e.g., i_VV_19Mar2023
+    "q": r"^q_(VV|VH|HH|HV)_\d{1,2}[A-Za-z]{3}\d{4}$",  # e.g., q_VV_19Mar2023
+    "h2ph": r"^h2ph_(VV|VH|HH|HV)_\d{1,2}[A-Za-z]{3}\d{4}$",  # e.g., h2ph_VV_19Mar2023


This should be more generic. It is possible that other users may have a similar layer for all daughters which cannot be detected now sicne we only detect h2ph

Simon-van-Diepen · 2026-07-02T09:55:28Z

+    for layer in list(data.data_vars):
+        for key, pattern in RE_PATTERNS_SNAP_DATALAYER.items():
+            if re.match(pattern, layer):
+                data = data.rename({layer: key})


Here we only rename the hardcoded layers i, q, h2ph, latitude, longitude, and elevation. In my original implementation I scanned the current datalayer name for presence of polarisation and a date instead of hardcoding it, and stripped them when found. This means that it is not limited to these six layers, and can be used by anyone in a generic sense. I am in favour of reverting to a more generic implementation to facilitate other users as well

Simon-van-Diepen · 2026-07-02T10:00:10Z

+        data, is_mother = _read_one_znap_archive(file)
+
+        # Get current epoch
+        cur_epoch = data.attrs["time_coverage_start"]


We need to discuss this. I agree my original implementation of retrieving this from the data layer names is not ideal, but currently this metadata field always corresponds to the mother, so I do not think we can use this. There are three questions here:

Can we assume this field is always present?

If so, can we assume that this field always refers to the daughter image?

If both are yes, is this just a TU Delft-specific config related to our processing infrastructure, or is this generically true?

We could also consider the user to define the epochs of each znap archive as a second variable or a dictionary-like structure. Let's discuss

Discussion with @rogerkuou : we cannot assume this field to be correct by default since our best attempt at a sarxarray workflow apparently produces incorrect metadata in this field. Let's therefore use the second best option: in the metadata there is a field called:

"elements" : [ { "name" : "sources", "attributes" : [ { "name" : "sourceProduct", "readOnly" : false, "data" : { "type" : "ascii", "elems" : "product:S1A_IW_SLC__1SDV_20230506T055033_20230506T055100_048409_05D2A0_F5DE" } } ]

The timestamp in the product file is always correct since this is configured by ESA. Let's read it from there

Simon-van-Diepen · 2026-07-02T10:02:07Z

-                (metadata["number_of_lines"], metadata["number_of_pixels"], 1)
+            data_mother = data[ZNAP_DATA_VAR_MOTHER]
+            data = data.drop_vars(ZNAP_DATA_VAR_MOTHER)
+            data = data.assign(


This should be generic, not just h2ph. I propose to move this outside the loop when we can be sure that data is fully formed, and we can simply add all the missing fields present in data as zero layers

rogerkuou added 7 commits June 26, 2026 15:20

update epoch format

badd8af

update the behavior when inferring data layer names from znap

a4cf66b

format conf file

daa1a7b

change logic of handling other dimensions than x, and y

4da10dd

logic for identifying mother epoch

53b1eb7

change workflow

62b5fbb

handle stack without mother

af78740

rogerkuou requested a review from Copilot June 26, 2026 17:29

Copilot started reviewing on behalf of rogerkuou June 26, 2026 17:29 View session

Copilot AI reviewed Jun 26, 2026

View reviewed changes

Comment thread sarxarray/_io.py

Comment thread sarxarray/_io.py

Comment thread sarxarray/_io.py Outdated

Comment thread sarxarray/_io.py

rogerkuou and others added 8 commits June 29, 2026 10:35

shortening chain calls

b62dcb6

hack zarr metadata

04a3baa

record epoch and files in a dict for sorting

873d7c5

add docs

01d09f9

output annotation

c4ab050

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

07e278d

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

remove redundant re pattern match check

b851c51

sort built ds_stack by time

837a5af

rogerkuou requested a review from Copilot June 29, 2026 09:07

Copilot started reviewing on behalf of rogerkuou June 29, 2026 09:08 View session

Copilot AI reviewed Jun 29, 2026

View reviewed changes

Comment thread sarxarray/_io.py

check if multiple mothers

121252c

rogerkuou marked this pull request as ready for review June 29, 2026 09:17

rogerkuou requested a review from Simon-van-Diepen June 29, 2026 09:20

rogerkuou changed the title ~~84 support snap ou~~ PR suggestions for from_znap_dataset Jun 29, 2026

rogerkuou mentioned this pull request Jun 29, 2026

84 support snap #105

Open

Simon-van-Diepen requested changes Jul 2, 2026

View reviewed changes

Uh oh!

Conversation

rogerkuou commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

rogerkuou commented Jun 29, 2026

Uh oh!

Simon-van-Diepen left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rogerkuou commented Jun 26, 2026 •

edited

Loading