fix: prevent HDF5 dataset collision when data_path is reused (#744) #746

Merged
waltsims merged 3 commits into waltsims:master from aconesac:fix/data-path-hdf5-collision
May 19, 2026

Conversation

@aconesac
Contributor

@aconesac aconesac commented May 18, 2026

Summary

  • options_to_kwargs() was always forwarding data_path='/tmp' (the SimulationOptions default), bypassing the safe mkdtemp path in prepare() and causing h5py to crash with ValueError: name already exists on any second run
  • compat.py: skip forwarding data_path when it equals gettempdir(), so prepare() falls back to a fresh unique temp dir per run
  • cpp_simulation.py: raise FileExistsError with a clear message when an explicit data_path already contains kwave_input.h5
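
The compat.py change in the summary above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `should_forward_data_path` and `_normalize` are hypothetical names, and normalizing with `realpath` plus `normpath` is an assumption about how the comparison is made.

```python
import os
import tempfile


def _normalize(path):
    # Hypothetical helper: resolve symlinks and collapse redundant separators
    # so that equivalent spellings of one directory compare equal.
    return os.path.normpath(os.path.realpath(path))


def should_forward_data_path(data_path):
    """Hypothetical helper: True only when data_path looks user-chosen.

    A value of None, or one that resolves to the system temp dir, is treated
    as "not set", so the caller can fall back to a fresh per-run directory.
    """
    if data_path is None:
        return False
    return _normalize(data_path) != _normalize(tempfile.gettempdir())
```

With a helper like this, a caller would add `data_path` to the forwarded keyword arguments only when it returns True. One consequence of this design is that a user who deliberately passes the system temp dir is treated the same as one who left the default untouched.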

Test plan

  • test_default_data_path_not_forwarded — asserts data_path is not forwarded when using default SimulationOptions() (would have failed before fix)
  • test_custom_data_path_is_forwarded — asserts an explicitly-set data_path is still forwarded correctly
  • test_raises_when_input_already_exists — asserts prepare() raises FileExistsError with a clear message when the input file already exists (would have failed before fix)

Fixes #744

Greptile Summary

This PR fixes a ValueError: name already exists crash from h5py that occurred on any second simulation run, caused by options_to_kwargs() always forwarding the default data_path=gettempdir() and bypassing the mkdtemp-based unique-directory logic in prepare().

  • kwave/compat.py: Uses normalized os.path.realpath comparison to skip forwarding data_path when it holds the system temp dir default, allowing prepare() to fall back to a fresh mkdtemp directory per run.
  • kwave/solvers/cpp_simulation.py: Adds an early FileExistsError guard in prepare() so that an explicit data_path reuse produces a clear, actionable error instead of a cryptic h5py exception.
  • Tests: Three regression tests are added covering: default path not forwarded, custom path forwarded, and FileExistsError on collision.

Confidence Score: 5/5

Safe to merge — the fix correctly targets the default-path collision scenario and does not regress explicit data_path forwarding.

Both changed code paths are straightforward: the compat guard uses realpath+normpath normalization which handles platform symlink differences (e.g. macOS /tmp → /private/tmp), and the FileExistsError guard in prepare() only fires for explicit data_path values (mkdtemp always produces a new empty directory). All three new test cases directly cover the fixed scenarios. No new defects were found beyond style observations.
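
To illustrate why the symlink point matters, here is a small sketch that recreates a /tmp → /private/tmp style alias inside a scratch directory. The directory names and the `same_dir` helper are made up for the example, and `os.symlink` may need extra privileges on Windows.

```python
import os
import tempfile


def same_dir(a, b):
    # Compare two paths after resolving symlinks and normalizing separators.
    return os.path.normpath(os.path.realpath(a)) == os.path.normpath(os.path.realpath(b))


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "private_tmp")  # stands in for /private/tmp
    alias = os.path.join(root, "tmp")           # stands in for /tmp
    os.mkdir(target)
    os.symlink(target, alias)  # alias now points at target

    naive_equal = alias == target             # plain string comparison
    resolved_equal = same_dir(alias, target)  # comparison after realpath
```

Here the plain string comparison treats the two spellings as different directories, while the resolved comparison treats them as the same one. That is the case a default-path check has to get right.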

kwave/compat.py — the import placement is unusual but purely cosmetic. No files require attention for correctness.

Important Files Changed

| Filename | Overview |
|---|---|
| kwave/compat.py | Adds normalized path comparison to skip forwarding data_path when it equals the system temp dir; imports os and gettempdir inside the conditional block rather than at module level |
| kwave/solvers/cpp_simulation.py | Adds FileExistsError guard in prepare() before _write_hdf5(); also reformats a long _execute() call to multi-line for readability |
| tests/test_compat.py | Adds two regression tests: verifies default data_path is not forwarded, and explicitly-set data_path is forwarded |
| tests/test_cpp_simulation.py | Adds TestPrepare class with a test verifying FileExistsError is raised when kwave_input.h5 already exists |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["options_to_kwargs(simulation_options)"] --> B{opts.data_path is not None?}
    B -- No --> E[skip]
    B -- Yes --> C{normalized data_path == normalized gettempdir?}
    C -- Yes default temp dir --> D[skip forwarding data_path]
    C -- No user-set custom path --> F["kwargs['data_path'] = opts.data_path"]

    G["CppSimulation.run(data_path=...)"] --> H["prepare(data_path)"]
    H --> I{data_path is None?}
    I -- Yes --> J["mkdtemp() fresh unique dir"]
    I -- No --> K["makedirs(data_path, exist_ok=True)"]
    J --> L{kwave_input.h5 exists?}
    K --> L
    L -- Yes --> M["raise FileExistsError"]
    L -- No --> N["_write_hdf5(input_file)"]
    N --> O["return input_file, output_file"]
    O --> P["_execute(...)"]
    P --> Q{cleanup?}
    Q -- True data_path was None --> R["shutil.rmtree(data_dir)"]
    Q -- False explicit data_path --> S[files remain on disk]
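
The prepare() branch of the flowchart can be approximated by the sketch below. It is a simplified stand-in, not the CppSimulation method itself: the function signature, the `kwave_` prefix, the output file name, and the returned `cleanup` flag are assumptions, and an empty placeholder file takes the place of the real `_write_hdf5()` call.

```python
import os
import tempfile

INPUT_NAME = "kwave_input.h5"    # file name taken from the PR description
OUTPUT_NAME = "kwave_output.h5"  # assumed name, for illustration only


def prepare(data_path=None):
    """Sketch of the flow: choose a directory, guard, then write the input."""
    cleanup = data_path is None
    if cleanup:
        # No explicit path: mkdtemp always yields a new, empty directory.
        data_dir = tempfile.mkdtemp(prefix="kwave_")
    else:
        # Explicit path: reuse the directory, creating it if needed.
        data_dir = data_path
        os.makedirs(data_dir, exist_ok=True)

    input_file = os.path.join(data_dir, INPUT_NAME)
    if os.path.exists(input_file):
        # Fail early with a clear message instead of a later HDF5 error.
        raise FileExistsError(
            f"{input_file} already exists; remove it or choose another data_path"
        )

    # Placeholder for writing the real HDF5 input file.
    with open(input_file, "wb"):
        pass

    return input_file, os.path.join(data_dir, OUTPUT_NAME), cleanup
```

Calling `prepare()` twice without arguments gives two separate directories, while calling `prepare(some_dir)` twice raises FileExistsError on the second call. In this sketch the caller is responsible for removing the directory when `cleanup` is True.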

Reviews (2): Last reviewed commit: "Potential fix for pull request finding"

aconesac added 2 commits May 18, 2026 19:27
Demonstrates the two bugs before the fix:
- options_to_kwargs() always forwarded data_path='/tmp', causing every run to
  target /tmp/kwave_input.h5 and crash on the second call
- prepare() gave a cryptic h5py ValueError instead of a clear FileExistsError
  when the input file already existed

options_to_kwargs() was always forwarding data_path='/tmp' (the
SimulationOptions default), bypassing the safe mkdtemp path in
prepare() and causing h5py to crash with 'name already exists' on
any second run.

- compat.py: skip forwarding data_path when it equals gettempdir()
- cpp_simulation.py: raise FileExistsError with a clear message when
  an explicit data_path already contains kwave_input.h5

Fixes waltsims#744
Copilot AI review requested due to automatic review settings May 18, 2026 17:32
Contributor

Copilot AI left a comment

Pull request overview

Fixes repeated-run failures in the C++ backend caused by reusing a non-unique default data_path, and improves the resulting error when an explicit path already contains a previous input file.

Changes:

  • Update options_to_kwargs() to avoid forwarding the SimulationOptions default temp directory, allowing CppSimulation.prepare() to create a unique per-run temp directory.
  • Add an explicit FileExistsError in CppSimulation.prepare() when kwave_input.h5 already exists in a user-specified data_path.
  • Add unit tests covering default-path forwarding behavior and the new collision error.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| kwave/compat.py | Stops forwarding data_path when it equals the OS temp dir default to avoid HDF5 input collisions. |
| kwave/solvers/cpp_simulation.py | Raises a clear FileExistsError if kwave_input.h5 already exists in the target directory; minor formatting cleanup. |
| tests/test_compat.py | Adds tests ensuring default data_path is not forwarded and custom paths still are. |
| tests/test_cpp_simulation.py | Adds a test asserting prepare() raises when kwave_input.h5 already exists. |

Comment thread kwave/compat.py Outdated
Comment thread kwave/compat.py
Comment thread kwave/solvers/cpp_simulation.py
 normalize both paths before comparing

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@codecov

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.57%. Comparing base (4177811) to head (edc928a).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #746      +/-   ##
==========================================
+ Coverage   75.54%   75.57%   +0.02%     
==========================================
  Files          57       57              
  Lines        8188     8195       +7     
  Branches     1598     1600       +2     
==========================================
+ Hits         6186     6193       +7     
  Misses       1381     1381              
  Partials      621      621              
| Flag | Coverage Δ |
|---|---|
| 3.10 | 75.53% <100.00%> (+0.02%) ⬆️ |
| 3.11 | 75.53% <100.00%> (+0.02%) ⬆️ |
| 3.12 | 75.53% <100.00%> (+0.02%) ⬆️ |
| 3.13 | 75.53% <100.00%> (+0.02%) ⬆️ |
| macos-latest | 75.47% <100.00%> (+0.02%) ⬆️ |
| ubuntu-latest | 75.47% <100.00%> (+0.02%) ⬆️ |
| windows-latest | 75.31% <100.00%> (+0.02%) ⬆️ |

@waltsims waltsims merged commit d500015 into waltsims:master May 19, 2026
18 checks passed
@waltsims
Owner

Thanks for the fix!

Development

Successfully merging this pull request may close these issues.

[BUG] Second run crashes with 'name already exists' when using options_to_kwargs() with cpp backend

3 participants