Skip to content

Copy receiver_data in autoimpute to avoid mutating caller's frame#185

Merged
MaxGhenis merged 1 commit intomainfrom
fix/autoimpute-copy
Apr 17, 2026
Merged

Copy receiver_data in autoimpute to avoid mutating caller's frame#185
MaxGhenis merged 1 commit intomainfrom
fix/autoimpute-copy

Conversation

@MaxGhenis
Copy link
Copy Markdown
Contributor

Summary

Fixes finding #13. autoimpute ended with receiver_data[var] = median_imputations[var], which could write through to the caller's original DataFrame depending on whether intermediate pandas operations returned a copy or a view. The caller's input frame could silently gain new columns without notification.

Take a defensive .copy() at the top of autoimpute so the caller's frame is always preserved; imputed columns are returned exclusively via result.receiver_data.

Test plan

  • test_autoimpute_does_not_mutate_caller_receiver: caller's receiver_data stays pristine, returned result.receiver_data has the imputed column.

autoimpute ended with
    receiver_data[var] = median_imputations[var]
which could write through to the caller's original DataFrame depending
on whether intermediate pandas operations returned a copy or a view.
This is a silent side effect: the user's input frame would gain new
columns without any notification.

Take a defensive .copy() of receiver_data at the top of autoimpute so
the caller's frame is always preserved. The imputed columns are
returned exclusively via result.receiver_data.

Tests
- test_autoimpute_does_not_mutate_caller_receiver asserts that
  receiver_data passed in remains pristine and the returned frame
  carries the imputed column.
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
microimpute-dashboard Ready Ready Preview, Comment Apr 17, 2026 0:57am

Copy link
Copy Markdown
Contributor Author

@MaxGhenis MaxGhenis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

autoimpute defensive copy verified:

  • receiver_data = receiver_data.copy() at function entry before any preprocessing.
  • Test asserts caller's snapshot is untouched (pd.testing.assert_frame_equal(receiver, snapshot), 'y' not in receiver.columns) while the returned result.receiver_data carries the imputed column.
  • Performance: single O(N) copy at entry is negligible vs the multiple intermediate frames autoimpute already allocates per model × quantile during imputation.

CI all green. Mergeable. LGTM.

@MaxGhenis MaxGhenis merged commit c66ea45 into main Apr 17, 2026
7 checks passed
@MaxGhenis MaxGhenis deleted the fix/autoimpute-copy branch April 17, 2026 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant