Vectorize QuantReg random-quantile and remove dead RNG code #183
Merged
Two issues in microimpute/models/quantreg.py:
1. Dead "random quantile" RNG in QuantReg._fit:
    random_generator = np.random.default_rng(self.seed)
    q = 0.5
    self.logger.info(f"Fitting quantile regression for random quantile {q:.4f}")
The generator was never used and q was hardcoded to 0.5. Removed
the dead RNG and clarified the log message.
2. Quadratic per-row .loc assignment in QuantRegResults._predict:
    result_df = pd.DataFrame(index=..., columns=...)  # object dtype
    for idx in result_df.index:
        sampled_q = rng.choice(quantiles)
        for variable in self.imputed_variables:
            result_df.loc[idx, variable] = random_q_imputations[sampled_q].loc[idx, variable]
This allocated an object-dtype frame and wrote per-row via .loc,
silently demoting numeric predictions to object dtype and running
in O(rows * vars) Python-level lookups — a direct contributor to
OOM pressure in issue #96.
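The dtype demotion described above can be reproduced in isolation. The following is a minimal sketch, not the microimpute code: `preds` and the column name `income` are hypothetical stand-ins for one quantile's predictions.

```python
import pandas as pd

# Hypothetical stand-in for one quantile's predictions (not microimpute data).
preds = pd.DataFrame({"income": [1.5, 2.5, 3.5]})

# A frame built from index/columns alone starts out with object dtype.
result_df = pd.DataFrame(index=preds.index, columns=preds.columns)
empty_dtype = result_df["income"].dtype

# Writing floats cell by cell through .loc leaves the column as object.
for idx in result_df.index:
    result_df.loc[idx, "income"] = preds.loc[idx, "income"]

filled_dtype = result_df["income"].dtype
is_numeric = pd.api.types.is_numeric_dtype(result_df["income"])
```

Here `is_numeric` ends up false even though every cell holds a float, which is the condition the new regression test guards against.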
Replaced with a single vectorised pass:
    stacked = np.column_stack([random_q_imputations[q][var].values for q in quantiles])
    row_idx = np.arange(n_rows)
    result_df[var] = stacked[row_idx, sampled_idx]
This preserves a numeric dtype and avoids per-row attribute lookups.
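For readers who want to run the vectorised selection end to end, here is a self-contained sketch under stated assumptions. The toy `random_q_imputations` dict, the variable names `income` and `wealth`, and the seed are all illustrative; only the `np.column_stack` plus `stacked[row_idx, sampled_idx]` pattern comes from this PR.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # illustrative seed
quantiles = [0.1, 0.5, 0.9]
index = pd.RangeIndex(5)
imputed_variables = ["income", "wealth"]  # hypothetical variable names

# Toy per-quantile predictions: each value encodes the quantile it came from.
random_q_imputations = {
    q: pd.DataFrame(
        {"income": np.full(len(index), q), "wealth": np.full(len(index), 10 * q)},
        index=index,
    )
    for q in quantiles
}

n_rows = len(index)
# One sampled quantile position per row, shared across all variables.
sampled_idx = rng.integers(0, len(quantiles), n_rows)
row_idx = np.arange(n_rows)

result_df = pd.DataFrame(index=index)
for var in imputed_variables:
    # Shape (n_rows, n_quantiles): column j holds the predictions for quantiles[j].
    stacked = np.column_stack(
        [random_q_imputations[q][var].values for q in quantiles]
    )
    # Fancy indexing picks, for each row, the column of its sampled quantile.
    result_df[var] = stacked[row_idx, sampled_idx]
```

Drawing `sampled_idx` once, outside the variable loop, keeps each row on the same quantile for every imputed variable, which matches the old loop's behaviour of sampling `sampled_q` once per row.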
Tests
- test_quantreg_random_quantile_sample_returns_numeric_dtype asserts
the output dtype is numeric (regression for object-dtype demotion).
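The kind of assertion this test makes can be sketched without touching the microimpute API. The helper name `check_numeric_and_finite` and the two sample frames are hypothetical; the actual test body is not shown in this PR.

```python
import numpy as np
import pandas as pd


def check_numeric_and_finite(frame: pd.DataFrame) -> None:
    """Raise AssertionError if a column is non-numeric or has non-finite values."""
    for col in frame.columns:
        assert pd.api.types.is_numeric_dtype(frame[col]), (
            f"{col} has dtype {frame[col].dtype}"
        )
        assert np.isfinite(frame[col].to_numpy()).all(), (
            f"{col} contains non-finite values"
        )


# Hypothetical outputs: one well-formed, one demoted to object dtype.
good = pd.DataFrame({"income": [1.0, 2.0]})
bad = pd.DataFrame({"income": [1.0, 2.0]}, dtype=object)

check_numeric_and_finite(good)  # passes silently
```

Calling the helper on `bad` raises `AssertionError`, mirroring how a regression to the object-dtype frame would fail the test.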
MaxGhenis
commented
Apr 17, 2026
MaxGhenis
left a comment
QuantReg cleanup verified:
- Dead code removed: random_generator = np.random.default_rng(self.seed) with hardcoded q = 0.5 and misleading 'random quantile' log message gone. Log now accurately reads Fitting quantile regression for q={q}.
- Per-row .loc[idx, variable] loop (quadratic, object-dtype demotion) replaced with vectorised np.column_stack + per-row index:
    sampled_idx = rng.integers(0, len(quantiles_arr), n_rows)
    stacked[row_idx, sampled_idx]
- Output is numeric dtype (asserted by pd.api.types.is_numeric_dtype).
- test_quantreg_random_quantile_sample_returns_numeric_dtype covers the dtype and finiteness.
- Statistical equivalence to old loop holds (different RNG realisations but same sampling distribution); no bit-equivalence test would be meaningful since the old output was object-dtype demoted.
CI all green. Mergeable. LGTM.
Summary
Fixes finding #11 in microimpute/models/quantreg.py:
- Dead RNG in QuantReg._fit. The fallback branch created random_generator = np.random.default_rng(self.seed), logged "Fitting quantile regression for random quantile", and then hardcoded q = 0.5 without ever using the RNG. Removed and updated the log message.
- Per-row .loc assignment in QuantRegResults._predict. The previous implementation allocated an object-dtype frame and wrote numeric predictions into it via result_df.loc[idx, variable] = ... inside a Python loop, running in O(rows * vars) and silently demoting numeric predictions to object. This was flagged as a direct contributor to issue #96 (OOM, "Sequential imputation runs out of memory with many variables"). Replaced with a single vectorised pass using np.column_stack + per-row index selection, preserving numeric dtype.
Test plan
- test_quantreg_random_quantile_sample_returns_numeric_dtype asserts output is a numeric dtype (regression for object-dtype demotion)
- tests/test_models/test_quantreg.py passes (10/10)