Vectorize QuantReg random-quantile and remove dead RNG code by MaxGhenis · Pull Request #183 · PolicyEngine/microimpute

MaxGhenis · 2026-04-17T12:52:48Z

Summary

Fixes finding #11 in microimpute/models/quantreg.py:

Dead RNG in QuantReg._fit. The fallback branch created random_generator = np.random.default_rng(self.seed), logged "Fitting quantile regression for random quantile", and then hardcoded q = 0.5 without ever using the RNG. Removed and updated the log message.
Quadratic per-row .loc assignment in QuantRegResults._predict. The previous implementation allocated an object-dtype frame and wrote numeric predictions into it via result_df.loc[idx, variable] = ... inside a Python loop, running in O(rows * vars) and silently demoting numeric predictions to object. This was flagged as a direct contributor to issue Sequential imputation runs out of memory with many variables #96 (OOM). Replaced with a single vectorised pass using np.column_stack + per-row index selection, preserving numeric dtype.

Test plan

test_quantreg_random_quantile_sample_returns_numeric_dtype asserts output is a numeric dtype (regression for object-dtype demotion)
Full tests/test_models/test_quantreg.py passes (10/10)

Two issues in microimpute/models/quantreg.py: 1. Dead "random quantile" RNG in QuantReg._fit: random_generator = np.random.default_rng(self.seed) q = 0.5 self.logger.info(f"Fitting quantile regression for random quantile {q:.4f}") The generator was never used and q was hardcoded to 0.5. Removed the dead RNG and clarified the log message. 2. Quadratic per-row .loc assignment in QuantRegResults._predict: result_df = pd.DataFrame(index=..., columns=...) # object dtype for idx in result_df.index: sampled_q = rng.choice(quantiles) for variable in self.imputed_variables: result_df.loc[idx, variable] = random_q_imputations[sampled_q].loc[idx, variable] This allocated an object-dtype frame and wrote per-row via .loc, silently demoting numeric predictions to object dtype and running in O(rows * vars) Python-level lookups — a direct contributor to OOM pressure in issue #96. Replaced with a single vectorised pass: stacked = np.column_stack([random_q_imputations[q][var].values for q in quantiles]) row_idx = np.arange(n_rows) result_df[var] = stacked[row_idx, sampled_idx] This preserves a numeric dtype and avoids per-row attribute lookups. Tests - test_quantreg_random_quantile_sample_returns_numeric_dtype asserts the output dtype is numeric (regression for object-dtype demotion).

vercel · 2026-04-17T12:52:49Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
microimpute-dashboard	Ready	Preview, Comment	Apr 17, 2026 0:53am

MaxGhenis

QuantReg cleanup verified:

Dead code removed: random_generator = np.random.default_rng(self.seed) with hardcoded q = 0.5 and misleading 'random quantile' log message gone. Log now accurately reads Fitting quantile regression for q={q}.
Per-row .loc[idx, variable] loop (quadratic, object-dtype demotion) replaced with vectorised np.column_stack + per-row index:
sampled_idx = rng.integers(0, len(quantiles_arr), n_rows)
stacked[row_idx, sampled_idx]
Output is numeric dtype (asserted by pd.api.types.is_numeric_dtype).
test_quantreg_random_quantile_sample_returns_numeric_dtype covers the dtype and finiteness. Statistical equivalence to old loop holds (different RNG realisations but same sampling distribution); no bit-equivalence test would be meaningful since the old output was object-dtype demoted.

CI all green. Mergeable. LGTM.

vercel bot deployed to Preview April 17, 2026 12:53 View deployment

MaxGhenis commented Apr 17, 2026

View reviewed changes

MaxGhenis merged commit 7f48f40 into main Apr 17, 2026
7 checks passed

MaxGhenis deleted the fix/quantreg-dead-code-and-vectorize branch April 17, 2026 16:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vectorize QuantReg random-quantile and remove dead RNG code#183

Vectorize QuantReg random-quantile and remove dead RNG code#183
MaxGhenis merged 1 commit intomainfrom
fix/quantreg-dead-code-and-vectorize

MaxGhenis commented Apr 17, 2026

Uh oh!

vercel bot commented Apr 17, 2026 •

edited

Loading

Uh oh!

MaxGhenis left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MaxGhenis commented Apr 17, 2026

Summary

Test plan

Uh oh!

vercel bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MaxGhenis left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel bot commented Apr 17, 2026 •

edited

Loading