Skip to content

Vectorize QuantReg random-quantile and remove dead RNG code#183

Merged
MaxGhenis merged 1 commit intomainfrom
fix/quantreg-dead-code-and-vectorize
Apr 17, 2026
Merged

Vectorize QuantReg random-quantile and remove dead RNG code#183
MaxGhenis merged 1 commit intomainfrom
fix/quantreg-dead-code-and-vectorize

Conversation

@MaxGhenis
Copy link
Copy Markdown
Contributor

Summary

Fixes finding #11 in microimpute/models/quantreg.py:

  • Dead RNG in QuantReg._fit. The fallback branch created random_generator = np.random.default_rng(self.seed), logged "Fitting quantile regression for random quantile", and then hardcoded q = 0.5 without ever using the RNG. Removed and updated the log message.
  • Quadratic per-row .loc assignment in QuantRegResults._predict. The previous implementation allocated an object-dtype frame and wrote numeric predictions into it via result_df.loc[idx, variable] = ... inside a Python loop, running in O(rows * vars) and silently demoting numeric predictions to object. This was flagged as a direct contributor to issue Sequential imputation runs out of memory with many variables #96 (OOM). Replaced with a single vectorised pass using np.column_stack + per-row index selection, preserving numeric dtype.

Test plan

  • test_quantreg_random_quantile_sample_returns_numeric_dtype asserts output is a numeric dtype (regression for object-dtype demotion)
  • Full tests/test_models/test_quantreg.py passes (10/10)

Two issues in microimpute/models/quantreg.py:

1. Dead "random quantile" RNG in QuantReg._fit:
       random_generator = np.random.default_rng(self.seed)
       q = 0.5
       self.logger.info(f"Fitting quantile regression for random quantile {q:.4f}")
   The generator was never used and q was hardcoded to 0.5. Removed
   the dead RNG and clarified the log message.

2. Quadratic per-row .loc assignment in QuantRegResults._predict:
       result_df = pd.DataFrame(index=..., columns=...)   # object dtype
       for idx in result_df.index:
           sampled_q = rng.choice(quantiles)
           for variable in self.imputed_variables:
               result_df.loc[idx, variable] = random_q_imputations[sampled_q].loc[idx, variable]
   This allocated an object-dtype frame and wrote per-row via .loc,
   silently demoting numeric predictions to object dtype and running
   in O(rows * vars) Python-level lookups — a direct contributor to
   OOM pressure in issue #96.

   Replaced with a single vectorised pass:
       stacked = np.column_stack([random_q_imputations[q][var].values for q in quantiles])
       row_idx = np.arange(n_rows)
       result_df[var] = stacked[row_idx, sampled_idx]
   This preserves a numeric dtype and avoids per-row attribute lookups.

Tests
- test_quantreg_random_quantile_sample_returns_numeric_dtype asserts
  the output dtype is numeric (regression for object-dtype demotion).
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
microimpute-dashboard Ready Ready Preview, Comment Apr 17, 2026 0:53am

Copy link
Copy Markdown
Contributor Author

@MaxGhenis MaxGhenis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QuantReg cleanup verified:

  • Dead code removed: random_generator = np.random.default_rng(self.seed) with hardcoded q = 0.5 and misleading 'random quantile' log message gone. Log now accurately reads Fitting quantile regression for q={q}.
  • Per-row .loc[idx, variable] loop (quadratic, object-dtype demotion) replaced with vectorised np.column_stack + per-row index:
    sampled_idx = rng.integers(0, len(quantiles_arr), n_rows)
    stacked[row_idx, sampled_idx]
    Output is numeric dtype (asserted by pd.api.types.is_numeric_dtype).
  • test_quantreg_random_quantile_sample_returns_numeric_dtype covers the dtype and finiteness. Statistical equivalence to old loop holds (different RNG realisations but same sampling distribution); no bit-equivalence test would be meaningful since the old output was object-dtype demoted.

CI all green. Mergeable. LGTM.

@MaxGhenis MaxGhenis merged commit 7f48f40 into main Apr 17, 2026
7 checks passed
@MaxGhenis MaxGhenis deleted the fix/quantreg-dead-code-and-vectorize branch April 17, 2026 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant