Fix QRF stochastic prediction: persistent RNG, unbiased median, monotone quantiles by MaxGhenis · Pull Request #177 · PolicyEngine/microimpute

MaxGhenis · 2026-04-17T12:28:07Z

Summary

Fixes three correctness bugs in _QRFModel.predict that interact in the stochastic-imputation code path (findings #1, #2, #3 from the bug hunt):

#1: RNG reset on every predict(). np.random.default_rng(self.seed) was called inside predict(), so repeated calls on the same X returned identical draws and multiple-imputation variance collapsed to zero. The RNG is now created once in __init__ and consumed progressively across calls.
#2: Quantile-grid bias. The grid [0.091..0.909] plus .astype(int) (which truncates, and so floors these non-negative values instead of rounding them) biased stochastic "median" predictions low and truncated the tails. Stochastic draws are now rounded onto a fine symmetric grid so the empirical mean of selected quantiles matches the intended mean_quantile.
#3: Per-row random quantile for explicit multi-quantile predict. When users passed quantiles=[0.1, 0.5, 0.9], each quantile request drew its own random per-row index, producing crossed quantiles. QRFResults._predict now routes explicit quantiles through a deterministic exact_quantile path that guarantees row-level monotonicity.
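
The effect of #1 can be shown with a minimal sketch. The two sampler classes below are hypothetical stand-ins written for illustration, not the _QRFModel code; the only real API used is NumPy's default_rng and Generator.beta.

```python
import numpy as np


class ResettingSampler:
    """Hypothetical: re-creates the generator on every call (the buggy pattern)."""

    def __init__(self, seed):
        self.seed = seed

    def draw(self, n):
        rng = np.random.default_rng(self.seed)  # same state every call
        return rng.beta(1.0, 1.0, size=n)


class PersistentSampler:
    """Hypothetical: creates the generator once and advances it across calls."""

    def __init__(self, seed):
        self._rng = np.random.default_rng(seed)

    def draw(self, n):
        return self._rng.beta(1.0, 1.0, size=n)


resetting = ResettingSampler(seed=0)
persistent = PersistentSampler(seed=0)

reset_a, reset_b = resetting.draw(5), resetting.draw(5)
pers_a, pers_b = persistent.draw(5), persistent.draw(5)

# Resetting the generator repeats the draws; a persistent one does not.
identical_when_reset = bool(np.array_equal(reset_a, reset_b))
differ_when_persistent = not np.array_equal(pers_a, pers_b)

# A fresh object with the same seed still reproduces the first call.
reproducible = bool(np.array_equal(PersistentSampler(seed=0).draw(5), pers_a))
```

Seeding in the constructor keeps a run reproducible from a fresh object while still letting successive calls differ, which is what multiple imputation needs.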

Test plan

test_qrf_repeated_predict_calls_produce_different_draws — two sequential predict() calls return different draws
test_qrf_stochastic_median_is_unbiased — mean of many stochastic median calls ≈ deterministic median
test_qrf_multi_quantile_per_row_monotonicity — per-row q=0.1 <= q=0.5 <= q=0.9 with explicit quantiles
Existing test_qrf_beta_distribution_sampling still passes
Full tests/test_models/test_qrf.py suite (30/30) passes
tests/test_smoke_qrf.py passes
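
A rough sketch of what the monotonicity test guards against, using plain NumPy instead of the QRF model. The sample matrix, the Beta(q/(1-q), 1) parametrisation, and the index mapping are assumptions made for illustration; they are not taken from the microimpute source.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_samples = 200, 50

# Hypothetical stand-in for each row's sorted conditional sample of outcomes.
samples = np.sort(rng.normal(size=(n_rows, n_samples)), axis=1)
requested = [0.1, 0.5, 0.9]

# Deterministic path: every request reads the same per-row distribution,
# so the results are ordered within each row.
exact = np.quantile(samples, requested, axis=1).T  # shape (n_rows, 3)
exact_is_monotone = bool(np.all(np.diff(exact, axis=1) >= 0))

# Stochastic path (assumed form of the old behaviour): each request draws
# its own per-row quantile level with mean q, independently of the others.
columns = []
for q in requested:
    u = rng.beta(q / (1 - q), 1.0, size=n_rows)  # mean of Beta(a, 1) is a/(a+1) = q
    idx = np.rint(u * (n_samples - 1)).astype(int)
    columns.append(samples[np.arange(n_rows), idx])
sampled = np.column_stack(columns)

# Count rows where a lower requested quantile exceeds a higher one.
crossed_rows = int(np.sum(np.any(np.diff(sampled, axis=1) < 0, axis=1)))
```

Because the independent draws only match the requested quantile on average, some rows end up with a "0.1" value above the "0.5" value, which is the crossing the new test rules out.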

…one quantiles

Three correctness bugs in _QRFModel.predict are fixed together because they interact in the stochastic-imputation code path:

1. Seed reset on every predict() (#1). np.random.default_rng(self.seed) was called inside predict(), so repeated predict() calls on the same X returned identical draws and collapsed multiple-imputation variance to zero. The RNG is now created once in __init__ and consumed progressively across calls.
2. Quantile-grid bias (#2). The grid [0.091..0.909] combined with .astype(int) (which floors) biased stochastic "median" predictions low and truncated the tails. Stochastic draws are now rounded (not floored) onto a fine symmetric grid so the empirical mean of selected quantiles matches the intended mean_quantile.
3. Per-row random quantile for explicit multi-quantile predict (#3). When users passed quantiles=[0.1, 0.5, 0.9] for prediction intervals, each quantile request sampled a separate random per-row index, producing crossed quantiles. QRFResults._predict now routes explicit quantiles through a deterministic exact_quantile path that guarantees row-level monotonicity.

Adds regression tests:
- test_qrf_repeated_predict_calls_produce_different_draws
- test_qrf_stochastic_median_is_unbiased
- test_qrf_multi_quantile_per_row_monotonicity

vercel · 2026-04-17T12:28:15Z

Project	Deployment	Actions	Updated (UTC)
microimpute-dashboard	Ready	Preview, Comment	Apr 17, 2026 0:28am

MaxGhenis

QRF stochastic fixes verified end-to-end:

Persistent RNG: self._rng is initialised in _QRFModel.__init__ (qrf.py:150) and advanced via self._rng.beta(...) in predict. test_qrf_repeated_predict_calls_produce_different_draws asserts that two predict calls on the same X produce different draws.
Unbiased median: np.rint maps draws onto a fine symmetric grid (size = max(count_samples, 101), eps = 1/(size+1)). With mean_quantile=0.5 the draw is Beta(1,1), i.e. uniform, so the rounded index is centred on the 0.5 quantile. test_qrf_stochastic_median_is_unbiased averages 60 stochastic draws, compares them against the exact-quantile median, and asserts the gap < 0.25·std(y).
Monotone per-row quantiles: QRFResults._predict routes explicit quantiles=[...] through exact_quantile=q, which calls self.qrf.predict(X, quantiles=[q]) directly (no beta sampling). test_qrf_multi_quantile_per_row_monotonicity asserts row-level q_low <= q_mid <= q_high.
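
The floor-versus-round point can be checked numerically with a small sketch. The way a draw is turned into a grid index below is an assumption chosen to illustrate the bias; only the grid endpoints, the size rule max(count_samples, 101), and the use of np.rint come from the notes above.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.beta(1.0, 1.0, size=100_000)  # Beta(1, 1) is uniform on [0, 1]

# Assumed old behaviour: 10-point grid [1/11 .. 10/11], index truncated.
coarse_n = 10
coarse_eps = 1 / (coarse_n + 1)
coarse_grid = np.linspace(coarse_eps, 1 - coarse_eps, coarse_n)
floor_idx = (u * (coarse_n - 1)).astype(int)  # truncation acts as floor here
floored_mean = float(coarse_grid[floor_idx].mean())  # expectation is 5/11 < 0.5
floored_max = float(coarse_grid[floor_idx].max())  # top grid point is never hit

# New behaviour as described: round onto a fine symmetric grid.
fine_n = max(coarse_n, 101)
fine_eps = 1 / (fine_n + 1)
fine_grid = np.linspace(fine_eps, 1 - fine_eps, fine_n)
round_idx = np.rint(u * (fine_n - 1)).astype(int)
rounded_mean = float(fine_grid[round_idx].mean())  # centred on 0.5
```

Under these assumptions, truncation pulls the average selected quantile below 0.5 and leaves the highest grid point unused, while rounding onto the finer grid keeps the average at the intended mean_quantile.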

CI all green (lint + 3.12/3.14 tests + changelog). Mergeable. LGTM.

vercel Bot deployed to Preview April 17, 2026 12:28

MaxGhenis merged commit 541e9d3 into main Apr 17, 2026
7 checks passed

MaxGhenis deleted the fix/qrf-randomness-and-quantiles branch April 17, 2026 16:11
