VarLookupDict should implement __iter__, causes CPython crash when using statsmodels in marimo notebook #259

@shenker

Description

Using statsmodels with patsy formulas in a marimo notebook crashes CPython (!) if a variable in the patsy formula is undefined. There is an underlying CPython bug (it shouldn't crash!), which I have described at python/cpython#129605 (comment). The problem is easy to fix on the patsy side, though. (Happy to submit a small PR if this sounds good to you.)

Here's the real-world reproducer (and how I came across this bug). Create a marimo (0.23.8) notebook with the following cell, which imports statsmodels 0.14.6, which in turn imports patsy 1.0.2.

import polars as pl
import statsmodels.formula.api as smf

df = pl.DataFrame({"a2": range(10), "b": range(10, 20)})
model = smf.ols(
    "a ~ b",
    data=df.to_pandas(),
)

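The CPython process running the marimo kernel crashes when the cell is run. The misspelled column name in the formula string (a instead of a2) triggers a NameError inside patsy's eval() call, where f_locals is a live VarLookupDict. VarLookupDict defines __getitem__ but not __iter__, so any code that iterates it falls back to the legacy sequence protocol, calls d[0], and gets KeyError(0) rather than the IndexError that would end iteration. Marimo calls traceback.print_exception directly from Python to format cell errors (unlike the REPL, which uses C-level PyErr_Display and is robust to this), so the KeyError(0) propagates uncaught and kills the notebook kernel.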
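
The KeyError(0) can be seen in isolation, without patsy or marimo. The sketch below uses a hypothetical stand-in class, LookupOnly, which is not patsy's actual code; it only assumes a chained lookup that defines __getitem__, raises KeyError for unknown keys, and has no __iter__.

```python
class LookupOnly:
    """Hypothetical stand-in: chained dict lookup with __getitem__ but no __iter__."""

    def __init__(self, dicts):
        self._dicts = list(dicts)

    def __getitem__(self, key):
        # Search each dict in order; unknown keys raise KeyError, never IndexError.
        for d in self._dicts:
            if key in d:
                return d[key]
        raise KeyError(key)


def try_iterate(obj):
    """Return list(obj), or the exception raised while iterating."""
    try:
        return list(obj)
    except Exception as exc:
        return exc


# With no __iter__, Python iterates by calling obj[0], obj[1], ... until
# IndexError. The first call, obj[0], raises KeyError(0) instead.
result = try_iterate(LookupOnly([{"x": 1, "y": 2}]))
print(repr(result))  # KeyError(0)
```

Any caller that iterates such an object without guarding against arbitrary exceptions will see this KeyError escape.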

Here's a minimal reproducer that points to the easy fix on the patsy side:

from patsy.eval import VarLookupDict

d = VarLookupDict([{"x": 1, "y": 2}])
list(d)
# Expected: ["x", "y"]
# Actual: KeyError: 0
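
A rough sketch of what the fix could look like, written against a hypothetical stand-in class (ChainedLookup) rather than patsy's real VarLookupDict. The attribute name _dicts, the key ordering, and the de-duplication of keys shadowed by an earlier dict are all assumptions made for illustration, not a description of patsy's internals.

```python
class ChainedLookup:
    """Hypothetical stand-in for a VarLookupDict-like chained mapping."""

    def __init__(self, dicts):
        self._dicts = list(dicts)

    def __getitem__(self, key):
        # Earlier dicts take precedence over later ones.
        for d in self._dicts:
            if key in d:
                return d[key]
        raise KeyError(key)

    def __iter__(self):
        # Yield each key once, in the order the dicts are searched,
        # so iteration no longer falls back to obj[0], obj[1], ...
        seen = set()
        for d in self._dicts:
            for key in d:
                if key not in seen:
                    seen.add(key)
                    yield key


d = ChainedLookup([{"x": 1, "y": 2}, {"y": 3, "z": 4}])
print(list(d))  # ['x', 'y', 'z']
print(d["y"])   # 2
```

With __iter__ defined, list(d) returns the keys instead of raising KeyError(0).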
