Using statsmodels with patsy formulas in a marimo notebook causes a CPython crash (!) if a variable in the patsy formula is undefined. There is an underlying CPython bug (it shouldn't crash!), which I have described at python/cpython#129605 (comment). This is easily fixable on the patsy side, though. (Happy to submit a small PR if this sounds good to you.)
Here's the real-world reproducer (and how I came across this bug). Create a marimo (0.23.8) notebook with the following cell, which imports statsmodels 0.14.6, which in turn imports patsy 1.0.2.
import polars as pl
import statsmodels.formula.api as smf
df = pl.DataFrame({"a2": range(10), "b": range(10, 20)})
model = smf.ols(
    "a ~ b",
    data=df.to_pandas(),
)
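The CPython process running the marimo kernel will crash when the cell is run. A misspelled column name in a formula string (here "a" instead of "a2") triggers a NameError inside patsy's eval() call, where f_locals is a live VarLookupDict. Marimo calls traceback.print_exception directly from Python to format cell errors. As far as I can tell, while building the "Did you mean ...?" suggestion for the NameError, the traceback module iterates over the frame's f_locals. VarLookupDict defines __getitem__ but not __iter__, so Python falls back to the legacy sequence protocol and calls __getitem__(0), which raises KeyError(0). The REPL uses C-level PyErr_Display and is robust to this, but in marimo the KeyError(0) propagates uncaught and kills the notebook kernel.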
Here's the minimal reproducer, which also points to the easy fix on the patsy side (making VarLookupDict iterable):
from patsy.eval import VarLookupDict
d = VarLookupDict([{"x": 1, "y": 2}])
list(d)
# Expected: ["x", "y"]
# Actual: KeyError: 0
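To illustrate the kind of fix I have in mind, here is a minimal sketch. It uses a simplified stand-in for VarLookupDict, not patsy's actual class. The class names `LookupWithoutIter` and `LookupWithIter` are hypothetical, and yielding each key once in first-seen order is my assumption about the desired behaviour.

```python
class LookupWithoutIter:
    """Simplified stand-in for patsy's VarLookupDict (hypothetical, not the real class)."""

    def __init__(self, dicts):
        self._dicts = list(dicts)

    def __getitem__(self, key):
        # Return the value from the first dict that has the key.
        for d in self._dicts:
            if key in d:
                return d[key]
        raise KeyError(key)


class LookupWithIter(LookupWithoutIter):
    """Same lookup, plus the mapping-style iteration that list() needs."""

    def __iter__(self):
        # Yield each key once, in first-seen order across the dicts.
        seen = set()
        for d in self._dicts:
            for key in d:
                if key not in seen:
                    seen.add(key)
                    yield key

    def __len__(self):
        return sum(1 for _ in self)


namespaces = [{"x": 1, "y": 2}, {"y": 3, "z": 4}]

try:
    list(LookupWithoutIter(namespaces))
    legacy_error = None
except KeyError as exc:
    # No __iter__, so list() falls back to __getitem__(0), __getitem__(1), ...
    legacy_error = exc

fixed_keys = list(LookupWithIter(namespaces))
print(repr(legacy_error))  # KeyError(0)
print(fixed_keys)          # ['x', 'y', 'z']
```

With `__iter__` defined, anything that iterates over the mapping (such as code that lists the names in f_locals) gets the variable names instead of triggering the integer-index fallback.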