core/screen: retain removed screens to avoid use-after-free#879
core/screen: retain removed screens to avoid use-after-free#879sunshowers wants to merge 1 commit into
Conversation
Previously, when a screen was disconnected, `QuickshellTracked::updateScreens` deleted its `QuickshellScreenInfo` wrapper. But a copy of that `QObject*` may have been boxed into a `QVariant` and left a bare, unguarded pointer. Deleting the wrapper then dangles that pointer, and the next time the engine re-wraps it (the process segfaults).
This was observed as intermittent crashes on monitor wakeup under niri with DankMaterialShell, confirmed from two core dumps to be a freed `QuickshellScreenInfo` inside a `Toplevel.screens` list retained by a model.
Both core dumps had the following in them:
```
screens QVariant @ 0x7fecc79e4eb8 metatype='QList<QuickshellScreenInfo*>'
inline QList {d=0x7fecf1921320, ptr=0x7fecf1921330, size=1}
screens[0] = 0x7fed0313b6a0 <-- freed (vtable=0x1 d_ptr=0x7)
```
Fix this by not actually deleting lost screens and instead just putting them in an internal vector. Note that this is a memory leak, in the sense that the list will keep growing in an unbounded fashion. But screens are pretty bounded, so it seems like a relatively straightforward way to handle a somewhat rare occurrence.
Note that `deleteLater` doesn't quite work here, because these raw pointers can be held across event loop iterations. A more structural fix is likely some kind of generational arena + weak index pattern, but that's outside the scope of this PR.
|
(I've patched this locally, and am going to run with it for a while to monitor if I see any other crashes.) |
|
I've never managed to get the QML engine to choke on a freed pointer in any prior testing I've done to trigger this class of bug, but I went and did an extra round of testing and didn't find anything for similar cases. There's really only one point we could reasonably return an invalid |
|
Thanks. One thing to note here is that my monitor is a Samsung G9 57 which I run at 7680x2160 240Hz with an AMD GPU. I've seen that cause some issues on monitor wakeup, with link training failing for a bit then succeeding. Both crashes I saw have this stack trace: https://gist.github.com/sunshowers/e439bd41d78fb5f189088471fc8d3e36 |
From what I can tell it's not Quickshell returning a |
|
From my journalctl, both the crashes I saw have the following pattern: So yeah, it seems like this is more likely to happen if a screen is added, removed, then rapidly added again afterwards. This might be why if you're not on a monitor that sometimes fails to do link training (such as a large DisplayPort MST display) you might not see the crash. I'm not sure what facilities Quickshell or QML has to inject monitor add/remove events, but if there is such a mechanism, reproducing this might be more feasible. |
Previously, when a screen was disconnected,
QuickshellTracked::updateScreensdeleted itsQuickshellScreenInfowrapper. But a copy of thatQObject*may have been put into aQVariantas a bare, unguarded pointer. Deleting the wrapper leaves that pointer dangling, and the next time the engine accesses it, the process segfaults with a use-after-free error.This was observed as intermittent crashes on monitor wakeup under niri with DankMaterialShell, confirmed from two core dumps to be a freed
QuickshellScreenInfoinside aToplevel.screenslist retained by a model.Both core dumps had the following in them:
Fix this by not actually deleting lost screens and instead just putting them in an internal vector. Note that this is a memory leak, in the sense that the list will keep growing in an unbounded fashion. But over the lifetime of a quickshell process there aren't that many screens connected and disconnected, so it seems like a relatively straightforward way to handle this.
Note that
deleteLaterdoesn't quite work here, because these raw pointers can be held across event loop iterations. A more structural fix is likely some kind of generational arena + weak index pattern, but that's outside the scope of this PR.TODO: