Avoid correlated SQLite plans for alert decision filters#4471
Conversation
|
@jimstrang: There are no 'kind' label on this PR. You need a 'kind' label to generate the release automatically.
DetailsI am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository. |
|
@jimstrang: There are no area labels on this PR. You can add as many areas as you see fit.
DetailsI am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository. |
|
/kind fix |
|
hey, thanks for the PR. We're going to investigate a bit to understand where the regression might come from tho. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4471 +/- ##
==========================================
- Coverage 63.89% 63.88% -0.02%
==========================================
Files 478 479 +1
Lines 34298 34304 +6
==========================================
Hits 21915 21915
- Misses 10227 10232 +5
- Partials 2156 2157 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I think its due to how SQLite specifically handles the chunked streaming chages that were introduced in #4413, but candidly I haven't yet explicitly rolled that one back to A/B test. I can test that out later |
|
I think I've narrowed down the regression to not the chunked streaming changes, but the deps update in #4412, specifically the go-sqlite3 driver/package With the newer SQLite, the existing correlated This PR avoids the https://sqlite.org/changes.html shows some changes in the past few releases for |
Summary
This changes the positive
/v1/alertsdecision filters from Ent's generatedHasDecisionsWith(...)predicate to a small helper that emits independentINsubqueries.The affected filters are:
decision_typeoriginhas_active_decision=trueThe intent is to keep the existing filter semantics while avoiding SQLite plans that repeatedly probe
decisionsfor each candidate alert.Rationale
The current Ent shape is a correlated
EXISTS, roughly:When several decision filters are combined, SQLite plans this as an alert scan plus repeated correlated probes into
decisions. That gets expensive when a matching alert has thousands of linked decisions, which is the shape seen in #4464 and #4470.The new helper keeps each decision filter independent, but emits:
This lets SQLite build matching alert IDs from
decisionsfirst, then look up alerts by primary key. It is intentionally not a semantic change: different linked decision rows may still satisfy different filters, just like with the previous separateHasDecisionsWith(...)predicates.Implementation Notes
include_capi=falseis unchanged because it uses negativeNOT EXISTSpredicates. A naiveNOT INrewrite is not null-safe whendecisions.alert_decisionscontainsNULL; I verified this with unlinkedCAPI/listsdecisions.scenarioandip/rangeare unchanged. They have similar correlated shapes, but fixture testing did not show the same benefit; theINform for IP/range made SQLite scan larger parts ofdecisions.alertpackage file, so generated Ent files stay untouched and call sites remainalert.HasDecisionsMatching(...).Testing
Validated on the reproduced SQLite DB from #4464, an expanded #4470-style SQLite fixture, a fresh production SQLite copy with WAL disabled, and the live production Postgres backend.
Result sets:
EXISTSSQL with newINSQL usingEXCEPTin both directions./v1/alertscases.cscli,lists, andCAPIorigins.SQLite/API behavior:
decision_type=ban&origin=crowdsec&has_active_decision=1moved from a 50s+ path to roughly tens of milliseconds in the local reproducer.origin=cscli,origin=lists, and a large synthetic origin; the new shape returned those cases in ~160-410ms.PRAGMA journal_mode=delete), current code still timed out on affected alert filters; the new shape returned quickly.Expanded fixture comparison:
Fresh DB / DELETE journal comparison:
I also reran the result-set comparison on that fresh DB; the current and new query shapes returned the same alert IDs for the tested cases.
Postgres check:
homepage_bans,origin=lists,origin=CAPI, andorigin=csclicases.Go checks:
Both pass locally.
Refs #4464
Refs #4470
Supersedes the index-only approach explored in #4468