ROX-33614: Update Falco to 0.23.1#2976
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #2976 +/- ##
==========================================
+ Coverage 27.34% 27.37% +0.03%
==========================================
Files 95 94 -1
Lines 5420 5413 -7
Branches 2545 2547 +2
==========================================
Hits 1482 1482
+ Misses 3211 3201 -10
- Partials 727 730 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. |
c2385e2 to
e647cac
Compare
- Update falcosecurity-libs from 0.18.1 to 0.23.1 - Fix BPF verifier failures on older kernels (4.18) - Fix clang-format lint in Utility.cpp - Skip fd-based execs (/dev/fd/N) in exepath fallback - Disable TOCTOU 64-bit progs for missing syscalls - Remove container plugin, use built-in container ID lookups - Add analyze-ci Claude skill - Add update-falco-libs Claude skill Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
70ea6aa to
cd3770b
Compare
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
544901c to
23c2967
Compare
Molter73
left a comment
There was a problem hiding this comment.
I haven't gone through the changes under .claude or AGENTS.md, I think those probably should moved to their own PR.
Even without those there are plenty of comments, the falco side looks fine though.
There was a problem hiding this comment.
Can we move the .claude/commands and AGENTS.md to a separate PR so we can focus the review efforts?
| bool is_send_recv = (strncmp(evt_name, "send", 4) == 0 || | ||
| strncmp(evt_name, "recv", 4) == 0); | ||
| if (!is_send_recv) { |
There was a problem hiding this comment.
I'm fairly certain directly doing the checks is faster than doing these string checks. If we don't want to do this check for process information, I'm fairly certain we can check the syscall number instead and that will also be faster. Is there some other reason this check would be here I'm not seeing?
| if (!check) { | ||
| CLOG(WARNING) << "Filter check not available for field: " << wrapper->event_name; | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Fairly certain this check should just be an assert.
| if (!buf) return nullptr; \ | ||
| if (!filter_check_##id##_.filter_check) return nullptr; \ | ||
| std::vector<extract_value_t> vals_##id; \ | ||
| if (!filter_check_##id##_->extract(event, vals_##id)) return nullptr; \ |
There was a problem hiding this comment.
Why are we using extract when the falco PR still has the definition of extract_single?
| uint32_t len; \ | ||
| auto buf = filter_check_##id##_->extract_single(event, &len); \ | ||
| if (!buf) return nullptr; \ | ||
| if (!filter_check_##id##_.filter_check) return nullptr; \ |
There was a problem hiding this comment.
Should this be an assert?
| CLOG(WARNING) << "Could not set container filter: " << e.what() | ||
| << ". Container filtering will not be active."; |
There was a problem hiding this comment.
Since ACS has no idea what to do with processes outside of a container, it might be better to just let the exception bubble up and kill collector, rather than flood sensor with events it will not process.
| for (const auto& [subsys, cgroup_path] : system_inspector_threadinfo_->cgroups()) { | ||
| if (auto id = ExtractContainerIDFromCgroup(cgroup_path)) { | ||
| return std::string(*id); | ||
| } | ||
| } |
There was a problem hiding this comment.
Why can't we use GetContainerID here?
|
|
||
| CLOG(DEBUG) << "Process (" << signal->container_id() << ": " << signal->pid() << "): " | ||
| << signal->name() << "[" << container_metadata_.GetNamespace(event) << "] " | ||
| << signal->name() |
There was a problem hiding this comment.
| << signal->name() | |
| << signal->name() << "[" << container_id << "] " |
| std::ostream& operator<<(std::ostream& os, const sinsp_threadinfo* t) { | ||
| if (t) { | ||
| os << "Container: \"" << t->m_container_id << "\", Name: " << t->m_comm << ", PID: " << t->m_pid << ", Args: " << t->m_exe; | ||
| os << "Name: " << t->m_comm << ", PID: " << t->m_pid << ", Args: " << t->m_exe; |
There was a problem hiding this comment.
Fairly certain you can do something like this:
| os << "Name: " << t->m_comm << ", PID: " << t->m_pid << ", Args: " << t->m_exe; | |
| os << "Container: \"" << (t == nullptr ? "null" : GetContainerID(*t)) << "\", Name: " << t->m_comm << ", PID: " << t->m_pid << ", Args: " << t->m_exe; |
Or just create an overloaded version of GetContainerID that takes a pointer, checks it, then calls the version with the reference.
There was a problem hiding this comment.
Why do the timeouts in this test need to be bumped from 5 to 30 seconds?
Description
Falco PR: stackrox/falcosecurity-libs#97
The changes in this PR relate to the uplift of Falco to the latest tagged version, 0.23.1. This is a significant upgrade, jumping from 0.18.1 up to this latest version and contains the following notable changes to Collector:
This work was performed primarily by Claude, with oversight from me. This was something of an experiment and so I got Claude to do as much of the work as possible, with my role being that of the driver; steering Claude when it got muddled or hyper focused on the wrong fix.
The update and rebase itself was relatively straight forward, resulting in new locally-built and locally-verified builds in just a couple of hours (i.e. builds that built and ran locally, passing the integration tests on my Fedora 42 x86 machine.) The bottle neck became the CI and getting that work fed back into Claude to perform the diagnostics and fixes.
As a result, I have added two Claude skills as part of this PR which allow Claude to (1) perform Falco updates in the future and (2) inspect the state of CI and investigate test failures or BPF verifiers issues.
Checklist
Automated testing
If any of these don't apply, please comment below.
Testing Performed
Built and tested locally, all unit & integration tests passing. CI handles the remainder of our test matrix.