filter_write_tag: in_emitter: Mitigate data loss on multiline#12029
filter_write_tag: in_emitter: Mitigate data loss on multiline#12029cosmo0920 wants to merge 4 commits into
Conversation
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR fixes silent record loss when the shared in_emitter is paused. ChangesEmitter Busy Handling and Record Preservation
Estimated code review effort: 3 (Moderate) | ~25 minutes Sequence Diagram(s)sequenceDiagram
participant RewriteTag as rewrite_tag filter
participant Emitter as in_emitter
participant Queue as emitter queue/chunks
participant Output as output plugin
RewriteTag->>Emitter: in_emitter_add_record(rewritten record)
Emitter->>Queue: check mem_buf_limit using pending_bytes
alt queue over limit
Emitter-->>RewriteTag: return FLB_EMITTER_BUSY
RewriteTag-->>RewriteTag: set keep = FLB_TRUE
else paused, no ring buffer
Emitter->>Queue: buffer record for later ingestion
Emitter-->>RewriteTag: return queued
else available
Emitter->>Queue: enqueue record
Emitter-->>RewriteTag: return success
end
Queue->>Output: flush buffered records
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2ab49b4318
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/filter_rewrite_tag/rewrite_tag.c (1)
403-419: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
ret < 0check missesFLB_FILTER_NOTOUCHfailure path fromin_emitter_add_record.
in_emitter_add_recordcan returnFLB_FILTER_NOTOUCH(value2, a positive int) when it fails to allocatei_reffor tracking the source input (flb_mallocfailure path). SinceFLB_FILTER_NOTOUCHis positive, the updatedif (ret < 0)check at Line 417 won't catch this failure — the record will be misclassified as successfully emitted (is_emitted == FLB_TRUE,*keepstays whateverrule->keep_recordwas), and ifkeep_record=false, the record is silently dropped despite the actual emission having failed. This defeats the exact "don't drop on emitter failure" guarantee this PR is trying to establish.🐛 Proposed fix
- if (ret < 0) { + if (ret < 0 || ret == FLB_FILTER_NOTOUCH) { *keep = FLB_TRUE; return FLB_FALSE; }Alternatively, consider having
in_emitter_add_recordreturn a value consistent with its own0/-1/FLB_EMITTER_BUSYconventions instead of reusing aflb_filter.hconstant, to avoid mixing return-code namespaces.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/filter_rewrite_tag/rewrite_tag.c` around lines 403 - 419, The failure handling in rewrite_tag.c around ingest_inline/in_emitter_add_record is missing the FLB_FILTER_NOTOUCH path, so a positive failure return can still be treated as success. Update the post-call handling in this block to explicitly recognize the in_emitter_add_record return conventions used by rewrite_tag.c and in_emitter_add_record, not just ret < 0, and ensure any non-success from record emission forces the keep/drop-safe path. If possible, align in_emitter_add_record’s return values with the emitter’s own status codes so rewrite_tag.c can make a single consistent success/failure check.
🧹 Nitpick comments (2)
tests/runtime/filter_rewrite_tag.c (2)
344-360: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winFixed 1500ms sleep with an exact-match assertion may be timing-sensitive under heavy load.
Both tests now assert an exact count (
heavy_loop == got) after a single fixedflb_time_msleep(1500), pushing 100,000 records through a mem-buf-limited emitter. Previously (per the AI summary) the assertion tolerated partial delivery; tightening it to an exact match increases sensitivity to whatever cadence buffered/queued chunks are drained on under CI load variance. Consider polling for stabilization ofget_output_num()instead of a single fixed sleep, to reduce flakiness risk.Also applies to: 407-412
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/runtime/filter_rewrite_tag.c` around lines 344 - 360, The exact-count assertion after a single fixed sleep in the filter rewrite tag tests is timing-sensitive and can flake under load. Update the test logic around get_output_num(), flb_time_msleep(), and the heavy_loop check to wait until output delivery stabilizes or the expected count is reached, rather than relying on one 1500ms delay before asserting heavy_loop == got. Apply the same stabilization approach in the other matching test block referenced by the review so both cases avoid CI-load-dependent failures.
362-415: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winNear-duplicate of
flb_test_heavy_input_pause_emitter.
flb_test_busy_emitter_keeps_original(Lines 362-415) duplicates most of the setup/push/wait logic fromflb_test_heavy_input_pause_emitter(Lines 294-343), differing mainly in theMatchfilter and final assertion. Consider extracting a shared helper (parameterized byMatchpattern and expected count) to reduce duplication and ease future maintenance.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/runtime/filter_rewrite_tag.c` around lines 362 - 415, `flb_test_busy_emitter_keeps_original` duplicates most of `flb_test_heavy_input_pause_emitter`, including setup, push loop, sleep, and output check. Refactor the shared test flow into a helper in `filter_rewrite_tag.c` that takes the `Match` pattern and expected count as parameters, then have both test functions call it with their specific values while keeping only the differing assertion/setup in each wrapper.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/in_emitter/emitter.c`:
- Around line 166-174: The non-ring-buffer path in emitter.c does not enforce a
hard stop when flb_input_buf_paused(ctx->ins) is true, so the backlog can still
grow via ec->mp_sbuf. Update the emitter chunk handling around the ctx->msgs
branch to either reject new records immediately like the ring-buffer path or add
a separate backlog cap/limit before appending, using the existing
flb_input_buf_paused check and the emitter queue state symbols to keep this path
bounded.
---
Outside diff comments:
In `@plugins/filter_rewrite_tag/rewrite_tag.c`:
- Around line 403-419: The failure handling in rewrite_tag.c around
ingest_inline/in_emitter_add_record is missing the FLB_FILTER_NOTOUCH path, so a
positive failure return can still be treated as success. Update the post-call
handling in this block to explicitly recognize the in_emitter_add_record return
conventions used by rewrite_tag.c and in_emitter_add_record, not just ret < 0,
and ensure any non-success from record emission forces the keep/drop-safe path.
If possible, align in_emitter_add_record’s return values with the emitter’s own
status codes so rewrite_tag.c can make a single consistent success/failure
check.
---
Nitpick comments:
In `@tests/runtime/filter_rewrite_tag.c`:
- Around line 344-360: The exact-count assertion after a single fixed sleep in
the filter rewrite tag tests is timing-sensitive and can flake under load.
Update the test logic around get_output_num(), flb_time_msleep(), and the
heavy_loop check to wait until output delivery stabilizes or the expected count
is reached, rather than relying on one 1500ms delay before asserting heavy_loop
== got. Apply the same stabilization approach in the other matching test block
referenced by the review so both cases avoid CI-load-dependent failures.
- Around line 362-415: `flb_test_busy_emitter_keeps_original` duplicates most of
`flb_test_heavy_input_pause_emitter`, including setup, push loop, sleep, and
output check. Refactor the shared test flow into a helper in
`filter_rewrite_tag.c` that takes the `Match` pattern and expected count as
parameters, then have both test functions call it with their specific values
while keeping only the differing assertion/setup in each wrapper.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dbeabad8-d41a-420d-9f95-9430ee216a51
📒 Files selected for processing (3)
plugins/filter_rewrite_tag/rewrite_tag.cplugins/in_emitter/emitter.ctests/runtime/filter_rewrite_tag.c
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
2ab49b4 to
0905395
Compare
|
@cosmo0920 I think we need to backport this right? |
With the reproducer, I addressed like as:
Details
Closes #12026.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit