
Fix/unstable #410

Closed
warrick1016 wants to merge 8 commits into feature/ror-update-8.2.0 from fix/unstable


@warrick1016
Collaborator

No description provided.

warrick1016 and others added 8 commits April 13, 2026 11:42
ioThreadsScaleUpStart() (which logs 'IO threads scale-up end') runs in the
*next* beforeSleep() after CONFIG SET is processed.  However, io_thread_1
may already have flushed the 'OK' reply to the socket before the main
thread enters that next beforeSleep(), so the Tcl test calls
verify_log_message (which checks once, without waiting) before the log
line has been written.

Under ASAN, memory instrumentation slows the main thread, while the IO
thread's socket write is a single syscall; this widens the race window.

Fix: add a wait_for_condition on io_thread_scale_status == 'none' between
the CONFIG SET and verify_log_message, matching the pattern already used
in the 101-client case at lines 173-177.
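
To illustrate why polling closes the race while a single check does not, here is a minimal, deterministic C sketch. It is a hypothetical model, not the test suite's code: `sim_server`, `check_once`, and this C `wait_for_condition` are stand-ins for the Tcl helpers, and the clock is simulated (the real helper sleeps between attempts) so the outcome does not depend on scheduling.

```c
#include <stdbool.h>

/* Hypothetical simulated server; times are simulated milliseconds. */
typedef struct {
    long now_ms;          /* current simulated time */
    long log_written_ms;  /* when the next beforeSleep() writes the log line */
} sim_server;

/* The condition the fix waits on: scale-up finished (status 'none'). */
static bool scale_status_is_none(const sim_server *s) {
    return s->now_ms >= s->log_written_ms;
}

/* Before the fix: verify_log_message effectively checked once, no waiting. */
bool check_once(const sim_server *s) {
    return scale_status_is_none(s);
}

/* Hypothetical analogue of wait_for_condition: check up to maxtries times,
 * advancing the simulated clock by delay_ms between checks. */
bool wait_for_condition(int maxtries, long delay_ms, sim_server *s) {
    for (int i = 0; i < maxtries; i++) {
        if (scale_status_is_none(s)) return true;  /* condition met */
        s->now_ms += delay_ms;                     /* "sleep", then retry */
    }
    return false;                                  /* gave up */
}
```

If the log line appears 30 ms after the 'OK' reply, `check_once` fails at time 0, whereas `wait_for_condition(50, 10, &s)` succeeds on its fourth check. Only after the wait succeeds is it safe to run the immediate log check.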

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…napshot

Line 248 captures a snapshot via 'set info [r info threads]' and checks
whether io_thread_2 exists in it.  The original line 249 then issued a
second 'r info threads' query inside the if-body.  If io_thread_2 was
torn down between those two RPCs (likely under ASAN, where the main
thread is slower), the second query returns '' for io_thread_2, causing
  assert_equal '' 0
to fail.

Fix: reuse the already-captured $info for the clients assertion.
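
The check-then-requery race can be sketched in C as follows. This is a hypothetical illustration: it assumes INFO lines of the form `io_thread_2:clients=<n>,...`, and `io_thread_clients` and `thread_clients_from_snapshot` are invented helper names, not functions from the codebase. The fixed pattern runs both the existence check and the value read against the same snapshot.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: read io_thread_<id>'s client count from one INFO snapshot,
 * assuming lines like "io_thread_2:clients=<n>,...".
 * Returns -1 when the thread is absent from that snapshot. */
long io_thread_clients(const char *snapshot, int thread_id) {
    char key[48];
    snprintf(key, sizeof key, "io_thread_%d:clients=", thread_id);
    const char *p = strstr(snapshot, key);
    if (p == NULL) return -1;
    return strtol(p + strlen(key), NULL, 10);
}

/* Fixed pattern: existence check and value read use the SAME snapshot, so a
 * thread torn down after the snapshot cannot make them disagree. */
bool thread_clients_from_snapshot(const char *snapshot, int id, long *out) {
    long n = io_thread_clients(snapshot, id);
    if (n < 0) return false;  /* thread not in this snapshot: skip the assert */
    *out = n;
    return true;
}
```

With a first snapshot that still lists io_thread_2 and a second that does not, the original pattern (check the first, read the second) gets -1, the C analogue of the empty value in the failing assertion. Reading both from the first snapshot stays consistent.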

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 'Test child sending info' test fails under ASAN with:
  COW info wasn't reported

Root cause: sendChildInfoGeneric() throttles /proc/self/smaps reads
(zmalloc_get_private_dirty) using CHILD_COW_DUTY_CYCLE=100.  Under ASAN
the process virtual address space is roughly doubled by shadow memory, so
/proc/self/smaps is much larger and can take 2-10 seconds to parse.  The
duty cycle suppresses the *next* measurement for cow_update_cost * 100
microseconds (200-1000 seconds for a 2-10 second read), making it
impossible to see a second incremental COW update within the original
8-second window.
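
The throttle can be modelled in a few lines of C. This is a simplified, hypothetical sketch of the behaviour described above, not the actual sendChildInfoGeneric() code: the struct and function names are invented, times are microseconds from a monotonic clock, and only CHILD_COW_DUTY_CYCLE=100 is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHILD_COW_DUTY_CYCLE 100  /* value quoted in the commit message */

/* Hypothetical throttle state; times in microseconds (monotonic clock). */
typedef struct {
    uint64_t last_update;  /* when the last smaps read finished; 0 = never */
    uint64_t update_cost;  /* how long that read took */
} cow_throttle;

/* May a new smaps read run at time `now` (now >= last_update)? */
bool cow_read_allowed(const cow_throttle *t, uint64_t now) {
    return t->last_update == 0 ||
           now - t->last_update > t->update_cost * CHILD_COW_DUTY_CYCLE;
}

/* Record a read that started at `start` and took `cost` microseconds. */
void cow_record_read(cow_throttle *t, uint64_t start, uint64_t cost) {
    t->last_update = start + cost;
    t->update_cost = cost;
}

/* First time at which cow_read_allowed() becomes true again. */
uint64_t cow_next_allowed(const cow_throttle *t) {
    return t->last_update + t->update_cost * CHILD_COW_DUTY_CYCLE + 1;
}
```

For a 2 s smaps read starting at t = 0, the next read is refused at 30 s and becomes allowed only just after 202 s, which is far beyond an 8-second test window.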

Three targeted fixes for ASAN builds ($::asan == 1):

1. Increase wait_for_condition from 80*100ms (8 s) to 300*100ms (30 s) so
   the first slow smaps read can complete and the parent can absorb it via
   run_with_period(1000).

2. Break after the first successful COW observation (once 'incr iteration'
   has set iteration to 2), skipping the second iteration, which would
   require a second smaps read that the duty cycle defers for minutes.

3. Relax 'assert_morethan_equal $iteration 2' to 'assert {$iteration >= 1}'
   and change the final rdb_last_cow_size check to 'assert {$final_cow > 0}'
   (replacing the comparison against 90% of the last active cow_size, which
   can be 0 under ASAN when bgsave completes before the first incremental
   report is visible).

Normal (non-ASAN) behaviour is unchanged.
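
A back-of-the-envelope C model shows why fixes 1 and 2 go together. It is a hypothetical sketch: it assumes each read takes a fixed cost, the next read waits cost * duty_cycle after the previous one finishes, and the parent surfaces each reading at most 1 s later (run_with_period(1000)). The 8 s read cost is an illustrative pick from the quoted 2-10 s range, and the 10 ms non-ASAN cost is an assumed figure.

```c
#include <stdint.h>

/* Hypothetical model, all times in milliseconds:
 * - the child's first smaps read starts at t = 0 and takes cost_ms;
 * - each later read starts cost_ms * duty_cycle after the previous one
 *   finished and again takes cost_ms;
 * - the parent surfaces a reading at most report_period_ms after it ends.
 * Returns how many COW reports fit in a window of tries * delay_ms,
 * or -1 if cost_ms is 0 (the model needs a non-zero read cost). */
int observable_cow_reports(uint64_t tries, uint64_t delay_ms, uint64_t cost_ms,
                           uint64_t duty_cycle, uint64_t report_period_ms) {
    if (cost_ms == 0) return -1;
    uint64_t window = tries * delay_ms;
    uint64_t read_done = cost_ms;  /* first read finishes here */
    int seen = 0;
    while (read_done + report_period_ms <= window) {
        seen++;
        /* wait out the duty-cycle gap, then pay for the next read */
        read_done += cost_ms * duty_cycle + cost_ms;
    }
    return seen;
}
```

Under these assumptions, an 8 s read yields no report in the original 80*100 ms window and exactly one in the 300*100 ms window. Neither window ever shows a second report, which is why the test must stop after the first observation. A fast non-ASAN read still yields several reports within 8 s.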

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels: None yet

Projects: None yet


1 participant