
maintainer,dispatcher: fence stale generation requests#5182

Draft
hongyunyan wants to merge 10 commits into pingcap:master from hongyunyan:codex/fix-5083-generation-fence


@hongyunyan (Collaborator) commented Jun 1, 2026

What problem does this PR solve?

Issue Number: close #5083

During maintainer failover, a delayed schedule request from the previous
maintainer can still reach a dispatcher manager after the new maintainer has
already bootstrapped and recreated the same table span. Without a receiver-side
ownership fence, the stale request can create an orphan dispatcher that enters
the Working state and writes to the downstream sink before the new maintainer
observes and removes it.

What is changed and how it works?

This PR adds a receiver-local maintainer generation fence:

  • Adds a generation field to maintainer bootstrap, schedule, post-bootstrap, and
    close heartbeat messages.
  • Bumps and persists the changefeed epoch before new maintainer ownership is
    scheduled through add/move operators, and before resume/retry scheduling.
  • Serializes persisted epoch bumps in the backend by reading the latest stored
    ChangeFeedInfo and job status, advancing with max(candidate, persisted+1),
    preserving stored status by default, and writing info/job under info-key and
    job-key ModRevision compares.
  • Writes warning retry state/error through the same epoch bump boundary instead
    of first doing an ordinary no-CAS changefeed update.
  • Generates epochs from PD TSO with no silent fallback in production, and keeps
    each changefeed's generation strictly monotonic with max(candidate, current+1).
  • Keeps AddMaintainerRequest.Config bytes synchronized with the latest
    ChangeFeedInfo.
  • Stamps maintainer outbound control messages with the changefeed epoch.
  • Makes dispatcher managers track the active maintainer owner plus explicit
    request generation and reject stale schedule/post-bootstrap/close requests
    locally.
  • Serializes dispatcher-manager control requests with maintainer generation
    changes, and keeps currentOperatorMap keyed by dispatcher ID and generation.
  • Keeps rolling-upgrade compatibility by allowing generation 0 only while the
    receiver has not observed a non-zero generation for the changefeed, and only
    for the current compatibility-mode maintainer owner.

Check List

Tests

  • Unit test

Questions

Will it cause performance regression or break compatibility?

No expected performance regression. The new mutex only serializes per-changefeed
dispatcher-manager control operations such as bootstrap, close, and dispatcher
create/remove scheduling; it is not in the event write path.

The change is wire-compatible. New fields are optional protobuf fields, and a
new receiver still allows generation 0 from the current maintainer owner while
it remains in compatibility mode for that changefeed.
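The receiver-side rules above, including the generation 0 compatibility mode, can be modeled as a small per-changefeed state machine. This is a simplified sketch under stated assumptions, not the dispatcher manager's code: `maintainerFence`, `tryUpdateMaintainer`, and `isRequestAllowed` are hypothetical analogues of `TryUpdateMaintainer` and `IsMaintainerRequestAllowed`, and the sketch assumes that a generation 0 bootstrap adopts its sender as the compatibility-mode owner, as a pre-fence receiver would.

```go
package main

import "fmt"

// maintainerFence is a hypothetical, simplified model of the per-changefeed
// state a dispatcher manager keeps to reject stale maintainer requests.
type maintainerFence struct {
	owner      string // node ID of the maintainer currently trusted
	generation uint64 // highest non-zero generation observed so far
	sawNonZero bool   // once true, generation 0 is never accepted again
}

// tryUpdateMaintainer models handling a bootstrap request.
func (f *maintainerFence) tryUpdateMaintainer(from string, gen uint64) bool {
	if gen == 0 {
		// Compatibility mode: a pre-upgrade maintainer sends no generation.
		// Assumption: it becomes the owner only until a non-zero generation
		// has been observed for this changefeed.
		if f.sawNonZero {
			return false
		}
		f.owner = from
		return true
	}
	if gen < f.generation || (gen == f.generation && from != f.owner) {
		return false // older generation, or a conflicting claim on the same one
	}
	f.owner, f.generation, f.sawNonZero = from, gen, true
	return true
}

// isRequestAllowed models schedule, post-bootstrap, and close requests:
// they must come from the current owner with the current generation.
func (f *maintainerFence) isRequestAllowed(from string, gen uint64) bool {
	if from != f.owner {
		return false
	}
	if gen == 0 {
		return !f.sawNonZero
	}
	return gen == f.generation
}

func main() {
	f := &maintainerFence{}
	fmt.Println(f.tryUpdateMaintainer("node-a", 10)) // true: old maintainer bootstraps
	fmt.Println(f.tryUpdateMaintainer("node-b", 20)) // true: failover to new maintainer
	fmt.Println(f.isRequestAllowed("node-a", 10))    // false: delayed stale request
	fmt.Println(f.isRequestAllowed("node-b", 20))    // true: current maintainer
	fmt.Println(f.isRequestAllowed("node-b", 0))     // false: gen 0 after a non-zero one

	compat := &maintainerFence{}
	fmt.Println(compat.tryUpdateMaintainer("node-a", 0)) // true: compatibility mode
	fmt.Println(compat.isRequestAllowed("node-a", 0))    // true: compatibility-mode owner
	fmt.Println(compat.isRequestAllowed("node-c", 0))    // false: not the owner
}
```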

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

Fix a bug where delayed stale maintainer requests could create duplicate dispatchers during maintainer failover.

Validation

  • make generate-protobuf
  • make fmt
  • tools/bin/golangci-lint run --timeout 10m0s --new-from-rev=upstream/master
  • go test ./coordinator/changefeed ./coordinator/operator ./coordinator ./pkg/pdutil
  • go test ./downstreamadapter/dispatchermanager ./downstreamadapter/dispatcherorchestrator ./coordinator ./coordinator/changefeed ./coordinator/operator ./pkg/pdutil ./maintainer ./maintainer/replica ./maintainer/operator
  • go test ./api/v1 ./coordinator ./coordinator/changefeed ./coordinator/drain ./coordinator/operator ./coordinator/scheduler ./downstreamadapter/dispatchermanager ./downstreamadapter/dispatcherorchestrator ./maintainer ./maintainer/replica ./maintainer/operator ./pkg/bootstrap ./pkg/server ./pkg/pdutil
  • git diff --check

@ti-chi-bot

ti-chi-bot Bot commented Jun 1, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/needs-triage-completed labels Jun 1, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 1, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign charlescheung96 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Contributor

coderabbitai Bot commented Jun 1, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c334b3c8-dd15-4118-b22b-16a7dd66ac3c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 1, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a maintainer generation/epoch fencing mechanism to prevent stale maintainer requests from affecting dispatcher managers. It adds generation fields to heartbeat protobuf messages, implements fencing logic in the dispatcher manager and orchestrator, and stamps outgoing requests with the current maintainer generation. The review feedback highlights two critical head-of-line blocking issues in dispatcher_orchestrator.go where the orchestrator-wide lock m.mutex is held while waiting for the per-changefeed lock manager.LockControl(), and provides suggestions to safely release the lock before acquiring the per-changefeed lock.

Comment on lines 224 to +235
	} else {
		manager.LockControl()
		defer manager.UnlockControl()
		if !manager.TryUpdateMaintainer(from, generation) {
			log.Warn("drop stale maintainer bootstrap request",
				zap.String("changefeed", cfId.Name()),
				zap.String("from", from.String()),
				zap.Uint64("requestGeneration", generation),
				zap.Uint64("currentGeneration", manager.GetMaintainerEpoch()),
				zap.String("currentMaintainer", manager.GetMaintainerID().String()))
			return nil
		}


high

Holding the orchestrator-wide lock m.mutex while waiting for the per-changefeed lock manager.LockControl() can cause head-of-line blocking. If a single changefeed's dispatcher manager is slow or blocked, all other changefeeds on this node will be blocked from bootstrapping or closing.

To avoid this, we should unlock m.mutex as soon as we retrieve the manager from m.dispatcherManagers, and then acquire manager.LockControl(). To prevent races with concurrent close/delete operations, we can re-verify under m.mutex that the manager is still the active one in m.dispatcherManagers before proceeding.

	} else {
		m.mutex.Unlock()
		manager.LockControl()
		defer manager.UnlockControl()

		m.mutex.Lock()
		currentManager, stillExists := m.dispatcherManagers[cfId]
		if !stillExists || currentManager != manager {
			m.mutex.Unlock()
			return nil
		}
		m.mutex.Unlock()

		if !manager.TryUpdateMaintainer(from, generation) {
			log.Warn("drop stale maintainer bootstrap request",
				zap.String("changefeed", cfId.Name()),
				zap.String("from", from.String()),
				zap.Uint64("requestGeneration", generation),
				zap.Uint64("currentGeneration", manager.GetMaintainerEpoch()),
				zap.String("currentMaintainer", manager.GetMaintainerID().String()))
			return nil
		}
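The unlock, lock, re-validate pattern the reviewer suggests can be illustrated in isolation. This is a minimal, hypothetical sketch: `registry`, `manager`, and `withControl` are illustrative stand-ins for the orchestrator's `m.mutex`/`m.dispatcherManagers` and the manager's `LockControl`, not the project's code, and it assumes that every path that removes a manager also holds that manager's control lock.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a hypothetical per-changefeed manager with its own control lock.
type manager struct {
	control sync.Mutex
	name    string
}

// registry is a hypothetical stand-in for m.mutex plus m.dispatcherManagers.
type registry struct {
	mu       sync.Mutex
	managers map[string]*manager
}

// withControl runs fn under the manager's control lock without holding the
// registry lock while waiting for it. It returns false if the manager is
// missing, or was removed or replaced while we were waiting.
func (r *registry) withControl(id string, fn func(*manager)) bool {
	r.mu.Lock()
	m, ok := r.managers[id]
	r.mu.Unlock() // release the registry lock before a possibly long wait
	if !ok {
		return false
	}

	m.control.Lock()
	defer m.control.Unlock()

	// Re-validate under the registry lock. Removal paths are assumed to take
	// the control lock too, so the entry stays valid while we hold it.
	r.mu.Lock()
	cur, still := r.managers[id]
	r.mu.Unlock()
	if !still || cur != m {
		return false
	}
	fn(m)
	return true
}

func main() {
	r := &registry{managers: map[string]*manager{
		"slow": {name: "slow"},
		"fast": {name: "fast"},
	}}

	// Simulate a changefeed whose control lock is held by a slow operation.
	r.managers["slow"].control.Lock()

	// Another changefeed still makes progress because the registry lock is free.
	ok := r.withControl("fast", func(m *manager) { fmt.Println("handled", m.name) })
	fmt.Println(ok)

	r.managers["slow"].control.Unlock()
}
```

Because the registry lock is never held while waiting for a control lock, a changefeed whose control lock is busy does not stall requests for other changefeeds, and the re-check catches a manager that was closed or replaced during the wait.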

Comment on lines 389 to 410
	m.mutex.Lock()
	if manager, ok := m.dispatcherManagers[cfId]; ok {
		if closed := manager.TryClose(req.Removed); closed {
			delete(m.dispatcherManagers, cfId)
			metrics.DispatcherManagerGauge.WithLabelValues(cfId.Keyspace(), cfId.Name()).Dec()
			response.Success = true
		manager.LockControl()
		if manager.IsMaintainerRequestAllowed(from, req.Generation) {
			if closed := manager.TryClose(req.Removed); closed {
				delete(m.dispatcherManagers, cfId)
				metrics.DispatcherManagerGauge.WithLabelValues(cfId.Keyspace(), cfId.Name()).Dec()
				response.Success = true
			} else {
				response.Success = false
			}
		} else {
			response.Success = false
			log.Warn("drop stale maintainer close request",
				zap.String("changefeed", cfId.Name()),
				zap.String("from", from.String()),
				zap.Uint64("requestGeneration", req.Generation),
				zap.Uint64("currentGeneration", manager.GetMaintainerEpoch()),
				zap.String("currentMaintainer", manager.GetMaintainerID().String()))
		}
		manager.UnlockControl()
	}
	m.mutex.Unlock()


high

Holding the orchestrator-wide lock m.mutex while waiting for the per-changefeed lock manager.LockControl() can cause head-of-line blocking. If a single changefeed's dispatcher manager is slow or blocked, all other changefeeds on this node will be blocked from bootstrapping or closing.

To avoid this, we should unlock m.mutex as soon as we retrieve the manager from m.dispatcherManagers, and then acquire manager.LockControl(). To prevent races with concurrent close/delete operations, we can re-verify under m.mutex that the manager is still the active one in m.dispatcherManagers before proceeding.

	m.mutex.Lock()
	manager, ok := m.dispatcherManagers[cfId]
	if !ok {
		m.mutex.Unlock()
		return response
	}
	m.mutex.Unlock()

	manager.LockControl()
	defer manager.UnlockControl()

	m.mutex.Lock()
	currentManager, stillExists := m.dispatcherManagers[cfId]
	if !stillExists || currentManager != manager {
		m.mutex.Unlock()
		response.Success = false
		return response
	}

	if manager.IsMaintainerRequestAllowed(from, req.Generation) {
		if closed := manager.TryClose(req.Removed); closed {
			delete(m.dispatcherManagers, cfId)
			metrics.DispatcherManagerGauge.WithLabelValues(cfId.Keyspace(), cfId.Name()).Dec()
			response.Success = true
		} else {
			response.Success = false
		}
	} else {
		log.Warn("drop stale maintainer close request",
			zap.String("changefeed", cfId.Name()),
			zap.String("from", from.String()),
			zap.Uint64("requestGeneration", req.Generation),
			zap.Uint64("currentGeneration", manager.GetMaintainerEpoch()),
			zap.String("currentMaintainer", manager.GetMaintainerID().String()))
	}
	m.mutex.Unlock()
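Applied to the close path, the same pattern combines with the generation fence so that a delayed close from a previous maintainer leaves the manager in place. This is a simplified, hypothetical sketch: `orchestrator`, `manager.allowed`, and `handleClose` are illustrative names; `allowed` only approximates `IsMaintainerRequestAllowed` (it ignores generation 0 compatibility), and the real `TryClose(req.Removed)` step, which can return false, is omitted so an allowed request closes immediately.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a hypothetical dispatcher manager that remembers its active maintainer.
type manager struct {
	control    sync.Mutex
	owner      string
	generation uint64
}

// allowed is a simplified fence: only the active owner at the active
// generation may close the manager.
func (m *manager) allowed(from string, gen uint64) bool {
	return from == m.owner && gen == m.generation
}

// orchestrator is a hypothetical stand-in for m.mutex plus m.dispatcherManagers.
type orchestrator struct {
	mu       sync.Mutex
	managers map[string]*manager
}

// handleClose drops stale close requests and deletes the manager only when
// the request passes the fence and the manager is still the registered one.
func (o *orchestrator) handleClose(id, from string, gen uint64) bool {
	o.mu.Lock()
	m, ok := o.managers[id]
	o.mu.Unlock()
	if !ok {
		return false
	}

	m.control.Lock()
	defer m.control.Unlock()

	o.mu.Lock()
	defer o.mu.Unlock()
	if cur, still := o.managers[id]; !still || cur != m {
		return false // removed or replaced while we waited for the control lock
	}
	if !m.allowed(from, gen) {
		fmt.Printf("drop stale close request from %s (gen %d, current %s gen %d)\n",
			from, gen, m.owner, m.generation)
		return false
	}
	delete(o.managers, id)
	return true
}

func main() {
	o := &orchestrator{managers: map[string]*manager{
		"cf1": {owner: "node-b", generation: 2},
	}}
	fmt.Println(o.handleClose("cf1", "node-a", 1)) // delayed request from the previous maintainer
	fmt.Println(o.handleClose("cf1", "node-b", 2)) // request from the current maintainer
	_, exists := o.managers["cf1"]
	fmt.Println(exists)
}
```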


Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicate dispatcher can be created during maintainer failover before orphan dispatcher drains

1 participant