[WIP] columnar: Producer-Consumer Pipeline Read Model #10904

Open
JaySon-Huang wants to merge 1 commit into pingcap:master from JaySon-Huang:pipeline_col

@JaySon-Huang JaySon-Huang commented Jun 16, 2026

Contributor

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?


Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • New Features

    • Implemented next-generation columnar read pipeline with improved IO/CPU separation and task scheduling.
    • Added IO seek operation tracking for performance monitoring.
  • Documentation

    • Added design documentation for columnar pipeline producer-consumer architecture.
  • Chores

    • Updated build configuration to include new columnar storage components.
    • Updated Docker image version tags in test utilities.

Signed-off-by: JaySon-Huang <tshent@qq.com>
ti-chi-bot (Bot) added labels Jun 16, 2026: do-not-merge/needs-linked-issue, release-note-none (denotes a PR that doesn't merit a release note), do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress)
ti-chi-bot Bot commented Jun 16, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign breezewish for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot (Bot) added label Jun 16, 2026: size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files)
coderabbitai Bot commented Jun 16, 2026

📝 Walkthrough

Introduces a producer-consumer pipeline for disaggregated columnar reads under ENABLE_NEXT_GEN_COLUMNAR. A new ColumnarReadSourceOp IO producer materializes columnar readers and serializes blocks into a SharedQueue; RNColumnarSourceOp consumes that queue on the CPU side. Reader prefetch switches from a detached background thread to PrefetchColumnarReaderTask submitted to the pipeline IO pool. StorageDisaggregatedColumnar wiring, shared notification types, IO seek counters, tests, and a design document are also added.

Changes

Columnar Producer-Consumer Pipeline

| Layer / File(s) | Summary |
| --- | --- |
| Shared data contracts and pipeline notification types: dbms/src/Storages/StorageDisaggregatedColumnar.h, dbms/src/Storages/Columnar/ColumnarReadSourceOp.h | Adds RNColumnarReaderNotifyFuture adapter, extends RNColumnarReaderWork with notify_future, adds tryAcquireReaderWork(bool), startAsyncMaterializeReader, tryGetReadyReader, and setPipelineExecutorContext to RNColumnarReadTask, adds createWithReader factory to RNColumnarInputStream, and defines ColumnarReadSourceState enum with ColumnarReadSourceOp class declaration and private members. |
| PrefetchColumnarReaderTask: IO-pool async reader materialization: dbms/src/Storages/Columnar/PrefetchColumnarReaderTask.h, dbms/src/Storages/Columnar/PrefetchColumnarReaderTask.cpp | Constructor stores read_task and reader_work; executeImpl throws LOGICAL_ERROR to enforce IO-only routing; executeIOImpl calls createColumnarReaderWithBackoff, transitions reader_work state to Ready or Failed under mutex, then notifies both cv and notify_future waiters; finalizeImpl is a no-op. |
| ColumnarReadSourceOp: IO producer state machine: dbms/src/Storages/Columnar/ColumnarReadSourceOp.cpp | Implements prefix/suffix lifecycle methods; readImpl dispatches on state (DONE, READY_BLOCK, else awaitImpl); consumeReadyReader wraps a ColumnarReaderPtr into RNColumnarInputStream; awaitImpl handles NEED_READER/WAIT_READER transitions with mutex-guarded RNColumnarReaderMaterializeState checks; executeIOImpl materializes readers and reads blocks from current_input_stream. |
| RNColumnarSourceOp: SharedQueue CPU consumer: dbms/src/Storages/Columnar/ColumnarSourceOp.h, dbms/src/Storages/Columnar/ColumnarSourceOp.cpp | Declares RNColumnarSourceOp with Options struct (exec context, req id, header, shared queue); readImpl maps tryPop outcomes to WAIT_FOR_NOTIFY, HAS_OUTPUT, or CANCELLED operator statuses. |
| StorageDisaggregatedColumnar pipeline wiring and reader-work management: dbms/src/Storages/StorageDisaggregatedColumnar.cpp | Rewrites readThroughColumnar to build an IO producer group (ColumnarReadSourceOp × N + SharedQueueSinkOp) and a CPU consumer group (RNColumnarSourceOp), or injects NullSourceOp for empty ranges; replaces detached thread prefetch with PrefetchColumnarReaderTask submission; adds tryGetReadyReader, startAsyncMaterializeReader, tryAcquireReaderWork(bool); adds RNColumnarInputStream::createWithReader; removes old RNColumnarSourceOp implementation. |
| dm_io_seek_count instrumentation: dbms/src/Storages/DeltaMerge/ScanContext.h, dbms/src/Storages/DeltaMerge/ScanContext.cpp, dbms/src/Storages/DeltaMerge/File/DMFileReader.cpp | Adds std::atomic<uint64_t> dm_io_seek_count to ScanContext, increments it at each substream seek in DMFileReader::readFromDisk, adds a debug log in readImpl, and exposes the counter in toJson(). |
| Unit tests, design document, and auxiliary updates: dbms/src/Storages/tests/gtest_storage_disaggregated_columnar.cpp, docs/design/2026-06-13-columnar-pipeline-producer-consumer-model.md, dbms/CMakeLists.txt, tests/docker/next-gen-utils/Makefile, .gitignore | Adds gtests covering RNColumnarReaderWork init state, null-source profile recording, RNColumnarSourceOp queue reads, and WAIT_FOR_NOTIFY/CANCELLED transitions; adds a design document for the producer-consumer model; registers dbms/src/Storages/Columnar in CMake; updates Docker image tags from next-gen to nextgen; adds .gitignore entry. |
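
The PrefetchColumnarReaderTask summary above describes a publish-then-notify pattern: the IO task records Ready or Failed under the work item's mutex and only then wakes waiters. A minimal standalone sketch of that pattern follows, using only the standard library. `ReaderWork`, `MaterializeState`, `materialize`, `waitDone`, and the `int` standing in for a reader are all hypothetical names for illustration, not the PR's actual types, and the PR's notify_future path is not modeled here.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>

// Hypothetical stand-ins; not the types from the PR.
enum class MaterializeState { NotStarted, Creating, Ready, Failed };

struct ReaderWork
{
    std::mutex mutex;
    std::condition_variable cv;
    MaterializeState state = MaterializeState::Creating;
    std::optional<int> reader; // an int stands in for a materialized reader
};

// Intended to run on an IO thread: create the reader, publish the outcome
// under the lock, then wake every waiter.
void materialize(ReaderWork & work, const std::function<int()> & create_reader)
{
    std::optional<int> result;
    try
    {
        result = create_reader(); // the slow call happens outside the lock
    }
    catch (...)
    {
        // Leave result empty; the state below becomes Failed.
    }
    {
        std::lock_guard<std::mutex> guard(work.mutex);
        work.reader = result;
        work.state = result ? MaterializeState::Ready : MaterializeState::Failed;
    }
    work.cv.notify_all();
}

// Blocks until the work item leaves Creating, then returns its final state.
MaterializeState waitDone(ReaderWork & work)
{
    std::unique_lock<std::mutex> lock(work.mutex);
    work.cv.wait(lock, [&] { return work.state != MaterializeState::Creating; });
    return work.state;
}
```

Because the state is written before the notification and the waiter re-checks the predicate under the same mutex, a waiter that arrives after the notification still observes the final state instead of sleeping forever.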

Sequence Diagram(s)

sequenceDiagram
  participant PipelineBuilder as StorageDisaggregated
  participant ColumnarReadSourceOp as ColumnarReadSourceOp (IO)
  participant PrefetchColumnarReaderTask as PrefetchTask (IO pool)
  participant RNColumnarReaderWork as ReaderWork
  participant SharedQueue as SharedQueue
  participant RNColumnarSourceOp as RNColumnarSourceOp (CPU)

  rect rgba(100, 150, 220, 0.5)
    note over PipelineBuilder: Pipeline construction
    PipelineBuilder->>ColumnarReadSourceOp: add N producers
    PipelineBuilder->>SharedQueue: create bounded queue
    PipelineBuilder->>RNColumnarSourceOp: add consumer
  end

  rect rgba(220, 140, 60, 0.5)
    note over ColumnarReadSourceOp: executeIOImpl — reader materialization
    ColumnarReadSourceOp->>PrefetchColumnarReaderTask: submit to IO pool (startAsyncMaterializeReader)
    PrefetchColumnarReaderTask->>RNColumnarReaderWork: createColumnarReaderWithBackoff → state=Ready
    PrefetchColumnarReaderTask->>RNColumnarReaderWork: notify_future.notifyAll()
    ColumnarReadSourceOp->>RNColumnarReaderWork: awaitImpl sees Ready, consumeReadyReader
    ColumnarReadSourceOp->>SharedQueue: push blocks via SharedQueueSinkOp
  end

  rect rgba(60, 180, 100, 0.5)
    note over RNColumnarSourceOp: readImpl — queue consumption
    RNColumnarSourceOp->>SharedQueue: tryPop(block)
    alt READY
      SharedQueue-->>RNColumnarSourceOp: HAS_OUTPUT
    else EMPTY
      SharedQueue-->>RNColumnarSourceOp: WAIT_FOR_NOTIFY
    else FINISHED
      SharedQueue-->>RNColumnarSourceOp: HAS_OUTPUT (empty block = EOF)
    end
  end
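
The consumer half of the diagram reduces to a mapping from queue pop results to operator statuses. The sketch below illustrates that mapping under stated assumptions: `PopResult`, `OperatorStatus`, `Block`, `TinyQueue`, and `readOnce` are simplified stand-ins and not TiFlash's SharedQueue or operator API, and the CANCELLED outcome is taken from the walkthrough summary rather than from the diagram.

```cpp
#include <deque>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only.
enum class PopResult { Ok, Empty, Finished, Cancelled };
enum class OperatorStatus { HAS_OUTPUT, WAIT_FOR_NOTIFY, CANCELLED };
using Block = std::string; // an empty Block plays the role of EOF

struct TinyQueue
{
    std::deque<Block> blocks;
    bool finished = false;
    bool cancelled = false;

    // Non-blocking pop that reports why nothing was returned.
    PopResult tryPop(Block & out)
    {
        if (cancelled)
            return PopResult::Cancelled;
        if (!blocks.empty())
        {
            out = std::move(blocks.front());
            blocks.pop_front();
            return PopResult::Ok;
        }
        return finished ? PopResult::Finished : PopResult::Empty;
    }
};

// One consumer step: translate the pop result into an operator status.
OperatorStatus readOnce(TinyQueue & queue, Block & block)
{
    switch (queue.tryPop(block))
    {
    case PopResult::Ok:
        return OperatorStatus::HAS_OUTPUT;
    case PopResult::Empty:
        return OperatorStatus::WAIT_FOR_NOTIFY; // park until a producer pushes
    case PopResult::Finished:
        block.clear(); // an empty block signals end of stream
        return OperatorStatus::HAS_OUTPUT;
    case PopResult::Cancelled:
        return OperatorStatus::CANCELLED;
    }
    return OperatorStatus::CANCELLED; // unreachable; keeps compilers quiet
}
```

The consumer never blocks a CPU thread in this shape: an empty queue yields WAIT_FOR_NOTIFY, and the scheduler is expected to resume the operator once a producer pushes.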

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • JinheLin
  • Lloyd-Pottiger
  • yongman

Poem

🐇 Hop, hop — no more detached threads to chase,
The IO pool now sets the reading pace.
A queue sits snug between producer and consumer's paw,
Backpressure balanced, no lost-wakeup flaw.
WAIT_FOR_NOTIFY hops safely to HAS_OUTPUT's grace —
This rabbit approves the columnar pipeline's new face! 🎉

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is essentially a blank template with no substantive content: issue/problem statement missing, implementation details empty, test approach unspecified, and side effects/documentation unmarked. | Complete all required sections: provide issue number, explain the problem being solved, describe what changed and why, specify test coverage, document side effects and impacts, and provide a meaningful release note. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly describes the main change: implementing a producer-consumer pipeline read model for columnar functionality, which matches the substantial code additions across multiple files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

ti-chi-bot Bot commented Jun 16, 2026

Contributor

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)

1231-1247: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Do not transition to Creating when no pipeline executor is available.

Line 1235 sets reader_work->state = Creating before Line 1246 checks exec_context. In stream mode (exec_context == nullptr), this leaves the work stuck in Creating with no scheduled task, and subsequent getOrCreateReader() can block forever.

Proposed fix
 void RNColumnarReadTask::prefetchReaderWork(const RNColumnarReaderWorkPtr & reader_work)
 {
     RUNTIME_CHECK(reader_work != nullptr);

+    // Stream path has no pipeline scheduler; keep work in NotStarted so inline creation can proceed.
+    if (exec_context == nullptr)
+        return;
+
     {
         auto guard = std::lock_guard(reader_work->mutex);
         if (reader_work->state != RNColumnarReaderMaterializeState::NotStarted)
             return;
         reader_work->state = RNColumnarReaderMaterializeState::Creating;
     }

     const auto region_id = reader_work->plan.region_id;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 1231 - 1247,
The state transition to Creating occurs before checking whether exec_context is
available. In stream mode where exec_context is nullptr, this leaves the work
stuck in Creating state with no scheduled task to complete it. Move the
exec_context nullptr check to occur before the state transition to Creating
(before line 1235 where reader_work->state =
RNColumnarReaderMaterializeState::Creating is set), so that the function returns
early without changing state when there is no pipeline executor available.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Storages/Columnar/ColumnarReadSourceOp.cpp`:
- Around line 176-190: The issue is that both NotStarted and Creating states are
triggering inline materialization, causing concurrent reader creation in
prefetch tasks and source IO paths. Fix this by only materializing when the
state is NotStarted, not when it's already Creating (which indicates
materialization is already in progress). In the switch statement around line 177
in ColumnarReadSourceOp.cpp, remove the Creating case from triggering
should_materialize or only set should_materialize = true for the NotStarted
case. Apply the same logic fix at the sibling location around lines 227-236 to
ensure Creating state is not treated as a trigger for inline materialization at
any point in the code.

In `@dbms/src/Storages/DeltaMerge/ScanContext.h`:
- Line 51: The dm_io_seek_count atomic counter member variable is defined in the
ScanContext class but is not integrated into the serialization and aggregation
operations, causing loss of I/O seek instrumentation data in distributed
queries. Add handling for dm_io_seek_count in the deserialize() method (around
line 180) to read the value from the tipb::TiFlashScanContext protobuf message,
in the serialize() method (around line 269) to write the value to the protobuf
message, in the merge(const ScanContext&) overload (around line 359) to
aggregate counters from another ScanContext instance, and in the merge(const
tipb::TiFlashScanContext&) overload (around line 454) to aggregate from a
protobuf message. Additionally, verify that the protobuf definition for
tipb::TiFlashScanContext includes a dm_io_seek_count field; if it does not
exist, update the .proto file to add this field before implementing the C++
changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cdb4af6f-e15d-48cc-8056-f6c6a72161f6

📥 Commits

Reviewing files that changed from the base of the PR and between 044990a and bb2b137.

📒 Files selected for processing (16)
  • .gitignore
  • dbms/CMakeLists.txt
  • dbms/src/Storages/Columnar/ColumnarReadSourceOp.cpp
  • dbms/src/Storages/Columnar/ColumnarReadSourceOp.h
  • dbms/src/Storages/Columnar/ColumnarSourceOp.cpp
  • dbms/src/Storages/Columnar/ColumnarSourceOp.h
  • dbms/src/Storages/Columnar/PrefetchColumnarReaderTask.cpp
  • dbms/src/Storages/Columnar/PrefetchColumnarReaderTask.h
  • dbms/src/Storages/DeltaMerge/File/DMFileReader.cpp
  • dbms/src/Storages/DeltaMerge/ScanContext.cpp
  • dbms/src/Storages/DeltaMerge/ScanContext.h
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp
  • dbms/src/Storages/StorageDisaggregatedColumnar.h
  • dbms/src/Storages/tests/gtest_storage_disaggregated_columnar.cpp
  • docs/design/2026-06-13-columnar-pipeline-producer-consumer-model.md
  • tests/docker/next-gen-utils/Makefile

Comment on lines +176 to +190
        case RNColumnarReaderMaterializeState::NotStarted:
        case RNColumnarReaderMaterializeState::Creating:
            current_reader_work->state = RNColumnarReaderMaterializeState::Creating;
            should_materialize = true;
            break;
        }
    }

    if (taken_reader.has_value())
    {
        consumeReadyReader(std::move(taken_reader.value()));
        return OperatorStatus::IO_IN;
    }
    if (should_materialize)
        return OperatorStatus::IO_IN;

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid double materialization when a work item is already Creating.

At lines 177 and 227, Creating is handled as “materialize inline”, so a prefetched work item can be created twice concurrently (once by the prefetch task and once by the source IO path). This duplicates remote reader creation and can discard the losing reader's result.

Suggested direction
// In awaitImpl(), NEED_READER branch:
-    case RNColumnarReaderMaterializeState::NotStarted:
-    case RNColumnarReaderMaterializeState::Creating:
-        current_reader_work->state = RNColumnarReaderMaterializeState::Creating;
-        should_materialize = true;
-        break;
+    case RNColumnarReaderMaterializeState::NotStarted:
+        current_reader_work->state = RNColumnarReaderMaterializeState::Creating;
+        should_materialize = true;
+        break;
+    case RNColumnarReaderMaterializeState::Creating:
+        state = ColumnarReadSourceState::WAIT_READER;
+        setNotifyFuture(&current_reader_work->notify_future);
+        return OperatorStatus::WAIT_FOR_NOTIFY;
// In executeIOImpl(), NEED_READER/WAIT_READER branch:
// only inline-create when this operator owns the NotStarted -> Creating transition.

Also applies to: 227-236
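
The suggested direction hinges on one rule: whichever caller moves a work item from NotStarted to Creating owns materialization, and every other caller waits. A minimal sketch of that ownership check is shown here. `ReaderWork`, `MaterializeState`, `Action`, and `decide` are hypothetical names chosen for illustration and are not code from the PR.

```cpp
#include <mutex>

// Hypothetical stand-ins; not the types from the PR.
enum class MaterializeState { NotStarted, Creating, Ready, Failed };
enum class Action { MaterializeInline, WaitForNotify, ConsumeReader, ReportError };

struct ReaderWork
{
    std::mutex mutex;
    MaterializeState state = MaterializeState::NotStarted;
};

// Decide what a caller should do with the work item. Only the caller that
// performs the NotStarted -> Creating transition is told to materialize.
Action decide(ReaderWork & work)
{
    std::lock_guard<std::mutex> guard(work.mutex);
    switch (work.state)
    {
    case MaterializeState::NotStarted:
        work.state = MaterializeState::Creating; // this caller now owns creation
        return Action::MaterializeInline;
    case MaterializeState::Creating:
        return Action::WaitForNotify; // another party is already creating
    case MaterializeState::Ready:
        return Action::ConsumeReader;
    case MaterializeState::Failed:
        return Action::ReportError;
    }
    return Action::ReportError; // unreachable; keeps compilers quiet
}
```

Because the check and the transition happen under one lock, two callers racing on the same work item cannot both receive MaterializeInline.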

std::atomic<uint64_t> dmfile_lm_filter_scanned_rows{0};
std::atomic<uint64_t> dmfile_lm_filter_skipped_rows{0};
std::atomic<uint64_t> total_dmfile_read_time_ns{0};
std::atomic<uint64_t> dm_io_seek_count{0};

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify protobuf contract for dm_io_seek_count
# Look for the .proto definition file and check if dm_io_seek_count field exists

# Search for .proto files that define TiFlashScanContext
find . -name "*.proto" -type f | xargs grep -l "TiFlashScanContext" 2>/dev/null

# Then check if dm_io_seek_count is defined in those files
fd -e proto | xargs grep -n "dm_io_seek_count" 2>/dev/null

Repository: pingcap/tiflash

Length of output: 43


🏁 Script executed:

#!/bin/bash
# First, locate the ScanContext files
find . -name "ScanContext.*" -type f

Repository: pingcap/tiflash

Length of output: 151


🏁 Script executed:

#!/bin/bash
# Find .proto files that might define TiFlashScanContext
find . -name "*.proto" -type f | head -20

Repository: pingcap/tiflash

Length of output: 1045


🏁 Script executed:

#!/bin/bash
# Search for TiFlashScanContext in the entire codebase
rg "TiFlashScanContext" --type cpp --type h -l

Repository: pingcap/tiflash

Length of output: 147


🏁 Script executed:

#!/bin/bash
# Search for TiFlashScanContext in .proto files
rg "TiFlashScanContext" --type proto

Repository: pingcap/tiflash

Length of output: 89


🏁 Script executed:

#!/bin/bash
# Read the ScanContext.h file to check line 51 and the serialize/deserialize/merge methods
wc -l ./dbms/src/Storages/DeltaMerge/ScanContext.h

Repository: pingcap/tiflash

Length of output: 108


🏁 Script executed:

#!/bin/bash
# Look for proto files that might contain ScanContext
find . -name "*.proto" -type f -exec grep -l "ScanContext\|scan_context" {} \;

Repository: pingcap/tiflash

Length of output: 41


🏁 Script executed:

#!/bin/bash
# Read the ScanContext.h file - focus on the key sections
cat -n ./dbms/src/Storages/DeltaMerge/ScanContext.h | head -100

Repository: pingcap/tiflash

Length of output: 4890


🏁 Script executed:

#!/bin/bash
# Check around line 51 where the new member is added
cat -n ./dbms/src/Storages/DeltaMerge/ScanContext.h | sed -n '40,70p'

Repository: pingcap/tiflash

Length of output: 1742


🏁 Script executed:

#!/bin/bash
# Check the serialize/deserialize methods around the mentioned lines
cat -n ./dbms/src/Storages/DeltaMerge/ScanContext.h | sed -n '170,190p'

Repository: pingcap/tiflash

Length of output: 1404


🏁 Script executed:

#!/bin/bash
# Check the serialize method around line 269
cat -n ./dbms/src/Storages/DeltaMerge/ScanContext.h | sed -n '260,290p'

Repository: pingcap/tiflash

Length of output: 2579


🏁 Script executed:

#!/bin/bash
# Check the merge methods around lines 359 and 454
cat -n ./dbms/src/Storages/DeltaMerge/ScanContext.h | sed -n '350,370p'

Repository: pingcap/tiflash

Length of output: 1460


🏁 Script executed:

#!/bin/bash
# Check the second merge method around line 454
cat -n ./dbms/src/Storages/DeltaMerge/ScanContext.h | sed -n '445,465p'

Repository: pingcap/tiflash

Length of output: 1547


🏁 Script executed:

#!/bin/bash
# Search for protobuf definitions in tipb directory
find . -name "*.proto" -path "*/tipb/*" -type f

Repository: pingcap/tiflash

Length of output: 41


🏁 Script executed:

#!/bin/bash
# Search for tipb proto files more broadly
find . -name "executor.proto" -o -name "tipb.proto" -o -name "*tipb*.proto" 2>/dev/null | head -20

Repository: pingcap/tiflash

Length of output: 41


🏁 Script executed:

#!/bin/bash
# Check if tipb is an external dependency - search for it in includes
grep -r "tipb/executor" --include="*.h" --include="*.hpp" | head -5

Repository: pingcap/tiflash

Length of output: 442


🏁 Script executed:

#!/bin/bash
# Check the entire deserialize method to see all fields being deserialized
cat -n ./dbms/src/Storages/DeltaMerge/ScanContext.h | sed -n '180,268p'

Repository: pingcap/tiflash

Length of output: 7267


🏁 Script executed:

#!/bin/bash
# Check the entire serialize method to see all fields being serialized
cat -n ./dbms/src/Storages/DeltaMerge/ScanContext.h | sed -n '269,357p'

Repository: pingcap/tiflash

Length of output: 7396


🏁 Script executed:

#!/bin/bash
# Get the complete merge(const ScanContext&) method
cat -n ./dbms/src/Storages/DeltaMerge/ScanContext.h | sed -n '359,453p'

Repository: pingcap/tiflash

Length of output: 5988


🏁 Script executed:

#!/bin/bash
# Get the complete merge(const tipb::TiFlashScanContext&) method
cat -n ./dbms/src/Storages/DeltaMerge/ScanContext.h | sed -n '454,584p'

Repository: pingcap/tiflash

Length of output: 8409


dm_io_seek_count member not wired into serialize/deserialize/merge operations.

The new dm_io_seek_count counter added to the header is missing from deserialize(), serialize(), and both merge() methods. This prevents the counter from being transmitted between instances, aggregated in distributed queries, or restored from remote snapshots, resulting in loss of I/O seek instrumentation data.

Add dm_io_seek_count handling to:

  1. deserialize() (line 180): deserialize from tipb::TiFlashScanContext
  2. serialize() (line 269): serialize to tipb::TiFlashScanContext
  3. merge(const ScanContext&) (line 359): aggregate the counter
  4. merge(const tipb::TiFlashScanContext&) (line 454): aggregate from proto

Also verify that the protobuf definition (tipb::TiFlashScanContext) includes a dm_io_seek_count field. If not, the .proto file must be updated.
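
The four touch points listed above can be sketched in reduced form. This is an illustration under a loud assumption: `WireScanContext` is a hypothetical plain struct standing in for the generated tipb::TiFlashScanContext message, whose real fields and accessors are not shown in this thread, and `MiniScanContext` is a one-counter reduction of ScanContext.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical plain struct standing in for the protobuf message.
struct WireScanContext
{
    uint64_t dm_io_seek_count = 0;
};

// One-counter reduction of ScanContext, for illustration only.
struct MiniScanContext
{
    std::atomic<uint64_t> dm_io_seek_count{0};

    // 1. Restore the counter from the wire form.
    void deserialize(const WireScanContext & pb) { dm_io_seek_count = pb.dm_io_seek_count; }

    // 2. Copy the counter into the wire form.
    WireScanContext serialize() const
    {
        WireScanContext pb;
        pb.dm_io_seek_count = dm_io_seek_count.load();
        return pb;
    }

    // 3. Aggregate from another in-memory context.
    void merge(const MiniScanContext & other) { dm_io_seek_count += other.dm_io_seek_count.load(); }

    // 4. Aggregate from a wire-form context.
    void merge(const WireScanContext & pb) { dm_io_seek_count += pb.dm_io_seek_count; }
};
```

If any one of the four is missing, the counter is still correct locally but is silently dropped or reset once the context crosses a node boundary or is merged.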

ti-chi-bot Bot commented Jun 16, 2026

Contributor

@JaySon-Huang: The following tests failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-unit-next-gen | bb2b137 | link | true | /test pull-unit-next-gen |
| pull-integration-next-gen-columnar | bb2b137 | link | true | /test pull-integration-next-gen-columnar |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

1 participant