[feature](tso) Add global monotonically increasing Timestamp Oracle (TSO) #61199

Merged: morningman merged 35 commits into apache:master from AntiTopQuark:support_tso on Apr 7, 2026

Conversation

@AntiTopQuark
Contributor

@AntiTopQuark AntiTopQuark commented Mar 11, 2026

What problem does this PR solve?

Issue Number: close #61198

Related #57921

Problem Summary:

Release note

  • Implement a global monotonically increasing Timestamp Oracle (TSO) service that generates unique, monotonically increasing timestamps for transactions.
    The service calibrates its initial timestamp at startup and periodically updates it to maintain a time window.
    A TSO timestamp encodes the physical time and a logical counter; it is assembled and extracted by the new TSOTimestamp class.
  • Introduce TSOService, a master-only daemon that manages global timestamps.
    The service exposes two main methods:
    • getTSO() – returns a new TSO timestamp for transaction commits.
    • getCurrentTSO() – returns the current TSO without bumping the logical counter.
  • Add multiple configuration properties to control the behavior of the TSO feature:
    • experimental_enable_feature_tso – enables/disables the TSO feature.
    • tso_service_update_interval_ms – interval in milliseconds for the TSO service to update its window.
    • max_update_tso_retry_count and max_get_tso_retry_count – retry limits for updating and obtaining TSOs.
    • tso_service_window_duration_ms – length of the time window allocated by the TSO service.
    • tso_time_offset_debug_mode – debug offset for the physical time.
    • enable_tso_persist_journal and enable_tso_checkpoint_module – persistence switches for edit log and checkpoint.
  • Table property: Introduce enable_tso which can be configured in CREATE TABLE or modified via ALTER TABLE. Only tables with enable_tso = true generate commit TSO for transactions; when disabled, commit_tso remains -1.
  • Transaction and commit integration:
    • During commit, TransactionState now fetches a commit TSO from TSOService when TSO is enabled and stores it in the transaction state and TableCommitInfo.
    • The commit TSO is recorded per partition (via TPartitionVersionInfo.commit_tso), and is persisted with each rowset (see next item).
  • Rowset and meta changes:
    • Rowset::make_visible now accepts a commit_tso parameter and writes it to RowsetMeta.
    • RowsetMetaPB adds a new field commit_tso to persist commit timestamps.
    • information_schema.rowsets introduces a new column COMMIT_TSO allowing users to query the commit timestamp for each rowset.
    • Pending publish tasks, asynchronous publish tasks and other internal structures have been extended to carry commit TSO.
  • External interface:
    A new REST endpoint /api/tso is added for retrieving current TSO information. It returns a JSON payload containing:
    • window_end_physical_time – end of the current TSO time window.
    • current_tso – the current composed 64‑bit TSO.
    • current_tso_physical_time and current_tso_logical_counter – the decomposed physical and logical parts of the current TSO. This API does not increment the logical counter.
  • Metrics & observability:
    New metric counters (e.g., tso_clock_drift_detected, tso_clock_backward_detected, tso_clock_calculated, tso_clock_updated) expose the state and health of the TSO service.
  • Regression & unit tests:
    New unit tests verify TSOTimestamp bit manipulation, TSOService behavior, commit TSO propagation, and the /api/tso endpoint. Regression tests verify that rowset commit timestamps are populated when TSO is enabled and that the API returns increasing TSOs.
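
To make the physical/logical encoding described above concrete, here is a minimal Java sketch of packing and unpacking a 64-bit TSO. The class name `TsoLayoutSketch`, its method names, and the 18-bit logical width are assumptions for illustration only; the description does not state the bit widths, and the actual `TSOTimestamp` class may differ.

```java
/** Hypothetical sketch of a 64-bit TSO layout; the bit widths are assumed, not taken from the PR. */
final class TsoLayoutSketch {
    // Assumption: the low 18 bits hold the logical counter, the high bits hold physical milliseconds.
    static final int LOGICAL_BITS = 18;
    static final long MAX_LOGICAL_COUNTER = (1L << LOGICAL_BITS) - 1;
    // Largest physical value that still fits in a non-negative long after shifting.
    static final long MAX_PHYSICAL_MS = Long.MAX_VALUE >>> LOGICAL_BITS;

    /** Packs a physical time and a logical counter into one long, rejecting out-of-range parts. */
    static long compose(long physicalMs, long logical) {
        if (physicalMs < 0 || physicalMs > MAX_PHYSICAL_MS) {
            throw new IllegalArgumentException("physical time out of range: " + physicalMs);
        }
        if (logical < 0 || logical > MAX_LOGICAL_COUNTER) {
            throw new IllegalArgumentException("logical counter out of range: " + logical);
        }
        return (physicalMs << LOGICAL_BITS) | logical;
    }

    /** Extracts the physical milliseconds from a composed TSO. */
    static long physicalOf(long tso) {
        return tso >>> LOGICAL_BITS;
    }

    /** Extracts the logical counter from a composed TSO. */
    static long logicalOf(long tso) {
        return tso & MAX_LOGICAL_COUNTER;
    }

    public static void main(String[] args) {
        long tso = compose(1_700_000_000_000L, 42L);
        System.out.println("tso=" + tso + " physical=" + physicalOf(tso) + " logical=" + logicalOf(tso));
    }
}
```

Because the physical part occupies the high bits in this layout, comparing two composed values as plain longs orders them by physical time first and by logical counter second.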

Impact and Compatibility

  • Experimental: the TSO feature is guarded by experimental_enable_feature_tso. It is disabled by default and can be enabled in the FE configuration. Once it is enabled, FE versions without this feature cannot replay edit log entries containing TSO operations, so upgrade all FEs before enabling it.
  • Table compatibility: tables created before enabling TSO remain unaffected unless explicitly modified to set enable_tso to true. Tables with TSO enabled will produce commit TSO for each rowset and may require downstream consumers to handle the new commit_tso field.
  • Client API: clients can call /api/tso to inspect current TSO values. No existing API is modified.
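
As an illustration of what "handling the new commit_tso field" might look like for a downstream consumer, the following Java sketch selects rowsets committed after a watermark and skips rowsets that carry no commit TSO. Everything here (`CommitTsoConsumerSketch`, `RowsetInfo`, `newerThan`) is hypothetical and not part of this PR; only the `-1` value for tables without TSO comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical downstream helper; none of these names come from the PR. */
final class CommitTsoConsumerSketch {
    // The PR description states that commit_tso remains -1 when TSO is disabled for a table.
    static final long NO_COMMIT_TSO = -1L;

    /** Assumed in-memory view of one rowset: its id plus its COMMIT_TSO value. */
    static final class RowsetInfo {
        final String rowsetId;
        final long commitTso;

        RowsetInfo(String rowsetId, long commitTso) {
            this.rowsetId = rowsetId;
            this.commitTso = commitTso;
        }
    }

    /** Returns rowsets committed strictly after the watermark, skipping those without a commit TSO. */
    static List<RowsetInfo> newerThan(List<RowsetInfo> rowsets, long watermark) {
        List<RowsetInfo> result = new ArrayList<>();
        for (RowsetInfo r : rowsets) {
            if (r.commitTso == NO_COMMIT_TSO) {
                continue; // no TSO recorded; matters when the initial watermark is negative
            }
            if (r.commitTso > watermark) {
                result.add(r);
            }
        }
        return result;
    }
}
```

A consumer would typically persist the largest `commitTso` it has processed and pass it as the next watermark.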

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@AntiTopQuark
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.45% (1798/2263)
Line Coverage 64.65% (32249/49881)
Region Coverage 65.60% (16145/24611)
Branch Coverage 56.06% (8611/15360)

@AntiTopQuark
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.31% (1798/2267)
Line Coverage 64.65% (32290/49944)
Region Coverage 65.57% (16170/24659)
Branch Coverage 55.96% (8611/15388)

@AntiTopQuark
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.31% (1798/2267)
Line Coverage 64.63% (32279/49944)
Region Coverage 65.55% (16165/24659)
Branch Coverage 55.95% (8610/15388)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 44.34% (188/424) 🎉
Increment coverage report
Complete coverage report

@AntiTopQuark
Contributor Author

run buildall

…TSO)

Signed-off-by: Jingzhe Jia <AntiTopQuark1350@outlook.com>
@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.31% (1798/2267)
Line Coverage 64.66% (32294/49944)
Region Coverage 65.59% (16173/24659)
Branch Coverage 55.97% (8612/15388)

@AntiTopQuark
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 44.34% (188/424) 🎉
Increment coverage report
Complete coverage report

Signed-off-by: Jingzhe Jia <AntiTopQuark1350@outlook.com>
@AntiTopQuark
Contributor Author

run buildall

@AntiTopQuark
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.24% (1798/2269)
Line Coverage 64.53% (32281/50023)
Region Coverage 65.45% (16165/24699)
Branch Coverage 55.85% (8609/15414)

morningman previously approved these changes Apr 1, 2026
github-actions bot added the approved label Apr 1, 2026
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

PR approved by at least one committer and no changes requested.

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.53% (27200/36991)
Line Coverage 57.08% (292803/512955)
Region Coverage 54.17% (243064/448667)
Branch Coverage 55.99% (105727/188816)

Contributor

@yujun777 yujun777 left a comment

LGTM

gavinchou previously approved these changes Apr 1, 2026
@dataroaring
Contributor

Test Coverage Analysis

The core TSO logic (TSOTimestamp + TSOService) has solid unit test coverage. However, the integration seams — where TSO meets the transaction commit path, DDL operations, BE publish pipeline, and edit log replay — are under-tested.

What's Well Covered

  • TSOTimestamp — 11 test methods: constructor, compose/extract, bit-width masking, negative validation, serialization round-trip, compareTo, MAX_LOGICAL_COUNTER.
  • TSOService — ~20 tests: getTSO error paths, runAfterCatalogReady, calibrateTimestamp (persist failure, clock backward, journal disabled, fatal flag reset), updateTimestamp, generateTSO, writeTimestampToBDBJE, save/load round-trip.
  • Commit TSO in transactions — TSO set when enableTso=true, remains -1 when enableTso=false.
  • Regression tests — REST API validation (monotonicity, composition) and end-to-end COMMIT_TSO in information_schema.rowsets.

Gaps

High Priority:

  1. No concurrent getTSO() test — TSOService uses ReentrantLock + AtomicLong, but no test verifies correctness under concurrent access. For a globally-monotonic TSO, this is critical.

  2. getCommitTSO() error paths untested: DatabaseTransactionMgr.getCommitTSO() has 5 branches but only 2 are tested. Missing:

    • Config.enable_tso_feature = false (global disable)
    • TSOService throws exception during commit → TransactionCommitFailedException
    • TSOService returns <= 0
    • Mixed tables (some TSO-enabled, some not)
  3. No ALTER TABLE ... SET ("enable_tso" = "true/false") test: ModifyTablePropertiesOp (+16 lines) and PropertyAnalyzer (+23 lines) have zero test coverage. Regression tests only cover CREATE TABLE.

Medium Priority:

  1. TSOAction.java has no unit test — Error paths (FE not ready, non-master FE, exception) untested. Only happy path via regression.

  2. BE C++ plumbing largely untested — Changes span 10+ files propagating commit_tso. Specific gaps:

    • pb_convert.cpp — 4 new has_commit_tso() blocks, no test
    • schema_rowsets_scanner.cpp — COMMIT_TSO column fill, no BE unit test
    • DiscontinuousVersionTablet struct, no test
    • engine_publish_version_task.cpp — significant refactoring, no direct test
  3. Edit log replay path: replayWindowEndTSO is tested, but the full path (deserialize from journal → dispatch via OP_TSO_TIMESTAMP_WINDOW_END → replay) is not.

  4. Metrics — 4 new counters (tso_clock_drift_detected, tso_clock_backward_detected, tso_clock_calculated, tso_clock_updated) never asserted in any test.

Low Priority:

  1. No regression test for TSO-disabled tables (verifying commit_tso stays -1).
  2. SchemaChangeHandler / CloudSchemaChangeHandler enable_tso validation — no test.
  3. No integration test for TSO state surviving FE restart/checkpoint recovery.
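
The missing concurrent getTSO() check listed under High Priority could take roughly the following shape. This is only a sketch under stated assumptions: `ToyOracle` is a hypothetical stand-in that advances its own physical counter under a `ReentrantLock` instead of reading a clock or persisting a window, and the 18-bit logical width is assumed. It is not the PR's `TSOService`, and a real test would exercise that class instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

final class ConcurrentTsoCheckSketch {
    /** Hypothetical stand-in for the oracle; not the PR's TSOService. */
    static final class ToyOracle {
        private static final int LOGICAL_BITS = 18; // assumed width
        private static final long MAX_LOGICAL = (1L << LOGICAL_BITS) - 1;
        private final ReentrantLock lock = new ReentrantLock();
        private long physicalMs = 1L;
        private long logical = 0L;

        long getTSO() {
            lock.lock();
            try {
                if (logical == MAX_LOGICAL) {
                    physicalMs++; // logical counter exhausted: move to the next physical tick
                    logical = 0L;
                } else {
                    logical++;
                }
                return (physicalMs << LOGICAL_BITS) | logical;
            } finally {
                lock.unlock();
            }
        }
    }

    /** Returns true if every timestamp is unique and each thread observed strictly increasing values. */
    static boolean runCheck(int threads, int perThread) {
        ToyOracle oracle = new ToyOracle();
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    long prev = -1L;
                    for (int i = 0; i < perThread; i++) {
                        long tso = oracle.getTSO();
                        if (tso <= prev || !seen.add(tso)) {
                            return false; // went backwards or duplicated a value
                        }
                        prev = tso;
                    }
                    return true;
                }));
            }
            for (Future<Boolean> f : results) {
                if (!f.get()) {
                    return false;
                }
            }
            return seen.size() == threads * perThread;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Running it with enough iterations to exceed the assumed logical range (for example 8 threads with 50,000 calls each) also exercises the rollover branch.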

…riptions to ensure clarity of the descriptive information.
@AntiTopQuark
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.48% (1798/2291)
Line Coverage 64.15% (32276/50310)
Region Coverage 65.03% (16184/24886)
Branch Coverage 55.48% (8623/15542)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 58.53% (271/463) 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-H: Total hot run time: 29077 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 754394412cdfae02e360cb91b2caa7f2553606f0, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17762	3667	3709	3667
q2	
q3	10687	884	616	616
q4	4678	472	371	371
q5	7464	1342	1133	1133
q6	188	174	140	140
q7	939	949	766	766
q8	9309	1477	1317	1317
q9	5594	5305	5249	5249
q10	6251	2055	1761	1761
q11	477	278	279	278
q12	629	418	285	285
q13	18053	2793	2175	2175
q14	282	284	259	259
q15	
q16	861	833	787	787
q17	1031	1204	788	788
q18	6531	5616	5617	5616
q19	1194	1266	1088	1088
q20	563	428	294	294
q21	4255	2540	2121	2121
q22	539	407	366	366
Total cold run time: 97287 ms
Total hot run time: 29077 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4870	4521	4663	4521
q2	
q3	4603	4766	4185	4185
q4	2110	2211	1466	1466
q5	4912	4955	5187	4955
q6	212	178	138	138
q7	2009	1763	1606	1606
q8	3299	3047	3037	3037
q9	8621	8561	8283	8283
q10	4463	4465	4264	4264
q11	598	406	399	399
q12	668	712	497	497
q13	2695	3047	2374	2374
q14	291	290	277	277
q15	
q16	767	813	673	673
q17	1362	1307	1215	1215
q18	7895	7107	6914	6914
q19	1260	1167	1132	1132
q20	2244	2246	1988	1988
q21	6122	5676	4820	4820
q22	560	507	428	428
Total cold run time: 59561 ms
Total hot run time: 53172 ms

@doris-robot

TPC-DS: Total hot run time: 179179 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 754394412cdfae02e360cb91b2caa7f2553606f0, data reload: false

query5	4346	671	543	543
query6	344	243	211	211
query7	4207	557	349	349
query8	336	247	236	236
query9	8790	3952	3952	3952
query10	458	364	324	324
query11	6765	5487	5125	5125
query12	192	132	126	126
query13	1296	617	448	448
query14	5739	5230	4884	4884
query14_1	4201	4159	4189	4159
query15	227	206	184	184
query16	1046	489	457	457
query17	1151	796	642	642
query18	2438	491	369	369
query19	224	207	162	162
query20	138	134	127	127
query21	224	142	122	122
query22	14008	14665	14550	14550
query23	18296	17307	16711	16711
query23_1	16781	16764	16847	16764
query24	7453	1738	1355	1355
query24_1	1361	1377	1361	1361
query25	573	503	431	431
query26	1262	328	180	180
query27	2693	620	380	380
query28	4470	1918	1909	1909
query29	1016	742	581	581
query30	303	231	197	197
query31	1098	1044	958	958
query32	91	75	73	73
query33	561	372	308	308
query34	1221	1155	671	671
query35	763	795	688	688
query36	1244	1237	1056	1056
query37	159	114	89	89
query38	3088	3047	2958	2958
query39	920	919	855	855
query39_1	826	843	842	842
query40	237	156	139	139
query41	67	64	64	64
query42	114	112	110	110
query43	324	329	289	289
query44	
query45	212	200	189	189
query46	1121	1257	776	776
query47	2364	2314	2234	2234
query48	425	428	300	300
query49	671	560	457	457
query50	777	294	225	225
query51	4359	4305	4245	4245
query52	115	111	101	101
query53	259	284	211	211
query54	340	298	283	283
query55	102	101	90	90
query56	343	335	328	328
query57	1731	1631	1711	1631
query58	317	294	292	292
query59	2915	2956	2744	2744
query60	338	336	334	334
query61	161	147	152	147
query62	707	621	571	571
query63	244	201	194	194
query64	5332	1320	947	947
query65	
query66	1467	471	377	377
query67	24370	24112	24192	24112
query68	
query69	442	346	317	317
query70	1032	951	1015	951
query71	326	286	272	272
query72	3019	2767	2433	2433
query73	778	836	441	441
query74	9878	9708	9661	9661
query75	2763	2614	2311	2311
query76	2293	1154	787	787
query77	416	418	343	343
query78	11293	11407	10759	10759
query79	1523	1072	866	866
query80	842	568	522	522
query81	468	279	233	233
query82	1354	165	121	121
query83	365	296	285	285
query84	264	149	124	124
query85	910	493	454	454
query86	439	318	337	318
query87	3301	3214	3129	3129
query88	3641	2729	2721	2721
query89	447	387	351	351
query90	1889	179	180	179
query91	183	170	144	144
query92	80	74	69	69
query93	979	978	573	573
query94	608	337	270	270
query95	644	369	444	369
query96	994	794	312	312
query97	2692	2665	2592	2592
query98	246	227	228	227
query99	1075	1070	976	976
Total cold run time: 257995 ms
Total hot run time: 179179 ms

Contributor

@dataroaring dataroaring left a comment

LGTM

@doris-robot

BE UT Coverage Report

Increment line coverage 37.23% (35/94) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.96% (20072/37898)
Line Coverage 36.55% (188590/515976)
Region Coverage 32.82% (146519/446423)
Branch Coverage 33.97% (64145/188853)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 69.40% (93/134) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.61% (27323/37117)
Line Coverage 57.19% (294200/514411)
Region Coverage 54.31% (244713/450570)
Branch Coverage 56.10% (106271/189445)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 77.94% (438/562) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.62% (27327/37117)
Line Coverage 57.20% (294249/514411)
Region Coverage 54.32% (244765/450570)
Branch Coverage 56.11% (106301/189445)

@morningman morningman merged commit 03290d2 into apache:master Apr 7, 2026
34 of 36 checks passed
iaorekhov-1980 pushed a commit to iaorekhov-1980/doris that referenced this pull request Apr 7, 2026
…SO) (apache#61199)

Signed-off-by: Jingzhe Jia <AntiTopQuark1350@outlook.com>
Labels

approved, dev/5.0.x, kind/need-document, meta-change, reviewed

Development

Successfully merging this pull request may close these issues.

[Feature] Add a global monotonically increasing timestamp service (TSO) for incremental computation in Doris