
[feat][broker]:Support broker configuration for BookKeeper client TCP keep-alive options#25580

Merged
wolfstudy merged 2 commits into apache:master from wolfstudy:support-broker-config-tcp-keepalive
Apr 27, 2026

Conversation

@wolfstudy
Member

Fixes #xyz

Motivation

Apache BookKeeper PR #4683 introduced TCP
keep-alive configuration options for the BookKeeper client, namely tcpKeepIdle,
tcpKeepIntvl and tcpKeepCnt. This change has been released in BookKeeper 4.17.3.

Now that the BookKeeper client exposes these TCP keep-alive tuning knobs, the Pulsar Broker should also expose matching
configuration entries, so that operators can tune the TCP keep-alive behavior of the broker's
BookKeeper client without having to patch code or rely on OS-wide defaults.

This is particularly useful in the following scenarios:

  • Detecting and recovering from half-open connections (e.g. Bookie host crash, NAT/LB silently
    dropping idle connections) faster than the OS default (which is typically 2 hours on Linux).
  • Environments where network devices aggressively close idle TCP connections, causing the
    broker's BookKeeper client to unexpectedly hit broken connections under low traffic.
  • Fine-grained tuning of keep-alive probes to balance between fast failure detection and
    extra network overhead.

Modifications

Added three new broker configuration entries under ServiceConfiguration, and wired them
through BookKeeperClientFactoryImpl into the underlying BookKeeper ClientConfiguration:

Broker config                 BookKeeper client setter  Default  Semantics
tcpKeepAliveTimeSeconds       setTcpKeepIdle            0        Idle time before the first keep-alive probe is sent (seconds).
tcpKeepAliveIntervalSeconds   setTcpKeepIntvl           0        Interval between subsequent keep-alive probes (seconds).
tcpKeepAliveProbeCount        setTcpKeepCnt             0        Number of unacknowledged probes before the connection is considered dead.

Behavior details:

  • The default value 0 means "do not override" — the broker will fall back to the BookKeeper
    client default (-1), which in turn falls back to the OS-level TCP keep-alive settings.
  • A value is only applied to the BookKeeper ClientConfiguration when it is strictly greater
    than 0 (as sketched below), so existing deployments are fully backward compatible.
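
A minimal sketch of that "apply only when strictly greater than 0" wiring (the ServiceConfiguration getters are hypothetical names inferred from the table above; the ClientConfiguration setters are the real ones added in BookKeeper 4.17.3):

import org.apache.bookkeeper.conf.ClientConfiguration;
import org.apache.pulsar.broker.ServiceConfiguration;

class TcpKeepAliveWiringSketch {
    // Forward broker-level settings into the BookKeeper client configuration,
    // leaving the BookKeeper defaults (-1, i.e. fall back to OS settings)
    // untouched whenever a value is 0 or negative.
    static void setTcpKeepAliveOptions(ServiceConfiguration conf, ClientConfiguration bkConf) {
        if (conf.getTcpKeepAliveTimeSeconds() > 0) {      // hypothetical getter
            bkConf.setTcpKeepIdle(conf.getTcpKeepAliveTimeSeconds());
        }
        if (conf.getTcpKeepAliveIntervalSeconds() > 0) {  // hypothetical getter
            bkConf.setTcpKeepIntvl(conf.getTcpKeepAliveIntervalSeconds());
        }
        if (conf.getTcpKeepAliveProbeCount() > 0) {       // hypothetical getter
            bkConf.setTcpKeepCnt(conf.getTcpKeepAliveProbeCount());
        }
    }
}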

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Added BookKeeperClientFactoryImplTest#testSetTcpKeepAliveConfiguration (a sketch follows
    this list), which asserts:
    • When none of the new configs are set, the resulting BookKeeper ClientConfiguration
      returns the BookKeeper defaults (getTcpKeepIdle() == -1, getTcpKeepIntvl() == -1,
      getTcpKeepCnt() == -1).
    • When the new configs are set (e.g. 60 / 10 / 5), the corresponding values are correctly
      forwarded to ClientConfiguration (getTcpKeepIdle() == 60, getTcpKeepIntvl() == 10,
      getTcpKeepCnt() == 5).
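
A minimal TestNG sketch mirroring those assertions; it reuses the hypothetical setTcpKeepAliveOptions helper from the sketch above, and the ServiceConfiguration setters are likewise assumed names (the real test lives in BookKeeperClientFactoryImplTest):

import static org.testng.Assert.assertEquals;

import org.apache.bookkeeper.conf.ClientConfiguration;
import org.apache.pulsar.broker.ServiceConfiguration;
import org.testng.annotations.Test;

public class TcpKeepAliveConfigSketchTest {

    @Test
    public void testDefaultsLeaveBookKeeperValuesUntouched() {
        ClientConfiguration bkConf = new ClientConfiguration();
        TcpKeepAliveWiringSketch.setTcpKeepAliveOptions(new ServiceConfiguration(), bkConf);
        // With nothing set, the BookKeeper defaults (-1) fall through to the OS.
        assertEquals(bkConf.getTcpKeepIdle(), -1);
        assertEquals(bkConf.getTcpKeepIntvl(), -1);
        assertEquals(bkConf.getTcpKeepCnt(), -1);
    }

    @Test
    public void testExplicitValuesAreForwarded() {
        ServiceConfiguration conf = new ServiceConfiguration();
        conf.setTcpKeepAliveTimeSeconds(60);     // hypothetical setters,
        conf.setTcpKeepAliveIntervalSeconds(10); // mirroring the config table
        conf.setTcpKeepAliveProbeCount(5);
        ClientConfiguration bkConf = new ClientConfiguration();
        TcpKeepAliveWiringSketch.setTcpKeepAliveOptions(conf, bkConf);
        assertEquals(bkConf.getTcpKeepIdle(), 60);
        assertEquals(bkConf.getTcpKeepIntvl(), 10);
        assertEquals(bkConf.getTcpKeepCnt(), 5);
    }
}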

Does this pull request potentially affect one of the following parts:

  • The public API — added three new broker configuration entries
    (tcpKeepAliveTimeSeconds, tcpKeepAliveIntervalSeconds, tcpKeepAliveProbeCount).
    These are additive and default to 0 (no behavior change unless explicitly configured),
    so existing deployments are not affected.

Documentation

  • doc-not-needed
    The new fields carry inline @FieldContext documentation that is automatically surfaced
    in the generated broker reference documentation.

Signed-off-by: xiaolongran <xiaolongran@tencent.com>
@wolfstudy wolfstudy self-assigned this Apr 27, 2026
@wolfstudy wolfstudy requested a review from liudezhi2098 April 27, 2026 07:36
@wolfstudy wolfstudy changed the title [improve][broker] Support broker configuration for BookKeeper client TCP keep-alive options feat(broker):Support broker configuration for BookKeeper client TCP keep-alive options Apr 27, 2026
@wolfstudy wolfstudy changed the title feat(broker):Support broker configuration for BookKeeper client TCP keep-alive options [feat][broker]:Support broker configuration for BookKeeper client TCP keep-alive options Apr 27, 2026
Contributor

@hanmz hanmz left a comment


lgtm

Member

@lhotari lhotari left a comment


The settings in ServiceConfiguration are specific to Pulsar.
There's already a solution for configuring BookKeeper client options: prefix them with bookkeeper_ in broker.conf. For the Pulsar Helm chart, that would be PULSAR_PREFIX_bookkeeper_.

pulsar/conf/broker.conf

Lines 1253 to 1257 in 33fe755

# You can add other configuration options for the BookKeeper client
# by prefixing them with "bookkeeper_". These configurations are applied
# to all bookkeeper clients started by the broker (including the managed ledger bookkeeper clients as well as
# the BookkeeperPackagesStorage bookkeeper client), except the distributed log bookkeeper client.
# The dlog bookkeeper client is configured in the functions worker configuration file.

For example, this would be the way to configure BookKeeper client TCP keep-alive options for the Pulsar broker:

"PULSAR_PREFIX_bookkeeper_tcpKeepIdle": "300"
"PULSAR_PREFIX_bookkeeper_tcpKeepIntvl": "60"
"PULSAR_PREFIX_bookkeeper_tcpKeepCnt": "5"

It could be reasonable to set these as defaults (or commented out lines) in broker.conf (without PULSAR_PREFIX_) and document the settings. Could you please modify this PR to handle it this way?
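
For instance, the commented-out defaults in broker.conf could look like this (illustrative values matching the GCP OS defaults cited below, not a verbatim excerpt of the merged diff):

# TCP keep-alive tuning for the BookKeeper clients started by the broker.
# Uncomment to override the BookKeeper client defaults (-1 = use OS settings).
# bookkeeper_tcpKeepIdle=300
# bookkeeper_tcpKeepIntvl=60
# bookkeeper_tcpKeepCnt=5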

@lhotari
Member

lhotari commented Apr 27, 2026

Sidenote: there's been a proposal to add keepalive to Pulsar components, but that was rejected in #14841. For broker connections, we rely on Pulsar's application level keep-alive Ping/Pong commands. The gap in Ping/Pong was addressed in #15382.

@lhotari
Member

lhotari commented Apr 27, 2026

Now that the BookKeeper client exposes these TCP keep-alive tuning knobs, the Pulsar Broker should also expose matching
configuration entries, so that operators can tune the TCP keep-alive behavior of the broker's
BookKeeper client without having to patch code or rely on OS-wide defaults.

To solve the issues where connections stall, it's usually necessary to configure TCP keep-alive on both sides: the client and the server. Keep-alive is enabled by default for BookKeeper client and server, but they do rely on OS defaults.

It will be necessary to tune the OS defaults anyway, since the BookKeeper server doesn't have a way to configure the keep-alive parameters. On cloud-managed k8s nodes the settings have reasonable default values, at least on GCP.

# sysctl -a |grep keepalive
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_time = 300
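
On hosts that still carry the 2-hour default, one way to apply and persist GCP-like values is a sysctl drop-in; a sketch assuming root access on a plain Linux node:

# /etc/sysctl.d/99-tcp-keepalive.conf
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 5

# Load the drop-in without a reboot:
# sysctl --system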

@wolfstudy
Member Author

Sidenote: there's been a proposal to add keepalive to Pulsar components, but that was rejected in #14841. For broker connections, we rely on Pulsar's application level keep-alive Ping/Pong commands. The gap in Ping/Pong was addressed in #15382.

Thanks for the reply, @lhotari. The two PRs mentioned here address a different issue than the one this PR aims to resolve.
The primary objective of the current PR is to resolve connection interruptions that occur in physical-machine deployment environments. These interruptions stem from the fact that BookKeeper and broker instances are deployed on separate physical nodes and, due to intervening firewall restrictions, rely on the operating system's default keep-alive settings, which have an excessively long timeout.

@lhotari
Member

lhotari commented Apr 27, 2026

Sidenote: there's been a proposal to add keepalive to Pulsar components, but that was rejected in #14841. For broker connections, we rely on Pulsar's application level keep-alive Ping/Pong commands. The gap in Ping/Pong was addressed in #15382.

Thanks for the reply, @lhotari. The two PRs mentioned here address a different issue than the one this PR aims to resolve. The primary objective of the current PR is to resolve connection interruptions that occur in physical-machine deployment environments. These interruptions stem from the fact that BookKeeper and broker instances are deployed on separate physical nodes and, due to intervening firewall restrictions, rely on the operating system's default keep-alive settings, which have an excessively long timeout.

Yes, that was a sidenote. Please check the example I provided: it's already possible to tune keep-alive for the BookKeeper client. In addition, you will need to tune the BookKeeper server separately anyway to complete the solution (the BookKeeper server side can only be tuned with OS-level parameters).

Configuring BookKeeper client options in a Pulsar broker k8s deployment:

"PULSAR_PREFIX_bookkeeper_tcpKeepIdle": "300"
"PULSAR_PREFIX_bookkeeper_tcpKeepIntvl": "60"
"PULSAR_PREFIX_bookkeeper_tcpKeepCnt": "5"

The bookkeeper_ prefix solution was added in #9232 and documented in #15818. Please read my comments in the review.

@wolfstudy
Member Author

Now that the BookKeeper client exposes these TCP keep-alive tuning knobs, the Pulsar Broker should also expose matching
configuration entries, so that operators can tune the TCP keep-alive behavior of the broker's
BookKeeper client without having to patch code or rely on OS-wide defaults.

To solve the issues where connections stall, it's usually necessary to configure TCP keep-alive on both sides: the client and the server. Keep-alive is enabled by default for BookKeeper client and server, but they do rely on OS defaults.

It will be necessary to tune the OS defaults anyway, since the BookKeeper server doesn't have a way to configure the keep-alive parameters. On cloud-managed k8s nodes the settings have reasonable default values, at least on GCP.

# sysctl -a |grep keepalive
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_time = 300

Thanks for the context! A quick clarification on a few points:

  1. BookKeeper server now exposes keep-alive tuning. Since BookKeeper 4.17.3 (via apache/bookkeeper#4683, "Supports configuring TCP Keepalive related parameters in Bookie Client"), both the BK client and the bookie server expose tcpKeepIdle, tcpKeepIntvl, and tcpKeepCnt as configuration options. Pulsar already pulls in BookKeeper 4.17.3, so tuning can be done at the application level on both ends; we no longer have to rely on OS defaults for the bookie server side.
  2. Relying on OS defaults is not always viable. The typical Linux kernel defaults are tcp_keepalive_time=7200s (2h), tcp_keepalive_intvl=75s, tcp_keepalive_probes=9, which means a broken connection can go undetected for ~2h 11min (see the worked calculation after this list). That's too long for a messaging system where broker↔bookie liveness matters. While GCP GKE ships with saner defaults (time=300, intvl=60, probes=5), this is not guaranteed across other environments:
  • EKS / AKS / on-prem clusters often keep the 7200s default.
  • Tuning net.ipv4.tcp_keepalive_* via sysctl requires privileged pods or node-level DaemonSets, which many operators either cannot or don't want to deploy.
  • OS-level settings are global and affect every TCP socket on the node, whereas the BookKeeper-level settings are scoped to BK connections only.
  3. What this PR actually does. It doesn't add new Pulsar-specific config; it just documents how to forward these three BK client options through the existing bookkeeper_ prefix mechanism in broker.conf, so operators have a discoverable, Pulsar-idiomatic way to tune broker↔bookie keep-alive without needing node-level privileges or changing OS defaults. The defaults remain -1 (fall back to OS), so behavior is unchanged unless explicitly opted in.
    So the motivation isn't "OS defaults aren't enough"; it's "give operators a per-application, documented knob that works uniformly across environments, now that BookKeeper 4.17.3 makes this possible."
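
For reference, the ~2h 11min figure in point 2 follows from how TCP keep-alive declares a peer dead: the first probe fires after tcp_keepalive_time, and the connection is torn down after tcp_keepalive_probes unanswered probes spaced tcp_keepalive_intvl apart:

7200 s + 9 × 75 s = 7875 s ≈ 2 h 11 min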

@wolfstudy
Member Author

In addition, you will need to tune the BookKeeper server separately anyway to complete the solution (the BookKeeper server side can only be tuned with OS-level parameters).

Yes, the client side is already fully tunable today via the existing bookkeeper_ prefix mechanism and
the bookie side has to be tuned at the OS level anyway.

The primary goal of this PR is to document precisely the approach you described.

That said, making the SO_KEEPALIVE parameters on the bookie server side configurable as well would arguably be a more complete implementation.

Signed-off-by: xiaolongran <xiaolongran@tencent.com>
Member

@lhotari lhotari left a comment


LGTM

@lhotari
Member

lhotari commented Apr 27, 2026

2. Relying on OS defaults is not always viable. The typical Linux kernel defaults are tcp_keepalive_time=7200s (2h), tcp_keepalive_intvl=75s, tcp_keepalive_probes=9, which means a broken connection can go undetected for ~2h 11min. That's too long for a messaging system where broker↔bookie liveness matters. While GCP GKE ships with saner defaults (time=300, intvl=60, probes=5), this is not guaranteed across other environments:

  • EKS / AKS / on-prem clusters often keep the 7200s default.
  • Tuning net.ipv4.tcp_keepalive_* via sysctl requires privileged pods or node-level DaemonSets, which many operators either cannot or don't want to deploy.
  • OS-level settings are global and affect every TCP socket on the node, whereas the BookKeeper-level settings are scoped to BK connections only.

I think we should update the Pulsar documentation to recommend adjusting the TCP keep-alive OS defaults for Pulsar deployments. There's no reason not to adjust them to the values GCP uses by default (time=300, intvl=60, probes=5). Besides BookKeeper, ZooKeeper needs TCP keep-alive settings too. We do enable TCP keep-alive for ZooKeeper, but there isn't a way to tune the settings. To fully close the gap, the ZooKeeper client and server need to be covered in addition to the BookKeeper client and server.

@wolfstudy
Member Author

  2. Relying on OS defaults is not always viable. The typical Linux kernel defaults are tcp_keepalive_time=7200s (2h), tcp_keepalive_intvl=75s, tcp_keepalive_probes=9, which means a broken connection can go undetected for ~2h 11min. That's too long for a messaging system where broker↔bookie liveness matters. While GCP GKE ships with saner defaults (time=300, intvl=60, probes=5), this is not guaranteed across other environments:
  • EKS / AKS / on-prem clusters often keep the 7200s default.
  • Tuning net.ipv4.tcp_keepalive_* via sysctl requires privileged pods or node-level DaemonSets, which many operators either cannot or don't want to deploy.
  • OS-level settings are global and affect every TCP socket on the node, whereas the BookKeeper-level settings are scoped to BK connections only.

I think we should update the Pulsar documentation to recommend adjusting the TCP keep-alive OS defaults for Pulsar deployments. There's no reason not to adjust them to the values GCP uses by default (time=300, intvl=60, probes=5). Besides BookKeeper, ZooKeeper needs TCP keep-alive settings too. We do enable TCP keep-alive for ZooKeeper, but there isn't a way to tune the settings. To fully close the gap, the ZooKeeper client and server need to be covered in addition to the BookKeeper client and server.

Good ideas. I will continue to push these forward in follow-ups to this PR.

@wolfstudy wolfstudy added area/broker type/feature The PR added a new feature or issue requested a new feature labels Apr 27, 2026
@wolfstudy wolfstudy merged commit ad114ad into apache:master Apr 27, 2026
42 checks passed