[feat][broker]:Support broker configuration for BookKeeper client TCP keep-alive options#25580
Conversation
Signed-off-by: xiaolongran <xiaolongran@tencent.com>
lhotari
left a comment
There was a problem hiding this comment.
The settings in ServiceConfiguration are specific to Pulsar.
There's already a solution to configure BookKeeper client options by prefixing the options with bookkeeper_ in broker.conf. For Pulsar Helm chart, that would be PULSAR_PREFIX_bookkeeper_.
Lines 1253 to 1257 in 33fe755
For example, this would be the way to configure BookKeeper client TCP keep-alive options for Pulsar Broker:
"PULSAR_PREFIX_bookkeeper_tcpKeepIdle": "300"
"PULSAR_PREFIX_bookkeeper_tcpKeepIntvl": "60"
"PULSAR_PREFIX_bookkeeper_tcpKeepCnt": "5"
It could be reasonable to set these as defaults (or commented out lines) in broker.conf (without PULSAR_PREFIX_) and document the settings. Could you please modify this PR to handle it this way?
To solve the issues where connections stall, it's usually necessary to configure TCP keep-alive on both sides: the client and the server. Keep-alive is enabled by default for BookKeeper client and server, but they do rely on OS defaults. It will anyways be necessary to tune OS defaults since the BookKeeper server doesn't have a way to configure the keep-alive parameters. In cloud managed k8s nodes, the settings have reasonable values by default at least in GCP. |
Thanks @lhotari reply, the two PRs mentioned here address a different issue than the one this current PR aims to resolve. |
Yes, that was a sidenote. Please check what I provided in the example. It's already possible to tune the keep-alive for BookKeeper client. In addition, you will need to anyways tune BookKeeper server separately to complete the solution (the BookKeeper server side can only be tuned with OS level parameters). Configuring BookKeeper client options in a Pulsar broker k8s deployment: The |
Thanks for the context! A quick clarification on a couple of points:
|
Yes, the client side is already fully tunable today via the existing bookkeeper_ prefix mechanism and The primary goal of this PR is to address precisely the kind of logic you described. However, it must be said that perhaps making the |
Signed-off-by: xiaolongran <xiaolongran@tencent.com>
I think that we should update Pulsar documentation to recommend to adjust the TCP Keepalive OS defaults for Pulsar deployments. There shouldn't be a reason why they shouldn't be adjusted to values what GCP uses by default (time=300, intvl=60, probes=5). Besides BookKeeper, Zookeeper needs TCP keepalive settings. We do enable TCP keepalive for Zookeeper, but there isn't a way to tune the settings. To fully close the gap, there's also a need to cover ZooKeeper client and server besides the BookKeeper client and server. |
Good ideas, I will continue to push these matters forward following this PR. |
Fixes #xyz
Motivation
Apache BookKeeper PR #4683 introduced TCP
keep-alive configuration options for the BookKeeper client, namely
tcpKeepIdle,tcpKeepIntvlandtcpKeepCnt. This change has been released in BookKeeper4.17.3.Now that the BookKeeper client exposes these TCP keep-alive tuning knobs, the Pulsar Broker should also expose matching
configuration entries, so that operators can tune the TCP keep-alive behavior of the broker's
BookKeeper client without having to patch code or rely on OS-wide defaults.
This is particularly useful in the following scenarios:
dropping idle connections) faster than the OS default (which is typically 2 hours on Linux).
broker's BookKeeper client to unexpectedly hit broken connections under low traffic.
extra network overhead.
Modifications
Added three new broker configuration entries under
ServiceConfiguration, and wired themthrough
BookKeeperClientFactoryImplinto the underlying BookKeeperClientConfiguration:tcpKeepAliveTimeSecondssetTcpKeepIdle0tcpKeepAliveIntervalSecondssetTcpKeepIntvl0tcpKeepAliveProbeCountsetTcpKeepCnt0Behavior details:
0means "do not override" — the broker will fall back to the BookKeeperclient default (
-1), which in turn falls back to the OS-level TCP keep-alive settings.ClientConfigurationwhen it is strictly greaterthan
0, so existing deployments are fully backward compatible.Verifying this change
This change added tests and can be verified as follows:
BookKeeperClientFactoryImplTest#testSetTcpKeepAliveConfiguration, which asserts:ClientConfigurationreturns the BookKeeper defaults (
getTcpKeepIdle() == -1,getTcpKeepIntvl() == -1,getTcpKeepCnt() == -1).60 / 10 / 5), the corresponding values are correctlyforwarded to
ClientConfiguration(getTcpKeepIdle() == 60,getTcpKeepIntvl() == 10,getTcpKeepCnt() == 5).Does this pull request potentially affect one of the following parts:
(
tcpKeepAliveTimeSeconds,tcpKeepAliveIntervalSeconds,tcpKeepAliveProbeCount).These are additive and default to
0(no behavior change unless explicitly configured),so existing deployments are not affected.
Documentation
doc-not-neededThe new fields carry inline
@FieldContextdocumentation that is automatically surfacedin the generated broker reference documentation.