Fix startup NPE when a weakly dependent registry is unavailable (check=false) #16356

Open
AryamannSingh7 wants to merge 1 commit into apache:3.3 from AryamannSingh7:fix/16178-null-registry-startup

What is the purpose of the change?

Fixes #16178.

When a service subscribes to multiple registries and one of them is weakly dependent (check="false") and unavailable at startup, the application fails to start with:

java.lang.NullPointerException: Cannot invoke "org.apache.dubbo.registry.Registry.getUrl()" because "this.registry" is null
    at org.apache.dubbo.registry.ListenerRegistryWrapper.getUrl(ListenerRegistryWrapper.java:44)
    at org.apache.dubbo.registry.integration.RegistryDirectory.subscribe(RegistryDirectory.java:...)

This is a regression: Dubbo 3.2.0 through 3.2.4 start successfully, while 3.2.5 and later fail (also reproduced on 3.3.x).

Root cause

Under check=false, when a registry cannot be created (the registry is down), AbstractRegistryFactory#getRegistry swallows the exception and returns null. RegistryFactoryWrapper still wraps this null into a ListenerRegistryWrapper, so a null delegate is an expected state: ListenerRegistryWrapper already guards register / unregister / subscribe with if (registry != null).

However:

  • getUrl / isAvailable / destroy / unsubscribe / isServiceDiscovery / lookup were not guarded.
  • RegistryDirectory#subscribe computes a metrics cluster name via registry.getUrl().getParameter(...) unconditionally. That line was introduced in #12582 ("Support multi registries metrics key"), released in 3.2.5, which is exactly why the NPE appears from 3.2.5 onward.
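
For readers who want to see the failure mode in isolation, the root cause can be sketched roughly as follows. This is a minimal illustration, not the Dubbo code: MiniRegistry, MiniListenerWrapper, NullDelegateRepro and the "cluster" key are hypothetical stand-ins, and the URL is modelled as a plain Map.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for a registry; not the Dubbo interface.
interface MiniRegistry {
    Map<String, String> getUrl();

    void subscribe(String service);
}

// Mirrors the partially guarded wrapper described above:
// subscribe is null-safe, getUrl is not.
final class MiniListenerWrapper implements MiniRegistry {
    private final MiniRegistry registry; // null when creation failed under check=false

    MiniListenerWrapper(MiniRegistry registry) {
        this.registry = registry;
    }

    @Override
    public void subscribe(String service) {
        if (registry != null) {
            registry.subscribe(service);
        }
    }

    @Override
    public Map<String, String> getUrl() {
        return registry.getUrl(); // unguarded: throws NPE on a null delegate
    }
}

final class NullDelegateRepro {
    // Stand-in for a factory that swallows the creation failure when check=false.
    static MiniRegistry createRegistry(boolean check) {
        IllegalStateException failure = new IllegalStateException("registry unreachable");
        if (check) {
            throw failure;
        }
        return null; // check=false: the failure is swallowed and null is returned
    }

    // Stand-in for the caller that derives a metrics label from the registry URL.
    // The "cluster" key is an assumption made for this sketch.
    static String subscribeAndGetCluster(MiniRegistry registry) {
        registry.subscribe("demo.Service");
        return registry.getUrl().get("cluster");
    }

    // Returns true when the unguarded getUrl call fails on the wrapped null.
    static boolean failsWithNpe() {
        MiniRegistry wrapped = new MiniListenerWrapper(createRegistry(false));
        try {
            subscribeAndGetCluster(wrapped);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

The guarded subscribe call passes silently, so the first visible failure is the later getUrl dereference, which matches the stack trace above.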

What changed

  1. ListenerRegistryWrapper — complete the existing null-safety: getUrl returns null, isAvailable / isServiceDiscovery return false, destroy / unsubscribe become no-ops, and lookup returns an empty list when the delegate registry is null.
  2. RegistryDirectory#subscribe — guard the metrics cluster-name computation against a null registry URL (no cluster name is reported when the registry is unavailable). RegistryEvent#toSubscribeEvent already tolerates a null name.
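
The shape of the two changes can be sketched as below. This is an illustrative approximation under stated assumptions, not the patch itself: SketchRegistry, NullSafeWrapper, ClusterNames and the "cluster" key are hypothetical names, the URL is modelled as a Map, and only a subset of the methods listed above is shown.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a registry; not the Dubbo interface.
interface SketchRegistry {
    Map<String, String> getUrl();

    boolean isAvailable();

    List<String> lookup(String service);

    void destroy();
}

// Every method tolerates a null delegate, following the defaults described in item 1.
final class NullSafeWrapper implements SketchRegistry {
    private final SketchRegistry registry; // may be null under check=false

    NullSafeWrapper(SketchRegistry registry) {
        this.registry = registry;
    }

    @Override
    public Map<String, String> getUrl() {
        return registry == null ? null : registry.getUrl();
    }

    @Override
    public boolean isAvailable() {
        return registry != null && registry.isAvailable();
    }

    @Override
    public List<String> lookup(String service) {
        return registry == null ? Collections.<String>emptyList() : registry.lookup(service);
    }

    @Override
    public void destroy() {
        if (registry != null) {
            registry.destroy(); // no-op when there is nothing to destroy
        }
    }
}

final class ClusterNames {
    // Item 2: guard the metrics label against a null registry URL.
    // The "cluster" key is an assumption made for this sketch.
    static String clusterNameOf(SketchRegistry registry) {
        Map<String, String> url = registry.getUrl();
        return url == null ? null : url.get("cluster");
    }
}
```

Returning null from clusterNameOf relies on the downstream consumer accepting a missing name, which the description states is already the case for RegistryEvent#toSubscribeEvent.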

Verification

Added ListenerRegistryWrapperTest#testNullRegistryIsTolerated, which reproduces the exact NPE without the fix and passes with it. :dubbo-registry-api tests and spotless:check pass locally (JDK 21).

Checklist

  • Make sure there is a GitHub issue field for the change.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write necessary unit-test to verify your logic correction.
  • Make sure GitHub actions can pass.

…check=false

When a registry cannot be created and check=false (e.g. a weakly dependent
ZooKeeper is down at startup), AbstractRegistryFactory returns a null registry
which RegistryFactoryWrapper still wraps in a ListenerRegistryWrapper. The
wrapper already guards register/unregister/subscribe against a null delegate,
but getUrl/isAvailable/destroy/unsubscribe/isServiceDiscovery/lookup did not,
and RegistryDirectory#subscribe dereferences registry.getUrl() unconditionally
to build a metrics label (added in apache#12582), causing an NPE since 3.2.5.

Complete the null-safety in ListenerRegistryWrapper and guard the metrics
label computation in RegistryDirectory#subscribe so a service can still start
when a non-critical registry is unavailable.

Fixes apache#16178

codecov-commenter commented Jun 25, 2026

Codecov Report

❌ Patch coverage is 61.53846% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.85%. Comparing base (2626e28) to head (15d657b).

Files with missing lines                                  Patch %   Lines
...apache/dubbo/registry/ListenerRegistryWrapper.java     62.50%    1 Missing and 2 partials ⚠️
.../dubbo/registry/integration/RegistryDirectory.java     60.00%    1 Missing and 1 partial ⚠️

Additional details and impacted files
@@             Coverage Diff              @@
##                3.3   #16356      +/-   ##
============================================
+ Coverage     60.81%   60.85%   +0.03%     
- Complexity       30    11762   +11732     
============================================
  Files          1953     1953              
  Lines         89213    89217       +4     
  Branches      13460    13460              
============================================
+ Hits          54254    54290      +36     
+ Misses        29374    29349      -25     
+ Partials       5585     5578       -7     
Flag Coverage Δ
integration-tests-java21 32.15% <30.76%> (-0.01%) ⬇️
integration-tests-java8 32.23% <30.76%> (+0.05%) ⬆️
samples-tests-java21 32.16% <30.76%> (+<0.01%) ⬆️
samples-tests-java8 29.74% <30.76%> (-0.08%) ⬇️
unit-tests-java11 59.05% <61.53%> (-0.01%) ⬇️
unit-tests-java17 58.56% <61.53%> (+<0.01%) ⬆️
unit-tests-java21 58.57% <61.53%> (+0.02%) ⬆️
unit-tests-java25 58.48% <61.53%> (-0.02%) ⬇️
unit-tests-java8 59.11% <66.66%> (+<0.01%) ⬆️

Development

Successfully merging this pull request may close these issues.

[Bug] Service startup fails when a weakly dependent ZooKeeper is down
