
peer.hostname resolution flaws #11088

@psemeniuk

Description


Tracer Version(s)

1.58.0

Java Version(s)

21.0.9

JVM Vendor

Amazon Corretto

Bug Report

Hi,
we're experiencing a problem where our traces show impossible relationships between our services. Initial investigation revealed an inconsistency in the peer.hostname span attribute. I dug deeper, and here are my findings:

  1. The tracer implements an IP -> hostname resolution cache.
  2. This cache can produce a wrong peer.hostname when multiple domains share a single IP address, which is exactly our case (services behind a reverse proxy).
  3. This PR reduces reliance on the cache by reusing hostnames that were already resolved. But it didn't work for us, because:
    a) HOLDER_GET isn't successfully initialized during static initialization.
    b) Initially I assumed it was because the cl variable effectively resolves to null: getClassLoader() returns null for classes loaded by the bootstrap ClassLoader (which is the case here, because of this). In the end, that wasn't the problem, because it turned out MethodHandles.method doesn't rely on the classloader passed to its constructor. Still, the null cl issue could be analyzed on your side, because this way of retrieving the classloader might cause problems elsewhere.
    c) Ultimately, the problem lay inside MethodHandles.method, where the failed attempt to "enable" reflection was swallowed by error handling and logged only at debug level, which I didn't catch during my initial analysis:
```
[dd.trace 2026-04-10 16:54:47:569 +0200] [ioClientGroup-4-1] EXCLUDE_TELEMETRY datadog.trace.util.MethodHandles - Could not get method holder accepting [] from class class java.net.InetAddress
java.lang.reflect.InaccessibleObjectException: Unable to make java.net.InetAddress$InetAddressHolder java.net.InetAddress.holder() accessible: module java.base does not "opens java.net" to unnamed module @16b98e56
	at java.base/java.lang.reflect.AccessibleObject.throwInaccessibleObjectException(AccessibleObject.java:391)
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:367)
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:315)
	at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:203)
	at java.base/java.lang.reflect.Method.setAccessible(Method.java:197)
	at datadog.trace.util.MethodHandles.lambda$method$3(MethodHandles.java:156)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:319)
	at datadog.trace.util.MethodHandles.method(MethodHandles.java:141)
	at datadog.trace.bootstrap.instrumentation.java.net.HostNameResolver.<clinit>(HostNameResolver.java:23)
	at datadog.trace.bootstrap.instrumentation.decorator.BaseDecorator.onPeerConnection(BaseDecorator.java:137)
	at datadog.trace.bootstrap.instrumentation.decorator.BaseDecorator.onPeerConnection(BaseDecorator.java:123)
	at datadog.trace.instrumentation.netty41.client.HttpClientRequestTracingHandler.write(HttpClientRequestTracingHandler.java:89)
	at io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:891)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:875)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:984)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:868)
	at io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:305)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:891)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:875)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:984)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:868)
	at io.netty.handler.logging.LoggingHandler.write(LoggingHandler.java:288)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:891)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:956)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:982)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:950)
	at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:1000)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.unbufferedWrite(HttpStreamsHandler.java:327)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.flushNext(HttpStreamsHandler.java:376)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsHandler.write(HttpStreamsHandler.java:270)
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HttpStreamsClientHandler.write(HttpStreamsClientHandler.java:59)
```
  4. After adding --add-opens=java.base/java.net=ALL-UNNAMED, resolution now works perfectly.
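To check whether a given JVM grants the reflective access that the stack trace above shows failing, a small standalone probe can look up the same InetAddress.holder() method and call Method.trySetAccessible(), which returns false instead of throwing InaccessibleObjectException. This is a minimal sketch, not the tracer's code: the class name HolderAccessProbe is hypothetical, it assumes JDK 9 or later, and holder() is a JDK-internal method that may change between releases. Run it from the classpath (the unnamed module, like the agent in the log) with and without --add-opens=java.base/java.net=ALL-UNNAMED to compare.

```java
import java.lang.reflect.Method;
import java.net.InetAddress;

// Hypothetical standalone probe; not part of the Datadog tracer.
public class HolderAccessProbe {

    // Returns true if InetAddress.holder() can be made accessible from the caller's module.
    static boolean canAccessHolder() {
        try {
            // holder() is a JDK-internal, non-public method (see the stack trace above).
            Method holder = InetAddress.class.getDeclaredMethod("holder");
            // Unlike setAccessible(true), trySetAccessible() reports failure via its return value.
            return holder.trySetAccessible();
        } catch (NoSuchMethodException e) {
            // The internal method may be absent in other JDK versions.
            return false;
        }
    }

    public static void main(String[] args) {
        if (canAccessHolder()) {
            System.out.println("java.net is open to this module: holder() is accessible");
        } else {
            // Surface the failure loudly instead of burying it in a debug log.
            System.err.println("WARNING: java.base does not open java.net to this module; "
                    + "try --add-opens=java.base/java.net=ALL-UNNAMED");
        }
    }
}
```

On JDK 17 or later (the report uses 21), the probe should take the warning branch without the flag and report holder() as accessible with it.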

*I indicated version 1.58, but based on the code, I believe the problem still exists in the latest versions.

Related support tickets:
https://help.datadoghq.com/hc/en-us/requests/2443074
https://help.datadoghq.com/hc/en-us/requests/2495915

Expected Behavior

  1. Gather all required modules and document the needed --add-opens flags in the official docs. Currently there is no mention of this.
  2. During init, log a warning in HostNameResolver when getAlreadyResolvedHostName cannot be used, because this is clearly undesirable behavior that should be surfaced above the jungle of debug logs.
  3. Last but not least, reconsider the validity of the caching mechanism. With reverse proxies so widespread, assuming that every domain has an IP address of its own leads to hours of head-scratching for your customers :) It may be better not to resolve peer.hostname at all than to provide misleading data.
  4. As a bonus, recheck the reliance on classloader resolution for classes from the JVM agent, because they may be loaded by the bootstrap classloader (I'm not sure about this point, since I haven't looked into it in depth).
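The collision behind point 3 can be reproduced with the JDK alone, because InetAddress.equals() and hashCode() compare only the IP address. The sketch below uses a hypothetical, deliberately simplified cache (IpKeyedHostnameCache, not the tracer's actual implementation) keyed by InetAddress; the hostnames and the 10.0.0.10 address are made up. InetAddress.getByAddress(host, addr) attaches a name without any DNS lookup, so the example runs offline.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified IP-keyed cache; not the tracer's actual implementation.
public class IpKeyedHostnameCache {

    private final Map<InetAddress, String> hostnameByAddress = new ConcurrentHashMap<>();

    String peerHostname(InetAddress address) {
        // InetAddress.equals()/hashCode() compare only the IP, so different
        // hostnames behind the same IP share one cache entry: first one wins.
        return hostnameByAddress.computeIfAbsent(address, InetAddress::getHostName);
    }

    public static void main(String[] args) throws UnknownHostException {
        byte[] proxyIp = {10, 0, 0, 10}; // both services sit behind one reverse proxy
        InetAddress orders = InetAddress.getByAddress("orders.internal", proxyIp);
        InetAddress billing = InetAddress.getByAddress("billing.internal", proxyIp);

        IpKeyedHostnameCache cache = new IpKeyedHostnameCache();
        System.out.println(orders.equals(billing));      // true: same IP
        System.out.println(cache.peerHostname(orders));  // orders.internal
        System.out.println(cache.peerHostname(billing)); // orders.internal (wrong peer)
        System.out.println(billing.getHostName());       // billing.internal
    }
}
```

The last line reads the name the InetAddress object itself carries, which is the per-connection information the issue wants getAlreadyResolvedHostName to reuse. The public getHostName() only avoids a reverse DNS lookup when a name is already attached, which may be why the tracer reaches for the internal holder() instead.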

Reproduction Code

No response

Metadata


Assignees

No one assigned

    Labels

    type: bug (Bug report and fix)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
