Skip to content

fix(dgw): improve session-level tracing#1822

Merged
Benoît Cortier (CBenoit) merged 9 commits into
masterfrom
feat/dgw-session-observability
Jun 12, 2026
Merged

fix(dgw): improve session-level tracing#1822
Benoît Cortier (CBenoit) merged 9 commits into
masterfrom
feat/dgw-session-observability

Conversation

@irvingoujAtDevolution

Copy link
Copy Markdown
Contributor

What

Three small, behavior-preserving tracing/logging improvements that make Gateway session failures easier to triage from logs alone.

1. Session-id correlation for recording uploads

  • The /jet/jrec/push span now carries session_id, and a Recording push stream started event is emitted on connect.
  • Recording-upload log lines now correlate with the owning /jet/rdp session through a single session_id grep (the push span previously only had client=<ip>).

2. Distinguish a Gateway-initiated kill from a peer-driven end

  • Proxy::forward now logs Forwarding ended because the session was killed by the Gateway when the session is terminated via the kill path (recording-policy violation, max duration, ...), instead of the same Forwarding ended line used for a natural close.
  • Removes a recurring triage ambiguity where a policy kill was indistinguishable from a target-side reset.

3. Log the resolved target IP on RDCleanPath TLS handshake failure

  • When the TLS handshake to the target RDP server fails, log the resolved server_addr alongside the target hostname and error.
  • Lets us tell a wrong / split-horizon DNS resolution apart from a target-side reset without requiring a packet capture.

Why

These came out of recent support swarms around browser / web-client RDP: recording-required sessions getting killed when the recording stream drops, and target-side TLS resets where the resolved IP was the missing fact. In several of these the existing logs were correct but required expert correlation across multiple lines, or didn't surface the one fact needed.

Scope / risk

Tracing and logging only - no control-flow or behavioral change. cargo check and cargo clippy -p devolutions-gateway are clean.

Out of scope (possible follow-ups)

  • Heartbeat / health observability (gateway-side record of inbound heartbeats + a "last heartbeat received" surface).
  • Guaranteed startup-failure diagnostics written to the Windows Event Log.

Correlate recording uploads with their session: carry the session id on
the JREC push span and emit a "Recording push stream started" event, so
a /jet/jrec/push stream and its /jet/rdp session can be matched with a
single session-id grep (the push span previously only had client=<ip>).

Distinguish a Gateway-initiated session kill from a natural or
peer-driven end in the forwarding proxy. A kill (e.g. recording-policy
violation, max session duration) previously logged the same
"Forwarding ended" line as a normal close and was easily mistaken for a
target-side reset during triage.

Log the resolved peer address when the RDCleanPath TLS handshake to the
target RDP server fails. The bubbled error only carries the target
hostname; logging the IP we actually connected to distinguishes a
wrong/split-horizon DNS resolution from a target-side reset without a
packet capture.

Tracing/logging only; no behavior change.
@github-actions

Copy link
Copy Markdown

Let maintainers know that an action is required on their side

  • Add the label release-required Please cut a new release (Devolutions Gateway, Devolutions Agent, Jetsocat, PowerShell module) when you request a maintainer to cut a new release (Devolutions Gateway, Devolutions Agent, Jetsocat, PowerShell module)

  • Add the label release-blocker Follow-up is required before cutting a new release if a follow-up is required before cutting a new release

  • Add the label publish-required Please publish libraries (`Devolutions.Gateway.Utils`, OpenAPI clients, etc) when you request a maintainer to publish libraries (Devolutions.Gateway.Utils, OpenAPI clients, etc.)

  • Add the label publish-blocker Follow-up is required before publishing libraries if a follow-up is required before publishing libraries

Use a ForwardingOutcome enum (derives Debug, logged as a structured
field) instead of a true/false bool to record whether forwarding ended
naturally or was killed by the Gateway.

Carry session_id on the jmux span as well, the remaining WebSocket
side-channel handler that only had client= (jrec was fixed in the
previous commit). Both now match the session_id field already used by
the fwd / generic_client / rdp_proxy spans.
When the RDCleanPath TLS handshake to the target RDP server fails,
classify the failure from the error already in hand (server alert /
non-TLS response / reset before any response / other) and log it as a
structured field, alongside the resolved peer address.

This lets a crypto/cert/version mismatch (server TLS alert) be told
apart from a pre-ServerHello reset (in-path device, fingerprint
filtering, or target drop) and from a non-TLS response on the port - the
distinction that previously required a packet capture.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves Devolutions Gateway’s session-level observability for support/triage by enriching tracing spans and adding targeted log lines around common failure points (recording uploads, proxy termination, and RDCleanPath TLS handshakes).

Changes:

  • Add session_id correlation to /jet/jrec/push and /jmux tracing spans, plus a “Recording push stream started” log event on connect.
  • Add a forwarding “outcome” signal to distinguish Gateway-initiated termination from natural/peer-driven ends.
  • Log the resolved target server_addr and a TLS handshake failure classification when RDCleanPath’s target TLS handshake fails.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
devolutions-gateway/src/recording.rs Adds a connect-time log event for recording push streams with session_id.
devolutions-gateway/src/api/jrec.rs Adds session_id to the jrec span for correlation across logs.
devolutions-gateway/src/api/jmux.rs Adds session_id to the jmux span for correlation across logs.
devolutions-gateway/src/rd_clean_path.rs Adds target TLS handshake failure classification + logs resolved server_addr on failure.
devolutions-gateway/src/proxy.rs Introduces ForwardingOutcome to distinguish completion vs kill-path termination in logs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread devolutions-gateway/src/proxy.rs
The TlsHandshakeFailure alert field was only read through a derived
Debug, which `cargo clippy --workspace --tests -- -D warnings` (the CI
lint) rejects as a never-read field.

Give the enum a manual Display impl (which reads the field, satisfying
the lint) and log it via Display, and carry the rustls AlertDescription
itself instead of a raw u8 so the alert reads as e.g. HandshakeFailure
rather than 40.
Addresses review feedback: the forwarding-end log collapsed both
outcomes into a single "Forwarding ended" message plus a structured
field. Restore a distinct message for the kill path so the
Gateway-initiated termination is obvious in plain-text logs and simple
greps, while keeping the structured `outcome` field for log processing.

@CBenoit Benoît Cortier (CBenoit) left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Comment thread devolutions-gateway/src/api/jmux.rs Outdated
Comment thread devolutions-gateway/src/api/jrec.rs Outdated
Comment thread devolutions-gateway/src/proxy.rs Outdated
Comment thread devolutions-gateway/src/recording.rs Outdated
Comment thread devolutions-gateway/src/rd_clean_path.rs
Comment thread devolutions-gateway/src/proxy.rs Outdated
Co-authored-by: Benoît Cortier <3809077+CBenoit@users.noreply.github.com>
Co-authored-by: Benoît Cortier <3809077+CBenoit@users.noreply.github.com>
Co-authored-by: Benoît Cortier <3809077+CBenoit@users.noreply.github.com>
Builds on the accepted review suggestions (%session_id shorthand on the
jrec/jmux spans) and finishes the trim:

- Drop ForwardingOutcome: a Gateway-initiated kill is already logged with
  a more specific reason at the source (terminate endpoint, max session
  lifetime, recording policy), so the proxy-side distinction is redundant.
- Drop the "Recording push stream started" event: the session id is
  already in the request URL and on the jrec span.
- Drop the TLS handshake failure classification and the extra warn!: the
  error already bubbles up and is logged in full by the top-level error
  reporter. Instead carry the resolved server address in the TlsHandshake
  error so that log includes the IP, the one fact it did not already have.
@CBenoit Benoît Cortier (CBenoit) changed the title feat(dgw): improve session-level tracing for triage fix(dgw): improve session-level tracing Jun 12, 2026

@CBenoit Benoît Cortier (CBenoit) left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CBenoit Benoît Cortier (CBenoit) merged commit 304ecf0 into master Jun 12, 2026
42 checks passed
@CBenoit Benoît Cortier (CBenoit) deleted the feat/dgw-session-observability branch June 12, 2026 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants