feat(fuzz): add egfx_surface_state state-machine oracle#1337
Open
Greg Lamberson (glamberson) wants to merge 2 commits into
Open
feat(fuzz): add egfx_surface_state state-machine oracle#1337Greg Lamberson (glamberson) wants to merge 2 commits into
Greg Lamberson (glamberson) wants to merge 2 commits into
Conversation
Implements target 5 of Devolutions#1316 (egfx fuzz-coverage umbrella): a multi-frame oracle that drives a sequence of Arbitrary-derived GfxPdus through a single GraphicsPipelineClient + surface-cache pair across each fuzz iteration, exposing cross-PDU state corruption that single-shot fuzzers cannot reach. Harness shape (per the design note posted as issuecomment-4559459227 on Devolutions#1316). The oracle: 1. Constructs one GraphicsPipelineClient (no-op handler, no H.264 decoder) at iteration start. The surface cache, state machine, and ZGFX decompressor history are owned by this single instance. 2. Initialises the channel by calling DvcProcessor::start so the client moves into its post-start state. 3. Uses arbitrary::Unstructured to derive Vec<GfxPdu> from the raw fuzz input. Each GfxPdu variant is Arbitrary via the cascade in PR Devolutions#1334. 4. For each PDU: encodes back to wire bytes via encode_vec, wraps in a single uncompressed ZGFX segment via wrap_uncompressed, and feeds the wire bytes to the client's public DvcProcessor::process entry point. Errors and panics propagate to libFuzzer naturally. The harness deliberately routes through the public process() entry rather than the private handle_pdu so the dispatch path exercises the same ZGFX-decompress + GfxPdu-decode + per-variant-dispatch + state machine + surface cache flow that production traffic takes. The uncompressed ZGFX wrapper bypasses the ZGFX decoder layer (already covered by egfx_zgfx_decompress in a separate PR) and concentrates fuzz pressure on the dispatch + state machine. What this catches: panics or sanitizer reports along the dispatch + state machine path under adversarially-ordered PDU sequences; inconsistent surface-cache state under attacker-controlled CreateSurface / DeleteSurface / Map* orderings; corrupted frame-id state from interleaved StartFrame / EndFrame / FrameAcknowledge sequences; ZGFX-wrapper integration bugs separate from the standalone ZGFX coverage. What this does NOT catch: cross-frame H.264 decoder state corruption. The client is constructed with h264_decoder: None, so H264-bearing PDUs (WireToSurface1 with AVC codecs) skip the H.264 decoder. The standalone egfx_avc420_decode and egfx_avc444_decode targets cover the H.264 wrapper. Wiring a real (or mock) H.264 decoder into this harness can be a follow-up if frame-to-frame H.264 state coverage surfaces as a gap. Bug discovery during development. The first smoke fuzz on the in-progress harness surfaced a bit_field::set_bits panic in Timestamp::encode where derive(Arbitrary) generated u8/u16 values exceeding the encoder's 6-bit / 10-bit pack widths. The fix landed in PR Devolutions#1334 (the cascade prerequisite) as manual Arbitrary impls for Timestamp, QuantQuality, and the Encoding bitflag, mapping each field to its wire-allowed range. This is the "sanitize via mask" shape from Devolutions#1122's body. Validation. 15-minute rigorous libFuzzer + ASan smoke fuzz on the final SHA: 3,659,157 iterations in 901 seconds (~4,061 exec/s sustained), peak RSS 96 MB, zero panics, zero sanitizer reports, zero OOMs. Corpus grew to ~4,558 entries via libFuzzer's coverage discovery. The regression-replay test check_egfx_multi_frame passes against the seed-empty corpus entry. Depends on PR Devolutions#1334 (Arbitrary cascade across ironrdp-egfx public PDU types). CI on this PR's branch reds until Devolutions#1334 merges or this PR's branch rebases onto master post-merge. That's the correct mechanical signal of the dependency per the no-exclude-then-follow-up rule. Refs Devolutions#1316.
Implements target 6 of Devolutions#1316 (egfx fuzz-coverage umbrella): the egfx_surface_state oracle. Sibling of egfx_multi_frame (target 5, PR Devolutions#1336) with two distinct properties: 1. The PDU stream is narrowed to a SurfaceLifecyclePdu enum that contains only the seven state-affecting variants: ResetGraphics, CreateSurface, DeleteSurface, MapSurfaceToOutput, StartFrame, EndFrame, FrameAcknowledge. Each iteration concentrates fuzz pressure on the surface and frame state machine rather than spreading across all 23 GfxPdu variants. 2. The oracle maintains a parallel ExpectedState model alongside the client. For each successfully-encoded PDU the model is updated to mirror the client's documented state transition (CreateSurface inserts a surface unless width or height is zero; DeleteSurface removes; MapSurfaceToOutput updates is_mapped and origin; ResetGraphics clears all surfaces; EndFrame increments the running frame counter). After dispatch the client's observable state via the public get_surface and total_frames_decoded getters is asserted against the model. This catches a different bug class from egfx_multi_frame's panic / sanitizer oracle: logic bugs in state transitions that produce wrong but non-crashing observable state. Examples covered: client retains a surface after DeleteSurface, client clobbers is_mapped on a surface that did not receive MapSurfaceToOutput, client fails to clear all surfaces on ResetGraphics, client increments total_frames_decoded on a PDU other than EndFrame. The model encodes the implementation's current documented behavior (e.g., zero-dimension CreateSurface is silently skipped per handle_create_surface), so the oracle catches drift from that behavior. Spec-compliance verification is a separate concern that would require a parallel spec-only model. What this does NOT catch: state transitions that exist only in private fields without a public getter (current_frame_id, frames_queued). Exposing observable access to those fields is a separate egfx-public-API change, out of scope here. Validation. 15-minute rigorous libFuzzer + ASan smoke fuzz on the final SHA: 2,770,751 iterations in 901 seconds (~3,075 exec/s sustained), peak RSS 102 MB, zero panics, zero sanitizer reports, zero state-model assertion failures. Corpus grew to ~1,840 entries via libFuzzer's coverage discovery. The check_egfx_surface_state regression-replay test passes against the seed-empty corpus entry. Stack dependency. This PR is stacked on PR Devolutions#1336 (egfx_multi_frame, target 5) which is stacked on PR Devolutions#1334 (Arbitrary cascade). CI on this branch will red until both ancestors merge or this branch rebases onto master post-merge. With this PR the Devolutions#1316 umbrella has its initial six target slots filled (1 merged, 2+2b+3+5+6 open). Refs Devolutions#1316. Depends on Devolutions#1334 and Devolutions#1336.
4939a82 to
e6e38d2
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
egfx_surface_statestate-machine oracle. Sibling ofegfx_multi_frame(PR feat(fuzz): add egfx_multi_frame oracle and target #1336) with two distinct properties.SurfaceLifecyclePduenum that contains only the seven state-affecting variants (ResetGraphics,CreateSurface,DeleteSurface,MapSurfaceToOutput,StartFrame,EndFrame,FrameAcknowledge). Each iteration concentrates fuzz pressure on the surface and frame state machine rather than spreading across all 23GfxPduvariants.ExpectedStatestruct mirrors the client's documented per-surface fields (id,width,height,pixel_format,is_mapped,output_origin_x/y) plus thetotal_frames_decodedcounter. For each successfully-encoded PDU the model is updated to mirror the client'shandle_*transitions; after dispatch the client's observable state via the publicget_surfaceandtotal_frames_decodedgetters is asserted against the model.What this catches
A different bug class from
egfx_multi_frame's panic / sanitizer oracle: logic bugs in state transitions that produce wrong but non-crashing observable state. Examples:DeleteSurface(model removed it, client did not).is_mappedon a surface that did not receiveMapSurfaceToOutput.ResetGraphics.total_frames_decodedon a PDU other thanEndFrame.The model encodes the implementation's current documented behavior (e.g.,
CreateSurfacewithwidth == 0orheight == 0is silently skipped perhandle_create_surface,total_frames_decodedis incremented unconditionally byhandle_end_frameeven when no matchingStartFramepreceded it). The oracle catches drift from that behavior. Spec-compliance verification is a separate concern.What this does NOT catch
State transitions that exist only in private fields without a public getter (
current_frame_id,frames_queued). Wiring observable access to those fields is a separate egfx-public-API change, out of scope here.Dependencies
This PR is stacked on PR #1336 (
egfx_multi_frame, target 5) which is stacked on PR #1334 (Arbitrary cascade). CI on this branch reds until both ancestors merge or this branch rebases onto master post-merge. That is the correct mechanical signal of the dependency per the no-exclude-then-follow-up rule.Validation
cargo xtask ciclean locally on1.89.0: fmt, lints, tests, typos, locks all green.check_egfx_surface_stateregression-replay test passes against the seed-empty corpus entry.Notes
SurfaceLifecyclePduenum lives oracle-private inside the function (not exported fromironrdp-egfx), keeping the egfx public surface clean per the fallback shape from the design discussion on egfx: extend fuzz coverage to OpenH264-wrapper, ZGFX, multi-frame state, and encoder round-trip #1316.as _imports forArbitraryandDvcProcessorper clippy'sunused_trait_nameslint, matching theegfx_multi_framestyle.#[expect(clippy::panic, reason = ...)]and# Panicsdoc section match the precedent frombulk_round_tripfor assertion-based oracles.With this PR the #1316 umbrella has its initial six target slots filled (1 merged, 2+2b+3+5+6 open).
Refs #1316. Depends on #1334 and #1336.