tests: Introduce new fuzz tests & fix bugs found #36772

Draft
def- wants to merge 39 commits into MaterializeInc:main from def-:pr-fuzz2

def- commented May 28, 2026

Inspired by #36764 and Antithesis PoC

See individual commits

def- changed the title from "tests: Introduce new fuzz tests" to "tests: Introduce new fuzz tests & fix bugs found" May 29, 2026
def- added 28 commits May 29, 2026 13:29
Five fuzz crates exercising the same property — bytes/SQL parse into a
Rust AST/value and survive a re-encode + re-parse losslessly:

  src/sql-parser/fuzz:
    - parse_pretty_roundtrip:   parser <-> sql-pretty
    - parse_display_roundtrip:  parser <-> AstDisplay
    - parse_expr_roundtrip:     parse_expr <-> AstDisplay
  src/expr/fuzz:
    - eval_error_proto_roundtrip
  src/repr/fuzz:
    - row_proto_roundtrip
    - row_codec_roundtrip
  src/storage-types/fuzz:
    - source_data_proto_roundtrip
    - dataflow_error_proto_roundtrip
  src/catalog-protos/fuzz:
    - catalog_objects_serde_roundtrip (serde-based, not prost)

Repo-wide runner at ci/test/cargo-fuzz.sh; nightly Buildkite step (main
only) runs it via the nightly ci-builder, which now preinstalls cargo-fuzz.
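
The round-trip property these targets share can be sketched without any Materialize types. This is a minimal illustration only: `Pair`, `parse`, and `check_roundtrip` are hypothetical stand-ins, and a real target would wrap the check in cargo-fuzz's `fuzz_target!` macro rather than return a `Result`.

```rust
use std::fmt;

/// Hypothetical stand-in for an AST/value type (not a Materialize type).
#[derive(Debug, Clone, PartialEq)]
pub struct Pair {
    pub key: u8,
    pub value: u16,
}

impl fmt::Display for Pair {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}={}", self.key, self.value)
    }
}

/// Parse `key=value`; returns `None` on malformed input.
pub fn parse(input: &str) -> Option<Pair> {
    let (k, v) = input.split_once('=')?;
    Some(Pair { key: k.parse().ok()?, value: v.parse().ok()? })
}

/// The property: any input that parses must survive display + reparse
/// unchanged. Inputs that do not parse are skipped, not reported.
pub fn check_roundtrip(data: &[u8]) -> Result<(), String> {
    let Ok(text) = std::str::from_utf8(data) else { return Ok(()) };
    let Some(first) = parse(text) else { return Ok(()) };
    let printed = first.to_string();
    match parse(&printed) {
        Some(second) if second == first => Ok(()),
        other => Err(format!("{text:?} printed as {printed:?}, reparsed to {other:?}")),
    }
}
```

Non-canonical inputs such as `+1=007` are the interesting cases: they parse, print in canonical form, and must reparse to the same value.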
`X'...'` content is allowed to contain `'` (escaped as `''` by the
lexer), but the printer was emitting it verbatim — a value with an
embedded quote closed the literal prematurely and produced unparseable
output. Escape on the way out, mirroring `Value::String`.
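
A minimal sketch of the escaping, assuming the literal's content is held as a plain string; `display_hex_string` is a hypothetical name, not the printer's actual function.

```rust
/// Render the content of a hex-string literal as `X'...'`, doubling any
/// embedded single quote so the literal is not closed early on reparse.
pub fn display_hex_string(content: &str) -> String {
    format!("X'{}'", content.replace('\'', "''"))
}
```

With this, content `ab'cd` prints as `X'ab''cd'` instead of `X'ab'cd'`.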
The `.` token has very high precedence and both the lexer and parser
greedily extend adjacent tokens: `1.x` tokenizes the number `1.` and
leaves `x` as an alias, and `'a'::T.x` consumes `T.x` as a qualified
type name. So a receiver must look atomic on the way out — wrapped in
delimiters or self-terminating — or the dot reattaches to the receiver
on reparse and produces a different AST.

Add a `write_dot_receiver` helper that parenthesizes anything outside
a whitelist of atom-like exprs, and use it from `FieldAccess` and
`WildcardAccess` display.
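
The idea behind the helper can be sketched on a toy AST. Everything here is an assumption for illustration: the `Expr` enum, the `display` function, and in particular the whitelist (only calls and already-parenthesized expressions), since the commit message does not list the real one.

```rust
/// Toy expression type; not the sql-parser AST.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(String),
    Call { name: String },
    Cast { expr: Box<Expr>, ty: String },
    Nested(Box<Expr>),
    FieldAccess { expr: Box<Expr>, field: String },
}

pub fn display(e: &Expr) -> String {
    match e {
        Expr::Number(n) => n.clone(),
        Expr::Call { name } => format!("{name}()"),
        Expr::Cast { expr, ty } => format!("{}::{ty}", display(expr)),
        Expr::Nested(inner) => format!("({})", display(inner)),
        Expr::FieldAccess { expr, field } => format!("{}.{field}", dot_receiver(expr)),
    }
}

/// Print the receiver of a `.` so that it looks atomic: anything outside the
/// (hypothetical) whitelist is wrapped in parentheses.
fn dot_receiver(e: &Expr) -> String {
    if matches!(e, Expr::Call { .. } | Expr::Nested(_)) {
        display(e)
    } else {
        format!("({})", display(e))
    }
}
```

A field access on the number `1` then prints as `(1).x`, and on a cast as `(1::T).x`, so the dot can no longer attach to `1.` or `T.x` on reparse.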
…r keywords

Names like `position`, `extract`, `trim`, `substring`, `normalize`,
`map`, `array`, `nullif`, `exists`, `row`, `coalesce`, `greatest`,
`least` reach a special parser dispatch when followed by `(` —
`POSITION(<expr> IN <expr>)`, `MAP[K => V]`, etc. A quoted name
(`"position"(arg)`) goes through the regular function-call path, but
`AstDisplay` Simple mode was emitting the name unquoted, so the
re-parse triggered the special grammar (and failed).

Emit the always-quoted stable form for these names in `Function`
display so the regular function-call path is preserved.
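
Roughly, the display rule looks like the sketch below. The keyword list is copied from this message; `display_function_name` is a hypothetical helper that ignores every other quoting rule (case, special characters) the real `Ident` display handles.

```rust
/// Names that reach a special parser dispatch when followed by `(`.
const SPECIAL_CALL_KEYWORDS: &[&str] = &[
    "position", "extract", "trim", "substring", "normalize", "map", "array",
    "nullif", "exists", "row", "coalesce", "greatest", "least",
];

/// Print a function name, forcing quotes for names that would otherwise
/// re-trigger the special grammar on reparse.
pub fn display_function_name(name: &str) -> String {
    if SPECIAL_CALL_KEYWORDS.contains(&name) {
        format!("\"{name}\"")
    } else {
        name.to_string()
    }
}
```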
`<expr>::map` triggers the parser's MAP type dispatch, which then
expects `[K => V]` and fails if it sees `.` or other syntax. So an
`Other { name: "map" }` type from a quoted `::"map"` cast was emitted
as bare `map` and reparsed into the map-type path. Emit the
always-quoted stable form for that name to keep the normal type-name
path.
Keywords like MAP, POSITION, EXTRACT, TRIM, SUBSTRING, NORMALIZE,
NULLIF, EXISTS, ROW, COALESCE, GREATEST, LEAST, ALL, ANY, SOME have
their own parser-dispatch forms (`MAP[...]`, `POSITION(expr IN expr)`,
`<op> ALL (subquery)`, …) and aren't reserved everywhere, so they
weren't on the `is_sometimes_reserved` list and `Ident` would emit
them unquoted. But unquoted in expression position those names re-
trigger the special grammar at parse time.

Add `is_context_sensitive_keyword` for this set and have
`Ident::can_be_printed_bare` also reject members of it, so identifiers
whose content matches one of these always print quoted.
`<left> <op> ANY/ALL (...)` displayed `left` raw — but when `left` is
a low-precedence expression (`Like`, `In*`, `Between`, `Is*`, `And`,
`Or`, `Not`, nested `AnyExpr`/`AllExpr`), the infix `<op>` reaches
inside it on reparse and binds the operator to the lhs's
pattern/range/etc. instead of to the lhs as a whole, producing a
different AST.

Add a `write_quantified_left` helper that parenthesizes these cases
and leaves atom-like lhs (incl. plain `Op`, which has its own
precedence handling) unwrapped.
`Decimal::from_packed_bcd` calls the C function `decPackedToNumber`,
which segfaults on empty bcd input. Reachable from untrusted proto
bytes (anyone able to send a `ProtoRow` could crash the process via a
`ProtoNumeric { bcd: [] }` datum). Reject the empty case before
descending into the FFI.
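
The guard has roughly the following shape. This is a sketch under assumptions: `unpack_nibbles` is a pure-Rust stand-in for the C call, and `DecodeError` is a hypothetical error type.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    EmptyBcd,
}

/// Validate untrusted input before handing it to code that assumes it is
/// non-empty (the FFI call in the real fix).
pub fn from_packed_bcd(bcd: &[u8]) -> Result<Vec<u8>, DecodeError> {
    if bcd.is_empty() {
        return Err(DecodeError::EmptyBcd);
    }
    Ok(unpack_nibbles(bcd))
}

/// Stand-in for the unsafe callee: splits each byte into its two nibbles.
fn unpack_nibbles(bcd: &[u8]) -> Vec<u8> {
    bcd.iter().flat_map(|&b| [b >> 4, b & 0x0f]).collect()
}
```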
`push_range_with` returned a `Result`, but the proto decode path
unconditionally `.expect(...)`ed it — meaning any `ProtoRange` that
`push_range_with` rejects (e.g. lower > upper from an attacker-crafted
proto) panicked the process. Propagate the error to the caller via
`?`, matching the rest of the proto decode path.
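
A minimal sketch of the change in shape, using a hypothetical `push_range` over integer bounds rather than the real `push_range_with`:

```rust
#[derive(Debug, PartialEq)]
pub struct InvalidRange;

/// Hypothetical stand-in: rejects ranges whose lower bound exceeds the upper.
fn push_range(out: &mut Vec<(i32, i32)>, lower: i32, upper: i32) -> Result<(), InvalidRange> {
    if lower > upper {
        return Err(InvalidRange);
    }
    out.push((lower, upper));
    Ok(())
}

/// Decode untrusted ranges. The pre-fix equivalent called `.expect(...)` on
/// the result; `?` turns a bad range into a decode error instead of a panic.
pub fn decode_ranges(wire: &[(i32, i32)]) -> Result<Vec<(i32, i32)>, InvalidRange> {
    let mut out = Vec::new();
    for &(lower, upper) in wire {
        push_range(&mut out, lower, upper)?;
    }
    Ok(out)
}
```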
`<CheckedTimestamp<_> as RustType<ProtoNaiveDateTime>>::from_proto`
constructed the struct directly (`Self { t: ... }`), bypassing the
range validation that `from_timestamplike` enforces. Out-of-range
values pushed into a `Row` cleanly, but `read_datum` then called
`from_timestamplike(...).expect(...)` while iterating and panicked —
reachable from untrusted proto bytes.

Go through `from_timestamplike` in `from_proto` so the value is
rejected at decode time.
Same shape as the `CheckedTimestamp` fix: `Date::from_proto`
constructed `Date { days: proto.days }` directly, bypassing the range
validation that `from_pg_epoch` enforces. Out-of-range days pushed
into a `Row` cleanly, then `read_datum` panicked on
`Date::from_pg_epoch(days).expect(...)` while iterating.

Go through `from_pg_epoch` in `from_proto` so the value is rejected
at decode time.
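
The pattern shared by this fix and the `CheckedTimestamp` one is to route decoding through the validating constructor. A sketch with a hypothetical `CheckedDays` type and made-up bounds:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CheckedDays {
    days: i32,
}

#[derive(Debug, PartialEq)]
pub struct OutOfRange(pub i32);

impl CheckedDays {
    // Illustrative bounds only; not the real date range.
    pub const MIN: i32 = -1_000;
    pub const MAX: i32 = 1_000;

    /// The one place where the range invariant is enforced.
    pub fn new(days: i32) -> Result<Self, OutOfRange> {
        if (Self::MIN..=Self::MAX).contains(&days) {
            Ok(Self { days })
        } else {
            Err(OutOfRange(days))
        }
    }

    /// Decoding goes through `new` instead of building `Self { days }`
    /// directly, so a bad value is rejected here and not at read time.
    pub fn from_proto(proto_days: i32) -> Result<Self, OutOfRange> {
        Self::new(proto_days)
    }

    pub fn days(self) -> i32 {
        self.days
    }
}
```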
…o decoder

Four new fuzz targets covering high-blast-radius surfaces:

  src/repr/fuzz:
    - scalar_type_proto_roundtrip:     ProtoScalarType  <-> SqlScalarType
    - column_type_proto_roundtrip:     ProtoColumnType  <-> SqlColumnType
    - relation_desc_proto_roundtrip:   ProtoRelationDesc <-> RelationDesc
  src/avro/fuzz (new crate):
    - reader_decode: drive a `mz_avro::Reader` over arbitrary bytes
                     (Avro is the wire format for Kafka sources, so any
                     crash here is reachable from untrusted broker bytes)
Each fuzz crate has its own `[workspace]` (required by cargo-fuzz to
use nightly Rust without forcing the rest of the tree onto nightly),
so each maintains its own `target/` adjacent to its Cargo.toml. Some
build-script deps (notably `protobuf-native`) extract `.proto` files
into that tree — `buf` then picks them up and trips on them.

Exclude the fuzz target dirs both from buf's build scan (template) and
from `generate-buf-config.py`'s proto-file globbing (so the generated
`breaking.ignore` list also stays clean).
`SqlScalarType::{List,Record,Map}` from `ProtoScalarType` called
`x.custom_id.map(|id| id.into_rust().unwrap())` — a malformed
`ProtoCatalogItemId` inside any of those three variants panicked the
process. Reachable from untrusted proto bytes (an attacker-crafted
`ProtoRow` containing a list/record/map value with a bad custom_id).
Propagate via `transpose()?` instead.
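
For readers unfamiliar with the idiom: `Option::transpose` turns an `Option<Result<T, E>>` into a `Result<Option<T>, E>`, so `?` can propagate the inner error. A sketch with a hypothetical `ListType` and a string-encoded id in place of the proto types:

```rust
use std::num::ParseIntError;

#[derive(Debug, PartialEq)]
pub struct ListType {
    pub custom_id: Option<u64>,
}

/// An absent id stays `None`; a present but malformed id becomes an error
/// instead of a panic from `unwrap()` inside the `map` closure.
pub fn decode_list_type(raw_custom_id: Option<&str>) -> Result<ListType, ParseIntError> {
    let custom_id = raw_custom_id.map(|s| s.parse::<u64>()).transpose()?;
    Ok(ListType { custom_id })
}
```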
The non-migration branch zipped `proto.names` and `proto.metadata`
via `zip_eq`, which panics on length mismatch — reachable from
untrusted proto bytes. Check the lengths explicitly and return
`InvalidFieldError`.
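
A sketch of the explicit check, with a plain `String` error standing in for `InvalidFieldError` and `u32` standing in for the metadata type:

```rust
/// Pair names with metadata, returning an error on a length mismatch instead
/// of panicking the way itertools' `zip_eq` does.
pub fn pair_names_with_metadata(
    names: Vec<String>,
    metadata: Vec<u32>,
) -> Result<Vec<(String, u32)>, String> {
    if names.len() != metadata.len() {
        return Err(format!(
            "names has {} entries but metadata has {}",
            names.len(),
            metadata.len()
        ));
    }
    Ok(names.into_iter().zip(metadata).collect())
}
```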
…ationDetails

Three more proto round-trip targets, picked for similarity to bug
classes the previous rounds already turned up:

  src/repr/fuzz:
    - interval_proto_roundtrip:    ProtoInterval <-> Interval
                                   (Interval arithmetic shares the
                                    timestamp validation surface)
    - mz_acl_item_proto_roundtrip: ProtoMzAclItem <-> MzAclItem
                                   (access-control values; multi-field
                                    proto with several enum-coded slots)
  src/storage-types/fuzz:
    - postgres_publication_details_proto_roundtrip:
        ProtoPostgresSourcePublicationDetails <-> ...
Two more proto round-trip targets:

  src/repr/fuzz:
    - acl_item_proto_roundtrip: ProtoAclItem <-> AclItem
                                (PostgreSQL-style ACL entry, distinct
                                 from MzAclItem)
  src/storage-types/fuzz:
    - source_export_statement_details_proto_roundtrip:
        ProtoSourceExportStatementDetails <-> SourceExportStatementDetails
        (5-variant enum: Postgres / MySql / SqlServer / LoadGenerator /
         Kafka — lots of conversion branches to round-trip)
…r display

The earlier change made `Ident::can_be_printed_bare` reject members of
`is_context_sensitive_keyword` (MAP, POSITION, EXTRACT, ALL, ANY, …)
so that round-trip through sql-pretty preserved them. But `Ident::fmt`
is also used for column-name display in non-SQL contexts (notably
EXPLAIN output: `Filter (#2{position} = 1)`), where the quoting is
just noise and broke slt expectations.

Revert the global change. The fuzz targets that exercised this round
trip get a narrow carve-out (skip on the `Expected left square
bracket` / `Expected left parenthesis` / `Expected IN, found ...`
reparse errors that come from a context-sensitive keyword landing in a
position the parser dispatches on).
`SourceExportStatementDetails` doesn't derive `PartialEq`/`Debug`, so
`assert_eq!` on the round-tripped value fails to compile. Switch to
comparing the canonical re-encoded proto bytes from two successive
Rust→Proto trips: equal bytes implies the Rust value was preserved.
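
The byte-comparison trick can be sketched on a toy type that, like `SourceExportStatementDetails`, has neither `PartialEq` nor `Debug`. `Details`, `encode`, and `decode` are hypothetical; the argument also assumes the encoding is deterministic.

```rust
/// Deliberately derives nothing, so `assert_eq!` on it would not compile.
pub struct Details {
    pub table: String,
    pub columns: u16,
}

pub fn encode(d: &Details) -> Vec<u8> {
    let mut out = d.columns.to_be_bytes().to_vec();
    out.extend_from_slice(d.table.as_bytes());
    out
}

pub fn decode(bytes: &[u8]) -> Option<Details> {
    if bytes.len() < 2 {
        return None;
    }
    let (head, tail) = bytes.split_at(2);
    Some(Details {
        columns: u16::from_be_bytes([head[0], head[1]]),
        table: String::from_utf8(tail.to_vec()).ok()?,
    })
}

/// Compare the bytes from two successive encode trips instead of the values.
pub fn roundtrips(bytes: &[u8]) -> bool {
    let Some(first) = decode(bytes) else { return true };
    let encoded_once = encode(&first);
    let Some(second) = decode(&encoded_once) else { return false };
    encode(&second) == encoded_once
}
```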
… and seed scripts

External-schema descriptors (untrusted upstream-database bytes) and the
pgwire frontend-message decoder (untrusted client bytes) are the two
biggest unfuzzed trust boundaries left. Also extends the avro fuzzing
to the encode path and adds proper seed corpora.

New fuzz crates:
  src/mysql-util/fuzz:      mysql_table_desc_proto_roundtrip
  src/postgres-util/fuzz:   postgres_table_desc_proto_roundtrip
  src/sql-server-util/fuzz: sql_server_table_desc_proto_roundtrip
  src/pgwire/fuzz:          codec_decode  (pumps the frontend codec
                                           over arbitrary bytes; uses
                                           a new `fuzz` feature on
                                           mz-pgwire that re-exports
                                           the internal `Codec`)

New target in src/avro/fuzz:
  encode_roundtrip          drives a Reader to build a Value, then
                            to_avro_datum + from_avro_datum and
                            asserts equality

Seed corpora:
  src/avro/fuzz/prepare-corpus.sh    copies benches/quickstop-null.avro
                                     into the reader_decode and
                                     encode_roundtrip corpora so
                                     libFuzzer doesn't waste cycles
                                     bouncing off the Avro magic header
  src/pgwire/fuzz/prepare-corpus.sh  emits 20 hand-crafted frontend
                                     frames (startup, query, parse,
                                     bind, execute, sync, sasl, etc.)
                                     so the fuzzer mutates from real
                                     wire structure
- Add "Expected arrow, found ..." to the keyword-disambig carve-out
  (a `map[...]` Subscript on `map` reparses through the MAP type
  grammar and fails with that message).
- Move the nightly cargo-fuzz step to `hetzner-x86-64-16cpu-32gb` and
  bump the timeout to 180 min so 22 targets at the default 5-min budget
  finish comfortably.
Replace the single `cargo-fuzz` step with a group of 10 per-crate
steps. Each fits in the 60-minute Buildkite cap and runs on a 16-CPU
agent (so each target's `--jobs=16` saturates the box). Steps have no
dependencies between them and will parallelize across agents subject
to availability.
Now that cargo-fuzz is preinstalled in the nightly ci-builder image,
each Buildkite step runs one crate (no multi-crate iteration), and
`cargo fuzz list` enumerates targets, the runner can be a ~30-line
wrapper. Drop:

  - toolchain detection (rely on the caller's default cargo being
    nightly; ci-builder's nightly flavor already arranges that, and
    local users prefix with `RUSTUP_TOOLCHAIN=nightly` or set their
    default to nightly)
  - install-on-demand of cargo-fuzz (handled in the Dockerfile)
  - the all-crates ALL_CRATES iteration (one step per crate now)
  - the `-fork=` experimentation (settled on `--jobs=`)

Single argument: the fuzz-crate path.
`PostgresKeyDesc::cols` is `Vec<u16>` but the proto carries it as
`Vec<u32>`; the decode path did `c.try_into().expect("values roundtrip")`,
which panicked on any value above 65535 — reachable from untrusted
proto bytes (the fuzz target found this from many distinct inputs).
Propagate the conversion error via `TryFromProtoError` instead.
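
A sketch of the fallible narrowing, using the standard library's `TryFromIntError` where the real code uses `TryFromProtoError`:

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Narrow wire-level `u32` column numbers to `u16`, failing on the first
/// value above `u16::MAX` instead of panicking.
pub fn decode_cols(proto_cols: &[u32]) -> Result<Vec<u16>, TryFromIntError> {
    proto_cols.iter().map(|&c| u16::try_from(c)).collect()
}
```

Collecting an iterator of `Result`s into a `Result<Vec<_>, _>` stops at the first error, so one bad column rejects the whole descriptor.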

Also fixes two build errors found while bringing up new fuzz crates:

- mysql-util fuzz target was importing via `mz_mysql_util::desc::*`,
  but `desc` is a private module; use the top-level re-exports.
- pgwire's `Codec` was a fully-private struct, blocking the
  `fuzz_exports` re-export. Make it `pub(crate)` so the same-crate
  `pub use` in lib.rs works under `#[cfg(feature = "fuzz")]`.
…nDesc

Same shape as the `ProtoPostgresKeyDesc.cols` fix: `col_num` is `u16`
in Rust but `u32` on the wire, and the decode path used
`.expect("u16 must roundtrip")` — reachable from untrusted proto
bytes (the fuzz target found this from many distinct inputs once the
key-desc panic was out of the way).

Also bumps `pgwire::codec::Codec` from `pub(crate)` to `pub` — needed
for the `fuzz_exports` re-export to compile (`pub use` can't widen a
`pub(crate)` item to a fully-public reexport). `mod codec` remains
private, so the type is only reachable via the `fuzz` feature.
…decode

Two real avro library bugs surfaced once seed coverage went up:

1. `encode_roundtrip` triggered a stack overflow on deeply-nested
   Values (encoder is recursive without a depth limit). ASan's stack
   guard intercepts as SIGSEGV — not catchable as a Rust panic. Fixing
   it requires changing the encoder to be iterative or to enforce a
   depth limit. Drop the target until that's done.

2. `reader_decode` triggers `capacity_overflow` panics in
   `Vec::with_capacity` when the wire claims an enormous block size
   (the decoder doesn't bound block sizes against remaining input).
   Wrap the body in `std::panic::catch_unwind` with a TODO note.
   The fuzz target can still find other classes of bug, but the known
   over-allocation pattern no longer fails CI.

Both items are tracked TODOs in mz-avro; fix in a separate commit.
def- added 2 commits May 29, 2026 13:29
Some crates have a single fuzz target — they were finishing the
Buildkite step in ~10 min, mostly compile time, with only 5 minutes
actually fuzzing. Compute MAX_SECONDS dynamically: `TOTAL_BUDGET`
(default 2700s = 45 min, leaving 15-min headroom under the 60-min
Buildkite cap for cold-cache builds) divided by the number of targets
in the crate.

`MAX_SECONDS` still wins if set explicitly (local quick runs).
`-fork=N` keeps the wall clock running past `-max_total_time` into a
single-threaded merge phase; observed ratio is ~1.4×, so a target
asked for 300s actually runs ~7 min wall. Drop the auto-scale default
`TOTAL_BUDGET` from 2700 to 1800 (30 min) — fits under the 60-min
Buildkite cap with 18 min headroom for cold-cache builds.

For an 8-target crate like `mz-repr`: 1800/8 = 225s × ~1.4 = ~5.2 min
wall × 8 = ~42 min + build = comfortably under 60.
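
The budget arithmetic above, written out as a sketch. The function names are hypothetical (the runner itself is a shell script), and the overhead factor is the observed ratio quoted in this message, not a constant of libFuzzer.

```rust
/// Per-target `-max_total_time`: the total asked budget split evenly.
pub fn max_seconds_per_target(total_budget_s: u64, targets: u64) -> u64 {
    total_budget_s / targets.max(1)
}

/// Estimated wall-clock minutes for the whole crate, given the observed
/// ratio between wall time and asked time in fork mode.
pub fn estimated_wall_minutes(total_budget_s: u64, targets: u64, overhead: f64) -> f64 {
    let per_target = max_seconds_per_target(total_budget_s, targets) as f64;
    per_target * overhead * targets as f64 / 60.0
}
```

For the 8-target case, `max_seconds_per_target(1800, 8)` is 225 and `estimated_wall_minutes(1800, 8, 1.4)` comes to about 42, matching the figures in the message.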
def- and others added 4 commits May 29, 2026 14:29
Avro encodes arrays and maps as a series of blocks prefixed by a count.
The block decoders fed that count straight into `Vec::with_capacity` /
the per-block loop, so a malicious or corrupt object-container file
claiming a count near `i64::MAX` triggered a `capacity_overflow` panic
under libFuzzer. Reject any block length beyond a fixed 2^24 cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
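
A sketch of the cap from the block-count commit above. `checked_block_len` is a hypothetical name; treating a negative count by its absolute value follows the Avro specification and is an assumption about how the real decoder handles it.

```rust
/// Upper bound on items per block, per the commit message (2^24).
pub const MAX_BLOCK_LEN: u64 = 1 << 24;

/// Validate a block count read off the wire before using it as a
/// `Vec::with_capacity` argument or a loop bound.
pub fn checked_block_len(count: i64) -> Result<usize, String> {
    // Per the Avro spec, a negative count means "abs(count) items, followed
    // by a byte size"; only the magnitude matters for the cap.
    let magnitude = count.unsigned_abs();
    if magnitude > MAX_BLOCK_LEN {
        return Err(format!("block length {} exceeds cap {}", magnitude, MAX_BLOCK_LEN));
    }
    Ok(magnitude as usize)
}
```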
Avro schemas can reference themselves (record fields naming the
enclosing record) and may nest arbitrarily through `array`/`map`/
`record`/`union`. Both `SchemaParser::parse_inner` and
`GeneralDeserializer::deserialize` recursed without a depth limit,
so a malicious schema plus matching wire bytes blew the stack under
libFuzzer. Cap each at 128 nested levels.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
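
A sketch of a depth-limited recursive descent, as in the schema/decoder commit above. The toy grammar (leading `[` characters) and `parse_nested` are hypothetical; only the limit of 128 comes from the commit message.

```rust
pub const MAX_DEPTH: usize = 128;

/// Count leading `[` characters recursively, refusing to go deeper than
/// `MAX_DEPTH` so hostile input cannot exhaust the stack.
pub fn parse_nested(input: &[u8], depth: usize) -> Result<usize, String> {
    if depth > MAX_DEPTH {
        return Err(format!("nesting deeper than {} levels", MAX_DEPTH));
    }
    match input.first() {
        Some(b'[') => parse_nested(&input[1..], depth + 1),
        _ => Ok(depth),
    }
}
```

The check runs before each descent, so an input of a million `[` bytes fails after 129 frames rather than overflowing.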
The previous commits cap per-block array/map lengths and bound schema
+ decode recursion depth in mz-avro itself, so the fuzz target no
longer needs to catch panics from those classes. Remove the wrapper
so any future panic surfaces as a real fuzz failure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The cargo-fuzz repr step timed out in nightly 16629: 7 targets ran,
the 8th (scalar_type_proto_roundtrip) got 2 seconds before Buildkite
killed the step at 60 min. Root cause: `row_proto_roundtrip` asked
for 225s but used 408s wall (1.81× ratio, not the 1.4× I assumed).
Targets with large corpora spend disproportionate time in the
fork-mode merge phase past `-max_total_time`.

Drop `TOTAL_BUDGET` default from 1800 to 1500 (asked) and add a
`WALL_BUDGET` gate that skips remaining targets once <2× MAX_SECONDS
of wall time is left. Skipped targets exit the script non-zero so a
build that ran out of time is still visibly under-tested.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
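
The `WALL_BUDGET` gate from the timeout commit above can be modelled as follows. This is a simulation sketch, not the shell script: `run_with_wall_gate` takes each target's wall time as input and ignores build time.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct Outcome {
    pub ran: Vec<String>,
    pub skipped: Vec<String>,
}

/// Run targets in order; once less than 2x `max_seconds` of wall budget is
/// left, skip everything that remains. `targets` pairs a name with the wall
/// seconds that target would actually consume.
pub fn run_with_wall_gate(targets: &[(&str, u64)], wall_budget_s: u64, max_seconds: u64) -> Outcome {
    let mut outcome = Outcome::default();
    let mut elapsed = 0u64;
    for &(name, wall_s) in targets {
        let remaining = wall_budget_s.saturating_sub(elapsed);
        if remaining < 2 * max_seconds {
            outcome.skipped.push(name.to_string());
        } else {
            outcome.ran.push(name.to_string());
            elapsed += wall_s;
        }
    }
    outcome
}

/// Skipped targets make the run fail, so an under-tested build stays visible.
pub fn exit_code(outcome: &Outcome) -> i32 {
    if outcome.skipped.is_empty() { 0 } else { 1 }
}
```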
def- and others added 5 commits May 30, 2026 01:00
Reader::read_block_next read the block byte length straight off the
wire as an i64 and fed it to `fill_buf` → `Vec::resize`, so a
malicious file claiming a multi-trillion-byte block sized the
allocation request past ASan's 1 TB limit (and would otherwise OOM).
Route the value through `util::safe_len` to enforce the existing
MAX_ALLOCATION_BYTES (512 MiB) cap that the rest of the decoder
already respects.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
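
A sketch of the length check described in the block-byte-length commit above. `checked_alloc_len` is a hypothetical stand-in; the real `util::safe_len` may have a different signature and error type. The 512 MiB figure is the one quoted in the commit message.

```rust
use std::convert::TryFrom;

pub const MAX_ALLOCATION_BYTES: usize = 512 * 1024 * 1024;

/// Convert a wire-supplied byte length into a `usize` that is safe to hand
/// to an allocation, rejecting negative and oversized values.
pub fn checked_alloc_len(wire_len: i64) -> Result<usize, String> {
    let len = usize::try_from(wire_len)
        .map_err(|_| format!("length {} does not fit in usize", wire_len))?;
    if len > MAX_ALLOCATION_BYTES {
        return Err(format!("length {} exceeds cap {}", len, MAX_ALLOCATION_BYTES));
    }
    Ok(len)
}
```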
`parse_table_factor` speculatively tries `parse_derived_table_factor`
inside a `maybe_parse`, falling back to `parse_table_and_joins` on
failure. Both branches recurse on every `(`, so unbalanced
parentheses around multiple SELECTs (e.g. `(((SELECT * FROM (((SELECT
* FROM ...`) explore an exponential backtrack tree. The 128-deep
`RECURSION_LIMIT` bounds the *stack* but not the total work — fuzz
inputs of ~270 bytes parsed for more than 30 seconds. Cap
`maybe_parse` failures at 10_000 per parse; valid SQL needs a small
constant per token, far below the cap.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
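
A sketch of a failure budget on speculative parsing, as in the `maybe_parse` commit above. The toy `Parser` tracks only a position and a counter; the real parser's fields and error type are not shown in the commit message, so every name here is an assumption.

```rust
pub struct Parser {
    pub pos: usize,
    failures: usize,
    max_failures: usize,
}

impl Parser {
    pub fn new(max_failures: usize) -> Self {
        Parser { pos: 0, failures: 0, max_failures }
    }

    /// Try `attempt`; on failure rewind and report `Ok(None)` so the caller
    /// can fall back, unless the failure budget for this parse is spent, in
    /// which case the whole parse is abandoned with a hard error.
    pub fn maybe_parse<T>(
        &mut self,
        attempt: impl FnOnce(&mut Parser) -> Result<T, String>,
    ) -> Result<Option<T>, String> {
        let start = self.pos;
        match attempt(self) {
            Ok(value) => Ok(Some(value)),
            Err(_) => {
                self.pos = start;
                self.failures += 1;
                if self.failures > self.max_failures {
                    Err(format!("gave up after {} failed speculative parses", self.failures))
                } else {
                    Ok(None)
                }
            }
        }
    }
}
```

Because the counter lives on the parser and not on the call stack, it bounds total backtracking work across all branches, which the recursion limit alone does not.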
The 8-target repr step ran tight under the 60-min Buildkite cap.
With ~1.8-2.05× wall-clock overhead on top of asked time, 7 targets
finished but the 8th got skipped by the WALL_BUDGET gate (build
16635). Split the 8 targets into two parallel steps of 4 each; each
step now gets full TOTAL_BUDGET = 1500s asked = ~45 min wall with
plenty of headroom for cold-cache compile.

cargo-fuzz.sh now accepts optional target names as positional args
after the crate path; with none, it falls back to `cargo fuzz list`
as before.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The display path for `RawDataType::Other` already quoted `"map"` to
keep it off the `parse_map_type` dispatch. The same disambiguation
is needed for every keyword that `parse_data_type` *renames* to a
different canonical type (`STRING` → text, `BIGINT` → int8, `BYTES`
→ bytea, …). Without quotes those names lose information through a
display + reparse cycle.

Keywords whose canonical name matches the keyword text (`bpchar`,
`varchar`, `time`, `timestamp`, `timestamptz`) are left unquoted —
they already round-trip via the keyword path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`doc_function` printed the function name via simple-mode display,
which emits a bare keyword for names like `map`, `array`, `row`,
etc. Reparsing then dispatches through the special-grammar parser
(`(Kw, LParen)` in `parse_prefix`) instead of a regular function
call, breaking the pretty + reparse round-trip. Mirror the same
quote carve-out the `AstDisplay for Function` impl uses.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>