tests: Introduce new fuzz tests & fix bugs found#36772
Draft
def- wants to merge 39 commits into
Five fuzz crates exercising the same property — bytes/SQL parse into a
Rust AST/value and survive a re-encode + re-parse losslessly:
src/sql-parser/fuzz:
- parse_pretty_roundtrip: parser <-> sql-pretty
- parse_display_roundtrip: parser <-> AstDisplay
- parse_expr_roundtrip: parse_expr <-> AstDisplay
src/expr/fuzz:
- eval_error_proto_roundtrip
src/repr/fuzz:
- row_proto_roundtrip
- row_codec_roundtrip
src/storage-types/fuzz:
- source_data_proto_roundtrip
- dataflow_error_proto_roundtrip
src/catalog-protos/fuzz:
- catalog_objects_serde_roundtrip (serde-based, not prost)
Repo-wide runner at ci/test/cargo-fuzz.sh; nightly Buildkite step (main
only) runs it via the nightly ci-builder, which now preinstalls cargo-fuzz.
`X'...'` content is allowed to contain `'` (escaped as `''` by the lexer), but the printer was emitting it verbatim — a value with an embedded quote closed the literal prematurely and produced unparseable output. Escape on the way out, mirroring `Value::String`.
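A minimal sketch of the escape-on-output idea, assuming a hypothetical free function `display_hex_literal` rather than the actual `AstDisplay` implementation: every embedded `'` is doubled so the literal cannot close early.

```rust
/// Renders the body of an `X'...'` literal, doubling embedded single quotes.
/// Hypothetical helper for illustration; not the real printer code.
fn display_hex_literal(content: &str) -> String {
    let mut out = String::with_capacity(content.len() + 3);
    out.push_str("X'");
    for ch in content.chars() {
        if ch == '\'' {
            // A lone quote would terminate the literal, so emit it twice.
            out.push_str("''");
        } else {
            out.push(ch);
        }
    }
    out.push('\'');
    out
}
```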
The `.` token has very high precedence and both the lexer and parser greedily extend adjacent tokens: `1.x` tokenizes the number `1.` and leaves `x` as an alias, and `'a'::T.x` consumes `T.x` as a qualified type name. So a receiver must look atomic on the way out — wrapped in delimiters or self-terminating — or the dot reattaches to the receiver on reparse and produces a different AST. Add a `write_dot_receiver` helper that parenthesizes anything outside a whitelist of atom-like exprs, and use it from `FieldAccess` and `WildcardAccess` display.
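The receiver rule can be sketched on a toy AST. The `Expr` enum, `is_atom_like`, and the whitelist contents below are assumptions made for illustration; only the helper name `write_dot_receiver` comes from the commit message.

```rust
use std::fmt::{self, Write};

// Toy expression type; the real AST has many more variants.
enum Expr {
    Identifier(String),
    Number(String),
    Nested(Box<Expr>),
    Cast { expr: Box<Expr>, ty: String },
    FieldAccess { expr: Box<Expr>, field: String },
}

// Assumed whitelist: only forms that cannot absorb a following `.`.
fn is_atom_like(expr: &Expr) -> bool {
    matches!(expr, Expr::Identifier(_) | Expr::Nested(_))
}

// Parenthesize any receiver that is not on the whitelist.
fn write_dot_receiver(out: &mut String, expr: &Expr) -> fmt::Result {
    if is_atom_like(expr) {
        write_expr(out, expr)
    } else {
        out.write_char('(')?;
        write_expr(out, expr)?;
        out.write_char(')')
    }
}

fn write_expr(out: &mut String, expr: &Expr) -> fmt::Result {
    match expr {
        Expr::Identifier(name) => out.write_str(name),
        Expr::Number(digits) => out.write_str(digits),
        Expr::Nested(inner) => {
            out.write_char('(')?;
            write_expr(out, inner)?;
            out.write_char(')')
        }
        Expr::Cast { expr, ty } => {
            write_expr(out, expr)?;
            write!(out, "::{}", ty)
        }
        Expr::FieldAccess { expr, field } => {
            write_dot_receiver(out, expr)?;
            write!(out, ".{}", field)
        }
    }
}

fn render(expr: &Expr) -> String {
    let mut out = String::new();
    write_expr(&mut out, expr).expect("writing to a String cannot fail");
    out
}
```

With this sketch a numeric receiver renders as `(1).x` and a cast receiver as `(a::T).x`, so neither the lexer nor the type-name parser can pull the dot into the receiver.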
…r keywords
Names like `position`, `extract`, `trim`, `substring`, `normalize`, `map`, `array`, `nullif`, `exists`, `row`, `coalesce`, `greatest`, `least` reach a special parser dispatch when followed by `(`: `POSITION(<expr> IN <expr>)`, `MAP[K => V]`, etc. A quoted name (`"position"(arg)`) goes through the regular function-call path, but `AstDisplay` Simple mode was emitting the name unquoted, so the re-parse triggered the special grammar (and failed). Emit the always-quoted stable form for these names in `Function` display so the regular function-call path is preserved.
`<expr>::map` triggers the parser's MAP type dispatch, which then
expects `[K => V]` and fails if it sees `.` or other syntax. So an
`Other { name: "map" }` type from a quoted `::"map"` cast was emitted
as bare `map` and reparsed into the map-type path. Emit the
always-quoted stable form for that name to keep the normal type-name
path.
Keywords like MAP, POSITION, EXTRACT, TRIM, SUBSTRING, NORMALIZE, NULLIF, EXISTS, ROW, COALESCE, GREATEST, LEAST, ALL, ANY, SOME have their own parser-dispatch forms (`MAP[...]`, `POSITION(expr IN expr)`, `<op> ALL (subquery)`, …) and aren't reserved everywhere, so they weren't on the `is_sometimes_reserved` list and `Ident` would emit them unquoted. But unquoted in expression position those names re-trigger the special grammar at parse time. Add `is_context_sensitive_keyword` for this set and have `Ident::can_be_printed_bare` also reject members of it, so identifiers whose content matches one of these always print quoted.
`<left> <op> ANY/ALL (...)` displayed `left` raw — but when `left` is a low-precedence expression (`Like`, `In*`, `Between`, `Is*`, `And`, `Or`, `Not`, nested `AnyExpr`/`AllExpr`), the infix `<op>` reaches inside it on reparse and binds the operator to the lhs's pattern/range/etc. instead of to the lhs as a whole, producing a different AST. Add a `write_quantified_left` helper that parenthesizes these cases and leaves atom-like lhs (incl. plain `Op`, which has its own precedence handling) unwrapped.
`Decimal::from_packed_bcd` calls the C function `decPackedToNumber`,
which segfaults on empty bcd input. Reachable from untrusted proto
bytes (anyone able to send a `ProtoRow` could crash the process via a
`ProtoNumeric { bcd: [] }` datum). Reject the empty case before
descending into the FFI.
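A sketch of the guard, assuming a hypothetical `unchecked_from_packed_bcd` that stands in for the FFI-backed call (the real function and its return type are not reproduced here). The point is only that the empty case is rejected in safe Rust before anything reaches C.

```rust
#[derive(Debug, PartialEq)]
enum BcdError {
    Empty,
}

// Stand-in for the FFI-backed conversion; it just echoes its inputs.
fn unchecked_from_packed_bcd(bcd: &[u8], scale: i32) -> (usize, i32) {
    (bcd.len(), scale)
}

// Reject empty input up front so the C side never sees it.
fn checked_from_packed_bcd(bcd: &[u8], scale: i32) -> Result<(usize, i32), BcdError> {
    if bcd.is_empty() {
        return Err(BcdError::Empty);
    }
    Ok(unchecked_from_packed_bcd(bcd, scale))
}
```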
`push_range_with` returned a `Result`, but the proto decode path unconditionally `.expect(...)`ed it — meaning any `ProtoRange` that `push_range_with` rejects (e.g. lower > upper from an attacker-crafted proto) panicked the process. Propagate the error to the caller via `?`, matching the rest of the proto decode path.
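The change in shape, sketched with hypothetical stand-ins (`push_range`, `InvalidRange`, and a tuple-based range are assumptions, not the real `Row` API): the fallible call is followed by `?` so a rejected range becomes a decode error instead of a panic.

```rust
#[derive(Debug, PartialEq)]
struct InvalidRange {
    lower: i64,
    upper: i64,
}

// Stand-in for the fallible push: rejects ranges whose bounds are inverted.
fn push_range(out: &mut Vec<(i64, i64)>, lower: i64, upper: i64) -> Result<(), InvalidRange> {
    if lower > upper {
        return Err(InvalidRange { lower, upper });
    }
    out.push((lower, upper));
    Ok(())
}

// Decode path: `?` hands the error to the caller rather than `.expect(...)`.
fn decode_ranges(protos: &[(i64, i64)]) -> Result<Vec<(i64, i64)>, InvalidRange> {
    let mut out = Vec::new();
    for &(lower, upper) in protos {
        push_range(&mut out, lower, upper)?;
    }
    Ok(out)
}
```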
`<CheckedTimestamp<_> as RustType<ProtoNaiveDateTime>>::from_proto`
constructed the struct directly (`Self { t: ... }`), bypassing the
range validation that `from_timestamplike` enforces. Out-of-range
values pushed into a `Row` cleanly, but `read_datum` then called
`from_timestamplike(...).expect(...)` while iterating and panicked —
reachable from untrusted proto bytes.
Go through `from_timestamplike` in `from_proto` so the value is
rejected at decode time.
Same shape as the `CheckedTimestamp` fix: `Date::from_proto`
constructed `Date { days: proto.days }` directly, bypassing the range
validation that `from_pg_epoch` enforces. Out-of-range days pushed
into a `Row` cleanly, then `read_datum` panicked on
`Date::from_pg_epoch(days).expect(...)` while iterating.
Go through `from_pg_epoch` in `from_proto` so the value is rejected
at decode time.
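Both fixes follow the same pattern, sketched here on a toy `Date`. The bounds, the error type, and the `ProtoDate` struct are illustrative assumptions; only the idea of routing `from_proto` through the validating constructor comes from the commits.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
struct Date {
    days: i32,
}

#[derive(Debug, PartialEq)]
struct OutOfRange(i32);

impl Date {
    // Illustrative bounds only; the real limits differ.
    const MIN_DAYS: i32 = -1_000;
    const MAX_DAYS: i32 = 1_000;

    // The single place where the range invariant is enforced.
    fn from_pg_epoch(days: i32) -> Result<Self, OutOfRange> {
        if (Self::MIN_DAYS..=Self::MAX_DAYS).contains(&days) {
            Ok(Date { days })
        } else {
            Err(OutOfRange(days))
        }
    }
}

struct ProtoDate {
    days: i32,
}

// Decode through the validating constructor instead of `Date { days }`.
fn date_from_proto(proto: ProtoDate) -> Result<Date, OutOfRange> {
    Date::from_pg_epoch(proto.days)
}
```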
…o decoder
Four new fuzz targets covering high-blast-radius surfaces:
src/repr/fuzz:
- scalar_type_proto_roundtrip: ProtoScalarType <-> SqlScalarType
- column_type_proto_roundtrip: ProtoColumnType <-> SqlColumnType
- relation_desc_proto_roundtrip: ProtoRelationDesc <-> RelationDesc
src/avro/fuzz (new crate):
- reader_decode: drive a `mz_avro::Reader` over arbitrary bytes
(Avro is the wire format for Kafka sources, so any
crash here is reachable from untrusted broker bytes)
Each fuzz crate has its own `[workspace]` (required by cargo-fuzz to use nightly Rust without forcing the rest of the tree onto nightly), so each maintains its own `target/` adjacent to its Cargo.toml. Some build-script deps (notably `protobuf-native`) extract `.proto` files into that tree — `buf` then picks them up and trips on them. Exclude the fuzz target dirs both from buf's build scan (template) and from `generate-buf-config.py`'s proto-file globbing (so the generated `breaking.ignore` list also stays clean).
`SqlScalarType::{List,Record,Map}` from `ProtoScalarType` called
`x.custom_id.map(|id| id.into_rust().unwrap())` — a malformed
`ProtoCatalogItemId` inside any of those three variants panicked the
process. Reachable from untrusted proto bytes (an attacker-crafted
`ProtoRow` containing a list/record/map value with a bad custom_id).
Propagate via `transpose()?` instead.
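`Option::transpose` turns an `Option<Result<T, E>>` into a `Result<Option<T>, E>`, which is what lets `?` replace the `unwrap`. A small sketch with hypothetical types (`ProtoId`, `CatalogItemId`, `BadId`, and `ListType` are stand-ins):

```rust
#[derive(Debug, PartialEq)]
struct CatalogItemId(u64);

#[derive(Debug, PartialEq)]
struct BadId;

// Stand-in proto message: a missing inner value models a malformed id.
struct ProtoId {
    value: Option<u64>,
}

fn id_from_proto(proto: ProtoId) -> Result<CatalogItemId, BadId> {
    proto.value.map(CatalogItemId).ok_or(BadId)
}

#[derive(Debug, PartialEq)]
struct ListType {
    custom_id: Option<CatalogItemId>,
}

// `map` yields Option<Result<..>>; `transpose` flips it so `?` can propagate.
fn list_type_from_proto(custom_id: Option<ProtoId>) -> Result<ListType, BadId> {
    Ok(ListType {
        custom_id: custom_id.map(id_from_proto).transpose()?,
    })
}
```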
The non-migration branch zipped `proto.names` and `proto.metadata` via `zip_eq`, which panics on length mismatch — reachable from untrusted proto bytes. Check the lengths explicitly and return `InvalidFieldError`.
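A sketch of the explicit length check, using a hypothetical `FieldLengthMismatch` error in place of the real `InvalidFieldError` variant. Once the lengths are known to match, a plain `zip` is safe and nothing can panic.

```rust
#[derive(Debug, PartialEq)]
struct FieldLengthMismatch {
    names: usize,
    metadata: usize,
}

// Validate first, then pair up; never rely on a panicking zip.
fn pair_names_with_metadata(
    names: Vec<String>,
    metadata: Vec<u32>,
) -> Result<Vec<(String, u32)>, FieldLengthMismatch> {
    if names.len() != metadata.len() {
        return Err(FieldLengthMismatch {
            names: names.len(),
            metadata: metadata.len(),
        });
    }
    Ok(names.into_iter().zip(metadata).collect())
}
```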
…ationDetails
Three more proto round-trip targets, picked for similarity to bug
classes the previous rounds already turned up:
src/repr/fuzz:
- interval_proto_roundtrip: ProtoInterval <-> Interval
(Interval arithmetic shares the
timestamp validation surface)
- mz_acl_item_proto_roundtrip: ProtoMzAclItem <-> MzAclItem
(access-control values; multi-field
proto with several enum-coded slots)
src/storage-types/fuzz:
- postgres_publication_details_proto_roundtrip:
ProtoPostgresSourcePublicationDetails <-> ...
Two more proto round-trip targets:
src/repr/fuzz:
- acl_item_proto_roundtrip: ProtoAclItem <-> AclItem
(PostgreSQL-style ACL entry, distinct
from MzAclItem)
src/storage-types/fuzz:
- source_export_statement_details_proto_roundtrip:
ProtoSourceExportStatementDetails <-> SourceExportStatementDetails
(5-variant enum: Postgres / MySql / SqlServer / LoadGenerator /
Kafka — lots of conversion branches to round-trip)
…r display
The earlier change made `Ident::can_be_printed_bare` reject members of `is_context_sensitive_keyword` (MAP, POSITION, EXTRACT, ALL, ANY, …) so that round-trip through sql-pretty preserved them. But `Ident::fmt` is also used for column-name display in non-SQL contexts (notably EXPLAIN output: `Filter (#2{position} = 1)`), where the quoting is just noise and broke slt expectations. Revert the global change. The fuzz targets that exercised this round trip get a narrow carve-out (skip on the `Expected left square bracket` / `Expected left parenthesis` / `Expected IN, found ...` reparse errors that come from a context-sensitive keyword landing in a position the parser dispatches on).
`SourceExportStatementDetails` doesn't derive `PartialEq`/`Debug`, so `assert_eq!` on the round-tripped value fails to compile. Switch to comparing the canonical re-encoded proto bytes from two successive Rust→Proto trips: equal bytes implies the Rust value was preserved.
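The byte-comparison check can be sketched on a toy type that likewise has no `PartialEq` or `Debug`. The `Details` struct and its ad hoc `encode`/`decode` are assumptions standing in for the real prost conversions.

```rust
// Deliberately derives nothing, like the type described above.
struct Details {
    id: u32,
    name: String,
}

fn encode(d: &Details) -> Vec<u8> {
    let mut out = d.id.to_le_bytes().to_vec();
    out.extend_from_slice(d.name.as_bytes());
    out
}

fn decode(bytes: &[u8]) -> Option<Details> {
    if bytes.len() < 4 {
        return None;
    }
    let id = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let name = String::from_utf8(bytes[4..].to_vec()).ok()?;
    Some(Details { id, name })
}

// Two successive value -> bytes trips must agree byte for byte.
fn roundtrip_preserves(input: &[u8]) -> bool {
    let first = match decode(input) {
        Some(v) => v,
        None => return true, // undecodable input is not a round-trip failure
    };
    let encoded_once = encode(&first);
    let second = match decode(&encoded_once) {
        Some(v) => v,
        None => return false, // our own output must always decode
    };
    encode(&second) == encoded_once
}
```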
… and seed scripts
External-schema descriptors (untrusted upstream-database bytes) and the
pgwire frontend-message decoder (untrusted client bytes) are the two
biggest unfuzzed trust boundaries left. Also extends the avro fuzzing
to the encode path and adds proper seed corpora.
New fuzz crates:
src/mysql-util/fuzz: mysql_table_desc_proto_roundtrip
src/postgres-util/fuzz: postgres_table_desc_proto_roundtrip
src/sql-server-util/fuzz: sql_server_table_desc_proto_roundtrip
src/pgwire/fuzz: codec_decode (pumps the frontend codec
over arbitrary bytes; uses
a new `fuzz` feature on
mz-pgwire that re-exports
the internal `Codec`)
New target in src/avro/fuzz:
encode_roundtrip drives a Reader to build a Value, then
to_avro_datum + from_avro_datum and
asserts equality
Seed corpora:
src/avro/fuzz/prepare-corpus.sh copies benches/quickstop-null.avro
into the reader_decode and
encode_roundtrip corpora so
libFuzzer doesn't waste cycles
bouncing off the Avro magic header
src/pgwire/fuzz/prepare-corpus.sh emits 20 hand-crafted frontend
frames (startup, query, parse,
bind, execute, sync, sasl, etc.)
so the fuzzer mutates from real
wire structure
- Add "Expected arrow, found ..." to the keyword-disambig carve-out (a `map[...]` Subscript on `map` reparses through the MAP type grammar and fails with that message).
- Move the nightly cargo-fuzz step to `hetzner-x86-64-16cpu-32gb` and bump the timeout to 180 min so 22 targets at the default 5-min budget finish comfortably.
Replace the single `cargo-fuzz` step with a group of 10 per-crate steps. Each fits in the 60-minute Buildkite cap and runs on a 16-CPU agent (so each target's `--jobs=16` saturates the box). Steps have no dependencies between them and will parallelize across agents subject to availability.
Now that cargo-fuzz is preinstalled in the nightly ci-builder image,
each Buildkite step runs one crate (no multi-crate iteration), and
`cargo fuzz list` enumerates targets, the runner can be a ~30-line
wrapper. Drop:
- toolchain detection (rely on the caller's default cargo being
nightly; ci-builder's nightly flavor already arranges that, and
local users prefix with `RUSTUP_TOOLCHAIN=nightly` or set their
default to nightly)
- install-on-demand of cargo-fuzz (handled in the Dockerfile)
- the all-crates ALL_CRATES iteration (one step per crate now)
- the `-fork=` experimentation (settled on `--jobs=`)
Single argument: the fuzz-crate path.
`PostgresKeyDesc::cols` is `Vec<u16>` but the proto carries it as
`Vec<u32>`; the decode path did `c.try_into().expect("values roundtrip")`,
which panicked on any value above 65535 — reachable from untrusted
proto bytes (the fuzz target found this from many distinct inputs).
Propagate the conversion error via `TryFromProtoError` instead.
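A sketch of the narrowing conversion with the error propagated. `ColOutOfRange` is a hypothetical error type standing in for `TryFromProtoError`; `u16::try_from` is the standard-library checked conversion.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
struct ColOutOfRange(u32);

// Narrow each wire-level u32 to u16, failing on the first value that does not fit.
fn cols_from_proto(cols: &[u32]) -> Result<Vec<u16>, ColOutOfRange> {
    cols.iter()
        .map(|&c| u16::try_from(c).map_err(|_| ColOutOfRange(c)))
        .collect()
}
```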
Also fixes two build errors found while bringing up new fuzz crates:
- mysql-util fuzz target was importing via `mz_mysql_util::desc::*`,
but `desc` is a private module; use the top-level re-exports.
- pgwire's `Codec` was a fully-private struct, blocking the
`fuzz_exports` re-export. Make it `pub(crate)` so the same-crate
`pub use` in lib.rs works under `#[cfg(feature = "fuzz")]`.
…nDesc
Same shape as the `ProtoPostgresKeyDesc.cols` fix: `col_num` is `u16`
in Rust but `u32` on the wire, and the decode path used
`.expect("u16 must roundtrip")` — reachable from untrusted proto
bytes (the fuzz target found this from many distinct inputs once the
key-desc panic was out of the way).
Also bumps `pgwire::codec::Codec` from `pub(crate)` to `pub` — needed
for the `fuzz_exports` re-export to compile (`pub use` can't widen a
`pub(crate)` item to a fully-public reexport). `mod codec` remains
private, so the type is only reachable via the `fuzz` feature.
…decode
Two real avro library bugs surfaced once seed coverage went up:
1. `encode_roundtrip` triggered a stack overflow on deeply-nested Values (encoder is recursive without a depth limit). ASAN's stack guard intercepts it as SIGSEGV, which is not catchable as a Rust panic. Fixing it requires changing the encoder to be iterative or to enforce a depth limit. Drop the target until that's done.
2. `reader_decode` triggers `capacity_overflow` panics in `Vec::with_capacity` when the wire claims an enormous block size (the decoder doesn't bound block sizes against remaining input). Wrap the body in `std::panic::catch_unwind` with a TODO note. The fuzz target keeps finding other classes of bug, but the known over-allocation pattern no longer fails CI.
Both items are tracked TODOs in mz-avro; fix in a separate commit.
Some crates have a single fuzz target — they were finishing the Buildkite step in ~10 min, mostly compile time, with only 5 minutes actually fuzzing. Compute MAX_SECONDS dynamically: `TOTAL_BUDGET` (default 2700s = 45 min, leaving 15-min headroom under the 60-min Buildkite cap for cold-cache builds) divided by the number of targets in the crate. `MAX_SECONDS` still wins if set explicitly (local quick runs).
`-fork=N` keeps the wall clock running past `-max_total_time` into a single-threaded merge phase; observed ratio is ~1.4×, so a target asked for 300s actually runs ~7 min wall. Drop the auto-scale default `TOTAL_BUDGET` from 2700 to 1800 (30 min asked, ~42 min wall), which fits under the 60-min Buildkite cap with 18 min headroom for cold-cache builds. For an 8-target crate like `mz-repr`: 1800/8 = 225s asked per target; at ~1.4× that is ~5.2 min wall each, or ~42 min for all 8, which plus build time stays comfortably under 60.
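The budget arithmetic, restated as a small sketch. The runner itself is a shell script; the function names here are hypothetical, and the overhead is passed as an integer percentage (140 for the observed ~1.4×) to keep the arithmetic exact.

```rust
// Per-target asked time: an explicit MAX_SECONDS wins, otherwise split the budget.
fn max_seconds(explicit: Option<u64>, total_budget_secs: u64, n_targets: u64) -> u64 {
    // `max(1)` guards against dividing by zero for an empty crate.
    explicit.unwrap_or(total_budget_secs / n_targets.max(1))
}

// Estimated wall time for all targets given a merge-phase overhead in percent.
fn estimated_wall_secs(asked_secs: u64, n_targets: u64, overhead_percent: u64) -> u64 {
    asked_secs * n_targets * overhead_percent / 100
}
```

For the `mz-repr` case, `max_seconds(None, 1800, 8)` gives 225 and `estimated_wall_secs(225, 8, 140)` gives 2520 seconds, i.e. 42 minutes before build time.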
Avro encodes arrays and maps as a series of blocks prefixed by a count. The block decoders fed that count straight into `Vec::with_capacity` / the per-block loop, so a malicious or corrupt object-container file claiming a count near `i64::MAX` triggered a `capacity_overflow` panic under libfuzzer. Reject any block length beyond a fixed 2^24 cap.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
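A sketch of the cap, assuming hypothetical names (`MAX_BLOCK_LEN`, `checked_block_len`, `BlockTooLarge`). Avro uses a negative count to signal that a byte size follows, so the sketch validates the magnitude of the count.

```rust
// 2^24 items, matching the cap described above.
const MAX_BLOCK_LEN: u64 = 1 << 24;

#[derive(Debug, PartialEq)]
struct BlockTooLarge(i64);

// Validate a wire-level block count before it is used as a capacity.
fn checked_block_len(count: i64) -> Result<usize, BlockTooLarge> {
    // Only the absolute value is an item count; the sign is a format flag.
    let magnitude = count.unsigned_abs();
    if magnitude > MAX_BLOCK_LEN {
        return Err(BlockTooLarge(count));
    }
    Ok(magnitude as usize)
}

// Allocation happens only after the count has been bounded.
fn alloc_block(count: i64) -> Result<Vec<u64>, BlockTooLarge> {
    Ok(Vec::with_capacity(checked_block_len(count)?))
}
```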
Avro schemas can reference themselves (record fields naming the enclosing record) and may nest arbitrarily through `array`/`map`/`record`/`union`. Both `SchemaParser::parse_inner` and `GeneralDeserializer::deserialize` recursed without a depth limit, so a malicious schema plus matching wire bytes blew the stack under libfuzzer. Cap each at 128 nested levels.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
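A depth-limited recursion can be sketched on a toy schema type. The `Schema` enum, `TooDeep`, and the exact boundary handling are assumptions; only the limit of 128 comes from the commit.

```rust
const MAX_DEPTH: usize = 128;

// Toy schema: a leaf or an array nesting another schema.
enum Schema {
    Long,
    Array(Box<Schema>),
}

#[derive(Debug, PartialEq)]
struct TooDeep;

// Walks the schema, threading the current depth and bailing out past the cap.
// Returns the nesting depth of the leaf on success.
fn walk(schema: &Schema, depth: usize) -> Result<usize, TooDeep> {
    if depth > MAX_DEPTH {
        return Err(TooDeep);
    }
    match schema {
        Schema::Long => Ok(depth),
        Schema::Array(inner) => walk(inner, depth + 1),
    }
}

// Builds `levels` nested arrays around a leaf, for exercising the limit.
fn nested(levels: usize) -> Schema {
    let mut schema = Schema::Long;
    for _ in 0..levels {
        schema = Schema::Array(Box::new(schema));
    }
    schema
}
```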
The previous commits cap per-block array/map lengths and bound schema + decode recursion depth in mz-avro itself, so the fuzz target no longer needs to catch panics from those classes. Remove the wrapper so any future panic surfaces as a real fuzz failure.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The cargo-fuzz repr step timed out in nightly 16629: 7 targets ran, the 8th (scalar_type_proto_roundtrip) got 2 seconds before Buildkite killed the step at 60 min. Root cause: `row_proto_roundtrip` asked for 225s but used 408s wall (1.81× ratio, not the 1.4× I assumed). Targets with large corpora spend disproportionate time in the fork-mode merge phase past `-max_total_time`. Drop `TOTAL_BUDGET` default from 1800 to 1500 (asked) and add a `WALL_BUDGET` gate that skips remaining targets once <2× MAX_SECONDS of wall time is left. Skipped targets exit the script non-zero so a build that ran out of time is still visibly under-tested.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
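The gate condition, written out as a sketch. The actual check lives in the shell runner; `should_skip_target` and its parameters are hypothetical names.

```rust
// Skip a target when less than twice its asked time remains on the wall clock.
fn should_skip_target(wall_budget_secs: u64, elapsed_secs: u64, max_seconds: u64) -> bool {
    let remaining = wall_budget_secs.saturating_sub(elapsed_secs);
    remaining < 2 * max_seconds
}
```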
Reader::read_block_next read the block byte length straight off the wire as an i64 and fed it to `fill_buf` → `Vec::resize`, so a malicious file claiming a multi-trillion-byte block sized the allocation request past ASan's 1 TB limit (and would otherwise OOM). Route the value through `util::safe_len` to enforce the existing MAX_ALLOCATION_BYTES (512 MiB) cap that the rest of the decoder already respects.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`parse_table_factor` speculatively tries `parse_derived_table_factor` inside a `maybe_parse`, falling back to `parse_table_and_joins` on failure. Both branches recurse on every `(`, so unbalanced parentheses around multiple SELECTs (e.g. `(((SELECT * FROM (((SELECT * FROM ...`) explore an exponential backtrack tree. The 128-deep `RECURSION_LIMIT` bounds the *stack* but not the total work; fuzz inputs of ~270 bytes parsed for more than 30 seconds. Cap `maybe_parse` failures at 10_000 per parse; valid SQL needs a small constant per token, far below the cap.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
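A sketch of a failure-counting speculative parse. The `Parser` struct, the `ParseError` variants, and the `Result<Option<T>, _>` return type are assumptions made for illustration; the real `maybe_parse` signature may differ. Only the 10_000 cap comes from the commit.

```rust
const MAX_SPECULATIVE_FAILURES: usize = 10_000;

#[derive(Debug, PartialEq)]
enum ParseError {
    Syntax,
    BacktrackLimit,
}

struct Parser {
    pos: usize,
    failures: usize,
}

impl Parser {
    // Try `f`; on a syntax error rewind and return `Ok(None)`, but count the
    // failure and give up on the whole parse once the cap is exceeded.
    fn maybe_parse<T>(
        &mut self,
        f: impl FnOnce(&mut Parser) -> Result<T, ParseError>,
    ) -> Result<Option<T>, ParseError> {
        let start = self.pos;
        match f(self) {
            Ok(value) => Ok(Some(value)),
            // A limit hit deeper in the tree must not be swallowed here.
            Err(ParseError::BacktrackLimit) => Err(ParseError::BacktrackLimit),
            Err(ParseError::Syntax) => {
                self.failures += 1;
                if self.failures > MAX_SPECULATIVE_FAILURES {
                    return Err(ParseError::BacktrackLimit);
                }
                self.pos = start;
                Ok(None)
            }
        }
    }
}
```

Counting failures per parse bounds total work regardless of how the backtracking tree is shaped, which a stack-depth limit alone cannot do.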
The 8-target repr step ran tight under the 60-min Buildkite cap. With ~1.8-2.05× wall-clock overhead on top of asked time, 7 targets finished but the 8th got skipped by the WALL_BUDGET gate (build 16635). Split the 8 targets into two parallel steps of 4 each; each step now gets full TOTAL_BUDGET = 1500s asked = ~45 min wall with plenty of headroom for cold-cache compile. cargo-fuzz.sh now accepts optional target names as positional args after the crate path; with none, it falls back to `cargo fuzz list` as before.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The display path for `RawDataType::Other` already quoted `"map"` to keep it off the `parse_map_type` dispatch. The same disambiguation is needed for every keyword that `parse_data_type` *renames* to a different canonical type (`STRING` → text, `BIGINT` → int8, `BYTES` → bytea, …). Without quotes those names lose information through a display + reparse cycle. Keywords whose canonical name matches the keyword text (`bpchar`, `varchar`, `time`, `timestamp`, `timestamptz`) are left unquoted because they already round-trip via the keyword path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`doc_function` printed the function name via simple-mode display, which emits a bare keyword for names like `map`, `array`, `row`, etc. Reparsing then dispatches through the special-grammar parser (`(Kw, LParen)` in `parse_prefix`) instead of a regular function call, breaking the pretty + reparse round-trip. Mirror the same quote carve-out the `AstDisplay for Function` impl uses.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Inspired by #36764 and Antithesis PoC
See individual commits