
fix stale output capacity in Lz4BlockDecompressor decompress #64091

Open
sahvx655-wq wants to merge 1 commit into apache:master from sahvx655-wq:lz4block-output-capacity

@sahvx655-wq sahvx655-wq commented Jun 3, 2026

reading the lz4 block path in decompressor.cpp: remaining_output_len is computed once per large block and then passed to LZ4_decompress_safe as the destination capacity (dstCapacity) for every small block inside it. output_ptr advances by each small block's decompressed length, but the capacity is never reduced to match, so from the second small block on, LZ4_decompress_safe is told it has the full large-block space starting at an already advanced pointer. a crafted lz4block stream (for instance in a csv load) can then write past the end of the line reader's output buffer, which is a heap out-of-bounds write.

the fix passes the true remaining capacity measured from the current output_ptr: output_max_len - (output_ptr - output).
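
a minimal sketch of that calculation, not the actual patch: `decode_small_block` is a hypothetical stand-in that mirrors the LZ4_decompress_safe contract (negative return when the destination capacity is too small) so the loop can be run without liblz4, and `remaining_capacity` / `decode_all` are illustrative names. only output, output_ptr and output_max_len come from the code discussed here.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for LZ4_decompress_safe: the payload is stored raw,
// so "decoding" is a bounds-checked copy. Like the real function, it returns
// a negative value when dst_capacity is too small for the decoded data.
static int decode_small_block(const char* src, char* dst, int src_len, int dst_capacity) {
    if (src_len < 0 || dst_capacity < 0 || src_len > dst_capacity) {
        return -1;
    }
    if (src_len > 0) {
        std::memcpy(dst, src, static_cast<std::size_t>(src_len));
    }
    return src_len;
}

// Capacity left in the output buffer, measured from the current write position:
// output_max_len - (output_ptr - output).
static std::size_t remaining_capacity(const uint8_t* output, const uint8_t* output_ptr,
                                      std::size_t output_max_len) {
    return output_max_len - static_cast<std::size_t>(output_ptr - output);
}

// Decodes the small blocks back to back into output. Returns false as soon as
// one block does not fit, without writing past output + output_max_len.
static bool decode_all(const std::vector<std::vector<char>>& small_blocks, uint8_t* output,
                       std::size_t output_max_len, std::size_t* decompressed_len) {
    uint8_t* output_ptr = output;
    for (const auto& block : small_blocks) {
        // Recomputed every iteration, so it shrinks as output_ptr advances.
        std::size_t capacity = remaining_capacity(output, output_ptr, output_max_len);
        int dst_capacity = static_cast<int>(std::min<std::size_t>(capacity, INT_MAX));
        int n = decode_small_block(block.data(), reinterpret_cast<char*>(output_ptr),
                                   static_cast<int>(block.size()), dst_capacity);
        if (n < 0) {
            return false;  // the caller would map this to an InvalidArgument-style error
        }
        output_ptr += n;
    }
    *decompressed_len = static_cast<std::size_t>(output_ptr - output);
    return true;
}
```

with two 3-byte blocks and output_max_len = 5, the second call sees a capacity of 2 and fails instead of writing the sixth byte.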

  1. problem fixed: heap out-of-bounds write in Lz4BlockDecompressor::decompress. the inner small-block loop passed a stale dstCapacity (remaining_output_len, fixed per large block) to LZ4_decompress_safe while output_ptr kept advancing, so later small blocks could be decompressed past the output buffer. fixed by computing the capacity relative to the current output_ptr each iteration.

  2. behaviour modified: before, the second and later small blocks within a large block were given the full large-block capacity even though output_ptr had already moved forward, which over-states the space and allows an overflow. now the capacity tracks output_ptr, so a small block that would not fit makes LZ4_decompress_safe return an error and decompress returns InvalidArgument instead of overflowing. impact is limited to this bounds check; well-formed streams that already fit are unaffected.

  3. no new feature.

  4. no refactor. the change is a single argument to LZ4_decompress_safe plus a comment.

  5. no optimisation.
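
to make the over-statement in items 1 and 2 concrete, here is a small arithmetic sketch. it assumes, for illustration only, that the large block starts at offset 0 of the output buffer, so the once-per-large-block capacity equals output_max_len; `CapacityTrace` and `trace_capacities` are hypothetical names, not part of decompressor.cpp.

```cpp
#include <cstddef>
#include <vector>

// Capacity reported to the decoder for one small block.
struct CapacityTrace {
    std::size_t stale;   // value computed once per large block (old behaviour)
    std::size_t actual;  // value measured from the advanced write offset (new behaviour)
};

// Records both capacities for each small block. Stops after the first block
// that no longer fits, because a bounds-checked decoder would fail there.
std::vector<CapacityTrace> trace_capacities(std::size_t output_max_len,
                                            const std::vector<std::size_t>& small_block_lens) {
    std::vector<CapacityTrace> trace;
    const std::size_t stale = output_max_len;  // assumption: write offset was 0 when computed
    std::size_t written = 0;
    for (std::size_t len : small_block_lens) {
        const std::size_t actual = output_max_len - written;
        trace.push_back({stale, actual});
        if (len > actual) {
            break;  // this block would overflow the buffer
        }
        written += len;
    }
    return trace;
}
```

for a 100-byte buffer and three small blocks of 40 bytes each, the stale value stays at 100 for all three calls while the actual capacity goes 100, 60, 20, so only the tracked value exposes that the third block does not fit.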

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?
