fix(FileStorage): harden GC against non-cache files (#45) #86

Open
OndraDol wants to merge 2 commits into nette:master from OndraDol:fix/issue-45-gc-hardening-atomic-write

Conversation

@OndraDol OndraDol commented Apr 4, 2026

Summary

Fix clean() crashing with "unserialize(): Error at offset" when non-cache files (e.g. Latte lock files) with names matching the _* pattern exist in the cache directory.

Root cause

FileStorage::readMetaAndLock() calls unserialize() on arbitrary file contents without error handling. The clean() method uses Finder::find('_*') to scan the cache directory, which matches all underscore-prefixed files — not just cache entries. When a non-cache file like _template.php.lock with content 0942e7 is encountered:

  1. The 6-byte header read produces (int) "0942e7" → non-zero size
  2. stream_get_contents() reads garbage bytes
  3. unserialize() fails with "Error at offset X of Y bytes"
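The failure chain above can be reproduced in a few lines of standalone PHP (variable names are illustrative, not the actual FileStorage code):

```php
<?php
// A lock file's content happens to start with digits, so the 6-byte
// "header" read yields a non-zero length via PHP's numeric-prefix cast.
$header = substr('0942e7...rest of lock file', 0, 6); // "0942e7"
$length = (int) $header;                              // non-zero

// The bytes read afterwards are not a serialized PHP value,
// so unserialize() fails; with @ it returns false instead of erroring.
$garbage = 'rest of lock file';
$meta = @unserialize($garbage);
var_dump($meta); // bool(false)
```

This is why any underscore-prefixed foreign file with a digit-leading payload is enough to trigger the crash in an unguarded read path.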

This was diagnosed by @martinbohmcz in #45 (comment) — after renaming the directory to avoid the underscore prefix, the errors stopped.

Solution

readMetaAndLock() — Replace unguarded unserialize() + assert(is_array()) with @unserialize() + is_array() runtime validation. Non-cache files now return null instead of crashing. The @ error suppression is consistent with other I/O operations in the same method (e.g. @fopen on line 303).
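The guarded pattern can be sketched as follows (a simplified standalone function; the real readMetaAndLock() also handles the file handle and locking):

```php
<?php
// Hedged sketch of the defensive-unserialize pattern described above.
// Returns the decoded metadata array, or null for anything that is not
// a valid cache entry (foreign file, corrupted data).
function readMetaSafely(string $raw): ?array
{
    $meta = @unserialize($raw); // @ - content may not be a cache entry at all
    return is_array($meta) ? $meta : null;
}
```

A null return lets clean() simply skip the file instead of aborting the whole GC pass.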

readData() — Add defensive @unserialize() for corrupted serialized data payloads (e.g. from truncated writes after process crashes).

What this does NOT change

The write path and locking model remain untouched. The current header-last write strategy (null bytes → data → seek → header) is a deliberate design that uses the header as a commit marker. True atomic writes via temp file + rename would require rethinking the stampede prevention locking model (currently file-handle based), which is a separate concern.
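For contrast, a temp-file + rename atomic write, which this PR deliberately does not adopt, would look roughly like this (illustrative only; it conflicts with the current file-handle-based stampede locking, which keeps a handle open on the final file):

```php
<?php
// Illustrative sketch of the alternative the PR rules out.
function atomicWrite(string $file, string $contents): void
{
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $contents, LOCK_EX);
    rename($tmp, $file); // atomic on POSIX within one filesystem
}
```

Under the current design, the header written last serves the same commit-marker role: a reader that sees an all-zero header treats the entry as not-yet-committed.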

Changes by file

  • src/Caching/Storages/FileStorage.php — defensive unserialize in readMetaAndLock() and readData()
  • tests/Storages/FileStorage.gc-foreign-files.phpt — new: regression test for GC with foreign files (6 scenarios: digit-prefixed content, binary, empty, zero-header, fake meta, nested subdirectory)
  • tests/Storages/FileStorage.corrupted.phpt — new: graceful handling of corrupted metadata, corrupted data payload, truncated files, all-zero files
  • tests/Storages/FileStorage.stress.phpt — enhanced: foreign files placed in cache dir during 1000-iteration stress test

Testing

  • All 62 existing tests pass (8 skipped — Memcached extension)
  • 2 new test files covering the exact scenarios from the bug report
  • Stress test enhanced with foreign files
  • PHPStan analysis clean
  • No debug artifacts in diff

Related issue

Closes #45

OndraDol added 2 commits April 4, 2026 22:34
…-cache files (nette#45)

Replace unguarded unserialize() + assert(is_array()) with @unserialize()
and is_array() runtime validation in readMetaAndLock(). Non-cache files
(e.g. Latte lock files matching the _* pattern) now return null instead
of crashing with "unserialize(): Error at offset".

Add defensive @unserialize() in readData() to handle corrupted serialized
data from truncated writes.
…tte#45)

Add FileStorage.gc-foreign-files.phpt:
- Verifies clean() does not crash when non-cache files (lock files,
  binary files, empty files) matching _* pattern exist in cache dir
- Tests both collector mode (skip foreign files) and Cache::All mode
  (delete everything)

Add FileStorage.corrupted.phpt:
- Verifies graceful handling of corrupted metadata, corrupted data
  payload, truncated files, and all-zero files

Enhance FileStorage.stress.phpt:
- Place foreign files in cache dir during stress test to verify they
  don't interfere with concurrent cache operations

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8aa08c70a9


      $data === false => null,
      empty($meta[self::MetaSerialized]) => $data,
    - default => unserialize($data),
    + default => @unserialize($data), // @ - data may be corrupted by a truncated write

P2 — Treat corrupted serialized payload as cache miss

When MetaSerialized is set, readData() now returns @unserialize($data) directly, so malformed payloads resolve to false instead of a miss. In Cache::load(), only null is treated as missing ($data === null), so a corrupted file can become a persistent cache hit with value false, skipping regeneration and returning incorrect data to callers. Please map unserialize failure to null (while preserving legitimate serialized false, e.g. b:0;).
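The reviewer's suggestion can be sketched as a small helper (hypothetical name, not code from the PR): map unserialize failure to null so Cache::load() regenerates, while preserving a legitimately serialized false, whose canonical encoding is the string 'b:0;'.

```php
<?php
// Sketch of the suggested fix: corrupted payload → null (cache miss),
// but a genuinely serialized false stays a valid hit.
function unserializeOrMiss(string $data): mixed
{
    if ($data === 'b:0;') {
        return false; // serialize(false) === 'b:0;', a legitimate value
    }
    $value = @unserialize($data);
    return $value === false ? null : $value; // failure → miss
}
```

The special-case check is needed because unserialize() signals failure by returning false, which is indistinguishable from a successfully decoded false without it.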



Development

Successfully merging this pull request may close these issues.

FileStorage does not use atomic write
