fix(docker): replace PyPI opencv wheel with ffmpeg-free build [security]#569

Merged
lawrence-u10d merged 1 commit into main from ffmpeg-fix
Apr 22, 2026
@lawrence-u10d lawrence-u10d commented Apr 22, 2026

Summary

Mirrors Unstructured-IO/unstructured#4336 in this repo so the quay.io/unstructured-io/unstructured-api image no longer ships the 14 ffmpeg 5.1.x CVEs bundled in PyPI opencv-python wheels.

After uv sync, the Dockerfile now:

  • Downloads the architecture-specific opencv-contrib-python-headless wheel (built with WITH_FFMPEG=OFF + ENABLE_CONTRIB=1 + ENABLE_HEADLESS=1) from the upstream Unstructured-IO/unstructured GitHub release (opencv-4.12.0.88)
  • SHA-256-verifies against the hashes published by the upstream build-opencv-wheels.yml workflow
  • Uninstalls any installed PyPI opencv variants and installs the verified wheel with --no-deps
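
The download-and-verify steps above can be sketched in Python. This is a minimal illustration, not the Dockerfile's actual logic: the helper names (`wheel_arch`, `sha256_of`, `verify_wheel`) and the architecture alias table are assumptions, and no real wheel URL or hash is reproduced here.

```python
import hashlib
import hmac
from pathlib import Path

# Hypothetical alias table: machine names as reported by the build host,
# mapped to the suffix used in wheel platform tags. The real Dockerfile
# may key on different values.
_ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def wheel_arch(machine: str) -> str:
    """Normalise a machine name (e.g. from `uname -m`) to a wheel arch tag."""
    try:
        return _ARCH_ALIASES[machine.lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine!r}") from None


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file's digest differs from the published hash."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError(f"SHA-256 mismatch for {path.name}: got {actual}")
```

Failing hard on a mismatch, before anything is installed, is the property that matters: a build that cannot verify the wheel should stop rather than fall back to the PyPI variant.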

The contrib-headless variant is a strict superset of the cv2 API exposed by opencv-python, opencv-python-headless, and opencv-contrib-python, so a single wheel transparently replaces whichever variant is present.

One deviation from upstream

Upstream uninstalls all four opencv variants in a single uv pip uninstall … call because their image pulls all four transitively (via unstructured-paddleocr). Our uv.lock currently only resolves opencv-python, so a single combined uninstall would fail on the three that aren't installed. Replaced with a per-package loop using || true — same end state, robust if transitive deps change.
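
The per-package loop can be modelled in Python as follows. This is a sketch of the idea only: the Dockerfile does this in shell with `|| true`, and the function name `uninstall_variants` and the injectable `run` parameter are assumptions introduced here so the loop can be exercised without `uv` installed.

```python
import subprocess
from typing import Callable, Sequence

# The four PyPI OpenCV distributions that can each provide the cv2 module.
OPENCV_VARIANTS = (
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
)


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code; a non-zero exit does not raise."""
    return subprocess.run(cmd, check=False).returncode


def uninstall_variants(
    packages: Sequence[str] = OPENCV_VARIANTS,
    run: Callable[[Sequence[str]], int] = _run,
) -> list[str]:
    """Uninstall each package separately; return those that exited with 0."""
    removed = []
    for pkg in packages:
        # One call per package, so a failure on one cannot block the rest.
        if run(["uv", "pip", "uninstall", pkg]) == 0:
            removed.append(pkg)
    return removed
```

Ignoring the exit code per package plays the role of `|| true`: the end state is the same whichever variants the lockfile happens to resolve.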

Version / Changelog

  • Bumps service version 0.1.3 → 0.1.4
  • CHANGELOG.md entry under 0.1.4 → Security
  • No uv lock changes needed; the lockfile still resolves opencv-python 4.13.0.92, and we overlay the 4.12.0.88 contrib-headless wheel only at image build time (upstream 4.13.0.92 has no sdist on PyPI, which is why the build-from-source workflow is pinned to 4.12.0.88).

Test plan

  • make docker-build succeeds on amd64 and arm64; the opencv replacement step resolves the architecture-specific wheel and the SHA-256 check passes
  • docker run … python -c "import cv2; print(cv2.__version__)" prints 4.12.0.88 inside the built image
  • make docker-test passes against the rebuilt image
  • Container scan of the rebuilt image no longer flags the 14 ffmpeg CVEs called out by upstream PR #4336

🤖 Generated with Claude Code


Note

Medium Risk
Medium risk because it changes a core binary dependency (opencv) at image build time via an external wheel download and forced uninstall/reinstall, which could impact image build reliability or runtime CV2 behavior across architectures.

Overview
Updates the Docker build to remove vulnerable ffmpeg-bundled PyPI OpenCV wheels by downloading an arch-specific, SHA-256-verified opencv-contrib-python-headless wheel built with WITH_FFMPEG=OFF, uninstalling any installed OpenCV variants, and reinstalling the verified wheel.

Bumps the service version to 0.1.4 and adds a CHANGELOG.md security entry documenting the OpenCV/ffmpeg CVE mitigation.

Reviewed by Cursor Bugbot for commit 7e23afc. Bugbot is set up for automated code reviews on this repo. Configure here.

Mirrors Unstructured-IO/unstructured#4336. After uv sync, the Dockerfile
now downloads a source-built opencv-contrib-python-headless wheel
(WITH_FFMPEG=OFF) from the upstream release, hash-verifies it, and
substitutes it for the PyPI opencv variant installed from uv.lock. This
eliminates the 14 bundled ffmpeg 5.1.x CVEs shipped in PyPI opencv wheels.

Bumps service version 0.1.3 -> 0.1.4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lawrence-u10d lawrence-u10d requested a review from qued April 22, 2026 16:26
@lawrence-u10d lawrence-u10d merged commit 03b57e0 into main Apr 22, 2026
12 checks passed
@lawrence-u10d lawrence-u10d deleted the ffmpeg-fix branch April 22, 2026 21:14
lawrence-u10d added a commit that referenced this pull request Apr 22, 2026
## Summary
Follow-up to #569 (v0.1.4). That PR replaced the PyPI `opencv-python`
wheel with an ffmpeg-free build, but image scanners were still flagging
the 14 ffmpeg CVEs against v0.1.4. Root cause is scanner scope, not a
broken replacement.

## Root cause
`uv pip uninstall` only drops a package from `site-packages`. The
extracted wheel archive stays in the uv cache. Inspecting the pushed
v0.1.4 image:

- ✅ `cv2.__version__` reports `4.12.0` (our replacement wheel)
- ✅ `site-packages/cv2/` has no `.libs/` directory
- ❌
`/home/notebook-user/.cache/uv/archive-v0/<hash>/opencv_python.libs/`
still contains the full extracted old wheel:
  - `libavcodec-*.so.59.37.100`
  - `libavformat-*.so.59.27.100`
  - `libavutil-*.so.57.28.100`
  - plus `libavfilter`, `libavdevice`, `libswscale`, `libswresample`

SO-version suffixes (avcodec 59.37 / avformat 59.27 / avutil 57.28) are
ffmpeg 5.1.x — matching the CVE set the upstream PR called out. Scanners
walk the whole filesystem and flag these even though nothing links
against them at runtime. `UV_LINK_MODE=copy` (set globally in this
Dockerfile) compounds it — the cache keeps its own copy independent of
`site-packages`.
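
As a rough illustration of why a filesystem-wide scan still finds these files, the sketch below walks a directory tree and reports anything whose filename looks like an ffmpeg shared library, along with its SO-version suffix. The regex, the `FfmpegLib` tuple, and `find_ffmpeg_libs` are assumptions made for this example; real scanners typically use richer signals than filenames.

```python
import os
import re
from pathlib import Path
from typing import Iterator, NamedTuple

# Library stems taken from the inspection above. Matching on the full stem
# keeps lookalike names such as libavif or libavmedia* from being reported.
_FFMPEG_LIB = re.compile(
    r"^lib(avcodec|avformat|avutil|avfilter|avdevice|swscale|swresample)"
    r"(?:-[^/]*)?\.so((?:\.\d+)*)$"
)


class FfmpegLib(NamedTuple):
    path: Path
    component: str   # e.g. "avcodec"
    so_version: str  # e.g. "59.37.100", or "" if the name is unversioned


def find_ffmpeg_libs(root: Path) -> Iterator[FfmpegLib]:
    """Walk root and yield every file whose name looks like an ffmpeg library."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            match = _FFMPEG_LIB.match(name)
            if match:
                yield FfmpegLib(
                    Path(dirpath) / name,
                    match.group(1),
                    match.group(2).lstrip("."),
                )
```

Pointing `find_ffmpeg_libs` at the image root rather than at `site-packages` alone is what would surface the copies left in the uv cache directory.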

## Fix
Add `uv cache clean` to the end of the opencv replacement `RUN` to wipe
the cache (including the old opencv wheel archive) from the final image
layer. Single minimal change — scoped to the opencv-fix RUN, not a
broader image-slimming pass.

Safe because `UV_LINK_MODE=copy` means the live venv copies files out of
cache — wiping the cache doesn't affect the installed packages.
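
The safety argument can be shown with a toy filesystem model. This is an analogy, not uv's implementation: `survives_cache_wipe` is a hypothetical helper, and the two modes here only imitate what "copy" and "symlink" mean at the file level.

```python
import shutil
import tempfile
from pathlib import Path


def survives_cache_wipe(link_mode: str) -> bool:
    """Install one file from a toy cache, delete the cache, report whether it survives."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache"
        site = Path(tmp) / "site-packages"
        cache.mkdir()
        site.mkdir()

        cached = cache / "module.py"
        cached.write_text("VALUE = 1\n")
        installed = site / "module.py"

        if link_mode == "copy":
            shutil.copy2(cached, installed)   # independent copy of the bytes
        elif link_mode == "symlink":
            installed.symlink_to(cached)      # pointer back into the cache
        else:
            raise ValueError(f"unknown link mode: {link_mode!r}")

        shutil.rmtree(cache)                  # stand-in for clearing the cache
        # exists() follows symlinks, so a dangling link reports False.
        return installed.exists()
```

In this model the copied file is still present after the cache is removed, while the symlinked one is left dangling, which is the distinction the paragraph above relies on.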

## False positives ignored (not fixed here)
Two other `libav*` filenames in the image that are **not** ffmpeg and
don't trigger these CVEs:
- `/usr/lib/libreoffice/program/libavmedia{gst,lo}.so` — LibreOffice's
"avmedia" framework shim
- `pillow.libs/libavif-*.so.16` — AV1 image codec

## Version / Changelog
- Bumps service version `0.1.4` → `0.1.5`
- `CHANGELOG.md` entry under `0.1.5` → Security
- No `uv lock` changes

## Test plan
- [ ] `make docker-build` succeeds on `amd64` and `arm64`
- [ ] In the rebuilt image, `find / -name "libavcodec*" -o -name
"libavformat*" -o -name "libswscale*"` returns nothing under
`/home/notebook-user/.cache/uv/` and nothing under
`site-packages/cv2/.libs/`
- [ ] `cv2.__version__` still reports `4.12.0.88` and `import cv2;
cv2.imdecode(...)` smoke check works
- [ ] Container scan of the rebuilt image no longer flags the 14 ffmpeg
CVEs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: a single Docker build-step cleanup (`uv cache clean`) plus
version/changelog bumps; main risk is unintended impact on Docker layer
caching or build time, not runtime behavior.
> 
> **Overview**
> Removes leftover ffmpeg `.so` files from the built image by adding `uv
cache clean` after uninstalling/reinstalling OpenCV wheels in the
Dockerfile, preventing scanners from flagging CVEs from cached wheel
contents.
> 
> Bumps the service version to `0.1.5` and adds a matching
`CHANGELOG.md` security entry describing the cache purge.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
f73143d. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>