fix: reject path-traversal payloads in document metadata #11787

Merged
sjrl merged 7 commits into deepset-ai:main from camgrimsec:feat/image-utils-path-traversal-guard
Jun 30, 2026

@camgrimsec

When _extract_image_sources_info is called with a non-empty root_path, the resolved file path must now stay within that root. If the file_path in a document's metadata contains '../' sequences or is an absolute path, and the resolved path escapes the configured root, a ValueError is raised before any filesystem read.

This blocks an exfiltration vector where attacker-controlled file_path metadata on indexed documents could cause image-conversion pipelines (LLMDocumentContentExtractor, DocumentToImageContent) to read arbitrary host files and forward their contents to an external LLM endpoint.

Behaviour with an empty root_path is unchanged.

Tests:

  • rejects '../../../etc/passwd' relative payload
  • rejects absolute path outside root
  • accepts path that resolves inside root
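
The containment check described above can be sketched roughly as follows. This is not the code merged in this PR: `resolve_within_root` is a hypothetical helper name, and the sketch assumes the guard works by resolving the joined path and checking that it is still relative to the resolved root. The real `_extract_image_sources_info` may be structured differently.

```python
from pathlib import Path


def resolve_within_root(root_path: str, file_path: str) -> Path:
    """Hypothetical helper: return file_path resolved under root_path.

    Raises ValueError if the resolved path falls outside root_path.
    With an empty root_path no containment check is applied.
    """
    if not root_path:
        # Empty root: behaviour unchanged, the path is used as given.
        return Path(file_path)

    root = Path(root_path).resolve()
    # Joining an absolute file_path discards root entirely, and '..'
    # segments are collapsed by resolve(); both cases are caught below.
    candidate = (root / file_path).resolve()

    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(
            f"file_path {file_path!r} resolves outside root_path {root_path!r}"
        ) from None
    return candidate


if __name__ == "__main__":
    # Illustrative values only; nothing is read from disk here.
    for payload in ("images/page1.png", "../../../etc/passwd", "/etc/passwd"):
        try:
            print("accepted:", resolve_within_root("/srv/docs", payload))
        except ValueError as err:
            print("rejected:", err)
```

Checking the resolved path, rather than scanning the string for '../', also covers absolute paths and paths that only leave the root after normalisation. Because the check runs before any file is opened, a rejected payload never reaches the image-conversion step.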

Related Issues

  • fixes #issue-number

Proposed Changes:

How did you test it?

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I have documented my code.
  • I have added a release note file, following the contributors guidelines.
  • I have run pre-commit hooks and fixed any issue.

@camgrimsec camgrimsec requested a review from a team as a code owner June 26, 2026 14:44
@camgrimsec camgrimsec requested review from sjrl and removed request for a team June 26, 2026 14:44
@vercel

vercel Bot commented Jun 26, 2026

@camgrimsec is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@sjrl

sjrl commented Jun 29, 2026

@sjrl sjrl self-assigned this Jun 29, 2026
Comment thread haystack/components/converters/image/image_utils.py Outdated
Comment thread releasenotes/notes/image_utils-path-traversal-guard-951e4271322a3bf5.yaml Outdated
@sjrl sjrl changed the title fix(image_utils): reject path-traversal payloads in document metadata fix: reject path-traversal payloads in document metadata Jun 30, 2026
@sjrl

sjrl commented Jun 30, 2026

Hey @camgrimsec thanks for opening the PR! Please address the comments including fixing the failing formatting in the CI.

@github-actions

Coverage report

File: haystack/components/converters/image/image_utils.py

This report was generated by python-coverage-comment-action

@sjrl sjrl left a comment

Thanks!

@sjrl sjrl merged commit 5c659a2 into deepset-ai:main Jun 30, 2026
23 of 24 checks passed