feat(storage): Add compressed archive support #28146
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help!
🟡 Playwright Results: all passed (10 flaky)
✅ 4139 passed · ❌ 0 failed · 🟡 10 flaky · ⏭️ 89 skipped
🟡 10 flaky test(s) (passed on retry)
How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace
Code Review ✅ Approved 5 resolved / 5 findings
Adds compressed archive support for S3 and GCS connectors, including robust path validation and archive readers. All initial findings regarding zip bomb risks, path traversal, and null pointer exceptions have been resolved.
✅ 5 resolved
✅ Edge Case: is_archive_format crashes on None structureFormat
✅ Security: Missing path traversal validation in ZIP and RAR readers
✅ Security: Zip bomb risk: no ratio check on decompressed data
✅ Quality: Overly broad
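For readers unfamiliar with the zip bomb finding, a ratio check can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the names `exceeds_ratio` and `suspicious_members` and the `MAX_RATIO` threshold are hypothetical, and it only covers ZIP via the standard library.

```python
import io
import zipfile
from typing import List

MAX_RATIO = 100  # hypothetical threshold, not taken from the PR


def exceeds_ratio(info: zipfile.ZipInfo, max_ratio: int = MAX_RATIO) -> bool:
    """Return True if the declared uncompressed size looks suspiciously large."""
    if info.compress_size == 0:
        # A zero-byte payload is only suspicious if it claims to hold content.
        return info.file_size > 0
    return info.file_size / info.compress_size > max_ratio


def suspicious_members(data: bytes, max_ratio: int = MAX_RATIO) -> List[str]:
    """List member names whose declared compression ratio exceeds max_ratio."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [i.filename for i in zf.infolist() if exceeds_ratio(i, max_ratio)]
```

The sizes used here come from the archive headers, which an attacker can forge, so a real reader would probably also count bytes while decompressing and stop once a limit is exceeded.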



Describe your changes:
Fixes #
I worked on ... because ...
Type of change:
High-level design:
N/A — small change.
Tests:
Use cases covered
Unit tests
Backend integration tests
Ingestion integration tests
Playwright (UI) tests
Manual testing performed
UI screen recording / screenshots:
Not applicable.
Checklist:
Fixes <issue-number>: <short explanation>
Fixes #<issue-number> above.
Summary by Gitar
Upgraded sqlalchemy-pytds from ~=0.3 to ~=1.0 in ingestion/setup.py to fix AttributeError during server-side cursor fetches.
Added support for compressed archives (zip, tar, tar.gz, tgz, 7z, rar) in S3 and GCS connectors.
Added open_archive_reader and ArchiveReader infrastructure to handle nested file ingestion with schema inference.
Added test_archive.py covering archive readers, security checks, and column inference.
Added _is_safe_archive_path to prevent directory traversal vulnerabilities during archive extraction.
Added error handling in _generate_structured_containers to prevent ingestion failures when processing individual corrupted archives.
This will update automatically on new commits.
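To make the directory traversal item concrete, here is a minimal sketch of the kind of check a helper like `_is_safe_archive_path` might perform. The function name `is_safe_member_path` and its exact rules are assumptions for illustration; the PR's actual implementation is not shown on this page and may differ.

```python
import posixpath


def is_safe_member_path(name: str) -> bool:
    """Reject member names that are absolute or climb out of the extraction root.

    Hypothetical helper for illustration; not the PR's _is_safe_archive_path.
    """
    if not name:
        return False
    # Archives built on Windows may use backslashes; normalise them first.
    normalized = name.replace("\\", "/")
    # Reject absolute paths and drive-letter style paths such as "C:/x".
    if normalized.startswith("/") or (len(normalized) > 1 and normalized[1] == ":"):
        return False
    # Collapse "a/../b" style segments, then see whether the result escapes.
    collapsed = posixpath.normpath(normalized)
    return collapsed != ".." and not collapsed.startswith("../")
```

A reader would typically call such a check on every member name before opening it and skip (or log) entries that fail, rather than aborting the whole archive.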