fix: disambiguate archive entries when multiple inputs share the same basename #999
toller892 wants to merge 1 commit into
… basename

When compressing multiple directories that share the same basename (e.g., /a/parent1/Scripts and /a/parent2/Scripts), ouch previously stripped paths to just the basename using file_name(). This caused:
- Zip: 'Invalid zip archive - Duplicate filename: Scripts/'
- Tar: both directories merge into one, data loss on decompress

Fix: detect when input paths have duplicate basenames and, in that case, compute the common ancestor directory of all inputs. Archive entries are then stored relative to this common ancestor, including enough parent path components to disambiguate (e.g., parent1/Scripts vs parent2/Scripts).

When basenames don't collide, behavior is unchanged (backward compatible). Applied consistently to all three archive builders: tar, zip, and 7z.

Fixes ouch-org#978
2b784e6 to c458437
My 2 cents: I would behave like tar/zip by default, using the path as given on the command line with the leading / stripped. And maybe add something like --shorten-path.
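For illustration only, the tar/zip-style naming suggested above could look roughly like the sketch below. The function name entry_name_from_cli_path is hypothetical and not part of ouch; the sketch only drops the root (and, on Windows, the drive prefix) and leaves every other component as typed.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not ouch's API): derive an archive entry name from the
/// path exactly as given on the command line, dropping only the leading root
/// (and a Windows drive prefix) so the entry is always relative.
fn entry_name_from_cli_path(input: &Path) -> PathBuf {
    input
        .components()
        // Keep every component except the root marker and drive prefix.
        .filter(|c| !matches!(c, Component::RootDir | Component::Prefix(_)))
        .collect()
}
```

With this rule, /a/parent1/Scripts and /a/parent2/Scripts would become a/parent1/Scripts and a/parent2/Scripts, so the two inputs no longer collide and the entry name can be predicted from the command line alone.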
I slightly prefer this suggestion over the common ancestor approach used in this PR, mainly for predictability. @tommady @valoq, can I get your opinions? (If we change the default behavior, we wouldn't need a flag either.)
As I mentioned in the issue comment below, I will follow your advice.
The tar and zip approach is better, yes. A few more potential issues:
Problem
When compressing multiple directories with the same basename (e.g., /mnt/d/Temp/GOG/Scripts and /mnt/d/Temp/kodi/Scripts), ouch strips paths to just the basename using file_name(). This causes:
- "Invalid zip archive - Duplicate filename: Scripts/"

Fixes #978
Root Cause
In all three archive builders (tar, zip, 7z), the build_archive function:
- cd_into_same_dir_as(explicit_path) to change cwd to the parent directory
- explicit_path.file_name().unwrap() to get just the basename
- Scripts/...

When two inputs share the same basename, they collide.
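A minimal sketch of why the collision happens, using only std::path. The helper name basename_entry is made up for this illustration and does not exist in ouch; it stands in for the file_name() call described above.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical stand-in for the old behavior: the archive entry root is
/// just the final path component of the input.
fn basename_entry(input: &Path) -> Option<&OsStr> {
    input.file_name()
}
```

Both /mnt/d/Temp/GOG/Scripts and /mnt/d/Temp/kodi/Scripts map to the same entry root, Scripts, which is what the zip writer rejects as a duplicate filename.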
Fix
When input paths have duplicate basenames, compute the common ancestor directory of all inputs and store archive entries relative to it, including enough parent path components to disambiguate (e.g., parent1/Scripts vs parent2/Scripts).

When basenames don't collide, behavior is unchanged (backward compatible).
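The approach described above can be sketched as follows. The helper names has_duplicate_basenames and common_ancestor come from this PR, but their signatures and bodies here are assumptions, not the actual code in src/utils/fs.rs; entry_name is a hypothetical wrapper added only to show how the two might be combined.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// True if at least two inputs end in the same final component.
/// (Assumed signature; the PR's real helper may differ.)
fn has_duplicate_basenames(paths: &[PathBuf]) -> bool {
    let mut seen = HashSet::new();
    paths
        .iter()
        .filter_map(|p| p.file_name())
        // insert() returns false when the name was already present.
        .any(|name| !seen.insert(name))
}

/// Longest prefix shared by the parent directories of all inputs,
/// compared component by component. (Assumed signature.)
fn common_ancestor(paths: &[PathBuf]) -> Option<PathBuf> {
    let mut iter = paths.iter();
    let mut ancestor = iter.next()?.parent()?.to_path_buf();
    for path in iter {
        let parent = path.parent()?;
        ancestor = ancestor
            .components()
            .zip(parent.components())
            .take_while(|(a, b)| a == b)
            .map(|(a, _)| a)
            .collect();
    }
    Some(ancestor)
}

/// Hypothetical wrapper: pick the entry name for one input.
/// Falls back to the plain basename when nothing collides.
fn entry_name(input: &Path, inputs: &[PathBuf]) -> Option<PathBuf> {
    if has_duplicate_basenames(inputs) {
        let root = common_ancestor(inputs)?;
        input.strip_prefix(&root).ok().map(Path::to_path_buf)
    } else {
        input.file_name().map(PathBuf::from)
    }
}
```

For inputs /a/parent1/Scripts and /a/parent2/Scripts, this sketch takes /a as the common ancestor and yields parent1/Scripts and parent2/Scripts. It does not handle the same path being passed twice, which would still collide.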
Example
Files Changed
- src/utils/fs.rs - Added has_duplicate_basenames() and common_ancestor() helper functions
- src/archive/tar.rs - Use common ancestor approach when basenames collide
- src/archive/zip.rs - Same fix for zip builder
- src/archive/sevenz.rs - Same fix for 7z builder
- tests/integration.rs - Added tests for tar and zip with duplicate basenames
Test Results
All 91 tests pass (34 unit + 39 integration + 3 mime + 13 UI + 2 utils), including the 2 new tests:
- compress_dirs_with_same_basename_tar
- compress_dirs_with_same_basename_zip