compact: free space per pack with --threshold #9801

Draft
mr-raj12 wants to merge 1 commit into borgbackup:master from mr-raj12:pack-files-compact

Conversation

@mr-raj12

Contributor

Description

Moves borg compact from deleting single objects to compacting whole packs, so it keeps working once a pack holds more than one object (N>1).

For each pack:

  • all objects unused: drop the pack file.
  • all objects used: leave it.
  • mixed: rewrite only if the unused bytes reach --threshold percent (default 40), copying the survivors into a new pack via Repository.compact_pack and dropping the old one. Below the threshold the pack is left alone, so a large pack is not rewritten just to reclaim a few bytes.
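
The three cases above can be sketched as a small pure function. This is only an illustration of the rule as described, not the code from the PR: the name `classify_pack` and its return values are hypothetical, and the 40 percent default is the one stated in the description.

```python
def classify_pack(total: int, unused: int, threshold: float = 40.0) -> str:
    """Decide what to do with one pack (hypothetical helper, not from the PR).

    total     -- bytes of all objects stored in the pack
    unused    -- bytes of the objects no archive references anymore
    threshold -- minimum unused percentage that justifies a rewrite
    """
    if unused == 0:
        return "keep"      # all objects used: leave the pack alone
    if unused == total:
        return "drop"      # all objects unused: delete the pack file
    if 100 * unused / total >= threshold:
        return "rewrite"   # mixed and wasteful enough: copy survivors forward
    return "keep"          # mixed, but below the threshold


print(classify_pack(total=1000, unused=400))  # reaches the 40% default
print(classify_pack(total=1000, unused=399))  # stays just below it
```

Checking the two "all or nothing" cases first means the percentage is only computed for mixed packs, so a pack with `total == 0` never causes a division by zero.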

The chunk index is scanned twice to keep memory bounded: first only per-pack byte counts to decide each pack's fate, then the object ids of just the packs that change. The #9748 crash-safety order is preserved: cached chunk indexes are invalidated before the first store change.
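
A minimal sketch of the two-scan idea, under stated assumptions: `Entry` is a hypothetical stand-in for a chunk index entry (the review snippet shows `pack_id` and `obj_size`; the `used` flag is assumed here), `plan_compaction` is an invented name, and for brevity the second scan collects only the surviving ids of packs that get rewritten.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a chunk index entry; `used` is an assumed field.
Entry = namedtuple("Entry", "pack_id obj_size used")


def plan_compaction(chunks, threshold=40.0):
    """Return (packs to drop, {pack_id: [surviving object ids]} for rewrites)."""
    # Scan 1: keep only two integers per pack, [total bytes, unused bytes].
    stats = defaultdict(lambda: [0, 0])
    for entry in chunks.values():
        counts = stats[entry.pack_id]
        counts[0] += entry.obj_size
        if not entry.used:
            counts[1] += entry.obj_size

    drop, rewrite = set(), set()
    for pid, (total, unused) in stats.items():
        if unused == 0:
            continue                      # all used: nothing to do
        if unused == total:
            drop.add(pid)                 # all unused: drop the file
        elif 100 * unused / total >= threshold:
            rewrite.add(pid)              # mixed and above the threshold

    # Scan 2: object ids are collected only for packs that will be rewritten.
    survivors = defaultdict(list)
    for obj_id, entry in chunks.items():
        if entry.pack_id in rewrite and entry.used:
            survivors[entry.pack_id].append(obj_id)
    return drop, dict(survivors)


chunks = {
    b"a": Entry(pack_id=1, obj_size=60, used=True),
    b"b": Entry(pack_id=1, obj_size=40, used=False),
    b"c": Entry(pack_id=2, obj_size=10, used=False),
    b"d": Entry(pack_id=3, obj_size=90, used=True),
    b"e": Entry(pack_id=3, obj_size=10, used=False),
}
drop, survivors = plan_compaction(chunks)
print(drop, survivors)
```

The memory argument is that the first scan stores a fixed amount of data per pack, while id lists, which grow with the number of objects, exist only for the subset of packs selected for a rewrite.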

At N=1 every pack holds one object, so mixed packs never occur and the behavior is unchanged. The rewrite path is covered by a test that forces max_count > 1.

This recycles the approach from #9777, which can be closed.

refs #8572 #8514

Checklist

  • PR is against master
  • New code has tests
  • Tests pass
  • Commit messages are clean and reference related issues

Group objects by their pack and act per pack: drop fully-unused packs, and
rewrite mixed packs whose unused bytes reach --threshold (default 40%) by
copying the survivors forward with compact_pack. Two index scans keep the
memory use bounded. refs borgbackup#8572 borgbackup#8514
@codecov

codecov Bot commented Jun 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.75%. Comparing base (bf36090) to head (e292c2e).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9801      +/-   ##
==========================================
- Coverage   84.77%   81.75%   -3.02%     
==========================================
  Files          92       92              
  Lines       15251    15284      +33     
  Branches     2286     2295       +9     
==========================================
- Hits        12929    12496     -433     
- Misses       1622     2091     +469     
+ Partials      700      697       -3     


@ThomasWaldmann ThomasWaldmann left a comment

Member

some stuff i found...

pack_total, pack_unused = {}, {}
for id, entry in self.chunks.iteritems():
    pid = entry.pack_id
    pack_total[pid] = pack_total.get(pid, 0) + entry.obj_size

this is what defaultdict type was made for.
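
For illustration, the suggestion could look roughly like this. The `(pack_id, obj_size)` pairs are made-up sample data, and both loops are shown only to demonstrate that they produce the same totals.

```python
from collections import defaultdict

# Made-up (pack_id, obj_size) pairs standing in for chunk index entries.
sizes = [("p1", 10), ("p2", 5), ("p1", 7)]

# Pattern from the diff: a plain dict with .get() and an explicit default.
plain = {}
for pid, size in sizes:
    plain[pid] = plain.get(pid, 0) + size

# Same accumulation with defaultdict(int): missing keys start at 0.
totals = defaultdict(int)
for pid, size in sizes:
    totals[pid] += size

print(plain == dict(totals))
```

One thing to keep in mind with `defaultdict`: merely reading a missing key inserts it, so later lookups that must not create entries should use `.get()` or an `in` check.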

    continue  # all used -> leave alone
if unused == total:
    drop_packs.add(pid)  # all unused -> drop the whole file
elif unused / total * 100 >= self.threshold:

100 * unused / total is simpler to read.

Comment on lines +329 to +336
subparser.add_argument(
    "--threshold",
    metavar="PERCENT",
    dest="threshold",
    type=float,
    default=40.0,
    help="rewrite a pack when at least PERCENT of its bytes are unused (default: 40)",
)

compare cli syntax and default value to what we had in borg 1.4-maint.

1.4 used an int cli option and defaulted to 10%.
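
A sketch of what an integer option with a 10 percent default might look like with plain `argparse`. This is not the 1.4 code and not the code in this PR: the `percent` validator, the standalone parser, and the help wording are assumptions made for the example.

```python
import argparse


def percent(value: str) -> int:
    """Parse an integer percentage in 0..100 (hypothetical validator)."""
    number = int(value)  # a ValueError here is reported by argparse as a usage error
    if not 0 <= number <= 100:
        raise argparse.ArgumentTypeError(f"{value!r} is not a percentage in 0..100")
    return number


# Standalone parser for the example; in borg this would be the compact subparser.
parser = argparse.ArgumentParser(prog="compact-sketch")
parser.add_argument(
    "--threshold",
    metavar="PERCENT",
    dest="threshold",
    type=percent,
    default=10,
    help="rewrite a pack when at least PERCENT of its bytes are unused (default: 10)",
)

print(parser.parse_args([]).threshold)
print(parser.parse_args(["--threshold", "40"]).threshold)
```

Whether the range check belongs in the option type or in the command itself is a design choice; putting it in the type makes invalid values fail at argument parsing, before any repository access.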


repository = Repository(location, exclusive=True, create=True)
build_one_pack(repository, wasteful)
repository = reopen(repository)

didn't we find already that we don't really need reopen?

@ThomasWaldmann

ThomasWaldmann commented Jun 23, 2026

Also please rebase on current master (I have merged #9800 now).

@ThomasWaldmann

Have a look at that, maybe you can implement --dry-run rather easily?

#9379

@ThomasWaldmann

ThomasWaldmann commented Jun 23, 2026

I did a test run with slightly modified code (N=2) and got this. So I guess the 100% display issue is not fixed yet?

borg compact --progress
Starting compaction / garbage collection...
Getting object IDs present in the repository...
Computing object IDs used by archives...

Cleaning archives directory from soft-deleted archives...
Deleting 4992 unused objects...
Compacting packs 0.0%
Compacting packs 0.1%
Compacting packs 0.2%
...
Compacting packs 99.9%

Overall statistics, considering all 0 archives in this repository:
Source data size was 0 B in 0 files.
Repository size is 0 B in 0 objects.
Compaction saved 39 MB.
Cleaning up files cache...
Removed 1 unused files cache files.
Finished compaction / garbage collection...
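
The output above stops at 99.9%. As a generic illustration only (not a diagnosis of this PR and not borg's progress API), a percentage display reaches 100% when the count of completed items is reported after the last item as well. The generator `progress_lines` below is a hypothetical helper.

```python
def progress_lines(total, label="Compacting packs"):
    """Yield one line per distinct 0.1% step, always ending at 100.0%."""
    last = None
    # done runs from 0 to total inclusive, so the final state is reported too.
    for done in range(total + 1):
        pct = f"{100 * done / total:.1f}" if total else "100.0"
        if pct != last:  # skip steps that would print the same percentage again
            last = pct
            yield f"{label} {pct}%"


lines = list(progress_lines(4992))
print(lines[0])
print(lines[-1])
```

If the percentage is instead computed from the index of the item about to be processed, the last value shown is `100 * (total - 1) / total`, which rounds to 99.9% or 100.0% depending on `total`.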

Also: even without -v, it is very verbose:

% borg compact                                                                    
Starting compaction / garbage collection...
Getting object IDs present in the repository...
Computing object IDs used by archives...
Analyzing archive arch2 2026-06-23 17:30:11.952749+02:00 ad56058509ce0dffa8e080f52c35a55008dd1279a38c7a6bf9f5037f279811f6 (1/2)
Analyzing archive arch2 2026-06-23 17:30:18.510445+02:00 b9ac90e796a20b53ed7759b3df0bec14489ca4cac53ca9d6eae175c154152a22 (2/2)
Cleaning archives directory from soft-deleted archives...
Deleting 1 unused objects...
Overall statistics, considering all 2 archives in this repository:
Source data size was 305 MB in 10552 files.
Repository size is 39 MB in 4992 objects.
Compaction saved 393 B.
Cleaning up files cache...
Removed 0 unused files cache files.
Finished compaction / garbage collection...

I suggest you try this manually to get a feel of it and find such issues yourself.
