compact: free space per pack with --threshold #9801
Group objects by their pack and act per pack: drop fully-unused packs, and rewrite mixed packs whose unused bytes reach --threshold (default 40%) by copying the survivors forward with compact_pack. Two index scans keep the memory use bounded. refs borgbackup#8572 borgbackup#8514
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##           master    #9801      +/-   ##
==========================================
- Coverage   84.77%   81.75%   -3.02%
==========================================
  Files          92       92
  Lines       15251    15284      +33
  Branches     2286     2295       +9
==========================================
- Hits        12929    12496     -433
- Misses       1622     2091     +469
+ Partials      700      697       -3
pack_total, pack_unused = {}, {}
for id, entry in self.chunks.iteritems():
    pid = entry.pack_id
    pack_total[pid] = pack_total.get(pid, 0) + entry.obj_size
This is what the defaultdict type was made for.
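A minimal sketch of what the defaultdict variant could look like. The `Entry` tuple, its `used` flag, and the sample data are hypothetical stand-ins; only `pack_id` and `obj_size` appear in the diff, and how the real code decides that an object is unused is not shown here.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    # Hypothetical stand-in for a chunk index entry.
    pack_id: str
    obj_size: int
    used: bool  # assumption: the PR may track "unused" differently


def per_pack_sizes(entries):
    """Sum total and unused bytes per pack in a single pass."""
    pack_total = defaultdict(int)   # missing keys start at 0
    pack_unused = defaultdict(int)
    for entry in entries:
        pack_total[entry.pack_id] += entry.obj_size
        if not entry.used:
            pack_unused[entry.pack_id] += entry.obj_size
    return pack_total, pack_unused


entries = [Entry("p1", 100, True), Entry("p1", 50, False), Entry("p2", 30, False)]
pack_total, pack_unused = per_pack_sizes(entries)
```

One thing to keep in mind: reading a missing key with `pack_unused[pid]` inserts it into a defaultdict, so `pack_unused.get(pid, 0)` is the safer lookup when the dict should not grow.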
    continue  # all used -> leave alone
if unused == total:
    drop_packs.add(pid)  # all unused -> drop the whole file
elif unused / total * 100 >= self.threshold:
100 * unused / total is simpler to read.
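For illustration, the three-way decision with the suggested expression order might be written as a small function. `classify_pack` and its string return values are hypothetical names for this sketch, not code from the PR.

```python
def classify_pack(total, unused, threshold=40.0):
    """Return "keep", "drop" or "rewrite" for one pack (hypothetical helper)."""
    if unused == 0:
        return "keep"      # all used -> leave alone
    if unused == total:
        return "drop"      # all unused -> drop the whole file
    if 100 * unused / total >= threshold:
        return "rewrite"   # mixed and wasteful enough -> copy survivors forward
    return "keep"          # mixed, but below the threshold
```

Multiplying first also keeps the numerator an exact integer, so only the final division rounds. That matters only for packs sitting right at the threshold.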
subparser.add_argument(
    "--threshold",
    metavar="PERCENT",
    dest="threshold",
    type=float,
    default=40.0,
    help="rewrite a pack when at least PERCENT of its bytes are unused (default: 40)",
)
Compare the CLI syntax and default value to what we had in borg 1.4-maint.
1.4 used an int CLI option and defaulted to 10%.
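To try the option in isolation, a standalone argparse sketch could look like the following. The `percent` validator and the `compact-sketch` program name are assumptions for this example; the diff itself uses plain `type=float`, and whether the value is range-checked is not shown in the thread.

```python
import argparse


def percent(value):
    """Parse a percentage in 0..100 (hypothetical validator, not from the PR)."""
    number = float(value)  # a ValueError here is reported by argparse as a usage error
    if not 0 <= number <= 100:
        raise argparse.ArgumentTypeError(f"{value!r} is not within 0..100")
    return number


parser = argparse.ArgumentParser(prog="compact-sketch")
parser.add_argument(
    "--threshold",
    metavar="PERCENT",
    dest="threshold",
    type=percent,
    default=40.0,
    help="rewrite a pack when at least PERCENT of its bytes are unused (default: 40)",
)

args_default = parser.parse_args([])
args_custom = parser.parse_args(["--threshold", "10"])
```

A float-valued option still accepts integer input such as `--threshold 10`, so keeping the 1.4 command-line syntax would not require an int type.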
repository = Repository(location, exclusive=True, create=True)
build_one_pack(repository, wasteful)
repository = reopen(repository)
Didn't we already find that we don't really need reopen?
Also please rebase on current master (I have merged #9800 now).
Have a look at that, maybe you can implement
I did a test run with slightly modified code (N=2) and got this. So I guess the 100% display issue is not fixed yet? Also: even without -v, it is very verbose. I suggest you try this manually to get a feel for it and find such issues yourself.
Description
Moves borg compact from deleting single objects to compacting whole packs, so it keeps working once a pack holds more than one object (N>1).
For each pack:
- all objects used: the pack is left alone.
- all objects unused: the whole pack file is dropped.
- mixed: the pack is rewritten when its unused bytes reach --threshold percent (default 40), copying the survivors into a new pack via Repository.compact_pack and dropping the old one. Below the threshold the pack is left alone, so a large pack is not rewritten just to reclaim a few bytes.
The chunk index is scanned twice to keep memory bounded: the first scan collects only per-pack byte counts to decide each pack's fate, the second collects the object ids of just the packs that change. The #9748 crash-safety order is preserved: cached chunk indexes are invalidated before the first store change.
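The two-scan idea can be sketched roughly as follows. This is not the PR's code: `plan_compaction`, the `(chunk_id, pack_id, size)` tuples, and the `used_ids` set are all hypothetical, and the sketch collects only surviving ids for packs to be rewritten, which is one possible reading of "the object ids of just the packs that change".

```python
from collections import defaultdict


def plan_compaction(index, used_ids, threshold=40.0):
    """Plan per-pack actions; index is a re-iterable of (chunk_id, pack_id, size)."""
    # Scan 1: keep only two integer counters per pack, no object ids.
    total = defaultdict(int)
    unused = defaultdict(int)
    for chunk_id, pack_id, size in index:
        total[pack_id] += size
        if chunk_id not in used_ids:
            unused[pack_id] += size

    drop, rewrite = set(), set()
    for pack_id, size in total.items():
        wasted = unused.get(pack_id, 0)
        if wasted == 0:
            continue                      # all used -> leave alone
        if wasted == size:
            drop.add(pack_id)             # all unused -> drop the whole pack
        elif 100 * wasted / size >= threshold:
            rewrite.add(pack_id)          # mixed and above threshold -> rewrite

    # Scan 2: collect object ids, but only for packs that will be rewritten.
    survivors = defaultdict(list)
    for chunk_id, pack_id, size in index:
        if pack_id in rewrite and chunk_id in used_ids:
            survivors[pack_id].append(chunk_id)
    return drop, dict(survivors)


index = [("a", "p1", 60), ("b", "p1", 40), ("c", "p2", 10), ("d", "p3", 90), ("e", "p3", 10)]
drop, survivors = plan_compaction(index, used_ids={"a", "d"})
```

In this sample, p1 is 40% unused and gets rewritten, p2 is entirely unused and gets dropped, and p3 is only 10% unused and stays untouched. Memory after the first scan grows with the number of packs rather than the number of objects.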
At N=1 every pack holds one object, so mixed packs never occur and the behavior is the same as before. The rewrite path is covered by a test that forces max_count > 1.
This recycles the approach from #9777, which can be closed.
refs #8572 #8514