repair pqueue using current heap relation#192

Open
xcws52 wants to merge 1 commit into MoatLab:master from xcws52:pri-fix
@xcws52 xcws52 commented Jun 22, 2026

Description

Repair pqueue_change_priority() by selecting the heap-repair direction from the updated node's relationship with its current parent.

Some FEMU callers update the object's priority field before invoking pqueue_change_priority(). The current implementation then reads the already-updated value as old_pri, making old_pri == new_pri.

A decreased priority consequently takes the percolate_down() path instead of bubble_up(), allowing the heap root to become stale.

This is visible in FDP global-greedy GC, where an RU with a smaller vpc can remain below an RU with a larger vpc.
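The failure mode described above can be illustrated with a minimal sketch. It assumes a hypothetical 1-based min-heap of plain priorities (`min_heap`, `repair_by_old_new`, and `repair_by_parent` are illustrative names, not FEMU's `pqueue_t` or the code in this patch). `repair_by_old_new` mimics choosing the direction from old versus new priority after the caller has already written the new value; `repair_by_parent` picks the direction from the node's current relation to its parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1-based binary min-heap; pri[0] is unused. Not FEMU's pqueue_t. */
typedef struct {
    unsigned long *pri;
    size_t size;
} min_heap;

static void swap_slots(min_heap *h, size_t a, size_t b)
{
    unsigned long t = h->pri[a];
    h->pri[a] = h->pri[b];
    h->pri[b] = t;
}

/* Move slot i toward the root while it is smaller than its parent. */
static void sift_up(min_heap *h, size_t i)
{
    while (i > 1 && h->pri[i] < h->pri[i / 2]) {
        swap_slots(h, i, i / 2);
        i /= 2;
    }
}

/* Move slot i toward the leaves while a child is smaller than it. */
static void sift_down(min_heap *h, size_t i)
{
    for (;;) {
        size_t l = 2 * i, r = l + 1, m = i;
        if (l <= h->size && h->pri[l] < h->pri[m]) m = l;
        if (r <= h->size && h->pri[r] < h->pri[m]) m = r;
        if (m == i) return;
        swap_slots(h, i, m);
        i = m;
    }
}

/* Direction chosen from old vs. new priority. If the caller already stored
 * new_pri in the slot, old_pri == new_pri and the else branch always runs. */
void repair_by_old_new(min_heap *h, size_t i, unsigned long new_pri)
{
    unsigned long old_pri = h->pri[i];
    h->pri[i] = new_pri;
    if (new_pri < old_pri)
        sift_up(h, i);
    else
        sift_down(h, i);
}

/* Direction chosen from the slot's current relation to its parent, so it
 * does not matter whether the priority was written before the call. */
void repair_by_parent(min_heap *h, size_t i)
{
    assert(i >= 1 && i <= h->size);
    if (i > 1 && h->pri[i] < h->pri[i / 2])
        sift_up(h, i);
    else
        sift_down(h, i);
}
```

With the heap `{10, 20, 30, 40}` and the last slot pre-updated to 5, `repair_by_old_new` leaves 10 at the root, while `repair_by_parent` moves 5 to the root.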

Related issue

Fixes #191

Testing

I tested this by running YCSB workload A on RocksDB + XFS with 20M records and 50M operations on a 32G (25% OP) FDP SSD. With this patch, the benchmark runs to completion.

2026-06-22 08:09:41 630 sec: 19070320 operations; [INSERT: Count=19070320 Max=3185573.89 Min=4.01 Avg=29.28 90=9.02 99=1123.33 99.9=2238.46 99.99=2308.09]
2026-06-22 08:09:51 640 sec: 19077555 operations; [INSERT: Count=19077555 Max=3185573.89 Min=4.01 Avg=29.63 90=9.04 99=1124.35 99.9=2238.46 99.99=3313.66]
2026-06-22 08:10:01 650 sec: 19084792 operations; [INSERT: Count=19084792 Max=3185573.89 Min=4.01 Avg=30.30 90=9.06 99=1126.40 99.9=2240.51 99.99=3319.81]
2026-06-22 08:10:11 660 sec: 19092028 operations; [INSERT: Count=19092028 Max=3185573.89 Min=4.01 Avg=30.81 90=9.08 99=1130.49 99.9=2240.51 99.99=3323.90]
2026-06-22 08:10:21 670 sec: 19099263 operations; [INSERT: Count=19099263 Max=3185573.89 Min=4.01 Avg=31.31 90=9.10 99=1131.52 99.9=2242.56 99.99=3323.90]
2026-06-22 08:10:31 680 sec: 19106499 operations; [INSERT: Count=19106499 Max=3185573.89 Min=4.01 Avg=31.65 90=9.12 99=1132.54 99.9=2246.66 99.99=3323.90]
2026-06-22 08:10:41 690 sec: 19113735 operations; [INSERT: Count=19113735 Max=3185573.89 Min=4.01 Avg=32.33 90=9.14 99=1132.54 99.9=2250.75 99.99=3325.95]
2026-06-22 08:10:51 700 sec: 19389684 operations; [INSERT: Count=19389684 Max=3185573.89 Min=4.01 Avg=32.02 90=9.10 99=1132.54 99.9=2250.75 99.99=3325.95]
2026-06-22 08:11:01 710 sec: 19736228 operations; [INSERT: Count=19736228 Max=3185573.89 Min=4.01 Avg=32.12 90=9.05 99=1132.54 99.9=2248.70 99.99=3325.95]
2026-06-22 08:11:10 719 sec: 20000000 operations; [INSERT: Count=20000000 Max=3185573.89 Min=4.01 Avg=32.20 90=9.02 99=1131.52 99.9=2248.70 99.99=3325.95]
Load runtime(sec): 719.232
Load operations(ops): 20000000
Load throughput(ops/sec): 27807.4
2026-06-22 08:45:50 2080 sec: 49481836 operations; [READ: Count=24740359 Max=11378.69 Min=1.19 Avg=35.56 90=72.45 99=120.32 99.9=314.88 99.99=820.74] [UPDATE: Count=24741477 Max=683147.26 Min=5.93 Avg=46.85 90=82.88 99=134.91 99.9=1434.62 99.99=2570.24]
2026-06-22 08:46:00 2090 sec: 49728763 operations; [READ: Count=24863702 Max=11378.69 Min=1.19 Avg=35.56 90=72.38 99=120.25 99.9=314.62 99.99=820.74] [UPDATE: Count=24865061 Max=683147.26 Min=5.93 Avg=46.86 90=82.81 99=134.78 99.9=1433.60 99.99=2570.24]
2026-06-22 08:46:10 2100 sec: 49991666 operations; [READ: Count=24995131 Max=11378.69 Min=1.19 Avg=35.54 90=72.38 99=120.13 99.9=314.37 99.99=820.74] [UPDATE: Count=24996535 Max=683147.26 Min=5.93 Avg=46.86 90=82.81 99=134.66 99.9=1432.58 99.99=2570.24]
2026-06-22 08:46:11 2101 sec: 50000000 operations; [READ: Count=24999273 Max=11378.69 Min=1.19 Avg=35.54 90=72.38 99=120.13 99.9=314.37 99.99=820.74] [UPDATE: Count=25000727 Max=683147.26 Min=5.93 Avg=46.86 90=82.81 99=134.66 99.9=1432.58 99.99=2570.24]
Run runtime(sec): 2101.04
Run operations(ops): 50000000
Run throughput(ops/sec): 23797.8

@xcws52 xcws52 changed the title femu: repair pqueue using current heap relation repair pqueue using current heap relation Jun 22, 2026
inhoinno commented Jun 22, 2026

Contributor

This is an unexpected code change for me.
pqueue_change_priority() has been part of the library used by FEMU bbssd for a while.
I would like to ask for clarification: 1. why is this change necessary, and 2. what is the actual issue, and why is this function buggy?

xcws52 commented Jun 23, 2026

Author

I can't reproduce the RU-exhaustion scenario yet; the workload I used is far larger than the device capacity.

I'm now testing again with a much smaller workload.

However, while looking at victim selection I found the following consecutive FTL log lines. Background GC first chooses RU 96, then RU 124, then RU 39, and no GC action occurred during this period. Why does background GC select RU 96 again, and even select RU 39 as the GC victim? Is this proper behavior? I would expect background GC to always select RU 124, which has the highest ipc of the three.

[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 96 ipc 6144 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 124 ipc 6310 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 96 ipc 6144 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 124 ipc 6310 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 96 ipc 6144 threshold 8192 full 65536)
[FEMU] FDP-Trace: GC_BACK_RESERT triggered but delay GC (ru 39 ipc 5635 threshold 8192 full 65536)

xcws52 commented Jun 23, 2026

Author

I lowered the foreground GC threshold from 95% to 80% and the background GC threshold from 75% to 65%, reduced over-provisioning from 25% to 10%, and used the same YCSB workload A with 20M records and 50M operations. With this setup I reproduced the RU-exhaustion problem.

This workload is too heavy for a 32G SSD. In the third-to-last test, FEMU got stuck after YCSB had loaded the 20M records and processed about 28M operations, repeatedly logging the same GC for one victim RU (I guess this may be expected behavior).

This time the same problem occurred. I added some code to check the victim queue (the patch is attached below); it dumps the whole pqueue just before FEMU prints "No free RUs".
femu-dump-victim-heap-on-exhaustion.patch

@@ -1092,6 +1234,7 @@
 
     ru = QTAILQ_FIRST(&rm->free_ru_list);
     if (!ru) {
+        dump_victim_heap_at_exhaustion(ssd, rg);
         ftl_err("No free RUs left in rg[%d]\n", rg->rgidx);
         return NULL;
     }

In the logs produced by my validation code, the root of the heap is not the best GC victim, which I think is not a correct result for a greedy GC policy. I am attaching the full FTL log; it reports the entire pqueue, and some of the nodes do not satisfy the heap property.
femu-fdp-2026-06-23-012548.log

[FEMU] FTL-Err: FDP_HEAP_ROOT ru=6 pri=65132 vpc=65132 ipc=404 pos=1
[FEMU] FTL-Err: FDP_HEAP_TRUE_BEST ru=29 pri=50995 vpc=50995 ipc=14541 pos=9

However, the relationship between the wrong order in the victim queue and the RU exhaustion is still unclear to me; I do not fully understand it.
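For readers without the attachment, the kind of check described in this comment could look roughly like the sketch below. It is not the attached patch: the function names and the flat 1-based array of priorities are assumptions made for illustration, and a min-heap (smaller vpc is a better victim) is assumed. One helper reports the first slot that is smaller than its parent, the other finds the true minimum by linear scan so it can be compared with the root.

```c
#include <stddef.h>

/* Hypothetical helpers over a 1-based array of priorities (pri[0] unused).
 * Assumes a min-heap: every node should be >= its parent. */

/* Return the first position whose priority is smaller than its parent's,
 * or 0 if the whole array satisfies the min-heap property. */
size_t heap_first_violation(const unsigned long *pri, size_t size)
{
    for (size_t i = 2; i <= size; i++) {
        if (pri[i] < pri[i / 2])
            return i;
    }
    return 0;
}

/* Return the position of the smallest priority by linear scan,
 * or 0 for an empty heap. In a valid min-heap this is position 1. */
size_t heap_true_best(const unsigned long *pri, size_t size)
{
    if (size == 0)
        return 0;
    size_t best = 1;
    for (size_t i = 2; i <= size; i++) {
        if (pri[i] < pri[best])
            best = i;
    }
    return best;
}
```

If `heap_true_best` returns anything other than 1, the root is stale, which is the situation the `FDP_HEAP_ROOT` and `FDP_HEAP_TRUE_BEST` lines above report (root at pos=1, true best at pos=9).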

Development

Successfully merging this pull request may close these issues.

[BUG] Wrong RU victim priority queue let FEMU exhaused all RU unexpectively.

2 participants