perf: hash-adjacency overhaul #2421
jrgemignani wants to merge 1 commit into apache:master
Replaces the per-graph adjacency map with a Robin Hood open-addressing
hashtable (agehash) and an embedded flat-array edge list, removing the
hottest dynahash path on IC1 and shrinking the largest hashtable AGE
keeps. Stages land as one commit:
  S1     MurmurHash3 fmix64 for graphid hashtables (replaces tag_hash)
  S2     Precompute graphid hash; share across paired DFS lookups
  S3     Replace ListGraphId adjacency with embedded flat-array
         VertexEdgeArray (single palloc, contiguous iteration)
  S4     Batched MLP lookup pipeline in add_valid_vertex_edges
  S5/C1  agehash library: INLINE Robin Hood hashtable with
         _with_hash API, freeze, iter, and a regress-only selftest
  S5/C2  Wire global graph edge_hashtable through agehash;
         drop edge_id from edge_entry (key lives in slot header);
         AGEHASH_MAX_LOAD=0.85; MemoryContextAllocHuge for SF10+
Performance (SF3 LDBC SNB, 5 runs/3 warmup, vs clean master baseline_v2):
  IC1       8,625 →   7,117 ms   −17.49 %  (the headline; hashtable-bound)
  IU1          40 →      35 ms   −11.86 %  (heaviest update; lookup-bound)
  IC sum  198,958 → 197,367 ms    −0.80 %  (suite-level noise)
  IS sum    1,009 →   1,028 ms    +1.86 %  (IS3 jitter; sub-ms)
  IU sum       77 →      72 ms    −6.64 %
  IC2/3/4/5/6/7/8/9/10/11/12: parity (within ±3.3 %, mostly ±1.5 %)
The VLE-DFS-heavy queries (IC3/5/6/9/11) sit at parity: with
hash_search_with_hash_value at ≤1 % inclusive on their baseline
flames, no hashtable swap can recover meaningful wall-time on them.
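
The parity claim can be read as a rough Amdahl-style bound. This is a sketch that assumes the inclusive flame-graph share p of hash_search_with_hash_value is the only time a hashtable swap can remove; the symbols are introduced here for illustration only.

```latex
T_{\text{new}} \;\ge\; (1 - p)\,T_{\text{base}}
\quad\Longrightarrow\quad
\frac{T_{\text{base}} - T_{\text{new}}}{T_{\text{base}}} \;\le\; p \;\le\; 0.01
```

Under that assumption even a zero-cost lookup would save at most about 1 % of wall time on these queries, which is inside the reported run-to-run band.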
Memory: removing edge_id from edge_entry saves ~416 MB on SF3 and
~1.4 GB on SF10 for the global graph's edge_hashtable. Slot capacity
uses MemoryContextAllocHuge so SF10+ edge tables can be built.
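
To illustrate where the per-entry saving comes from, the sketch below compares a payload struct that repeats the key with one that does not. The field names and types are assumptions made for this example and are not copied from age_global_graph.c; the total saving also depends on the edge count and on slot alignment, which this sketch does not model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload that duplicates the hash key inside the entry. */
typedef struct edge_entry_with_id
{
    int64_t  edge_id;           /* same value the table already stores as key */
    uint32_t edge_label_oid;    /* stand-in for an Oid */
    uint64_t edge_properties;   /* stand-in for a Datum */
    int64_t  start_vertex_id;
    int64_t  end_vertex_id;
} edge_entry_with_id;

/* Same payload with the key removed; the key lives only in the slot header. */
typedef struct edge_entry_without_id
{
    uint32_t edge_label_oid;
    uint64_t edge_properties;
    int64_t  start_vertex_id;
    int64_t  end_vertex_id;
} edge_entry_without_id;

/* Payload bytes saved across n_edges entries, before any slot alignment. */
static inline size_t payload_bytes_saved(size_t n_edges)
{
    return n_edges *
        (sizeof(edge_entry_with_id) - sizeof(edge_entry_without_id));
}
```

With these assumed layouts the difference is the 8 bytes of the removed int64 per entry.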
Adds:
src/backend/utils/cache/agehash.c, src/include/utils/agehash.h
regress/sql/agehash.sql + expected/agehash.out (boundary selftest)
_agehash_self_test() in both fresh-install and upgrade SQL
Tested on PostgreSQL 18.3 (REL_18_STABLE): all 35 regression tests
pass (installcheck), warning-free build.
Co-authored-by: Claude <noreply@anthropic.com>
modified: Makefile
modified: age--1.7.0--y.y.y.sql
new file: regress/expected/agehash.out
new file: regress/sql/agehash.sql
modified: sql/age_main.sql
modified: src/backend/utils/adt/age_global_graph.c
modified: src/backend/utils/adt/age_vle.c
new file: src/backend/utils/cache/agehash.c
modified: src/include/utils/age_global_graph.h
new file: src/include/utils/agehash.h
Pull request overview
This PR introduces a new high-performance open-addressing hashtable (agehash) and rewires AGE’s global-graph edge lookup + VLE traversal hot paths to reduce dynahash overhead and improve cache locality.
Changes:
- Add `agehash` (INLINE Robin Hood hashtable) plus SQL/regress self-test coverage.
- Replace global graph `edge_hashtable` (dynahash) with `edge_table` (agehash) and remove `edge_id` from the edge payload (key lives in the slot header).
- Replace per-vertex linked-list adjacency with embedded flat arrays and batch VLE edge lookups using precomputed `graphid_hash`.
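
For readers unfamiliar with the technique, the following is a minimal sketch of Robin Hood open addressing with the key and probe distance kept in the slot header and a caller-supplied hash. It is not the agehash implementation: the `rh_*` names, the fixed capacity, the `int32_t` payload, and the absence of resizing, deletion, and freezing are all simplifications assumed for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RH_CAPACITY 16u         /* power of two; this sketch never resizes */
#define RH_EMPTY    0xFFFFu     /* probe distance sentinel for a free slot */

typedef struct rh_slot
{
    uint16_t probe_dist;        /* distance from home bucket, or RH_EMPTY */
    int64_t  key;               /* key lives in the slot header */
    int32_t  payload;           /* stand-in for the real payload */
} rh_slot;

typedef struct rh_table
{
    rh_slot  slots[RH_CAPACITY];
    uint32_t count;
} rh_table;

static void rh_init(rh_table *t)
{
    uint32_t i;

    for (i = 0; i < RH_CAPACITY; i++)
        t->slots[i].probe_dist = RH_EMPTY;
    t->count = 0;
}

/* Insert a new key or overwrite the payload of an existing one. */
static void rh_insert_with_hash(rh_table *t, int64_t key, uint32_t hash,
                                int32_t payload)
{
    rh_slot  carry = { 0, key, payload };
    uint32_t idx = hash & (RH_CAPACITY - 1);

    assert(t->count < RH_CAPACITY);     /* caller keeps a free slot available */

    for (;;)
    {
        rh_slot *s = &t->slots[idx];

        if (s->probe_dist == RH_EMPTY)
        {
            *s = carry;
            t->count++;
            return;
        }
        if (s->key == carry.key)
        {
            s->payload = carry.payload;
            return;
        }
        /* Robin Hood rule: the entry farther from home keeps the slot. */
        if (s->probe_dist < carry.probe_dist)
        {
            rh_slot tmp = *s;

            *s = carry;
            carry = tmp;
        }
        carry.probe_dist++;
        idx = (idx + 1) & (RH_CAPACITY - 1);
    }
}

/* Returns the slot holding key, or NULL if the key is absent. */
static rh_slot *rh_lookup_with_hash(rh_table *t, int64_t key, uint32_t hash)
{
    uint32_t idx = hash & (RH_CAPACITY - 1);
    uint16_t dist = 0;

    for (;;)
    {
        rh_slot *s = &t->slots[idx];

        /* A free slot or a "richer" resident means the key cannot be here. */
        if (s->probe_dist == RH_EMPTY || s->probe_dist < dist)
            return NULL;
        if (s->key == key)
            return s;
        dist++;
        idx = (idx + 1) & (RH_CAPACITY - 1);
    }
}
```

The early exit in the lookup is what bounds the cost of a miss: a probe stops as soon as it meets a resident that sits closer to its home bucket than the searched key would.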
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/include/utils/agehash.h` | Defines the agehash public/internal API contract, slot layout, and helper macros. |
| `src/backend/utils/cache/agehash.c` | Implements INLINE Robin Hood hashtable + iterator + SQL-callable self-test. |
| `src/include/utils/age_global_graph.h` | Introduces `VertexEdgeArray`, adds `get_edge_entry_with_hash`, and exposes `graphid_hash`/`graphid_keyeq`. |
| `src/backend/utils/adt/age_global_graph.c` | Switches global edge storage to agehash, embeds vertex adjacency arrays, implements `graphid_hash`, and updates accessors/freeing. |
| `src/backend/utils/adt/age_vle.c` | Reuses precomputed graphid hashes across paired lookups and batches adjacency processing for better MLP. |
| `sql/age_main.sql` | Adds SQL declaration for `_agehash_self_test()` for fresh installs. |
| `age--1.7.0--y.y.y.sql` | Adds `_agehash_self_test()` to the extension upgrade script. |
| `regress/sql/agehash.sql` | Adds regression test invoking the agehash self-test. |
| `regress/expected/agehash.out` | Expected output for the new agehash regression test. |
| `Makefile` | Builds `agehash.o` and adds `agehash` to the regression test list. |
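
The flat adjacency array and the batched lookup described for `age_global_graph.c` and `age_vle.c` can be sketched roughly as follows. Everything here is hypothetical: `flat_edge_array`, `toy_table`, `toy_hash`, the batch size, and the direct-mapped table are stand-ins chosen to keep the example short, and `malloc` is used where the extension would use `palloc`.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat adjacency: header and edge ids share one allocation. */
typedef struct flat_edge_array
{
    uint32_t count;
    int64_t  edge_ids[];        /* flexible array member, contiguous */
} flat_edge_array;

static flat_edge_array *flat_edge_array_make(const int64_t *ids, uint32_t n)
{
    flat_edge_array *a = malloc(sizeof(*a) + (size_t) n * sizeof(int64_t));

    if (a == NULL)
        return NULL;
    a->count = n;
    if (n > 0)
        memcpy(a->edge_ids, ids, (size_t) n * sizeof(int64_t));
    return a;
}

#define BATCH       8u
#define TOY_BUCKETS 64u

/* Toy direct-mapped table: bucket -> stored edge id, 0 meaning empty. */
typedef struct toy_table
{
    int64_t slot[TOY_BUCKETS];
} toy_table;

/* Assumed multiplicative hash; stands in for the real graphid hash. */
static uint32_t toy_hash(int64_t id)
{
    return (uint32_t) (((uint64_t) id * UINT64_C(0x9E3779B97F4A7C15)) >> 32);
}

/*
 * Count how many edge ids of a are present in t. Ids must be nonzero,
 * because 0 marks an empty bucket in this toy table.
 */
static uint32_t count_present_batched(const toy_table *t,
                                      const flat_edge_array *a)
{
    uint32_t found = 0;
    uint32_t base;

    for (base = 0; base < a->count; base += BATCH)
    {
        uint32_t left = a->count - base;
        uint32_t n = left < BATCH ? left : BATCH;
        uint32_t idx[BATCH];
        uint32_t j;

        /* Phase 1: hash the whole batch; no table memory is touched yet. */
        for (j = 0; j < n; j++)
            idx[j] = toy_hash(a->edge_ids[base + j]) & (TOY_BUCKETS - 1);

        /* Phase 2: independent loads the CPU is free to overlap. */
        for (j = 0; j < n; j++)
            if (t->slot[idx[j]] == a->edge_ids[base + j])
                found++;
    }
    return found;
}
```

Splitting hashing from probing keeps the loads in the second loop independent of one another, which is the property a batched MLP pipeline relies on; whether it helps in practice depends on the table size and the hardware.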
Comment on lines +141 to +143

    t->payload_size = (uint32) payload_size;
    t->payload_offset = AGEHASH_SLOT_KEY_OFFSET + (uint32) key_size;
    t->slot_size = MAXALIGN(t->payload_offset + (uint32) payload_size);

Comment on lines +905 to +907

     * value. The dynahash table keyed on graphid is shared with edge_hashtable
     * elsewhere, so callers can compute graphid_hash() once and reuse it for
     * lookups in both tables.

    char carry_payload[4096];
    void *result_payload = NULL;
    bool placed_caller = false;

    Assert(payload_size > 0 && payload_size <= 4096);
    Assert(hash_fn != NULL);
    Assert(keyeq_fn != NULL);

Comment on lines +305 to +308

     * Probe distance overflow guard. With AGEHASH_MAX_LOAD = 0.7 and a
     * non-degenerate hash function, max probe is empirically <= 32.
     * The 0xFE00 ceiling reserves headroom while leaving probe_dist
     * well clear of the AGEHASH_EMPTY sentinel.