Context
We need C API functions to estimate memory requirements before building a Vamana index. Currently we have no way to gate the build on available memory — we can only fail after the fact, after the user has already waited through a full table scan.
Use Case
Our goal is a pre-flight check in vamanabuild.c:
- Estimate peak memory required given vector count, dimensions, and index parameters
- Check whether maintenance_work_mem (or system memory) is sufficient
- Build if yes; abort early with a clear error if no
Requested API
/**
 * Estimate peak memory (bytes) needed to build a Vamana index.
 *
 * @param builder     Configured builder handle (algorithm + storage set)
 * @param num_vectors Number of vectors to be indexed
 * @param dimensions  Vector dimensionality
 * @param out_err     Error output handle
 * @return Estimated peak memory in bytes, or 0 on error
 */
SVS_API size_t svs_index_estimate_build_memory(
    svs_index_builder_h builder,
    size_t num_vectors,
    size_t dimensions,
    svs_error_h out_err
);
/**
 * Estimate steady-state memory (bytes) of the built index (for search).
 *
 * @param builder     Configured builder handle (algorithm + storage set)
 * @param num_vectors Number of vectors to be indexed
 * @param dimensions  Vector dimensionality
 * @param out_err     Error output handle
 * @return Estimated in-memory index size in bytes, or 0 on error
 */
SVS_API size_t svs_index_estimate_search_memory(
    svs_index_builder_h builder,
    size_t num_vectors,
    size_t dimensions,
    svs_error_h out_err
);
Taking svs_index_builder_h as input means the estimates automatically reflect the configured storage type (float32/16, LVQ, LeanVec, SQ) and algorithm parameters (graph degree, window size, etc.) without us duplicating that logic in the extension.
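For illustration, the go/no-go decision on our side could look roughly like the sketch below. It only covers the comparison step: it takes the value that the proposed svs_index_estimate_build_memory would return and a budget in kilobytes (the unit PostgreSQL uses for maintenance_work_mem). The helper names budget_kb_to_bytes and build_fits_in_budget are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of SVS or PostgreSQL.
 * Converts a budget given in kilobytes to bytes, saturating at
 * SIZE_MAX instead of wrapping around on overflow. */
static size_t budget_kb_to_bytes(size_t budget_kb)
{
    if (budget_kb > SIZE_MAX / 1024)
        return SIZE_MAX;
    return budget_kb * 1024;
}

/* Hypothetical go/no-go decision.
 * estimated_bytes is the value returned by the proposed estimator;
 * 0 means the estimator reported an error, so the build is refused.
 * Returns true only when the estimate is valid and fits the budget. */
static bool build_fits_in_budget(size_t estimated_bytes, size_t budget_kb)
{
    if (estimated_bytes == 0)
        return false;
    return estimated_bytes <= budget_kb_to_bytes(budget_kb);
}
```

In vamanabuild.c we would call the estimator before the table scan, pass its result and maintenance_work_mem to a check like this, and raise an error immediately when it returns false.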
Memory Components
Build-time peak (svs_index_estimate_build_memory):
- Graph adjacency list: num_vectors × (graph_max_degree + 1) × sizeof(Idx)
- Raw vector buffer: num_vectors × dimensions × sizeof(float) (needed before quantization)
- Temporary candidate pools / search history (scales with window_size, use_full_search_history, thread count)
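To make the build-time components concrete, here is a rough sketch of how they might add up, with overflow checks so that a bad input yields 0 rather than a wrapped value. It assumes Idx is a 32-bit integer. Since we do not know how SVS sizes its temporary buffers, it models them as a caller-supplied per-thread allowance. The function name estimate_build_bytes and the scratch_bytes_per_thread parameter are hypothetical, not SVS identifiers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply a by b into *out; return false if the product overflows size_t. */
static bool mul_checked(size_t a, size_t b, size_t *out)
{
    if (b != 0 && a > SIZE_MAX / b)
        return false;
    *out = a * b;
    return true;
}

/* Add a and b into *out; return false if the sum overflows size_t. */
static bool add_checked(size_t a, size_t b, size_t *out)
{
    if (a > SIZE_MAX - b)
        return false;
    *out = a + b;
    return true;
}

/* Hypothetical upper bound on build-time memory in bytes; 0 on overflow.
 * Assumption: Idx is 32 bits wide. scratch_bytes_per_thread is a
 * caller-supplied allowance for candidate pools and search history,
 * not a quantity defined by SVS. */
static size_t estimate_build_bytes(size_t num_vectors, size_t dimensions,
                                   size_t graph_max_degree, size_t num_threads,
                                   size_t scratch_bytes_per_thread)
{
    size_t graph, raw, scratch, total;

    /* Graph: num_vectors x (graph_max_degree + 1) x sizeof(Idx) */
    if (!add_checked(graph_max_degree, 1, &graph) ||
        !mul_checked(graph, num_vectors, &graph) ||
        !mul_checked(graph, sizeof(uint32_t), &graph))
        return 0;

    /* Raw vectors: num_vectors x dimensions x sizeof(float) */
    if (!mul_checked(num_vectors, dimensions, &raw) ||
        !mul_checked(raw, sizeof(float), &raw))
        return 0;

    /* Temporary buffers: one allowance per build thread */
    if (!mul_checked(num_threads, scratch_bytes_per_thread, &scratch))
        return 0;

    if (!add_checked(graph, raw, &total) ||
        !add_checked(total, scratch, &total))
        return 0;
    return total;
}
```

With 1,000,000 vectors of 128 dimensions, graph_max_degree = 64, 8 threads, and a 1 MiB allowance per thread, this gives 260,000,000 + 512,000,000 + 8,388,608 = 780,388,608 bytes on a platform where float is 4 bytes.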
Steady-state / search (svs_index_estimate_search_memory):
- Graph adjacency list (same formula)
- Stored vectors after compression (depends on storage type)
- Metadata (entry points, TID maps, etc.)
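The steady-state side can be sketched the same way. Because the stored size per vector depends on the storage type, the example below takes it as a parameter instead of guessing at LVQ, LeanVec, or SQ layouts. It again assumes a 32-bit Idx, and estimate_search_bytes, bytes_per_stored_vector, and metadata_bytes are hypothetical names used only here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical steady-state estimate in bytes; returns 0 on overflow.
 * Assumption: Idx is 32 bits wide. bytes_per_stored_vector and
 * metadata_bytes are caller-supplied figures, not values defined by SVS. */
static size_t estimate_search_bytes(size_t num_vectors, size_t graph_max_degree,
                                    size_t bytes_per_stored_vector,
                                    size_t metadata_bytes)
{
    /* One adjacency row: (graph_max_degree + 1) x sizeof(Idx) */
    if (graph_max_degree > SIZE_MAX / sizeof(uint32_t) - 1)
        return 0;
    size_t row_bytes = (graph_max_degree + 1) * sizeof(uint32_t);

    /* Per-vector cost: adjacency row plus the stored (compressed) vector */
    if (bytes_per_stored_vector > SIZE_MAX - row_bytes)
        return 0;
    size_t per_vector = row_bytes + bytes_per_stored_vector;

    /* Scale by the number of vectors, then add fixed metadata */
    if (num_vectors > SIZE_MAX / per_vector)
        return 0;
    size_t total = num_vectors * per_vector;
    if (metadata_bytes > SIZE_MAX - total)
        return 0;
    return total + metadata_bytes;
}
```

For float16 storage at 128 dimensions (256 bytes per vector), 1,000,000 vectors, and graph_max_degree = 64, this comes to 1,000,000 × (260 + 256) = 516,000,000 bytes before metadata.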
Upper-bound estimates are acceptable — we just need something accurate enough to make a go/no-go decision.
@mihaic @rfsaliev, please review this issue when you get a chance.