ffi: add experimental fast FFI call API #63068

Draft
ShogunPanda wants to merge 2 commits into nodejs:main from ShogunPanda:fast-ffi

Conversation

@ShogunPanda
Contributor

Review Guide: Optional Fast FFI

Summary

This change adds an optional fast call path for the experimental node:ffi
module. The fast path uses V8 Fast API calls on the JavaScript/C++ boundary and
Cranelift-generated native trampolines on the C++/native boundary.

The feature remains opt-in at runtime with --experimental-fast-ffi and is now
also optional at build time with --without-fast-ffi. Builds without fast FFI do
not compile the fast C++ integration, do not enable the Cranelift Rust module,
and do not expose --experimental-fast-ffi in
process.allowedNodeEnvironmentFlags.

Baseline node:ffi support remains controlled by --experimental-ffi and the
existing --without-ffi / --shared-ffi build options.

TODO

  1. Add benchmarks.
  2. Verify on all other supported platforms.

Motivation

The existing FFI call path is general and compatibility-oriented. It supports
the full declared FFI surface, including slow conversions, but has substantial
per-call overhead for simple scalar and raw-memory signatures.

This change provides a faster path for a deliberately limited subset of valid
FFI signatures while preserving the existing compatibility path for everything
else.

User-Facing Behavior

Fast FFI is active only when all of these are true:

  1. Node is built with FFI support.
  2. Node is built with fast FFI support.
  3. The process is started with --experimental-ffi.
  4. The process is started with --experimental-fast-ffi.
  5. The requested signature is fast-path eligible.

If any condition is not met, the existing FFI path is used or the flag is
unavailable.
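
The five conditions above can be probed at runtime. This is a sketch using the config variables and flag names introduced by this PR; on a binary built without the patch, `node_use_fast_ffi` is simply undefined, so the checks come out `false`:

```javascript
// Probe whether the fast path *could* be active in this process.
// Variable and flag names are taken from this review guide.
const builtWithFfi =
  process.config.variables.node_use_ffi === 'true';
const builtWithFastFfi =
  process.config.variables.node_use_fast_ffi === 'true';
const flagAvailable =
  process.allowedNodeEnvironmentFlags.has('--experimental-fast-ffi');

console.log(builtWithFfi, builtWithFastFfi, flagAvailable);
```

Note that signature eligibility (condition 5) is per-function and cannot be probed this way; it is decided when the FFI function is prepared.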

Build-Time Behavior

Default FFI build:

python3 configure.py --ninja

Expected config:

process.config.variables.node_use_ffi === 'true'
process.config.variables.node_use_fast_ffi === 'true'
process.allowedNodeEnvironmentFlags.has('--experimental-fast-ffi') === true

Fast FFI disabled:

python3 configure.py --ninja --without-fast-ffi

Expected config:

process.config.variables.node_use_ffi === 'true'
process.config.variables.node_use_fast_ffi === 'false'
process.allowedNodeEnvironmentFlags.has('--experimental-fast-ffi') === false

In a --without-fast-ffi build, passing --experimental-fast-ffi fails as an
unknown option.

Runtime Flags

--experimental-ffi

Enables the experimental node:ffi module when the binary was built with FFI
support.

--experimental-fast-ffi

Enables fast calls for supported FFI signatures when the binary was built with
fast FFI support. This flag requires --experimental-ffi.

Fast-Path Eligibility

Eligible signatures use scalar numeric values, pointers, registered callback
pointers, explicit buffers, and explicit array buffers.

Supported type families include:

  • void
  • i8, u8, i16, u16, i32, u32
  • i64, u64
  • f32, f64
  • pointer
  • explicit buffer
  • explicit arraybuffer
  • function as a registered callback pointer value

Unsupported signatures continue through the existing compatibility path.

Notably, string and str signatures intentionally remain on the compatibility
path because they allocate temporary UTF-8 storage per call. Performance-sensitive
C strings should be pre-encoded by user code and passed as explicit
buffer/arraybuffer values, including a trailing \0 when required by the C
API.
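
The pre-encoding guidance above can be sketched as a small helper. This is illustrative user code, not part of the PR; the helper name is hypothetical:

```javascript
// Encode a JS string once, up front, as a NUL-terminated UTF-8 buffer
// suitable for a parameter declared as `buffer`/`arraybuffer`, instead of
// paying a per-call conversion on the compatibility path.
const encoder = new TextEncoder();

function toCString(str) {
  const utf8 = encoder.encode(str);
  const buf = new Uint8Array(utf8.length + 1); // zero-filled by default
  buf.set(utf8);                               // trailing byte stays \0
  return buf;
}

const arg = toCString('hello');
// `arg` can now be passed wherever the signature declares `buffer`.
```

For hot loops, the encoding cost is paid once per string value rather than once per call.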

Correctness Constraints

Reviewers should pay particular attention to these invariants:

  1. u64 values preserve BigInt semantics.
  2. Narrow integer values are sign- or zero-extended correctly.
  3. f32 and f64 values preserve special values such as NaN, infinities, and
    -0.
  4. pointer accepts bigint and null in the fast path.
  5. Raw byte storage should use explicit buffer or arraybuffer signatures.
  6. JS strings passed to generic pointer remain compatibility-path behavior.
  7. Buffer-like values passed to generic pointer remain compatibility-path
    behavior.
  8. Closed dynamic libraries are detected before invoking the native target.
  9. Invalid buffer conversions throw the same coded Node errors as the baseline
    path when the fast sentinel branch handles the value.
  10. Unsupported signatures must never partially enter the fast path.
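
Invariants 1–3 boil down to standard numeric identities that the fast path must not disturb. These plain-JS illustrations involve no FFI and are only meant to make the expected semantics concrete:

```javascript
// 1. u64 stays BigInt-facing: 2**63 exceeds Number's safe integer range.
const u64 = 2n ** 63n;
const exceedsSafe = u64 > BigInt(Number.MAX_SAFE_INTEGER);

// 2. Narrow integers: zero-extension keeps only the low bits...
const asU8 = 0x123 & 0xff;        // 0x23
// ...while sign extension makes the byte 0xff read as -1 when it is i8.
const asI8 = (0xff << 24) >> 24;  // -1

// 3. Floating-point special values are observable and must survive a
//    round trip through the fast path: NaN, infinities, and -0.
const minusZeroDistinct = Object.is(-0, -0) && !Object.is(-0, 0);
const nanIsNaN = Number.isNaN(NaN);
```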

Safety Model

Fast FFI compiles a wrapper for a declared signature and native target slot. The
wrapper loads the native target from the dynamic function slot on each call, so a
closed library can be detected rather than calling a stale target.

Fast FFI assumes the user-provided signature accurately describes the native
function. Calling a native function with an incorrect signature is outside the
safety guarantees of node:ffi and remains user responsibility.
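
The closed-library check can be pictured with a minimal JS analogue (not the actual C++/Cranelift implementation): the wrapper re-reads the target from its slot on every call instead of caching it.

```javascript
// Slot indirection sketch: `slot.target` stands in for the dynamic
// function slot; setting it to null models closing the library.
function makeWrapper(slot) {
  return function wrapped(...args) {
    const target = slot.target;            // fresh load per call
    if (target === null) {
      throw new Error('dynamic library was closed');
    }
    return target(...args);
  };
}

const slot = { target: (x) => x + 1 };
const call = makeWrapper(slot);
call(1);            // 2 while the library is open
slot.target = null; // simulate closing the library
// call(1) now throws instead of invoking a stale target.
```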

Build Integration

The build changes are intentionally split between baseline FFI and fast FFI:

  • node_use_ffi controls baseline node:ffi support.
  • node_use_fast_ffi controls fast FFI support.
  • HAVE_FFI guards baseline FFI code and options.
  • HAVE_FAST_FFI guards fast-only C++ storage, declarations, and CLI option
    registration.
  • src/ffi/fast.cc is compiled only when node_use_fast_ffi == "true".
  • Rust Cranelift dependencies are enabled only through the Cargo fast_ffi
    feature.

This keeps baseline bundled libffi builds independent from the Cranelift fast
path.
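
As a rough sketch, the gating above maps to gyp conditions along these lines. Variable and define names are taken from this guide; the exact structure in node.gyp/node.gypi may differ:

```python
# Hypothetical node.gyp-style fragment, for orientation only.
{
  'conditions': [
    ['node_use_ffi == "true"', {
      'defines': ['HAVE_FFI=1'],
    }],
    ['node_use_fast_ffi == "true"', {
      'defines': ['HAVE_FAST_FFI=1'],
      'sources': ['src/ffi/fast.cc'],
    }],
  ],
}
```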

Rust Integration

The Rust crate under deps/crates remains the single Rust static library used by
Node. Fast FFI adds a gated module:

#[cfg(feature = "fast_ffi")]
pub mod node_fast_ffi;

Cranelift-related dependencies are optional and are enabled by:

cargo rustc --features fast_ffi

through deps/crates/crates.gyp when node_use_fast_ffi == "true".
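
The optional-dependency shape in Cargo.toml presumably looks something like the following. The concrete crate list and versions in deps/crates/Cargo.toml may differ; this only illustrates how a Cargo feature keeps Cranelift out of baseline builds:

```toml
# Hypothetical Cargo.toml fragment, not copied from the PR.
[features]
fast_ffi = ["dep:cranelift-codegen", "dep:cranelift-frontend", "dep:cranelift-jit"]

[dependencies]
cranelift-codegen = { version = "*", optional = true }
cranelift-frontend = { version = "*", optional = true }
cranelift-jit = { version = "*", optional = true }
```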

Important Files

Build and configure:

  • configure.py
  • node.gyp
  • node.gypi
  • deps/crates/Cargo.toml
  • deps/crates/crates.gyp
  • deps/crates/src/lib.rs
  • deps/crates/src/node_fast_ffi.rs

C++ implementation:

  • src/node_ffi.cc
  • src/node_ffi.h
  • src/ffi/fast.cc
  • src/node_options.cc
  • src/node_options.h

Tests:

  • test/common/index.js
  • test/ffi/test-ffi-shared-buffer.js
  • test/ffi/test-ffi-fast.js
  • test/ffi/test-ffi-calls.js
  • test/ffi/test-ffi-dynamic-library.js
  • test/parallel/test-process-env-allowed-flags-are-documented.js

Docs:

  • doc/api/ffi.md
  • doc/api/cli.md
  • doc/node.1

Review Checklist

Build system:

  • --without-fast-ffi sets node_use_fast_ffi to false.
  • --without-ffi still disables all FFI support.
  • --shared-ffi does not accidentally enable fast FFI.
  • Baseline FFI still links libffi when fast FFI is disabled.
  • src/ffi/fast.cc is absent from disabled fast FFI builds.
  • Cranelift dependencies are optional Cargo dependencies.

C++ guards:

  • Fast-only fields in FFIFunction are behind HAVE_FAST_FFI.
  • PrepareFastFunction() is declared and called only under HAVE_FAST_FFI.
  • --experimental-fast-ffi is registered only under HAVE_FAST_FFI.
  • No disabled-build reference remains to experimental_fast_ffi.

Fast call behavior:

  • Fast wrappers preserve function name, length, and pointer metadata.
  • Unsupported signatures continue to use the baseline path.
  • Closed-library checks happen before invoking native code.
  • Invalid explicit buffer inputs preserve coded error behavior.
  • u64 remains BigInt-facing.

Tests and docs:

  • Fast-specific tests skip in builds without fast FFI.
  • Tests with --experimental-fast-ffi do not fail before skip logic can run.
  • allowedNodeEnvironmentFlags tests account for optional fast FFI.
  • CLI docs mention the flag is only available in fast FFI builds.
  • FFI docs explain strings and explicit buffer/arraybuffer guidance.

Verification Commands

Default fast-enabled build:

python3 configure.py --ninja
ninja -C out/Release -j8 node
out/Release/node --no-warnings --experimental-ffi --expose-internals test/ffi/test-ffi-shared-buffer.js
out/Release/node --no-warnings --experimental-ffi --expose-internals test/ffi/test-ffi-fast.js
out/Release/node --no-warnings test/parallel/test-process-env-allowed-flags-are-documented.js
python3 tools/test.py --mode=release ffi/test-ffi-fast

Confirm enabled flag state:

out/Release/node --no-warnings -p "[process.config.variables.node_use_fast_ffi, process.allowedNodeEnvironmentFlags.has('--experimental-fast-ffi')].join(' ')"

Expected output:

true true

Fast-disabled build:

python3 configure.py --ninja --without-fast-ffi
ninja -C out/Release -j8 node
out/Release/node --no-warnings --experimental-ffi --expose-internals test/ffi/test-ffi-shared-buffer.js
out/Release/node --no-warnings --experimental-ffi --expose-internals test/ffi/test-ffi-fast.js
out/Release/node --no-warnings test/parallel/test-process-env-allowed-flags-are-documented.js

Confirm disabled flag state:

out/Release/node --no-warnings -p "[process.config.variables.node_use_fast_ffi, process.allowedNodeEnvironmentFlags.has('--experimental-fast-ffi')].join(' ')"
out/Release/node --experimental-fast-ffi -e "0"

Expected behavior:

false false
out/Release/node: bad option: --experimental-fast-ffi

Whitespace check:

git diff --check

Verified Locally

Verified on macOS arm64:

  • Default fast-enabled configure completed successfully.
  • Default fast-enabled build completed successfully.
  • Fast FFI tests passed.
  • Baseline shared-buffer FFI tests passed.
  • allowedNodeEnvironmentFlags documentation test passed.
  • tools/test.py --mode=release ffi/test-ffi-fast passed.
  • --without-fast-ffi configure completed successfully.
  • --without-fast-ffi build completed successfully.
  • Baseline shared-buffer FFI tests passed in disabled build.
  • Fast FFI test skipped in disabled build.
  • --experimental-fast-ffi was absent from allowedNodeEnvironmentFlags in
    disabled build.
  • --experimental-fast-ffi was rejected by the disabled build.
  • git diff --check passed.

Cross-Platform Notes

Only macOS arm64 was verified locally. Reviewers should pay special attention to:

  • Windows Cargo target handling through deps/crates/cargo_build.py.
  • Linux executable-memory behavior from the Cranelift trampoline allocation.
  • macOS x64 and Linux x64 ABI behavior for narrow integer and floating-point
    signatures.
  • Shared FFI builds, where fast FFI is intentionally disabled.

PR Footer

Assisted-By: OpenAI:GPT-5.5 <openai/gpt-5.5>

Signed-off-by: Paolo Insogna <paolo@cowtech.it>
Assisted-By: OpenAI:GPT-5.5 <openai/gpt-5.5>
Signed-off-by: Paolo Insogna <paolo@cowtech.it>
Assisted-By: OpenAI:GPT-5.5 <openai/gpt-5.5>
@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/security-wg

@ShogunPanda ShogunPanda marked this pull request as draft May 1, 2026 15:39
@nodejs-github-bot nodejs-github-bot added the build, dependencies, and needs-ci labels May 1, 2026
@ShogunPanda ShogunPanda changed the title from "Fast ffi" to "ffi: add experimental fast FFI call API" May 1, 2026
@ShogunPanda ShogunPanda added the ffi label May 1, 2026
@bengl
Member

bengl commented May 1, 2026

At first glance, some observations:

  1. There are two big changes happening here. One is using V8 Fast API, the other is using cranelift. These probably can and probably should be two separate PRs, since they're not particularly dependent on each other.
  2. Why bother with the --experimental-fast-ffi flag? No one wants slow FFI. If cranelift etc. is already compiled in, and it's compatible with the platform, why treat it as a separate experimental feature from FFI itself? It's just an implementation detail.
  3. In my old experiments from 4 years ago, I had some trouble with V8 Fast FFI. In particular, paradoxically, I found it not particularly fast for my benchmarks compared to non-fast. YMMV, and things have changed dramatically since then, so let's see how this shakes out with benchmarks.
  4. Cranelift is not small, and you're using it here to do functionally the same thing as libffi does, but with more code on the Node.js side to maintain. It seems like we ought to pick one or the other, not optionally both. If we want to go down the road of using a JIT compiler to build trampolines, I'm curious how this compares against something like cjit/TCC. It would be good to see benchmarks there, also comparing against the current libffi approach.

Comment thread src/node_ffi.cc
ffi_args_heap.resize(nargs);
values = values_heap.data();
ffi_args = ffi_args_heap.data();
}
Member

This is exactly what MaybeStackBuffer is there for

Comment thread src/ffi/fast.cc
}

return true;
}
Member

C++ style: This should return std::optional<std::pair<FastFFIType, CTypeInfo>>

Comment thread src/node_ffi.h
Comment on lines +34 to +37
std::shared_ptr<void> fast_code;
std::vector<v8::CTypeInfo> fast_arg_info;
std::unique_ptr<v8::CFunctionInfo> fast_function_info;
std::unique_ptr<v8::CFunction> fast_c_function;
Member

Feel free to leave a TODO for me to clean up the allocation management here, having 10+ separate heap allocations for each function seems like a lot

Comment thread src/node_options.h
@@ -129,6 +129,9 @@ class EnvironmentOptions : public Options {
bool experimental_addon_modules = EXPERIMENTALS_DEFAULT_VALUE;
bool experimental_eventsource = EXPERIMENTALS_DEFAULT_VALUE;
bool experimental_ffi = EXPERIMENTALS_DEFAULT_VALUE;
#if HAVE_FAST_FFI
bool experimental_fast_ffi = EXPERIMENTALS_DEFAULT_VALUE;
#endif
Member

Just to echo what @bengl said – It seems like having the flag available unconditionally would not break anything and just make things easier (e.g. save you the file re-execution hoops you're jumping through in the tests).

Member

This is first-party Node.js core code, right? It probably shouldn't live in deps/ in the long run

Comment thread doc/api/ffi.md
allocate a temporary UTF-8 copy. For performance-sensitive C string APIs, encode
the string before invoking the native function, for example with `TextEncoder`,
and declare the parameter as `buffer` or `arraybuffer`. Include the trailing
`\0` byte when the native API expects a NUL-terminated string.
Member

... but that's also a temporary UTF-8 copy, just like passing a string directly would have been?

Comment thread src/ffi/fast.cc
kBuffer = 12,
};

bool ToToFastFFIType(ffi_type* type,
Member

Is the double To intentional?

Comment thread src/ffi/fast.cc
};

bool ToToFastFFIType(ffi_type* type,
const std::string& type_name,
Member

Suggested change
const std::string& type_name,
std::string_view type_name,

Comment thread src/node_ffi.cc
#if HAVE_FAST_FFI
PrepareFastFunction(env, fn.get());
const CFunction* fast_c_function = fn->fast_c_function.get();
#endif
Member

Suggested change
#endif
#else
const CFunction* fast_c_function = nullptr;
#endif

that lets you get rid of the much larger conditional below here

@ShogunPanda
Contributor Author

At first glance, some observations:

  1. There are two big changes happening here. One is using V8 Fast API, the other is using cranelift. These probably can and probably should be two separate PRs, since they're not particularly dependent on each other.

Unfortunately that's not the case, as far as I understood this problem.

V8 Fast API optimizes the JS -> C++ entry; Cranelift generates the native wrapper that performs the ABI-correct call to the FFI target.

FFI signatures are declared at runtime, while V8 Fast API requires a concrete native signature for each fast callable. Cranelift is what turns the runtime FFI signature into such a concrete callable.

A libffi-only Fast API path is possible, but only for a finite set of predefined C++ wrapper signatures, and it would still route through ffi_call().

That would not provide the universal fast path this PR is trying to introduce.

  2. Why bother with the --experimental-fast-ffi flag? No one wants slow FFI. If cranelift etc. is already compiled in, and it's compatible with the platform, why treat it as a separate experimental feature from FFI itself? It's just an implementation detail.

@addaleax also concurred on this below. I'll remove it.

  3. In my old experiments from 4 years ago, I had some trouble with V8 Fast FFI. In particular, paradoxically, I found it not particularly fast for my benchmarks compared to non-fast. YMMV, and things have changed dramatically since then, so let's see how this shakes out with benchmarks.

I'll attach some benchmarks tomorrow so we can compare.

  4. Cranelift is not small, and you're using it here to do functionally the same thing as libffi does, but with more code on the Node.js side to maintain. It seems like we ought to pick one or the other, not optionally both. If we want to go down the road of using a JIT compiler to build trampolines, I'm curious how this compares against something like cjit/TCC. It would be good to see benchmarks there, also comparing against the current libffi approach.

As far as I understand, TCC is LGPL-licensed, which is not usable in Node.js? Am I wrong?

@addaleax
Member

addaleax commented May 2, 2026

while V8 Fast API requires a concrete native signature for each fast callable

Does it? I haven't tried it out myself, but there are

CFunction(const void* address, const CFunctionInfo* type_info);
CFunctionInfo(const CTypeInfo& return_info, unsigned int arg_count,
              const CTypeInfo* arg_info,
              Int64Representation repr = Int64Representation::kNumber);

constructors available, which should allow constructing CFunction instances with runtime-supplied type information, no?

@ShogunPanda
Contributor Author

I'm a little confused here. I guess you're right, but what are they invoking? How are the target functions built?

