perf(redis-lua): pool *lua.LState to cut GC pressure by bootjp · Pull Request #580 · bootjp/elastickv

bootjp · 2026-04-21T19:26:38Z

Summary

Pools *lua.LState across EVAL / EVALSHA invocations, reusing the expensive base-lib plus redis/cjson/cmsgpack initialisation instead of minting a fresh VM every call.
Heap profiles in production showed ~34% of in-use heap in gopher-lua allocations (newFuncContext 11.8%, newRegistry 11.3%, newFunctionProto 10.5%); a BullMQ-style workload (~10 scripts/s) is exactly the shape that this wasted.
Micro-bench on darwin/arm64 (M1 Max, go 1.26): ~5x fewer allocs/op, ~5x less B/op, ~4.5x faster per eval (see below).

Security invariant

No script state must leak from one evaluator to the next. On release, the pool:

Walks the state global table and deletes any key not present in the snapshot captured at pool-fill time -- this removes user-introduced globals such as KEYS, ARGV, or a leaking GLOBAL_LEAK.
Restores every snapshot key to its original value, so a script that rebinds allowed globals (e.g. redis = nil, string = {upper = evil}) cannot poison the next user.
Truncates the value stack and unbinds the per-eval *luaScriptContext so stale redis.call / redis.pcall invocations cannot fire.

The pre-registered redis.call / redis.pcall closures dispatch the per-eval context via a *lua.LState to *luaScriptContext binding map (set on acquire, cleared on release), which is what makes pooling safe without re-registering closures every eval.

Benchmark

BenchmarkLuaState_NewVsPooled (darwin/arm64, Apple M1 Max, go 1.26):

new_state_per_call   59255 ns/op   194417 B/op   543 allocs/op
pooled_state         13306 ns/op    39700 B/op   102 allocs/op

Rough extrapolation to the 34% heap slice cited in the profile: the pooled path allocates roughly 1/5 the bytes per script, so that slice should compress to ~7-8% of in-use heap in steady state (actual number depends on script mix and sync.Pool eviction rate under GC).

Tests added

TestLua_VMReuseDoesNotLeakGlobals -- the load-bearing leak test. Script A sets GLOBAL_LEAK = 42 plus a LEAKY_TABLE, script B asserts both are nil on a recycled VM.
TestLua_VMReuseRestoresRebindsWhitelistedGlobals -- sabotages redis and string.upper on the pooled state and confirms they are restored.
TestLua_PoolRecordsReuseVsAllocation -- proves via the hit counter that the pool actually hands back existing VMs.
TestRedis_LuaPoolNoGlobalLeakEndToEnd -- drives the full EVAL path on a real RedisServer and confirms cross-script leakage cannot happen through the user-facing protocol.

Test plan

go test -race -short ./adapter/... (green locally, ~60s)
golangci-lint --config=.golangci.yaml run ./adapter/... -> 0 issues
go test -run='^$' -bench=BenchmarkLuaState_NewVsPooled -benchmem ./adapter/ shows ~5x reduction in B/op and allocs/op
CI green on this branch

EVAL / EVALSHA previously minted a fresh *lua.LState per call. Production heap profiles showed ~34% of in-use heap in gopher-lua allocs (newFuncContext 11.8%, newRegistry 11.3%, newFunctionProto 10.5%), driven by BullMQ-style ~10 scripts/s workloads. Each Eval allocating a VM is pure waste. This change introduces a sync.Pool-backed *lua.LState pool. Pooled states are pre-initialised once with base libs + nil-ed dangerous loaders + redis/cjson/cmsgpack modules. The redis module's call/pcall closures resolve the per-eval *luaScriptContext out of a global binding map (set on acquire, cleared on release) instead of capturing it, so the state does not need to be rewired with fresh closures every eval -- which is what makes pooling both safe and cheap. Reset logic (security invariant): on release the pool 1) walks the state's global table and deletes any key not present in the snapshot captured at pool-fill time -- this removes user-introduced globals such as KEYS, ARGV, or a leaking GLOBAL_LEAK, 2 restores every snapshot key to its original value, so a script that rebinds allowed globals (redis = nil, string = {upper = evil}) cannot poison the next user, and 3) truncates the value stack and unbinds the script context so stale redis.call invocations cannot fire. No script from evaluator A can observe or influence state in evaluator B's run. Tests: - TestLua_VMReuseDoesNotLeakGlobals covers the load-bearing invariant for GLOBAL_LEAK-style cases. - TestLua_VMReuseRestoresRebindsWhitelistedGlobals covers sabotage of whitelisted globals. - TestLua_PoolRecordsReuseVsAllocation proves the pool is actually being used via a hit counter. - TestRedis_LuaPoolNoGlobalLeakEndToEnd drives the full EVAL path on a live server. BenchmarkLuaState_NewVsPooled (darwin/arm64, Apple M1 Max): new_state_per_call 59255 ns/op 194417 B/op 543 allocs/op pooled_state 13306 ns/op 39700 B/op 102 allocs/op -> ~5x fewer allocs, ~5x less B/op, ~4.5x faster per eval. EOF )

coderabbitai · 2026-04-21T19:26:45Z

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 0 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 53 minutes and 0 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c035e2ad-95be-45dc-8f63-df0296ecccf4

📥 Commits

Reviewing files that changed from the base of the PR and between bbd7850 and eacc395.

📒 Files selected for processing (4)

adapter/redis.go
adapter/redis_lua.go
adapter/redis_lua_pool.go
adapter/redis_lua_pool_test.go

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/lua-vm-pool

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a luaStatePool to reuse *lua.LState instances, significantly reducing heap and GC pressure for high-rate Lua script workloads. The implementation includes a mechanism to snapshot and reset global variables to ensure isolation between script executions. Feedback highlights several critical areas: the current reset logic only handles string-keyed globals and fails to revert mutations within standard library tables, potentially leading to state leakage or environment poisoning. Additionally, there are potential nil pointer dereferences in redis.call and redis.pcall when the script context is missing, and the use of a global mutex for context lookup may become a performance bottleneck under high concurrency.

gemini-code-assist · 2026-04-21T19:29:24Z

+// not attempt to deep-clone them; if a script mutates
+// `table.insert = nil`, the restore re-binds the original function
+// value which is still alive.
+func snapshotGlobals(state *lua.LState) map[string]lua.LValue {


snapshotGlobals only captures string-keyed globals. Lua allows any value type (except nil and NaN) as a key in the global table _G. A script could intentionally or accidentally leak state by setting a non-string key (e.g., _G[42] = "secret"), which would bypass the current reset logic and persist across pooled state reuses.

References

Avoid silently dropping entries during normalization or state capture to ensure state consistency and prevent data leaks.

gemini-code-assist · 2026-04-21T19:29:24Z

+// the whitelisted ones, then truncates the value stack. It is the
+// heart of the security invariant: anything the script did to globals
+// must not be observable by the next user.
+func (p *pooledLuaState) reset() {


The reset logic is vulnerable to table poisoning. While it restores global variable references (e.g., the string table itself), it does not undo mutations to the fields within those tables (e.g., string.upper = function() return "pwned" end).

Since standard library tables are mutable in gopher-lua and the snapshot only holds a reference to the table object, any mutation to the table's contents will persist and poison subsequent script executions. The test TestLua_VMReuseRestoresRebindsWhitelistedGlobals only verifies that rebinding the global name string is handled, but it does not check for mutation of the original table's fields.

gemini-code-assist · 2026-04-21T19:29:24Z

+			if !ok {
+				scriptState.RaiseError("redis.call invoked without an active script context")
+				return 0
+			}


If luaBindContext is called with a nil context (as seen in the benchmarks), luaLookupContext will return nil, true. The current check if !ok will pass, leading to a nil pointer dereference when luaRedisCommand attempts to use the context. The check should explicitly verify that the context is non-nil.

Suggested change

if !ok {

scriptState.RaiseError("redis.call invoked without an active script context")

return 0

}

if !ok || ctx == nil {

scriptState.RaiseError("redis.call invoked without an active script context")

return 0

}

References

When using context.Context for managing operations, ensure it is correctly propagated and validated to prevent runtime errors.

gemini-code-assist · 2026-04-21T19:29:24Z

+			if !ok {
+				scriptState.Push(luaErrorTable(scriptState, "redis.pcall invoked without an active script context"))
+				return 1
+			}


Similar to redis.call, redis.pcall does not check if the resolved context is nil. If luaLookupContext returns nil, true, this will result in a nil pointer dereference in luaRedisCommand.

Suggested change

if !ok {

scriptState.Push(luaErrorTable(scriptState, "redis.pcall invoked without an active script context"))

return 1

}

if !ok || ctx == nil {

scriptState.Push(luaErrorTable(scriptState, "redis.pcall invoked without an active script context"))

return 1

}

References

When using context.Context for managing operations, ensure it is correctly propagated and validated to prevent runtime errors.

gemini-code-assist · 2026-04-21T19:29:24Z

+var luaStateBindings = struct {
+	sync.RWMutex
+	m map[*lua.LState]*luaScriptContext
+}{m: map[*lua.LState]*luaScriptContext{}}


Using a global map protected by a single sync.RWMutex for luaStateBindings will likely become a performance bottleneck under high concurrency. Every redis.call and redis.pcall across all concurrent scripts must acquire an RLock on this global mutex to resolve the context.

Consider storing the *luaScriptContext directly within the *lua.LState's registry (using lua.RegistryIndex) when the state is acquired. This localizes the context lookup to the specific Lua state and eliminates the need for global locking.

References

For frequently accessed fields, prefer mechanisms that avoid global mutex contention to improve performance on hot paths.

gemini-code-assist · 2026-04-21T19:29:24Z

+
+	// Collect all current string keys first; mutating the table in
+	// ForEach is unsafe.
+	currentKeys := make([]string, 0, len(p.globalsSnapshot)+luaResetKeySlack)


The currentKeys slice is allocated on every call to reset. To further reduce GC pressure (the primary goal of this PR), consider storing this slice within the pooledLuaState struct and reusing it (after clearing it) across resets.

References

Pre-allocating or reusing buffers is acceptable to manage memory bounds and reduce allocation overhead.

Addresses three findings from the gemini-code-assist review on #580. 1. Non-string-keyed globals survived reset. The old snapshot only captured string keys, so a script doing `_G[42] = "secret"` or `_G[true] = "bad"` leaked across pool reuse. Snapshot now uses an LValue-keyed map and reset iterates every key type. Switched to RawSet (not RawSetH) because gopher-lua stores integer keys in an internal array slice that RawSetH does not touch. 2. Table poisoning survived reset. Restoring the `string` / `math` / `redis` / etc. GLOBAL references did not undo in-place mutations of those tables' fields (e.g. `string.upper = function() return "pwned" end`). We now snapshot every whitelisted nested table's shallow field set at pool init and restore it on reset (Option A / B from the review), keyed by *LTable pointer so the fix still applies even if a script rebinds the outer global name. 3. `redis.call` / `redis.pcall` nil-deref. A nil *luaScriptContext stored via luaBindContext(nil) -- which the bench path does -- produced a (nil, true) lookup, then panicked inside luaRedisCommand. Guards now explicitly check `!ok || ctx == nil`. New regression tests (all pass under -race -count=5): - TestLua_VMReuseNonStringGlobalKeysAreWiped - TestLua_VMReuseDoesNotPoisonStringLib - TestLua_VMReuseDoesNotPoisonRedisModule - TestLua_PoolNilContextProducesErrorNotPanic

bootjp · 2026-04-21T21:07:06Z

Addressed the three flagged findings (2x SECURITY-HIGH + 1x HIGH) in c56a7f0. Picked Option B for the table-poisoning fix (snapshot every whitelisted nested table's shallow field set at init, restore on reset) over Option C because the fix is tractable, the reset cost is O(fields) per table -- tiny in practice -- and shipping the pool off by default would forfeit the 4-5x heap win the PR is about.

1. Non-string-keyed globals leaking (SECURITY-HIGH, line 171)

snapshotGlobals now returns map[lua.LValue]lua.LValue and captures every key the init-time _G.ForEach yields, not just LString ones.
Reset collects current keys (any type) via ForEach and wipes the diff against the snapshot.
Subtle gopher-lua gotcha caught while writing the test: integer keys live in an internal array slice, and RawSetH(LNumber(42), LNil) only touches dict, leaving the array entry alive. Switched the reset loops to RawSet (which dispatches by key type).
Verified by TestLua_VMReuseNonStringGlobalKeysAreWiped: script A sets _G[42] = \"leak\"; _G[true] = \"bad\", script B asserts both are nil.

2. Table poisoning surviving reset (SECURITY-HIGH, line 186)

Added tableSnapshots map[*lua.LTable]map[lua.LValue]lua.LValue: at init we shallow-copy the fields of every LTable-valued global (string, math, table, redis, cjson, cmsgpack; _G itself is skipped since it's the globals table).
Reset runs resetTableContents on every snapshotted table BEFORE top-level globals are restored -- the outer map is keyed by *LTable, so even if a script did string = nil, the original string table is still reachable via the snapshot and its fields are restored.
Verified by TestLua_VMReuseDoesNotPoisonStringLib (script A replaces string.upper with a pwned function + adds a rogue string.pwn field; script B asserts string.upper(\"x\") == \"X\" and string.pwn == nil) and a sibling TestLua_VMReuseDoesNotPoisonRedisModule that does the same for redis.sha1hex.

3. Nil context nil-pointer deref (HIGH, lines 266 / 274)

Changed both closures to if !ok || ctx == nil as suggested. Matches the bench path that calls pool.get(nil).
Verified by TestLua_PoolNilContextProducesErrorNotPanic, which drives both redis.call (must raise a Lua error) and redis.pcall (must push an error table, not panic).

Verification

go test -race -count=1 -short ./adapter/... -- 384 tests, all pass.
go test -race -count=5 -short -run 'TestLua_VMReuse|TestLua_Pool|TestRedis_LuaPool' -v ./adapter/ -- the 9 pool-relevant tests (incl. the live-cluster e2e leak test) pass 5x under race with no flakes.
make lint clean (mnd caught one magic number, now a named constant).
A full -race -count=5 ./adapter/... ran hot into port exhaustion from the 3-node raft fixtures (listen tcp: address 65538: invalid port) -- pre-existing infra limit of the integration harness, unrelated to this PR.

/gemini review

gemini-code-assist

Code Review

This pull request introduces a Lua state pool to optimize Redis EVAL operations by reusing Lua virtual machines, significantly reducing heap allocations and GC pressure. The implementation includes a security mechanism to snapshot and restore the Lua environment, preventing state leakage between executions. Reviewer feedback identifies a potential concurrency bottleneck in the global context mapping and recommends using the Lua registry for thread-local storage, alongside suggestions to use reusable slices during the environment reset to further minimize memory overhead.

gemini-code-assist · 2026-04-21T21:15:35Z

+var luaStateBindings = struct {
+	sync.RWMutex
+	m map[*lua.LState]*luaScriptContext
+}{m: map[*lua.LState]*luaScriptContext{}}
+
+func luaBindContext(state *lua.LState, ctx *luaScriptContext) {
+	luaStateBindings.Lock()
+	luaStateBindings.m[state] = ctx
+	luaStateBindings.Unlock()
+}
+
+func luaUnbindContext(state *lua.LState) {
+	luaStateBindings.Lock()
+	delete(luaStateBindings.m, state)
+	luaStateBindings.Unlock()
+}
+
+func luaLookupContext(state *lua.LState) (*luaScriptContext, bool) {
+	luaStateBindings.RLock()
+	ctx, ok := luaStateBindings.m[state]
+	luaStateBindings.RUnlock()
+	return ctx, ok
+}


The use of a global map with a sync.RWMutex for mapping *lua.LState to *luaScriptContext introduces a global synchronization point that will become a bottleneck under high concurrency. Every redis.call or redis.pcall invocation inside a Lua script performs an RLock on this global mutex, which can lead to significant contention.

A more efficient and scalable approach is to store the *luaScriptContext directly within the *lua.LState using the Lua registry (lua.RegistryIndex). This is local to the state and requires no synchronization.

Suggested implementation pattern:

In newPooledLuaState, pre-allocate an *lua.LUserData in the registry: state.RawSet(lua.RegistryIndex, lua.LString("elastickv_ctx"), state.NewUserData(nil)).

In luaBindContext, simply update the Value of that pre-allocated LUserData.

In luaLookupContext, read the LUserData from the registry and return its Value.

This also improves memory safety by ensuring the context reference is cleaned up when the LState is destroyed, avoiding potential leaks in a global map.

References

For frequently accessed fields or hot paths, prefer mechanisms that avoid global mutex contention (like local storage or atomic operations) to improve performance.

gemini-code-assist · 2026-04-21T21:15:35Z

+type pooledLuaState struct {
+	state *lua.LState
+	// globalsSnapshot is a copy of every entry reachable via the
+	// state's global table at init, keyed by LValue (not just string)
+	// so scripts cannot smuggle state across evals via non-string
+	// keys such as _G[42] = "secret".
+	globalsSnapshot map[lua.LValue]lua.LValue
+	// tableSnapshots holds the shallow field sets of well-known
+	// whitelisted tables (string, math, table, redis, cjson,
+	// cmsgpack) captured at init. On reset we restore each to its
+	// original contents so a script doing e.g.
+	// `string.upper = function() return "pwned" end` cannot poison
+	// subsequent pooled reuses.
+	//
+	// The outer map is keyed by the *LTable pointer of the parent
+	// (e.g. the `string` table) so tableSnapshots survives even if a
+	// script rebinds the global name (`string = nil`) -- the reset
+	// restores the global name first, then restores the table's
+	// internal contents from this snapshot.
+	tableSnapshots map[*lua.LTable]map[lua.LValue]lua.LValue
+}


To further reduce GC pressure and achieve zero-allocation cleanup, consider adding a scratchKeys []lua.LValue field to the pooledLuaState struct. This slice can be reused in reset and resetTableContents to collect keys before mutation, avoiding the current per-execution allocations. To prevent unbounded memory growth, ensure this collection remains within reasonable bounds.

type pooledLuaState struct { state *lua.LState // globalsSnapshot is a copy of every entry reachable via the // state's global table at init, keyed by LValue (not just string) // so scripts cannot smuggle state across evals via non-string // keys such as _G[42] = "secret". globalsSnapshot map[lua.LValue]lua.LValue // tableSnapshots holds the shallow field sets of well-known // whitelisted tables (string, math, table, redis, cjson, // cmsgpack) captured at init. On reset we restore each to its // original contents so a script doing e.g. // string.upper = function() return "pwned" end cannot poison // subsequent pooled reuses. // // The outer map is keyed by the *LTable pointer of the parent // (e.g. the string table) so tableSnapshots survives even if a // script rebinds the global name (string = nil) -- the reset // restores the global name first, then restores the table's // internal contents from this snapshot. tableSnapshots map[*lua.LTable]map[lua.LValue]lua.LValue // scratchKeys is a reusable slice for collecting table keys during // reset to avoid allocations. scratchKeys []lua.LValue }

References

To prevent unbounded memory growth and potential OOM issues, apply a fixed bound to collections that can grow from external requests.

gemini-code-assist · 2026-04-21T21:15:35Z

+	currentKeys := make([]lua.LValue, 0, len(p.globalsSnapshot)+luaResetKeySlack)
+	globals.ForEach(func(k, _ lua.LValue) {
+		currentKeys = append(currentKeys, k)
+	})


Reuse the scratchKeys slice from pooledLuaState here to avoid allocating a new slice on every reset. The same pattern should be applied to resetTableContents at line 318.

Suggested change

currentKeys := make([]lua.LValue, 0, len(p.globalsSnapshot)+luaResetKeySlack)

globals.ForEach(func(k, _ lua.LValue) {

currentKeys = append(currentKeys, k)

})

currentKeys := p.scratchKeys[:0]

globals.ForEach(func(k, _ lua.LValue) {

currentKeys = append(currentKeys, k)

})

p.scratchKeys = currentKeys

Move the per-eval *luaScriptContext binding off the global luaStateBindings map (guarded by sync.RWMutex) and into a pre-allocated *lua.LUserData stashed in each pooled state's own Lua registry under a fixed key. redis.call / redis.pcall now resolve the context via a lock-free, state-local read, eliminating the global RLock contention point that would otherwise be hit on every redis.call (e.g. ~50 lookups/s/script under BullMQ fan-out). Also reuse pooledLuaState.scratchKeys across resets so resetPooledLuaState / resetTableContents no longer allocate a fresh []lua.LValue per call. The retained backing array is dropped if it grows past luaScratchKeysMaxCap (1024) to guard against pathological scripts pinning unbounded memory on a pooled state. Adds regression tests: - TestLua_PoolConcurrentContextIsolation: 64 goroutines x 100 lookups must each observe their own *luaScriptContext (never another goroutine's, proved under -race -count=5. - TestLua_PoolContextIsRegistryBacked: pins the contract that the binding lives in the state registry, not a package-level map. - TestLua_PoolScratchKeysReused: asserts reuse + max-cap bound. - BenchmarkLuaLookupContext_Concurrent: measures concurrent lookup cost (lock-free after this change). EOF )

bootjp · 2026-04-22T05:19:30Z

Addressing Gemini's 2026-04-21 review — commit `3064b91f`

HIGH: global `luaStateBindings` map (line 136)

Picked Option B (per-state Lua registry binding):

Pre-allocate one *lua.LUserData per pooled state during newPooledLuaState, stashed in the state's own registry under luaCtxRegistryKey = "elastickv_ctx".
luaStatePool.get / put set/clear the userdata's .Value (a single pointer write — no mutex, no global map).
luaLookupContext does state.GetField(state.Get(lua.RegistryIndex), luaCtxRegistryKey) + type assertion. Each pooled *lua.LState is used by at most one goroutine at a time, so the per-state registry is lock-free.
The global luaStateBindings map + sync.RWMutex are gone.

Why B over A: lua.LState.SetContext exists in gopher-lua v1.1.2 but takes a context.Context, not our *luaScriptContext. Using the registry keeps the existing type system and, crucially, removes the cross-state contention entirely rather than just sharding it (Option C).

Evidence of correctness — TestLua_PoolConcurrentContextIsolation: 64 goroutines x 100 lookups each, with distinct *luaScriptContext pointers; assertion that every lookup observed its own pointer. Passes under go test -race -count=5 -run TestLua_Pool. TestLua_PoolContextIsRegistryBacked pins the registry contract so a future refactor cannot silently reintroduce the global map.

Bench (darwin/arm64, M1 Max, -benchtime=3s):

BenchmarkLuaLookupContext_Concurrent-10    1,927,508 ops    1918 ns/op    1491 B/op    93 allocs/op
BenchmarkLuaState_NewVsPooled/pooled-10      132,823 ops   18853 ns/op   39967 B/op   162 allocs/op

No regression vs. the pre-change pooled path; lookups are now lock-free.

MEDIUM: reuse `scratchKeys` (lines 103 / 281 / 318)

Added scratchKeys []lua.LValue to pooledLuaState. reset borrows it, passes it into resetTableContents (now takes and returns a scratch slice), and stores it back sliced to [:0] for the next reset. Capped by luaScratchKeysMaxCap = 1024 — past that the backing array is dropped so a pathological script cannot pin unbounded memory on a pooled state. Covered by TestLua_PoolScratchKeysReused.

Gates

go test -race -count=1 -short ./adapter/... — passes (48s).
go test -race -count=5 -run TestLua_Pool -timeout 300s ./adapter/... — passes.
make lint — 0 issues.

/gemini review

gemini-code-assist · 2026-04-22T05:19:33Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copilot

Pull request overview

Introduces a pooled *lua.LState implementation for Redis EVAL/EVALSHA to reduce allocation rate and GC pressure by reusing initialized Lua VMs while enforcing a “no cross-script state leakage” reset invariant.

Changes:

Add luaStatePool / pooledLuaState with global/table snapshotting, per-state registry-backed context binding, and reset logic.
Wire Redis script execution to acquire/release pooled Lua states instead of creating/closing a fresh VM per script.
Add unit/integration tests and benchmarks validating reuse, isolation, and concurrency behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
adapter/redis_lua_pool.go	Implements the pooled Lua VM, snapshotting, context binding, and reset logic.
adapter/redis_lua_pool_test.go	Adds safety/concurrency regression tests and benchmarks for pooling behavior.
adapter/redis_lua.go	Switches EVAL execution to use the pool and removes the per-invocation Lua VM init path.
adapter/redis.go	Adds `RedisServer` fields for the Lua pool and initializes it in the constructor.

Copilot · 2026-04-22T07:08:52Z

+func (p *pooledLuaState) reset() {
+	globals := p.state.G.Global
+
+	// (1) Restore inner contents of every snapshotted whitelisted
+	// table. This defeats poisoning attacks like
+	// `string.upper = function() return "pwned" end`.
+	//
+	// resetTableContents borrows p.scratchKeys as a working slice.
+	// We pass it in and receive the (possibly grown) backing array
+	// back so successive calls within the same reset share one
+	// allocation.
+	scratch := p.scratchKeys[:0]
+	for tbl, originalFields := range p.tableSnapshots {
+		scratch = resetTableContents(tbl, originalFields, scratch[:0])
+	}
+
+	// (2) Collect all current global keys (of any type). Mutating
+	// the table inside ForEach is unsafe, so snapshot keys first.
+	scratch = scratch[:0]
+	globals.ForEach(func(k, _ lua.LValue) {
+		scratch = append(scratch, k)
+	})
+
+	// Delete any key not in the init-time snapshot: these are
+	// user-introduced globals (KEYS, ARGV, GLOBAL_LEAK, _G[42],
+	// _G[true], ...).
+	//
+	// We use RawSet (not RawSetH) because gopher-lua stores integer
+	// keys in an internal `array` slice rather than `dict`; RawSetH
+	// only touches `dict`, so a call like RawSetH(LNumber(42), LNil)
+	// leaves the array entry intact. RawSet dispatches to the right
+	// storage by key type.
+	for _, k := range scratch {
+		if _, keep := p.globalsSnapshot[k]; !keep {
+			globals.RawSet(k, lua.LNil)
+		}
+	}
+
+	// (3) Restore every whitelisted global to its original value.
+	// This covers the case where a script rebinds an allowed global
+	// (e.g. `redis = something`) -- we simply put the original back.
+	for k, v := range p.globalsSnapshot {
+		globals.RawSet(k, v)
+	}
+


The reset logic restores global/table entries but does not restore table metatables. A script can call setmetatable(_G, ...) or setmetatable(string, ...) (base lib exposes setmetatable) and leak/poison behavior across pooled evals without changing any table fields, breaking the stated isolation invariant. Consider snapshotting metatables at pool-fill time (for _G and each table in tableSnapshots) and restoring them during reset (or explicitly clearing to the init-time metatable) in addition to restoring table fields.

Copilot · 2026-04-22T07:08:53Z

+// than string. Lua permits any non-nil, non-NaN value as a table key,
+// so a script doing `_G[42] = "leak"` or `_G[true] = "bad"` bypasses a
+// naive string-only snapshot/wipe. The LValue-keyed snapshot + the
+// RawSetH-based reset in pool.reset must catch these.


This test comment says the pool reset is "RawSetH-based", but the implementation uses RawSet to correctly clear integer keys in the array part (see redis_lua_pool.go reset()). Updating the comment will avoid confusion when diagnosing regressions.

Suggested change

// RawSetH-based reset in pool.reset must catch these.

// RawSet-based reset in pool.reset must catch these.

The Lua VM pool's reset logic snapshotted and restored _G entries plus the fields of whitelisted tables, but it did not touch metatables. Because setmetatable is in the base library, a script could install a poisoned __index on _G (or on string/math/table/redis/cjson/cmsgpack) and leak attacker-controlled behaviour into every subsequent pooled eval that dereferenced an undefined key. Capture each table's raw metatable at pool-fill time (LTable.Metatable, bypassing __metatable dispatch) and restore it on reset. When the original had no metatable, restore lua.LNil so any metatable the script installed is stripped. Also fixes the misleading "RawSetH-based" comment in TestLua_VMReuseNonStringGlobalKeysAreWiped -- the implementation uses RawSet so integer keys in gopher-lua's array part are actually cleared. Covered tables: _G, string, table, math, redis, cjson, cmsgpack. Regression tests TestLua_VMReuseDoesNotPoisonGlobalsMetatable and TestLua_VMReuseDoesNotPoisonStringMetatable verify that both attack vectors no longer leak across pool reuse, and both reproduce the vulnerability when the metatable-restore loop is stubbed out.

bootjp · 2026-04-22T07:41:28Z

/gemini review

Addressed Copilot review items (commit `eacc395`)

1. Metatable snapshot/restore (security-relevant)

newPooledLuaState now captures the raw metatable of _G and of
every snapshotted whitelisted table (string, table, math,
redis, cjson, cmsgpack) alongside the existing field snapshot,
reading LTable.Metatable directly to bypass __metatable
indirection.

reset() restores each metatable as the first step, before reseting
table fields and globals. When the init-time metatable was absent,
lua.LNil is stored in the snapshot and passed to SetMetatable,
which strips any metatable the script installed.

Blocks:

setmetatable(_G, { __index = function() return "pwned" end })
setmetatable(string, { ... }) and the same for the other
whitelisted tables.

Regression tests:

TestLua_VMReuseDoesNotPoisonGlobalsMetatable
TestLua_VMReuseDoesNotPoisonStringMetatable

Both were verified to fail without the fix (by stubbing out the
restore loop) and pass after. Full targeted run:
go test -race -count=5 -run TestLua_Pool ./adapter/... is green,
as is the full short adapter suite and make lint.

2. Test comment fix

The comment in TestLua_VMReuseNonStringGlobalKeysAreWiped that said
"RawSetH-based reset" has been updated to "RawSet-based reset",
matching the implementation (which uses RawSet so gopher-lua's
integer-array part is correctly cleared).

gemini-code-assist

Code Review

This pull request introduces a Lua state pool to the Redis adapter to reduce heap and GC pressure by reusing *lua.LState instances during script execution. The implementation includes a robust reset mechanism that snapshots and restores globals, nested tables, and metatables to ensure security and isolation between scripts. Additionally, it replaces a mutex-guarded global map with a lock-free context lookup using the Lua registry to improve performance under high concurrency. One issue was identified in the hit/miss tracking logic, where the use of sync.Pool.New causes all acquisitions to be recorded as hits, even when a new state is allocated.

gemini-code-assist · 2026-04-22T07:46:10Z

+	pls, ok := p.pool.Get().(*pooledLuaState)
+	if !ok || pls == nil {
+		// New func never returns nil, but defend against misuse.
+		pls = newPooledLuaState()
+		p.misses.Add(1)
+	} else {
+		p.hits.Add(1)
+	}


The hit/miss tracking logic is currently inaccurate because sync.Pool.New is set in newLuaStatePool. When p.pool.Get() is called, if the pool is empty, it automatically invokes the New function and returns a fresh state. Consequently, ok will always be true and pls will never be nil, causing every acquisition to be recorded as a 'hit' even when a new allocation occurs. To accurately track reuse, you should either remove the New function from the pool and handle allocation manually in get, or use a flag within pooledLuaState to distinguish fresh instances.

gemini-code-assist Bot reviewed Apr 21, 2026

View reviewed changes

bootjp requested a review from Copilot April 22, 2026 07:03

Copilot started reviewing on behalf of bootjp April 22, 2026 07:03 View session

Copilot AI reviewed Apr 22, 2026

View reviewed changes

gemini-code-assist Bot reviewed Apr 22, 2026

View reviewed changes

	// RawSetH-based reset in pool.reset must catch these.
	// RawSet-based reset in pool.reset must catch these.

Conversation

bootjp commented Apr 21, 2026

Summary

Security invariant

Benchmark

Tests added

Test plan

Uh oh!

coderabbitai Bot commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented Apr 21, 2026

1. Non-string-keyed globals leaking (SECURITY-HIGH, line 171)

2. Table poisoning surviving reset (SECURITY-HIGH, line 186)

3. Nil context nil-pointer deref (HIGH, lines 266 / 274)

Verification

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented Apr 22, 2026

Addressing Gemini's 2026-04-21 review — commit 3064b91f

HIGH: global luaStateBindings map (line 136)

MEDIUM: reuse scratchKeys (lines 103 / 281 / 318)

Gates

Uh oh!

gemini-code-assist Bot commented Apr 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented Apr 22, 2026

Addressed Copilot review items (commit eacc395)

1. Metatable snapshot/restore (security-relevant)

2. Test comment fix

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

coderabbitai Bot commented Apr 21, 2026 •

edited

Loading

Addressing Gemini's 2026-04-21 review — commit `3064b91f`

HIGH: global `luaStateBindings` map (line 136)

MEDIUM: reuse `scratchKeys` (lines 103 / 281 / 318)

Addressed Copilot review items (commit `eacc395`)