Fix memory safety issues by texodus · Pull Request #3188 · perspective-dev/perspective

texodus · 2026-06-19T14:21:08Z

I asked Claude to find memory safety bugs in Perspective and write a PR, while I played Balatro on my phone. Here's its own summary of what it found:

Fixes

Uneven column lengths in columnar table create/update (table.cpp —
Table::from_cols, Table::update_cols): the table was sized from a single
column's length while every column was filled to its own length. Now the
table is sized to the longest column and shorter columns are null-padded, so
all writes stay in bounds.
NDJSON row under-allocation (table.cpp — Table::from_ndjson): capacity
was estimated from the newline count but one row was written per parsed
object. The table is now grown per row (amortized O(1) via geometric capacity
growth), so concatenated objects without newline separators can no longer
overrun the buffer.
Arrow row-count truncation (arrow_loader.cpp — ArrowLoader::row_count):
Arrow's 64-bit row count was silently truncated to 32 bits, under-sizing the
destination table relative to the data written during fill. Oversized/negative
counts are now rejected instead of truncated.
Arrow time32 element width (arrow_loader.cpp — copy_array): time32
values are 32-bit and map to a 4-byte column, but the loader copied 8 bytes
per element, over-reading the source and over-writing the destination. Now
copies 4 bytes per element.
first/last aggregate with a missing sort dependency (sparse_tree.cpp
— first_last_helper): the helper assumed the aggregate spec always carried
both a value and a sort dependency and indexed the dependency list
unconditionally. A view whose sort column falls outside the visible set could
produce a spec without the sort dependency, causing an out-of-bounds read. Now
guarded (covers first, last, and last − first).
count aggregate over a string column reads at the wrong stride
(aggregate.cpp — t_aggregate::init, AGGTYPE_COUNT / DTYPE_STR): a
string column's backing store holds 4-byte t_uindex vocabulary indices, but
the count aggregate was instantiated as t_aggimpl_count<std::uint64_t, …>,
so build_aggregate → t_column::fill read the input column at an 8-byte
(uint64) stride — double the real element width — and ran off the end of the
buffer. The read is pure waste (t_aggimpl_count::reduce discards the values;
the count is filled in later from nstrands), but it still tripped a storage
bounds check. The raw type is now t_uindex, matching the storage width. This
was a latent over-read for years, masked by push_back's ~20% capacity
headroom; tighter column sizing in a later optimization removed the slack and
exposed it. It is reachable from every pivoted (group_by/split_by) view
with a string column, since the default per-column aggregate for a string is
count and that aggregate is built during view construction — so the abort
fired before the view's actual query (get_min_max, to_columns, etc.) ever
ran. Found by updating and enabling PSP_STORAGE_VERIFY in debug builds.
Residency eviction data race (residency.cpp, residency.h): the
shared pending-eviction vectors were cleared outside the manager's mutex, so
concurrent request-thread eviction passes could double-free. All mutations now
occur under the lock, and a dedicated mutex serializes each eviction cycle.
Unvalidated Arrow input (arrow_loader.cpp — ArrowLoader::initialize):
Arrow's IPC reader does not validate buffer contents, yet the fill paths index
value/offset/dictionary buffers directly. The loaded (remotely supplied) table
is now fully validated (ValidateFull()) before those buffers are trusted, so
a malformed payload — bad offsets, out-of-range dictionary indices, inconsistent
chunk lengths — is rejected instead of read out of bounds. This is the systemic
defense behind the time32 and row-count fixes above. Note: validation is
O(data) per ingested payload; Validate() plus targeted bounds checks would be
a cheaper alternative if ingest throughput matters.
Out-of-bounds access in expression vector functions (computed_function.cpp
— diff3, norm3, cross_product3, dot_product3): these operate on
3-element vectors and indexed v[0..2] unconditionally, but their exprtk
parameter sequences ("VVV"/"VV"/"V") enforce vector type, not length.
A user expression can declare shorter vectors (e.g. var v[2] := {1,2}; norm3(v)), causing an out-of-bounds read — and for diff3/cross_product3,
an out-of-bounds write to the (short) output vector. Each function now
clears its result for vectors shorter than 3 before indexing; this surfaces
as an invalid expression (rejected at view creation) rather than an
out-of-bounds access. Vectors of length 3 are unaffected.

texodus · 2026-06-22T03:33:48Z

+                hex.match(/.{2}/g)!.map((b) => parseInt(b, 16)),
+            );
+            await expect(perspective.table(bytes)).rejects.toThrow(
+                /row count exceeds maximum supported size/,


This is brittle and irrelevent, we don't cate about the message.

texodus · 2026-06-22T03:36:12Z

+            expect(paths.length).toEqual(3);
+            expect(paths).toContainEqual([]);
+            expect(paths).toContainEqual(["x"]);
+            expect(paths).toContainEqual(["y"]);


These data structures are small, for readability its worth just asserting the entire structure instead of multiple playwright assertion calls in a row.

texodus · 2026-06-22T03:37:48Z

+    const auto validation = m_table->ValidateFull();
+    if (!validation.ok()) {
+        PSP_COMPLAIN_AND_ABORT(validation.ToString());
+    }


I checked that this has a minimal (nonexistent) performance impact.

texodus · 2026-06-22T03:39:37Z

+    if (n < 0
+        || static_cast<std::uint64_t>(n)
+            > std::numeric_limits<std::uint32_t>::max()) {
+        PSP_COMPLAIN_AND_ABORT(


This is not something Perspective itself will check internally, e.g. a subsequent update call will still cause the same issue this verbose error messages "fixes".

texodus · 2026-06-22T03:40:26Z

+    if (v1.size() < 3 || v2.size() < 3 || out.size() < 3) {
+        rval.m_status = STATUS_CLEAR;
+        return rval;
+    }


This is an actual good catch but we can validate this better by just checking the exact cases and erroring else?

texodus · 2026-06-22T03:41:37Z

            case AGGTYPE_WEIGHTED_MEAN: {
                auto pkeys = get_pkeys(nidx);

+                if (spec.get_dependencies().size() < 2) {


I don't think empty categories can be constructed with the current config but this fix satisfies the analysis.

Signed-off-by: Andrew Stein <steinlink@gmail.com>

texodus added the bug Concrete, reproducible bugs label Jun 19, 2026

texodus requested a review from timkpaine June 19, 2026 14:22

texodus force-pushed the memory-safety-fixes branch 3 times, most recently from aa54fef to 30d26c8 Compare June 22, 2026 03:28

texodus commented Jun 22, 2026

View reviewed changes

Fix memory safety issues

c4cb7a3

Signed-off-by: Andrew Stein <steinlink@gmail.com>

texodus force-pushed the memory-safety-fixes branch from 30d26c8 to c4cb7a3 Compare June 22, 2026 04:48

texodus merged commit 82eaa40 into master Jun 23, 2026
44 of 46 checks passed

texodus deleted the memory-safety-fixes branch June 23, 2026 04:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix memory safety issues#3188

Fix memory safety issues#3188
texodus merged 1 commit into
masterfrom
memory-safety-fixes

texodus commented Jun 19, 2026 •

edited

Loading

Uh oh!

texodus Jun 22, 2026

Uh oh!

Uh oh!

Uh oh!

texodus Jun 22, 2026

Uh oh!

texodus Jun 22, 2026

Uh oh!

texodus Jun 22, 2026

Uh oh!

texodus Jun 22, 2026

Uh oh!

texodus Jun 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Conversation

texodus commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fixes

Uh oh!

texodus Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

texodus Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

texodus Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

texodus Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

texodus Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

texodus Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

texodus commented Jun 19, 2026 •

edited

Loading