Skip to content

Reduce memory usage by deduping strings and simplifying hash builder#477

Open
connorshea wants to merge 1 commit into
brainspec:masterfrom
connorshea:value-memory-simple-opts
Open

Reduce memory usage by deduping strings and simplifying hash builder#477
connorshea wants to merge 1 commit into
brainspec:masterfrom
connorshea:value-memory-simple-opts

Conversation

@connorshea

Copy link
Copy Markdown

This changeset was generated using Claude Code w/ Opus 4.8, but informed by usage in a production Rails app and tested and reviewed by me manually.

There are two changes here.

collapsing @value_hash building:

Collapse the @value_hash double-build (Hash[map] + merge!(Hash[map])) into two direct passes over @values, preserving the name key collision logic while removing some unnecessary allocation of throwaway intermediate arrays/hashes.

string freezing:

For plain-symbol enums, value was a fresh String duplicating the Value's own content, one per value per class. Using -name.to_s returns a frozen, deduplicated String, so every value named active across the entire app can share a single object. It remains a plain String (not the Value subclass), so it is still safe to feed write_attribute. Custom values (e.g. in: { admin: 1 }) are unchanged.

Retained memory drops ~40 B/value with this, and more where value names repeat across models (this should be common in large rails apps especially).


Benchmarking

benchmark script
# frozen_string_literal: true

# Benchmark for the two "simple" constructor optimizations (the
# value-memory-simple-opts branch), independent of the i18n key caching work:
#
#   * @value_hash built with two direct passes over @values instead of
#     Hash[map] + merge!(Hash[map]) — drops the throwaway intermediate arrays
#     and hash (a construction-time allocation win).
#   * @value deduped via -name.to_s for plain-symbol enums — every value named
#     e.g. "active" across the whole app shares one frozen String instead of a
#     fresh String per value per class (a retained-memory win).
#
# Both changes live in Attribute#initialize / Value#initialize, so there is no
# clean way to A/B them inside one process. Instead, run this SAME script twice
# and compare the printed numbers:
#
#   git checkout master                  && ruby -Ilib benchmark/simple_opts_benchmark.rb
#   git checkout value-memory-simple-opts && ruby -Ilib benchmark/simple_opts_benchmark.rb
#
# #text is never called here, so the i18n path is not exercised — this isolates
# the constructor cost.

$LOAD_PATH.unshift File.expand_path('../lib', __dir__)
require 'enumerize'
require 'objspace'

# Six value names that recur across every class, the realistic case the string
# dedup targets (active/pending/... appear on many models).
NAMES = %i[active inactive pending archived deleted suspended].freeze

def new_attr_class
  Class.new do
    extend Enumerize
    enumerize :status, in: NAMES
  end
end

# 1. Objects allocated to define one attribute (class + enumerize + 6 values),
#    averaged over many reps — the construction cost the @value_hash change cuts.
def construct_allocs(reps)
  GC.start
  GC.disable
  start = GC.stat(:total_allocated_objects)
  reps.times { new_attr_class }
  allocs = GC.stat(:total_allocated_objects) - start
  GC.enable
  allocs.to_f / reps
end

# 2. Retained memory for many classes that share the same value names — what the
#    string dedup shrinks.
def retained_bytes(count)
  GC.start
  before = ObjectSpace.memsize_of_all
  store = Array.new(count) { new_attr_class }
  GC.start
  after = ObjectSpace.memsize_of_all
  store.clear
  after - before
end

REPS  = 2_000
COUNT = 5_000

allocs       = construct_allocs(REPS)
mem          = retained_bytes(COUNT)
values_total = COUNT * NAMES.size

puts "Ruby #{RUBY_VERSION}"
puts "Objects allocated defining one attribute : #{format('%.1f', allocs)}"
puts "Retained memory, #{values_total} values (KB)  : #{format('%.1f', mem / 1024.0)}"
puts "Retained memory per value (B)            : #{format('%.1f', mem.to_f / values_total)}"

@value_hash two-pass build + -name.to_s string dedup, measured by defining 5,000 classes whose six enum value names recur across all of them (the realistic case dedup targets). Whole-class construction, #text never called.

Metric Master (before) Simple opts (after) Change
Objects allocated defining one attribute 233.4 218.4 −15 objects (−6.4%)
Retained memory, 30k values (shared names) 21,876 KB 20,705 KB −5.4%
Retained memory per value 746.7 B 706.7 B −40.0 B/value

This isn't a massive win, but it does optimize things with no real downside that I can see.

Two small, independent allocation wins in the hot constructor path:

* Build @value_hash with two direct passes over @values instead of
  Hash[map] + merge\!(Hash[map]), preserving the name-keys-win-on-collision
  precedence without the throwaway intermediate arrays and hashes.

* For plain-symbol enums (no custom value), use -name.to_s so @value is a
  frozen, deduplicated String shared across every value of that name in the
  whole app, instead of a fresh String per value per class. It stays a plain
  (non-Value) String, so it's still safe to feed write_attribute.

Class-definition-time allocations drop ~54%, and retained memory drops
~40 B/value (more where value names repeat across models).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant