
Add US Tax Calculator: sales tax + federal income tax tasks#2

Open
sagarm85 wants to merge 22 commits into protosphinx:main from sagarm85:feature/us-tax-calculator

Conversation


@sagarm85 sagarm85 commented Apr 21, 2026

Two new tasks following the rules-first pattern:

  us-sales-tax   — 50-state rate table, 5 product categories (grocery,
                   clothing, digital, medicine, general), per-state
                   exemptions and reduced rates (AR, IL, TN, UT, VA, NC,
                   MO). Sales tax uses seller-state rates; use tax flips to
                   buyer-state. LLM slow path for nexus edge cases.

  us-federal-tax — 2024 IRS marginal bracket tables for single, married
                   filing jointly, and head of household. Standard deduction
                   applied before bracket walk. Returns taxOwed,
                   effectiveRate, marginalRate, per-bracket breakdown.
                   LLM slow path for credits / itemized deductions.

New files:
  packages/sdk/src/tasks/us-tax.js              tasks + static data tables
  packages/sdk/test/us-tax.test.js              37 tests (all passing)
  packages/playground/public/demos/us-tax.html  interactive demo

Modified:
  packages/sdk/src/index.js                 auto-import + export new tasks
  packages/playground/public/index.html     add demo card

Demo features: dynamic line-items invoice, real-time per-item exempt/tax
badges, sales/use tax toggle, full breakdown panel, 2024 bracket table
with active bracket highlighted. Input focus preserved during typing
(display cells updated in-place; input rows rebuilt only on add/remove).


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

protosphinx and others added 22 commits April 11, 2026 20:04
…ex SDK

The original pivot positioned Dhamaka as "a reflex layer for every input
field on the web". That framing was too narrow. Dhamaka is a local AI
capability layer for web apps, and SmartField is just one family of
capabilities inside it. The flagship integration is the formula editor
in erp.ai — which is a completely different call shape from SmartField:
imperative, one-shot, instruction-driven, on a cell formula instead of
an oninput event.

This commit lands the Transform family (the second of four planned
capability families) and reframes the README around four families:

  🪞 Reflex    — reactive, keystroke-level, rules-first
                 (SmartField, SmartForm, SmartText, attachSmartPaste)
  🔧 Transform — imperative, one-shot, instruction-driven  ← new
                 (Transform, Transform.formula/.explain/.debug)
  🔎 Search    — semantic search over in-memory data (planned)
  🤖 Agent     — multi-step tool use with local model (planned)

Transform (packages/sdk/src/transform.js):

- Thin class that accepts { task, input, instruction, context } and
  routes through the existing task registry. Falls back to a generic
  "instruction over input" prompt when no task is specified.
- Convenience methods: t.formula(input, instr, ctx) / t.explain(input,
  ctx) / t.debug(input, ctx) — three lines of app code to integrate
  erp.ai-style formula editing.
- Normalises TaskResult shape into a TransformResult with output /
  source / confidence / fields / explanation so the caller doesn't
  have to unwrap fields.output.
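
A hedged sketch of the three-line integration described above (option names,
context shape, and the example values are illustrative, not lifted from the
shipped API):

  import { Transform } from "@dhamaka/sdk";

  const t = new Transform();

  // Imperative, one-shot, instruction-driven: a cell formula in, a rewritten formula out.
  const result = await t.formula("SUM(B2:B5)", "add 10% discount", { dialect: "erp" });

  result.output;      // "(SUM(B2:B5)) * 0.9"
  result.source;      // "rule"  (resolved by the pattern layer, no model call)
  result.confidence;  // e.g. 0.95
  result.explanation; // human-readable gloss from the matched pattern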

Formula tasks (packages/sdk/src/tasks/formula.js):

- formula-transform: 10 structural rewrite patterns ship at launch,
  each matching a common ERP formula edit and producing correct output
  with zero model calls. The patterns:

    percent-discount   "add 10% discount"      → (expr) * 0.9
    percent-tax        "add 8% tax"            → (expr) * 1.08
    round              "round to N decimals"   → ROUND(expr, N)
    multiply-by        "multiply by 1.5"       → (expr) * 1.5
    divide-by          "divide by 100"         → (expr) / 100
    iferror            "wrap in iferror"       → IFERROR(expr, 0)
    null-safe          "handle empty cells"    → IFERROR(expr, 0)
    currency-convert   "convert to EUR"        → (expr) * EUR_RATE
    negate             "negate it"             → -(expr)
    abs                "take absolute value"   → ABS(expr)

  When none of the patterns match, the task escalates to the LLM slow
  path with a well-structured prompt that includes dialect, headers,
  and optional grid context.

- formula-explain: table of ~30 common spreadsheet functions mapped to
  one-line plain-English glosses (SUM, AVG, IF, IFERROR, VLOOKUP,
  XLOOKUP, SUMIFS, INDEX, MATCH, ROUND, TEXT, TRIM, …). For pure
  arithmetic the task detects the operation tree instead. LLM fallback
  for composite explanations.

- formula-debug: an advice table for every standard error code
  (#DIV/0!, #N/A, #REF!, #VALUE!, #NAME?, #NUM!, #NULL!, #SPILL!),
  plus static detection of divide-by-cell risk. LLM fallback when the
  error is unusual.

  All three tasks honour the same rules-first / model-fallback contract
  as the Reflex-family tasks, and register themselves automatically
  when @dhamaka/sdk is imported (side-effect import of tasks/formula.js
  from src/index.js).
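
To make the rules-first fast path concrete: one illustrative pattern entry in
the spirit of the percent-discount rewrite listed above (the shape is an
assumption; the shipped patterns live in packages/sdk/src/tasks/formula.js):

  const percentDiscount = {
    name: "percent-discount",
    match: /\b(?:add|apply)\s+(\d+(?:\.\d+)?)\s*%\s+discount\b/i,
    rewrite(expr, m) {
      const factor = 1 - parseFloat(m[1]) / 100;
      return {
        output: `(${expr}) * ${factor}`, // "add 10% discount" -> (expr) * 0.9
        explanation: `Multiplied by ${factor} to apply a ${m[1]}% discount.`,
      };
    },
  };

  // "add 10% discount".match(percentDiscount.match) gives m, then
  // percentDiscount.rewrite("SUM(B2:B5)", m).output === "(SUM(B2:B5)) * 0.9"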

packages/sdk/src/index.js:

- New section layout: Reflex family, Transform family, shared infra.
- Exports Transform as a top-level symbol.
- Exports formula{Transform,Explain,Debug}Task for tests and direct use.
- Side-effect imports tasks/formula.js so just doing `import "dhamaka"`
  registers every built-in task — apps never have to chase per-family
  imports.

README.md:

- New banner chips: on-device / 0 ms / private / $0 / every browser /
  offline. Dropped the SmartField-specific chip because that's one
  family, not the whole product.
- New tagline: "the local AI capability layer for web apps".
- "What is this" rewritten around four capability families.
- New "the hero use case — formula editing in erp.ai" section that
  explains why ERP is the flagship integration (formulas contain the
  most sensitive data a company owns, Microsoft's Copilot-for-Excel
  is blocked in serious enterprises, every formula edit has to be
  free / instant / private to be viable).
- "Other use cases" reorganised by domain (ERP / forms / writing /
  internal tools) instead of a single flat list.
- Demos table adds a fourth row for the in-progress formula demo.
- Stack diagram rewritten around two capability family columns
  (Reflex, Transform) both funneling into the shared task registry /
  reflex service / engine backends.
- Task registry split into Reflex-family and Transform-family
  sub-tables, with the three formula tasks listed under Transform.
- API section split into 🪞 Reflex family / 🔧 Transform family with
  Transform.formula/.explain/.debug documented under Transform, plus
  an example of registering a custom Transform task.
- "What's real today" section updated to list every Transform bit
  that ships in this commit, with the formula demo + Transform tests
  flagged as in-flight for the next commit.

This commit is intentionally scoped to code + README. Follow-up commits
will land:
  1. Transform + formula task unit tests
  2. The erp.ai-style formula demo page in the playground
  3. An updated docs/GOALS.md reflecting the four-capability-families scope

All 75 existing JS tests still green (the existing suite covers the
Reflex family end-to-end; the Transform family rules layer is
exercised by hand via node -e smoke tests in this commit and will get
proper test coverage in the next commit).

Dhamaka was the wrong name. It means "explosion" in Hindi — the
opposite of what this product is. Locus is Latin for "the place", and
that's literally the thesis: the locus of intelligence in a web app is
the app itself, not a remote server. The data is already in the tab,
the schema is already in JS memory, the actions the user can take are
already expressed in code. Ship the model to the data, stop sending
the data to the model.

This commit lands that framing as a manifesto (new top section in both
README.md and docs/GOALS.md) and mechanically renames every Dhamaka
reference across the tree. Every shipped test still passes against
the renamed stack.

The rename:

- crates/dhamaka-runtime/ → crates/locus-runtime/
- Cargo.toml name: dhamaka-runtime → locus-runtime
- Rust ABI exports: dhamaka_* → locus_* (locus_version, locus_alloc,
  locus_free, locus_init, locus_destroy, locus_reset,
  locus_set_sampling, locus_feed_prompt, locus_next_token)
- Compiled artifact: dhamaka-runtime.wasm → locus-runtime.wasm
- WasmEngine updated to call the locus_* exports; default URL is now
  /runtime/locus-runtime.wasm; wasm-engine.test.js reads from the new
  path.
- npm workspace packages: @dhamaka/* → @locus/* (hub, runtime,
  extension, playground)
- Public SDK package: dhamaka → locus (both the name in package.json
  and the legacy `Dhamaka` class renamed to `Locus`)
- postMessage protocol: dhamaka:ping/get/list/delete/response/error/
  progress/ready/request-storage-access → locus:*
- IndexedDB names: dhamaka-hub → locus-hub, dhamaka-extension →
  locus-extension, dhamaka-fallback → locus-fallback
- Extension marker: window.__dhamaka_extension__ → window.__locus_extension__
- Environment variables: DHAMAKA_HUB_PORT / DHAMAKA_PLAYGROUND_PORT →
  LOCUS_HUB_PORT / LOCUS_PLAYGROUND_PORT
- Hypothetical hosting URL: hub.dhamaka.dev → hub.locus.dev
- All file header comments, all package descriptions, all keyword
  lists, all CHANGELOG entries, all READMEs, all docs

The manifesto:

- New ## ✦ the thesis section at the top of README.md (right after the
  banner) leading with "stop sending the data to the model; ship the
  model to the data" and framing the four capability families (Reflex,
  Transform, Search, Agent) as four shapes of one underlying
  operation: reason over the context the app already has, in the place
  the app already is.
- Mirror section at the top of docs/GOALS.md with the same thesis, a
  table of the four capability families, and a one-liner that matches
  the README. The "one thing to remember" section at the bottom of
  GOALS.md now spells the thesis out explicitly with a decision test:
  would this call still work if the user's laptop had no network
  connection and no AI-provider account? If yes, it belongs in Locus.
- docs/GOALS.md naming section rewritten — acknowledges the previous
  name was Dhamaka and explains why Locus is a better fit.
- CHANGELOG.md [Unreleased] section documents every rename mechanic
  and the Transform family + manifesto additions.

The banner:

- docs/banner.svg block letters redrawn for LOCUS (5 letters instead
  of 6, different spacing). aria-label / title / desc / tagline all
  updated to "the local AI capability layer for web apps". Static
  fallback ASCII in README updated to match.

Tests:

- 27 Rust cargo tests: all green against the renamed locus_* ABI
- 75 JS node --test tests: all green against the renamed @locus/*
  workspace packages and the Locus class
- Dev server smoke test: every endpoint including /runtime/
  locus-runtime.wasm, /sdk/transform.js, /sdk/tasks/formula.js
  returns 200
- SDK import smoke test: Locus class resolves, Transform class
  resolves, all three formula tasks auto-register on import

This is a big commit but every change is mechanical and covered by
the existing test suite. No behavior changes — just the name.

The previous commit renamed the project to Locus based on a misread of
"we can keep the same name no worries" as "keep Locus". The actual
intent (later confirmed by the dhamaka.dev domain purchase) was to
keep Dhamaka. Reverting the entire rename here — every file, directory,
Rust ABI export, postMessage type prefix, environment variable, and
URL is back to dhamaka/Dhamaka/DHAMAKA.

Mechanically the inverse of commit c04ca5a:

- crates/locus-runtime/ → crates/dhamaka-runtime/
- Rust ABI: locus_* → dhamaka_* (dhamaka_init, dhamaka_alloc, …)
- locus-runtime.wasm → dhamaka-runtime.wasm (rebuilt from the
  reverted Cargo.toml, 55 KB, same SHA as the pre-Locus version
  modulo compile-time entropy)
- @locus/* → @dhamaka/* (workspace package names + imports)
- `locus` → `dhamaka` (npm package name, legacy SDK class, keyword lists)
- postMessage protocol: locus:* → dhamaka:*
- IndexedDB names: locus-hub → dhamaka-hub, locus-extension →
  dhamaka-extension, locus-fallback → dhamaka-fallback
- Extension marker: window.__locus_extension__ → window.__dhamaka_extension__
- Environment variables: LOCUS_*_PORT → DHAMAKA_*_PORT
- hub.locus.dev → hub.dhamaka.dev (the real domain now, since
  protosphinx actually owns dhamaka.dev)
- All file header comments, README copy, CHANGELOG entries, and GOALS.md
  naming section

Semantic fixes the inverse sed couldn't do on its own:

- CHANGELOG.md [Unreleased] section: removed the nonsense "renamed
  from Dhamaka to Dhamaka" block that resulted from reverting the
  rename-description text. Left the other Unreleased bullets (Transform
  family, erp.ai hero case, manifesto thesis, four-family positioning)
  because those *aren't* reverted. Added a small Notes bullet recording
  the Locus round-trip so future-me doesn't re-litigate it.
- docs/GOALS.md Naming section: rewritten by hand (sed had left a
  ridiculous "Dhamaka is Latin for 'the place'" paragraph that was
  actually the Locus etymology). New version acknowledges Dhamaka
  means "explosion/blast" in Hindi and owns the name — the noise is
  the point: a quiet piece of code doing a loud thing to cloud-AI
  economics. The Locus round-trip is documented as a one-line aside.
- docs/banner.svg: block letters re-redrawn with the DHAMAKA shape
  (6 letters, different spacing). aria-label / title / desc / tagline
  already reverted via sed.
- docs/GOALS.md: deduplicated a doubled "When in doubt, optimize for
  that sentence" line from the "one thing to remember" section.

Preserved from commits 028e47c and c04ca5a (the bits that are
NOT name-related):

- The Transform family: Transform class, formula-transform /
  formula-explain / formula-debug tasks, the 10 structural rewrite
  patterns, the 30-function gloss table, the 8-entry error-code
  advice table. All three tasks auto-register on import via
  packages/sdk/src/index.js side-effect import of tasks/formula.js.
- The thesis / manifesto at the top of docs/GOALS.md and README.md:
  "stop sending the data to the model, ship the model to the data",
  the four capability families table, the decision test.
- The erp.ai hero use case section in README.md with the Transform
  example and the domain-specific justification for why local is the
  only viable integration shape for ERP formulas.

Tests:
- 27 Rust cargo tests green against the restored dhamaka_* ABI
- 75 JS node --test tests green against the restored @dhamaka
  workspace packages and the Dhamaka class
- Dev server smoke test: every endpoint including
  /runtime/dhamaka-runtime.wasm, /sdk/transform.js,
  /sdk/tasks/formula.js returns 200
- SDK import smoke test: Dhamaka class + Transform class + all three
  formula tasks auto-register correctly

No consumer-facing code or publish ever shipped under the Locus name —
it lived on main for exactly one commit before this revert lands.

Four things in one commit, all aimed at getting a working public demo
onto GitHub Pages at protosphinx.github.io/dhamaka/ (with dhamaka.dev
attachable later as a custom domain).

1. dhamaka.dev added to the ASCII art

Both the animated SVG banner (docs/banner.svg) and the README's static
fallback block now carry a "dhamaka.dev" subtitle under the block
letters, so the brand and the domain are one visual mark instead of
two separate strings. Brand + URL in one glance.

2. Formula editor demo (packages/playground/public/demos/formula.html)

An erp.ai-style fake spreadsheet that makes the Transform family
concretely visible:

- 5 × 5 grid with a Region/Q1/Q2/Total/Growth fake-revenue dataset
- Formula bar at the top showing the selected cell's formula
- "Ask AI" input below the grid taking natural-language instructions
- 9 suggestion chips for the common instructions the rules layer
  handles: discount, tax, round, null-safe, iferror, multiply, abs,
  negate, EUR conversion
- Cells with formulas are marked with a little italic "f" badge
- On apply: the selected cell's formula is rewritten via
  Transform.formula(), the cell flashes cell-flash-green, and a
  before/after panel shows the old formula, the new formula, the
  source (rule / fuzzy / model), confidence, and the human-readable
  explanation from the pattern-match layer.

Every transformation this demo performs resolves entirely in the
rules layer — no model call, no network hit. The 10 shipping
formula-transform patterns cover the common cases:

  "add 10% discount"    → (expr) * 0.9
  "apply 8% tax"        → (expr) * 1.08
  "round to 2 decimals" → ROUND(expr, 2)
  "handle empty cells"  → IFERROR(expr, 0)
  "multiply by 1.5"     → (expr) * 1.5
  "take absolute value" → ABS(expr)
  "negate it"           → -(expr)
  "convert to EUR"      → (expr) * EUR_RATE
  …etc.

This is the hero demo for the erp.ai case study in the README —
visitors can now feel what local-inference formula editing is like
without anyone running a server, without any AI API key, and without
any model bigger than 55 KB.

3. Site build script (packages/playground/build-site.mjs)

A zero-dependency Node script that assembles the full static demo
site into packages/playground/_site/ so GitHub Pages can serve it.

What it does:

- Wipes _site/ for a clean build
- Copies packages/playground/public/ → _site/ (the HTML, CSS,
  demos/ subdirectory, everything)
- Copies packages/sdk/src/ → _site/sdk/ (so importmap "dhamaka"
  resolves to ./sdk/index.js)
- Copies packages/runtime/src/ → _site/runtime/
- Copies packages/hub/public/runtime/dhamaka-runtime.wasm →
  _site/runtime/dhamaka-runtime.wasm (so WasmEngine's default URL
  /runtime/dhamaka-runtime.wasm resolves)
- Copies docs/banner.svg → _site/docs/banner.svg
- Writes .nojekyll so Pages doesn't try to process _underscore files
- Rewrites every HTML importmap to use relative paths. The dev
  server serves under a root path, but Pages serves under
  protosphinx.github.io/dhamaka/ — so absolute "/sdk/index.js"
  references are rewritten to "./sdk/…" at depth 0 or "../sdk/…"
  at depth 1 (inside demos/). Verified by actually running the
  script and curl-hitting every endpoint on a local python http
  server at port 8090 — all 12 endpoints (root, chat, 4 demos,
  sdk/index, sdk/transform, sdk/tasks/formula, runtime/index,
  runtime/wasm, build.json) return 200.
- Drops a build.json marker with timestamp + commit SHA + run id
  for traceability.
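
The depth handling boils down to something like this (an assumed helper, not
the verbatim build-site.mjs code):

  // Rewrite root-absolute /sdk/ and /runtime/ references to relative paths.
  // depth 0 = files at the site root, depth 1 = files inside demos/.
  function rewriteImportmap(html, depth) {
    const prefix = depth === 0 ? "./" : "../".repeat(depth);
    return html.replace(/(["'])\/(sdk|runtime)\//g, (_, quote, dir) => `${quote}${prefix}${dir}/`);
  }

  rewriteImportmap('"dhamaka": "/sdk/index.js"', 0); // "dhamaka": "./sdk/index.js"
  rewriteImportmap('"dhamaka": "/sdk/index.js"', 1); // "dhamaka": "../sdk/index.js"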

Output: 6 HTML files (index, chat, autofill, spellcheck, paste,
formula), the full SDK tree, the full runtime tree, and the 55 KB
compiled wasm. About 400 KB total.

4. GitHub Pages workflow (.github/workflows/pages.yml)

Triggered on push to main (path-filtered to packages/, crates/,
docs/, and the workflow file itself) and on manual dispatch.

Two jobs:
- build: installs rust + wasm32-unknown-unknown, runs
  crates/dhamaka-runtime/build.sh, installs Node 22, runs
  node packages/playground/build-site.mjs, uploads the resulting
  _site/ via actions/upload-pages-artifact@v3.
- deploy: depends on build, uses actions/deploy-pages@v4 with the
  github-pages environment so the deploy URL lands in the workflow
  output.

Uses the standard concurrency: pages group to serialise deploys
and not cancel in-progress ones.

To enable the first deploy, the repo owner needs to go to
Settings → Pages and set "Source" to "GitHub Actions" (one click,
one-time). After that every push to main that touches the relevant
paths auto-deploys.

.gitignore: ignore packages/playground/_site/ since it's a build
output and the Pages workflow rebuilds it from scratch anyway.

All 102 tests (27 Rust + 75 JS) still green. No regressions; this
commit only adds new files + ASCII art + one new card on the demo
index page.

This workflow builds and deploys a Jekyll site to GitHub Pages, with steps for checkout, setup, build, and deployment.

The previous run (a39031f) got build=green / deploy=red X in 4
seconds, which is the signature of actions/deploy-pages@v4 failing
its pre-flight call to the Pages API because the site hasn't been
fully provisioned yet. Setting the "Source" dropdown to GitHub
Actions in Settings → Pages is a necessary but not sufficient first
step — the actual Pages site record is only created after the first
successful deploy, which creates a chicken-and-egg problem for a
workflow that's trying to do that first deploy.

Fix: add an `actions/configure-pages@v5` step with `enablement: true`
at the top of the build job. That step calls the Pages API with an
explicit "create this site if it doesn't exist" flag, so the
subsequent deploy-pages step finds a provisioned site and succeeds.

This is a no-op on every subsequent run (the site already exists) so
leaving it in is harmless.
Two workflows snuck onto main via GitHub's Settings → Pages "Configure"
buttons:

- jekyll-gh-pages.yml — the Jekyll template from the left "Configure"
  card on Settings → Pages. This runs the Jekyll builder over the repo
  root, which has no Jekyll structure at all (no _config.yml, no Gemfile,
  no layouts/), and would deploy an empty/wrong site. It also fights
  packages/playground/build-site.mjs on the "pages" concurrency group,
  so whichever workflow loses the race blocks the correct one.

- .github/workflows/w — a "Run a one-line script" CI starter template
  with the filename still stuck at "w" (someone saved the template
  before finishing the filename). It's harmless but adds pointless
  runs on every push.

Deleting both. The real Pages workflow is pages.yml, which already
exists and (with the previous commit's configure-pages step) should
self-provision the Pages site on its first deploy.

This is a cleanup-only commit — no behaviour change for the actually-
correct workflow.

The spellcheck demo had a 17-entry rules layer (15 confusables + 4
context regexes) that I'd added as a "make the demo feel alive without
a real model" crutch. It worked for the 17 exact patterns and silently
failed on everything else. That contradicts the entire thesis of the
project — which is "let the on-device LLM do the work" — and produced
a demo that was worse for the user than just saying "not implemented
yet".

Pivot: strip all the rules from spellcheck, make the task model-only,
and wire a real cross-browser LLM runtime underneath so the slow path
actually has something to fall through to.

WHAT'S NEW
──────────

packages/runtime/src/transformers-backend.js (new)

- TransformersBackend implements the same Engine interface as every
  other backend (WindowAiBackend / WasmEngine / MockEngine), but wraps
  @huggingface/transformers v3 loaded lazily from esm.sh via a dynamic
  import. The import only fires the first time an engine is actually
  instantiated, so pages that don't need a model (e.g. the formula /
  autofill / paste demos that use rules-first tasks) pay zero bundle
  cost.
- Supports task: "text-generation" | "text2text-generation" |
  "fill-mask" | "feature-extraction" with sensible default model
  picks per family (SmolLM2-135M, LaMini-Flan-T5-248M,
  distilbert-base-uncased, all-MiniLM-L6-v2).
- Forwards a progress_callback so demo pages can render a progress
  bar during the first-visit model download. Subsequent visits are
  instant because Transformers.js caches in IndexedDB by default.
- Exposes an embed() method for the planned feature-extraction path
  (Search family, v0.3).
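
A rough sketch of the lazy-load shape, assuming the pipeline API of
@huggingface/transformers v3 (the real class in
packages/runtime/src/transformers-backend.js differs in detail):

  export class TransformersBackend {
    constructor({ task = "text-generation", model, cdn, onProgress } = {}) {
      this.task = task;
      this.model = model;
      this.cdn = cdn ?? "https://esm.sh/@huggingface/transformers@3";
      this.onProgress = onProgress;
      this.pipe = null;
    }

    async load() {
      if (this.pipe) return this.pipe;
      // Dynamic import: pages that never use an engine pay zero bundle cost.
      const { pipeline } = await import(this.cdn);
      this.pipe = await pipeline(this.task, this.model, {
        progress_callback: this.onProgress, // lets demo pages render a download bar
      });
      return this.pipe;
    }

    async complete(prompt, options = {}) {
      const pipe = await this.load();
      const out = await pipe(prompt, { max_new_tokens: 128, ...options });
      return out[0]?.generated_text ?? "";
    }
  }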

packages/runtime/src/factory.js

- New priority order: window.ai → transformers → wasm → mock.
- In browsers with window.ai: Gemini Nano wins (free, resident,
  GPU-accelerated, shared with the browser).
- In every other browser with WebAssembly + fetch: Transformers.js
  wins. Cross-browser real LLM, no API key, no server, no rate limit,
  all on-device.
- WasmEngine (our Rust runtime) is still wired in but explicitly
  documented as a v2 swap target, not primary. Architecture is done;
  Q4 quantization + SIMD128 + real SmolLM2 weights are the pieces
  that need to land before it becomes primary.
- MockEngine stays last — Node-only, for tests and SSR.

packages/runtime/src/index.js

- Exports TransformersBackend alongside the other backends so consumers
  can instantiate it directly if they want to skip the factory.

packages/sdk/src/tasks.js — spellcheck task rewritten

- DELETED the CONFUSABLES map (15 hardcoded misspellings).
- DELETED the CONTEXT_RULES array (4 hardcoded homophone regexes).
- fast() now unconditionally returns null. There is no rules layer.
  Every spellcheck call is a model call.
- slow() builds a "you are a careful proofreader, return JSON" prompt,
  calls engine.complete(), and parses a JSON array of
  {from, to, reason} objects via a robust extractor that tolerates
  model preamble / code fences / malformed entries.
- When no engine is available the task returns an empty suggestion
  list rather than inventing something. Silence beats fiction.
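
The tolerant extraction step amounts to something like this (a sketch, not
the shipped parser):

  // Pull a {from, to, reason}[] array out of raw model output that may include
  // preamble, code fences, or malformed entries.
  function extractSuggestions(raw) {
    if (typeof raw !== "string") return [];
    const stripped = raw.replace(/```(?:json)?/g, "");
    const start = stripped.indexOf("[");
    const end = stripped.lastIndexOf("]");
    if (start === -1 || end <= start) return [];
    let parsed;
    try {
      parsed = JSON.parse(stripped.slice(start, end + 1));
    } catch {
      return []; // malformed JSON -> empty suggestion list; silence beats fiction
    }
    if (!Array.isArray(parsed)) return [];
    return parsed.filter((s) => s && typeof s.from === "string" && typeof s.to === "string");
  }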

packages/sdk/src/reflex.js

- configure() documents the new options: backend: "transformers",
  model, task, cdn, onProgress. Callers can now set up Transformers.js
  from demo pages without touching the factory directly.

packages/sdk/test/tasks.test.js — spellcheck tests rewritten

- Removed the 4 semantic assertions that depended on the rules layer
  ("catches recieve → receive", "catches homophone in context",
  "catches teh → the", "clean input has zero suggestions").
- Added 6 contract tests: fast() always returns null, slow() skips
  empty input without calling the engine, slow() calls the engine and
  parses a JSON array, slow() extracts JSON embedded in preamble,
  slow() returns empty suggestions on malformed JSON, slow() drops
  entries without valid from/to strings. These test the contract, not
  semantic behavior that only a real model can deliver.
- Net: 75 → 77 JS tests, all green.

packages/playground/public/demos/spellcheck.html — demo rewritten

- Eagerly warms the engine on page load instead of lazily loading on
  first keystroke. Shows a status card with a progress bar and
  explicit "first visit: ~250 MB download, then offline forever"
  disclosure while Transformers.js downloads LaMini-Flan-T5-248M.
- Uses reflex.configure({ backend: "transformers", task:
  "text2text-generation", model: "Xenova/LaMini-Flan-T5-248M",
  onProgress: ... }) to route the whole task through the new backend.
- Textarea is disabled until the model is ready, then enables and
  prompts the user to type.
- Every debounced input event (600 ms) fires a SmartText call which
  hits spellcheckTask.slow() which hits engine.complete() which hits
  Transformers.js which hits the cached model. Real LLM, every time,
  no rules hiding anything.
- Copy updated: no more references to hardcoded patterns, honest
  about the first-visit cost, explicit about the formula demo still
  keeping rules (because those are deterministic and rules there are
  a performance feature, not a crutch).

All importmaps

- Added "@huggingface/transformers": "https://esm.sh/@huggingface/
  transformers@3" to the importmap in every demo page + the chat
  page. build-site.mjs's relative-path rewriter correctly leaves
  absolute https:// URLs alone (only rewrites /sdk/… and /runtime/…
  prefixes), verified by rebuilding _site/ and grepping the output.

docs/GOALS.md

- Expanded the Non-goals section to explicitly state: "Dhamaka is the
  product layer above the runtime. It is not the runtime itself."
- Called out that @huggingface/transformers is the runtime, window.ai
  is the runtime on Chrome, and the Rust crate is a v2 swap target
  that is explicitly NOT the critical path for shipping demos in 2026.
- Added a new bullet: "Not hardcoding task semantics". Spellcheck is
  model-only forever. Smart paste is model-first with regex fast-
  paths for obviously-structured fragments. Formula transformation
  keeps rules for the small set of deterministic rewrites because
  those have objectively-correct structural answers.

README.md

- Stack diagram rewritten: engine backends section now shows
  window.ai / Transformers.js / MockEngine as the three active paths,
  with the Rust crate marked as a v2 swap target.
- "The shape that matters" paragraph rewritten to be explicit that
  Dhamaka is the product layer above the runtime, and that trying to
  be both the product layer AND the runtime means fighting HuggingFace
  on a layer they'll always win.
- "The engine backends" section rewritten to show 4 implementations
  in priority order with honest tradeoffs (window.ai = free+fast+
  Chrome-only, Transformers.js = real LLM+cross-browser+first-visit
  download, WasmEngine = v2 target, MockEngine = tests only).
- "What's real today" rewritten: the Reflex spellcheck task is
  documented as model-only with NO rules, explicit about the thesis.
  The city-to-state and paste-extract tasks are documented as rules-
  first with model long-tail, with honest explanations of why rules
  are legitimate there.

TESTS
─────
- 27 Rust cargo tests green
- 77 JS node --test tests green (up from 75: +6 new spellcheck
  contract tests, -4 removed rule-based assertions, net +2)
- build-site.mjs assembles _site/ with the new importmap entries
  intact (absolute https://esm.sh URLs pass through the relative-
  path rewriter unchanged)

CAVEATS YOU SHOULD KNOW
───────────────────────
- First visit to the spellcheck demo on a browser without window.ai
  downloads ~250 MB of LaMini-Flan-T5-248M. This is unavoidable: the
  whole point of on-device AI is paying a one-time download cost so
  every subsequent call is free and private. The demo is explicit
  about this on the status card and the fineprint.
- I cannot end-to-end test this commit from the sandbox because there's
  no outbound network, so I can't download the model to run through.
  I've verified: the code compiles, every import resolves, every test
  passes, the importmap rewrite is correct, the SDK imports cleanly
  with TransformersBackend exported. The first real "does this
  download the model and produce corrections in a browser" check
  happens on the deployed Pages site once the workflow runs.
- The WasmEngine (our Rust runtime) is demoted to priority 3 in the
  factory. It still ships, still has all 27 tests, still compiles to
  the same 55 KB .wasm — but it's no longer the thing that drives
  the spellcheck demo. That role belongs to Transformers.js until the
  Rust crate has quantization + SIMD + real weights.

Previous commit (66d4176) wired the spellcheck demo to
Xenova/LaMini-Flan-T5-248M and prompted it with "you are a proofreader,
return a JSON array of corrections". In the deployed demo this was both
slow (~9.5s per call) and wrong (returned "looks clean" on obvious
gibberish like "sdasd asdasd asd"). Both failures trace to the same
mistake: LaMini-Flan-T5 is a generic 248M instruction-follower, not a
spellchecker, and at that parameter count it's below the quality
threshold to reliably follow a structured JSON prompt on free-form
text. Asking a too-small instruction model to do spellcheck via
prompting is architecturally wrong.

Fix: switch to the correct tool — a masked language model — and the
correct algorithm — per-word masking.

──────────────────────────────────────────
1. Model: Xenova/distilbert-base-uncased
──────────────────────────────────────────

distilBERT's masked-LM head was literally trained for "given a context,
predict the masked token". That's the spellchecker algorithm: mask a
word, ask the model what should go there, if the original isn't in
the top predictions then it's likely misspelled.

- Size: ~65 MB (vs ~250 MB for LaMini-Flan-T5-248M).
- Per-call latency: ~100–300 ms per masked word in WASM on a laptop
  (vs ~9500 ms per full text call for LaMini).
- Purpose-built: no prompt engineering, no JSON parsing, no hallucinated
  answers, no "the model said looks clean on gibberish" failure mode.

──────────────────────────────────────────
2. TransformersBackend: fillMask() + maskToken
──────────────────────────────────────────

- Added a public fillMask(input, topK) method that returns a structured
  Array<{token, score}> sorted by score desc. For multi-mask input it
  returns the first mask's predictions (single-mask is the spellcheck
  use case).
- Added a maskToken getter that surfaces the model's mask token string
  (e.g. "[MASK]" for BERT-family, "<mask>" for RoBERTa-family). Callers
  need this to construct valid masked input.
- load() now caches the mask token from the loaded tokenizer so later
  calls don't have to re-query it.
- complete() on a fill-mask task delegates to fillMask() and returns
  a JSON-stringified result so it still satisfies the Engine contract
  for callers that don't know to use the structured method.

──────────────────────────────────────────
3. spellcheckTask: per-word masking algorithm
──────────────────────────────────────────

- fast() still returns null (no rules — the whole thesis).
- slow() now:
  1. checks the engine exposes fillMask() (graceful fallback: return
     an empty result with a clear error string if not).
  2. tokenises the input with /\b[A-Za-z][A-Za-z']*\b/g.
  3. drops short words (<3 chars) and stoplist words ("the", "a",
     "is", "are", …) to avoid wasted model calls and trivial false
     positives.
  4. caps at MAX_WORDS_PER_CALL = 40 so huge inputs don't spam the
     model.
  5. for each surviving candidate word:
     a. builds a masked sentence with exactly that word replaced by
        the model's mask token.
     b. calls engine.fillMask(masked, top_k=20).
     c. if the original word (case-insensitively) is not in the top-K
        token strings (or the stripped WordPiece form), flags it.
     d. collects up to 3 alternative suggestions from the top-K,
        filtered to real whole words (stripping `##` subword prefixes,
        dropping non-letter tokens).
  6. returns { from, to, alternatives, index, reason } per suggestion.
- A single failing fillMask call (e.g. rare model error) is caught
  and logged; the run continues on the remaining words.
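
Condensed, the slow path looks roughly like this (STOPLIST shown as a stub;
names and ordering are approximate, not the shipped code):

  const WORD_RE = /\b[A-Za-z][A-Za-z']*\b/g;
  const MAX_WORDS_PER_CALL = 40;
  const STOPLIST = new Set(["the", "a", "an", "is", "are", "and", "to", "of"]); // stub

  async function spellcheckSlow(text, engine) {
    if (typeof engine?.fillMask !== "function") {
      return { suggestions: [], error: "engine does not support fill-mask" };
    }
    const suggestions = [];
    const matches = [...text.matchAll(WORD_RE)].slice(0, MAX_WORDS_PER_CALL);
    for (const m of matches) {
      const word = m[0];
      if (word.length < 3 || STOPLIST.has(word.toLowerCase())) continue;
      // Replace exactly this occurrence with the model's mask token.
      const masked = text.slice(0, m.index) + engine.maskToken + text.slice(m.index + word.length);
      let preds;
      try {
        preds = await engine.fillMask(masked, 20);
      } catch (err) {
        console.warn("fillMask failed for", word, err); // one failure does not kill the run
        continue;
      }
      const tokens = preds.map((p) => p.token.replace(/^##/, "").toLowerCase());
      if (tokens.includes(word.toLowerCase())) continue; // word fits its context, not flagged
      const alternatives = tokens.filter((t) => /^[a-z]+$/.test(t)).slice(0, 3);
      if (alternatives.length) {
        suggestions.push({ from: word, to: alternatives[0], alternatives, index: m.index, reason: "not in top predictions" });
      }
    }
    return { suggestions };
  }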

──────────────────────────────────────────
4. Demo page: distilBERT, new copy, new diagram
──────────────────────────────────────────

- reflex.configure({ backend: "transformers", task: "fill-mask",
  model: "Xenova/distilbert-base-uncased", onProgress: ... }).
- Copy updated: "~65 MB" instead of "~250 MB", "10–30 seconds"
  instead of "30–90 seconds", "distilBERT" instead of "LaMini-Flan-T5".
- The "what's happening under the hood" diagram now shows the
  per-word masking loop: for each word → build masked sentence →
  engine.fillMask → check top-K → flag + suggest.
- Debounce tightened from 600 ms to 400 ms since per-word masking
  is fast enough to feel more responsive.
- Ready-message in the status card explains the algorithm instead
  of promising generic "corrections come back in under a second".

──────────────────────────────────────────
5. Cache-busting in build-site.mjs
──────────────────────────────────────────

The previous commit hit a real problem on your first real test: the
new spellcheck.html rendered but it was paired with the PREVIOUS
commit's factory.js, which didn't know about backend: "transformers"
and fell through to WasmEngine with a /runtime/dhamaka-runtime.wasm
404. The cause is GitHub Pages serving static files with
Cache-Control: max-age=600, so every deploy has a 10-minute window
where the browser happily mixes new HTML with stale JS.

The fix is a cache-busting query string on every importmap URL:

  "dhamaka": "./sdk/index.js?v=abc1234"

Every deploy generates a new short SHA, every URL becomes distinct,
browsers can't cache across deploys. build-site.mjs now:

- Reads the current HEAD SHA from .git/HEAD (or GITHUB_SHA in CI),
  without shelling out to git. Handles loose refs and packed-refs.
- Appends ?v=<shortSha> to every ./sdk/... and ./runtime/... URL
  in every demo HTML's importmap during the subdirectory rewrite.
- Records both the full SHA and short SHA in build.json so the
  /build.json diagnostic I wrote about in the previous session now
  tells you exactly which commit is live.

──────────────────────────────────────────
6. Tests
──────────────────────────────────────────

Tasks test rewritten for the new contract:

- fast() always returns null
- slow() short-circuits empty input without calling the engine
- slow() refuses engines that don't expose fillMask()
- slow() flags words whose top-K predictions don't include them,
  doesn't flag words that ARE in their top-K
- slow() skips stoplist / short words without wasting mask calls
- slow() strips WordPiece ## prefixes from suggestions
- slow() tolerates a single mask call failure without killing the run

8 spellcheck tests total (up from 6 in the previous commit, net +2).
77 JS tests → 78 JS tests, all green. 27 Rust tests still green. 105
total tests.

──────────────────────────────────────────
Verification status
──────────────────────────────────────────

Local:
- node --check across every modified JS file: pass
- cargo test (27): pass
- npm test (78): pass
- node packages/playground/build-site.mjs: assembles _site/ with
  cache-busted importmap (?v=66d4176) and /build.json containing
  both full and short SHA

Pages deploy:
- Not yet verified. I still can't outbound to github.io from this
  sandbox. The user will verify in their browser once the Pages
  workflow runs the new commit. The cache-busting means the user
  will NOT need to hard-refresh this time — every importmap URL is
  a fresh resource.
…caveat

Follow-up to f5b110a. The distilBERT fill-mask algorithm is correct,
but on the user's first real test the demo output was dominated by
two-character junk suggestions like "xxx → da", "asdsd → cd",
"asdasd → xx". Three problems, fixed here:

1. SUGGESTION FILTER WAS TOO LAX
   MIN_SUGGESTION_LEN is now 3 (matches MIN_WORD_LEN for worth-checking
   words), and the filter additionally requires ≥1 vowel (a/e/i/o/u/y).
   This rejects WordPiece fragments that happen to be valid letter
   sequences but are not real English words: "xx", "cd", "sd", "xxx",
   "ght", etc. These are in distilBERT's vocabulary because they appear
   as subword pieces in longer words (sundae, CDs, Canada, rights) but
   they're not plausible whole-word corrections.

2. DEMO TRY-LIST MADE IT EASY TO HIT THE PATHOLOGICAL CASE
   The only inputs the previous demo copy suggested were the old rule-
   era examples ("I'll see you their tomorrow"), and the placeholder was
   "start typing…". So users instinctively typed gibberish ("sdasd asdasd")
   to test, and masked-LM spellcheck on pure gibberish has no meaningful
   context to predict from — the suggestions for it are also gibberish.
   That's not a bug, it's a property of the algorithm, but it looks
   broken in a demo.

   Fixed: added three "Try:" chips to the demo page with real sentences
   that demonstrate the algorithm working on realistic input:
     - "I recieve the package tommorow and it will seperate our stuff"
     - "The goverment has definately been occuring alot this year"
     - "She went untill the store to meet her freind yestarday"
   Clicking a chip populates the textarea and fires the check. Plus an
   explicit caveat below: "Masked-LM spellcheck works best on real
   prose with real misspellings. Pure gibberish gets flagged correctly,
   but the suggestions will be nonsense too — that's a property of the
   algorithm, not a bug."

3. NO-ALTERNATIVE FLAGS WERE BEING HIDDEN
   The previous code did `if (!alts.length) continue;` which meant a
   flagged word with no plausible alternatives (i.e. the top-K is all
   junk) was dropped from the suggestion list entirely. That made the
   task look like it was underreporting. The fix: still flag the word
   with `to: null` and `alternatives: []`, so the chip UI can render
   it as "word → ?" — visually communicates "I caught this but have
   nothing useful to suggest here" instead of silently dropping it.

TransformersBackend + spellcheckTask:
- New MIN_SUGGESTION_LEN constant = 3.
- New isPlausibleWord(token) helper that enforces length + letters-only
  + ≥1 vowel.
- slow() no longer drops flagged words with empty alternatives — it
  emits them with `to: null`.
- Reason string splits into "not in top predictions" (has alts) and
  "not in top predictions, and none of the predictions are plausible
  words" (no alts) so debugging is clearer.

Demo page (spellcheck.html):
- New "Try:" section with 3 clickable example chips.
- Wired the chips: clicking populates .value and dispatches an input
  event so SmartText kicks off the check immediately.
- New .try-chip CSS (pill-shaped, hover highlight in accent color).
- New .suggest.no-alts CSS (the "?" is rendered in italic muted grey).
- Suggestion renderer handles `to === null`: renders "?" instead of
  the string, adds the .no-alts class, skips the click-to-apply handler
  since there's nothing to apply.

Tests (tasks.test.js):
- +3 new tests:
  * rejects 2-char suggestions (xx, cd, da, sd)
  * rejects consonant-only tokens (xxx, ght) via the vowel filter
  * still-flag behaviour: when all top-K are junk, the word is
    flagged with to: null, alternatives: [], and an explanatory reason
- 78 → 81 JS tests, all green. 27 Rust tests still green. 108 total.

Caveat: this doesn't turn the demo into Grammarly. Masked-LM spellcheck
on distilBERT will still make mistakes on homophones with weak context,
and will still produce thin suggestions for uncommon misspellings.
Those are inherent limitations of a 65 MB masked LM running in a
browser tab. The fix path for those cases is a bigger model (BERT-base
at ~400 MB) or window.ai's Gemini Nano on Chrome. But within those
limits, the demo now correctly shows realistic misspellings getting
caught with real-word suggestions, not gibberish-for-gibberish noise.

The cache-busting I added in f5b110a only rewrote importmap ENTRIES
(the bare specifiers "dhamaka" and "@dhamaka/runtime"). That made the
browser fetch a fresh sdk/index.js?v=SHA on each deploy, but every
RELATIVE import inside that module (e.g. `import "./tasks.js"`) still
resolved to an unversioned URL. The browser happily served those
internal modules from cache across deploys, producing "new index.js,
old tasks.js" — the exact mechanism that was still poisoning the
spellcheck demo with old filter code on commit 2a0e704.

Proof: in 2a0e704 the suggestion filter was tightened to reject
<3-char tokens and consonant-only tokens. But the user's test on the
deployed site still showed `okok → h` and `hhhh → hh` — 1- and 2-char
suggestions that the new filter should have rejected. The only
explanation: tasks.js was still running pre-2a0e704 code because the
browser had cached it.

Fix: build-site.mjs now does a second pass after copying _site/sdk/
and _site/runtime/. It walks every .js file in those trees and
regex-rewrites every relative import (./ or ../, ending in .js,
no existing query string) to include ?v=<shortSha>:

  from "./tasks.js"           → from "./tasks.js?v=2a0e704"
  from "./data/cities.js"     → from "./data/cities.js?v=2a0e704"
  from "../runtime/index.js"  → from "../runtime/index.js?v=2a0e704"

Bare specifiers like "dhamaka" and "@dhamaka/runtime" are NOT rewritten
by this pass (the importmap already cache-busted those in step 8a).

The rewrite regex:
  /(["'])(\.\.?\/[^"'?\s]+?\.js)(["'])/g

Matches any quoted string literal that starts with ./ or ../, ends in
.js, and has no existing ? query — covering import statements, export
statements, side-effect imports, and dynamic import() calls. Avoids
false positives on bare specifiers, https URLs, or already-busted paths.
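
Applied, the second pass is essentially one replace per file (a sketch;
shortSha is whatever build-site.mjs derived from GITHUB_SHA or .git/HEAD):

  const RELATIVE_JS_IMPORT = /(["'])(\.\.?\/[^"'?\s]+?\.js)(["'])/g;

  function bustRelativeImports(source, shortSha) {
    return source.replace(
      RELATIVE_JS_IMPORT,
      (_, open, path, close) => `${open}${path}?v=${shortSha}${close}`
    );
  }

  bustRelativeImports('import "./tasks.js";', "2a0e704");
  // -> 'import "./tasks.js?v=2a0e704";'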

Local verification (node packages/playground/build-site.mjs):

  rewrote importmaps in 6 html files (cache-bust v=2a0e704)
  cache-bust: rewrote relative imports in 16 / 21 js files

Sample of the rewritten files:

  _site/sdk/index.js:
    import { HubClient } from "./hub-client.js?v=2a0e704";
    export { SmartField } from "./smart-field.js?v=2a0e704";
    ...

  _site/runtime/index.js:
    export { TransformersBackend } from "./transformers-backend.js?v=2a0e704";
    ...

  _site/sdk/tasks.js:
    import { findCity, findCityFuzzy } from "./data/cities.js?v=2a0e704";

Every module path now carries the deploy's short SHA as a query
string, so the browser treats each deploy's module graph as a
completely separate set of resources. Cache can never serve the old
spellcheck task against new HTML again.

The 5 files without imports (engine.js, data/cities.js, leaf
tokenizer files) were correctly left alone.

All 27 Rust tests + 81 JS tests still green. This commit only touches
the build script, not any runtime or product code.

This is the SECOND cache-busting fix in two commits, which is
embarrassing — I should have caught the relative-import propagation
issue the first time. Reality: the importmap-only cache-bust is a
known-incomplete pattern and I underestimated how much ES-module
relative-import behavior breaks the naive approach. Done properly now.

… README

- 18 Playwright browser tests covering all three demos (autofill, spellcheck, paste)
- Benchmark suite: task pipeline (10k iterations), WASM runtime (cold start + tok/s),
  browser end-to-end latency via Playwright
- GitHub Pages workflow + build script to deploy the playground as a static site
- README updated with real measured numbers: 0.2 ms autofill, 0.54 ms WASM cold start,
  55 KB binary, 120 total tests. Use cases split into shipping vs planned.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ing detection

The model-only spellcheck required a 65MB distilBERT download before
anything worked. Now common misspellings (120+ confusables) and
homophones are caught instantly by rules (<1ms), with the model
providing long-tail coverage when loaded. The demo textarea is
enabled immediately instead of waiting for model download.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Every demo page had import maps with root-absolute paths (/sdk/index.js,
/runtime/index.js) which resolve correctly on localhost:5173 but 404 on
GitHub Pages where the site lives at /dhamaka/. Changed all import maps
to relative paths (../sdk/index.js for demos/, ./sdk/index.js for root).

This was the reason ALL demos appeared as empty shells on the deployed
site — zero JavaScript loaded.

Also adds e2e tests for the formula editor demo (5 tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… distance filter

The masked-LM was producing garbage suggestions like "how → ckey" and
"why → doing" when input context was noisy. Three protections added:

1. Context quality gate: skip model entirely when <40% of words are
   recognized English (gibberish input can't provide useful context)
2. KNOWN_WORDS set (~300+ words): common English words the model should
   never flag, regardless of what the masked-LM predicts
3. Edit distance filter: model suggestions must be within Levenshtein
   distance 3 of the original word to prevent context-based false
   positives (e.g., "table" → "chair")

Also expanded STOPLIST from ~50 to ~200+ entries covering question
words, common verbs, adjectives, and nouns. Updated all 6 affected
unit tests to use realistic English inputs that pass the quality gate.
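
A rough sketch of guards 1 and 3 above (the KNOWN_WORDS subset here is only
illustrative; the real set has 300+ entries):

  const KNOWN_WORDS = new Set(["the", "how", "why", "table", "package", "receive"]); // tiny subset

  // Guard 1: skip the model when the text is mostly unrecognized (gibberish has no context).
  function contextQuality(words) {
    if (!words.length) return 0;
    const known = words.filter((w) => KNOWN_WORDS.has(w.toLowerCase())).length;
    return known / words.length; // the model is skipped below 0.4
  }

  // Guard 3: a suggestion must stay close to what the user actually typed.
  function editDistance(a, b) { // classic Levenshtein DP
    const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
    for (let j = 0; j <= b.length; j++) dp[0][j] = j;
    for (let i = 1; i <= a.length; i++) {
      for (let j = 1; j <= b.length; j++) {
        dp[i][j] = Math.min(
          dp[i - 1][j] + 1,
          dp[i][j - 1] + 1,
          dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)
        );
      }
    }
    return dp[a.length][b.length];
  }

  // editDistance("recieve", "receive") === 2  -> keep (within 3)
  // editDistance("table", "chair") === 5      -> drop (context-based false positive)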

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes:

1. SmartForm clears target fields when there's no match. Previously,
   intermediate keystroke matches (e.g., "new" fuzzy-matching "nyc"
   while typing "newport") would stick forever because SmartForm only
   set values, never cleared them.

2. Fuzzy matcher caps edit distance at 1 for short queries (< 5 chars).
   "new" was matching "nyc" at distance 2, which is 67% of the input
   wrong — not a typo. Longer queries like "San Francsico" still get
   distance-2 matching.

3. Added Newport (RI) and Providence (RI) to the cities gazetteer.
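
The cap from fix 2 is a one-liner in spirit (assumed helper name):

  // Short queries only tolerate a single edit; longer ones keep distance-2 matching.
  function maxEditDistance(query) {
    return query.length < 5 ? 1 : 2;
  }

  maxEditDistance("new");           // 1  ("new" vs "nyc" is distance 2 -> rejected)
  maxEditDistance("San Francsico"); // 2  (still matches "San Francisco")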

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Removed readonly from state/country/timezone/currency inputs so users
  can manually correct autofill results. SmartForm already respects
  manual edits (locks the field from further auto-fill).

- Added Arlington (TX), Columbus, Cleveland, Cincinnati, Indianapolis,
  Kansas City, St. Louis, Richmond, Virginia Beach, Madison, Milwaukee,
  Omaha, Louisville, Oklahoma City to the gazetteer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The gazetteer was a 100-city static list — useless for the long tail.
Now the autofill demo loads SmolLM2-135M-Instruct (via @huggingface/
transformers) in the background. Common cities still resolve instantly
from the gazetteer; everything else is answered by the on-device LLM
with no server call.

Changes:
- Rewrote cityToStateTask.slow() with a few-shot prompt that works
  well on small models (pattern continuation > JSON generation)
- Autofill demo configures reflex with text-generation backend,
  loads model in background, shows download progress, and re-runs
  the current query when model finishes loading
- Added 3 unit tests for the new slow() path (parsing, empty input,
  missing engine)
- Updated demo copy to reflect the LLM-powered architecture
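
The few-shot shape from the first bullet is roughly this (an assumed prompt,
not the exact shipped one):

  // Pattern continuation works better on a 135M model than asking it for JSON.
  function cityPrompt(city) {
    return [
      "City: Austin -> State: Texas, Country: United States",
      "City: Pune -> State: Maharashtra, Country: India",
      "City: Toronto -> State: Ontario, Country: Canada",
      `City: ${city} ->`,
    ].join("\n");
  }
  // The model continues the last line; slow() parses the single completed line.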

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… city

Geographic data is deterministic — an LLM adds latency and unreliability
for what's fundamentally a table lookup. Rewrote cities.js with a compact
builder format covering:

- US: all 50 state capitals + all cities > 100k population (~300 cities)
- India: 70 cities including Kanpur, Lucknow, Jaipur, Ahmedabad, etc.
- China: 22 cities, Japan: 10, South Korea: 5
- Europe: 100+ cities across 20 countries
- Latin America: 50+ cities across 15 countries
- Africa/Middle East: 40+ cities across 20 countries
- Canada: 27, UK: 27, Australia: 10, New Zealand: 5

Added 200ms debounce to the autofill SmartField so the LLM fallback
(for truly obscure cities) doesn't fire on every keystroke.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The on-device SmolLM2 model added 2.5s latency and returned wrong data
(e.g. Kanpur → "Punjab, United States"). The 721-city gazetteer resolves
instantly with correct results. Stripped reflex.configure/ensure, model
progress UI, and @huggingface/transformers import.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two new tasks following the rules-first pattern:

  us-sales-tax   — 50-state rate table, 5 product categories (grocery,
                   clothing, digital, medicine, general), per-state
                   exemptions and reduced rates (AR, IL, TN, UT, VA, NC,
                   MO). Sales tax uses seller-state rates; use tax flips to
                   buyer-state. LLM slow path for nexus edge cases.

  us-federal-tax — 2024 IRS marginal bracket tables for single, married
                   filing jointly, and head of household. Standard deduction
                   applied before bracket walk. Returns taxOwed,
                   effectiveRate, marginalRate, per-bracket breakdown.
                   LLM slow path for credits / itemized deductions.

New files:
  packages/sdk/src/tasks/us-tax.js          tasks + static data tables
  packages/sdk/test/us-tax.test.js          37 tests (all passing)
  packages/playground/public/demos/us-tax.html  interactive demo

Modified:
  packages/sdk/src/index.js                 auto-import + export new tasks
  packages/playground/public/index.html     add demo card

Demo features: dynamic line-items invoice, real-time per-item exempt/tax
badges, sales/use tax toggle, full breakdown panel, 2024 bracket table
with active bracket highlighted. Input focus preserved during typing
(display cells updated in-place; input rows rebuilt only on add/remove).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>