Add US Tax Calculator: sales tax + federal income tax tasks #2
Open
sagarm85 wants to merge 22 commits into protosphinx:main from
Conversation
…ex SDK
The original pivot positioned Dhamaka as "a reflex layer for every input
field on the web". That framing was too narrow. Dhamaka is a local AI
capability layer for web apps, and SmartField is just one family of
capabilities inside it. The flagship integration is the formula editor
in erp.ai — which is a completely different call shape from SmartField:
imperative, one-shot, instruction-driven, on a cell formula instead of
an oninput event.
This commit lands the Transform family (the second of four planned
capability families) and reframes the README around four families:
🪞 Reflex — reactive, keystroke-level, rules-first
(SmartField, SmartForm, SmartText, attachSmartPaste)
🔧 Transform — imperative, one-shot, instruction-driven ← new
(Transform, Transform.formula/.explain/.debug)
🔎 Search — semantic search over in-memory data (planned)
🤖 Agent — multi-step tool use with local model (planned)
Transform (packages/sdk/src/transform.js):
- Thin class that accepts { task, input, instruction, context } and
routes through the existing task registry. Falls back to a generic
"instruction over input" prompt when no task is specified.
- Convenience methods: t.formula(input, instr, ctx) / t.explain(input,
ctx) / t.debug(input, ctx) — three lines of app code to integrate
erp.ai-style formula editing.
- Normalises TaskResult shape into a TransformResult with output /
source / confidence / fields / explanation so the caller doesn't
have to unwrap fields.output.
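The call shape described above can be sketched as follows. This is a hypothetical, synchronous illustration, not the shipped class: `runTask` stands in for the real (async) task-registry dispatch, and the fallback task name `"generic-instruction"` is an assumption.

```javascript
// Hypothetical sketch of the Transform call shape. `runTask` stands in
// for the real task-registry dispatch (the shipped class is async).
class Transform {
  constructor(runTask) {
    this.runTask = runTask; // (taskName, payload) -> TaskResult-like object
  }
  run({ task, input, instruction, context }) {
    // Fall back to a generic "instruction over input" task when no
    // task name is given.
    const raw = this.runTask(task ?? "generic-instruction", { input, instruction, context });
    // Normalise the TaskResult into a flat TransformResult so callers
    // never have to unwrap raw.fields.output themselves.
    return {
      output: raw.fields?.output ?? raw.output ?? null,
      source: raw.source ?? "model",
      confidence: raw.confidence ?? 0,
      fields: raw.fields ?? {},
      explanation: raw.explanation ?? "",
    };
  }
  // Convenience wrappers: erp.ai-style integration in three lines of app code.
  formula(input, instruction, context) {
    return this.run({ task: "formula-transform", input, instruction, context });
  }
  explain(input, context) {
    return this.run({ task: "formula-explain", input, context });
  }
  debug(input, context) {
    return this.run({ task: "formula-debug", input, context });
  }
}
```

The point of the sketch is the normalisation step: whatever shape the task returns, the caller always sees the same flat `{ output, source, confidence, fields, explanation }` record.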
Formula tasks (packages/sdk/src/tasks/formula.js):
- formula-transform: 10 structural rewrite patterns ship at launch,
each matching a common ERP formula edit and producing correct output
with zero model calls. The patterns:
percent-discount "add 10% discount" → (expr) * 0.9
percent-tax "add 8% tax" → (expr) * 1.08
round "round to N decimals" → ROUND(expr, N)
multiply-by "multiply by 1.5" → (expr) * 1.5
divide-by "divide by 100" → (expr) / 100
iferror "wrap in iferror" → IFERROR(expr, 0)
null-safe "handle empty cells" → IFERROR(expr, 0)
currency-convert "convert to EUR" → (expr) * EUR_RATE
negate "negate it" → -(expr)
abs "take absolute value" → ABS(expr)
When none of the patterns match, the task escalates to the LLM slow
path with a well-structured prompt that includes dialect, headers,
and optional grid context.
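The rules-first fast path can be sketched like this. The pattern table below is illustrative (a subset of five, with made-up regexes), not the shipped ten; returning `null` is the signal to escalate to the LLM slow path:

```javascript
// Illustrative sketch of the rules-first fast path: each entry pairs an
// instruction regex with a structural rewrite of the formula expression.
// The shipped set has ten patterns; these five are stand-ins.
const PATTERNS = [
  { re: /add (\d+(?:\.\d+)?)% discount/i, rewrite: (expr, m) => `(${expr}) * ${1 - m[1] / 100}` },
  { re: /(?:add|apply) (\d+(?:\.\d+)?)% tax/i, rewrite: (expr, m) => `(${expr}) * ${1 + m[1] / 100}` },
  { re: /round to (\d+) decimals?/i, rewrite: (expr, m) => `ROUND(${expr}, ${m[1]})` },
  { re: /wrap in iferror|handle empty cells/i, rewrite: (expr) => `IFERROR(${expr}, 0)` },
  { re: /negate/i, rewrite: (expr) => `-(${expr})` },
];

// Returns the rewritten formula with zero model calls, or null to
// signal escalation to the LLM slow path.
function fastTransform(expr, instruction) {
  for (const { re, rewrite } of PATTERNS) {
    const m = instruction.match(re);
    if (m) return rewrite(expr, m);
  }
  return null;
}
```

Because each pattern has an objectively correct structural answer, the fast path can claim full confidence when it fires; everything else falls through.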
- formula-explain: table of ~30 common spreadsheet functions mapped to
one-line plain-English glosses (SUM, AVG, IF, IFERROR, VLOOKUP,
XLOOKUP, SUMIFS, INDEX, MATCH, ROUND, TEXT, TRIM, …). For pure
arithmetic the task detects the operation tree instead. LLM fallback
for composite explanations.
- formula-debug: an advice table for every standard error code
(#DIV/0!, #N/A, #REF!, #VALUE!, #NAME?, #NUM!, #NULL!, #SPILL!),
plus static detection of divide-by-cell risk. LLM fallback when the
error is unusual.
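The advice-table shape for formula-debug can be sketched as below. The names (`ERROR_ADVICE`, `debugAdvice`) and the advice strings are illustrative paraphrases, not the shipped table:

```javascript
// Sketch of the error-code advice table: a static lookup consulted
// before any model call. Advice strings are illustrative paraphrases.
const ERROR_ADVICE = {
  "#DIV/0!": "A divisor evaluates to zero or an empty cell; guard with IF or wrap in IFERROR.",
  "#N/A": "A lookup found no match; check the lookup value, consider IFNA/IFERROR.",
  "#REF!": "The formula points at a deleted cell or range; restore or re-point the reference.",
  "#VALUE!": "An operand has the wrong type, e.g. text where a number is expected.",
  "#NAME?": "An unrecognised function or range name; check spelling and defined names.",
  "#NUM!": "A numeric calculation overflowed or received an invalid argument.",
  "#NULL!": "Two ranges were intersected with a space but do not overlap.",
  "#SPILL!": "A dynamic array cannot expand because cells in its spill range are occupied.",
};

function debugAdvice(errorCode) {
  // Unknown / unusual codes escalate to the LLM slow path (null here).
  return ERROR_ADVICE[errorCode] ?? null;
}
```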
All three tasks honour the same rules-first / model-fallback contract
as the Reflex-family tasks, and register themselves automatically
when @dhamaka/sdk is imported (side-effect import of tasks/formula.js
from src/index.js).
packages/sdk/src/index.js:
- New section layout: Reflex family, Transform family, shared infra.
- Exports Transform as a top-level symbol.
- Exports formula{Transform,Explain,Debug}Task for tests and direct use.
- Side-effect imports tasks/formula.js so just doing `import "dhamaka"`
registers every built-in task — apps never have to chase per-family
imports.
README.md:
- New banner chips: on-device / 0 ms / private / $0 / every browser /
offline. Dropped the SmartField-specific chip because that's one
family, not the whole product.
- New tagline: "the local AI capability layer for web apps".
- "What is this" rewritten around four capability families.
- New "the hero use case — formula editing in erp.ai" section that
explains why ERP is the flagship integration (formulas contain the
most sensitive data a company owns, Microsoft's Copilot-for-Excel
is blocked in serious enterprises, every formula edit has to be
free / instant / private to be viable).
- "Other use cases" reorganised by domain (ERP / forms / writing /
internal tools) instead of a single flat list.
- Demos table adds a fourth row for the in-progress formula demo.
- Stack diagram rewritten around two capability family columns
(Reflex, Transform) both funneling into the shared task registry /
reflex service / engine backends.
- Task registry split into Reflex-family and Transform-family
sub-tables, with the three formula tasks listed under Transform.
- API section split into 🪞 Reflex family / 🔧 Transform family with
Transform.formula/.explain/.debug documented under Transform, plus
an example of registering a custom Transform task.
- "What's real today" section updated to list every Transform bit
that ships in this commit, with the formula demo + Transform tests
flagged as in-flight for the next commit.
This commit is intentionally scoped to code + README. Follow-up commits
will land:
1. Transform + formula task unit tests
2. The erp.ai-style formula demo page in the playground
3. An updated docs/GOALS.md reflecting the four-capability-families scope
All 75 existing JS tests still green (the existing suite covers the
Reflex family end-to-end; the Transform family rules layer is
exercised by hand via node -e smoke tests in this commit and will get
proper test coverage in the next commit).
Dhamaka was the wrong name. It means "explosion" in Hindi — the opposite
of what this product is. Locus is Latin for "the place", and that's
literally the thesis: the locus of intelligence in a web app is the app
itself, not a remote server. The data is already in the tab, the schema
is already in JS memory, the actions the user can take are already
expressed in code. Ship the model to the data; stop sending the data to
the model.
This commit lands that framing as a manifesto (new top section in both
README.md and docs/GOALS.md) and mechanically renames every Dhamaka
reference across the tree. Every shipped test still passes against the
renamed stack.
The rename:
- crates/dhamaka-runtime/ → crates/locus-runtime/
- Cargo.toml name: dhamaka-runtime → locus-runtime
- Rust ABI exports: dhamaka_* → locus_* (locus_version, locus_alloc,
  locus_free, locus_init, locus_destroy, locus_reset,
  locus_set_sampling, locus_feed_prompt, locus_next_token)
- Compiled artifact: dhamaka-runtime.wasm → locus-runtime.wasm
- WasmEngine updated to call the locus_* exports; default URL is now
  /runtime/locus-runtime.wasm; wasm-engine.test.js reads from the new
  path.
- npm workspace packages: @dhamaka/* → @locus/* (hub, runtime,
  extension, playground)
- Public SDK package: dhamaka → locus (both the name in package.json
  and the legacy `Dhamaka` class renamed to `Locus`)
- postMessage protocol: dhamaka:ping/get/list/delete/response/error/
  progress/ready/request-storage-access → locus:*
- IndexedDB names: dhamaka-hub → locus-hub, dhamaka-extension →
  locus-extension, dhamaka-fallback → locus-fallback
- Extension marker: window.__dhamaka_extension__ →
  window.__locus_extension__
- Environment variables: DHAMAKA_HUB_PORT / DHAMAKA_PLAYGROUND_PORT →
  LOCUS_HUB_PORT / LOCUS_PLAYGROUND_PORT
- Hypothetical hosting URL: hub.dhamaka.dev → hub.locus.dev
- All file header comments, all package descriptions, all keyword
  lists, all CHANGELOG entries, all READMEs, all docs
The manifesto:
- New "## ✦ the thesis" section at the top of README.md (right after
  the banner) leading with "stop sending the data to the model; ship
  the model to the data" and framing the four capability families
  (Reflex, Transform, Search, Agent) as four shapes of one underlying
  operation: reason over the context the app already has, in the place
  the app already is.
- Mirror section at the top of docs/GOALS.md with the same thesis, a
  table of the four capability families, and a one-liner that matches
  the README. The "one thing to remember" section at the bottom of
  GOALS.md now spells the thesis out explicitly with a decision test:
  would this call still work if the user's laptop had no network
  connection and no AI-provider account? If yes, it belongs in Locus.
- docs/GOALS.md naming section rewritten — acknowledges the previous
  name was Dhamaka and explains why Locus is a better fit.
- CHANGELOG.md [Unreleased] section documents every rename mechanic
  and the Transform family + manifesto additions.
The banner:
- docs/banner.svg block letters redrawn for LOCUS (5 letters instead
  of 6, different spacing). aria-label / title / desc / tagline all
  updated to "the local AI capability layer for web apps". Static
  fallback ASCII in README updated to match.
Tests:
- 27 Rust cargo tests: all green against the renamed locus_* ABI
- 75 JS node --test tests: all green against the renamed @locus
  workspace packages and the Locus class
- Dev server smoke test: every endpoint including
  /runtime/locus-runtime.wasm, /sdk/transform.js, /sdk/tasks/formula.js
  returns 200
- SDK import smoke test: Locus class resolves, Transform class
  resolves, all three formula tasks auto-register on import
This is a big commit, but every change is mechanical and covered by the
existing test suite. No behavior changes — just the name.
The previous commit renamed the project to Locus based on a misread of
"we can keep the same name no worries" as "keep Locus". The actual
intent (later confirmed by the dhamaka.dev domain purchase) was to keep
Dhamaka. Reverting the entire rename here — every file, directory, Rust
ABI export, postMessage type prefix, environment variable, and URL is
back to dhamaka/Dhamaka/DHAMAKA.
Mechanically the inverse of commit c04ca5a:
- crates/locus-runtime/ → crates/dhamaka-runtime/
- Rust ABI: locus_* → dhamaka_* (dhamaka_init, dhamaka_alloc, …)
- locus-runtime.wasm → dhamaka-runtime.wasm (rebuilt from the reverted
  Cargo.toml, 55 KB, same SHA as the pre-Locus version modulo
  compile-time entropy)
- @locus/* → @dhamaka/* (workspace package names + imports)
- `locus` → `dhamaka` (npm package name, legacy SDK class, keyword
  lists)
- postMessage protocol: locus:* → dhamaka:*
- IndexedDB names: locus-hub → dhamaka-hub, locus-extension →
  dhamaka-extension, locus-fallback → dhamaka-fallback
- Extension marker: window.__locus_extension__ →
  window.__dhamaka_extension__
- Environment variables: LOCUS_*_PORT → DHAMAKA_*_PORT
- hub.locus.dev → hub.dhamaka.dev (the real domain now, since
  protosphinx actually owns dhamaka.dev)
- All file header comments, README copy, CHANGELOG entries, and
  GOALS.md naming section
Semantic fixes the inverse sed couldn't do on its own:
- CHANGELOG.md [Unreleased] section: removed the nonsense "renamed
  from Dhamaka to Dhamaka" block that resulted from reverting the
  rename-description text. Left the other Unreleased bullets (Transform
  family, erp.ai hero case, manifesto thesis, four-family positioning)
  because those *aren't* reverted. Added a small Notes bullet recording
  the Locus round-trip so future-me doesn't re-litigate it.
- docs/GOALS.md Naming section: rewritten by hand (sed had left a
  ridiculous "Dhamaka is Latin for 'the place'" paragraph that was
  actually the Locus etymology). The new version acknowledges Dhamaka
  means "explosion/blast" in Hindi and owns the name — the noise is the
  point: a quiet piece of code doing a loud thing to cloud-AI
  economics. The Locus round-trip is documented as a one-line aside.
- docs/banner.svg: block letters re-redrawn with the DHAMAKA shape
  (6 letters, different spacing). aria-label / title / desc / tagline
  already reverted via sed.
- docs/GOALS.md: deduplicated a doubled "When in doubt, optimize for
  that sentence" line from the "one thing to remember" section.
Preserved from commits 028e47c and c04ca5a (the bits that are NOT
name-related):
- The Transform family: Transform class, formula-transform /
  formula-explain / formula-debug tasks, the 10 structural rewrite
  patterns, the 30-function gloss table, the 8-entry error-code advice
  table. All three tasks auto-register on import via
  packages/sdk/src/index.js side-effect import of tasks/formula.js.
- The thesis / manifesto at the top of docs/GOALS.md and README.md:
  "stop sending the data to the model, ship the model to the data",
  the four capability families table, the decision test.
- The erp.ai hero use case section in README.md with the Transform
  example and the domain-specific justification for why local is the
  only viable integration shape for ERP formulas.
Tests:
- 27 Rust cargo tests green against the restored dhamaka_* ABI
- 75 JS node --test tests green against the restored @dhamaka
  workspace packages and the Dhamaka class
- Dev server smoke test: every endpoint including
  /runtime/dhamaka-runtime.wasm, /sdk/transform.js,
  /sdk/tasks/formula.js returns 200
- SDK import smoke test: Dhamaka class + Transform class + all three
  formula tasks auto-register correctly
No consumer-facing code or publish ever shipped under the Locus name —
it lived on main for exactly one commit before this revert lands.
Four things in one commit, all aimed at getting a working public demo
onto GitHub Pages at protosphinx.github.io/dhamaka/ (with dhamaka.dev
attachable later as a custom domain).
1. dhamaka.dev added to the ASCII art
Both the animated SVG banner (docs/banner.svg) and the README's static
fallback block now carry a "dhamaka.dev" subtitle under the block
letters, so the brand and the domain are one visual mark instead of
two separate strings. Brand + URL in one glance.
2. Formula editor demo (packages/playground/public/demos/formula.html)
An erp.ai-style fake spreadsheet that makes the Transform family
concretely visible:
- 5 × 5 grid with a Region/Q1/Q2/Total/Growth fake-revenue dataset
- Formula bar at the top showing the selected cell's formula
- "Ask AI" input below the grid taking natural-language instructions
- 9 suggestion chips for the common instructions the rules layer
  handles: discount, tax, round, null-safe, iferror, multiply, abs,
  negate, EUR conversion
- Cells with formulas are marked with a little italic "f" badge
- On apply: the selected cell's formula is rewritten via
  Transform.formula(), the cell flashes cell-flash-green, and a
  before/after panel shows the old formula, the new formula, the
  source (rule / fuzzy / model), confidence, and the human-readable
  explanation from the pattern-match layer.
Every transformation this demo performs resolves entirely in the rules
layer — no model call, no network hit. The 10 shipping
formula-transform patterns cover the common cases:
"add 10% discount"    → (expr) * 0.9
"apply 8% tax"        → (expr) * 1.08
"round to 2 decimals" → ROUND(expr, 2)
"handle empty cells"  → IFERROR(expr, 0)
"multiply by 1.5"     → (expr) * 1.5
"take absolute value" → ABS(expr)
"negate it"           → -(expr)
"convert to EUR"      → (expr) * EUR_RATE
…etc.
This is the hero demo for the erp.ai case study in the README —
visitors can now feel what local-inference formula editing is like
without anyone running a server, without any AI API key, and without
any model bigger than 55 KB.
3. Site build script (packages/playground/build-site.mjs)
A zero-dependency Node script that assembles the full static demo site
into packages/playground/_site/ so GitHub Pages can serve it. What it
does:
- Wipes _site/ for a clean build
- Copies packages/playground/public/ → _site/ (the HTML, CSS, demos/
  subdirectory, everything)
- Copies packages/sdk/src/ → _site/sdk/ (so importmap "dhamaka"
  resolves to ./sdk/index.js)
- Copies packages/runtime/src/ → _site/runtime/
- Copies packages/hub/public/runtime/dhamaka-runtime.wasm →
  _site/runtime/dhamaka-runtime.wasm (so WasmEngine's default URL
  /runtime/dhamaka-runtime.wasm resolves)
- Copies docs/banner.svg → _site/docs/banner.svg
- Writes .nojekyll so Pages doesn't try to process _underscore files
- Rewrites every HTML importmap to use relative paths. The dev server
  serves under a root path, but Pages serves under
  protosphinx.github.io/dhamaka/ — so absolute "/sdk/index.js"
  references are rewritten to "./sdk/…" at depth 0 or "../sdk/…" at
  depth 1 (inside demos/). Verified by actually running the script and
  curl-hitting every endpoint on a local python http server at port
  8090 — all 12 endpoints (root, chat, 4 demos, sdk/index,
  sdk/transform, sdk/tasks/formula, runtime/index, runtime/wasm,
  build.json) return 200.
- Drops a build.json marker with timestamp + commit SHA + run id for
  traceability.
Output: 6 HTML files (index, chat, autofill, spellcheck, paste,
formula), the full SDK tree, the full runtime tree, and the 55 KB
compiled wasm. About 400 KB total.
4. GitHub Pages workflow (.github/workflows/pages.yml)
Triggered on push to main (path-filtered to packages/, crates/, docs/,
and the workflow file itself) and on manual dispatch. Two jobs:
- build: installs rust + wasm32-unknown-unknown, runs
  crates/dhamaka-runtime/build.sh, installs Node 22, runs
  node packages/playground/build-site.mjs, and uploads the resulting
  _site/ via actions/upload-pages-artifact@v3.
- deploy: depends on build, uses actions/deploy-pages@v4 with the
  github-pages environment so the deploy URL lands in the workflow
  output.
Uses the standard concurrency: pages group to serialise deploys and
not cancel in-progress ones.
To enable the first deploy, the repo owner needs to go to Settings →
Pages and set "Source" to "GitHub Actions" (one click, one-time).
After that, every push to main that touches the relevant paths
auto-deploys.
.gitignore: ignore packages/playground/_site/ since it's a build
output and the Pages workflow rebuilds it from scratch anyway.
All 102 tests (27 Rust + 75 JS) still green. No regressions; this
commit only adds new files + ASCII art + one new card on the demo
index page.
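The depth-aware importmap rewrite in the build script can be sketched as follows. The function name `relativize` and the capability flags are hypothetical; the real script operates on whole HTML files:

```javascript
// Sketch of the depth-aware path rewrite: absolute "/sdk/…" and
// "/runtime/…" URLs become relative so the site works when served
// under a subpath like protosphinx.github.io/dhamaka/. `depth` is how
// many directories deep the HTML file sits (0 for index.html, 1 for
// demos/*.html).
function relativize(url, depth) {
  if (!/^\/(sdk|runtime)\//.test(url)) return url; // leave other URLs alone
  const prefix = depth === 0 ? "." : Array(depth).fill("..").join("/");
  return prefix + url; // "." + "/sdk/x" -> "./sdk/x"; ".." + "/sdk/x" -> "../sdk/x"
}
```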
This workflow builds and deploys a Jekyll site to GitHub Pages, with steps for checkout, setup, build, and deployment.
The previous run (a39031f) got build=green / deploy=red X in 4
seconds, which is the signature of actions/deploy-pages@v4 failing its
pre-flight call to the Pages API because the site hasn't been fully
provisioned yet. Setting the "Source" dropdown to GitHub Actions in
Settings → Pages is a necessary but not sufficient first step — the
actual Pages site record is only created after the first successful
deploy, which creates a chicken-and-egg problem for a workflow that's
trying to do that first deploy.
Fix: add an `actions/configure-pages@v5` step with `enablement: true`
at the top of the build job. That step calls the Pages API with an
explicit "create this site if it doesn't exist" flag, so the
subsequent deploy-pages step finds a provisioned site and succeeds.
This is a no-op on every subsequent run (the site already exists), so
leaving it in is harmless.
Two workflows snuck onto main via GitHub's Settings → Pages
"Configure" buttons:
- jekyll-gh-pages.yml — the Jekyll template from the left "Configure"
  card on Settings → Pages. This runs the Jekyll builder over the repo
  root, which has no Jekyll structure at all (no _config.yml, no
  Gemfile, no layouts/), and would deploy an empty/wrong site. It also
  fights the real pages.yml workflow on the "pages" concurrency group,
  so whichever workflow loses the race blocks the correct one.
- .github/workflows/w — a "Run a one-line script" CI starter template
  with the filename still stuck at "w" (someone saved the template
  before finishing the filename). It's harmless but adds pointless
  runs on every push.
Deleting both. The real Pages workflow is pages.yml, which already
exists and (with the previous commit's configure-pages step) should
self-provision the Pages site on its first deploy.
This is a cleanup-only commit — no behaviour change for the
actually-correct workflow.
The spellcheck demo had a 19-entry rules layer (15 confusables + 4
context regexes) that I'd added as a "make the demo feel alive without
a real model" crutch. It worked for those 19 exact patterns and
silently failed on everything else. That contradicts the entire thesis
of the project — which is "let the on-device LLM do the work" — and
produced a demo that was worse for the user than just saying "not
implemented yet".
Pivot: strip all the rules from spellcheck, make the task model-only,
and wire a real cross-browser LLM runtime underneath so the slow path
actually has something to fall through to.
WHAT'S NEW
──────────
packages/runtime/src/transformers-backend.js (new)
- TransformersBackend implements the same Engine interface as every
other backend (WindowAiBackend / WasmEngine / MockEngine), but wraps
@huggingface/transformers v3 loaded lazily from esm.sh via a dynamic
import. The import only fires the first time an engine is actually
instantiated, so pages that don't need a model (e.g. the formula /
autofill / paste demos that use rules-first tasks) pay zero bundle
cost.
- Supports task: "text-generation" | "text2text-generation" |
"fill-mask" | "feature-extraction" with sensible default model
picks per family (SmolLM2-135M, LaMini-Flan-T5-248M,
distilbert-base-uncased, all-MiniLM-L6-v2).
- Forwards a progress_callback so demo pages can render a progress
bar during the first-visit model download. Subsequent visits are
instant because Transformers.js caches in IndexedDB by default.
- Exposes an embed() method for the planned feature-extraction path
(Search family, v0.3).
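The lazy-import behaviour described above can be sketched with a memoised loader. This is an illustration of the pattern, not the shipped backend; `loadModule` stands in for `import("@huggingface/transformers")`:

```javascript
// Sketch of the lazy-load pattern: the heavy module is only fetched
// the first time an engine is constructed, and the memoised result is
// shared by every later construction. Pages that never construct an
// engine pay zero bundle cost.
function makeLazyLoader(loadModule) {
  let cached;
  let fired = false;
  return function getLibrary() {
    if (!fired) {      // the dynamic import fires at most once per page
      fired = true;
      cached = loadModule();
    }
    return cached;
  };
}
```

In the real backend the memoised value is a promise, so concurrent constructions share one in-flight download.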
packages/runtime/src/factory.js
- New priority order: window.ai → transformers → wasm → mock.
- In browsers with window.ai: Gemini Nano wins (free, resident,
GPU-accelerated, shared with the browser).
- In every other browser with WebAssembly + fetch: Transformers.js
wins. Cross-browser real LLM, no API key, no server, no rate limit,
all on-device.
- WasmEngine (our Rust runtime) is still wired in but explicitly
documented as a v2 swap target, not primary. Architecture is done;
Q4 quantization + SIMD128 + real SmolLM2 weights are the pieces
that need to land before it becomes primary.
- MockEngine stays last — Node-only, for tests and SSR.
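The priority chain can be sketched as a simple cascade. The capability flags below are stand-ins for the real feature detection, not the factory's actual API:

```javascript
// Minimal sketch of the factory's priority order:
// window.ai → transformers → wasm → mock.
function pickBackend(env) {
  if (env.windowAi) return "window.ai";             // Gemini Nano: free, resident, GPU-accelerated
  if (env.wasm && env.fetch) return "transformers"; // cross-browser real LLM via Transformers.js
  if (env.wasm) return "wasm";                      // our Rust runtime, the v2 swap target
  return "mock";                                    // Node-only: tests and SSR
}
```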
packages/runtime/src/index.js
- Exports TransformersBackend alongside the other backends so consumers
can instantiate it directly if they want to skip the factory.
packages/sdk/src/tasks.js — spellcheck task rewritten
- DELETED the CONFUSABLES map (15 hardcoded misspellings).
- DELETED the CONTEXT_RULES array (4 hardcoded homophone regexes).
- fast() now unconditionally returns null. There is no rules layer.
Every spellcheck call is a model call.
- slow() builds a "you are a careful proofreader, return JSON" prompt,
calls engine.complete(), and parses a JSON array of
{from, to, reason} objects via a robust extractor that tolerates
model preamble / code fences / malformed entries.
- When no engine is available the task returns an empty suggestion
list rather than inventing something. Silence beats fiction.
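The tolerant extraction can be sketched as below. This is a simplified illustration (first-bracketed-span heuristic); the shipped extractor handles more malformed shapes:

```javascript
// Sketch of the tolerant JSON extractor: find the first JSON array in
// model output that may be wrapped in prose or code fences, then keep
// only well-formed {from, to} entries. Malformed output yields an
// empty list, never a throw — silence beats fiction.
function extractSuggestions(text) {
  const match = String(text).match(/\[[\s\S]*?\]/); // first [...] span
  if (!match) return [];
  let parsed;
  try {
    parsed = JSON.parse(match[0]);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(
    (e) => e && typeof e.from === "string" && typeof e.to === "string"
  );
}
```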
packages/sdk/src/reflex.js
- configure() documents the new options: backend: "transformers",
model, task, cdn, onProgress. Callers can now set up Transformers.js
from demo pages without touching the factory directly.
packages/sdk/test/tasks.test.js — spellcheck tests rewritten
- Removed the 4 semantic assertions that depended on the rules layer
("catches recieve → receive", "catches homophone in context",
"catches teh → the", "clean input has zero suggestions").
- Added 6 contract tests: fast() always returns null, slow() skips
empty input without calling the engine, slow() calls the engine and
parses a JSON array, slow() extracts JSON embedded in preamble,
slow() returns empty suggestions on malformed JSON, slow() drops
entries without valid from/to strings. These test the contract, not
semantic behavior that only a real model can deliver.
- Net: 75 → 77 JS tests, all green.
packages/playground/public/demos/spellcheck.html — demo rewritten
- Eagerly warms the engine on page load instead of lazily loading on
first keystroke. Shows a status card with a progress bar and
explicit "first visit: ~250 MB download, then offline forever"
disclosure while Transformers.js downloads LaMini-Flan-T5-248M.
- Uses reflex.configure({ backend: "transformers", task:
"text2text-generation", model: "Xenova/LaMini-Flan-T5-248M",
onProgress: ... }) to route the whole task through the new backend.
- Textarea is disabled until the model is ready, then enables and
prompts the user to type.
- Every debounced input event (600 ms) fires a SmartText call which
hits spellcheckTask.slow() which hits engine.complete() which hits
Transformers.js which hits the cached model. Real LLM, every time,
no rules hiding anything.
- Copy updated: no more references to hardcoded patterns, honest
about the first-visit cost, explicit about the formula demo still
keeping rules (because those are deterministic and rules there are
a performance feature, not a crutch).
All importmaps
- Added "@huggingface/transformers":
  "https://esm.sh/@huggingface/transformers@3" to the importmap in
  every demo page + the chat page. build-site.mjs's relative-path
  rewriter correctly leaves absolute https:// URLs alone (it only
  rewrites /sdk/… and /runtime/… prefixes), verified by rebuilding
  _site/ and grepping the output.
docs/GOALS.md
- Expanded the Non-goals section to explicitly state: "Dhamaka is the
product layer above the runtime. It is not the runtime itself."
- Called out that @huggingface/transformers is the runtime, window.ai
is the runtime on Chrome, and the Rust crate is a v2 swap target
that is explicitly NOT the critical path for shipping demos in 2026.
- Added a new bullet: "Not hardcoding task semantics". Spellcheck is
model-only forever. Smart paste is model-first with regex fast-
paths for obviously-structured fragments. Formula transformation
keeps rules for the small set of deterministic rewrites because
those have objectively-correct structural answers.
README.md
- Stack diagram rewritten: engine backends section now shows
window.ai / Transformers.js / MockEngine as the three active paths,
with the Rust crate marked as a v2 swap target.
- "The shape that matters" paragraph rewritten to be explicit that
Dhamaka is the product layer above the runtime, and that trying to
be both the product layer AND the runtime means fighting HuggingFace
on a layer they'll always win.
- "The engine backends" section rewritten to show 4 implementations
in priority order with honest tradeoffs (window.ai = free+fast+
Chrome-only, Transformers.js = real LLM+cross-browser+first-visit
download, WasmEngine = v2 target, MockEngine = tests only).
- "What's real today" rewritten: the Reflex spellcheck task is
documented as model-only with NO rules, explicit about the thesis.
The city-to-state and paste-extract tasks are documented as rules-
first with model long-tail, with honest explanations of why rules
are legitimate there.
TESTS
─────
- 27 Rust cargo tests green
- 77 JS node --test tests green (up from 75: +6 new spellcheck
  contract tests, -4 removed rule-based assertions, net +2)
- build-site.mjs assembles _site/ with the new importmap entries
intact (absolute https://esm.sh URLs pass through the relative-
path rewriter unchanged)
CAVEATS YOU SHOULD KNOW
───────────────────────
- First visit to the spellcheck demo on a browser without window.ai
downloads ~250 MB of LaMini-Flan-T5-248M. This is unavoidable: the
whole point of on-device AI is paying a one-time download cost so
every subsequent call is free and private. The demo is explicit
about this on the status card and the fineprint.
- I cannot end-to-end test this commit from the sandbox because there's
no outbound network, so I can't download the model to run through.
I've verified: the code compiles, every import resolves, every test
passes, the importmap rewrite is correct, the SDK imports cleanly
with TransformersBackend exported. The first real "does this
download the model and produce corrections in a browser" check
happens on the deployed Pages site once the workflow runs.
- The WasmEngine (our Rust runtime) is demoted to priority 3 in the
factory. It still ships, still has all 27 tests, still compiles to
the same 55 KB .wasm — but it's no longer the thing that drives
the spellcheck demo. That role belongs to Transformers.js until the
Rust crate has quantization + SIMD + real weights.
Previous commit (66d4176) wired the spellcheck demo to
Xenova/LaMini-Flan-T5-248M and prompted it with "you are a
proofreader, return a JSON array of corrections". In the deployed demo
this was both slow (~9.5 s per call) and wrong (it returned "looks
clean" on obvious gibberish like "sdasd asdasd asd"). Both failures
trace to the same mistake: LaMini-Flan-T5 is a generic 248M-parameter
instruction follower, not a spellchecker, and at that parameter count
it's below the quality threshold to reliably follow a structured JSON
prompt on free-form text. Asking a too-small instruction model to do
spellcheck via prompting is architecturally wrong.
Fix: switch to the correct tool — a masked language model — and the
correct algorithm — per-word masking.
──────────────────────────────────────────
1. Model: Xenova/distilbert-base-uncased
──────────────────────────────────────────
distilBERT's masked-LM head was literally trained for "given a
context, predict the masked token". That's the spellchecker algorithm:
mask a word, ask the model what should go there, and if the original
isn't in the top predictions then it's likely misspelled.
- Size: ~65 MB (vs ~250 MB for LaMini-Flan-T5-248M).
- Per-call latency: ~100–300 ms per masked word in WASM on a laptop
  (vs ~9500 ms per full-text call for LaMini).
- Purpose-built: no prompt engineering, no JSON parsing, no
  hallucinated answers, no "the model said looks clean on gibberish"
  failure mode.
──────────────────────────────────────────
2. TransformersBackend: fillMask() + maskToken
──────────────────────────────────────────
- Added a public fillMask(input, topK) method that returns a
  structured Array<{token, score}> sorted by score desc. For
  multi-mask input it returns the first mask's predictions
  (single-mask is the spellcheck use case).
- Added a maskToken getter that surfaces the model's mask token string
  (e.g. "[MASK]" for BERT-family, "<mask>" for RoBERTa-family).
  Callers need this to construct valid masked input.
- load() now caches the mask token from the loaded tokenizer so later
  calls don't have to re-query it.
- complete() on a fill-mask task delegates to fillMask() and returns a
  JSON-stringified result so it still satisfies the Engine contract
  for callers that don't know to use the structured method.
──────────────────────────────────────────
3. spellcheckTask: per-word masking algorithm
──────────────────────────────────────────
- fast() still returns null (no rules — the whole thesis).
- slow() now:
  1. checks the engine exposes fillMask() (graceful fallback: return
     an empty result with a clear error string if not).
  2. tokenises the input with /\b[A-Za-z][A-Za-z']*\b/g.
  3. drops short words (<3 chars) and stoplist words ("the", "a",
     "is", "are", …) to avoid wasted model calls and trivial false
     positives.
  4. caps at MAX_WORDS_PER_CALL = 40 so huge inputs don't spam the
     model.
  5. for each surviving candidate word:
     a. builds a masked sentence with exactly that word replaced by
        the model's mask token.
     b. calls engine.fillMask(masked, top_k=20).
     c. if the original word (case-insensitively) is not in the top-K
        token strings (or the stripped WordPiece form), flags it.
     d. collects up to 3 alternative suggestions from the top-K,
        filtered to real whole words (stripping `##` subword prefixes,
        dropping non-letter tokens).
  6. returns { from, to, alternatives, index, reason } per suggestion.
- A single failing fillMask call (e.g. a rare model error) is caught
  and logged; the run continues on the remaining words.
──────────────────────────────────────────
4. Demo page: distilBERT, new copy, new diagram
──────────────────────────────────────────
- reflex.configure({ backend: "transformers", task: "fill-mask",
  model: "Xenova/distilbert-base-uncased", onProgress: ... }).
- Copy updated: "~65 MB" instead of "~250 MB", "10–30 seconds" instead
  of "30–90 seconds", "distilBERT" instead of "LaMini-Flan-T5".
- The "what's happening under the hood" diagram now shows the per-word masking loop: for each word → build masked sentence → engine.fillMask → check top-K → flag + suggest.
- Debounce tightened from 600 ms to 400 ms, since per-word masking is fast enough to feel more responsive.
- The ready-message in the status card explains the algorithm instead of promising generic "corrections come back in under a second".

──────────────────────────────────────────
5. Cache-busting in build-site.mjs
──────────────────────────────────────────
The previous commit hit a real problem on your first real test: the new spellcheck.html rendered, but it was paired with the PREVIOUS commit's factory.js, which didn't know about backend: "transformers" and fell through to WasmEngine with a /runtime/dhamaka-runtime.wasm 404. The cause is GitHub Pages serving static files with Cache-Control: max-age=600, so every deploy has a 10-minute window where the browser happily mixes new HTML with stale JS.

The fix is a cache-busting query string on every importmap URL:

  "dhamaka": "./sdk/index.js?v=abc1234"

Every deploy generates a new short SHA, every URL becomes distinct, and browsers can't cache across deploys. build-site.mjs now:

- Reads the current HEAD SHA from .git/HEAD (or GITHUB_SHA in CI), without shelling out to git. Handles loose refs and packed-refs.
- Appends ?v=<shortSha> to every ./sdk/... and ./runtime/... URL in every demo HTML's importmap during the subdirectory rewrite.
- Records both the full SHA and the short SHA in build.json, so the /build.json diagnostic I wrote about in the previous session now tells you exactly which commit is live.

──────────────────────────────────────────
6. Tests
──────────────────────────────────────────
Tasks test rewritten for the new contract:
- fast() always returns null
- slow() short-circuits empty input without calling the engine
- slow() refuses engines that don't expose fillMask()
- slow() flags words whose top-K predictions don't include them, and doesn't flag words that ARE in their top-K
- slow() skips stoplist / short words without wasting mask calls
- slow() strips WordPiece ## prefixes from suggestions
- slow() tolerates a single mask call failure without killing the run

8 spellcheck tests total (up from 6 in the previous commit, net +2). 77 JS tests → 78 JS tests, all green. 27 Rust tests still green. 105 total tests.

──────────────────────────────────────────
Verification status
──────────────────────────────────────────
Local:
- node --check across every modified JS file: pass
- cargo test (27): pass
- npm test (78): pass
- node packages/playground/build-site.mjs: assembles _site/ with a cache-busted importmap (?v=66d4176) and a /build.json containing both the full and short SHA

Pages deploy:
- Not yet verified. I still can't make outbound requests to github.io from this sandbox. The user will verify in their browser once the Pages workflow runs the new commit. The cache-busting means the user will NOT need to hard-refresh this time — every importmap URL is a fresh resource.
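The per-word masking loop in section 3 can be sketched roughly as follows. This is a simplified sketch, not the shipped slow(): the engine is stubbed so the control flow runs standalone, the stoplist is a tiny subset, and the alternative filter is the pre-2a0e704 letters-only version.

```javascript
// Simplified sketch of the per-word masking spellcheck loop.
// `engine` must expose maskToken and fillMask(masked, topK).
const STOPLIST = new Set(["the", "a", "is", "are", "and", "to"]); // subset for illustration
const MAX_WORDS_PER_CALL = 40;

async function spellcheckSlow(input, engine) {
  if (typeof engine?.fillMask !== "function") {
    return { suggestions: [], error: "engine does not expose fillMask()" };
  }
  const suggestions = [];
  const words = [...input.matchAll(/\b[A-Za-z][A-Za-z']*\b/g)]
    .filter((m) => m[0].length >= 3 && !STOPLIST.has(m[0].toLowerCase()))
    .slice(0, MAX_WORDS_PER_CALL);

  for (const m of words) {
    const word = m[0];
    // Replace exactly this occurrence with the model's mask token.
    const masked =
      input.slice(0, m.index) + engine.maskToken + input.slice(m.index + word.length);
    try {
      const preds = await engine.fillMask(masked, 20); // [{token, score}, ...]
      const tokens = preds.map((p) => p.token.replace(/^##/, "").toLowerCase());
      if (!tokens.includes(word.toLowerCase())) {
        const alts = preds
          .map((p) => p.token.replace(/^##/, "")) // strip WordPiece subword prefix
          .filter((t) => /^[a-z]+$/i.test(t))     // letter-only whole words
          .slice(0, 3);
        suggestions.push({
          from: word, to: alts[0] ?? null, alternatives: alts,
          index: m.index, reason: "not in top predictions",
        });
      }
    } catch (err) {
      // A single failing mask call shouldn't kill the whole run.
      console.warn("fillMask failed for", word, err);
    }
  }
  return { suggestions };
}
```

In the real task the masked sentence goes through distilBERT's fill-mask head; here any object with the same two members will do.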
…caveat

Follow-up to f5b110a. The distilBERT fill-mask algorithm is correct, but on the user's first real test the demo output was dominated by two-character junk suggestions like "xxx → da", "asdsd → cd", "asdasd → xx". Three problems, fixed here:

1. SUGGESTION FILTER WAS TOO LAX

MIN_SUGGESTION_LEN is now 3 (matching MIN_WORD_LEN for worth-checking words), and the filter additionally requires ≥1 vowel (a/e/i/o/u/y). This rejects WordPiece fragments that happen to be valid letter sequences but are not real English words: "xx", "cd", "sd", "xxx", "ght", etc. These are in distilBERT's vocabulary because they appear as subword pieces in longer words (sundae, CDs, Canada, rights), but they're not plausible whole-word corrections.

2. DEMO TRY-LIST MADE IT EASY TO HIT THE PATHOLOGICAL CASE

The only inputs the previous demo copy suggested were the old rule-era examples ("I'll see you their tomorrow"), and the placeholder was "start typing…". So users instinctively typed gibberish ("sdasd asdasd") to test, and masked-LM spellcheck on pure gibberish has no meaningful context to predict from — the suggestions for it are also gibberish. That's not a bug, it's a property of the algorithm, but it looks broken in a demo.

Fixed: added three "Try:" chips to the demo page with real sentences that demonstrate the algorithm working on realistic input:

- "I recieve the package tommorow and it will seperate our stuff"
- "The goverment has definately been occuring alot this year"
- "She went untill the store to meet her freind yestarday"

Clicking a chip populates the textarea and fires the check. Plus an explicit caveat below: "Masked-LM spellcheck works best on real prose with real misspellings. Pure gibberish gets flagged correctly, but the suggestions will be nonsense too — that's a property of the algorithm, not a bug."

3. NO-ALTERNATIVE FLAGS WERE BEING HIDDEN

The previous code did `if (!alts.length) continue;`, which meant a flagged word with no plausible alternatives (i.e. the top-K is all junk) was dropped from the suggestion list entirely. That made the task look like it was underreporting. The fix: still flag the word with `to: null` and `alternatives: []`, so the chip UI can render it as "word → ?" — visually communicating "I caught this but have nothing useful to suggest here" instead of silently dropping it.

TransformersBackend + spellcheckTask:
- New MIN_SUGGESTION_LEN constant = 3.
- New isPlausibleWord(token) helper that enforces length + letters-only + ≥1 vowel.
- slow() no longer drops flagged words with empty alternatives — it emits them with `to: null`.
- The reason string splits into "not in top predictions" (has alts) and "not in top predictions, and none of the predictions are plausible words" (no alts), so debugging is clearer.

Demo page (spellcheck.html):
- New "Try:" section with 3 clickable example chips.
- Wired the chips: clicking populates .value and dispatches an input event so SmartText kicks off the check immediately.
- New .try-chip CSS (pill-shaped, hover highlight in the accent color).
- New .suggest.no-alts CSS (the "?" is rendered in italic muted grey).
- The suggestion renderer handles `to === null`: it renders "?" instead of the string, adds the .no-alts class, and skips the click-to-apply handler since there's nothing to apply.

Tests (tasks.test.js):
- +3 new tests:
  * rejects 2-char suggestions (xx, cd, da, sd)
  * rejects consonant-only tokens (xxx, ght) via the vowel filter
  * still-flag behaviour: when all top-K are junk, the word is flagged with to: null, alternatives: [], and an explanatory reason
- 78 → 81 JS tests, all green. 27 Rust tests still green. 108 total.

Caveat: this doesn't turn the demo into Grammarly. Masked-LM spellcheck on distilBERT will still make mistakes on homophones with weak context, and will still produce thin suggestions for uncommon misspellings. Those are inherent limitations of a 65 MB masked LM running in a browser tab.
The fix path for those cases is a bigger model (BERT-base at ~400 MB) or window.ai's Gemini Nano on Chrome. But within those limits, the demo now correctly shows realistic misspellings getting caught with real-word suggestions, not gibberish-for-gibberish noise.
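The tightened suggestion filter described above (length ≥ MIN_SUGGESTION_LEN, letters only, at least one vowel) can be sketched as a single helper. This is a sketch of the described rules, not the shipped code:

```javascript
const MIN_SUGGESTION_LEN = 3;

// A predicted token is only offered as a correction if it looks like a
// real whole word: long enough, letters only, and containing >= 1 vowel.
// This rejects WordPiece fragments like "xx", "cd", "sd", "ght".
function isPlausibleWord(token) {
  const word = token.replace(/^##/, ""); // strip WordPiece subword prefix
  if (word.length < MIN_SUGGESTION_LEN) return false;
  if (!/^[a-z]+$/i.test(word)) return false; // letters only
  return /[aeiouy]/i.test(word);             // require at least one vowel
}
```

Counting "y" as a vowel keeps words like "rhythm" from being rejected by the vowel check.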
The cache-busting I added in f5b110a only rewrote importmap ENTRIES (the bare specifiers "dhamaka" and "@dhamaka/runtime"). That made the browser fetch a fresh sdk/index.js?v=SHA on each deploy, but every RELATIVE import inside that module (e.g. `import "./tasks.js"`) still resolved to an unversioned URL. The browser happily served those internal modules from cache across deploys, producing "new index.js, old tasks.js" — the exact mechanism that was still poisoning the spellcheck demo with old filter code on commit 2a0e704.

Proof: in 2a0e704 the suggestion filter was tightened to reject <3-char tokens and consonant-only tokens. But the user's test on the deployed site still showed `okok → h` and `hhhh → hh` — 1- and 2-char suggestions that the new filter should have rejected. The only explanation: tasks.js was still running pre-2a0e704 code because the browser had cached it.

Fix: build-site.mjs now does a second pass after copying _site/sdk/ and _site/runtime/. It walks every .js file in those trees and regex-rewrites every relative import (./ or ../, ending in .js, no existing query string) to include ?v=<shortSha>:

  from "./tasks.js"          → from "./tasks.js?v=2a0e704"
  from "./data/cities.js"    → from "./data/cities.js?v=2a0e704"
  from "../runtime/index.js" → from "../runtime/index.js?v=2a0e704"

Bare specifiers like "dhamaka" and "@dhamaka/runtime" are NOT rewritten by this pass (the importmap already cache-busted those in step 8a).

The rewrite regex:

  /(["'])(\.\.?\/[^"'?\s]+?\.js)(["'])/g

It matches any quoted string literal that starts with ./ or ../, ends in .js, and has no existing ? query — covering import statements, export statements, side-effect imports, and dynamic import() calls. It avoids false positives on bare specifiers, https URLs, and already-busted paths.
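The rewrite pass reduces to a single replace over each file's source, using the regex from the commit. A minimal sketch (the function name is illustrative; the real logic lives inline in build-site.mjs):

```javascript
// Cache-bust relative imports: every quoted ./ or ../ specifier ending
// in .js (with no existing query string) gets the deploy's short SHA.
// Bare specifiers ("dhamaka") and URLs are left untouched.
const RELATIVE_JS_IMPORT = /(["'])(\.\.?\/[^"'?\s]+?\.js)(["'])/g;

function bustRelativeImports(source, shortSha) {
  return source.replace(RELATIVE_JS_IMPORT, `$1$2?v=${shortSha}$3`);
}
```

Because `[^"'?\s]` excludes `?`, a path that already carries a query string cannot match, so the pass is idempotent across reruns.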
Local verification (node packages/playground/build-site.mjs):

  rewrote importmaps in 6 html files (cache-bust v=2a0e704)
  cache-bust: rewrote relative imports in 16 / 21 js files

Sample of the rewritten files:

  _site/sdk/index.js:
    import { HubClient } from "./hub-client.js?v=2a0e704";
    export { SmartField } from "./smart-field.js?v=2a0e704";
    ...
  _site/runtime/index.js:
    export { TransformersBackend } from "./transformers-backend.js?v=2a0e704";
    ...
  _site/sdk/tasks.js:
    import { findCity, findCityFuzzy } from "./data/cities.js?v=2a0e704";

Every module path now carries the deploy's short SHA as a query string, so the browser treats each deploy's module graph as a completely separate set of resources. The cache can never serve the old spellcheck task against new HTML again. The 5 files without imports (engine.js, data/cities.js, leaf tokenizer files) were correctly left alone.

All 27 Rust tests + 81 JS tests still green. This commit only touches the build script, not any runtime or product code.

This is the SECOND cache-busting fix in two commits, which is embarrassing — I should have caught the relative-import propagation issue the first time. Reality: the importmap-only cache-bust is a known-incomplete pattern, and I underestimated how much ES-module relative-import behaviour breaks the naive approach. Done properly now.
… README

- 18 Playwright browser tests covering all three demos (autofill, spellcheck, paste)
- Benchmark suite: task pipeline (10k iterations), WASM runtime (cold start + tok/s), browser end-to-end latency via Playwright
- GitHub Pages workflow + build script to deploy the playground as a static site
- README updated with real measured numbers: 0.2 ms autofill, 0.54 ms WASM cold start, 55 KB binary, 120 total tests. Use cases split into shipping vs planned.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ing detection The model-only spellcheck required a 65MB distilBERT download before anything worked. Now common misspellings (120+ confusables) and homophones are caught instantly by rules (<1ms), with the model providing long-tail coverage when loaded. The demo textarea is enabled immediately instead of waiting for model download. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
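The rules-first fast path described above amounts to a dictionary of known confusables consulted before any model work. A minimal sketch, with a hypothetical four-entry subset standing in for the shipped 120+ entry table:

```javascript
// Illustrative subset of the confusables table (the real one has 120+).
const CONFUSABLES = new Map([
  ["recieve", "receive"],
  ["seperate", "separate"],
  ["definately", "definitely"],
  ["tommorow", "tomorrow"],
]);

// Fast path: pure table lookups, sub-millisecond, no model download needed.
// Words not in the table fall through to the masked-LM slow path once loaded.
function spellcheckFast(input) {
  const suggestions = [];
  for (const m of input.matchAll(/\b[A-Za-z']+\b/g)) {
    const fix = CONFUSABLES.get(m[0].toLowerCase());
    if (fix) suggestions.push({ from: m[0], to: fix, index: m.index, source: "rules" });
  }
  return suggestions;
}
```

This is why the demo textarea can be enabled immediately: the table needs no download, and the model only adds long-tail coverage later.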
Every demo page had import maps with root-absolute paths (/sdk/index.js, /runtime/index.js) which resolve correctly on localhost:5173 but 404 on GitHub Pages where the site lives at /dhamaka/. Changed all import maps to relative paths (../sdk/index.js for demos/, ./sdk/index.js for root). This was the reason ALL demos appeared as empty shells on the deployed site — zero JavaScript loaded. Also adds e2e tests for the formula editor demo (5 tests). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… distance filter

The masked-LM was producing garbage suggestions like "how → ckey" and "why → doing" when the input context was noisy. Three protections added:

1. Context quality gate: skip the model entirely when <40% of words are recognized English (gibberish input can't provide useful context)
2. KNOWN_WORDS set (~300+ words): common English words the model should never flag, regardless of what the masked-LM predicts
3. Edit distance filter: model suggestions must be within Levenshtein distance 3 of the original word, to prevent context-based false positives (e.g., "table" → "chair")

Also expanded STOPLIST from ~50 to ~200+ entries covering question words, common verbs, adjectives, and nouns. Updated all 6 affected unit tests to use realistic English inputs that pass the quality gate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
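The edit distance filter (protection 3 above) can be sketched as a standard dynamic-programming Levenshtein plus a threshold check. A minimal sketch; the helper names are illustrative, not the shipped identifiers:

```javascript
// Classic DP Levenshtein distance: dp[i][j] = edits to turn a[0..i) into b[0..j).
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

const MAX_EDIT_DISTANCE = 3;

// A model suggestion only survives if it is plausibly a spelling fix of
// the original, not a context-driven substitution ("table" -> "chair").
function isLikelySpellingFix(original, suggestion) {
  return levenshtein(original.toLowerCase(), suggestion.toLowerCase()) <= MAX_EDIT_DISTANCE;
}
```

The threshold of 3 matches the commit: real misspellings are usually a couple of edits from the intended word, while the masked-LM's context-based swaps are typically far beyond that.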
Three fixes:

1. SmartForm clears target fields when there's no match. Previously, intermediate keystroke matches (e.g., "new" fuzzy-matching "nyc" while typing "newport") would stick forever, because SmartForm only set values and never cleared them.
2. The fuzzy matcher caps edit distance at 1 for short queries (<5 chars). "new" was matching "nyc" at distance 2, which is 67% of the input wrong — not a typo. Longer queries like "San Francsico" still get distance-2 matching.
3. Added Newport (RI) and Providence (RI) to the cities gazetteer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Removed readonly from the state/country/timezone/currency inputs so users can manually correct autofill results. SmartForm already respects manual edits (it locks the field from further auto-fill).
- Added Arlington (TX), Columbus, Cleveland, Cincinnati, Indianapolis, Kansas City, St. Louis, Richmond, Virginia Beach, Madison, Milwaukee, Omaha, Louisville, and Oklahoma City to the gazetteer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The gazetteer was a 100-city static list — useless for the long tail. Now the autofill demo loads SmolLM2-135M-Instruct (via @huggingface/transformers) in the background. Common cities still resolve instantly from the gazetteer; everything else is answered by the on-device LLM with no server call.

Changes:
- Rewrote cityToStateTask.slow() with a few-shot prompt that works well on small models (pattern continuation > JSON generation)
- The autofill demo configures reflex with the text-generation backend, loads the model in the background, shows download progress, and re-runs the current query when the model finishes loading
- Added 3 unit tests for the new slow() path (parsing, empty input, missing engine)
- Updated the demo copy to reflect the LLM-powered architecture

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… city

Geographic data is deterministic — an LLM adds latency and unreliability for what's fundamentally a table lookup. Rewrote cities.js with a compact builder format covering:

- US: all 50 state capitals + all cities > 100k population (~300 cities)
- India: 70 cities including Kanpur, Lucknow, Jaipur, Ahmedabad, etc.
- China: 22 cities, Japan: 10, South Korea: 5
- Europe: 100+ cities across 20 countries
- Latin America: 50+ cities across 15 countries
- Africa/Middle East: 40+ cities across 20 countries
- Canada: 27, UK: 27, Australia: 10, New Zealand: 5

Added a 200 ms debounce to the autofill SmartField so the LLM fallback (for truly obscure cities) doesn't fire on every keystroke.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The on-device SmolLM2 model added 2.5s latency and returned wrong data (e.g. Kanpur → "Punjab, United States"). The 721-city gazetteer resolves instantly with correct results. Stripped reflex.configure/ensure, model progress UI, and @huggingface/transformers import. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two new tasks following the rules-first pattern:
us-sales-tax — 50-state rate table, 5 product categories (grocery,
clothing, digital, medicine, general), per-state
exemptions and reduced rates (AR, IL, TN, UT, VA, NC,
MO). Sales tax uses seller-state rates; use tax flips to
buyer-state. LLM slow path for nexus edge cases.
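The seller-state vs buyer-state flip and the per-state category overrides described above can be sketched as a small lookup. The rates and category rules below are a tiny illustrative subset, not the shipped 50-state table:

```javascript
// Illustrative subset of the rate tables (not the shipped data).
const RATES = { CA: 0.0725, IL: 0.0625, OR: 0.0 };
const CATEGORY_RULES = {
  IL: { grocery: 0.01 },                // reduced grocery rate (illustrative)
  CA: { grocery: 0.0, medicine: 0.0 },  // exemptions (illustrative)
};

function salesTax({ amount, sellerState, buyerState, category = "general", useTax = false }) {
  // Sales tax uses the seller's state; use tax flips to the buyer's state.
  const state = useTax ? buyerState : sellerState;
  const override = CATEGORY_RULES[state]?.[category]; // per-state category rate, if any
  const rate = override ?? RATES[state] ?? 0;
  return { rate, tax: +(amount * rate).toFixed(2), exempt: rate === 0 };
}
```

Nexus edge cases (which state a remote seller actually owes tax in) are exactly what the rules cannot decide, which is why they fall through to the LLM slow path.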
us-federal-tax — 2024 IRS marginal bracket tables for single, married
filing jointly, and head of household. Standard deduction
applied before bracket walk. Returns taxOwed,
effectiveRate, marginalRate, per-bracket breakdown.
LLM slow path for credits / itemized deductions.
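The standard-deduction-then-bracket-walk described above can be sketched as follows. The numbers are the published 2024 single-filer figures, included here for illustration; the shipped task carries the full three-status tables:

```javascript
// 2024 single-filer figures (illustrative of the table shape).
const SINGLE_2024 = {
  standardDeduction: 14600,
  brackets: [ // [upper bound of taxable income, marginal rate]
    [11600, 0.10], [47150, 0.12], [100525, 0.22], [191950, 0.24],
    [243725, 0.32], [609350, 0.35], [Infinity, 0.37],
  ],
};

function federalTax(grossIncome, table = SINGLE_2024) {
  // Standard deduction is applied before the bracket walk.
  const taxable = Math.max(0, grossIncome - table.standardDeduction);
  let owed = 0, lower = 0, marginalRate = 0;
  const breakdown = [];
  for (const [upper, rate] of table.brackets) {
    if (taxable <= lower) break; // no income reaches this bracket
    const slice = Math.min(taxable, upper) - lower; // income taxed at this rate
    owed += slice * rate;
    breakdown.push({ rate, amount: slice * rate });
    marginalRate = rate;
    lower = upper;
  }
  return {
    taxOwed: Math.round(owed),
    marginalRate,
    effectiveRate: grossIncome > 0 ? owed / grossIncome : 0,
    breakdown,
  };
}
```

Credits and itemized deductions change `taxable` and `owed` in ways this table cannot express, which is what the LLM slow path is for.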
New files:
packages/sdk/src/tasks/us-tax.js tasks + static data tables
packages/sdk/test/us-tax.test.js 37 tests (all passing)
packages/playground/public/demos/us-tax.html interactive demo
Modified:
packages/sdk/src/index.js auto-import + export new tasks
packages/playground/public/index.html add demo card
Demo features: dynamic line-items invoice, real-time per-item exempt/tax
badges, sales/use tax toggle, full breakdown panel, 2024 bracket table
with active bracket highlighted. Input focus preserved during typing
(display cells updated in-place; input rows rebuilt only on add/remove).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>