GrayCodeAI · Patel230 · Jun 30, 2026 · Jun 26, 2026
diff --git a/benchmarks/README.md b/benchmarks/README.md
@@ -0,0 +1,61 @@
+# Tok Benchmark Publication Workflow
+
+This directory is the publication surface for Tok benchmarks: it holds the suite manifests and the committed benchmark runs.
+
+It complements:
+
+- `tok/benchmarks/run.sh` for the wrapper around the root-package benchmarks and the offline quality harness
+- `tok/evals/pipeline-bench.sh` for pipeline latency and allocation benchmarks
+- `tok/benchmarks/quality/` for the offline compression-quality harness
+- `tok/benchmarks/manifests/` for machine-readable suite definitions
+
+## Official suite ids
+
+- `tok-quality-core`
+- `tok-pipeline-microbench`
+
+See the Hawk-side benchmark registry at `hawk/docs/benchmarks/SUITES.md`.
+
+## Current published baselines
+
+- `tok-quality-core`: `results/tok-quality-core/2026-06-27/`
+- `tok-pipeline-microbench`: `results/tok-pipeline-microbench/2026-06-27/`
+
+## Current runnable commands
+
+```bash
+/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./benchmarks/run.sh'
+/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./evals/pipeline-bench.sh'
+```
+
+## Publication layout
+
+Recommended committed layout:
+
+```text
+benchmarks/
+  README.md
+  manifests/
+    tok-quality-core.yaml
+    tok-pipeline-microbench.yaml
+  results/
+    tok-quality-core/
+      2026-06-27/
+        report.md
+        result.txt
+        notes.md
+    tok-pipeline-microbench/
+      2026-06-27/
+        report.md
+        result.txt
+        notes.md
+```
+
+## Promotion rule
+
+Do not cite a Tok benchmark as evidence in cross-project comparison docs unless all of the following hold:
+
+1. the exact command used is recorded
+2. the repo state is identified as either a committed snapshot or a workspace snapshot
+3. the benchmark scope is stated clearly
+4. the raw output or the generated report is committed
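Review note: the four promotion conditions above could be enforced mechanically rather than by convention. The following is a minimal sketch, assuming a hypothetical `Evidence` record filled in from a run's `notes.md` and `report.md`; neither the type nor the `unmetConditions` helper exists in this change.

```go
package main

import "fmt"

// Evidence is a hypothetical record of what a published run documents.
// It is not part of the Tok repository.
type Evidence struct {
	Command      string // exact command used to produce the run
	RepoState    string // e.g. a commit hash or "workspace snapshot"
	Scope        string // what the benchmark does and does not measure
	RawCommitted bool   // raw output or generated report is committed
}

// unmetConditions returns the promotion conditions the run does not satisfy.
// An empty result means the run may be cited as evidence.
func unmetConditions(e Evidence) []string {
	var unmet []string
	if e.Command == "" {
		unmet = append(unmet, "command not recorded")
	}
	if e.RepoState == "" {
		unmet = append(unmet, "repo state not identified")
	}
	if e.Scope == "" {
		unmet = append(unmet, "scope not stated")
	}
	if !e.RawCommitted {
		unmet = append(unmet, "raw output or report not committed")
	}
	return unmet
}

func main() {
	run := Evidence{
		Command:      "/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./evals/pipeline-bench.sh'",
		RepoState:    "workspace snapshot",
		Scope:        "internal filter pipeline, not end-to-end answer quality",
		RawCommitted: true,
	}
	fmt.Println("unmet conditions:", unmetConditions(run))
}
```

A check like this could run before a comparison doc links to a results directory, so a missing command or scope statement is caught at review time.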
diff --git a/benchmarks/manifests/README.md b/benchmarks/manifests/README.md
@@ -0,0 +1,9 @@
+# Tok Benchmark Manifests
+
+This directory is the machine-readable registry for Tok benchmark suites.
+
+Rules:
+
+- manifests must match the suite ids documented in `../README.md`
+- `published: false` means the suite is official but no committed run is present yet
+- publication directories referenced by manifests must exist
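Review note: the manifest rules above lend themselves to a small lint step. The sketch below is one possible shape, not code from this change. It assumes manifests keep `suite` and `publication_dir` as unindented `key: value` lines (as the two manifests in this diff do) and uses a deliberately naive line scan instead of a YAML parser. The `topLevel` and `checkManifest` names and the injected `dirExists` callback are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// officialSuites mirrors the suite ids listed in benchmarks/README.md.
var officialSuites = map[string]bool{
	"tok-quality-core":        true,
	"tok-pipeline-microbench": true,
}

// topLevel collects unindented "key: value" pairs from a manifest.
// It is a naive scan, not a YAML parser: indented lines, list items,
// and comments are skipped.
func topLevel(manifest string) map[string]string {
	fields := map[string]string{}
	for _, line := range strings.Split(manifest, "\n") {
		if line == "" || strings.ContainsRune(" \t-#", rune(line[0])) {
			continue
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		fields[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return fields
}

// checkManifest returns rule violations. dirExists is injected so the
// directory rule can be tested without touching the filesystem.
func checkManifest(manifest string, dirExists func(string) bool) []string {
	f := topLevel(manifest)
	var problems []string
	if !officialSuites[f["suite"]] {
		problems = append(problems, "unknown suite id: "+f["suite"])
	}
	if dir := f["publication_dir"]; dir != "" && !dirExists(dir) {
		problems = append(problems, "publication_dir does not exist: "+dir)
	}
	return problems
}

func main() {
	manifest := "suite: tok-pipeline-microbench\npublished: true\n" +
		"publication_dir: tok/benchmarks/results/tok-pipeline-microbench\n"
	onDisk := func(path string) bool {
		info, err := os.Stat(path)
		return err == nil && info.IsDir()
	}
	fmt.Println(checkManifest(manifest, onDisk))
}
```

Which directory `publication_dir` is resolved against is left open here; the manifests use a `tok/`-prefixed path, so a real check would need to run from the parent of the repository or strip that prefix.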
diff --git a/benchmarks/manifests/tok-pipeline-microbench.yaml b/benchmarks/manifests/tok-pipeline-microbench.yaml
@@ -0,0 +1,20 @@
+suite: tok-pipeline-microbench
+status: shipped
+published: true
+kind: pipeline-latency-and-allocation
+owner: tok
+source:
+  - tok/evals/pipeline-bench.sh
+  - tok/internal/filter/pipeline_bench_test.go
+  - tok/internal/filter/optimizations_benchmark_test.go
+runner:
+  command: /bin/zsh -lc 'GOCACHE=$PWD/.gocache ./evals/pipeline-bench.sh'
+result_format:
+  primary: text_benchmark_output
+metrics:
+  - ns_per_op
+  - bytes_per_op
+  - allocs_per_op
+publication_dir: tok/benchmarks/results/tok-pipeline-microbench
+notes:
+  - This suite measures the filter pipeline directly through Go benchmark entrypoints.
diff --git a/benchmarks/manifests/tok-quality-core.yaml b/benchmarks/manifests/tok-quality-core.yaml
@@ -0,0 +1,24 @@
+suite: tok-quality-core
+status: shipped
+published: true
+kind: offline-compression-quality
+owner: tok
+source:
+  - tok/benchmarks/run.sh
+  - tok/benchmarks/quality/quality.go
+  - tok/benchmarks/quality/cmd/main.go
+runner:
+  command: /bin/zsh -lc 'GOCACHE=$PWD/.gocache ./benchmarks/run.sh'
+result_format:
+  primary: markdown_report
+  secondary: text_benchmark_output
+metrics:
+  - compression_ratio
+  - char_retention
+  - rouge1_fidelity_proxy
+  - ns_per_op
+  - bytes_per_op
+  - allocs_per_op
+publication_dir: tok/benchmarks/results/tok-quality-core
+notes:
+  - This suite combines root-package microbenchmarks with the offline compression-quality harness.
diff --git a/benchmarks/results/README.md b/benchmarks/results/README.md
@@ -0,0 +1,20 @@
+# Published Tok Benchmark Runs
+
+This directory is reserved for benchmark runs that are important enough to treat as evidence.
+
+Expected per-run files:
+
+- `report.md`
+- `result.txt`
+- `notes.md`
+
+Only commit runs that are intended to serve as:
+
+- baselines
+- release notes evidence
+- comparison evidence in docs
+
+Current published runs:
+
+- `tok-quality-core/2026-06-27/`
+- `tok-pipeline-microbench/2026-06-27/`
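Review note: the expected per-run files listed above are easy to verify before committing a run. A minimal sketch, assuming a hypothetical `missingRunFiles` helper that is not part of this change:

```go
package main

import (
	"fmt"
	"os"
)

// expectedRunFiles mirrors the per-run files listed in benchmarks/results/README.md.
var expectedRunFiles = []string{"report.md", "result.txt", "notes.md"}

// missingRunFiles reports which expected files are absent from a run
// directory, given the names that are present.
func missingRunFiles(present []string) []string {
	have := make(map[string]bool, len(present))
	for _, name := range present {
		have[name] = true
	}
	var missing []string
	for _, want := range expectedRunFiles {
		if !have[want] {
			missing = append(missing, want)
		}
	}
	return missing
}

func main() {
	// Check the directory given as the first argument, or the current one.
	dir := "."
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	var names []string
	for _, e := range entries {
		names = append(names, e.Name())
	}
	fmt.Println("missing:", missingRunFiles(names))
}
```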
diff --git a/benchmarks/results/tok-pipeline-microbench/2026-06-27/notes.md b/benchmarks/results/tok-pipeline-microbench/2026-06-27/notes.md
@@ -0,0 +1,14 @@
+# Provenance Notes
+
+- Command:
+  `/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./evals/pipeline-bench.sh'`
+- Verification:
+  benchmark command exited successfully with `PASS`
+- Environment:
+  local workspace run on `2026-06-27`
+  `goos=darwin`
+  `goarch=arm64`
+  `cpu=Apple M1`
+- Caveats:
+  pipeline timings are local benchmark measurements rather than CI medians
+  this suite measures the internal filter pipeline, not end-to-end answer quality
diff --git a/benchmarks/results/tok-pipeline-microbench/2026-06-27/report.md b/benchmarks/results/tok-pipeline-microbench/2026-06-27/report.md
@@ -0,0 +1,34 @@
+# Tok Pipeline Microbenchmark Baseline
+
+## Metadata
+
+- Suite: `tok-pipeline-microbench`
+- Date: `2026-06-27`
+- Model: none
+- Provider: none
+- Commit: current workspace snapshot
+- Command: `/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./evals/pipeline-bench.sh'`
+
+## Headline Metrics
+
+- ProcessSmall: `20342 ns/op`, `8707 B/op`, `90 allocs/op`
+- ProcessMedium: `78233 ns/op`, `44992 B/op`, `70 allocs/op`
+- ProcessWithBudget: `43841 ns/op`, `24232 B/op`, `45 allocs/op`
+- EstimateTokens Small: `18.89 ns/op`
+- EstimateTokens Medium: `8442 ns/op`
+- EstimateTokens Large: `127261 ns/op`
+- Layer Entropy: `3508 ns/op`, `1930 B/op`, `22 allocs/op`
+- Layer Perplexity: `3775 ns/op`, `1607 B/op`, `27 allocs/op`
+- ProcessParallel: `1637 ns/op`, `3104 B/op`, `19 allocs/op`
+
+## Notes
+
+- This is the first committed repo-owned baseline artifact for `tok-pipeline-microbench`.
+- The runner targets the internal filter pipeline directly through Go benchmark entrypoints.
+- The script completed successfully and printed the benchmark summary footer.
+
+## Comparison Summary
+
+- Previous baseline: none committed
+- Change since baseline: initial published baseline
+- Interpretation: on this local harness, the core filter pipeline takes about 20 µs/op for ProcessSmall and about 78 µs/op for ProcessMedium, and ProcessParallel reports about 1.6 µs/op.
diff --git a/benchmarks/results/tok-pipeline-microbench/2026-06-27/result.txt b/benchmarks/results/tok-pipeline-microbench/2026-06-27/result.txt
@@ -0,0 +1,19 @@
+==> tok pipeline benchmarks (internal/filter)
+
+goos: darwin
+goarch: arm64
+pkg: github.com/GrayCodeAI/tok/internal/filter
+cpu: Apple M1
+BenchmarkPipeline_ProcessSmall-8        	   58953	     20342 ns/op	    8707 B/op	      90 allocs/op
+BenchmarkPipeline_ProcessMedium-8       	   15304	     78233 ns/op	   44992 B/op	      70 allocs/op
+BenchmarkPipeline_ProcessWithBudget-8   	   27705	     43841 ns/op	   24232 B/op	      45 allocs/op
+BenchmarkEstimateTokens_Small-8         	63821442	        18.89 ns/op	       0 B/op	       0 allocs/op
+BenchmarkEstimateTokens_Medium-8        	  142575	      8442 ns/op	       0 B/op	       0 allocs/op
+BenchmarkEstimateTokens_Large-8         	    9369	    127261 ns/op	       0 B/op	       0 allocs/op
+BenchmarkLayer_Entropy-8                	  334741	      3508 ns/op	    1930 B/op	      22 allocs/op
+BenchmarkLayer_Perplexity-8             	  317840	      3775 ns/op	    1607 B/op	      27 allocs/op
+BenchmarkPipeline_ProcessParallel-8     	  729633	      1637 ns/op	    3104 B/op	      19 allocs/op
+PASS
+ok  	github.com/GrayCodeAI/tok/internal/filter	13.689s
+
+pipeline-bench: complete (see ns/op + B/op + allocs/op above)
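Review note: the headline metrics in `report.md` are copied by hand from this `result.txt`. Since standard `go test -bench` output lists value/unit pairs after the iteration count, they could be extracted automatically. The sketch below shows one way to do that; `parseBenchLine` is a hypothetical helper, not part of this change.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBenchLine extracts the "value unit" pairs (ns/op, B/op, allocs/op,
// MB/s, ...) from one line of `go test -bench` output. ok is false for
// lines that are not benchmark results, such as "PASS" or "goos: darwin".
func parseBenchLine(line string) (name string, metrics map[string]float64, ok bool) {
	fields := strings.Fields(line)
	if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") {
		return "", nil, false
	}
	metrics = map[string]float64{}
	// fields[1] is the iteration count; the rest alternate value, unit.
	for i := 2; i+1 < len(fields); i += 2 {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return "", nil, false
		}
		metrics[fields[i+1]] = v
	}
	return fields[0], metrics, true
}

func main() {
	line := "BenchmarkPipeline_ProcessSmall-8 \t 58953\t 20342 ns/op\t 8707 B/op\t 90 allocs/op"
	if name, m, ok := parseBenchLine(line); ok {
		fmt.Println(name, m["ns/op"], m["B/op"], m["allocs/op"])
	}
}
```

Generating the headline list from parsed lines would keep `report.md` and `result.txt` from drifting apart when a baseline is refreshed.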
diff --git a/benchmarks/results/tok-pipeline-microbench/README.md b/benchmarks/results/tok-pipeline-microbench/README.md
@@ -0,0 +1,13 @@
+# `tok-pipeline-microbench` Published Runs
+
+Current published runs:
+
+- `2026-06-27/`
+
+Each published run includes:
+
+- `report.md`
+- `result.txt`
+- `notes.md`
+
+Reference manifest: `../../manifests/tok-pipeline-microbench.yaml`
diff --git a/benchmarks/results/tok-quality-core/2026-06-27/notes.md b/benchmarks/results/tok-quality-core/2026-06-27/notes.md
@@ -0,0 +1,14 @@
+# Provenance Notes
+
+- Command:
+  `/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./benchmarks/run.sh'`
+- Verification:
+  `/bin/zsh -lc 'GOCACHE=$PWD/.gocache go test ./... -count=1'`
+- Environment:
+  local workspace run on `2026-06-27`
+  `goos=darwin`
+  `goarch=arm64`
+  `cpu=Apple M1`
+- Caveats:
+  root-package benchmark timings are point-in-time local measurements, not CI medians
+  the quality harness uses a checked-in offline sample corpus rather than an external public benchmark dataset
diff --git a/benchmarks/results/tok-quality-core/2026-06-27/report.md b/benchmarks/results/tok-quality-core/2026-06-27/report.md
@@ -0,0 +1,36 @@
+# Tok Quality Core Baseline
+
+## Metadata
+
+- Suite: `tok-quality-core`
+- Date: `2026-06-27`
+- Model: none
+- Provider: none
+- Commit: current workspace snapshot
+- Command: `/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./benchmarks/run.sh'`
+
+## Headline Metrics
+
+- CountTokens 100B: `97.64 ns/op`, `1024.19 MB/s`
+- CountTokens 100KB: `133845 ns/op`, `765.06 MB/s`
+- Compress 100B Minimal: `43500 ns/op`, `63080 B/op`, `222 allocs/op`
+- Compress 100B Aggressive: `31372 ns/op`, `62684 B/op`, `221 allocs/op`
+- Compress 100KB Minimal: `1485713 ns/op`, `1190139 B/op`, `244 allocs/op`
+- Compress 100KB Aggressive: `1492485 ns/op`, `1184748 B/op`, `236 allocs/op`
+- BPEEncode 100KB: `131183 ns/op`, `780.59 MB/s`
+- Offline quality harness summary:
+  - `surface`: ratio `0.921`, char retention `0.950`, fidelity `0.973`
+  - `trim`: ratio `0.922`, char retention `0.950`, fidelity `0.975`
+  - `extract`: ratio `0.456`, char retention `0.382`, fidelity `0.368`
+
+## Notes
+
+- This is the first committed repo-owned baseline artifact for `tok-quality-core`.
+- The root-package benchmark section and offline quality harness both completed successfully in the same command.
+- The quality harness remains fully offline and uses the checked-in sample corpus.
+
+## Comparison Summary
+
+- Previous baseline: none committed
+- Change since baseline: initial published baseline
+- Interpretation: Tok’s core token-count and BPE-encode paths stay sub-millisecond at 100KB, the largest size measured (about 0.13 ms/op each), while the offline quality harness shows that `extract` delivers the strongest compression (ratio `0.456`) with a clear fidelity tradeoff (`0.368`, versus `0.973` for `surface` and `0.975` for `trim`).
diff --git a/benchmarks/results/tok-quality-core/2026-06-27/result.txt b/benchmarks/results/tok-quality-core/2026-06-27/result.txt
@@ -0,0 +1,31 @@
+tok Benchmark Runner
+====================
+
+==> Go benchmarks (root package)
+goos: darwin
+goarch: arm64
+pkg: github.com/GrayCodeAI/tok
+cpu: Apple M1
+BenchmarkCountTokens/100B-8         	11460319	        97.64 ns/op	1024.19 MB/s	       0 B/op	       0 allocs/op
+BenchmarkCountTokens/1KB-8          	  918771	      1299 ns/op	 788.31 MB/s	       0 B/op	       0 allocs/op
+BenchmarkCountTokens/10KB-8         	   89722	     13331 ns/op	 768.12 MB/s	       0 B/op	       0 allocs/op
+BenchmarkCountTokens/100KB-8        	    9020	    133845 ns/op	 765.06 MB/s	       0 B/op	       0 allocs/op
+BenchmarkCompress/100B/Minimal-8    	   27205	     43500 ns/op	   2.30 MB/s	   63080 B/op	     222 allocs/op
+BenchmarkCompress/100B/Aggressive-8 	   37798	     31372 ns/op	   3.19 MB/s	   62684 B/op	     221 allocs/op
+BenchmarkCompress/1KB/Minimal-8     	   27144	     44158 ns/op	  23.19 MB/s	   75664 B/op	     242 allocs/op
+BenchmarkCompress/1KB/Aggressive-8  	   21780	     54690 ns/op	  18.72 MB/s	   74355 B/op	     235 allocs/op
+BenchmarkCompress/10KB/Minimal-8    	    6374	    193246 ns/op	  52.99 MB/s	  175568 B/op	     245 allocs/op
+BenchmarkCompress/10KB/Aggressive-8 	    5500	    189578 ns/op	  54.01 MB/s	  173413 B/op	     238 allocs/op
+BenchmarkCompress/100KB/Minimal-8   	     806	   1485713 ns/op	  68.92 MB/s	 1190139 B/op	     244 allocs/op
+BenchmarkCompress/100KB/Aggressive-8         	     804	   1492485 ns/op	  68.61 MB/s	 1184748 B/op	     236 allocs/op
+BenchmarkBPEEncode/100B-8                    	 9851370	       120.9 ns/op	 827.08 MB/s	       0 B/op	       0 allocs/op
+BenchmarkBPEEncode/1KB-8                     	  911469	      1312 ns/op	 780.75 MB/s	       0 B/op	       0 allocs/op
+BenchmarkBPEEncode/10KB-8                    	   91712	     13102 ns/op	 781.55 MB/s	       0 B/op	       0 allocs/op
+BenchmarkBPEEncode/100KB-8                   	    9166	    131183 ns/op	 780.59 MB/s	       0 B/op	       0 allocs/op
+PASS
+ok  	github.com/GrayCodeAI/tok	24.210s
+
+==> Compression-quality harness
+quality benchmark: 15 samples x 4 tiers -> benchmarks/quality-results.md
+
+Results written to benchmarks/quality-results.md
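Review note: the `MB/s` column above can be cross-checked against `ns/op`, which is a cheap sanity test when copying numbers into `report.md`. Go's `testing` package reports throughput with 1 MB = 10^6 bytes. The sketch assumes the `100KB` input is 100 × 1024 bytes; that size is inferred from the reported figures, not confirmed by the benchmark source in this diff. `mbPerSec` is a hypothetical helper.

```go
package main

import "fmt"

// mbPerSec converts bytes processed per op and ns/op into MB/s,
// using 1 MB = 1e6 bytes as Go's testing package does.
func mbPerSec(bytesPerOp int64, nsPerOp float64) float64 {
	seconds := nsPerOp / 1e9
	return float64(bytesPerOp) / 1e6 / seconds
}

func main() {
	// Assumption: the "100KB" benchmark input is 100*1024 bytes.
	const input = 100 * 1024
	fmt.Printf("CountTokens/100KB:      %.2f MB/s\n", mbPerSec(input, 133845))
	fmt.Printf("Compress/100KB/Minimal: %.2f MB/s\n", mbPerSec(input, 1485713))
}
```

Under that assumption the computed values round to the reported `765.06 MB/s` and `68.92 MB/s`. Small mismatches at the 100B size are expected because the printed `ns/op` is itself rounded.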
diff --git a/benchmarks/results/tok-quality-core/README.md b/benchmarks/results/tok-quality-core/README.md
@@ -0,0 +1,13 @@
+# `tok-quality-core` Published Runs
+
+Current published runs:
+
+- `2026-06-27/`
+
+Each published run includes:
+
+- `report.md`
+- `result.txt`
+- `notes.md`
+
+Reference manifest: `../../manifests/tok-quality-core.yaml`
diff --git a/internal/codeaware/tokenizer.go b/internal/codeaware/tokenizer.go
@@ -1,4 +1,4 @@
-package tok
+package codeaware
 
 import (
 	"strings"

diff --git a/internal/codeaware/tokenizer_test.go b/internal/codeaware/tokenizer_test.go
@@ -1,4 +1,4 @@
-package tok
+package codeaware
 
 import (
 	"strings"