61 changes: 61 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,61 @@
# Tok Benchmark Publication Workflow

This directory holds the published Tok benchmark runs and the manifests that define each suite.

It complements:

- `tok/benchmarks/run.sh` for the offline quality and root-package benchmark wrapper
- `tok/evals/pipeline-bench.sh` for pipeline latency and allocation benchmarks
- `tok/benchmarks/quality/` for the offline compression-quality harness
- `tok/benchmarks/manifests/` for machine-readable suite definitions

## Official suite ids

- `tok-quality-core`
- `tok-pipeline-microbench`

See the Hawk-side benchmark registry at `hawk/docs/benchmarks/SUITES.md`.

## Current published baselines

- `tok-quality-core`: `results/tok-quality-core/2026-06-27/`
- `tok-pipeline-microbench`: `results/tok-pipeline-microbench/2026-06-27/`

## Current runnable commands

```bash
/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./benchmarks/run.sh'
/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./evals/pipeline-bench.sh'
```

## Publication layout

Recommended committed layout:

```text
benchmarks/
  README.md
  manifests/
    tok-quality-core.yaml
    tok-pipeline-microbench.yaml
  results/
    tok-quality-core/
      2026-06-27/
        report.md
        result.txt
        notes.md
    tok-pipeline-microbench/
      2026-06-27/
        report.md
        result.txt
        notes.md
```

## Promotion rule

Do not cite a Tok benchmark as evidence in cross-project comparison docs unless:

1. the command used is recorded
2. the repo state is identified as a committed snapshot or workspace snapshot
3. the benchmark scope is stated clearly
4. the raw output or generated report is committed
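
The four conditions above can be checked mechanically before a result is cited. The following is a minimal sketch in Go, not part of Tok: the `Citation` type, its field names, and `promotionProblems` are hypothetical and exist only for illustration. It checks that an artifact path is recorded, not that the file is actually committed.

```go
package main

import "fmt"

// Citation is a hypothetical record describing one benchmark result that a
// comparison doc wants to cite. None of these names exist in Tok itself.
type Citation struct {
	Command      string // exact command used to produce the result
	Snapshot     string // "committed" or "workspace"
	Scope        string // what the benchmark does and does not measure
	ArtifactPath string // path to the raw output or generated report
}

// promotionProblems returns one message per unmet promotion condition.
// An empty result means all four conditions are satisfied.
func promotionProblems(c Citation) []string {
	var problems []string
	if c.Command == "" {
		problems = append(problems, "command is not recorded")
	}
	if c.Snapshot != "committed" && c.Snapshot != "workspace" {
		problems = append(problems, "repo state is not identified as a committed or workspace snapshot")
	}
	if c.Scope == "" {
		problems = append(problems, "benchmark scope is not stated")
	}
	if c.ArtifactPath == "" {
		problems = append(problems, "raw output or generated report is not referenced")
	}
	return problems
}

func main() {
	c := Citation{
		Command:      "/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./evals/pipeline-bench.sh'",
		Snapshot:     "workspace",
		Scope:        "internal filter pipeline latency and allocations",
		ArtifactPath: "results/tok-pipeline-microbench/2026-06-27/result.txt",
	}
	for _, p := range promotionProblems(c) {
		fmt.Println("cannot cite:", p)
	}
}
```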
9 changes: 9 additions & 0 deletions benchmarks/manifests/README.md
@@ -0,0 +1,9 @@
# Tok Benchmark Manifests

This directory is the machine-readable registry for Tok benchmark suites.

Rules:

- manifests must match the suite ids documented in `../README.md`
- `published: false` means the suite is official but no committed run is present yet
- publication directories referenced by manifests must exist
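
These rules lend themselves to a small consistency check. The sketch below is an assumption-laden illustration in Go rather than tooling that ships with Tok: `Manifest`, `officialSuites`, and `checkManifest` are hypothetical names, and the manifest is populated by hand because Go's standard library has no YAML parser.

```go
package main

import "fmt"

// Manifest mirrors a subset of the manifest fields. Hypothetical type.
type Manifest struct {
	Suite          string
	Published      bool
	PublicationDir string
}

// officialSuites lists the suite ids documented in ../README.md.
var officialSuites = map[string]bool{
	"tok-quality-core":        true,
	"tok-pipeline-microbench": true,
}

// checkManifest reports rule violations for one manifest. dirExists is
// injected so the check can run without touching the filesystem.
func checkManifest(m Manifest, dirExists func(string) bool) []string {
	var problems []string
	if !officialSuites[m.Suite] {
		problems = append(problems, "unknown suite id: "+m.Suite)
	}
	if m.PublicationDir != "" && !dirExists(m.PublicationDir) {
		problems = append(problems, "missing publication dir: "+m.PublicationDir)
	}
	return problems
}

func main() {
	m := Manifest{
		Suite:          "tok-pipeline-microbench",
		Published:      true,
		PublicationDir: "tok/benchmarks/results/tok-pipeline-microbench",
	}
	// Stand-in for a real directory lookup.
	alwaysExists := func(string) bool { return true }
	fmt.Println(len(checkManifest(m, alwaysExists)), "problems")
}
```

In a real check, `dirExists` could wrap `os.Stat` and test `IsDir()` on the result.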
20 changes: 20 additions & 0 deletions benchmarks/manifests/tok-pipeline-microbench.yaml
@@ -0,0 +1,20 @@
suite: tok-pipeline-microbench
status: shipped
published: true
kind: pipeline-latency-and-allocation
owner: tok
source:
  - tok/evals/pipeline-bench.sh
  - tok/internal/filter/pipeline_bench_test.go
  - tok/internal/filter/optimizations_benchmark_test.go
runner:
  command: /bin/zsh -lc 'GOCACHE=$PWD/.gocache ./evals/pipeline-bench.sh'
result_format:
  primary: text_benchmark_output
metrics:
  - ns_per_op
  - bytes_per_op
  - allocs_per_op
publication_dir: tok/benchmarks/results/tok-pipeline-microbench
notes:
  - This suite measures the filter pipeline directly through Go benchmark entrypoints.
24 changes: 24 additions & 0 deletions benchmarks/manifests/tok-quality-core.yaml
@@ -0,0 +1,24 @@
suite: tok-quality-core
status: shipped
published: true
kind: offline-compression-quality
owner: tok
source:
  - tok/benchmarks/run.sh
  - tok/benchmarks/quality/quality.go
  - tok/benchmarks/quality/cmd/main.go
runner:
  command: /bin/zsh -lc 'GOCACHE=$PWD/.gocache ./benchmarks/run.sh'
result_format:
  primary: markdown_report
  secondary: text_benchmark_output
metrics:
  - compression_ratio
  - char_retention
  - rouge1_fidelity_proxy
  - ns_per_op
  - bytes_per_op
  - allocs_per_op
publication_dir: tok/benchmarks/results/tok-quality-core
notes:
  - This suite combines root-package microbenchmarks with the offline compression-quality harness.
20 changes: 20 additions & 0 deletions benchmarks/results/README.md
@@ -0,0 +1,20 @@
# Published Tok Benchmark Runs

This directory is reserved for benchmark runs that are important enough to treat as evidence.

Expected per-run files:

- `report.md`
- `result.txt`
- `notes.md`
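
A run directory can be checked against this list before it is committed. The following Go sketch is illustrative only: `requiredRunFiles` and `missingRunFiles` are hypothetical names, and the function takes a list of file names rather than reading a directory so that it stays easy to test.

```go
package main

import "fmt"

// requiredRunFiles are the per-run files listed above.
var requiredRunFiles = []string{"report.md", "result.txt", "notes.md"}

// missingRunFiles returns the required files that do not appear in present,
// in the same order as requiredRunFiles.
func missingRunFiles(present []string) []string {
	have := make(map[string]bool, len(present))
	for _, name := range present {
		have[name] = true
	}
	var missing []string
	for _, name := range requiredRunFiles {
		if !have[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	fmt.Println(missingRunFiles([]string{"report.md", "notes.md"})) // prints [result.txt]
}
```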

Only commit runs that are intended to serve as:

- baselines
- release notes evidence
- comparison evidence in docs

Current published runs:

- `tok-quality-core/2026-06-27/`
- `tok-pipeline-microbench/2026-06-27/`
14 changes: 14 additions & 0 deletions benchmarks/results/tok-pipeline-microbench/2026-06-27/notes.md
@@ -0,0 +1,14 @@
# Provenance Notes

- Command:
  `/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./evals/pipeline-bench.sh'`
- Verification:
  benchmark command exited successfully with `PASS`
- Environment:
  local workspace run on `2026-06-27`
  `goos=darwin`
  `goarch=arm64`
  `cpu=Apple M1`
- Caveats:
  pipeline timings are local benchmark measurements rather than CI medians
  this suite measures the internal filter pipeline, not end-to-end answer quality
34 changes: 34 additions & 0 deletions benchmarks/results/tok-pipeline-microbench/2026-06-27/report.md
@@ -0,0 +1,34 @@
# Tok Pipeline Microbenchmark Baseline

## Metadata

- Suite: `tok-pipeline-microbench`
- Date: `2026-06-27`
- Model: none
- Provider: none
- Commit: current workspace snapshot
- Command: `/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./evals/pipeline-bench.sh'`

## Headline Metrics

- ProcessSmall: `20342 ns/op`, `8707 B/op`, `90 allocs/op`
- ProcessMedium: `78233 ns/op`, `44992 B/op`, `70 allocs/op`
- ProcessWithBudget: `43841 ns/op`, `24232 B/op`, `45 allocs/op`
- EstimateTokens Small: `18.89 ns/op`
- EstimateTokens Medium: `8442 ns/op`
- EstimateTokens Large: `127261 ns/op`
- Layer Entropy: `3508 ns/op`, `1930 B/op`, `22 allocs/op`
- Layer Perplexity: `3775 ns/op`, `1607 B/op`, `27 allocs/op`
- ProcessParallel: `1637 ns/op`, `3104 B/op`, `19 allocs/op`

## Notes

- This is the first committed repo-owned baseline artifact for `tok-pipeline-microbench`.
- The runner targets the internal filter pipeline directly through Go benchmark entrypoints.
- The script completed successfully and printed the benchmark summary footer.

## Comparison Summary

- Previous baseline: none committed
- Change since baseline: initial published baseline
- Interpretation: the core filter pipeline stays in the tens-of-microseconds range for the small and medium process cases (about 20 µs and 78 µs per op), and the parallel path runs at about 1.6 µs per op on this local harness.
19 changes: 19 additions & 0 deletions benchmarks/results/tok-pipeline-microbench/2026-06-27/result.txt
@@ -0,0 +1,19 @@
==> tok pipeline benchmarks (internal/filter)

goos: darwin
goarch: arm64
pkg: github.com/GrayCodeAI/tok/internal/filter
cpu: Apple M1
BenchmarkPipeline_ProcessSmall-8 58953 20342 ns/op 8707 B/op 90 allocs/op
BenchmarkPipeline_ProcessMedium-8 15304 78233 ns/op 44992 B/op 70 allocs/op
BenchmarkPipeline_ProcessWithBudget-8 27705 43841 ns/op 24232 B/op 45 allocs/op
BenchmarkEstimateTokens_Small-8 63821442 18.89 ns/op 0 B/op 0 allocs/op
BenchmarkEstimateTokens_Medium-8 142575 8442 ns/op 0 B/op 0 allocs/op
BenchmarkEstimateTokens_Large-8 9369 127261 ns/op 0 B/op 0 allocs/op
BenchmarkLayer_Entropy-8 334741 3508 ns/op 1930 B/op 22 allocs/op
BenchmarkLayer_Perplexity-8 317840 3775 ns/op 1607 B/op 27 allocs/op
BenchmarkPipeline_ProcessParallel-8 729633 1637 ns/op 3104 B/op 19 allocs/op
PASS
ok github.com/GrayCodeAI/tok/internal/filter 13.689s

pipeline-bench: complete (see ns/op + B/op + allocs/op above)
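
Result lines like the ones above can be turned into structured metrics with a few lines of Go. This is a minimal sketch under stated assumptions, not Tok tooling: `BenchResult` and `parseBenchLine` are hypothetical names, and only the `ns/op`, `B/op`, and `allocs/op` units are extracted; other units such as `MB/s` are ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BenchResult holds metrics parsed from one line of Go benchmark output.
type BenchResult struct {
	Name        string
	Iterations  int64
	NsPerOp     float64
	BytesPerOp  float64
	AllocsPerOp float64
}

// parseBenchLine parses a line such as
//
//	BenchmarkLayer_Entropy-8 334741 3508 ns/op 1930 B/op 22 allocs/op
//
// It returns ok=false for lines that are not benchmark results.
func parseBenchLine(line string) (BenchResult, bool) {
	fields := strings.Fields(line)
	if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") {
		return BenchResult{}, false
	}
	iters, err := strconv.ParseInt(fields[1], 10, 64)
	if err != nil {
		return BenchResult{}, false
	}
	r := BenchResult{Name: fields[0], Iterations: iters}
	// The remaining fields come in (value, unit) pairs.
	for i := 2; i+1 < len(fields); i += 2 {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return BenchResult{}, false
		}
		switch fields[i+1] {
		case "ns/op":
			r.NsPerOp = v
		case "B/op":
			r.BytesPerOp = v
		case "allocs/op":
			r.AllocsPerOp = v
		}
	}
	return r, true
}

func main() {
	r, ok := parseBenchLine("BenchmarkLayer_Entropy-8 334741 3508 ns/op 1930 B/op 22 allocs/op")
	fmt.Println(ok, r.NsPerOp, r.BytesPerOp, r.AllocsPerOp) // prints true 3508 1930 22
}
```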
13 changes: 13 additions & 0 deletions benchmarks/results/tok-pipeline-microbench/README.md
@@ -0,0 +1,13 @@
# `tok-pipeline-microbench` Published Runs

Current published runs:

- `2026-06-27/`

Each published run includes:

- `report.md`
- `result.txt`
- `notes.md`

Reference manifest: `../../manifests/tok-pipeline-microbench.yaml`
14 changes: 14 additions & 0 deletions benchmarks/results/tok-quality-core/2026-06-27/notes.md
@@ -0,0 +1,14 @@
# Provenance Notes

- Command:
  `/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./benchmarks/run.sh'`
- Verification:
  `/bin/zsh -lc 'GOCACHE=$PWD/.gocache go test ./... -count=1'`
- Environment:
  local workspace run on `2026-06-27`
  `goos=darwin`
  `goarch=arm64`
  `cpu=Apple M1`
- Caveats:
  root-package benchmark timings are point-in-time local measurements, not CI medians
  the quality harness uses a checked-in offline sample corpus rather than an external public benchmark dataset
36 changes: 36 additions & 0 deletions benchmarks/results/tok-quality-core/2026-06-27/report.md
@@ -0,0 +1,36 @@
# Tok Quality Core Baseline

## Metadata

- Suite: `tok-quality-core`
- Date: `2026-06-27`
- Model: none
- Provider: none
- Commit: current workspace snapshot
- Command: `/bin/zsh -lc 'GOCACHE=$PWD/.gocache ./benchmarks/run.sh'`

## Headline Metrics

- CountTokens 100B: `97.64 ns/op`, `1024.19 MB/s`
- CountTokens 100KB: `133845 ns/op`, `765.06 MB/s`
- Compress 100B Minimal: `43500 ns/op`, `63080 B/op`, `222 allocs/op`
- Compress 100B Aggressive: `31372 ns/op`, `62684 B/op`, `221 allocs/op`
- Compress 100KB Minimal: `1485713 ns/op`, `1190139 B/op`, `244 allocs/op`
- Compress 100KB Aggressive: `1492485 ns/op`, `1184748 B/op`, `236 allocs/op`
- BPEEncode 100KB: `131183 ns/op`, `780.59 MB/s`
- Offline quality harness summary:
  - `surface`: ratio `0.921`, char retention `0.950`, fidelity `0.973`
  - `trim`: ratio `0.922`, char retention `0.950`, fidelity `0.975`
  - `extract`: ratio `0.456`, char retention `0.382`, fidelity `0.368`

## Notes

- This is the first committed repo-owned baseline artifact for `tok-quality-core`.
- The root-package benchmark section and offline quality harness both completed successfully in the same command.
- The quality harness remains fully offline and uses the checked-in sample corpus.

## Comparison Summary

- Previous baseline: none committed
- Change since baseline: initial published baseline
- Interpretation: Tok’s core token-count and encode paths stay well under a millisecond at 100KB inputs (about 0.13 ms per op), while the offline quality harness shows that `extract` delivers the strongest compression (ratio `0.456`) with a clear fidelity tradeoff (`0.368`, versus `0.973` for `surface` and `0.975` for `trim`).
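
The `MB/s` figures in the headline metrics follow from `ns/op` and the input size, since Go's testing package reports throughput in units of 10^6 bytes per second. The sketch below reproduces that arithmetic. `throughputMBps` is a hypothetical helper, and the input sizes are assumptions inferred from the benchmark names: 100 bytes for `100B` and 102400 bytes for `100KB`. Small differences from the reported values are expected because `ns/op` is rounded in the output.

```go
package main

import "fmt"

// throughputMBps converts bytes processed per operation and nanoseconds per
// operation into MB/s, where 1 MB = 1e6 bytes.
// bytes/ns equals 1e9 bytes/s, so dividing by 1e6 leaves a factor of 1e3.
func throughputMBps(bytesPerOp, nsPerOp float64) float64 {
	return bytesPerOp / nsPerOp * 1e3
}

func main() {
	// CountTokens 100B: 97.64 ns/op, reported as 1024.19 MB/s.
	fmt.Printf("100B:  %.2f MB/s\n", throughputMBps(100, 97.64))
	// CountTokens 100KB: 133845 ns/op, reported as 765.06 MB/s.
	fmt.Printf("100KB: %.2f MB/s\n", throughputMBps(102400, 133845))
}
```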
31 changes: 31 additions & 0 deletions benchmarks/results/tok-quality-core/2026-06-27/result.txt
@@ -0,0 +1,31 @@
tok Benchmark Runner
====================

==> Go benchmarks (root package)
goos: darwin
goarch: arm64
pkg: github.com/GrayCodeAI/tok
cpu: Apple M1
BenchmarkCountTokens/100B-8 11460319 97.64 ns/op 1024.19 MB/s 0 B/op 0 allocs/op
BenchmarkCountTokens/1KB-8 918771 1299 ns/op 788.31 MB/s 0 B/op 0 allocs/op
BenchmarkCountTokens/10KB-8 89722 13331 ns/op 768.12 MB/s 0 B/op 0 allocs/op
BenchmarkCountTokens/100KB-8 9020 133845 ns/op 765.06 MB/s 0 B/op 0 allocs/op
BenchmarkCompress/100B/Minimal-8 27205 43500 ns/op 2.30 MB/s 63080 B/op 222 allocs/op
BenchmarkCompress/100B/Aggressive-8 37798 31372 ns/op 3.19 MB/s 62684 B/op 221 allocs/op
BenchmarkCompress/1KB/Minimal-8 27144 44158 ns/op 23.19 MB/s 75664 B/op 242 allocs/op
BenchmarkCompress/1KB/Aggressive-8 21780 54690 ns/op 18.72 MB/s 74355 B/op 235 allocs/op
BenchmarkCompress/10KB/Minimal-8 6374 193246 ns/op 52.99 MB/s 175568 B/op 245 allocs/op
BenchmarkCompress/10KB/Aggressive-8 5500 189578 ns/op 54.01 MB/s 173413 B/op 238 allocs/op
BenchmarkCompress/100KB/Minimal-8 806 1485713 ns/op 68.92 MB/s 1190139 B/op 244 allocs/op
BenchmarkCompress/100KB/Aggressive-8 804 1492485 ns/op 68.61 MB/s 1184748 B/op 236 allocs/op
BenchmarkBPEEncode/100B-8 9851370 120.9 ns/op 827.08 MB/s 0 B/op 0 allocs/op
BenchmarkBPEEncode/1KB-8 911469 1312 ns/op 780.75 MB/s 0 B/op 0 allocs/op
BenchmarkBPEEncode/10KB-8 91712 13102 ns/op 781.55 MB/s 0 B/op 0 allocs/op
BenchmarkBPEEncode/100KB-8 9166 131183 ns/op 780.59 MB/s 0 B/op 0 allocs/op
PASS
ok github.com/GrayCodeAI/tok 24.210s

==> Compression-quality harness
quality benchmark: 15 samples x 4 tiers -> benchmarks/quality-results.md

Results written to benchmarks/quality-results.md
13 changes: 13 additions & 0 deletions benchmarks/results/tok-quality-core/README.md
@@ -0,0 +1,13 @@
# `tok-quality-core` Published Runs

Current published runs:

- `2026-06-27/`

Each published run includes:

- `report.md`
- `result.txt`
- `notes.md`

Reference manifest: `../../manifests/tok-quality-core.yaml`
2 changes: 1 addition & 1 deletion internal/codeaware/tokenizer.go
@@ -1,4 +1,4 @@
-package tok
+package codeaware

import (
"strings"
2 changes: 1 addition & 1 deletion internal/codeaware/tokenizer_test.go
@@ -1,4 +1,4 @@
-package tok
+package codeaware

import (
"strings"