KernelPilot

KernelPilot is now a lightweight prompt repository for standalone GPU-kernel optimization tasks. The diffusion tasks target SGLang diffusion operators, but the optimization and benchmark workspace does not patch, import, or monkey-patch SGLang at runtime.

For each diffusion task, the agent must copy the relevant upstream SGLang kernel source from the latest upstream SGLang main commit into the task's baseline/ directory, expose that copied baseline through the same low-overhead ABI used by the candidate, then benchmark baseline and candidate side by side inside the task directory.

Layout

docs/
  diffusion_correctness_contract.md
  diffusion_kernel_rules.md
  ghostty_claude_code_workflow.md
  diffusion_benchmark_shape_coverage.md
  standalone_diffusion_benchmark.md
  standalone_diffusion_benchmark_template.py
external/
  KernelWiki/
  ncu-report-skill/
kernels/
  {b200,h200}_diffusion_qknorm_rope__multi_shape/
  {b200,h200}_diffusion_norm_infer__multi_shape/
  {b200,h200}_diffusion_group_norm_silu__multi_shape/
  {b200,h200}_diffusion_rotary_embedding__multi_shape/
  {b200,h200}_diffusion_fuse_scale_shift__multi_shape/
  {b200,h200}_diffusion_cutedsl_norm_tanh_mul_add__multi_shape/
  {b200,h200}_diffusion_cutedsl_norm_scale_shift__multi_shape/
scripts/
  launch_kda_kernel_task.sh
  launch_kernels/

The old SGLang overlay/export/capture machinery has intentionally been removed. There is no kda_kernels/, no patch applied to an SGLang checkout, and no runtime install path. The only SGLang dependency in a diffusion task is the upstream source code copied into baseline/ as local benchmark input.

Diffusion Folder Contract

Each diffusion task starts clean:

prompt.md       # task card for the agent
config.toml     # benchmark/build defaults for the task
baseline/       # copied upstream baseline source, generated by the agent
solution/       # optimized candidate source, generated by the agent
bench/          # standalone benchmark/correctness harness, generated by the agent
docs/           # source notes, benchmark logs, profile notes
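The folder contract above can be checked mechanically once an agent run has finished. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` and the two constant tuples are hypothetical, and the check only confirms that the listed entries exist, not that their contents are valid.

```python
import tempfile
from pathlib import Path
from typing import List

# Entries taken from the folder contract; the tuple names are illustrative.
EXPECTED_FILES = ("prompt.md", "config.toml")
EXPECTED_DIRS = ("baseline", "solution", "bench", "docs")


def missing_entries(task_dir: Path) -> List[str]:
    """Return the contract entries that are absent from task_dir."""
    missing = [name for name in EXPECTED_FILES if not (task_dir / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (task_dir / name).is_dir()]
    return missing


if __name__ == "__main__":
    # Demonstrate on a throwaway directory that holds only the task card.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "prompt.md").write_text("task card\n")
        print(missing_entries(root))
```

A fresh task is expected to report baseline/, solution/, and bench/ as missing, since the agent generates those directories.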

The central benchmark rules live in docs/standalone_diffusion_benchmark.md. Diffusion optimization guardrails live in docs/diffusion_kernel_rules.md. Canonical regression grids live in docs/diffusion_correctness_contract.md. The production preset and shape audit lives in docs/diffusion_benchmark_shape_coverage.md. Every task prompt requires the agent to follow these documents.

Benchmark Principle

Baseline and candidate must be compared through matching local interfaces. The preferred binding follows a local direct CUDA ABI:

language = "cuda"
entry_point = "kernel.cu::<exported_symbol>"
destination_passing_style = true
direct TVM_FFI_DLL_EXPORT_TYPED_FUNC export
output tensors passed as trailing arguments
CUDA launches on at::cuda::getCurrentCUDAStream()

The benchmark must use fixed workload rows, isolated per-workload execution, preallocated outputs, warmups, CUDA-event timing with inner-loop amplification, interleaved A/B sampling, strict correctness checks, and full provenance. Use docs/standalone_diffusion_benchmark_template.py as the starting point for every diffusion task's bench/benchmark.py.
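To make the sampling scheme concrete, here is a minimal sketch of interleaved A/B sampling with inner-loop amplification over destination-passing callables. It is not the repository's template: the toy kernels and every function name are hypothetical, and `time.perf_counter` stands in for CUDA-event timing so the sketch runs without a GPU. A real harness would record CUDA events on the current stream instead.

```python
import statistics
import time
from typing import Callable, List, Tuple

# Destination-passing signature: (input, preallocated output) -> None.
Kernel = Callable[[List[float], List[float]], None]


def baseline_scale(x: List[float], out: List[float]) -> None:
    """Toy stand-in for the copied baseline kernel."""
    for i, v in enumerate(x):
        out[i] = v * 2.0


def candidate_scale(x: List[float], out: List[float]) -> None:
    """Toy stand-in for the optimized candidate; writes into out in place."""
    out[:] = [v * 2.0 for v in x]


def time_amplified(fn: Kernel, x: List[float], out: List[float], inner_iters: int) -> float:
    """Time inner_iters back-to-back calls and return seconds per call."""
    start = time.perf_counter()  # assumption: host timer replaces CUDA events
    for _ in range(inner_iters):
        fn(x, out)
    return (time.perf_counter() - start) / inner_iters


def interleaved_ab(baseline: Kernel, candidate: Kernel, x: List[float],
                   warmups: int = 3, samples: int = 11,
                   inner_iters: int = 20) -> Tuple[float, float]:
    """Return median seconds per call for (baseline, candidate)."""
    out_a = [0.0] * len(x)  # outputs are preallocated once, outside the timed loop
    out_b = [0.0] * len(x)
    for _ in range(warmups):
        baseline(x, out_a)
        candidate(x, out_b)
    # Strict correctness check before any timing is trusted.
    baseline(x, out_a)
    candidate(x, out_b)
    if out_a != out_b:
        raise ValueError("candidate output differs from baseline")
    times_a: List[float] = []
    times_b: List[float] = []
    for i in range(samples):
        pair = [(baseline, out_a, times_a), (candidate, out_b, times_b)]
        if i % 2 == 1:
            pair.reverse()  # alternate which side runs first in each sample
        for fn, out, sink in pair:
            sink.append(time_amplified(fn, x, out, inner_iters))
    return statistics.median(times_a), statistics.median(times_b)
```

Alternating the order within each sample spreads slow drift, such as clock or thermal changes, evenly across both sides, and the inner loop amortizes timer overhead for very short kernels.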

Launch

The existing launch scripts still create task-owned worktrees and bootstrap the task prompt for an agent run:

scripts/launch_kernels/k03_b200_diffusion_qknorm_rope__multi_shape.sh

Set KDA_NO_CLAUDE=1 to prepare the worktree without launching Claude. Launchers default KDA_BASE_BRANCH to the current checkout branch, so task worktrees inherit the branch you are testing. Set KDA_BASE_BRANCH=<ref> only when you intentionally want to launch from another committed ref. For manual parallel panes in Ghostty, use docs/ghostty_claude_code_workflow.md.
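When launchers are driven from a script rather than typed by hand, the two variables can be assembled as in the sketch below. The helper `launcher_env` is hypothetical and not part of the repository; it only builds the environment mapping described above and leaves the launcher invocation to the caller.

```python
import os
from typing import Dict, Mapping, Optional


def launcher_env(no_claude: bool = False,
                 base_branch: Optional[str] = None,
                 base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build an environment for a launch script (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    if no_claude:
        # Prepare the worktree without launching Claude.
        env["KDA_NO_CLAUDE"] = "1"
    if base_branch is not None:
        # Override only when launching from another committed ref on purpose.
        env["KDA_BASE_BRANCH"] = base_branch
    return env
```

The result can be passed as `env=` to `subprocess.run` together with the launcher path, for example `["bash", "scripts/launch_kernels/k03_b200_diffusion_qknorm_rope__multi_shape.sh"]`. Leaving `base_branch` unset keeps the launcher default, which is the current checkout branch.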

Maintenance

Check shell launchers after edits:

bash -n scripts/launch_kda_kernel_task.sh scripts/launch_kernels/*.sh

External knowledge submodules remain optional supporting material:

git submodule update --init --recursive
