BBuf/kernel-pilot

KernelPilot

KernelPilot is now a lightweight prompt repository for standalone GPU-kernel optimization tasks. The diffusion tasks target SGLang diffusion operators, but the optimization and benchmark workspace does not patch, import, or monkey-patch SGLang at runtime.

For each diffusion task, the agent must copy the relevant kernel source from the latest commit on upstream SGLang main into the task's baseline/ directory and expose that copied baseline through the same low-overhead ABI used by the candidate. Baseline and candidate are then benchmarked side by side inside the task directory.

Layout

docs/
  diffusion_correctness_contract.md
  diffusion_kernel_rules.md
  ghostty_claude_code_workflow.md
  diffusion_benchmark_shape_coverage.md
  standalone_diffusion_benchmark.md
  standalone_diffusion_benchmark_template.py
external/
  KernelWiki/
  ncu-report-skill/
kernels/
  {b200,h200}_diffusion_qknorm_rope__multi_shape/
  {b200,h200}_diffusion_norm_infer__multi_shape/
  {b200,h200}_diffusion_group_norm_silu__multi_shape/
  {b200,h200}_diffusion_rotary_embedding__multi_shape/
  {b200,h200}_diffusion_fuse_scale_shift__multi_shape/
  {b200,h200}_diffusion_cutedsl_norm_tanh_mul_add__multi_shape/
  {b200,h200}_diffusion_cutedsl_norm_scale_shift__multi_shape/
scripts/
  launch_kda_kernel_task.sh
  launch_kernels/
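The {b200,h200} prefix in the kernels/ entries above is shell-style brace shorthand: each line stands for one task directory per GPU. As a minimal sketch, the full set of directory names can be expanded as follows. The operator and GPU names are taken from the listing; the function name task_dir_names is hypothetical and not part of the repository.

```python
from itertools import product

# GPU targets and operator names as they appear in the layout listing.
GPUS = ["b200", "h200"]
OPERATORS = [
    "qknorm_rope",
    "norm_infer",
    "group_norm_silu",
    "rotary_embedding",
    "fuse_scale_shift",
    "cutedsl_norm_tanh_mul_add",
    "cutedsl_norm_scale_shift",
]


def task_dir_names(gpus=GPUS, operators=OPERATORS):
    """Expand '{gpu}_diffusion_{op}__multi_shape' for every operator/GPU pair.

    Hypothetical helper for illustration only.
    """
    return [
        f"{gpu}_diffusion_{op}__multi_shape"
        for op, gpu in product(operators, gpus)
    ]


names = task_dir_names()
for name in names:
    print(f"kernels/{name}/")
```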

The old SGLang overlay/export/capture machinery has intentionally been removed. There is no kda_kernels/, no patch applied to an SGLang checkout, and no runtime install path. The only SGLang dependency in a diffusion task is the upstream source code copied into baseline/ as local benchmark input.

Diffusion Folder Contract

Each diffusion task starts clean:

prompt.md       # task card for the agent
config.toml     # benchmark/build defaults for the task
baseline/       # copied upstream baseline source, generated by the agent
solution/       # optimized candidate source, generated by the agent
bench/          # standalone benchmark/correctness harness, generated by the agent
docs/           # source notes, benchmark logs, profile notes
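Once an agent run has populated a task, the contract above can be checked mechanically. The sketch below reports which of the listed entries are absent from a task directory. The entry names come from the contract; the helper missing_entries is hypothetical, and a freshly created task is expected to be missing the agent-generated directories.

```python
from pathlib import Path

# Entries named in the folder contract.
EXPECTED_FILES = ("prompt.md", "config.toml")
EXPECTED_DIRS = ("baseline", "solution", "bench", "docs")


def missing_entries(task_dir):
    """Return the contract entries that are absent from task_dir.

    Hypothetical helper; directories are reported with a trailing slash.
    """
    root = Path(task_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [f"{name}/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

Calling missing_entries("kernels/b200_diffusion_qknorm_rope__multi_shape") returns an empty list when every entry is present.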

The central benchmark rules live in docs/standalone_diffusion_benchmark.md. Diffusion optimization guardrails live in docs/diffusion_kernel_rules.md. Canonical regression grids live in docs/diffusion_correctness_contract.md. The production preset and shape audit lives in docs/diffusion_benchmark_shape_coverage.md. Every task prompt requires the agent to follow these documents.

Benchmark Principle

Baseline and candidate must be compared through matching local interfaces. The preferred binding follows a local direct CUDA ABI:

  • language = "cuda"
  • entry_point = "kernel.cu::<exported_symbol>"
  • destination_passing_style = true
  • direct TVM_FFI_DLL_EXPORT_TYPED_FUNC export
  • output tensors passed as trailing arguments
  • CUDA launches on at::cuda::getCurrentCUDAStream()
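To illustrate the binding fields listed above, here is a small sketch that splits an entry_point of the form "kernel.cu::<exported_symbol>" and checks that a baseline and a candidate declare matching interfaces. The field names come from the list; the symbol names, the dictionaries, and both helper functions are assumptions made for illustration, not the repository's actual config loader.

```python
def split_entry_point(entry_point):
    """Split 'file::symbol' into (file, symbol); raise if either part is missing."""
    source_file, sep, symbol = entry_point.partition("::")
    if not sep or not source_file or not symbol:
        raise ValueError(f"expected '<file>::<symbol>', got {entry_point!r}")
    return source_file, symbol


def bindings_match(baseline, candidate):
    """True when both sides use the same language and calling style.

    Hypothetical check: the exported symbols may differ, the interface may not.
    """
    keys = ("language", "destination_passing_style")
    return all(baseline[k] == candidate[k] for k in keys)


# Hypothetical binding descriptions; the symbol names are placeholders.
baseline = {
    "language": "cuda",
    "entry_point": "kernel.cu::baseline_export",
    "destination_passing_style": True,
}
candidate = dict(baseline, entry_point="kernel.cu::candidate_export")

print(split_entry_point(candidate["entry_point"]))
print(bindings_match(baseline, candidate))
```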

The benchmark must use fixed workload rows, isolated per-workload execution, preallocated outputs, warmups, CUDA-event timing with inner-loop amplification, interleaved A/B sampling, strict correctness checks, and full provenance. Use docs/standalone_diffusion_benchmark_template.py as the starting point for every diffusion task's bench/benchmark.py.
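Three of the requirements above (warmups, inner-loop amplification, and interleaved A/B sampling) can be sketched without a GPU. The example below uses a host clock as a stand-in for CUDA-event timing, so it shows only the sampling structure; a real harness would record CUDA events around the inner loop instead. The function names, the default counts, and the choice to alternate which side runs first are assumptions, not taken from docs/standalone_diffusion_benchmark_template.py.

```python
import statistics
import time


def time_once(fn, inner_iters, clock=time.perf_counter):
    """Time inner_iters back-to-back calls and return the mean per-call time.

    Amplifying the inner loop keeps timer overhead small relative to the work.
    """
    start = clock()
    for _ in range(inner_iters):
        fn()
    return (clock() - start) / inner_iters


def interleaved_ab(baseline, candidate, samples=10, warmups=3, inner_iters=100):
    """Return (median_baseline, median_candidate) per-call times in seconds."""
    # Warm up both sides before any measurement is taken.
    for _ in range(warmups):
        baseline()
        candidate()

    a_times, b_times = [], []
    for i in range(samples):
        # Alternate which side goes first so slow drift (clocks, thermals)
        # does not systematically favor one side. This is an assumed policy.
        if i % 2 == 0:
            a_times.append(time_once(baseline, inner_iters))
            b_times.append(time_once(candidate, inner_iters))
        else:
            b_times.append(time_once(candidate, inner_iters))
            a_times.append(time_once(baseline, inner_iters))
    return statistics.median(a_times), statistics.median(b_times)
```

Reporting medians rather than means is also a choice made for this sketch; it reduces the influence of occasional outlier samples.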

Launch

The existing launch scripts still create task-owned worktrees and bootstrap the task prompt for an agent run:

scripts/launch_kernels/k03_b200_diffusion_qknorm_rope__multi_shape.sh

Set KDA_NO_CLAUDE=1 to prepare the worktree without launching Claude. Launchers default KDA_BASE_BRANCH to the current checkout branch, so task worktrees inherit the branch you are testing. Set KDA_BASE_BRANCH=<ref> only when you intentionally want to launch from another committed ref. For manual parallel panes in Ghostty, use docs/ghostty_claude_code_workflow.md.
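When launchers are driven from a script, the two environment variables above can be assembled explicitly. This is a minimal sketch: KDA_NO_CLAUDE and KDA_BASE_BRANCH come from the text, while the helper launcher_env and its parameters are hypothetical.

```python
import os


def launcher_env(base_branch=None, launch_claude=False, environ=None):
    """Build the environment for a launcher script.

    Hypothetical helper. KDA_BASE_BRANCH is set only when a ref is given,
    so the launcher's default (the current checkout branch) still applies.
    """
    env = dict(os.environ if environ is None else environ)
    if launch_claude:
        env.pop("KDA_NO_CLAUDE", None)
    else:
        env["KDA_NO_CLAUDE"] = "1"  # prepare the worktree only
    if base_branch is not None:
        env["KDA_BASE_BRANCH"] = base_branch
    return env


# Example use (not executed here):
#   import subprocess
#   script = "scripts/launch_kernels/k03_b200_diffusion_qknorm_rope__multi_shape.sh"
#   subprocess.run(["bash", script], env=launcher_env(), check=True)
```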

Maintenance

Syntax-check the shell launchers after editing them (bash -n parses each script without executing it):

bash -n scripts/launch_kda_kernel_task.sh scripts/launch_kernels/*.sh

The external knowledge submodules under external/ remain optional supporting material. To fetch them:

git submodule update --init --recursive
