---
layout: default
title: Getting Started
parent: Home
nav_order: 1
permalink: /docs/getting-started
---

Getting Started
{: .fs-8 }

Build, run, and validate the project without guessing the toolchain.
{: .fs-6 .fw-300 }

Requirements

| Item | Requirement |
|------|-------------|
| GPU | NVIDIA Volta (`sm_70`) or newer |
| CUDA Toolkit | 11.0+ |
| CMake | 3.18+ |
| Host compiler | GCC 9+ or Clang 10+ |
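
The minimums in the table can be checked before configuring. The following is a minimal sketch, not part of the repository: `version_ge` and `check_tool` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (GNU coreutils).

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not shipped with the project.

# version_ge A B: succeed when version A >= version B.
# Assumes GNU `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_tool NAME MINIMUM INSTALLED: print one status line,
# return non-zero when INSTALLED is older than MINIMUM.
check_tool() {
    local name="$1" minimum="$2" installed="$3"
    if version_ge "$installed" "$minimum"; then
        echo "ok   $name $installed (needs $minimum+)"
    else
        echo "FAIL $name $installed (needs $minimum+)"
        return 1
    fi
}
```

One possible use is `check_tool CMake 3.18 "$(cmake --version | head -n1 | awk '{print $3}')"`. The same pattern should work for the CUDA Toolkit and the host compiler, provided you extract the version string from `nvcc --version` or `gcc -dumpversion` yourself.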

Tensor Core benchmarks require `sm_70` or newer and matrix dimensions that are multiples of 16. When either condition is not met, the code still runs, using the guarded FP32 path.
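
To see in advance whether a given case can take the Tensor Core path, the rule above can be written as a small predicate. This is a sketch that only mirrors the stated conditions; `tensor_core_eligible` is a hypothetical name, and the project's real dispatch logic lives in its CUDA sources and may differ in detail.

```bash
#!/usr/bin/env bash
# Hypothetical helper: restates the documented rule, not the project's dispatch code.
# Usage: tensor_core_eligible <compute_capability> <dim> <dim> <dim>
tensor_core_eligible() {
    local cc="$1" a="$2" b="$3" c="$4"
    local major="${cc%%.*}"      # "8.6" -> "8", "7.0" -> "7"
    # Volta is compute capability 7.0, so a major version of 7 or more qualifies.
    (( major >= 7 )) || return 1
    # Every dimension must be a multiple of 16.
    (( a % 16 == 0 && b % 16 == 0 && c % 16 == 0 ))
}
```

For example, `tensor_core_eligible 8.6 256 384 640 && echo "Tensor Core path" || echo "FP32 path"` reports the Tensor Core path, because all three dimensions are multiples of 16. On recent drivers the capability can usually be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`; treat the availability of that query field as an assumption about your driver version.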

Recommended build flow

git clone https://github.com/LessUp/sgemm-optimization.git
cd sgemm-optimization

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

Run the default benchmark:

./build/bin/sgemm_benchmark

Run the broader benchmark set:

./build/bin/sgemm_benchmark -a

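Run tests (`--test-dir` requires CMake 3.20+; with CMake 3.18 or 3.19, run `ctest` from inside the `build` directory instead):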

ctest --test-dir build

Choosing CUDA architectures

By default:

- CMake 3.24+ uses `native`, which targets the GPU(s) present on the build machine
- older CMake falls back to the repository's explicit architecture list

To override the default, set CMake's built-in `CMAKE_CUDA_ARCHITECTURES` variable:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES=86

Quick local Makefile flow:

make GPU_ARCH=sm_86
make benchmark
make test
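
The two flows name architectures differently: the Makefile takes `sm_86`, while `CMAKE_CUDA_ARCHITECTURES` takes the bare number and separates multiple targets with semicolons. The sketch below converts one notation to the other. `to_cmake_archs` is a hypothetical helper, and whether the Makefile's `GPU_ARCH` accepts more than one value is not stated here, so the multi-architecture case is shown for the CMake side only.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of the repository.
# to_cmake_archs sm_70 sm_80 sm_86  prints  70;80;86
to_cmake_archs() {
    local out="" arch
    for arch in "$@"; do
        arch="${arch#sm_}"          # strip the "sm_" prefix used by the Makefile flow
        out="${out:+$out;}$arch"    # join with ';', CMake's list separator
    done
    printf '%s\n' "$out"
}
```

A configure line could then read `cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES="$(to_cmake_archs sm_70 sm_86)"`. The quotes matter, since an unquoted semicolon would end the shell command.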

Validation boundary

| Environment | What to run |
|-------------|-------------|
| Local GPU machine | benchmark, runtime verification, `ctest` |
| Hosted CI | formatting, compile validation, OpenSpec/repository checks, Pages |

This split is intentional: GitHub-hosted runners validate repository health, while performance and CUDA runtime correctness still require a real GPU machine.
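
A wrapper script can apply the same split automatically. The sketch below is one way to do it under stated assumptions: `gpu_available` and `validation_steps` are hypothetical names, the step labels are shorthand for the table entries rather than real targets, and GPU detection relies on `nvidia-smi -L` listing devices as lines that begin with `GPU `.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the repository.

# gpu_available: succeed only when nvidia-smi exists and lists at least one device.
gpu_available() {
    command -v nvidia-smi >/dev/null 2>&1 &&
        nvidia-smi -L 2>/dev/null | grep -q '^GPU '
}

# validation_steps ENV: print shorthand labels for the steps that apply
# to "gpu" (local GPU machine) or "ci" (hosted CI).
validation_steps() {
    case "$1" in
        gpu) echo "benchmark runtime-verification ctest" ;;
        ci)  echo "formatting compile-validation repository-checks" ;;
        *)   echo "unknown environment: $1" >&2; return 2 ;;
    esac
}
```

A caller might choose the environment with `if gpu_available; then validation_steps gpu; else validation_steps ci; fi` and map each label to the matching command from this page.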

Useful commands

# one explicit benchmark case
./build/bin/sgemm_benchmark --dims 256 384 640

# longer benchmark run
./build/bin/sgemm_benchmark -a --warmup 10 --benchmark 50

# OpenSpec validation
openspec validate --all
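
To run several explicit cases in one go, the `--dims` form above can be wrapped in a loop. This sketch uses only the flags shown on this page. `sweep` and `sgemm_flops` are hypothetical helpers, and the FLOP count uses the standard 2·M·N·K figure for one matrix multiply, which does not depend on the order in which the three dimensions are given.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the repository.

# Path to the benchmark binary; set BENCH to point elsewhere.
BENCH="${BENCH:-./build/bin/sgemm_benchmark}"

# sgemm_flops D1 D2 D3: floating-point operations in one multiply (2 * D1 * D2 * D3).
sgemm_flops() {
    echo $(( 2 * $1 * $2 * $3 ))
}

# sweep "D1 D2 D3" ...: run one explicit benchmark case per argument.
sweep() {
    local dims
    for dims in "$@"; do
        # $dims is intentionally unquoted so "256 384 640" splits into three values.
        echo "== dims: $dims ($(sgemm_flops $dims) FLOPs per multiply) =="
        "$BENCH" --dims $dims || return 1
    done
}
```

For example, `sweep "256 384 640" "1024 1024 1024"` runs two cases back to back. Dividing the reported FLOP count by a measured time in seconds gives FLOP/s for that case.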
