
Sri Harshavardhan Reddy Deverapalli

GPU Kernel & Performance Engineer | CUDA · CUTLASS · Tensor Cores · HPC

I build and profile CUDA/CUTLASS kernels for tensor, sparse, and AI workloads. My MS thesis developed an H100 block-sparse tensor contraction pipeline that achieved a 3.01× mean / 6.06× max speedup over cuTENSOR 2.5.0 and ~95% of H100 FP64 Tensor Core peak.

Technical Focus

  • CUDA kernel profiling studies: GEMM, WMMA Tensor Core GEMM, reductions, softmax, and FlashAttention-lite
  • GPU performance analysis: Nsight Compute, Nsight Systems, roofline modeling, occupancy, memory bandwidth, and register-pressure tuning (a roofline sketch follows this list)
  • Research: high-performance block-sparse tensor contractions for quantum many-body simulation on NVIDIA H100
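
Roofline modeling here comes down to comparing a kernel's arithmetic intensity (FLOPs per byte moved) against the machine's compute-to-bandwidth ridge point. A minimal back-of-envelope sketch, with placeholder peak figures rather than measured H100 numbers:

```cuda
// Back-of-envelope roofline check for an M x N x K FP32 GEMM.
// The peak-FLOP and bandwidth figures are placeholder assumptions,
// not measured H100 numbers.
#include <cstdio>

int main() {
    const double M = 4096, N = 4096, K = 4096;
    const double flops = 2.0 * M * N * K;                // one FMA = 2 FLOPs
    const double bytes = 4.0 * (M * K + K * N + M * N);  // ideal traffic: read A, B; write C (FP32)
    const double ai    = flops / bytes;                  // arithmetic intensity, FLOP/byte

    const double peak_flops = 60e12;                     // assumed compute roof, FLOP/s
    const double peak_bw    = 3e12;                      // assumed memory roof, B/s
    const double ridge      = peak_flops / peak_bw;      // machine balance, FLOP/byte

    printf("AI = %.1f FLOP/byte, ridge = %.1f -> %s-bound\n",
           ai, ridge, ai < ridge ? "memory" : "compute");
    return 0;
}
```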

LinkedIn · Google Scholar · Email

Pinned

  1. cuda-kernel-profiling-studies

    Nsight-driven CUDA kernel profiling studies for GEMM, Tensor Core GEMM, reductions, softmax, and attention against vendor baselines.

  2. 01_tiled_gemm

    Profile-driven FP32 CUDA GEMM optimization: naive → tiled → coalesced → register-blocked → bank-padded, benchmarked against cuBLAS. A minimal tiled-GEMM sketch follows below.

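    For orientation, here is a minimal shared-memory tiled FP32 GEMM in the spirit of the repo's middle stages (illustrative only, assuming square row-major matrices with N a multiple of the tile size; the register-blocked and bank-padded variants build on this pattern):

```cuda
// C = A * B, row-major FP32. Launch: grid(N/TILE, N/TILE), block(TILE, TILE).
#define TILE 32

__global__ void tiled_gemm(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < N / TILE; ++t) {
        // Coalesced loads of one A tile and one B tile into shared memory.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = acc;
}
```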

  3. 02_mixed_precision_gemm

    WMMA FP16→FP32 Tensor Core GEMM with shared-memory tiling and cp.async-style pipelining, benchmarked against cuBLAS. A warp-level WMMA sketch follows below.

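    A warp-level sketch of the kind of 16×16×16 FP16→FP32 WMMA tile multiply involved (illustrative; the shared-memory staging and cp.async-style pipeline are omitted, and the fragment layouts here are assumptions, not the repo's exact kernel):

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes a 16x16 FP32 tile of C from a 16xK row-major A
// and a Kx16 col-major B, accumulating over K in 16-wide steps.
__global__ void wmma_tile(const half* A, const half* B, float* C, int K, int ldc) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
    wmma::fill_fragment(acc, 0.0f);

    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a, A + k, K);   // next 16 columns of A
        wmma::load_matrix_sync(b, B + k, K);   // next 16 rows of B
        wmma::mma_sync(acc, a, b, acc);        // Tensor Core 16x16x16 MMA
    }
    wmma::store_matrix_sync(C, acc, ldc, wmma::mem_row_major);
}
```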

  4. 04_fused_softmax

    Row-wise CUDA softmax kernels: shared-memory reduction, warp-shuffle reduction, and online softmax, benchmarked against cuDNN. A warp-shuffle sketch follows below.

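    A minimal warp-shuffle variant, assuming one 32-thread warp per row and the numerically stable max-subtract form (illustrative, not the repo's exact kernels):

```cuda
#include <math.h>

__inline__ __device__ float warp_max(float v) {
    for (int o = 16; o > 0; o >>= 1)
        v = fmaxf(v, __shfl_down_sync(0xffffffff, v, o));
    return __shfl_sync(0xffffffff, v, 0);   // broadcast lane 0's result
}

__inline__ __device__ float warp_sum(float v) {
    for (int o = 16; o > 0; o >>= 1)
        v += __shfl_down_sync(0xffffffff, v, o);
    return __shfl_sync(0xffffffff, v, 0);
}

// Launch: one warp per row, i.e. <<<rows, 32>>>.
__global__ void row_softmax(const float* x, float* y, int N) {
    const float* row = x + blockIdx.x * N;
    float*       out = y + blockIdx.x * N;

    float m = -INFINITY;                    // row max, for numerical stability
    for (int i = threadIdx.x; i < N; i += 32) m = fmaxf(m, row[i]);
    m = warp_max(m);

    float s = 0.0f;                         // normalizer: sum(exp(x - m))
    for (int i = threadIdx.x; i < N; i += 32) s += expf(row[i] - m);
    s = warp_sum(s);

    for (int i = threadIdx.x; i < N; i += 32) out[i] = expf(row[i] - m) / s;
}
```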

  5. 05_flash_attention_lite

    Single-head CUDA attention kernel: naive SDPA → fused softmax → occupancy-tuned variants, benchmarked against cuDNN SDPA with Nsight Compute profiling. A sketch of the online-softmax update follows below.

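    The fused variants hinge on the online-softmax recurrence: as score tiles stream in, a running max, normalizer, and output accumulator are rescaled in place, so the full attention matrix never materializes. A scalar sketch of that update for one query element (illustrative):

```cuda
// s: incoming attention score; v: the matching value element.
// m: running max; l: running normalizer; o: unnormalized output
// accumulator. The final output is o / l once all tiles are seen.
__device__ void online_softmax_update(float s, float v,
                                      float& m, float& l, float& o) {
    float m_new = fmaxf(m, s);
    float scale = __expf(m - m_new);   // rescale old statistics to the new max
    float p     = __expf(s - m_new);   // weight of the new score
    l = l * scale + p;
    o = o * scale + p * v;
    m = m_new;
}
```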

  6. fft_final

    Sparse binary 2D FFT on CUDA/cuFFT with memory-footprint optimization, streaming tiles, Hermitian symmetry, and Nsight analysis. A cuFFT R2C sketch follows below.

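    A minimal cuFFT real-to-complex sketch showing the Hermitian-symmetry saving: an R2C transform stores only the non-redundant half-spectrum, NX × (NY/2 + 1) complex values instead of NX × NY (illustrative; error checking and the streaming-tile logic are omitted):

```cuda
#include <cufft.h>

// 2D forward FFT of real device input; d_out must hold NX * (NY/2 + 1) values.
void fft2d_r2c(float* d_in, cufftComplex* d_out, int NX, int NY) {
    cufftHandle plan;
    cufftPlan2d(&plan, NX, NY, CUFFT_R2C);  // real-to-complex plan
    cufftExecR2C(plan, d_in, d_out);        // half-spectrum output
    cufftDestroy(plan);
}
```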