This module introduces the fundamentals of GPU programming using CUDA and HIP. You'll learn about GPU architecture, parallel execution models, memory management, and basic optimization techniques.
After completing this module, you will be able to:
- Understand the fundamental differences between CPU and GPU architectures
- Set up CUDA and HIP development environments
- Write, compile, and execute basic GPU kernels
- Manage GPU memory allocation and data transfers
- Debug common GPU programming issues
- Apply basic optimization techniques for parallel execution
Module contents:
- content.md - Complete module content with theory and explanations
- examples/ - Practical code examples and exercises
Prerequisites for a native setup:
- NVIDIA GPU with CUDA support OR AMD GPU with ROCm support
- CUDA Toolkit 13.0+ or ROCm 7.0+ (Docker images provide CUDA 13.0.1 and ROCm 7.0)
- C/C++ compiler (GCC, Clang, or MSVC)
Tip: You can skip native installs by using our Docker environment (recommended):
```bash
./docker/scripts/run.sh --auto
```
Navigate to the examples directory:
```bash
cd examples/
```
Build and run the examples (binaries are written to build/):
```bash
# Build all examples for your detected GPU
make

# Run specific examples (CUDA)
./build/01_vector_addition_cuda
./build/04_device_info_cuda
./build/05_performance_comparison_cuda || ./build/05_performance_comparison

# Or HIP versions (cross-platform)
./build/02_vector_addition_hip
./build/04_device_info_hip
```

| Example | Description | Key Concepts |
|---|---|---|
| 01_vector_addition_cuda.cu | Basic CUDA vector addition | Kernels, memory management, error handling |
| 02_vector_addition_hip.cpp | Cross-platform HIP version | HIP API, portability |
| 03_matrix_addition_cuda.cu | 2D matrix operations | 2D threading, indexing |
| 03_matrix_addition_hip.cpp | HIP 2D matrix operations | HIP indexing, portability |
| 04_device_info_cuda.cu | GPU properties and capabilities | Device queries, system info |
| 04_device_info_hip.cpp | HIP device and platform info | HIP device queries |
| 05_performance_comparison_cuda.cu | CPU vs GPU benchmarking (CUDA) | Performance analysis, timing |
| 05_performance_comparison_hip.cpp | Benchmarking (HIP) | HIP performance, memory bandwidth |
| 06_debug_example_cuda.cu | Debugging and optimization (CUDA) | Error checking, occupancy |
| 06_debug_example_hip.cpp | Debugging and optimization (HIP) | HIP debugging |
| 07_cross_platform_comparison.cpp | AMD vs NVIDIA comparison | Portability, tuning |
GPU architecture:
- SIMT (Single Instruction, Multiple Thread) execution model
- Memory hierarchy (global, shared, registers, constant)
- Streaming multiprocessors and warps
- Thread hierarchy (threads → blocks → grids)
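
To make the thread hierarchy concrete, here is a minimal CUDA sketch (illustrative only, not one of the numbered examples) in which each thread derives a unique global index from its block and thread coordinates:

```cuda
#include <cuda_runtime.h>

// threadIdx locates a thread within its block; blockIdx locates the
// block within the grid; together they give a unique global index.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // the grid is rounded up to whole blocks, so guard the tail
        data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    int block = 256;                      // threads per block
    int grid  = (n + block - 1) / block;  // ceiling division covers all n elements
    scale<<<grid, block>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

Hardware groups these threads into warps that execute in SIMT lockstep, which is why the examples favor uniform control flow.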
Programming models:
- CUDA: NVIDIA's proprietary platform
- HIP: Cross-platform alternative for AMD and NVIDIA
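
Because HIP mirrors the CUDA runtime almost one-to-one, porting is mostly a renaming exercise. A minimal sketch (the kernel and sizes are illustrative), with the HIP spelling of each call noted in comments:

```cuda
#include <cuda_runtime.h>   // HIP: #include <hip/hip_runtime.h>

__global__ void fill(int *out, int value) {
    out[blockIdx.x * blockDim.x + threadIdx.x] = value;  // built-ins share names
}

int main() {
    int *d_out;
    size_t bytes = 256 * sizeof(int);
    cudaMalloc(&d_out, bytes);      // HIP: hipMalloc(&d_out, bytes);
    fill<<<1, 256>>>(d_out, 42);    // HIP: identical <<<grid, block>>> syntax
    cudaDeviceSynchronize();        // HIP: hipDeviceSynchronize();
    cudaFree(d_out);                // HIP: hipFree(d_out);
    return 0;
}
```

Compare examples 01 and 02 to see this correspondence across a full program.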
Memory management:
- Host and device memory allocation
- Data transfers between CPU and GPU
- Memory access patterns and optimization
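
The basic allocation-and-transfer pattern looks like the following bare-bones sketch (sizes illustrative). Every cudaMemcpy crosses the CPU-GPU interconnect, so minimizing round trips is a first-order optimization:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);        // host allocation
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data;
    cudaMalloc(&d_data, bytes);                    // device allocation

    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);  // CPU -> GPU
    // ... kernels would run here ...
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);  // GPU -> CPU

    printf("h_data[0] = %f\n", h_data[0]);
    cudaFree(d_data);
    free(h_data);
    return 0;
}
```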
Kernel programming:
- Thread indexing and coordination
- Block and grid configuration
- Avoiding thread divergence
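
Configuring a launch is mostly ceiling-division arithmetic plus a bounds guard; the guard confines divergence to the grid's ragged edge. A 2D sketch (the 16x16 block shape is a common starting point, not a tuned value):

```cuda
#include <cuda_runtime.h>

__global__ void matrixAdd(const float *a, const float *b, float *c,
                          int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x < width && y < height)  // uniform check: divergence only at the edge
        c[y * width + x] = a[y * width + x] + b[y * width + x];
}

int main() {
    int w = 512, h = 512;
    size_t bytes = (size_t)w * h * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);  // unified memory keeps the sketch short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);

    dim3 block(16, 16);                      // 256 threads per block
    dim3 grid((w + block.x - 1) / block.x,   // ceiling division in x
              (h + block.y - 1) / block.y);  // and in y
    matrixAdd<<<grid, block>>>(a, b, c, w, h);
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Example 03 develops this 2D indexing pattern in both CUDA and HIP.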
Debugging and optimization:
- Error handling best practices
- Performance profiling tools
- Occupancy optimization
- Memory bandwidth utilization
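
A near-universal error-handling practice is to wrap every runtime call in a checking macro so failures report their location instead of silently corrupting later results. One typical formulation (the macro name here is our choice; the module's examples may define their own variant):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file and line on any CUDA runtime failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    float *d_ptr;
    CUDA_CHECK(cudaMalloc(&d_ptr, 1024 * sizeof(float)));
    CUDA_CHECK(cudaFree(d_ptr));
    return 0;
}
```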
Suggested workflow:
- Start Here: Read through content.md for comprehensive theory
- Setup: Follow environment setup instructions
- Practice: Work through examples in numerical order
- Experiment: Modify examples with different parameters
- Debug: Use debugging example to learn troubleshooting
- Optimize: Apply performance analysis techniques
Common setup issues:
- CUDA not found: Add CUDA to your PATH and LD_LIBRARY_PATH
- No GPU detected: Check drivers with `nvidia-smi` or `rocm-smi`
- Compilation errors: Verify toolkit installation
Common runtime issues:
- Out of memory: Check available GPU memory and reduce the problem size
- Invalid configuration: Verify block sizes within GPU limits
- Kernel errors: Use proper error checking macros
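
Note that kernel launches return no status directly, so configuration mistakes and in-kernel faults must be queried explicitly; this is what error-checking macros do after a launch. An illustrative program (the 4096-thread block deliberately exceeds the usual 1024-threads-per-block limit to provoke the error):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop() {}

int main() {
    noop<<<1, 4096>>>();  // invalid: most GPUs cap blocks at 1024 threads
    cudaError_t launchErr = cudaGetLastError();      // catches bad configuration
    cudaError_t syncErr   = cudaDeviceSynchronize(); // catches faults during execution
    printf("launch: %s\nsync:   %s\n",
           cudaGetErrorString(launchErr), cudaGetErrorString(syncErr));
    return 0;
}
```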
After completing this module:
- Proceed to Module 2: Multi-Dimensional Data Processing
- Explore advanced memory patterns
- Learn about performance optimization techniques
- Practice with larger, real-world problems
Duration: 4-6 hours
Difficulty: Beginner
Prerequisites: Basic C/C++ programming knowledge