GPU Boid Simulation – Prey, Predators & Leaders

Introduction

This project optimizes a flocking simulation with prey–predator dynamics. Flocking models are widely used to reproduce the collective movement of animal groups such as schools of fish, flocks of birds, or herds. Each agent (boid) follows a few simple rules (alignment, cohesion, and separation) that together generate complex emergent group behavior.

Our implementation combines OpenGL for visualization and CUDA for GPU acceleration. We first developed a CPU-based version, then redesigned and optimized it to leverage GPU parallelism, achieving real-time performance with larger and more dynamic groups.

The project was inspired by a University of Pennsylvania assignment outline, which we used as a starting point and challenge to build our own full implementation.

CPU Baseline	GPU Optimized (CUDA)

Features

Flocking Forces: Cohesion (moving toward the center of nearby companions), Separation (maintaining safe distances), and Alignment (adjusting velocity to match neighbors).
Three types of agents:
- Prey → follow flocking rules, evade predators, follow leaders
- Leaders → guide prey, avoid predators and other leaders
- Predators → chase prey, keep distance from other predators
Obstacles (walls) → static obstacles that boids anticipate using a look-ahead vector along their velocity. A repulsive force, scaled quadratically by proximity, allows boids to turn smoothly before collision.
CPU & GPU implementations → fair comparison of performance
Profiler → measures execution times, FPS, exports results to CSV
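
As a rough illustration of the three flocking forces listed above, here is a minimal C++ sketch of how cohesion, separation, and alignment could be combined for one boid. It is not the project's code: `Vec2`, `Boid`, `FlockParams`, `flockingForce`, and all weights are hypothetical names, the world is assumed to be 2D, and neighbors are found by a brute-force scan.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };
inline Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }
inline Vec2 operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
inline Vec2 operator*(Vec2 a, float s) { return {a.x * s, a.y * s}; }

struct Boid { Vec2 pos, vel; };

// Hypothetical tuning parameters; the values used by the project are not known.
struct FlockParams {
    float neighborRadius;    // boids farther away than this are ignored
    float separationRadius;  // boids closer than this are pushed away
    float cohesionWeight;
    float separationWeight;
    float alignmentWeight;
};

// Steering acceleration for boids[self], using a brute-force scan of all boids.
Vec2 flockingForce(const std::vector<Boid>& boids, std::size_t self,
                   const FlockParams& p) {
    assert(self < boids.size());
    const Boid& me = boids[self];
    Vec2 centerSum{0.0f, 0.0f}, velSum{0.0f, 0.0f}, push{0.0f, 0.0f};
    int count = 0;
    for (std::size_t j = 0; j < boids.size(); ++j) {
        if (j == self) continue;
        Vec2 d = boids[j].pos - me.pos;
        float dist = std::sqrt(d.x * d.x + d.y * d.y);
        if (dist >= p.neighborRadius) continue;
        centerSum = centerSum + boids[j].pos;
        velSum = velSum + boids[j].vel;
        ++count;
        if (dist > 0.0f && dist < p.separationRadius) {
            // Unit vector away from the neighbor, scaled by 1/dist.
            push = push - d * (1.0f / (dist * dist));
        }
    }
    if (count == 0) return Vec2{0.0f, 0.0f};
    float inv = 1.0f / static_cast<float>(count);
    Vec2 cohesion = centerSum * inv - me.pos;   // toward the local center
    Vec2 alignment = velSum * inv - me.vel;     // toward the local mean velocity
    return cohesion * p.cohesionWeight + push * p.separationWeight +
           alignment * p.alignmentWeight;
}
```

In a full update step, the returned vector would be added to the other forces (predator evasion, leader following, wall repulsion) before integrating velocity and position.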

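The look-ahead wall avoidance described in the feature list can be sketched in C++ as follows. This is an illustration under stated assumptions, not the project's implementation: walls are assumed to be axis-aligned rectangles in 2D, and `Wall`, `wallRepulsion`, `lookAhead`, `influenceRadius`, and `strength` are hypothetical names. The force grows with the square of the proximity, so it is weak at the edge of the influence radius and strongest at contact.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Wall { Vec2 lo, hi; };  // assumed shape: axis-aligned rectangle

// Repulsive force from one wall, evaluated at a probe point placed
// lookAhead units ahead of the boid along its velocity.
Vec2 wallRepulsion(Vec2 pos, Vec2 vel, const Wall& wall,
                   float lookAhead, float influenceRadius, float strength) {
    assert(influenceRadius > 0.0f);
    float speed = std::sqrt(vel.x * vel.x + vel.y * vel.y);
    if (speed == 0.0f) return Vec2{0.0f, 0.0f};  // no heading, no look-ahead

    Vec2 probe{pos.x + vel.x / speed * lookAhead,
               pos.y + vel.y / speed * lookAhead};

    // Closest point on the rectangle to the probe.
    Vec2 closest{std::min(std::max(probe.x, wall.lo.x), wall.hi.x),
                 std::min(std::max(probe.y, wall.lo.y), wall.hi.y)};
    Vec2 away{probe.x - closest.x, probe.y - closest.y};
    float dist = std::sqrt(away.x * away.x + away.y * away.y);

    if (dist >= influenceRadius) return Vec2{0.0f, 0.0f};
    if (dist == 0.0f) {
        // Probe is inside the wall: push straight back against the heading.
        return Vec2{-vel.x / speed * strength, -vel.y / speed * strength};
    }
    float proximity = 1.0f - dist / influenceRadius;  // 0 at the edge, 1 at contact
    float magnitude = strength * proximity * proximity;  // quadratic scaling
    return Vec2{away.x / dist * magnitude, away.y / dist * magnitude};
}
```

Because the probe sits ahead of the boid, the force starts acting before the boid itself reaches the wall, which is what lets it turn smoothly instead of bouncing.
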
GPU Optimizations

Structure of Arrays (SoA): The original boid structure (an array of structures) was split into separate per-attribute arrays so that threads in the same warp read contiguous memory, which enables coalesced memory access.
GPU Constant Memory: Used to store simulation constants (grid size, world dimensions, interaction coefficients) to reduce redundant global memory reads during the update phase.
Uniform spatial grid with boid reordering → reduces neighbor checks from O(N²) to O(N·k), where k is the average number of boids in the cells each boid inspects.
Shared memory caching: Temporarily stores neighboring boids within the same grid cell to reduce expensive global memory accesses.
CUDA streams: Different interaction rules (boid-boid, wall repulsion, leader following, predator-prey) are dispatched to separate CUDA streams to overlap execution and improve throughput. Rendering buffers are also split into chunks and transferred using cudaMemcpyAsync on multiple streams.
Enhanced profiler using CUDA events for fine-grained, millisecond-level performance metrics.
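
The SoA layout and the grid-with-reordering idea can be sketched on the CPU in plain C++. This is a simplified illustration, not the project's kernels: `BoidsSoA`, `Grid`, `buildGrid`, and `countNeighbors` are hypothetical names, the world is assumed to be 2D with its origin at (0, 0), and the cell size is assumed to be at least the interaction radius so that a 3×3 cell scan is sufficient. On the GPU the sort and the range detection would run in parallel, which is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Structure of Arrays: one contiguous array per attribute.
struct BoidsSoA { std::vector<float> posX, posY, velX, velY; };

struct Grid {
    int cols, rows;
    float cellSize;
    std::vector<int> cellStart;  // first sorted slot of each cell, -1 if empty
    std::vector<int> cellEnd;    // one past the last slot, -1 if empty
};

// Cell coordinate along one axis, clamped to the grid.
inline int cellCoord(float v, float cellSize, int limit) {
    return std::min(std::max(static_cast<int>(v / cellSize), 0), limit - 1);
}

// Sorts all boid arrays by cell index, then records each cell's range.
void buildGrid(BoidsSoA& b, Grid& g) {
    assert(g.cellSize > 0.0f && g.cols > 0 && g.rows > 0);
    const std::size_t n = b.posX.size();
    std::vector<int> cell(n), order(n);
    for (std::size_t i = 0; i < n; ++i)
        cell[i] = cellCoord(b.posY[i], g.cellSize, g.rows) * g.cols +
                  cellCoord(b.posX[i], g.cellSize, g.cols);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int c) { return cell[a] < cell[c]; });

    // Apply the same permutation to every attribute array.
    auto permute = [&](std::vector<float>& v) {
        std::vector<float> out(n);
        for (std::size_t i = 0; i < n; ++i) out[i] = v[order[i]];
        v.swap(out);
    };
    permute(b.posX); permute(b.posY); permute(b.velX); permute(b.velY);

    g.cellStart.assign(static_cast<std::size_t>(g.cols) * g.rows, -1);
    g.cellEnd.assign(static_cast<std::size_t>(g.cols) * g.rows, -1);
    for (std::size_t i = 0; i < n; ++i) {
        int c = cell[order[i]];
        if (g.cellStart[c] < 0) g.cellStart[c] = static_cast<int>(i);
        g.cellEnd[c] = static_cast<int>(i) + 1;
    }
}

// Counts boids within radius of boid i by scanning only the 3x3 block of cells.
int countNeighbors(const BoidsSoA& b, const Grid& g, std::size_t i, float radius) {
    int cx = cellCoord(b.posX[i], g.cellSize, g.cols);
    int cy = cellCoord(b.posY[i], g.cellSize, g.rows);
    int count = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = cx + dx, ny = cy + dy;
            if (nx < 0 || ny < 0 || nx >= g.cols || ny >= g.rows) continue;
            int c = ny * g.cols + nx;
            for (int j = g.cellStart[c]; j < g.cellEnd[c]; ++j) {
                if (static_cast<std::size_t>(j) == i) continue;
                float ddx = b.posX[j] - b.posX[i], ddy = b.posY[j] - b.posY[i];
                if (ddx * ddx + ddy * ddy < radius * radius) ++count;
            }
        }
    }
    return count;
}
```

After `buildGrid`, boids that share a cell sit next to each other in every array, so a neighbor query reads a handful of short contiguous ranges instead of scanning all N boids.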

Benchmarks (Performance)

To measure execution times and pinpoint bottlenecks, we developed an enhanced unified Profiler. It utilizes high-resolution timers (std::chrono) for the CPU and precise CUDA events for millisecond-level accuracy on the GPU. Results can be logged, averaged over multiple runs, and exported to CSV for offline analysis.
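
The CPU side of such a profiler could look roughly like the sketch below. It is an assumption-laden illustration rather than the project's `Profiler`: the class layout, the method names (`measure`, `record`, `averageMs`, `toCsv`), and the CSV columns are all hypothetical, and `std::chrono::steady_clock` is used here as one reasonable choice of `std::chrono` clock. GPU timings obtained from CUDA events could be fed in through `record`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

class Profiler {
public:
    // Times one call of fn and stores its duration in milliseconds under name.
    template <typename Fn>
    void measure(const std::string& name, Fn&& fn) {
        auto t0 = std::chrono::steady_clock::now();
        fn();
        auto t1 = std::chrono::steady_clock::now();
        record(name, std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    // Stores an externally measured duration (for example a GPU timing).
    void record(const std::string& name, double ms) {
        assert(ms >= 0.0);
        samples_[name].push_back(ms);
    }

    // Mean of all samples recorded under name, or 0 if there are none.
    double averageMs(const std::string& name) const {
        auto it = samples_.find(name);
        if (it == samples_.end() || it->second.empty()) return 0.0;
        double sum = 0.0;
        for (double v : it->second) sum += v;
        return sum / static_cast<double>(it->second.size());
    }

    // One CSV row per section: name, sample count, average in milliseconds.
    std::string toCsv() const {
        std::ostringstream out;
        out << "section,samples,average_ms\n";
        for (const auto& kv : samples_)
            out << kv.first << ',' << kv.second.size() << ','
                << averageMs(kv.first) << '\n';
        return out.str();
    }

private:
    std::map<std::string, std::vector<double>> samples_;
};
```

A frame loop could wrap each phase in `measure("update", [&] { ... })` and write the string returned by `toCsv()` to a file at shutdown for offline analysis.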

Our profiling shows that rendering stays fast and scales well (consistently under 3 ms on CPU and under 1 ms on GPU). The primary bottleneck of the simulation is the physics computation, specifically summing forces, sorting, and memory operations for inter-boid interactions.

CPU vs. GPU Comparison

CPU limitations: The CPU implementation quickly struggles, managing around 30 FPS with only 450 prey. Adding just 50 more agents drops performance to ~16-17 FPS.
GPU scalability: The GPU offloads the physics computations and handles 10x more agents (+900% capacity) at the same frame rate, simulating 4,500 prey at 31-32 FPS. With 8,000 prey it still runs at 15-17 FPS.

Platform	Number of Prey	Compute Forces (ms)	Total Update (ms)	Render (ms)	Measured FPS
CPU	450	~32 - 33	32.90	2.38	30
CPU	500	~43 - 44	43.28	2.91	16 - 17
GPU	4,500	20.28	31.83	0.76	31 - 32
GPU	8,000	42.15	60.08	0.83	15 - 17
