A CUDA sample that demonstrates how to use batched CUBLAS API calls to improve overall performance.
Linear Algebra, CUBLAS Library
SM 5.0 SM 5.2 SM 5.3 SM 6.0 SM 6.1 SM 7.0 SM 7.2 SM 7.5 SM 8.0 SM 8.6 SM 8.7 SM 8.9 SM 9.0
Linux, Windows
x86_64, armv7l
CUBLAS, cuRAND
cudaMemcpy, cudaGetErrorString, cudaFree, cudaGetLastError, cudaDeviceSynchronize, cudaGetDevice, cudaMalloc, cudaStreamCreate, cudaGetDeviceProperties
Download and install the CUDA Toolkit for your platform. Make sure the dependencies listed in the Dependencies section above are installed.
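The batching idea from the description can be illustrated with a minimal sketch. It is not the sample's actual source: the helper name `batched_sgemm_max_error`, the matrix contents, and the sizes are all hypothetical and chosen for illustration. The sketch multiplies many small matrices with a single `cublasSgemmBatched` call instead of issuing one `cublasSgemm` call per matrix, which is where batching typically helps, since each small multiplication on its own is too small to keep the GPU busy.

```cuda
// Assumed build line: nvcc batched_sgemm.cu -lcublas -o batched_sgemm
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: multiplies `batch` pairs of n x n matrices with one
// cublasSgemmBatched call. A is all ones and B is all twos, so every entry
// of C = A * B should equal 2 * n. Returns the largest absolute deviation
// from that value, or -1.0f if any CUDA or cuBLAS call fails.
float batched_sgemm_max_error(int n, int batch) {
    const size_t elems = static_cast<size_t>(n) * n;
    const size_t bytes = elems * batch * sizeof(float);
    const size_t ptrBytes = batch * sizeof(float *);

    std::vector<float> hA(elems * batch, 1.0f);
    std::vector<float> hB(elems * batch, 2.0f);
    std::vector<float> hC(elems * batch, 0.0f);
    std::vector<float *> hAptrs(batch), hBptrs(batch), hCptrs(batch);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    float **dAptrs = nullptr, **dBptrs = nullptr, **dCptrs = nullptr;
    cublasHandle_t handle = nullptr;
    float maxErr = -1.0f;

    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dC, bytes) == cudaSuccess &&
              cudaMalloc(&dAptrs, ptrBytes) == cudaSuccess &&
              cudaMalloc(&dBptrs, ptrBytes) == cudaSuccess &&
              cudaMalloc(&dCptrs, ptrBytes) == cudaSuccess &&
              cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS;

    if (ok) {
        // The batched API takes arrays of device pointers, one per matrix,
        // and those arrays must themselves live in device memory.
        for (int i = 0; i < batch; ++i) {
            hAptrs[i] = dA + i * elems;
            hBptrs[i] = dB + i * elems;
            hCptrs[i] = dC + i * elems;
        }
        ok = cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
             cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
             cudaMemcpy(dAptrs, hAptrs.data(), ptrBytes, cudaMemcpyHostToDevice) == cudaSuccess &&
             cudaMemcpy(dBptrs, hBptrs.data(), ptrBytes, cudaMemcpyHostToDevice) == cudaSuccess &&
             cudaMemcpy(dCptrs, hCptrs.data(), ptrBytes, cudaMemcpyHostToDevice) == cudaSuccess;
    }

    if (ok) {
        const float alpha = 1.0f, beta = 0.0f;
        // One call computes C[i] = alpha * A[i] * B[i] + beta * C[i]
        // for every i in [0, batch). Matrices are column-major.
        ok = cublasSgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                n, n, n, &alpha,
                                dAptrs, n, dBptrs, n,
                                &beta, dCptrs, n, batch) == CUBLAS_STATUS_SUCCESS;
    }

    // This blocking copy runs after the GEMM, which was issued on the
    // default stream used by the handle.
    if (ok) {
        ok = cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    if (ok) {
        maxErr = 0.0f;
        const float expected = 2.0f * n;
        for (float v : hC) {
            maxErr = std::fmax(maxErr, std::fabs(v - expected));
        }
    }

    if (handle) cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cudaFree(dAptrs); cudaFree(dBptrs); cudaFree(dCptrs);
    return maxErr;
}

int main() {
    const float err = batched_sgemm_max_error(4, 8);  // 8 multiplications of 4x4 matrices
    if (err < 0.0f) {
        std::printf("CUDA or cuBLAS call failed: %s\n",
                    cudaGetErrorString(cudaGetLastError()));
        return 1;
    }
    std::printf("max abs error: %g\n", err);
    return 0;
}
```

A non-batched version would loop over the matrices and call `cublasSgemm` once per pair. Timing both variants on your own hardware is the most reliable way to see how much batching helps for a given matrix size and batch count.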