[ADMB Users] Does CUDA suck? answer NO!

Wed Sep 14 10:02:06 PDT 2011

On 11-09-14 08:40 AM, CHRIS GRANDIN wrote:

The main point of that exercise was simply to check the performance of
GPU code. It appears that OpenCL may be the future, but its interface
sucks big time compared to CUDA's. Since my expertise is in the
derivative stuff, and since conversion from CUDA to OpenCL is well
understood (just don't use any C++ in the kernel code), I decided to
implement a GPU version of the vectorized negative binomial density for
dvar_vectors in CUDA as an example. With luck it will be finished today.

         Dave
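
For readers following along, here is a minimal sketch of what a vectorized
negative binomial log-density kernel could look like. It is not the ADMB
implementation. The kernel name NegBinLogDensity, the (mu, r) mean/size
parametrization, and the choice to treat r as a fixed scalar are all
assumptions made for illustration. The kernel returns the value and the
partial derivative with respect to mu side by side in plain arrays, which is
one possible way to carry derivative information without C++ classes in the
kernel. The derivative with respect to r is left out because it needs a
digamma function, which is not among the standard C math functions.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: log negative binomial density and d/dmu, one thread
// per element. Assumed parametrization: mean mu[i] > 0, size r > 0.
__global__ void NegBinLogDensity(const double* x, const double* mu, double r,
                                 double* logp, double* dlogp_dmu, int n)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n)
    {
        double xi = x[i];
        double m = mu[i];
        logp[i] = lgamma(xi + r) - lgamma(r) - lgamma(xi + 1.0)
                  + r * log(r / (r + m)) + xi * log(m / (r + m));
        // d/dmu of [x*log(mu) - (r + x)*log(r + mu)]
        dlogp_dmu[i] = xi / m - (xi + r) / (r + m);
    }
}

int main()
{
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(double);
    const double r = 2.5;  // illustrative value only

    // Made-up input data: counts 0..9 and means slightly above 1.
    std::vector<double> hx(n), hmu(n), hlogp(n), hgrad(n);
    for (int i = 0; i < n; ++i)
    {
        hx[i] = i % 10;
        hmu[i] = 1.0 + 0.001 * (i % 100);
    }

    double *dx, *dmu, *dlogp, *dgrad;
    cudaMalloc((void**)&dx, bytes);
    cudaMalloc((void**)&dmu, bytes);
    cudaMalloc((void**)&dlogp, bytes);
    cudaMalloc((void**)&dgrad, bytes);
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dmu, hmu.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    NegBinLogDensity<<<blocks, threads>>>(dx, dmu, r, dlogp, dgrad, n);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
    {
        printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The blocking copies also wait for the kernel to finish.
    cudaMemcpy(hlogp.data(), dlogp, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hgrad.data(), dgrad, bytes, cudaMemcpyDeviceToHost);

    printf("logp[1] = %g, dlogp/dmu[1] = %g\n", hlogp[1], hgrad[1]);

    cudaFree(dx); cudaFree(dmu); cudaFree(dlogp); cudaFree(dgrad);
    return 0;
}
```

On the host side, the dlogp_dmu array would still have to be fed into
whatever reverse-mode bookkeeping the AD library uses. That step is not
shown here.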

> Dave, I am wondering why you didn't use the OpenCL library like I did
> in my matrix mult example at the workshop? If you do, there is no
> requirement for a special compiler (nvcc) and extra makefiles, and the
> code is already optimized.
>
> Yes, the limiting factor is the bussing of data to/from the GPU, and
> for addition it outweighs the cost of the addition operations. It's the
> same for OpenCL; that's why I did the matrix mult example.
>
> Also, I don't see how you are carrying the derivative information
> around. That was my issue thus far: CUDA and OpenCL don't support C++
> classes yet! Please let me know what you think of this, as this
> parallelization has been of ongoing interest to me.
>
> Thanks,
> Chris
>
> ----- Original Message -----
> From: dave fournier <davef at otter-rsch.com>
> Date: Saturday, September 3, 2011 4:05 pm
> Subject: Re: [ADMB Users] Does CUDA suck?  answer NO!
> To: users at admb-project.org
>
> > First there is an error in the code. It should read
> >
> >
> > return z;
> >
> >  and not
> >
> >           return x+y;
> >
> > However, I thought that maybe the problem is that addition is too
> > trivial compared to the overhead of moving things to the GPU and
> > back. I changed the function to pow(x,y) and lo! The el cheapo GPU
> > is faster (about 6 times faster). So how hard is a vector pow? All
> > that was necessary was to take the included VecAdd function and
> > modify it to
> >
> >
> > __global__ void VecPow(const double* A, const double* B, double* C, int N)
> > {
> >     // one thread per element
> >     int i = blockDim.x * blockIdx.x + threadIdx.x;
> >     if (i < N)
> >     {
> >         C[i] = pow(A[i],B[i]);
> >     }
> > }
> >
> > Code is attached. Note that I use mypow just to avoid a clash with
> > the existing ADMB libs.
> >
> >
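
The attachment mentioned above is not reproduced in this archive. As a rough
stand-in, the sketch below shows one way to drive the VecPow kernel from the
host and time the copies separately from the kernel, which is the comparison
the thread keeps coming back to (transfer cost versus arithmetic cost). The
problem size, input values, and launch configuration are assumptions, not
values from the original code. CUDA events are used for timing.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void VecPow(const double* A, const double* B, double* C, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
    {
        C[i] = pow(A[i], B[i]);
    }
}

int main()
{
    const int N = 1 << 20;  // assumed problem size
    const size_t bytes = N * sizeof(double);

    std::vector<double> hA(N), hB(N), hC(N);
    for (int i = 0; i < N; ++i)
    {
        hA[i] = 1.0 + i * 1e-6;
        hB[i] = 1.5;
    }

    double *dA, *dB, *dC;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);

    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventCreate(&t2);
    cudaEventCreate(&t3);

    cudaEventRecord(t0);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(t1);

    const int threads = 256;
    const int blocks = (N + threads - 1) / threads;
    VecPow<<<blocks, threads>>>(dA, dB, dC, N);
    cudaEventRecord(t2);

    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
    {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float h2d = 0.0f, kern = 0.0f, d2h = 0.0f;
    cudaEventElapsedTime(&h2d, t0, t1);
    cudaEventElapsedTime(&kern, t1, t2);
    cudaEventElapsedTime(&d2h, t2, t3);
    printf("host->device %.3f ms, kernel %.3f ms, device->host %.3f ms\n",
           h2d, kern, d2h);

    // Compare against the host pow to check the result.
    double maxdiff = 0.0;
    for (int i = 0; i < N; ++i)
    {
        double d = std::fabs(hC[i] - std::pow(hA[i], hB[i]));
        if (d > maxdiff) maxdiff = d;
    }
    printf("max abs difference vs host pow: %g\n", maxdiff);

    cudaEventDestroy(t0); cudaEventDestroy(t1);
    cudaEventDestroy(t2); cudaEventDestroy(t3);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Swapping pow for a plain addition in the kernel gives the VecAdd case, where
the copy times can be expected to dominate. Double precision in device code
needs a GPU and compile flags that support it.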