[ADMB Users] Does CUDA suck? answer NO!

CHRIS GRANDIN cgrandin at shaw.ca
Wed Sep 14 08:40:34 PDT 2011

Dave, I am wondering why you didn't use the OpenCL library, as I did in my matrix multiplication example at the workshop? If you do, there is no requirement for a special compiler (nvcc) or extra makefiles, and the code is already optimized.

Yes, the limiting factor is bussing the data to and from the GPU, and for addition it outweighs the cost of the addition operations themselves. It's the same for OpenCL; that's why I chose the matrix multiplication example.

Also, I don't see how you are carrying the derivative information around; that has been my issue thus far, since CUDA and OpenCL don't support C++ classes on the device yet! Please let me know what you think, as this parallelization has been of ongoing interest to me.



----- Original Message -----

From: dave fournier <davef at otter-rsch.com>

Date: Saturday, September 3, 2011 4:05 pm

Subject: Re: [ADMB Users] Does CUDA suck?  answer NO!

To: users at admb-project.org

> First, there is an error in the code. It should read
>
>     return z;
>
> and not
>
>     return x+y;


> However, I thought that maybe the problem is that addition is too trivial
> compared to the overhead of moving things to the GPU and back. I changed
> the function to pow(x,y) and lo! the el cheapo GPU is faster (about 6
> times faster). So how hard is a vector pow? All that was necessary was to
> take the included VecAdd function and modify it to



> __global__ void VecPow(const double* A, const double* B, double* C, int N)
> {
>     int i = blockDim.x * blockIdx.x + threadIdx.x;
>     if (i < N)
>     {
>         C[i] = pow(A[i], B[i]);
>     }
> }


> Code is attached. Note I use mypow just to avoid a clash with existing
> ADMB libs.

