[ADMB Users] Does CUDA suck? answer NO!
CHRIS GRANDIN
cgrandin at shaw.ca
Wed Sep 14 08:40:34 PDT 2011
Dave, I am wondering why you didn't use the OpenCL library, as I did in my matrix multiplication example at the workshop. If you do, there is no need for a special compiler (nvcc) or extra makefiles, and the code is already optimized.
Yes, the limiting factor is the bussing of data to/from the GPU, and for addition it outweighs the cost of the addition operations themselves. It's the same for OpenCL; that's why I chose the matrix multiplication example.
Also, I don't see how you are carrying the derivative information around; that has been my sticking point so far, since CUDA and OpenCL don't support C++ classes yet. Please let me know what you think, as this parallelization has been of ongoing interest to me.
Thanks,
Chris
----- Original Message -----
From: dave fournier <davef at otter-rsch.com>
Date: Saturday, September 3, 2011 4:05 pm
Subject: Re: [ADMB Users] Does CUDA suck? answer NO!
To: users at admb-project.org
> First there is an error in the code. It should read
>
>
> return z;
>
> and not
>
> return x+y;
>
> However, I thought that maybe the problem is that addition is too
> trivial compared to the overhead of moving things to the GPU and back.
> I changed the function to pow(x,y) and lo! the el cheapo GPU is faster
> (about 6 times faster).
> So how hard is a vector pow? All that was necessary was to take the
> included VecAdd function and modify it to
>
>
> __global__ void VecPow(const double* A, const double* B, double* C, int N)
> {
>     // one thread per element: C[i] = A[i]^B[i]
>     int i = blockDim.x * blockIdx.x + threadIdx.x;
>     if (i < N)
>     {
>         C[i] = pow(A[i], B[i]);
>     }
> }
>
> Code is attached. Note that I use mypow just to avoid a clash with the
> existing ADMB libraries.
>
>
[Attachment: <http://lists.admb-project.org/pipermail/users/attachments/20110914/8e526751/attachment.html>]