[Developers] A possible GPU project - first go
dave fournier
davef at otter-rsch.com
Wed Apr 18 08:07:14 PDT 2012
On second thought, if we are to have anything at all before next year's
meeting we had better get moving. The code from newfmin.cpp that I
addressed looks like this:
  for (i=2; i<=N; i++)
  {
    int i1=i-1;
    double z=-g(i);
    for (int j=1; j<=i1; j++)
    {
      int j1=j-1;
      int offset=((2*N-j1)*j1+j1)/2;
      z-=h(i-j+1+offset)*w1(j);
    }
    w1(i)=z;
  }
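Here N is the size of a symmetric NxN matrix. g and w1 are N-dimensional
vectors. Judging from the index arithmetic, h holds the lower triangle of
the matrix packed column by column, so that h(i-j+1+offset) is element
(i,j) with i>=j.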
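
To make the indexing easier to check, here is a minimal stand-alone sketch of the same loop using std::vector in place of the ADMB vector types. The names packed_index and forward_serial are made up for illustration and do not appear in newfmin.cpp. All vectors are assumed to be padded so that index 0 is unused and the 1-based indices of the original carry over unchanged, and w1(1) is assumed to be set by the caller, as the loop above starts at i=2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not in newfmin.cpp): 1-based position of element
// (i,j), i >= j, in a lower triangle packed column by column.
inline std::size_t packed_index(std::size_t i, std::size_t j, std::size_t N)
{
    assert(j >= 1 && j <= i && i <= N);
    std::size_t j1 = j - 1;
    // Number of packed entries in columns 1..j-1, same formula as the post.
    std::size_t offset = ((2 * N - j1) * j1 + j1) / 2;
    return i - j + 1 + offset;
}

// Serial reference version of the loop. h, g and w1 are 1-based
// (element 0 is padding); w1[1] must already hold its value on entry.
void forward_serial(std::size_t N,
                    const std::vector<double>& h,
                    const std::vector<double>& g,
                    std::vector<double>& w1)
{
    for (std::size_t i = 2; i <= N; i++)
    {
        double z = -g[i];
        for (std::size_t j = 1; j < i; j++)
        {
            z -= h[packed_index(i, j, N)] * w1[j];
        }
        w1[i] = z;
    }
}
```

For N=3 the packed order is (1,1), (2,1), (3,1), (2,2), (3,2), (3,3), so h needs N*(N+1)/2 = 6 entries plus the padding element.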
I used N=5120. The increase in speed on the GPU was approximately 20 to 1.
Code is attached. It would be interesting if other people with
OpenCL-capable hardware tried this out. I have no idea what the
portability issues might be as yet.
The GPU code consists of 3 kernels. There is a fairly steep learning
curve to figure out how to do this stuff; most of the examples one finds
are trivial. The difficulty is that to calculate w1(i) you must already
know the previous w1's. However, you can partially calculate w1(i) if you
know w1(1) up to w1(m) for some m<i. This is what has been used to
parallelize the code. The matrix is split up into vertical strips,
indexed in the code by k. Enjoy!
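
The strip idea can be illustrated on the CPU. What follows is only a serial sketch under my own assumptions, not the three OpenCL kernels in the attachment: the function name forward_strips, the strip width parameter and the acc work array are all hypothetical. For each strip of columns it first finishes the w1 values that belong to the strip (this part is inherently sequential), then subtracts the strip's contribution from every row below it. That second step touches each row independently, so it is the part one would expect to hand out to GPU work-items.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 1-based position of element (i,j), i >= j, in a
// lower triangle packed column by column (same offset formula as the post).
inline std::size_t packed_index(std::size_t i, std::size_t j, std::size_t N)
{
    assert(j >= 1 && j <= i && i <= N);
    std::size_t j1 = j - 1;
    std::size_t offset = ((2 * N - j1) * j1 + j1) / 2;
    return i - j + 1 + offset;
}

// Strip-wise version of the loop. Vectors are 1-based (element 0 is
// padding); w1[1] must be set on entry. 'strip' is an assumed strip width.
void forward_strips(std::size_t N, std::size_t strip,
                    const std::vector<double>& h,
                    const std::vector<double>& g,
                    std::vector<double>& w1)
{
    assert(strip >= 1);
    // acc[i] is the partially calculated w1(i): -g(i) minus the terms
    // from all columns in strips processed so far.
    std::vector<double> acc(N + 1, 0.0);
    for (std::size_t i = 2; i <= N; i++) acc[i] = -g[i];

    for (std::size_t lo = 1; lo <= N; lo += strip)  // one pass per strip
    {
        std::size_t hi = std::min(lo + strip - 1, N);

        // Step 1 (sequential): finish w1(lo..hi). Only the columns
        // inside the current strip are still missing from acc.
        for (std::size_t i = std::max<std::size_t>(lo, 2); i <= hi; i++)
        {
            double z = acc[i];
            for (std::size_t j = lo; j < i; j++)
                z -= h[packed_index(i, j, N)] * w1[j];
            w1[i] = z;
        }

        // Step 2 (independent per row): fold this strip into the
        // partial sums of all rows below it.
        for (std::size_t i = hi + 1; i <= N; i++)
        {
            double s = 0.0;
            for (std::size_t j = lo; j <= hi; j++)
                s += h[packed_index(i, j, N)] * w1[j];
            acc[i] -= s;
        }
    }
}
```

Because the terms are summed in a different order than in the plain double loop, the results should agree with it only up to floating-point rounding in general. A wider strip means more sequential work in step 1 and fewer passes over step 2; which width is best on a given device is something to measure rather than assume.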
-------------- next part --------------
A non-text attachment was scrubbed...
Name: opencl_test.zip
Type: application/zip
Size: 2864 bytes
Desc: not available
URL: <http://lists.admb-project.org/pipermail/developers/attachments/20120418/38a382ee/attachment.zip>