[Developers] A possible GPU project - first go

Wed Apr 18 08:07:14 PDT 2012

On second thought, if we are to have anything at all before next year's
meeting we had better get moving.  The code from newfmin.cpp that I
addressed looks like this:

  for (i=2; i<=N; i++)
      {
        int i1=i-1;
        double z=-g(i);
        for (int j=1; j<=i1; j++)
        {
           int j1=j-1;
           int offset=((2*N-j1)*j1+j1)/2;
           z-=h(i-j+1+offset)*w1(j);
        }
        w1(i)=z;
      }

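Here N is the size of a symmetric NxN matrix; h stores its lower triangle
column by column in a single packed array, which is what the offset
calculation indexes into.  g and w1 are N-dimensional vectors.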
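
To make the indexing concrete, here is a minimal sketch of the same offset
arithmetic pulled out into a function.  The name packed_index is made up
for illustration and is not part of newfmin.cpp; it assumes 1-based (i,j)
with i >= j, as in the loop above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not from newfmin.cpp: 1-based position in the packed
// array h of element (i,j), i >= j, of the lower triangle stored column by
// column.  Same arithmetic as the offset computation in the loop above.
inline std::size_t packed_index(std::size_t N, std::size_t i, std::size_t j)
{
    assert(j >= 1 && j <= i && i <= N);
    const std::size_t j1 = j - 1;
    // Number of entries stored in columns 1..j-1: N + (N-1) + ... + (N-j1+1).
    const std::size_t offset = ((2 * N - j1) * j1 + j1) / 2;
    return i - j + 1 + offset;
}
```

For N=4 the columns hold 4, 3, 2 and 1 entries, so (1,1) maps to 1, (2,2)
to 5, and (4,4) to 10 = N(N+1)/2.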

I used N=5120.  The speedup on the GPU was approximately 20 to 1.
Code is attached.  It would be interesting if other people with
OpenCL-capable hardware tried this out.  I have no idea yet what the
portability issues might be.

The GPU code consists of 3 kernels.  There is a fairly steep learning
curve in figuring out how to do this stuff, and most of the examples one
finds are trivial.  The difficulty is that to calculate w1(i) you must
already know all the previous values w1(1),...,w1(i-1).  However, you can
partially calculate w1(i) once you know w1(1) up to w1(m) for some m<i.
This is what has been used to parallelize the code.  The matrix is split
up into vertical strips, indexed in the code by k.  Enjoy!
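
The partial-calculation idea can be sketched on the CPU as follows.  This
is not the attached OpenCL code; it is only an illustration of one way
the strips might be used.  The function names, the strip width B, the
vectors padded so that index 0 is unused, and the starting value
w1(1) = -g(1) are all assumptions on my part.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: 1-based packed position of (i,j), i >= j.
static std::size_t packed_pos(std::size_t N, std::size_t i, std::size_t j)
{
    const std::size_t j1 = j - 1;
    return i - j + 1 + ((2 * N - j1) * j1 + j1) / 2;
}

// Reference: the original serial loop (w(1) = -g(1) is an assumption).
std::vector<double> solve_serial(std::size_t N, const std::vector<double>& h,
                                 const std::vector<double>& g)
{
    std::vector<double> w(N + 1, 0.0);
    w[1] = -g[1];
    for (std::size_t i = 2; i <= N; ++i) {
        double z = -g[i];
        for (std::size_t j = 1; j < i; ++j)
            z -= h[packed_pos(N, i, j)] * w[j];
        w[i] = z;
    }
    return w;
}

// Strip version: columns are processed in strips of width B.
std::vector<double> solve_strips(std::size_t N, const std::vector<double>& h,
                                 const std::vector<double>& g, std::size_t B)
{
    assert(B >= 1);
    std::vector<double> w(N + 1, 0.0);
    for (std::size_t i = 1; i <= N; ++i)
        w[i] = -g[i];                       // partial value; sums come later
    for (std::size_t lo = 1; lo <= N; lo += B) {
        const std::size_t hi = std::min(lo + B - 1, N);
        // Step 1 (serial, small): finish w(lo..hi) using columns in the strip.
        for (std::size_t i = lo + 1; i <= hi; ++i)
            for (std::size_t j = lo; j < i; ++j)
                w[i] -= h[packed_pos(N, i, j)] * w[j];
        // Step 2: w(lo..hi) are now final, so every row below the strip can
        // subtract its share independently.  The iterations of this i loop
        // do not depend on each other, so this is the part that could be
        // handed to GPU work-items.
        for (std::size_t i = hi + 1; i <= N; ++i) {
            double s = 0.0;
            for (std::size_t j = lo; j <= hi; ++j)
                s += h[packed_pos(N, i, j)] * w[j];
            w[i] -= s;
        }
    }
    return w;
}
```

Both functions should agree up to rounding, since the strip version only
changes the order in which the terms are subtracted.  With B=1 step 1 is
empty and each strip is a single column; a larger B means fewer passes
but a longer serial step 1.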

-------------- next part --------------
A non-text attachment was scrubbed...
Name: opencl_test.zip
Type: application/zip
Size: 2864 bytes
Desc: not available
URL: <http://lists.admb-project.org/pipermail/developers/attachments/20120418/38a382ee/attachment.zip>