[Developers] A possible GPU project

dave fournier davef at otter-rsch.com
Wed Apr 11 13:53:57 PDT 2012


On 12-04-11 12:39 PM, Matthew Supernaw wrote:


Well, there are some interesting design questions to work through before
anyone needs a computer with a suitable GPU (although I have one).
As I understand it, an important part of GPU
programming is to access contiguous memory as much as possible.
So if we are doing a dot product

      z +=x[i]*y[i]

and you have 1000 threads operating at the same time, they are accessing

    x[1],x[2],  ... x[1000]  y[1],y[2],...y[1000]

i.e. contiguous memory for x and y, so this is good.
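
As a rough illustration of that access pattern, here is a plain C++ sketch in which ordinary loops stand in for GPU threads. The function name strided_dot and the parameter nthreads are made up for this example, and indexing is 0-based. In each step, simulated thread t reads element base+t, so the threads together touch one contiguous run of x and of y.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product split across `nthreads` simulated threads (hypothetical name).
// In each step, simulated thread t reads element base + t, so all threads
// together touch one contiguous run of x and one contiguous run of y.
double strided_dot(const std::vector<double>& x,
                   const std::vector<double>& y,
                   std::size_t nthreads)
{
  assert(x.size() == y.size());
  assert(nthreads > 0);
  std::vector<double> partial(nthreads, 0.0);  // one accumulator per thread
  for (std::size_t base = 0; base < x.size(); base += nthreads)
  {
    // On a GPU this inner loop would run in parallel, one t per thread.
    for (std::size_t t = 0; t < nthreads && base + t < x.size(); t++)
    {
      partial[t] += x[base + t] * y[base + t];
    }
  }
  double z = 0.0;  // final reduction over the per-thread partial sums
  for (double p : partial)
  {
    z += p;
  }
  return z;
}
```

On a real device the final reduction would also be done in parallel; it is kept sequential here to keep the sketch short.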

Let's consider the first part of


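     void hcalcs1(int n,int n1,int np,int is,
       dfsdmat & h,dvector & g,dvector & w)
     {
       int i,j,i1;
       double z;
       // forward substitution; w(1) is read here but not set
       for (i=2; i<=n; i++)
       {
          i1=i-1;
          z=-g.elem(i);
          double * pd=&(h.elem(i,1));   // walks along row i of h
          double * pw=&(w.elem(1));     // walks along w
          for (j=1; j<=i1; j++)
          {
             z-=*pd++ * *pw++;
          }
          w.elem(i)=z;
       }
       // ... (rest of hcalcs1 not shown)
     }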

This can be written as


     w(2)  = -( g(2) + h(2,1)*w(1) )

     w(3)  = -( g(3) + h(3,1)*w(1)+ h(3,2)*w(2) )

     w(4)  = -( g(4) + h(4,1)*w(1)+ h(4,2)*w(2)+ h(4,3)*w(3) )

                ......

     w(k)  = -( g(k) + h(k,1)*w(1)+ h(k,2)*w(2)+ ... + h(k,k-1)*w(k-1) )

               ....

There are two ways to parallelize this. The first is to let every row
add its next term at the same time:


      g(2) + h(2,1)*w(1)

      g(3) + h(3,1)*w(1)

      g(4) + h(4,1)*w(1)

                ......

      g(k) + h(k,1)*w(1)

then wait until

    w(2)  = -( g(2) + h(2,1)*w(1) )

is available, so you need to synchronize here.


The first thread is now finished (what to do with it is an open question),
and the next step is

    g(3) + h(3,1)*w(1)+ h(3,2)*w(2)

    g(4) + h(4,1)*w(1)+ h(4,2)*w(2)

                ......

    g(k) + h(k,1)*w(1)+ h(k,2)*w(2)


then wait until

     w(3)  = -( g(3) + h(3,1)*w(1)+ h(3,2)*w(2) )

etc.

Note that for this to work well one wants h(i,j) to be near h(i+1,j) in
memory, i.e. h stored by column.
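
A minimal sketch of this first approach, in plain C++ with sequential loops standing in for the threads. The name forward_subst_by_column, the 0-based indexing and the dense column-major std::vector (instead of dfsdmat) are all assumptions made for illustration, not the ADMB code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column sweep (hypothetical name), 0-based. Once w[j] is final, every
// later row i > j adds its term h(i,j)*w[j]. h is dense column-major:
// h[j*n + i] holds h(i,j), so the inner loop reads contiguous memory.
// w[0] must be set by the caller.
void forward_subst_by_column(std::size_t n,
                             const std::vector<double>& h,
                             const std::vector<double>& g,
                             std::vector<double>& w)
{
  assert(h.size() == n * n && g.size() == n && w.size() == n);
  std::vector<double> acc(g);  // acc[i] starts at g(i)
  for (std::size_t j = 0; j + 1 < n; j++)
  {
    // On a GPU: one thread per row i, all running at the same time.
    for (std::size_t i = j + 1; i < n; i++)
    {
      acc[i] += h[j * n + i] * w[j];
    }
    // Synchronization point: row j+1 has now received all of its terms.
    w[j + 1] = -acc[j + 1];
  }
}
```

Each pass of the outer loop retires one row, so the number of busy threads shrinks by one per step. That is the "what to do with the finished thread" question from above.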


The other way is to parallelize the dot product



     w(k)  = -( g(k) + h(k,1)*w(1)+ h(k,2)*w(2)+ ... + h(k,k-1)*w(k-1) )

or

     w(k) = -( g(k) + dot(&(h(k,1)),&(w(1)),k-1) )

where dot(double * x,double * y,int k) is the dot product of two vectors
of length k.

Note: for this you want h(i,j) to be close to h(i,j+1) i.e. store by row.
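
A minimal sketch of this second approach in plain C++. The name forward_subst_by_row, the 0-based indexing and the dense row-major std::vector (instead of dfsdmat) are assumptions for illustration. The body of dot is the part that would be handed to the GPU; here it is an ordinary loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of two arrays of length k. This is the piece that would be
// parallelized; it is sequential in this sketch.
double dot(const double* x, const double* y, std::size_t k)
{
  double s = 0.0;
  for (std::size_t j = 0; j < k; j++)
  {
    s += x[j] * y[j];
  }
  return s;
}

// Row-wise forward substitution (hypothetical name), 0-based:
//   w[i] = -( g[i] + dot(row i of h, w, i) )
// h is dense row-major: h[i*n + j] holds h(i,j), so each row is contiguous.
// w[0] must be set by the caller.
void forward_subst_by_row(std::size_t n,
                          const std::vector<double>& h,
                          const std::vector<double>& g,
                          std::vector<double>& w)
{
  assert(h.size() == n * n && g.size() == n && w.size() == n);
  for (std::size_t i = 1; i < n; i++)
  {
    w[i] = -(g[i] + dot(h.data() + i * n, w.data(), i));
  }
}
```

The rows still have to be done one after another, since w(k) needs all of w(1)..w(k-1). Only the work inside each dot product is parallel, and the early rows are very short.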


This looks simpler, and we can fold g(k) into the dot product with a
kludge.
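
One possible form of that kludge, sketched in plain C++. This layout is an assumption and only one of several ways to do it: put g(k) in an extra leading column of h and put a constant 1 in front of w, so that the whole right-hand side becomes a single dot product. The name forward_subst_augmented and the arrays a and v are invented for the example; indexing is 0-based.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Augmented forward substitution (hypothetical name and layout), 0-based.
// a is row-major with n rows of length n+1:
//   a[i*(n+1)] = g(i),  a[i*(n+1) + 1 + j] = h(i,j)
// v has length n+1:
//   v[0] = 1,  v[1 + j] = w(j)
// Then w(i) = -dot(row i of a, v, i+1), with g(i) picked up by v[0] = 1.
// v[1] (that is, w(0)) must be set by the caller.
void forward_subst_augmented(std::size_t n,
                             const std::vector<double>& a,
                             std::vector<double>& v)
{
  assert(a.size() == n * (n + 1) && v.size() == n + 1);
  assert(v[0] == 1.0);
  for (std::size_t i = 1; i < n; i++)
  {
    const double* row = a.data() + i * (n + 1);
    double z = 0.0;
    for (std::size_t j = 0; j <= i; j++)  // a single dot product of length i+1
    {
      z += row[j] * v[j];
    }
    v[i + 1] = -z;
  }
}
```

The cost is one extra column of storage and a shifted index for w.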




> I did some GPU computing a few years ago for the park service. The big
> problem was that not all GPUs/GPU drivers were created equal.
> Stability was a major issue. I haven't looked into it for a while, but
> maybe they have all the bugs worked out? We mostly used Nvidia FX
> 3800, which have 192 cores. Two FX 3800 per computer in a linux
> cluster, that was an awesome amount of power! I currently don't have
> access to any OpenCL/CUDA capable GPUs. I'll be happy to look into it
> when my development machine arrives!
>
> On Wed, Apr 11, 2012 at 3:00 PM,<developers-request at admb-project.org>  wrote:
>> Today's Topics:
>>
>>    1. ADMB version control maintenance (Johnoel Ancheta)
>>    2. Re: A possible GPU project (dave fournier)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 10 Apr 2012 21:35:15 -1000
>> From: Johnoel Ancheta<johnoel at hawaii.edu>
>> To: ADMB Users<users at admb-project.org>, developers at admb-project.org
>> Subject: [Developers] ADMB version control maintenance
>> Message-ID:
>>         <CAJMx2XUmDYt8L0jR8LXK1R_=6hxdqDe8JXna99OsyrxsTco60g at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi all,
>>
>> The ADMB version control will be offline April 11, 2012 for maintenance.
>>
>> Johnoel
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 11 Apr 2012 10:39:16 -0700
>> From: dave fournier<davef at otter-rsch.com>
>> To: developers at admb-project.org
>> Subject: Re: [Developers] A possible GPU project
>> Message-ID:<4F85C1C4.60500 at otter-rsch.com>
>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>
>> On 12-04-07 07:15 PM, Matthew Supernaw wrote:
>>
>>
>> To get back to your original question, "I" am not going to do anything.
>> At our meeting over a year ago I heard (between discussions on
>> organizing organizing) talk about how wonderful GPU programming is.
>> So I studied it a bit, did a few examples to understand what it was
>> like, and posted the examples.  The usual lack of interest in any real
>> development ensued.  But that involved AD, so extra technical stuff
>> besides simple GPU coding.
>>
>> So here is an example which involves really standard matrix-vector
>> calculations.  I have isolated that into 3 functions which are in the
>> attached hcalcs.  (One change is that the matrix h should be stored as
>> a vector.)  This is archetypical code for adapting to GPU calculations.
>> The rest is up to whoever is so convinced that GPU calculations are
>> great.  Of course I am happy to collaborate on the details.
>>
>>
>>> Dave,
>>> Great idea! Would you use OpenCL or CUDA? I believe double precision is an add-on for OpenCL; not sure about CUDA.
>>> Matthew
>>>
>>>
>>>
>>> On Apr 6, 2012, at 3:00 PM, developers-request at admb-project.org wrote:
>>>
>>>> A possible GPU project
>>> _______________________________________________
>>> Developers mailing list
>>> Developers at admb-project.org
>>> http://lists.admb-project.org/mailman/listinfo/developers
>>>
>> -------------- next part --------------
>> An embedded and charset-unspecified text was scrubbed...
>> Name: hcalcs
>> URL:<http://lists.admb-project.org/pipermail/developers/attachments/20120411/f8810b6b/attachment-0001.ksh>
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: newfmin.zip
>> Type: application/zip
>> Size: 5111 bytes
>> Desc: not available
>> URL:<http://lists.admb-project.org/pipermail/developers/attachments/20120411/f8810b6b/attachment-0001.zip>
>>
>
>


