[Developers] A possible GPU project

Mon Apr 9 13:42:15 PDT 2012

On 12-04-09 12:25 PM, Matthew Supernaw wrote:

OK, it sounds good. The first thing to do is to demonstrate that speeding
up the double loops in newfmin.cpp could speed up an ADMB program.  I moved
them to functions hcalcs1 and hcalcs2 at the top so that they could be
profiled.  I created a simple function where not much time is spent in the
function evaluation itself.  The first thing that came up was that the
w.elem() function was taking up about 10% of the time in the inner loop,
so I changed it to a pointer.

Before changing w.elem() the profile was:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  60.42     10.55    10.55       46   229.35   267.67  hcalcs2(int, int, int, int, int, int, dfsdmat&, dvector&, dvector&, double, double)
  25.32     14.97     4.42       24   184.17   184.26  hcalcs1(int, int, int, int, dfsdmat&, dvector&, dvector&)
  10.19     16.75     1.78 583125265     0.00     0.00  dvector::elem(int)

Getting rid of the w.elem() calls produced:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  64.17      8.72     8.72       46   189.57   189.85  hcalcs2(int, int, int, int, int, int, dfsdmat&, dvector&, dvector&, double, double)
  32.60     13.15     4.43       24   184.58   184.73  hcalcs1(int, int, int, int, dfsdmat&, dvector&, dvector&)
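
As a rough illustration of the kind of change profiled above (not the
actual newfmin.cpp code), the sketch below contrasts calling an
elem()-style accessor on every inner-loop iteration with taking the
address once and walking a raw pointer. Vec1, dot_with_elem and
dot_with_pointer are hypothetical names made up for this example; Vec1 is
only a stand-in for a 1-based vector and is not ADMB's dvector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1-based vector; NOT the real ADMB dvector.
struct Vec1 {
    std::vector<double> data;
    explicit Vec1(std::size_t n) : data(n + 1, 0.0) {}
    double& elem(int i) { return data[static_cast<std::size_t>(i)]; }
};

// Accessor version: w.elem() is called on every inner iteration.
double dot_with_elem(Vec1& w, const double* row, int off, int len) {
    double s = 0.0;
    for (int i = 1; i <= len; i++)
        s += row[i - 1] * w.elem(off + i);
    return s;
}

// Pointer version: the address is taken once, outside the loop,
// and the loop only increments a raw pointer.
double dot_with_pointer(Vec1& w, const double* row, int off, int len) {
    const double* pw = &w.elem(off + 1);
    double s = 0.0;
    for (int i = 0; i < len; i++)
        s += row[i] * *pw++;
    return s;
}
```

Both functions return the same value. Whether the pointer form is
measurably faster depends on whether the compiler inlines the accessor,
which is why it is worth checking with the profiler.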

So about 97% of the time is spent in hcalcs1 and hcalcs2.  If we can
make them run 10x faster, the run time drops to about 9.7 + 3.2 = 12.9%
of the original, so a speedup of roughly 8 times for this kind of problem.
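
The estimate can be checked with Amdahl's law. The small function below
is a sketch, with p taken from the second profile above (64.17% + 32.60%
= 96.77% of run time in hcalcs2 and hcalcs1) and s the assumed speedup of
those two functions; overall_speedup is a name invented for this example.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: overall speedup when a fraction p of the run time
// is made s times faster and the remaining (1 - p) is unchanged.
double overall_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}
```

With p = 0.9677 and s = 10 this gives an overall speedup of about 7.7,
and even an arbitrarily fast kernel could not do better than about 31
times while the remaining 3.2% of the run time is left unchanged.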

> In OpenCL, when you allocate memory with a cl_mem object and copy it to global memory, it stays there until you release it with clReleaseMemObject. So, to reuse a buffer you just need to keep track of its cl_mem object.
>
>
>
> On Apr 9, 2012, at 3:00 PM, developers-request at admb-project.org wrote:
>
>> Send Developers mailing list submissions to
>>     developers at admb-project.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>     http://lists.admb-project.org/mailman/listinfo/developers
>> or, via email, send a message with subject or body 'help' to
>>     developers-request at admb-project.org
>>
>> You can reach the person managing the list at
>>     developers-owner at admb-project.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Developers digest..."
>>
>>
>> Today's Topics:
>>
>>    1. A possible GPU project (Matthew Supernaw)
>>    2. Re: A possible GPU project (dave fournier)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Sun, 8 Apr 2012 22:21:14 -0400
>> From: Matthew Supernaw<matthew.supernaw at noaa.gov>
>> To: "developers at admb-project.org"<developers at admb-project.org>
>> Subject: [Developers] A possible GPU project
>> Message-ID:<F459FA98-2EDF-4E35-99E9-411334B1507E at noaa.gov>
>> Content-Type: text/plain;    charset=us-ascii
>>
>>
>> Yeah... I see what you're saying! Looks like it might work if you just run the j loop in parallel.
>>
>>
>>
>> On Apr 8, 2012, at 3:00 PM, developers-request at admb-project.org wrote:
>>
>>>
>>> Today's Topics:
>>>
>>>   1. A possible GPU project (Matthew Supernaw)
>>>   2. Re: A possible GPU project (dave fournier)
>>>   3. Re: A possible GPU project (dave fournier)
>>>   4. trying a new quasi newton method which might be good    for GPU
>>>      calculations. (dave fournier)
>>>   5. Re: trying a new quasi newton method which might be good    for
>>>      GPU calculations. (dave fournier)
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Sat, 7 Apr 2012 22:15:15 -0400
>>> From: Matthew Supernaw<matthew.supernaw at noaa.gov>
>>> To: "developers at admb-project.org"<developers at admb-project.org>
>>> Subject: [Developers] A possible GPU project
>>> Message-ID:<E2B4E23A-AB66-40A8-84DF-4CCFB18551D6 at noaa.gov>
>>> Content-Type: text/plain;    charset=us-ascii
>>>
>>>
>>> Dave,
>>> Great idea! Would you use OpenCL or CUDA? I believe double precision is an add-on for OpenCL, not sure about CUDA.
>>> Matthew
>>>
>>>
>>>
>>> On Apr 6, 2012, at 3:00 PM, developers-request at admb-project.org wrote:
>>>
>>>> A possible GPU project
>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Sun, 08 Apr 2012 07:32:24 -0700
>>> From: dave fournier<davef at otter-rsch.com>
>>> To: developers at admb-project.org
>>> Subject: Re: [Developers] A possible GPU project
>>> Message-ID:<4F81A178.7030601 at otter-rsch.com>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>>
>>> On 12-04-07 07:15 PM, Matthew Supernaw wrote:
>>>
>>>
>>> Unfortunately, looking at the code in newfmin.cpp more carefully, the
>>> main O(n^2) loop does not look parallelizable.
>>>
>>>
>>>    int iu=n;
>>>    int iv=2*n;
>>>    int ib=3*n;
>>>    for (int j=2;j<=n;j++)
>>>    {
>>>       double * pd=&(h.elem(j,1));
>>>       double * qd=&(w.elem(iu+j));
>>>       double * rd=&(w.elem(iv+1));
>>>       double * sd=&(w.elem(ib+1));
>>>       for (int i=1;i<j;i++)
>>>       {
>>>          *qd-=*pd * *rd++;
>>>          *pd++ +=*sd++ * *qd;
>>>       }
>>>    }
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 3
>>> Date: Sun, 08 Apr 2012 07:52:55 -0700
>>> From: dave fournier<davef at otter-rsch.com>
>>> To: developers at admb-project.org
>>> Subject: Re: [Developers] A possible GPU project
>>> Message-ID:<4F81A647.4050401 at otter-rsch.com>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>> On 12-04-07 07:15 PM, Matthew Supernaw wrote:
>>>
>>> However, there are other quasi-Newton approaches that look parallelizable.
>>>
>>> As usual what we need is existing code that can just be plugged in rather
>>> than trying to reinvent this wheel.
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 4
>>> Date: Sun, 08 Apr 2012 11:25:50 -0700
>>> From: dave fournier<davef at otter-rsch.com>
>>> To: "'developers at admb-project.org'"<developers at admb-project.org>
>>> Subject: [Developers] trying a new quasi newton method which might be
>>>    good    for GPU calculations.
>>> Message-ID:<4F81D82E.1050503 at otter-rsch.com>
>>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>>
>>> There is a quasi-Newton minimizer in the GSL.  It appears to use the BLAS
>>> for the vector-matrix calculations involved in the quasi-Newton calcs.
>>> That could lead to an easy path for using a GPU.  I wrote a little test
>>> program to see how it works.
>>>
>>> The next step is to interface it with the autodif stuff to compare it to
>>> the code in newfmin using automatic differentiation. That should not be
>>> very hard.  I hope it doesn't suck.
>>>
>>> example is attached.
>>>
>>> I trust this is more interesting than discussions on organizing organizing.
>>>
>>>
>>> -------------- next part --------------
>>> A non-text attachment was scrubbed...
>>> Name: testmin.cpp
>>> Type: text/x-c++src
>>> Size: 2549 bytes
>>> Desc: not available
>>> URL:<http://lists.admb-project.org/pipermail/developers/attachments/20120408/dec8e2c1/attachment-0001.cpp>
>>>
>>> ------------------------------
>>>
>>> Message: 5
>>> Date: Sun, 08 Apr 2012 11:32:53 -0700
>>> From: dave fournier<davef at otter-rsch.com>
>>> To: developers at admb-project.org
>>> Subject: Re: [Developers] trying a new quasi newton method which might
>>>    be good    for GPU calculations.
>>> Message-ID:<4F81D9D5.1050109 at otter-rsch.com>
>>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>>
>>> On 12-04-08 11:25 AM, dave fournier wrote:
>>>
>>> There is one gotcha with the newer versions of gcc. For some reason you
>>> can get unsatisfied references in the GSL stuff. To fix this you need to
>>> use the linker option --no-as-needed. To pass this option through the gcc
>>> driver you need to use the -Xlinker flag, as in
>>>
>>>
>>>    -Xlinker --no-as-needed -lgsl -lgslcblas
>>>
>>> ------------------------------
>>>
>>> _______________________________________________
>>> Developers mailing list
>>> Developers at admb-project.org
>>> http://lists.admb-project.org/mailman/listinfo/developers
>>>
>>>
>>> End of Developers Digest, Vol 38, Issue 10
>>> ******************************************
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Mon, 09 Apr 2012 08:26:44 -0700
>> From: dave fournier<davef at otter-rsch.com>
>> To: developers at admb-project.org
>> Subject: Re: [Developers] A possible GPU project
>> Message-ID:<4F82FFB4.5010602 at otter-rsch.com>
>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>
>> On 12-04-08 07:21 PM, Matthew Supernaw wrote:
>>
>>
>> There are a number of double loops involving the Hessian in fmin,
>> so each one is O(n^2).  Is it possible to leave the Hessian (or whatever
>> it is, it has a size of 8*n*(n+1)/2 bytes) on the GPU during the entire
>> minimization?  Then one just needs to move a few vectors of size n back
>> and forth.
>>
>>
>> ------------------------------
>>
>>
>> End of Developers Digest, Vol 38, Issue 11
>> ******************************************

-------------- next part --------------
A non-text attachment was scrubbed...
Name: newfmin.cpp
Type: text/x-c++src
Size: 22017 bytes
Desc: not available
URL: <http://lists.admb-project.org/pipermail/developers/attachments/20120409/50b696b2/attachment-0001.cpp>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bigmin.tpl
URL: <http://lists.admb-project.org/pipermail/developers/attachments/20120409/50b696b2/attachment-0001.ksh>