[Developers] Big improvement in the function minimizer with GPU.

Tue May 8 10:54:34 PDT 2012

I too would happily have my models do minimization in 1/32 the time.
Reading about OpenCL makes it sound quite flexible across different GPU
models.
Perhaps for starters, Dave, could you commit or email the modified
newfmin.cpp file, along with a brief note about any other changes needed
to make it run?

Arni, if HAFRO has an extra US$2.10, you can rent an hour of GPU cluster
from Amazon:
http://aws.typepad.com/aws/2010/11/new-ec2-instance-type-the-cluster-gpu-instance.html

On Tue, May 8, 2012 at 10:44 AM, Arni Magnusson <arnima at hafro.is> wrote:

> Hi Dave,
>
> Speeding up ADMB by a factor of 32 is a staggering improvement indeed!
> It's great to see the example written in user-end TPL (sections etc.) and
> not some lowest-level hardware-specific system calls in C. Obviously, those
> calls are being made somewhere, but the level of abstraction that you
> demonstrate looks very promising.
>
> As you have suggested, it is time to get as many ADMB developers as
> possible to try this out, but how do I start? Specifically:
>
> (1) Does this approach assume a GPU from a certain vendor?
>
> (2) Do I need to install some GPU developer library? Hopefully no license
> conflict with BSD.
>
> (3) Can we expect this to work on all operating systems?
>
> (4) Can we do this speed comparison for other ADMB examples, like
> simple.tpl, catage.tpl, or even random effects? If not, what would need to
> be done in order to run those models on GPU?
>
> I ask these questions in an enthusiastic, not pessimistic, tone. This is
> revolutionary stuff. After all, the shiny 48-core Linux server downstairs
> probably has no GPU, so maybe the next IT purchase should be a GPU cluster?
>
> It would be great if ADMB could provide a -gpu option in the near future. I
> imagine the user would pass that option at an early stage (adcomp and
> adlink) and not at the end stage (as in mymodel -gpu)?
>
> Arni
>
>
>
>
> On Tue, 8 May 2012, dave fournier wrote:
>
>> To get a proof of concept for any programming technique it is nice to get
>> a big result fairly easily.  Almost all ADMB users rely on the function
>> minimizer fmin in the file newfmin.cpp.  So improving the performance of
>> this function in a more or less transparent way would immediately help a
>> lot of users.
>>
>>
>> I hacked the newfmin.cpp file to add the BFGS quasi Newton update with
>> the (sort of) hess inverse kept on the GPU and main calcs done on the GPU.
>>
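
For readers who have not seen the update written out: the modified
newfmin.cpp has not been posted, so what follows is only a minimal CPU
sketch of the textbook BFGS inverse-Hessian update, not Dave's code. The
function name bfgs_inverse_update and the flat row-major storage are my
own assumptions. The two O(n^2) loops (the matrix-vector product and the
rank-two update) are the kind of work one would expect to move to the
GPU. A dense double-precision matrix for n=6144 takes 6144*6144*8 bytes,
about 288 MiB, although "(sort of) hess inverse" suggests the real
storage may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Textbook BFGS update of a dense inverse-Hessian approximation H
// (n x n, symmetric, row-major in a flat vector):
//   H <- (I - rho*s*y') H (I - rho*y*s') + rho*s*s',  rho = 1/(y's)
// where s = x_new - x_old and y = g_new - g_old.
// Returns false and leaves H untouched if the curvature condition
// y's > 0 fails. Hypothetical helper, not part of ADMB.
bool bfgs_inverse_update(Vec& H, const Vec& s, const Vec& y)
{
  const std::size_t n = s.size();
  assert(y.size() == n && H.size() == n * n);

  double ys = 0.0;
  for (std::size_t i = 0; i < n; ++i) ys += y[i] * s[i];
  if (ys <= 0.0) return false;
  const double rho = 1.0 / ys;

  // Hy = H*y: an O(n^2) matrix-vector product.
  Vec Hy(n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      Hy[i] += H[i * n + j] * y[j];

  double yHy = 0.0;
  for (std::size_t i = 0; i < n; ++i) yHy += y[i] * Hy[i];
  const double c = rho * (1.0 + rho * yHy);

  // Expanded form of the update: an O(n^2) rank-two correction.
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      H[i * n + j] += c * s[i] * s[j] - rho * (Hy[i] * s[j] + s[i] * Hy[j]);

  return true;
}
```

After a successful update the new H satisfies the secant condition
H*y = s, which is a convenient sanity check for any GPU port.
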
>> I tested this with a modified Rosenbrock function with 6144 parameters.
>> The new setup is both much faster and more stable than the old one on
>> newfmin. It appears that newfmin uses a different quasi-Newton update which
>> is not as efficient for a large number of parameters.
>>
>> This is the tpl file for the example.
>>
>> DATA_SECTION
>>  int n
>> !! n=4096+2048;
>> PARAMETER_SECTION
>>  init_vector x(1,n);
>>  objective_function_value f
>> PROCEDURE_SECTION
>>  for (int i=1;i<=n/2;i++)
>>  {
>>    f+=100.*square(square(x(2*i-1))-x(2*i))+square(x(2*i-1)-1.0);
>>  }
>>
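
For anyone who wants to test a minimizer outside ADMB, the same
objective can be sketched in plain C++. ADMB obtains the gradient by
automatic differentiation; the hand-coded gradient below, the function
name rosenbrock_pairs, and the 0-based indexing are my assumptions for
illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Objective from the TPL file above, with 0-based indexing: for each
// pair (a, b) = (x[2k], x[2k+1]) add 100*(a^2 - b)^2 + (a - 1)^2.
// Fills g with the analytic gradient and returns the function value.
// Hypothetical stand-alone helper, not part of ADMB.
double rosenbrock_pairs(const std::vector<double>& x, std::vector<double>& g)
{
  const std::size_t n = x.size();
  assert(n % 2 == 0);
  g.assign(n, 0.0);
  double f = 0.0;
  for (std::size_t k = 0; k < n / 2; ++k)
  {
    const double a = x[2 * k];
    const double b = x[2 * k + 1];
    const double r = a * a - b;              // inner residual a^2 - b
    f += 100.0 * r * r + (a - 1.0) * (a - 1.0);
    g[2 * k] = 400.0 * a * r + 2.0 * (a - 1.0);  // df/da
    g[2 * k + 1] = -200.0 * r;                   // df/db
  }
  return f;
}
```

Each pair is independent of the others, and the minimum is f=0 at
x=1 for every component, which agrees with the final function values of
order 1e-21 in the logs below.
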
>> The new GPU version took 36 seconds and 477 function evals to converge
>> - final statistics:
>> 6144 variables; iteration 277; function evaluation 477
>> Function value   3.2531e-21; maximum gradient component mag   9.7979e-11
>> Exit code = 1;  converg criter   1.0000e-10
>>
>> real    0m35.414s
>> user   0m4.417s <--- most time waiting for the GPU calcs
>> sys     0m0.616s
>>
>> Old version took 288 seconds to do 477 function evaluations but is not
>> nearly as good at this point.
>>
>> 6144 variables; iteration 300; function evaluation 485; phase 1
>> Function value   6.6252316e+00; maximum gradient component mag  -8.4966e+00
>>
>> Old version converged in about 19 min 36 seconds so the new version with
>> BFGS update on the GPU is about 32 times faster than the old version and
>> probably more stable.
>>
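
The overall speedup can be split into two factors using only the
wall-clock times and function-evaluation counts from the two final
logs. The sketch below does that arithmetic; the struct and function
names are hypothetical. Note that the per-evaluation time lumps the
quasi-Newton update together with the objective and gradient
calculation, since the logs do not separate them.

```cpp
#include <cassert>

struct Run
{
  double wall_seconds;   // "real" time reported by the shell
  int function_evals;    // from the "final statistics" line
};

struct Speedup
{
  double total;             // slow wall time / fast wall time
  double from_fewer_evals;  // slow evals / fast evals
  double per_eval;          // slow time per eval / fast time per eval
};

// total == from_fewer_evals * per_eval by construction.
Speedup compare_runs(const Run& slow, const Run& fast)
{
  assert(slow.wall_seconds > 0.0 && fast.wall_seconds > 0.0);
  assert(slow.function_evals > 0 && fast.function_evals > 0);
  Speedup s;
  s.total = slow.wall_seconds / fast.wall_seconds;
  s.from_fewer_evals =
    static_cast<double>(slow.function_evals) / fast.function_evals;
  s.per_eval = (slow.wall_seconds / slow.function_evals) /
               (fast.wall_seconds / fast.function_evals);
  return s;
}

// Figures copied from the logs in this message.
const Run kOldRun = {19.0 * 60.0 + 36.357, 2119};  // real 19m36.357s
const Run kGpuRun = {35.414, 477};                 // real 0m35.414s
```

With these figures the total comes out at roughly 33, made up of about
4.4 times fewer function evaluations and about 7.5 times less
wall-clock time per evaluation. The second factor is in line with the
288 seconds versus about 35 seconds reported above for the first 477
evaluations.
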
>> Here is the old version final output
>> - final statistics:
>> 6144 variables; iteration 1212; function evaluation 2119
>> Function value   1.7758e-21; maximum gradient component mag   9.7086e-11
>> Exit code = 1;  converg criter   1.0000e-10
>>
>> real    19m36.357s
>> user    19m35.848s
>> sys    0m0.093s
>>
>> Yawn.
>>
>> _______________________________________________
> Developers mailing list
> Developers at admb-project.org
> http://lists.admb-project.org/mailman/listinfo/developers
>