[Developers] faster ludecomp in C++ for creating adjoint code

Mon Sep 15 12:38:22 PDT 2014

On 09/15/2014 12:23 PM, Johnoel Ancheta wrote:

To compile it I used something like

CXX=g++
ADMB_HOME=~/admodel
echo "!!! Note using the version found in directory "
echo ${ADMB_HOME}
${CXX}  -march=native -ggdb  -Ofast -funroll-loops -DOPT_LIB \
   -ffast-math  -pthread -DUSE_PTHREADS -W -fpermissive -DUSE_LAPLACE -Dlinux \
   avx_vecdot.o \
   -L/home/dave/opt/OpenBLAS/lib \
   -I/home/dave/opt/OpenBLAS/include \
   -D__GNUDOS__ -o$1 $1.cpp  \
   -I. -I${ADMB_HOME}/include \
   -I/home/dave/include \
   -L${ADMB_HOME}/lib \
   -L/home/dave/lib \
   -ladmbo \
   -lopenblas \
   -lgfortran

Near the top of main you set the matrix size n and the block size m. At 
present n must be a multiple of m.

  main()
  {
     ad_set_new_handler();
     ad_exit=&ad_boundf;

     int n=2000;   // we will work with an nxn symmetric matrix
     const int m=100;  // block size for blocked LU code
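
For readers without the attachment, here is a minimal sketch of what a
blocked LU with block size m looks like and why n being a multiple of m
keeps the loops simple. It is not the code in yblocklu10.cpp. The names
Mat, blocked_lu, max_reconstruction_error and demo_blocked_lu are made up
for illustration, there is no pivoting, and the test matrix is assumed to
be symmetric and diagonally dominant so that skipping pivoting is safe.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major n x n matrix (not ADMB's dmatrix).
struct Mat {
    int n;
    std::vector<double> a;
    explicit Mat(int n_) : n(n_), a(static_cast<std::size_t>(n_) * n_, 0.0) {}
    double& operator()(int i, int j) { return a[static_cast<std::size_t>(i) * n + j]; }
    double operator()(int i, int j) const { return a[static_cast<std::size_t>(i) * n + j]; }
};

// In-place right-looking blocked LU, no pivoting (assumption: A needs none).
// On return the strict lower triangle holds L (unit diagonal implied)
// and the upper triangle, including the diagonal, holds U.
void blocked_lu(Mat& A, int m) {
    const int n = A.n;
    assert(m > 0 && n % m == 0);  // same restriction as in the post
    for (int k = 0; k < n; k += m) {
        const int ke = k + m;
        // 1. Unblocked LU of the m x m diagonal block.
        for (int p = k; p < ke; ++p)
            for (int i = p + 1; i < ke; ++i) {
                A(i, p) /= A(p, p);
                for (int j = p + 1; j < ke; ++j) A(i, j) -= A(i, p) * A(p, j);
            }
        // 2. Row panel: U12 = L11^{-1} A12 by forward substitution.
        for (int j = ke; j < n; ++j)
            for (int p = k; p < ke; ++p)
                for (int i = p + 1; i < ke; ++i) A(i, j) -= A(i, p) * A(p, j);
        // 3. Column panel: L21 = A21 U11^{-1}.
        for (int i = ke; i < n; ++i)
            for (int p = k; p < ke; ++p) {
                A(i, p) /= A(p, p);
                for (int j = p + 1; j < ke; ++j) A(i, j) -= A(i, p) * A(p, j);
            }
        // 4. Trailing update: A22 -= L21 * U12 (the matrix-multiply-heavy part).
        for (int i = ke; i < n; ++i)
            for (int p = k; p < ke; ++p) {
                const double lip = A(i, p);
                for (int j = ke; j < n; ++j) A(i, j) -= lip * A(p, j);
            }
    }
}

// Largest entry of |L*U - A0|, with L and U read out of the packed factor.
double max_reconstruction_error(const Mat& A0, const Mat& LU) {
    const int n = A0.n;
    double err = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            const int kmax = std::min(i, j);
            for (int k = 0; k <= kmax; ++k)
                s += ((k == i) ? 1.0 : LU(i, k)) * LU(k, j);
            err = std::max(err, std::fabs(s - A0(i, j)));
        }
    return err;
}

// Builds a symmetric, diagonally dominant test matrix, factors it with
// block size m, and returns the reconstruction error.
double demo_blocked_lu(int n, int m) {
    Mat A(n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            A(i, j) = 1.0 / (1.0 + std::abs(i - j)) + ((i == j) ? n : 0.0);
    Mat LU = A;
    blocked_lu(LU, m);
    return max_reconstruction_error(A, LU);
}
```

Calling demo_blocked_lu(n, m) with any m that divides n should return an
error near rounding level. Step 4 is where nearly all of the work goes for
large n, so that is the loop nest worth tuning or handing to a BLAS routine.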

To restrict OpenBLAS to one thread for the comparison, you can set the
environment variable

export OPENBLAS_NUM_THREADS=1

> Sure, it would be nice to do the comparison.
>
> Thanks in advance,
>
> Johnoel
>
> On Mon, Sep 15, 2014 at 9:19 AM, dave fournier <davef at otter-rsch.com 
> <mailto:davef at otter-rsch.com>> wrote:
>
>     On 09/15/2014 12:14 PM, Johnoel Ancheta wrote:
>
>     If you want, I have code to call the Openblas version as well as a
>     blocked version which stores the relevant matrices in
>     blocks as recommended by Dongarra et al. (but it does not seem to
>     be worth the effort).
>
>
>>     Thanks Dave!  I'll provide feedback after testing...
>>
>>
>>
>>     On Mon, Sep 15, 2014 at 9:03 AM, otter <otter at otter-rsch.com
>>     <mailto:otter at otter-rsch.com>> wrote:
>>
>>         After much pain I have produced a C++ version of the LU
>>         decomposition which
>>         is suitable for producing adjoint code for ADMB and perhaps
>>         cppad. (Don't know what adjoint
>>         code for cppad is as yet!)  This code is about 25 times
>>         faster than the current
>>         ADMB code for a 2,000 x 2,000 matrix and about 4 times slower
>>         than the Openblas
>>         code which contains optimized assembler and Fortrash.  Any
>>         suggestion for improvements
>>         would be welcome.  Per usual I will hold my breath.
>>
>>
>>
>>         _______________________________________________
>>         Developers mailing list
>>         Developers at admb-project.org <mailto:Developers at admb-project.org>
>>         http://lists.admb-project.org/mailman/listinfo/developers
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: yblocklu10.cpp
Type: text/x-c++src
Size: 18081 bytes
Desc: not available
URL: <http://lists.admb-project.org/pipermail/developers/attachments/20140915/d7f25670/attachment-0001.cpp>