[Developers] OpenMPI

Hans J. Skaug Hans.Skaug at math.uib.no
Tue Jan 10 12:15:13 PST 2012


Dave: How many cores do you have on your machine?

Derek: I have never used Open MPI before. Can you give
me a summary of what I need to do if I want to
compile your code?
- How do I know if Open MPI is installed on my Linux box?
- Is the makefile in

svn co svn+ssh://www.admb-project.org/branches/parallel admb-mpi

set up to compile with MPI? (Dave's previous email suggests no).

Hans

>-----Original Message-----
>From: dave fournier [mailto:davef at otter-rsch.com]
>Sent: Tuesday, January 10, 2012 7:27 PM
>To: Derek Seiple
>Cc: Hans J. Skaug; 'Arni Magnusson'; 'anders at nielsensweb.org'
>Subject: Re: OpenMPI
>
>On 12-01-09 05:42 PM, Derek Seiple wrote:
>
>Hi,
>
>Using time ./nested4 -nohess I get
>
>real    0m14.405s
>user    0m14.265s
>sys    0m0.132s
>
>Using time ./nested4 -nohess -master I get
>
>
>real    0m11.404s
>user    0m11.160s
>sys    0m0.223s
>
>This is for the slow, safe debug version.
>As I said, the Hessian calculations are broken anyway, and that is
>a different issue, so you should try the -nohess option.
>From what I have, it looks like your MPI stuff speeds it up,
>although not by a huge amount.
>
>I think it would be better to extend your work to date to the
>dense Hessian and sparse Hessian cases rather than having
>competing implementations. One thing: why can't you use the
>existing separable_calls_counter and num_separable_calls instead of
>adding a new one? If that is OK, we could add a global_num_separable_calls
>for when we really need them all.
>
>       Dave
>
>> There are some other examples in the mpi_tests folder. One that takes
>> a while to run for me is the nested4 example. When I would run it
>> normally it would take about 40 seconds or so, but with a slave and
>> master it takes over a minute.
>>
>> Derek
>>
>> On Mon, Jan 9, 2012 at 5:31 PM, dave fournier <davef at otter-rsch.com> wrote:
>>> On 12-01-09 07:51 AM, Derek Seiple wrote:
>>>
>>> Hi,
>>>
>>> this seems to run OK for the estimation. Do you have a large example?
>>>
>>> The Hessian calculations blow up, I think because something like
>>> separable_calls_counter is not getting initialized.
>>>
>>>        Dave
>>>
>>>
>>>
>>>
>>>> Hi Dave,
>>>>
>>>> Probably the easiest way is for you to checkout the code using the
>>>> following:
>>>>
>>>> svn co svn+ssh://www.admb-project.org/branches/parallel
>>>> <your_destination_folder>
>>>>
>>>> That is the most up-to-date code. If that doesn't work for you, I can
>>>> try to zip up the source and email it to you...
>>>>
>>>> I compile the source with the GNUmakefile, but that probably won't work
>>>> for you. I think GNUmakefile.64bit might be the one you used. The
>>>> build process isn't that smooth yet in the parallel branch.
>>>>
>>>> I also add these 2 lines to my .bashrc file:
>>>> export ADMB_HOME=<path_to_where_you_downloaded_the_source>/build/unix/unix
>>>> export PATH=$ADMB_HOME/bin:$PATH
>>>>
>>>> Then you can go to mpi_tests/orange_mpi_test and just type make
>>>> or make debug to compile the example.
>>>> After that:
>>>> ./orange runs it as normal
>>>> ./orange -master runs it with one master and one slave
>>>> ./orange -master -nslaves n runs it with one master and n slaves
>>>>
>>>> Derek
>>>>
>>>>
>>>>
>>>> On Sun, Jan 8, 2012 at 12:36 PM, dave fournier <davef at otter-rsch.com> wrote:
>>>>> On 11-11-18 06:04 AM, Derek Seiple wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> If you want me to look at this, you should send me all the files necessary
>>>>> to run it or tell me how to get them. I don't want to spend a lot of time
>>>>> and then find out I'm not using the latest source.
>>>>>
>>>>>     Dave
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Hi Dave,
>>>>>>
>>>>>> I have the separable case done. The only problem is that it isn't
>>>>>> really any faster. Some models are even a little slower. Can you look at
>>>>>> my code and see if you see any obvious slowdowns?
>>>>>>
>>>>>> I suspect the code in df1b2lp9.cpp is probably the cause of the
>>>>>> slowdown, but I don't know what to do with it right now.
>>>>>>
>>>>>> Thanks,
>>>>>> Derek
>>>>>>
>>>>>> On Sun, Nov 13, 2011 at 7:25 PM, dave fournier <davef at otter-rsch.com> wrote:
>>>>>>> On 11-11-11 09:05 AM, Derek Seiple wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Actually that is just the simplest case.
>>>>>>>
>>>>>>> But anyway just to clean up the code. I think that anything to do with
>>>>>>> MPI should be inside a
>>>>>>>
>>>>>>>
>>>>>>> #if defined(USE_ADMPI)
>>>>>>>
>>>>>>> #endif
>>>>>>>
>>>>>>> so that we can still compile the code without any MPI stuff.
>>>>>>> I try to do it that way but things keep sneaking out.
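
A minimal sketch of the guard described above, assuming USE_ADMPI is defined only for MPI builds. The helpers process_rank and is_master are hypothetical names for illustration, not ADMB functions. The same file compiles with or without MPI:

```cpp
#if defined(USE_ADMPI)
#include <mpi.h>   // only pulled in for MPI builds
#endif

// Hypothetical helper: rank of this process, or 0 in a serial build.
// In an MPI build, MPI_Init must already have been called.
int process_rank()
{
#if defined(USE_ADMPI)
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  return rank;
#else
  return 0;
#endif
}

// Hypothetical helper: true when this process should act as the master.
bool is_master()
{
  return process_rank() == 0;
}
```

Keeping the #if inside small helpers like these, rather than scattered through the numerical code, is one way to stop MPI calls from "sneaking out" of the guard.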
>>>>>>>
>>>>>>>             Dave
>>>>>>>
>>>>>>>
>>>>>>>> Ok.
>>>>>>>>
>>>>>>>> I am finishing up the RE hessian stuff now. Once that is done, then I
>>>>>>>> think the separable stuff will be done.
>>>>>>>>
>>>>>>>> Derek
>>>>>>>>
>>>>>>>> On Fri, Nov 11, 2011 at 12:01 PM, dave fournier <davef at otter-rsch.com> wrote:
>>>>>>>>> On 11-11-11 08:56 AM, dseiple84 at gmail.com wrote:
>>>>>>>>>
>>>>>>>>> I don't see how it could be. It was only when creating the MPI code
>>>>>>>>> that the call to sd_routine got left out, so it was never part of the
>>>>>>>>> main distribution. However, once I fix that, std files do get created
>>>>>>>>> when I run a simple example using your admb-mpi code.
>>>>>>>>>
>>>>>>>>>> Is this related to one of the earlier emails about std files from
>>>>>>>>>> version 9 to version 10?
>>>>>>>>>>
>>>>>>>>>> On , dave fournier <davef at otter-rsch.com> wrote:
>>>>>>>>>>> On 11-11-08 11:39 AM, Derek Seiple wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> There was an error in this which caused std files not to get
>>>>>>>>>>> printed out sometimes.
>>>>>>>>>>>
>>>>>>>>>>> It is only in the Open MPI branch so far as I know.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     Dave
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Never mind. I see now that the RE are set during the Newton-Raphson
>>>>>>>>>>> inside evaluate_function(uhat,pfmin)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 8, 2011 at 9:18 AM, Derek Seiple <dseiple84 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks Dave,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I now have the minimizer converging for multiple slaves. One thing I
>>>>>>>>>>> haven't figured out is where the random effects values are stored and
>>>>>>>>>>> where they get set to varsptr for printing in the par file.
>>>>>>>>>>>
>>>>>>>>>>> I know that the vector y in the laplace is comprised of y=(x,u),
>>>>>>>>>>> that is, the variables and the random effects. However, just updating
>>>>>>>>>>> the u portion of y in the master doesn't get the values into the par
>>>>>>>>>>> file.
>>>>>>>>>>>
>>>>>>>>>>> For example this is the par file I get:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> # Number of parameters = 5  Objective function value = 94.8143  Maximum gradient component = 5.56233e-05
>>>>>>>>>>> # beta:
>>>>>>>>>>>   0.0531627 1.90635 -7.92692
>>>>>>>>>>> # log_sigma:
>>>>>>>>>>> 2.05962297763
>>>>>>>>>>> # log_sigma_u:
>>>>>>>>>>> 3.45462211398
>>>>>>>>>>> # u:
>>>>>>>>>>>   -29.5621226547 31.7279669314 -37.1934482041 0.00000000000 0.00000000000
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So the random effects are stored somewhere (besides y). I need to
>>>>>>>>>>> find out where so I can send the RE from the slaves to the master.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 4, 2011 at 5:17 PM, dave fournier <davef at otter-rsch.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11-11-04 01:43 PM, Derek Seiple wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> That seems right. So all that needs to be done is to take
>>>>>>>>>>> the f's and the g's from the slaves and
>>>>>>>>>>> add them to the f and the g from the master.
>>>>>>>>>>>
>>>>>>>>>>> Keep in mind that this is for the simple block diagonal case,
>>>>>>>>>>> because the same random effects do not occur in two processes.
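
As a rough illustration of adding the slaves' f and g to the master's, here is a serial sketch. The Contribution struct and the combine function are hypothetical names, not ADMB code. In an MPI build the same sum would presumably be done with a reduction such as MPI_Reduce with MPI_SUM, but that is an assumption about the implementation, not something stated in this thread.

```cpp
#include <cstddef>
#include <vector>

// Function value and gradient computed by one process (hypothetical type).
struct Contribution {
  double f;
  std::vector<double> g;
};

// Add every slave's f and g to the master's.
// Assumes all gradients have the same length as the master's.
Contribution combine(const Contribution& master,
                     const std::vector<Contribution>& slaves)
{
  Contribution total = master;
  for (const Contribution& s : slaves) {
    total.f += s.f;
    for (std::size_t i = 0; i < total.g.size(); ++i)
      total.g[i] += s.g[i];
  }
  return total;
}
```

A plain sum is only valid under the condition given above: each random effect belongs to exactly one process, so no contribution is counted twice.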
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So the dvector uhat that gets calculated by
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> uhat=get_uhat_quasi_newton_block_diagonal(x,pfmin);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> is the inner optimization. And then
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    for(int ii=1;ii
>>>>>>>>>>>    {
>>>>>>>>>>>
>>>>>>>>>>>      {
>>>>>>>>>>>
>>>>>>>>>>>        // test newton raphson
>>>>>>>>>>>
>>>>>>>>>>>        //Hess.initialize();
>>>>>>>>>>>
>>>>>>>>>>>        int check=initial_params::stddev_scale(scale,uhat);
>>>>>>>>>>>
>>>>>>>>>>>        check=initial_params::stddev_curvscale(curv,uhat);
>>>>>>>>>>>
>>>>>>>>>>>        max_separable_g=0.0;
>>>>>>>>>>>
>>>>>>>>>>>        pmin->inner_opt_flag=1;
>>>>>>>>>>>
>>>>>>>>>>>        step=get_newton_raphson_info_block_diagonal(pfmin);
>>>>>>>>>>>
>>>>>>>>>>>        cout
>>>>>>>>>>>        cout
>>>>>>>>>>>        uhat+=step;
>>>>>>>>>>>
>>>>>>>>>>>        evaluate_function(uhat,pfmin);
>>>>>>>>>>>
>>>>>>>>>>>        pmin->inner_opt_flag=0;
>>>>>>>>>>>
>>>>>>>>>>>      }
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> is the Newton-Raphson part that adjusts uhat to make
>>>>>>>>>>>
>>>>>>>>>>> F_u(x,uhat(x))=e
>>>>>>>>>>>
>>>>>>>>>>> even smaller?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The above snippets get called within
>>>>>>>>>>>
>>>>>>>>>>> g=(*lapprox)(x,f,this);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The resultant g is the gradient of the function
>>>>>>>>>>>
>>>>>>>>>>> L(x) = f(x,uhat(x)).
>>>>>>>>>>>
>>>>>>>>>>> Correct?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This g and corresponding f and x then get fed to the outer
>>>>>>>>>>> maximization.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 4, 2011 at 2:20 PM, dave fournier <davef at otter-rsch.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11-11-04 10:57 AM, Derek Seiple wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> No, the inner optimization(s) use the function minimizer to do the
>>>>>>>>>>> initial optimization. The problem was that quasi-Newton cannot in
>>>>>>>>>>> general get the gradient small enough for the total function to be
>>>>>>>>>>> smooth enough.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Recall I told you that the chain rule is used in differentiating
>>>>>>>>>>>
>>>>>>>>>>>    F_u(x,uhat(x))=0
>>>>>>>>>>>
>>>>>>>>>>> to solve for uhat'(x) = -inv(F_uu) * F_xu
>>>>>>>>>>>
>>>>>>>>>>> But actually F_u(x,uhat(x))=e
>>>>>>>>>>>
>>>>>>>>>>> where e is (hopefully) small.
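
Spelled out, this is the standard implicit-function argument. As a notational assumption, the mixed second-derivative block is written here as F_{ux} (rows indexed by u), which is the same block as the F_xu above up to transposition:

```latex
\begin{align*}
F_u\bigl(x, \hat u(x)\bigr) &= 0 \\
\frac{d}{dx}\, F_u\bigl(x, \hat u(x)\bigr)
  = F_{ux} + F_{uu}\, \hat u'(x) &= 0 \\
\hat u'(x) &= -F_{uu}^{-1} F_{ux}
\end{align*}
```

The last line is exact only when the first line holds exactly. If F_u equals a nonzero residual e, the formula is an approximation whose quality depends on how small e is.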
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> To make e small, the Newton-Raphson is used after the quasi-Newton.
>>>>>>>>>>> For the block diagonal Hessian, each block has its own inner
>>>>>>>>>>> minimization and Newton-Raphson.
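
To see why a couple of Newton-Raphson steps shrink the residual e so quickly, here is a toy one-dimensional sketch. The objective F(x,u) = exp(u) - x*u and the function names are made up for illustration and are not ADMB code. The starting point u0 plays the role of the approximate quasi-Newton answer.

```cpp
#include <cmath>

// Toy objective F(x,u) = exp(u) - x*u, minimized over u at u = log(x), x > 0.
double F_u(double x, double u)      { return std::exp(u) - x; }  // dF/du
double F_uu(double /*x*/, double u) { return std::exp(u); }      // d2F/du2

// Apply n Newton-Raphson steps u <- u - F_u / F_uu starting from u0.
double newton_refine(double x, double u0, int n)
{
  double u = u0;
  for (int i = 0; i < n; ++i)
    u -= F_u(x, u) / F_uu(x, u);
  return u;
}
```

With x = 2 and u0 = 0.6, the residual |F_u| starts near 0.18 and each step roughly squares the error, so two or three steps go a long way once the starting point is reasonable.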
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> After this is done, the Laplace approximation is carried out.
>>>>>>>>>>> So it all looks like this:
>>>>>>>>>>>
>>>>>>>>>>> 1.) Inner quasi-Newton (uses dvariables)
>>>>>>>>>>>
>>>>>>>>>>> 2.) Newton-Raphson 1, using df1b2variable to calculate Hessian H;
>>>>>>>>>>>     update is inv(H)*g
>>>>>>>>>>>
>>>>>>>>>>> 3.) Newton-Raphson 2, using df1b2variable to calculate Hessian H;
>>>>>>>>>>>     update is inv(H)*g
>>>>>>>>>>>
>>>>>>>>>>> Calculate H using df1b2variables and then calculate some function of
>>>>>>>>>>> H using dvariables; this is essentially log(det(H)) for the Laplace
>>>>>>>>>>> approximation.
>>>>>>>>>>>
>>>>>>>>>>> Now do reverse AD on log(det(H)) to get derivatives back to where H
>>>>>>>>>>> is calculated.
>>>>>>>>>>>
>>>>>>>>>>> Paste that into the final reverse for df1b2variables.
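
For the block diagonal case, the log(det(H)) term factorizes: the determinant of a block diagonal matrix is the product of the block determinants, so the log-determinant is a sum over blocks. The sketch below computes it with a plain Cholesky factorization. It uses std::vector rather than ADMB types, the function names are hypothetical, and it does no automatic differentiation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// log(det(H)) of a symmetric positive definite matrix via Cholesky:
// H = L*L^T, so log det H = 2 * sum_i log(L_ii).
double log_det_spd(const Matrix& H)
{
  const std::size_t n = H.size();
  Matrix L(n, std::vector<double>(n, 0.0));
  double logdet = 0.0;
  for (std::size_t j = 0; j < n; ++j) {
    double d = H[j][j];
    for (std::size_t k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
    L[j][j] = std::sqrt(d);  // d > 0 is assumed (H positive definite)
    logdet += 2.0 * std::log(L[j][j]);
    for (std::size_t i = j + 1; i < n; ++i) {
      double s = H[i][j];
      for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
      L[i][j] = s / L[j][j];
    }
  }
  return logdet;
}

// For a block diagonal Hessian, sum the log-determinants of the blocks.
double log_det_block_diagonal(const std::vector<Matrix>& blocks)
{
  double total = 0.0;
  for (const Matrix& B : blocks) total += log_det_spd(B);
  return total;
}
```

Because the total is a plain sum over blocks, it seems plausible that each process could compute the terms for its own blocks and the partial sums be added afterwards, though that is an inference rather than something stated here.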
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> You should just make a tiny toy program and follow this through
>>>>>>>>>>> with the debugger.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So if I understand how it works, the inner maximizations happen in
>>>>>>>>>>> the laplace_aprox, correct? Then the outer maximization is done with
>>>>>>>>>>> the function_minimizer?
>>>>>>>>>>>
>>>>>>>>>>> If this is the case, then how should the following code be executed?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is OK, at least for the first go. Later one might want to
>>>>>>>>>>> parallelize the outer one as well, but the jobs are separate.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>      if (negdirections==0)
>>>>>>>>>>>
>>>>>>>>>>>      {
>>>>>>>>>>>
>>>>>>>>>>>        while (fmc.ireturn>=0)
>>>>>>>>>>>
>>>>>>>>>>>        {
>>>>>>>>>>>
>>>>>>>>>>>          fmc.fmin(f,x,g); //
>>>>>>>>>>>          if (fmc.ireturn>0)
>>>>>>>>>>>
>>>>>>>>>>>          {
>>>>>>>>>>>
>>>>>>>>>>>            if (ifn_trap)
>>>>>>>>>>>
>>>>>>>>>>>            {
>>>>>>>>>>>
>>>>>>>>>>>              if (ifn_trap==fmc.ifn && itn_trap == fmc.itn)
>>>>>>>>>>>
>>>>>>>>>>>              {
>>>>>>>>>>>
>>>>>>>>>>>                cout
>>>>>>>>>>>              }
>>>>>>>>>>>
>>>>>>>>>>>            }
>>>>>>>>>>>
>>>>>>>>>>>            g=(*lapprox)(x,f,this); //
>>>>>>>>>>>            if (bad_step_flag==1)
>>>>>>>>>>>
>>>>>>>>>>>            {
>>>>>>>>>>>
>>>>>>>>>>>              g=1.e+4;
>>>>>>>>>>>
>>>>>>>>>>>              f=2.*fmc.fbest;
>>>>>>>>>>>
>>>>>>>>>>>              bad_step_flag=0;
>>>>>>>>>>>
>>>>>>>>>>>            }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>            if (lapprox->init_switch==0)
>>>>>>>>>>>
>>>>>>>>>>>            {
>>>>>>>>>>>
>>>>>>>>>>>              if (f
>>>>>>>>>>>              {
>>>>>>>>>>>
>>>>>>>>>>>                lapprox->ubest=lapprox->uhat;
>>>>>>>>>>>
>>>>>>>>>>>              }
>>>>>>>>>>>
>>>>>>>>>>>            }
>>>>>>>>>>>
>>>>>>>>>>>          }
>>>>>>>>>>>
>>>>>>>>>>>        }
>>>>>>>>>>>
>>>>>>>>>>>      }
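
The loop above has a reverse-communication shape: the minimizer hands control back whenever it needs f and g at a new x, and the caller supplies them. Below is a toy sketch of that control flow only. ToyMinimizer is a hypothetical fixed-step gradient descent and is not ADMB's fmm class. The closed-form f and g stand in for the call g=(*lapprox)(x,f,this).

```cpp
#include <cmath>

// Toy reverse-communication minimizer (hypothetical; not ADMB's fmm).
struct ToyMinimizer {
  int ireturn = 0;     // >0: caller must evaluate f and g at x; <0: finished
  int itn = 0;         // iteration counter
  double step = 0.25;  // fixed step length
  double tol = 1e-8;   // gradient tolerance
  int max_itn = 1000;

  void fmin(double /*f*/, double& x, double g)
  {
    if (ireturn == 0) { ireturn = 1; return; }  // first call: request f, g
    if (std::fabs(g) < tol || itn >= max_itn) { ireturn = -1; return; }
    x -= step * g;   // propose a new point
    ++itn;
    ireturn = 1;     // ask the caller to evaluate there
  }
};

// Outer loop in the same shape as the quoted code, minimizing (x-3)^2.
double minimize_quadratic(double x0)
{
  ToyMinimizer fmc;
  double x = x0, f = 0.0, g = 0.0;
  while (fmc.ireturn >= 0) {
    fmc.fmin(f, x, g);
    if (fmc.ireturn > 0) {
      f = (x - 3.0) * (x - 3.0);  // stands in for the Laplace approximation
      g = 2.0 * (x - 3.0);        // and its gradient
    }
  }
  return x;
}
```

In this shape the minimizer is the only place that decides the next x, which matches the advice elsewhere in this thread to run a single outer optimization and have the other processes only contribute to f and g.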
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 3, 2011 at 7:40 PM, dave fournier <davef at otter-rsch.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11-11-03 02:52 PM, Derek Seiple wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> There is an inner maximization (actually a bunch of them)
>>>>>>>>>>> and an outer maximization. I assume that the failure is
>>>>>>>>>>> in the outer maximization. All you need to do is to compare
>>>>>>>>>>> the gradient and the function value that is going to the
>>>>>>>>>>> function minimizer. One or both of them must have changed.
>>>>>>>>>>>
>>>>>>>>>>> It is a waste of time worrying about what goes on inside the
>>>>>>>>>>> function minimizer itself. If you feed it the correct function
>>>>>>>>>>> value and gradient, it will work.
>>>>>>>>>>>
>>>>>>>>>>> But remember that you should only be running one outer maximization
>>>>>>>>>>> in the master. I'm worried about what the slaves are doing. You
>>>>>>>>>>> should make sure they only do what you want them to do.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> fmm::fmin in newfmin.cpp is tough to read because of all the gotos.
>>>>>>>>>>> But one thing that is common amongst all the examples I've run with
>>>>>>>>>>> 2 or more slaves is that they end up with
>>>>>>>>>>>
>>>>>>>>>>> Function minimizer: Step size  too small -- ialph=1
>>>>>>>>>>>
>>>>>>>>>>> It is difficult to track down what trips this, even with the
>>>>>>>>>>> debugger. Do you know what types of things cause that step size
>>>>>>>>>>> message? Particularly anything related to separable models.
>>>>>>>>>>>
>>>>>>>>>>> Figuring this out is important to understanding why the minimizer
>>>>>>>>>>> doesn't converge, because right after the step size message the
>>>>>>>>>>> function value blows up.
>>>>>>>>>>>
>>>>>>>>>>> Interestingly, this doesn't happen with only one slave. The
>>>>>>>>>>> minimizer converges.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Derek
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 3, 2011 at 10:26 AM, Derek Seiple <dseiple84 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Alternatively, did you compile with either
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> GNUMakefile.js or GNUMakefile.64bit?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> They should all be in one make file but I haven't done that.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 3, 2011 at 10:24 AM, Derek Seiple <dseiple84 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> What is the path you have set to ADMB_HOME? On my machine I have
>>>>>>>>>>> the following in .bashrc:
>>>>>>>>>>>
>>>>>>>>>>> export ADMB_HOME=/home/dseiple/admb-mpi/build/unix/unix
>>>>>>>>>>> export PATH=$ADMB_HOME/bin:$PATH
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If you look in orange.cpp there should be this at the top
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> extern ad_separable_manager * separable_manager;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If it isn't there, then you probably tried to compile with another
>>>>>>>>>>> version of ADMB. Or maybe you need to make clean in order for the
>>>>>>>>>>> modified sed script to make it into the build directory?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Derek
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 2, 2011 at 5:06 PM, dave fournier <davef at otter-rsch.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11-11-02 12:13 PM, Derek Seiple wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Your code won't compile for me anymore. I get
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> orange.cpp:12:8: error: 'ad_separable_manager' does not name a type
>>>>>>>>>>> orange.cpp: In member function 'virtual void model_parameters::userfunction()':
>>>>>>>>>>> orange.cpp:41:20: error: 'sb' was not declared in this scope
>>>>>>>>>>> orange.cpp:41:31: error: 'separable_bounds' was not declared in this scope
>>>>>>>>>>> orange.cpp: In function 'int main(int, char**)':
>>>>>>>>>>> orange.cpp:106:7: error: 'dp' was not declared in this scope
>>>>>>>>>>> orange.cpp: In member function 'virtual void df1b2_parameters::user_function()':
>>>>>>>>>>> orange.cpp:161:20: error: 'sb' was not declared in this scope
>>>>>>>>>>> orange.cpp:161:31: error: 'separable_bounds' was not declared in this scope
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Error: could not create orange.o
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I added another example and changed some code in
>>>>>>>>>>>
>>>>>>>>>>> laplace_approximation_calculator::get_uhat_quasi_newton_block_diagonal
>>>>>>>>>>>
>>>>>>>>>>> It appears that things converge in both models when I only have 1
>>>>>>>>>>> slave, but go to hell when I have 2 slaves.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Derek
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 1, 2011 at 4:05 PM, Derek Seiple <dseiple84 at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Dave,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Can you take a look at the latest version of what I have in the
>>>>>>>>>>> repository?
>>>>>>>>>>>
>>>>>>>>>>> When I run the example model (or liver_gamma in the RE examples
>>>>>>>>>>> folder) the RE portion doesn't converge properly and I can't
>>>>>>>>>>> figure out why. I have compared values with the debugger and it
>>>>>>>>>>> appears that the evaluation of
>>>>>>>>>>>
>>>>>>>>>>> fmc.fmin(f,x,g);
>>>>>>>>>>>
>>>>>>>>>>> and
>>>>>>>>>>>
>>>>>>>>>>> g=(*lapprox)(x,f,this);
>>>>>>>>>>>
>>>>>>>>>>> agree whether I use a slave or not, but the behavior is
>>>>>>>>>>> different.
>>>>>>>>>>>
>>>>>>>>>>> Is there anything in the underlying code that I might be
>>>>>>>>>>> overlooking?
>>>>>>>>>>>
>>>>>>>>>>> For example, I found that I had to add some code in gradcalc() to
>>>>>>>>>>> manually adjust INDVAR_LIST from the grad_stack class to account
>>>>>>>>>>> for splitting the separable function.
>>>>>>>>>>>
>>>>>>>>>>> Could there be something similar happening with the RE stuff that I
>>>>>>>>>>> don't see yet? Something with the Laplace? Or the minimizer?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Derek
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Oct 16, 2011 at 3:12 PM, dave fournier <davef at otter-rsch.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11-10-16 10:27 AM, Derek Seiple wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I started the program in phase 2 with and without a slave.
>>>>>>>>>>> You can see that both the function value and derivatives are
>>>>>>>>>>> different.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Dave,
>>>>>>>>>>>
>>>>>>>>>>> Nice work.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I'm running your orange example. Are there any special flags?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> No special flags. The way I run the example is just
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ./orange -master or
>>>>>>>>>>>
>>>>>>>>>>> ./orange -master -nslaves n
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As near as I can tell, in user function both master and slave
>>>>>>>>>>> are starting with i=1 in the for loop below
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     for(i=sb->indexmin();i<=sb->indexmax();i++)
>>>>>>>>>>>
>>>>>>>>>>>    {
>>>>>>>>>>>
>>>>>>>>>>>        ii = 0;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> When I started working on this I was having issues with the
>>>>>>>>>>> non-RE portion of the example. Since I was trying to get the RE
>>>>>>>>>>> stuff to work, I just made the sb->indexmin() call work so that
>>>>>>>>>>> both slave and master would do all the non-RE stuff, and then once
>>>>>>>>>>> RE kicked in they would change so that slave and master would take
>>>>>>>>>>> different parts. Now that I understand how everything is working,
>>>>>>>>>>> I think I can change it (only need to change one line of code) so
>>>>>>>>>>> that the separable calls are split up all the time.
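
One simple way to have master and slaves "take different parts" is to give each process a contiguous, inclusive index range. The helper below is a hypothetical sketch of such a split. The name separable_range is made up, and this is not the separable_bounds class from the parallel branch, whose actual splitting rule is not shown in this thread.

```cpp
#include <algorithm>
#include <utility>

// Inclusive index range [first, second] of the separable calls assigned to
// process `rank` out of `nproc`, for indices imin..imax (hypothetical helper).
std::pair<int, int> separable_range(int imin, int imax, int rank, int nproc)
{
  const int n = imax - imin + 1;
  const int base = n / nproc;
  const int extra = n % nproc;  // the first `extra` ranks get one more index
  const int lo = imin + rank * base + std::min(rank, extra);
  const int hi = lo + base - 1 + (rank < extra ? 1 : 0);
  return {lo, hi};
}
```

With rank 0 as the master, each process would loop from its own lower bound to its own upper bound instead of every process starting at i=1. If there are fewer indices than processes, some ranks get an empty range (second < first) and their loop body never runs.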
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On a side note, have you been able to run the example on your
>>>>>>>>>>> machine? I know you said it crashed for you. I am still figuring
>>>>>>>>>>> out what some of the subtle issues are. When I run it with
>>>>>>>>>>> ./orange -master (i.e. one slave) it converges and the par file
>>>>>>>>>>> has values pretty close to what you would get if you just ran
>>>>>>>>>>> ./orange (no slave), but the gradient values are a little off and
>>>>>>>>>>> it complains about the covariance matrix. When you run
>>>>>>>>>>> ./orange -master -nslaves 2 the convergence is a little worse and
>>>>>>>>>>> the gradient values are even worse, that is, not small, and it
>>>>>>>>>>> complains about the Hessian. (See the attached files.)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This leads me to believe I am not handling the gradients
>>>>>>>>>>> properly. Late on Friday I realized I wasn't synchronizing the
>>>>>>>>>>> re_objective_function_value, but I haven't looked into that
>>>>>>>>>>> further yet. So I don't know if that will help things or if I am
>>>>>>>>>>> doing something wrong with the gradients.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Derek
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Oct 16, 2011 at 11:55 AM, dave fournier <davef at otter-rsch.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> BTW the best way to use the debugger ddd in Linux is to put the
>>>>>>>>>>> following code at the beginning of the program:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     int ii = 0;
>>>>>>>>>>>     char hostname[256];
>>>>>>>>>>>     gethostname(hostname, sizeof(hostname));
>>>>>>>>>>>     printf("PID %d on %s ready for attach\n", getpid(), hostname);
>>>>>>>>>>>     fflush(stdout);
>>>>>>>>>>>     while (0 == ii)
>>>>>>>>>>>         sleep(5);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It will print out its PID and wait.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Then you can attach to it with the debugger with
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     ddd orange pid
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Do a backtrace, set ii=1, set a breakpoint wherever you want,
>>>>>>>>>>> and continue.