From otter at otter-rsch.com Sun Mar 1 11:06:30 2015
From: otter at otter-rsch.com (otter)
Date: Sun, 01 Mar 2015 11:06:30 -0800
Subject: [ADMB Users] some light reading
Message-ID: <54F36336.6090108@otter-rsch.com>

This is apropos of a question by Maunder a while back about whether random effects would ever be included in MULTIFAN-CL. Every once in a while I read a fish model paper to convince myself that nothing ever gets any better. This is a link to a paper adding random effects to some structures in the fish model Stock Synthesis.

http://icesjms.oxfordjournals.org/content/72/1/178.short

The authors eschew the use of ADMB's random effects package in favour of an ad hoc approach employing ADMB together with its Hessian calculations, suitably modified. They justify this approach with the statement:

  Similarly, implementing a model in ADMB-RE requires a large overhead of time and
  expertise due to the practical necessity of finding a "separable" formulation of the
  population model (Bolker et al., 2013) and still may take a considerable amount of
  time during optimization.

This is more or less completely false. First, it is not necessary to have a separable formulation of the model. It is quite feasible to have a model with thousands of random effects without invoking separability. For a fish model where the random effect (say a catchability deviation) affects the population for the rest of the time after it occurs, separability is of no use anyway. The statement that it "may take a considerable amount of time" is true for general RE models of course, but they are considering an application where they integrate over all but a small number of variance parameters (say up to 3 or 5). In ADMB-RE this is equivalent to declaring almost all the parameters to be of type random effect. This has been discussed before (although the concept seemed to baffle Bates) in the context of generalizing restricted maximum likelihood estimation.

The point is that ADMB-RE does the parameter estimation by first doing an inner optimization over the random effects. This is essentially equivalent to the authors' use of ADMB. The difference is that once the inner optimization is done, ADMB-RE calculates the Hessian in a much more efficient manner than ADMB. It also uses this Hessian to "polish" the estimates so that they are much closer to the minimizing values, in the sense that the maximum gradient magnitude is typically reduced from something like 1.e-4 to 1.e-12. It then also computes the derivatives of the function with respect to the variance parameters in a very efficient manner. This enables one to use derivative-based minimization rather than the incredibly inefficient Nelder-Mead (which may work for a really well-behaved problem with 3 parameters, but will never extend well to more parameters or more badly conditioned optimization problems).

So ADMB-RE is already almost perfectly equipped to handle this model. A small extension is needed to permit it to have bounded random effects compatible with the bounded parameters in ADMB.
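For concreteness, the two-level structure referred to above looks like this in generic notation (a sketch; the notation is mine, not the paper's). With data y, a vector u of q random effects, and variance parameters theta, ADMB-RE maximizes the Laplace approximation to the marginal likelihood

  L(theta) = integral of f(y,u;theta) du
           ~ f(y,uhat(theta);theta) * (2*pi)^(q/2) * det(H(theta))^(-1/2)

where uhat(theta) = argmax_u log f(y,u;theta) is the inner optimization and H(theta) is the negative Hessian of log f with respect to u, evaluated at uhat(theta). The outer, derivative-based minimization is over theta alone, which is why having only a handful of variance parameters keeps it cheap once uhat and H are computed efficiently.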
From davef at otter-rsch.com Fri Mar 13 11:31:56 2015
From: davef at otter-rsch.com (dave fournier)
Date: Fri, 13 Mar 2015 11:31:56 -0700
Subject: [ADMB Users] When and how to scale parameters in ADMB?
Message-ID: <55032D1C.4080209@otter-rsch.com>

On 03/12/2015 06:17 PM, Shanae Allen - NOAA Affiliate wrote:

Hi,

I posted this response to the list in case others are interested in this stuff. You are correct that scaling alone is not the final solution to reparameterizing functions in such a way that they are easy to minimize.

I think the gold standard for this, if you were omniscient, is the Morse lemma. It says that for a "nice" function f of n variables (parameters), say f(x_1,x_2,...,x_n), which has a minimum with the value of the function at the minimum being b, there exists a reparameterization of the function given by the n functions g_i with

  x_i = g_i(y_1,...,y_n),   i=1,...,n

such that

  f(g_1(y_1,...,y_n),...,g_n(y_1,...,y_n)) = b + y_1^2 + ... + y_n^2     (1)

So the functions g_i provide a "perfect" reparameterization of f.

Of course to find the g_i you already need to know where the minimum is located, so by itself the Morse lemma is not very useful for this problem. It does however give one an idea of what we are trying to accomplish by reparameterizing. Rescaling is really the last step, in that if

  f(x_1,...,x_n) = b + a_1*x_1^2 + ... + a_n*x_n^2

then setting y_i^2 = a_i*x_i^2, or y_i = sqrt(a_i)*x_i, provides the required reparameterization.

In general reparameterization is a bit of an art. There are however some general principles. First is to "non-dimensionalize" the problem. I'll give you an example: the logistic curve

  L_i = Lmin + (Lmax-Lmin)/(1+exp(-b*(t_i-tmid)))     (2)

One should rescale and translate the L_i to go from -1 to 1, and the t_i to go from -1 to 1. This removes the dependence on the units used to measure the L's and t's. Having solved the problem one must transform the estimates for Lmin, Lmax, b, and tmid back to the "real" ones. (Left as an exercise for the student.) Once you know the transformation you can set this up to be done automatically in ADMB by making the real parameters sdreport variables.

So say you have done that to the model. You may still have trouble. This is because Lmin and Lmax are the lower and upper asymptotes, so they are not actually observed.

The next principle of parameterization is that one should use parameters which are well identified. A mental test of this is to ask yourself if you already have a good idea what the values of these parameters are. For the model above suppose that you have n ordered observations

  t_1 < ... < t_n

Let Lone be the true value for L at t=t_1 and Ln be the true value for L at t=t_n. Clearly we know a lot more about these parameters than we do about Lmin and Lmax. But there is also another advantage. If we use Lone and Ln as the parameters, the predicted values for L_1 and L_n are independent of the estimate for b. In other words, b now just spaces the predicted values between Lone and Ln. Using the original parameters Lmin and Lmax you will find that many different combinations of the parameters produce almost the same predicted values for L_1 and L_n. We say that there are interactions, or confounding, between the parameters. We want to remove the confounding. Reducing the problem to the form in equation (1) minimizes the confounding.

To see that you really understand this stuff you should figure out how to rewrite equation (2) in terms of Lone, Ln, b, and tmid; one way to do it is sketched below.
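Here is one such rewrite as a TPL fragment (a sketch: the data names n, t, L_obs and the least-squares objective are illustrative, not from this thread). Writing s(t) = 1/(1+exp(-b*(t-tmid))), equation (2) becomes L(t) = Lone + (Ln-Lone)*(s(t)-s(t_1))/(s(t_n)-s(t_1)):

DATA_SECTION
  init_int n
  init_vector t(1,n)
  init_vector L_obs(1,n)
PARAMETER_SECTION
  init_number Lone      // well identified: L near the first observation
  init_number Ln        // well identified: L near the last observation
  init_number b
  init_number tmid
  sdreport_number Lmin  // "real" parameters recovered automatically
  sdreport_number Lmax
  objective_function_value f
PROCEDURE_SECTION
  dvariable s1=1.0/(1.0+exp(-b*(t(1)-tmid)));
  dvariable sn=1.0/(1.0+exp(-b*(t(n)-tmid)));
  dvar_vector pred(1,n);
  for (int i=1;i<=n;i++)
  {
    dvariable si=1.0/(1.0+exp(-b*(t(i)-tmid)));
    pred(i)=Lone+(Ln-Lone)*(si-s1)/(sn-s1);
  }
  Lmin=Lone-(Ln-Lone)*s1/(sn-s1);         // invert to recover the asymptotes
  Lmax=Lone+(Ln-Lone)*(1.0-s1)/(sn-s1);
  f=norm2(L_obs-pred);

Note that at t_1 and t_n the predictions are exactly Lone and Ln, independent of b, which is the decoupling described above.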
Now, to see what the set_scalefactor() function does, if anything, you should compile the source with debugging turned on. This lets you really see what it does, rather than relying on what someone tells you. (Who knows what is really in the code any more?)

Then make a simple example like this:

DATA_SECTION
PARAMETER_SECTION
  init_bounded_number x(-10,10)
 !! x.set_scalefactor(200.);
  init_number y
  objective_function_value f
PROCEDURE_SECTION
  f=0.5*(x*x+y*y);

Bounded numbers are more complicated than unbounded ones, so this is a good example to look at. If you step through the program you should get to line 172 in df1b2qnm.cpp

      {
 172    dvariable vf=0.0;
>173    vf=initial_params::reset(dvar_vector(x));
 174    *objective_function_value::pobjfun=0.0;
 175    pre_userfunction();
 176    if ( no_stuff ==0 && quadratic_prior::get_num_quadratic_prior()>0)
 177    {

The reset function takes the x vector from the function minimizer and puts the values into the model, rescaling if desired. Stepping into that code you should eventually get to line 538 in model.cpp

 533    const int& ii, const dvariable& pen)
 534  {
 535    if (!scalefactor)
 536      ::set_value(*this,x,ii,minb,maxb,pen);
 537    else
>538      ::set_value(*this,x,ii,minb,maxb,pen,scalefactor);
 539  }

Note that the field scalefactor is nonzero. It was set by set_scalefactor(). Now step into that line and you get to line 56 in the file set.cpp.

 54   void set_value(const prevariable& _x,const dvar_vector& v,const int& _ii,
 55     double fmin, double fmax,const dvariable& _fpen,double s)
>56   {
 57     int& ii=(int&)_ii;
 58     prevariable& x=(prevariable&) _x;
 59     dvariable& fpen=(dvariable&) _fpen;
 60     x=boundp(v(ii++),fmin,fmax,fpen,s);
 61   }

Now finally step into line 60 and you end up at line 76 in boundfun.cpp.

 75   dvariable boundp(const prevariable& x, double fmin, double fmax,const prevariable& _fpen,double s)
 76   {
 77     return boundp(x/s,fmin,fmax,_fpen);
 78   }

You see that at line 77 x is divided by s before being sent to the bounding function boundp. So x comes out of the function minimizer and gets divided by s before going into the model:

  minimizer -----> x -----> x/s ---> y ---> boundp -----> model

To get the initial x value for the minimizer there must be a corresponding sequence like

  model ----> boundpin ----> y ----> y*s ----> x -----> minimizer

where boundpin is the inverse function of boundp. Note that one divides by s, so if you want to make the gradient smaller by a factor of 100, one should use

  set_scalefactor(100.);
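You can check that bookkeeping numerically. Here is a small self-contained C++ sketch (not ADMB source: boundp and boundpin below are logistic-style stand-ins, and the real ADMB versions also carry a penalty argument) showing that dividing by s inside the bounding step shrinks the gradient the minimizer sees by a factor of s:

#include <cmath>
#include <cstdio>

// Logistic-style stand-ins for ADMB's bounding transform and its inverse.
double boundp(double x, double fmin, double fmax)
{
  return fmin + (fmax - fmin) / (1.0 + std::exp(-x));
}
double boundpin(double y, double fmin, double fmax)
{
  return -std::log((fmax - y) / (y - fmin));
}

// The TPL example's objective in model space is f(y) = 0.5*y^2;
// seen from the minimizer it is f(boundp(x/s)).
double f_of_x(double x, double s, double fmin, double fmax)
{
  double y = boundp(x / s, fmin, fmax);   // minimizer value -> model value
  return 0.5 * y * y;
}

int main()
{
  double fmin = -10.0, fmax = 10.0;
  double y0 = 2.0;   // a model-scale point
  double h = 1e-6;   // central-difference step
  double svals[2] = {1.0, 100.0};
  for (int i = 0; i < 2; i++)
  {
    double s = svals[i];
    double x0 = s * boundpin(y0, fmin, fmax);   // model value -> minimizer value
    double g = (f_of_x(x0 + h, s, fmin, fmax)
              - f_of_x(x0 - h, s, fmin, fmax)) / (2.0 * h);
    std::printf("s = %6.1f   df/dx = %g\n", s, g);
  }
  return 0;
}

At the same model point the printed gradient for s = 100 is exactly 100 times smaller than for s = 1, matching the x/s and y*s diagrams above.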
Does that make sense?

Dave

> Hi Dave,
>
> You helped me out a few years ago when I was just starting to use ADMB, so I thought I'd try again. I posted a related question to the ADMB list, but I don't know if it went through. I'm trying to understand when and how to scale parameters.
>
> From what I gather there are three reasons to do this: 1) when the likelihood function is highly sensitive to a given parameter (Nocedal and Wright (1999)), 2) when a parameter has a high gradient (Hans), and 3) when the condition number of the Hessian is very large (your post here: http://r.789695.n4.nabble.com/Complicated-nls-formula-giving-singular-gradient-message-td3085852.html ). I understand these are all related issues; however, for my simulation exercise a single 'problem parameter' does not always satisfy all of these (e.g., a parameter with a high gradient does not have a relatively high/low eigenvalue).
>
> So my question is how to apply scaling factors in a structured way, and is it OK to scale many parameters? Also, how do you do this when you're fitting many simulated datasets (and starting from many different starting points)? Finally, I'd very much appreciate a reference or code where I can find out what the set_scalefactor function in ADMB does.
>
> Thank you Dave - any tips would be greatly appreciated!
>
> Shanae Allen
>
> Nocedal, J., & Wright, S. (1999). Numerical optimization. New York: Springer.
>
> --
> Shanae Allen-Moran
> National Marine Fisheries Service
> 110 Shaffer Rd.
> Santa Cruz, CA 95060
> Phone: (831) 420-3970
> Email: shanae.allen at noaa.gov
> Website: http://swfsc.noaa.gov/SalmonAssessment/

From davef at otter-rsch.com Fri Mar 13 17:28:40 2015
From: davef at otter-rsch.com (dave fournier)
Date: Fri, 13 Mar 2015 17:28:40 -0700
Subject: [ADMB Users] When and how to scale parameters in ADMB?
References: <55032D1C.4080209@otter-rsch.com>
Message-ID: <550380B8.3040200@otter-rsch.com>

On 03/13/2015 05:04 PM, Shanae Allen - NOAA Affiliate wrote:

Yes, it is probably more natural to multiply by the scale factor, but that day I decided to divide.

> Thank you Dave for the detailed response. I'm still digesting the first part, but I'm making progress. As for the set_scalefactor() function, I thought that's what it was doing, but this page: http://www.admb-project.org/examples/function-minimization/parameter-scaling suggests otherwise ('...which makes the function minimizer work internally with b_internal = 0.001*b', where 0.001 is the scale factor). Thanks for taking the time to respond so thoroughly!
>
> Shanae
>
> On Fri, Mar 13, 2015 at 11:31 AM, dave fournier wrote:
> [Full quote of Dave's message of Fri, 13 Mar 2015 11:31:56 snipped; see above.]
From quentin.schorpp at ti.bund.de Wed Mar 25 01:38:10 2015
From: quentin.schorpp at ti.bund.de (Quentin Schorpp)
Date: Wed, 25 Mar 2015 09:38:10 +0100
Subject: [ADMB Users] confirm 51a51dcb273886f77fd6b16cdaa5d31476e7b8cc
Message-ID: <551273F2.4000704@ti.bund.de>

Hello,

I'm using the glmmADMB package with R. On the Internet I found many very helpful posts and threads, which all came from the ADMB users forum. Sometimes I have some difficulties and want to ask other users for ideas or help. I already subscribed to the users mailing list, but it offers very few options besides changing profile properties after the login. I'm not really sure whether getting a login to contribute to the Users page on www.admb-project.org will help me solve my problems. Do you think so? Is it even possible for me to register to get a login?
Kind regards,
Quentin Schorpp

On 24.03.2015 12:04, users-request at admb-project.org wrote:
> Mailing list subscription confirmation notice for mailing list Users
>
> We have received a request from 134.110.38.47 for subscription of your email address, "quentin.schorpp at ti.bund.de", to the users at admb-project.org mailing list. To confirm that you want to be added to this mailing list, simply reply to this message, keeping the Subject: header intact. Or visit this web page:
>
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> Or include the following line -- and only the following line -- in a message to users-request at admb-project.org:
>
> confirm 51a51dcb273886f77fd6b16cdaa5d31476e7b8cc
>
> Note that simply sending a `reply' to this message should work from most mail readers, since that usually leaves the Subject: line in the right form (additional "Re:" text in the Subject: is okay).
>
> If you do not wish to be subscribed to this list, please simply disregard this message. If you think you are being maliciously subscribed to the list, or have any other questions, send them to users-owner at admb-project.org.

--
Quentin Schorpp, M.Sc.
Thünen Institute of Biodiversity
Bundesallee 50
38116 Braunschweig (Germany)
Tel: +49 531 596-2524
Fax: +49 531 596-2599
Mail: quentin.schorpp at ti.bund.de
Web: http://www.ti.bund.de

The Johann Heinrich von Thünen Institute, Federal Research Institute for Rural Areas, Forestry and Fisheries (Thünen Institute in brief) consists of 15 specialized institutes that carry out research and provide policy advice in the fields of economy, ecology and technology.

From shanae.allen at noaa.gov Mon Mar 9 16:29:22 2015
From: shanae.allen at noaa.gov (Shanae Allen - NOAA Affiliate)
Date: Mon, 09 Mar 2015 23:29:22 -0000
Subject: [ADMB Users] When and how to use set_scalefactor()?

Hi ADMB users!

I'm trying to understand when and how to scale parameters. The likelihood surface for the problem I'm working on is a narrow, curved valley (when varying the two most sensitive parameters), leading to high gradient components and non-positive-definite Hessians for some simulated data sets and starting values.

According to Nocedal and Wright (1999), when the likelihood function is highly sensitive to a given parameter (x), you should set the scaling factor (d) to some large number, such that a new parameter z is equal to x/d. I'd like to use the built-in function in ADMB, set_scalefactor(s), but I'm unsure of how it treats the scaling factor. The ADMB site (http://www.admb-project.org/examples/function-minimization/parameter-scaling) suggests the new parameter z is equal to s*x, thus s = 1/d. However, for my problem I obtain much better convergence when s is large (thus d is close to zero), which makes me wonder how ADMB is treating the scaling factor and whether I'm using it appropriately.

Any tips would be greatly appreciated! Thanks!

Shanae Allen

Nocedal, J., & Wright, S. (1999). Numerical optimization. New York: Springer.

--
Shanae Allen-Moran
National Marine Fisheries Service
110 Shaffer Rd.
Santa Cruz, CA 95060
Phone: (831) 420-3970
Email: shanae.allen at noaa.gov
Website: http://swfsc.noaa.gov/SalmonAssessment/
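An aside on the conditioning point in this question (a toy illustration, not from the thread): take

  f(x_1,x_2) = x_1^2 + 10^6 * x_2^2

The Hessian is diag(2, 2*10^6), so its condition number is 10^6 and the surface is a long narrow valley along the x_1 axis. If the minimizer instead works with z_2 = 1000*x_2 (i.e., x_2 = z_2/1000), then

  f = x_1^2 + z_2^2

and the condition number is 1. In the z = x/d notation of Nocedal and Wright this is d_2 = 1/1000. The right scale is set by the curvature (the eigenvalues of the Hessian), not by the size of the gradient alone, which is why a high-gradient parameter need not be the one with an extreme eigenvalue.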
From shanae.allen at noaa.gov Fri Mar 13 17:04:58 2015
From: shanae.allen at noaa.gov (Shanae Allen - NOAA Affiliate)
Date: Fri, 13 Mar 2015 17:04:58 -0700
Subject: [ADMB Users] When and how to scale parameters in ADMB?
In-Reply-To: <55032D1C.4080209@otter-rsch.com>

Thank you Dave for the detailed response. I'm still digesting the first part, but I'm making progress. As for the set_scalefactor() function, I thought that's what it was doing, but this page: http://www.admb-project.org/examples/function-minimization/parameter-scaling suggests otherwise ('...which makes the function minimizer work internally with b_internal = 0.001*b', where 0.001 is the scale factor). Thanks for taking the time to respond so thoroughly!

Shanae

On Fri, Mar 13, 2015 at 11:31 AM, dave fournier wrote:
> [Full quote of Dave's message of Fri, 13 Mar 2015 11:31:56 snipped; see above.]
--
Shanae Allen-Moran
National Marine Fisheries Service
110 Shaffer Rd.
Santa Cruz, CA 95060
Phone: (831) 420-3970
Email: shanae.allen at noaa.gov
Website: http://swfsc.noaa.gov/SalmonAssessment/

From davef at otter-rsch.com Sat Mar 28 19:54:14 2015
From: davef at otter-rsch.com (dave fournier)
Date: Sat, 28 Mar 2015 19:54:14 -0700
Subject: [ADMB Users] confirm 51a51dcb273886f77fd6b16cdaa5d31476e7b8cc
In-Reply-To: <551273F2.4000704@ti.bund.de>
Message-ID: <55176956.4080603@otter-rsch.com>

Maybe you should just ask your question and see what happens.