[ADMB Users] When and how to scale parameters in ADMB?
dave fournier
davef at otter-rsch.com
Fri Mar 13 17:28:40 PDT 2015
On 03/13/2015 05:04 PM, Shanae Allen - NOAA Affiliate wrote:
Yes, it is probably more natural to multiply by the scaling factor,
but that day I decided to divide.
> Thank you Dave for the detailed response. I'm still digesting the
> first part, but I'm making progress. As for the set_scalefactor()
> function, I thought that's what it was doing but this page:
> http://www.admb-project.org/examples/function-minimization/parameter-scaling
> suggests otherwise ('...which makes the function minimizer work
> internally with b_internal = 0.001*b', where 0.001 is the scale factor).
> Thanks for taking the time to respond so thoroughly!
> Shanae
>
>
> On Fri, Mar 13, 2015 at 11:31 AM, dave fournier <davef at otter-rsch.com> wrote:
>
> On 03/12/2015 06:17 PM, Shanae Allen - NOAA Affiliate wrote:
>
> Hi,
>
> I posted this response to the list in case others are interested in
> this stuff. You are correct that scaling alone is not the final
> solution
> to reparameterizing functions in such a way that they are easy to
> minimize.
>
> I think the gold standard for this, if you were omniscient, is the
> Morse lemma. It says that for a "nice" function f of n variables
> (parameters), say f(x_1,x_2,...,x_n), which has a minimum, with the
> value of the function at the minimum being b, there exists a
> reparameterization of the function given by the n functions g_i with
>
> x_i = g_i(y_1,...,y_n), i=1,...,n
>
> such that
>
> f(g_1(y_1,...,y_n),...,g_n(y_1,...,y_n)) = b + y_1^2 + ... + y_n^2   (1)
>
> So the functions g_i provide a "perfect" reparameterization of f.
>
> Of course, to find the g_i you already need to know where the minimum
> is located, so by itself the Morse lemma is not very useful for this
> problem. It does, however, give one an idea of what we are trying to
> accomplish by reparameterizing. Rescaling is really the last step,
> in that if
>
> f(x_1,...,x_n) = b + a_1*x_1^2 + ... + a_n*x_n^2
>
> then setting y_i^2 = a_i*x_i^2, or y_i = sqrt(a_i)*x_i,
>
> provides the required reparameterization.
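This last rescaling step can be checked numerically. A minimal Python sketch, with made-up curvatures a_i and minimum value b:

```python
import math

# Badly scaled quadratic: the two curvatures differ by a factor of 1e6.
# (The values of a and b here are invented for illustration.)
a = [1.0e4, 1.0e-2]
b = 3.0

def f(x):
    return b + a[0]*x[0]**2 + a[1]*x[1]**2

# Reparameterize with y_i = sqrt(a_i)*x_i, i.e. x_i = y_i/sqrt(a_i).
def f_reparam(y):
    x = [y[i] / math.sqrt(a[i]) for i in range(2)]
    return f(x)

# In the y coordinates the function is exactly b + y_1^2 + y_2^2,
# so the Hessian is 2*I and its condition number is 1 instead of 1e6.
print(f_reparam([0.5, 0.5]))  # b + 0.25 + 0.25 = 3.5
```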
>
> In general reparameterization is a bit of an art. There are, however,
> some general principles. The first is to nondimensionalize the problem.
> I'll give you a few examples. Consider the logistic curve
>
> L_i = Lmin + (Lmax-Lmin)/(1+exp(-b*(t_i-tmid)))   (2)
>
> Then one should rescale and translate the L_i to go from -1 to 1
> and the t_i to go from -1 to 1. This removes the dependence on the
> units used to measure the L's and t's. Having solved the problem
> one must transform the estimates for the Lmin, Lmax, b, and tmid
> back to the "real" ones. (Left as an exercise for the student.)
> Once you know the transformation you can set this up to be done
> automatically in ADMB by making the real parameters sdreport
> variables.
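The rescaling and translation step can be sketched in a few lines of Python (illustrative only, not ADMB code; the data are invented):

```python
# Map a vector of observations linearly onto [-1, 1].
def to_unit_interval(v):
    lo, hi = min(v), max(v)
    return [2.0*(x - lo)/(hi - lo) - 1.0 for x in v]

t = [1.0, 2.0, 4.0, 8.0]      # observation times (invented)
L = [10.0, 14.0, 19.0, 21.0]  # observed lengths (invented)

t_s = to_unit_interval(t)     # t_s runs from -1 to 1
L_s = to_unit_interval(L)     # L_s runs from -1 to 1

# Fit the model to (t_s, L_s); afterwards back-transform the estimates,
# e.g. tmid = 0.5*(min(t)+max(t)) + 0.5*(max(t)-min(t))*tmid_scaled.
print(t_s[0], t_s[-1])  # -1.0 1.0
```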
>
> So say you have done that to the model. You may still have trouble,
> because Lmin and Lmax are the lower and upper asymptotes, which are
> not actually observed.
>
> The next principle of parameterization is that one should use
> parameters which are well identified. A mental test of this is to
> ask yourself whether you already have a good idea of the values of
> these parameters.
> For the model above suppose that you have n ordered observations
>
> t_1 < ... < t_n
>
> Let Lone be the true value for L at t=t_1 and Ln be the true value
> for
> L at t=t_n. Clearly we know a lot more about these parameters than
> we do for Lmin and Lmax. But there is also another advantage.
> If we use Lone and Ln as the parameters, the predicted values for
> L_1 and L_n are independent of the estimate for b. In other words
> b now just spaces the predicted values between Lone and Ln.
> Using the original parameters Lmin and Lmax you will find that
> many different combinations of the parameters produce almost the same
> predicted values for L_1 and L_n. We say that there are interactions
> or confounding between the parameters. We want to remove the
> confounding.
>
> Reducing the problem to the form in equation (1) minimizes the
> confounding.
>
> To see that you really understand this stuff you should figure out
> how to rewrite the equation (2) in terms of Lone,Ln,b, and tmid.
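Without spoiling the algebra, the key claim, that with Lone and Ln as parameters the predictions at t_1 and t_n no longer depend on b, can be verified numerically. A Python sketch (the form of L_pred below is one possible answer to the exercise, and all numbers are invented):

```python
import math

def s(t, b, tmid):
    """Standard logistic in t."""
    return 1.0/(1.0 + math.exp(-b*(t - tmid)))

def L_pred(t, Lone, Ln, b, tmid, t1, tn):
    # Equation (2) rewritten so that L_pred(t1) = Lone and
    # L_pred(tn) = Ln exactly, for any b and tmid.
    s1, sn = s(t1, b, tmid), s(tn, b, tmid)
    return Lone + (Ln - Lone)*(s(t, b, tmid) - s1)/(sn - s1)

t1, tn = 1.0, 8.0
for b in (0.1, 1.0, 5.0):
    # The endpoint predictions are 10.0 and 21.0 regardless of b;
    # b only controls the spacing of the points in between.
    print(L_pred(t1, 10.0, 21.0, b, 4.0, t1, tn),
          L_pred(tn, 10.0, 21.0, b, 4.0, t1, tn))
```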
>
> Now, to see what the set_scalefactor() function actually does (if
> anything), compile the source with debugging turned on. That lets
> you see for yourself what it does, rather than relying on what
> someone tells you. (Who knows what is really in the code any more?)
>
> Then make a simple example like this
>
> DATA_SECTION
> PARAMETER_SECTION
>   init_bounded_number x(-10,10)
>  !! x.set_scalefactor(200.);
>   init_number y
>   objective_function_value f
> PROCEDURE_SECTION
>   f=0.5*(x*x+y*y);
>
> Bounded numbers are more complicated than unbounded ones so this
> is a good example to look at.
>
> If you step through the program you should get to line 172 in
> df1b2qnm.cpp
>
> {
> 172 dvariable vf=0.0;
> > 173 vf=initial_params::reset(dvar_vector(x));
> 174 *objective_function_value::pobjfun=0.0;
> 175 pre_userfunction();
> 176 if ( no_stuff ==0 &&
> quadratic_prior::get_num_quadratic_prior()>0)
> 177 {
>
> The reset function takes the x vector from the function minimizer
> and puts the values into the model, rescaling if desired.
>
> Stepping into that code you should eventually get to line 538
> in model.cpp:
>
> 533 const int& ii, const dvariable& pen)
> 534 {
> 535 if (!scalefactor)
> 536 ::set_value(*this,x,ii,minb,maxb,pen);
> 537 else
> >538 ::set_value(*this,x,ii,minb,maxb,pen,scalefactor);
> 539 }
>
> Note that the field scalefactor is nonzero. It was set by
> set_scalefactor().
>
> Now step into that line and you get to line 56 in the file set.cpp.
>
> 54 void set_value(const prevariable& _x,const dvar_vector&
> v,const int& _ii,
> 55 double fmin, double fmax,const dvariable& _fpen,double s)
> >56 {
> 57 int& ii=(int&)_ii;
> 58 prevariable& x=(prevariable&) _x;
> 59 dvariable& fpen=(dvariable&) _fpen;
> 60 x=boundp(v(ii++),fmin,fmax,fpen,s);
> 61 }
>
> Now finally step into line 60 and you end up at line 76 in
> boundfun.cpp.
>
> 75 dvariable boundp(const prevariable& x, double fmin, double
> fmax,const prevariable& _fpen,double s)
> 76 {
> 77 return boundp(x/s,fmin,fmax,_fpen);
> 78 }
>
> You see that at line 77, x is divided by s before being sent to the
> bounding function boundp. So x comes out of the function minimizer
> and gets divided by s before going into the model:
>
> minimizer -----> x ----- x/s ---> y ---> boundp -----> model
>
> To get the initial x value for the minimizer there must be a
> corresponding
> sequence like
>
> model ----> boundpin ----> y ----> y*s ----> x -----> minimizer
>
> where boundpin is the inverse function of boundp.
>
> Note that one divides by s, so if you want to make the
> gradient smaller by a factor of 100, one should use
>
> set_scalefactor(100.);
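The direction of the scaling can be confirmed with a small numerical experiment (a Python sketch of the bookkeeping above, not of the ADMB internals):

```python
# The minimizer works with x; the model sees y = x/s, so the gradient
# the minimizer sees is (1/s) * df/dy.
s = 100.0

def f(y):           # objective in model coordinates
    return 0.5*y*y

def g(x):           # what the minimizer sees after set_scalefactor(s)
    return f(x / s)

h = 1e-6
y0 = 2.0
x0 = y0 * s         # the boundpin direction: model -> minimizer

df = (f(y0 + h) - f(y0 - h)) / (2.0*h)   # ~ 2.0   (gradient in the model)
dg = (g(x0 + h) - g(x0 - h)) / (2.0*h)   # ~ 0.02  (gradient the minimizer sees)
print(df, dg)       # the second is smaller by the factor s = 100
```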
>
>
> Does that make sense?
>
> Dave
>
>
>
>
>> Hi Dave,
>>
>> You helped me out a few years ago when I was just starting to use
>> ADMB, so I thought I'd try again. I posted a related question to
>> the ADMB list, but I don't know if it went through. I'm trying to
>> understand when and how to scale parameters.
>>
>> From what I gather there are three reasons to do this: 1) when
>> the likelihood function is highly sensitive to a given parameter
>> (Nocedal and Wright (1999)) 2) when a parameter has a high
>> gradient (Hans), and 3) when the condition number of the Hessian
>> is very large (your post here:
>> http://r.789695.n4.nabble.com/Complicated-nls-formula-giving-singular-gradient-message-td3085852.html).
>> I understand these are all related issues, however for my
>> simulation exercise a single 'problem parameter' does not always
>> satisfy all of these (e.g., a parameter with a high gradient does
>> not have a relatively high/low eigenvalue).
>>
>> So my question is how to apply scaling factors in a structured
>> way and is it ok to scale many parameters? Also how do you do
>> this when you're fitting many simulated datasets (and starting
>> from many different starting points). Finally, I'd very much
>> appreciate a reference or code where I can find out what the
>> set_scalefactor function in ADMB does.
>>
>> Thank you Dave - any tips would be greatly appreciated!
>>
>> Shanae Allen
>>
>>
>>
>> Nocedal, J., & Wright, S. (1999). Numerical optimization. New
>> York: Springer.
>>
>>
>> --
>> Shanae Allen-Moran
>> National Marine Fisheries Service
>> 110 Shaffer Rd.
>> Santa Cruz, CA 95060
>> Phone: (831) 420-3970
>> Email: shanae.allen at noaa.gov
>> Website: http://swfsc.noaa.gov/SalmonAssessment/
>
>
>
>
> --
> Shanae Allen-Moran
> National Marine Fisheries Service
> 110 Shaffer Rd.
> Santa Cruz, CA 95060
> Phone: (831) 420-3970
> Email: shanae.allen at noaa.gov
> Website: http://swfsc.noaa.gov/SalmonAssessment/