[ADMB Users] When and how to scale parameters in ADMB?

Shanae Allen - NOAA Affiliate shanae.allen at noaa.gov
Fri Mar 13 17:04:58 PDT 2015


Thank you Dave for the detailed response. I'm still digesting the first
part, but I'm making progress. As for the set_scalefactor() function, I
thought that's what it was doing but this page:
http://www.admb-project.org/examples/function-minimization/parameter-scaling
suggests otherwise ('...which makes the function minimizer work internally
with b_internal = 0.001*b', where 0.001 is the scale factor).
Thanks for taking the time to respond so thoroughly!
Shanae


On Fri, Mar 13, 2015 at 11:31 AM, dave fournier <davef at otter-rsch.com>
wrote:

>  On 03/12/2015 06:17 PM, Shanae Allen - NOAA Affiliate wrote:
>
> Hi,
>
> I posted this response to the list in case others are interested in
> this stuff.  You are correct that scaling alone is not the final solution
> to reparameterizing functions in such a way that they are easy to minimize.
>
> I think the gold standard for this, if you were omniscient, is the Morse
> lemma. It says that for a "nice" function f of n variables (parameters),
> say f(x_1,x_2,...,x_n), which has a minimum, with the value of the
> function at the minimum being b, there exists a reparameterization
> of the function given by the n functions g_i with
>
>        x_i = g_i(y_1,...,y_n),   i=1,...,n
>
> such that
>
>   f(g_1(y_1,...,y_n),...,g_n(y_1,...,y_n)) = b + y_1^2 + ... + y_n^2   (1)
>
> So the functions g_i provide a "perfect" reparameterization of f.
>
> Of course to find the g_i you already need to know where the minimum
> is located so by itself the Morse lemma is not very useful for this
> problem.  It does however give one an idea of what we are trying to
> accomplish by reparameterizing. Rescaling is really the last step in that
> if
>
>        f(x_1,...,x_n) = b + a_1*x_1^2 + ... + a_n*x_n^2
>
> then setting  y_i^2 = a_i*x_i^2,  i.e.  y_i = sqrt(a_i)*x_i,
>
> provides the required reparameterization.
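>
> Here is a small stand-alone C++ sketch (not ADMB code, and the numbers
> b, a_1, a_2 are just made up) of what that last rescaling buys you: the
> curvatures of f differ by a factor of about a million, while after the
> substitution y_i = sqrt(a_i)*x_i every direction has curvature 2.
>
> #include <cmath>
> #include <cstdio>
>
> // f(x1,x2) = b + a1*x1^2 + a2*x2^2 with badly mismatched curvatures.
> // Substituting y_i = sqrt(a_i)*x_i gives h(y1,y2) = b + y1^2 + y2^2,
> // whose Hessian is 2*I -- the "perfect" form the Morse lemma describes.
> static const double b = 3.0, a1 = 1.0e4, a2 = 1.0e-2;
>
> double f(double x1, double x2) { return b + a1*x1*x1 + a2*x2*x2; }
> double h(double y1, double y2) { return f(y1/std::sqrt(a1), y2/std::sqrt(a2)); }
>
> // central-difference estimate of the second derivative along one coordinate
> double d2(double (*fn)(double,double), double u, double v, int coord)
> {
>   const double e = 1e-4;
>   if (coord == 1) return (fn(u+e,v) - 2.0*fn(u,v) + fn(u-e,v))/(e*e);
>   else            return (fn(u,v+e) - 2.0*fn(u,v) + fn(u,v-e))/(e*e);
> }
>
> int main()
> {
>   std::printf("curvatures of f: %g %g\n", d2(f,0,0,1), d2(f,0,0,2)); // 2e4 and 0.02
>   std::printf("curvatures of h: %g %g\n", d2(h,0,0,1), d2(h,0,0,2)); // 2 and 2
>   return 0;
> }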
>
> In general reparameterization is a bit of an art.  There are, however,
> some general principles.  The first is to "nondimensionalize" the problem.
> I'll give you a few examples.
>
>    L_i =  Lmin + (Lmax-Lmin)/(1+exp(-b*(t_i-tmid)))   (2)
>
> Then one should rescale and translate the L_i to go from -1 to 1
> and the t_i to go from -1 to 1.  This removes the dependence on the
> units used to measure the L's and t's.  Having solved the problem
> one must transform the estimates for the Lmin, Lmax, b, and tmid
> back to the "real" ones. (Left as an exercise for the student.)
> Once you know the transformation you can set this up to be done
> automatically in ADMB by making the real parameters sdreport variables.
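>
> To make that back-transformation concrete, here is a sketch in plain C++
> (my own choice of scaling -- mapping the observed ranges [tlo,thi] and
> [Llo,Lhi] onto [-1,1] -- not something ADMB prescribes).  In a template you
> would do the same arithmetic in the PROCEDURE_SECTION, with the real
> parameters declared as sdreport variables.
>
> #include <cstdio>
>
> struct ScaledFit { double Lmin_s, Lmax_s, b_s, tmid_s; }; // estimates in [-1,1] units
> struct RealFit   { double Lmin,   Lmax,   b,   tmid;   }; // estimates in original units
>
> // Assumes the fit used  t' = 2*(t - tlo)/(thi - tlo) - 1  and
> // L' = 2*(L - Llo)/(Lhi - Llo) - 1.
> RealFit back_transform(const ScaledFit& p,
>                        double tlo, double thi, double Llo, double Lhi)
> {
>   RealFit r;
>   r.Lmin = Llo + 0.5*(p.Lmin_s + 1.0)*(Lhi - Llo);
>   r.Lmax = Llo + 0.5*(p.Lmax_s + 1.0)*(Lhi - Llo);
>   r.tmid = tlo + 0.5*(p.tmid_s + 1.0)*(thi - tlo);
>   r.b    = 2.0*p.b_s/(thi - tlo);   // chain rule: dt'/dt = 2/(thi - tlo)
>   return r;
> }
>
> int main()
> {
>   ScaledFit p = {-0.9, 0.95, 3.2, 0.1};   // made-up estimates in scaled units
>   RealFit r = back_transform(p, 1.0, 10.0, 5.0, 80.0);
>   std::printf("Lmin=%g Lmax=%g b=%g tmid=%g\n", r.Lmin, r.Lmax, r.b, r.tmid);
>   return 0;
> }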
>
> So say you have done that to the model.  You may still have trouble.
> This is because Lmin and Lmax are the lower and upper asymptotes, so
> they are not actually observed.
>
> The next principle of parameterization is that one should use parameters
> which are well identified.  A mental test of this is to ask yourself
> if you already have a good idea what the values of these parameters are.
> For the model above suppose that you have n ordered observations
>
>   t_1 < ... < t_n
>
> Let Lone be the true value for L at t=t_1 and Ln be the true value for
> L at t=t_n.  Clearly we know a lot more about these parameters than
> we do for Lmin and Lmax.  But there is also another advantage.
> If we use Lone and Ln as the parameters, the predicted values for
> L_1 and L_n are independent of the estimate for b.   In other words
> b now just spaces the predicted values between Lone and Ln.
> Using the original parameters Lmin and Lmax you will find that
> many different combinations of the parameters produce almost the same
> predicted values for L_1 and L_n.  We say that there are interactions
> or confounding between the parameters.  We want to remove the confounding.
>
> Reducing the problem to the form in equation (1) minimizes the confounding.
>
> To see that you really understand this stuff you should figure out
> how to rewrite equation (2) in terms of Lone, Ln, b, and tmid.
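>
> In case you want to check your answer, here is one way to write it (my own
> derivation, so verify it): with s(t) = 1/(1+exp(-b*(t-tmid))),
>
>    L(t) = Lone + (Ln - Lone)*(s(t) - s(t_1))/(s(t_n) - s(t_1))
>
> and a short C++ sketch showing that the predictions at t_1 and t_n really
> do not depend on b:
>
> #include <cmath>
> #include <cstdio>
> #include <vector>
>
> double s(double t, double b, double tmid)
> {
>   return 1.0/(1.0 + std::exp(-b*(t - tmid)));
> }
>
> std::vector<double> predict(const std::vector<double>& t,
>                             double Lone, double Ln, double b, double tmid)
> {
>   double s1 = s(t.front(), b, tmid), sn = s(t.back(), b, tmid);
>   std::vector<double> L(t.size());
>   for (size_t i = 0; i < t.size(); ++i)
>     L[i] = Lone + (Ln - Lone)*(s(t[i], b, tmid) - s1)/(sn - s1);
>   return L;
> }
>
> int main()
> {
>   std::vector<double> t = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
>   for (double b : {0.3, 0.8, 2.0}) {   // made-up values of b
>     std::vector<double> L = predict(t, 12.0, 70.0, b, 5.0);
>     std::printf("b=%.1f: first=%g last=%g\n", b, L.front(), L.back());
>   }
>   return 0;
> }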
>
> Now, to see what the set_scalefactor() function does, if anything,
> you should compile the source with debugging turned on.
> This lets you really see what it does, rather than relying on what
> someone tells you. (Who knows what is really in the code any more?)
>
> Then make a simple example like this
>
> DATA_SECTION
> PARAMETER_SECTION
>   init_bounded_number x(-10,10)
>  !! x.set_scalefactor(200.);
>   init_number y
>   objective_function_value f
> PROCEDURE_SECTION
>   f=0.5*(x*x+y*y);
>
> Bounded numbers are more complicated than unbounded ones so this
> is a good example to look at.
>
> If you step through the program you should get to line 172 in df1b2qnm.cpp
>
>   {
>  172           dvariable vf=0.0;
>  > 173           vf=initial_params::reset(dvar_vector(x));
>  174           *objective_function_value::pobjfun=0.0;
>  175           pre_userfunction();
>  176           if ( no_stuff ==0 &&
> quadratic_prior::get_num_quadratic_prior()>0)
>  177           {
>
> The reset function takes the x vector from the function minimizer
> and puts the values into the model, rescaling if desired.
>
> Stepping into that code you should eventually get to line 538
> in model.cpp.
>
>  533   const int& ii, const dvariable& pen)
>  534   {
>  535     if (!scalefactor)
>  536       ::set_value(*this,x,ii,minb,maxb,pen);
>  537     else
>  >538       ::set_value(*this,x,ii,minb,maxb,pen,scalefactor);
>  539   }
>
> Note that the field scalefactor is nonzero. It was set by
> set_scalefactor().
>
> Now step into that line and you get to line 56 in the file set.cpp.
>
>  54 void set_value(const prevariable& _x,const dvar_vector& v,const int&
> _ii,
>   55   double fmin, double fmax,const dvariable& _fpen,double s)
>   >56 {
>   57   int& ii=(int&)_ii;
>   58   prevariable& x=(prevariable&) _x;
>   59   dvariable& fpen=(dvariable&) _fpen;
>   60   x=boundp(v(ii++),fmin,fmax,fpen,s);
>   61 }
>
> Now finally step into line 60 and you end up at line 76 in boundfun.cpp.
>
>   75 dvariable boundp(const prevariable& x, double fmin, double fmax,const
> prevariable& _fpen,double s)
>   76 {
>   77   return boundp(x/s,fmin,fmax,_fpen);
>   78 }
>
> You see that at line 77 x is divided by s before being sent to the bounding
> function boundp.  So x comes out of the function minimizer and
> gets divided by s before going into the model:
>
>    minimizer -----> x   ----- x/s ---> y --->  boundp -----> model
>
> To get the initial x value for the minimizer there must be a corresponding
> sequence like
>
>    model ----> boundpin ----> y ----> y*s  ----> x -----> minimizer
>
> where boundpin is the inverse function of boundp.
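>
> As a toy illustration of those two pipelines (illustrative only -- ADMB's
> real boundp/boundpin in boundfun.cpp use their own formulas, and boundp
> carries a penalty argument), here is a logistic-style bounding transform
> together with its inverse and the s scaling:
>
> #include <cmath>
> #include <cstdio>
>
> // The minimizer's x is divided by s on the way into the model, and the
> // model's initial value is multiplied by s on the way out.
> double boundp_toy(double x, double fmin, double fmax)
> {
>   return fmin + (fmax - fmin)/(1.0 + std::exp(-x));
> }
> double boundpin_toy(double y, double fmin, double fmax)
> {
>   return std::log((y - fmin)/(fmax - y));
> }
>
> int main()
> {
>   const double fmin = -10.0, fmax = 10.0, s = 200.0;  // as in the example above
>   const double y0 = 3.0;                              // initial model value
>
>   double x0 = s*boundpin_toy(y0, fmin, fmax);   // model -> minimizer
>   double y  = boundp_toy(x0/s, fmin, fmax);     // minimizer -> model
>   std::printf("start at y0=%g, minimizer sees x0=%g, round trip gives y=%g\n",
>               y0, x0, y);
>   return 0;
> }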
>
> Note that one divides by s, so if you want to make the
> gradient smaller by a factor of 100, one should use
>
>    set_scalefactor(100.);
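>
> A last toy calculation (again not the real ADMB internals, just the chain
> rule) showing why dividing by s shrinks the gradient the minimizer sees:
>
> #include <cmath>
> #include <cstdio>
>
> // The minimizer works with x, the model sees y = x/s, and by the chain
> // rule df/dx = (1/s) * df/dy.  So a scale factor s > 1 shrinks the
> // gradient component seen by the minimizer by a factor of s.
> double f(double y) { return 0.5*y*y + 3.0*y; }   // made-up model objective in y
>
> int main()
> {
>   const double s  = 100.0;   // what set_scalefactor(100.) would pass along
>   const double y0 = 2.0;     // initial value of the model parameter
>   const double x0 = y0*s;    // what the minimizer starts from:  y*s
>   const double e  = 1e-6;
>
>   // finite-difference gradients wrt the model parameter and the minimizer variable
>   double dfdy = (f(y0 + e)   - f(y0 - e))  /(2.0*e);
>   double dfdx = (f((x0+e)/s) - f((x0-e)/s))/(2.0*e);
>
>   std::printf("df/dy = %g   df/dx = %g   ratio = %g (= s)\n",
>               dfdy, dfdx, dfdy/dfdx);
>   return 0;
> }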
>
>
>   Does that make sense?
>
>       Dave
>
>
>
>
>   Hi Dave,
>
> You helped me out a few years ago when I was just starting to use ADMB, so
> I thought I'd try again. I posted a related question to the ADMB list, but
> I don't know if it went through. I'm trying to understand when and how to
> scale parameters.
>
> From what I gather there are three reasons to do this: 1) when the
> likelihood function is highly sensitive to a given parameter (Nocedal and
> Wright (1999)), 2) when a parameter has a high gradient (Hans), and 3) when
> the condition number of the Hessian is very large (your post here:
> http://r.789695.n4.nabble.com/Complicated-nls-formula-giving-singular-gradient-message-td3085852.html
> <http://www.admb-project.org/examples/function-minimization/parameter-scaling>).
> I understand these are all related issues, however for my simulation
> exercise a single 'problem parameter' does not always satisfy all of these
> (e.g., a parameter with a high gradient does not have a relatively high/low
> eigenvalue).
>
>  So my question is how to apply scaling factors in a structured way and
> is it ok to scale many parameters? Also how do you do this when you're
> fitting many simulated datasets (and starting from many different starting
> points)? Finally, I'd very much appreciate a reference or code where I can
> find out what the set_scalefactor function in ADMB does.
>
> Thank you Dave - any tips would be greatly appreciated!
>
> Shanae Allen
>
>
>
> Nocedal, J., & Wright, S. (1999). Numerical optimization. New York:
> Springer.
>
>


-- 
Shanae Allen-Moran
National Marine Fisheries Service
110 Shaffer Rd.
Santa Cruz, CA 95060
Phone: (831) 420-3970
Email: shanae.allen at noaa.gov
Website: http://swfsc.noaa.gov/SalmonAssessment/