[ADMB Users] When and how to scale parameters in ADMB?

dave fournier davef at otter-rsch.com
Fri Mar 13 11:31:56 PDT 2015


On 03/12/2015 06:17 PM, Shanae Allen - NOAA Affiliate wrote:

Hi,

I posted this response to the list in case others are interested in
this stuff.  You are correct that scaling alone is not the final solution
to reparameterizing functions in such a way that they are easy to minimize.

I think the gold standard for this, if you were omniscient, is the Morse
lemma. It says that for a "nice" function f of n variables (parameters),
say f(x_1,x_2,...,x_n), which has a minimum, with the value of the
function at the minimum being b, there exists a reparameterization
of the function given by the n functions g_i with

        x_i = g_i(y_1,...,y_n),   i=1,...,n

such that

   f(g_1(y_1,...,y_n),...,g_n(y_1,...,y_n)) = b + y_1^2 + ... + y_n^2   (1)

So the functions g_i provide a "perfect" reparameterization of f.

Of course to find the g_i you already need to know where the minimum
is located so by itself the Morse lemma is not very useful for this
problem.  It does however give one an idea of what we are trying to
accomplish by reparameterizing. Rescaling is really the last step in that if

        f(x_1,...,x_n) = b + a_1*x_1^2 + ... + a_n*x_n^2

then setting  y_i^2 = a_i*x_i^2,  or  y_i = sqrt(a_i)*x_i,

provides the required reparameterization.
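As a numerical sketch of that last step (the coefficients a_1, a_2 and b
below are made up for illustration), the substitution y_i = sqrt(a_i)*x_i
turns a weighted quadratic into the Morse form (1):

```python
import math

# made-up coefficients for the quadratic f(x) = b + a1*x1^2 + a2*x2^2
b, a1, a2 = 3.0, 100.0, 0.04

def f(x1, x2):
    return b + a1 * x1**2 + a2 * x2**2

# rescaled parameterization: y_i = sqrt(a_i)*x_i, so x_i = y_i/sqrt(a_i)
def f_rescaled(y1, y2):
    return f(y1 / math.sqrt(a1), y2 / math.sqrt(a2))

# in the y coordinates the function is b + y1^2 + y2^2: every direction
# is equally curved, which is exactly the form (1) above
print(f_rescaled(0.5, -1.5))       # equals b + 0.5^2 + 1.5^2 = 5.5
print(b + 0.5**2 + 1.5**2)
```

In the original x coordinates the curvatures differ by a factor of
a1/a2 = 2500; in the y coordinates the condition number is 1.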

In general reparameterization is a bit of an art.  There are however
some general principles.  The first is to nondimensionalize the problem.
I'll give you an example, the logistic curve

    L_i =  Lmin + (Lmax-Lmin)/(1+exp(-b*(t_i-tmid)))  (2)

Then one should rescale and translate the L_i to go from -1 to 1
and the t_i to go from -1 to 1.  This removes the dependence on the
units used to measure the L's and t's.  Having solved the problem
one must transform the estimates for the Lmin, Lmax, b, and tmid
back to the "real" ones. (Left as an exercise for the student.)
Once you know the transformation you can set this up to be done
automatically in ADMB by making the real parameters sdreport variables.
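A minimal sketch of that rescaling step, with made-up data: map the
observed t's (and likewise the L's) linearly onto [-1, 1], keeping the
data min and max so estimates can be mapped back to the original units
afterwards.

```python
def to_unit_interval(v):
    # linearly map the data onto [-1, 1]; also return the endpoints
    # so parameter estimates can be transformed back afterwards
    vmin, vmax = min(v), max(v)
    scaled = [2.0 * (x - vmin) / (vmax - vmin) - 1.0 for x in v]
    return scaled, vmin, vmax

def from_unit_interval(y, vmin, vmax):
    # inverse map: recover a value on the original scale
    return vmin + (y + 1.0) * (vmax - vmin) / 2.0

t = [2.0, 3.5, 5.0, 8.0, 13.0]   # made-up observation times
t_scaled, tmin, tmax = to_unit_interval(t)
print(t_scaled[0], t_scaled[-1])                     # -1.0 1.0
print(from_unit_interval(t_scaled[2], tmin, tmax))   # back to 5.0
```

The same inverse map is what you would code for tmid (and the analogous
one on the L scale for Lmin and Lmax) as sdreport variables.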

So say you have done that to the model.  You may still have trouble.
This is because Lmin and Lmax are the lower and upper asymptotes, so
they are not actually observed.

The next principle in parameterization is that one should use parameters
which are well identified.  A mental test of this is to ask yourself
if you already have a good idea what the values of these parameters are.
For the model above suppose that you have n ordered observations

   t_1 < ... < t_n

Let Lone be the true value for L at t=t_1 and Ln be the true value for
L at t=t_n.  Clearly we know a lot more about these parameters than
we do for Lmin and Lmax.  But there is also another advantage.
If we use Lone and Ln as the parameters, the predicted values for
L_1 and L_n are independent of the estimate for b.   In other words
b now just spaces the predicted values between Lone and Ln.
Using the original parameters Lmin and Lmax you will find that
many different combinations of the parameters produce almost the same
predicted values for L_1 and L_n.  We say that there are interactions
or confounding between the parameters.  We want to remove the confounding.
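To see this numerically (the observation times and parameter values below
are invented for illustration): with Lone and Ln as parameters, the
predictions at t_1 and t_n do not move at all when b changes, while with
Lmin and Lmax they do.

```python
import math

t = [1.0, 2.0, 4.0, 7.0, 10.0]   # made-up observation times
tmid = 5.0

def s(t_i, b):
    # the logistic "shape" factor from equation (2)
    return 1.0 / (1.0 + math.exp(-b * (t_i - tmid)))

def pred_minmax(t_i, Lmin, Lmax, b):
    # original parameterization (2)
    return Lmin + (Lmax - Lmin) * s(t_i, b)

def pred_endpoints(t_i, Lone, Ln, b):
    # reparameterized so the curve passes through Lone at t_1 and Ln
    # at t_n exactly, whatever b is; b only spaces the values between
    s1, sn = s(t[0], b), s(t[-1], b)
    return Lone + (Ln - Lone) * (s(t_i, b) - s1) / (sn - s1)

for b in (0.3, 0.6, 1.2):
    # endpoint predictions are pinned at 10 and 90 for every b ...
    print(pred_endpoints(t[0], 10.0, 90.0, b),
          pred_endpoints(t[-1], 10.0, 90.0, b))
    # ... whereas under (2) the prediction at t_1 drifts with b
    print(pred_minmax(t[0], 10.0, 90.0, b))
```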

Reducing the problem to the form in equation (1) minimizes the confounding.

To see that you really understand this stuff you should figure out
how to rewrite the equation (2) in terms of Lone,Ln,b, and tmid.

Now, to see what the set_scalefactor() function does, if anything,
you should compile the source with debugging turned on.
This lets you really see what it does, rather than relying on what
someone tells you. (Who knows what is really in the code any more?)

Then make a simple example like this

DATA_SECTION
PARAMETER_SECTION
   init_bounded_number x(-10,10)
  !! x.set_scalefactor(200.);
   init_number y
   objective_function_value f
PROCEDURE_SECTION
   f=0.5*(x*x+y*y);

Bounded numbers are more complicated than unbounded ones so this
is a good example to look at.

If you step through the program you should get to line 172 in df1b2qnm.cpp

   {
  172           dvariable vf=0.0;
  > 173           vf=initial_params::reset(dvar_vector(x));
  174           *objective_function_value::pobjfun=0.0;
  175           pre_userfunction();
  176           if ( no_stuff ==0 && 
quadratic_prior::get_num_quadratic_prior()>0)
  177           {

The reset function takes the x vector from the function minimizer
and puts the values into the model, rescaling them if desired.

Stepping into that code you should eventually get to line
538 in model.cpp

  533   const int& ii, const dvariable& pen)
  534   {
  535     if (!scalefactor)
  536       ::set_value(*this,x,ii,minb,maxb,pen);
  537     else
  >538       ::set_value(*this,x,ii,minb,maxb,pen,scalefactor);
  539   }

Note that the field scalefactor is nonzero. It was set by
set_scalefactor().

Now step into that line and you get to line 56 in the file set.cpp.

   54 void set_value(const prevariable& _x,const dvar_vector& v,const int& _ii,
   55   double fmin, double fmax,const dvariable& _fpen,double s)
   >56 {
   57   int& ii=(int&)_ii;
   58   prevariable& x=(prevariable&) _x;
   59   dvariable& fpen=(dvariable&) _fpen;
   60   x=boundp(v(ii++),fmin,fmax,fpen,s);
   61 }

Now finally step into line 60 and you end up at line 76 in boundfun.cpp.

   75 dvariable boundp(const prevariable& x, double fmin, double fmax,const prevariable& _fpen,double s)
   76 {
   77   return boundp(x/s,fmin,fmax,_fpen);
   78 }

You see that at line 77 x is divided by s before being sent to the bounding
function boundp.  So x comes out of the function minimizer and
gets divided by s before going into the model:

    minimizer -----> x   ----- x/s ---> y --->  boundp -----> model

To get the initial x value for the minimizer there must be a corresponding
sequence like

    model ----> boundpin ----> y ----> y*s  ----> x -----> minimizer

where boundpin is the inverse function of boundp.

Note that one divides by s, so if you want to make the
gradient smaller by a factor of 100, one should use

    set_scalefactor(100.);
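A sketch of the effect in plain Python (the quadratic below stands in for
a model; the real code path is the ADMB one traced above): dividing the
minimizer's x by s before it enters the model shrinks the gradient the
minimizer sees by the same factor s.

```python
s = 100.0

def model(y):
    # model-space objective, like f = 0.5*(x*x) in the PROCEDURE_SECTION
    return 0.5 * y * y

def minimizer_f(x):
    # what the minimizer actually evaluates: x is divided by s first
    return model(x / s)

y = 2.0
x = y * s    # the initial x handed to the minimizer (the y*s step above)
h = 1e-5
# central-difference gradients
g_model = (model(y + h) - model(y - h)) / (2 * h)            # d model/dy = y
g_min = (minimizer_f(x + h) - minimizer_f(x - h)) / (2 * h)  # smaller by 1/s
print(g_model, g_min * s)   # g_min is g_model/s, i.e. 100x smaller here
```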


   Does that make sense?

       Dave




> Hi Dave,
>
> You helped me out a few years ago when I was just starting to use 
> ADMB, so I thought I'd try again. I posted a related question to the 
> ADMB list, but I don't know if it went through. I'm trying to 
> understand when and how to scale parameters.
>
> From what I gather there are three reasons to do this: 1) when the 
> likelihood function is highly sensitive to a given parameter (Nocedal 
> and Wright (1999)) 2) when a parameter has a high gradient (Hans), and 
> 3) when the condition number of the Hessian is very large (your post 
> here: 
> http://r.789695.n4.nabble.com/Complicated-nls-formula-giving-singular-gradient-message-td3085852.html 
> <http://www.admb-project.org/examples/function-minimization/parameter-scaling>). 
> I understand these are all related issues, however for my simulation 
> exercise a single 'problem parameter' does not always satisfy all of 
> these (e.g., a parameter with a high gradient does not have a 
> relatively high/low eigenvalue).
>
> So my question is how to apply scaling factors in a structured way and 
> is it ok to scale many parameters? Also how do you do this when you're 
> fitting many simulated datasets (and starting from many different 
> starting points). Finally, I'd very much appreciate a reference or 
> code where I can find out what the set_scalefactor function in ADMB does.
>
> Thank you Dave - any tips would be greatly appreciated!
>
> Shanae Allen
>
>
>
> Nocedal, J., & Wright, S. (1999). Numerical optimization. New York: 
> Springer.
>
>
> -- 
> Shanae Allen-Moran
> National Marine Fisheries Service
> 110 Shaffer Rd.
> Santa Cruz, CA 95060
> Phone: (831) 420-3970
> Email: shanae.allen at noaa.gov
> Website: http://swfsc.noaa.gov/SalmonAssessment/
