[ADMB Users] When and how to scale parameters in ADMB?
dave fournier
davef at otter-rsch.com
Fri Mar 13 17:28:40 PDT 2015
On 03/13/2015 05:04 PM, Shanae Allen - NOAA Affiliate wrote:
Yes, it is probably more natural to multiply by the scaling factor,
but that day I decided to divide.
> Thank you Dave for the detailed response. I'm still digesting the
> first part, but I'm making progress. As for the set_scalefactor()
> function, I thought that's what it was doing but this page:
> http://www.admb-project.org/examples/function-minimization/parameter-scaling
> suggests otherwise ('...which makes the function minimizer work
> internally with b_internal = 0.001*b', where 0.001 is the scale factor).
> Thanks for taking the time to respond so thoroughly!
> Shanae
>
>
> On Fri, Mar 13, 2015 at 11:31 AM, dave fournier <davef at otter-rsch.com> wrote:
>
> On 03/12/2015 06:17 PM, Shanae Allen - NOAA Affiliate wrote:
>
> Hi,
>
> I posted this response to the list in case others are interested in
> this stuff. You are correct that scaling alone is not the final
> solution
> to reparameterizing functions in such a way that they are easy to
> minimize.
>
> I think the gold standard for this, if you were omniscient, is the
> Morse lemma. It says that for a "nice" function f of n variables
> (parameters), say f(x_1,x_2,...,x_n), which has a minimum, with the
> value of the function at the minimum being b, there exists a
> reparameterization of the function given by the n functions g_i with
>
> x_i = g_i(y_1,...,y_n), i=1,...,n
>
> such that
>
> f(g_1(y_1,...,y_n),...,g_n(y_1,...,y_n)) = b + y_1^2 + ... + y_n^2   (1)
>
> So the functions g_i provide a "perfect" reparameterization of f.
>
> Of course, to find the g_i you already need to know where the minimum
> is located, so by itself the Morse lemma is not very useful for this
> problem. It does, however, give one an idea of what we are trying to
> accomplish by reparameterizing. Rescaling is really the last step,
> in that if
>
> f(x_1,...,x_n) = b + a_1*x_1^2 + ... + a_n*x_n^2
>
> then setting y_i^2 = a_i*x_i^2, or y_i = sqrt(a_i)*x_i,
>
> provides the required reparameterization.
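This last rescaling step can be checked numerically. A minimal Python sketch, with made-up curvatures a_i and minimum value b:

```python
import math

# Badly scaled quadratic: the two curvatures differ by a factor of 1e6.
# (The values of a and b here are invented for illustration.)
a = [1.0e4, 1.0e-2]
b = 3.0

def f(x):
    return b + a[0]*x[0]**2 + a[1]*x[1]**2

# Reparameterize with y_i = sqrt(a_i)*x_i, i.e. x_i = y_i/sqrt(a_i).
def f_reparam(y):
    x = [y[i] / math.sqrt(a[i]) for i in range(2)]
    return f(x)

# In the y coordinates the function is exactly b + y_1^2 + y_2^2,
# so the Hessian is 2*I and its condition number is 1 instead of 1e6.
print(f_reparam([0.5, 0.5]))  # b + 0.25 + 0.25 = 3.5
```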
>
> In general reparameterization is a bit of an art. There are, however,
> some general principles. The first is to nondimensionalize the problem.
> I'll give you a few examples. Consider the logistic curve
>
> L_i = Lmin + (Lmax-Lmin)/(1+exp(-b*(t_i-tmid)))   (2)
>
> Then one should rescale and translate the L_i to go from -1 to 1
> and the t_i to go from -1 to 1. This removes the dependence on the
> units used to measure the L's and t's. Having solved the problem
> one must transform the estimates for the Lmin, Lmax, b, and tmid
> back to the "real" ones. (Left as an exercise for the student.)
> Once you know the transformation you can set this up to be done
> automatically in ADMB by making the real parameters sdreport
> variables.
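The rescaling and translation step can be sketched in a few lines of Python (illustrative only, not ADMB code; the data are invented):

```python
# Map a vector of observations linearly onto [-1, 1].
def to_unit_interval(v):
    lo, hi = min(v), max(v)
    return [2.0*(x - lo)/(hi - lo) - 1.0 for x in v]

t = [1.0, 2.0, 4.0, 8.0]      # observation times (invented)
L = [10.0, 14.0, 19.0, 21.0]  # observed lengths (invented)

t_s = to_unit_interval(t)     # t_s runs from -1 to 1
L_s = to_unit_interval(L)     # L_s runs from -1 to 1

# Fit the model to (t_s, L_s); afterwards back-transform the estimates,
# e.g. tmid = 0.5*(min(t)+max(t)) + 0.5*(max(t)-min(t))*tmid_scaled.
print(t_s[0], t_s[-1])  # -1.0 1.0
```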
>
> So say you have done that to the model. You may still have trouble,
> because Lmin and Lmax are the lower and upper asymptotes, which are
> not actually observed.
>
> The next principle of parameterization is that one should use
> parameters which are well identified. A mental test of this is to
> ask yourself whether you already have a good idea of the values of
> these parameters.
> For the model above suppose that you have n ordered observations
>
> t_1 < ... < t_n
>
> Let Lone be the true value for L at t=t_1 and Ln be the true value
> for
> L at t=t_n. Clearly we know a lot more about these parameters than
> we do for Lmin and Lmax. But there is also another advantage.
> If we use Lone and Ln as the parameters, the predicted values for
> L_1 and L_n are independent of the estimate for b. In other words
> b now just spaces the predicted values between Lone and Ln.
> Using the original parameters Lmin and Lmax you will find that
> many different combinations of the parameters produce almost the same
> predicted values for L_1 and L_n. We say that there are interactions
> or confounding between the parameters. We want to remove the
> confounding.
>
> Reducing the problem to the form in equation (1) minimizes the
> confounding.
>
> To see that you really understand this stuff you should figure out
> how to rewrite the equation (2) in terms of Lone,Ln,b, and tmid.
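Without spoiling the algebra, the key claim, that with Lone and Ln as parameters the predictions at t_1 and t_n no longer depend on b, can be verified numerically. A Python sketch (the form of L_pred below is one possible answer to the exercise, and all numbers are invented):

```python
import math

def s(t, b, tmid):
    """Standard logistic in t."""
    return 1.0/(1.0 + math.exp(-b*(t - tmid)))

def L_pred(t, Lone, Ln, b, tmid, t1, tn):
    # Equation (2) rewritten so that L_pred(t1) = Lone and
    # L_pred(tn) = Ln exactly, for any b and tmid.
    s1, sn = s(t1, b, tmid), s(tn, b, tmid)
    return Lone + (Ln - Lone)*(s(t, b, tmid) - s1)/(sn - s1)

t1, tn = 1.0, 8.0
for b in (0.1, 1.0, 5.0):
    # The endpoint predictions are 10.0 and 21.0 regardless of b;
    # b only controls the spacing of the points in between.
    print(L_pred(t1, 10.0, 21.0, b, 4.0, t1, tn),
          L_pred(tn, 10.0, 21.0, b, 4.0, t1, tn))
```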
>
> Now, to see what the set_scalefactor() function actually does (if
> anything), compile the source with debugging turned on. That lets
> you see for yourself what it does, rather than relying on what
> someone tells you. (Who knows what is really in the code any more?)
>
> Then make a simple example like this
>
> DATA_SECTION
> PARAMETER_SECTION
>   init_bounded_number x(-10,10)
>  !! x.set_scalefactor(200.);
>   init_number y
>   objective_function_value f
> PROCEDURE_SECTION
>   f=0.5*(x*x+y*y);
>
> Bounded numbers are more complicated than unbounded ones so this
> is a good example to look at.
>
> If you step through the program you should get to line 172 in
> df1b2qnm.cpp
>
> {
> 172 dvariable vf=0.0;
> > 173 vf=initial_params::reset(dvar_vector(x));
> 174 *objective_function_value::pobjfun=0.0;
> 175 pre_userfunction();
> 176 if ( no_stuff ==0 &&
> quadratic_prior::get_num_quadratic_prior()>0)
> 177 {
>
> The reset function takes the x vector from the function minimizer
> and puts the values into the model, rescaling if desired.
>
> Stepping into that code you should eventually get to line 538
> in model.cpp:
>
> 533 const int& ii, const dvariable& pen)
> 534 {
> 535 if (!scalefactor)
> 536 ::set_value(*this,x,ii,minb,maxb,pen);
> 537 else
> >538 ::set_value(*this,x,ii,minb,maxb,pen,scalefactor);
> 539 }
>
> Note that the field scalefactor is nonzero. It was set by
> set_scalefactor().
>
> Now step into that line and you get to line 56 in the file set.cpp.
>
> 54 void set_value(const prevariable& _x,const dvar_vector&
> v,const int& _ii,
> 55 double fmin, double fmax,const dvariable& _fpen,double s)
> >56 {
> 57 int& ii=(int&)_ii;
> 58 prevariable& x=(prevariable&) _x;
> 59 dvariable& fpen=(dvariable&) _fpen;
> 60 x=boundp(v(ii++),fmin,fmax,fpen,s);
> 61 }
>
> Now finally step into line 60 and you end up at line 76 in
> boundfun.cpp.
>
> 75 dvariable boundp(const prevariable& x, double fmin, double
> fmax,const prevariable& _fpen,double s)
> 76 {
> 77 return boundp(x/s,fmin,fmax,_fpen);
> 78 }
>
> You see that at line 77, x is divided by s before being sent to the
> bounding function boundp. So x comes out of the function minimizer
> and gets divided by s before going into the model:
>
> minimizer -----> x ----- x/s ---> y ---> boundp -----> model
>
> To get the initial x value for the minimizer there must be a
> corresponding
> sequence like
>
> model ----> boundpin ----> y ----> y*s ----> x -----> minimizer
>
> where boundpin is the inverse function of boundp.
>
> Note that one divides by s, so if you want to make the
> gradient smaller by a factor of 100, one should use
>
> set_scalefactor(100.);
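The direction of the scaling can be confirmed with a small numerical experiment (a Python sketch of the bookkeeping above, not of the ADMB internals):

```python
# The minimizer works with x; the model sees y = x/s, so the gradient
# the minimizer sees is (1/s) * df/dy.
s = 100.0

def f(y):           # objective in model coordinates
    return 0.5*y*y

def g(x):           # what the minimizer sees after set_scalefactor(s)
    return f(x / s)

h = 1e-6
y0 = 2.0
x0 = y0 * s         # the boundpin direction: model -> minimizer

df = (f(y0 + h) - f(y0 - h)) / (2.0*h)   # ~ 2.0   (gradient in the model)
dg = (g(x0 + h) - g(x0 - h)) / (2.0*h)   # ~ 0.02  (gradient the minimizer sees)
print(df, dg)       # the second is smaller by the factor s = 100
```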
>
>
> Does that make sense?
>
> Dave
>
>
>
>
>> Hi Dave,
>>
>> You helped me out a few years ago when I was just starting to use
>> ADMB, so I thought I'd try again. I posted a related question to
>> the ADMB list, but I don't know if it went through. I'm trying to
>> understand when and how to scale parameters.
>>
>> From what I gather there are three reasons to do this: 1) when
>> the likelihood function is highly sensitive to a given parameter
>> (Nocedal and Wright (1999)) 2) when a parameter has a high
>> gradient (Hans), and 3) when the condition number of the Hessian
>> is very large (your post here:
>> http://r.789695.n4.nabble.com/Complicated-nls-formula-giving-singular-gradient-message-td3085852.html).
>> I understand these are all related issues, however for my
>> simulation exercise a single 'problem parameter' does not always
>> satisfy all of these (e.g., a parameter with a high gradient does
>> not have a relatively high/low eigenvalue).
>>
>> So my question is how to apply scaling factors in a structured
>> way and is it ok to scale many parameters? Also how do you do
>> this when you're fitting many simulated datasets (and starting
>> from many different starting points). Finally, I'd very much
>> appreciate a reference or code where I can find out what the
>> set_scalefactor function in ADMB does.
>>
>> Thank you Dave - any tips would be greatly appreciated!
>>
>> Shanae Allen
>>
>>
>>
>> Nocedal, J., & Wright, S. (1999). Numerical optimization. New
>> York: Springer.
>>
>>
>> --
>> Shanae Allen-Moran
>> National Marine Fisheries Service
>> 110 Shaffer Rd.
>> Santa Cruz, CA 95060
>> Phone: (831) 420-3970
>> Email: shanae.allen at noaa.gov
>> Website: http://swfsc.noaa.gov/SalmonAssessment/
>
>
>
>
> --
> Shanae Allen-Moran
> National Marine Fisheries Service
> 110 Shaffer Rd.
> Santa Cruz, CA 95060
> Phone: (831) 420-3970
> Email: shanae.allen at noaa.gov
> Website: http://swfsc.noaa.gov/SalmonAssessment/