<div dir="ltr"><div><div>Thank you Dave for the detailed response. I'm still digesting the first part, but I'm making progress. As for the set_scalefactor() function, I thought that's what it was doing but this page: <a href="http://www.admb-project.org/examples/function-minimization/parameter-scaling">http://www.admb-project.org/examples/function-minimization/parameter-scaling</a> suggests otherwise ( '...which makes the function minimizer work internally with b_internal = 0.001*b' and .001 is the scale factor).<br></div>Thanks for taking the time to respond so thoroughly!<br></div>Shanae<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 13, 2015 at 11:31 AM, dave fournier <span dir="ltr"><<a href="mailto:davef@otter-rsch.com" target="_blank">davef@otter-rsch.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">

On 03/12/2015 06:17 PM, Shanae Allen - NOAA Affiliate wrote:

Hi,

I posted this response to the list in case others are interested in this stuff. You are correct that scaling alone is not the final solution to reparameterizing functions in such a way that they are easy to minimize.

I think the gold standard for this, if you were omniscient, is the Morse lemma. It says that for a "nice" function f of n variables (parameters), say f(x_1,x_2,...,x_n), which has a minimum, with the value of the function at the minimum being b, there exists a reparameterization of the function given by the n functions g_i with

   x_i = g_i(y_1,...,y_n),  i=1,...,n

such that

   f(g_1(y_1,...,y_n),...,g_n(y_1,...,y_n)) = b + y_1^2 + ... + y_n^2     (1)

So the functions g_i provide a "perfect" reparameterization of f.
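
For instance, in one dimension, if f(x) = b + a*(x-m)^2 with a > 0 and minimum at x = m, then taking

   x = g(y) = m + y/sqrt(a)

gives

   f(g(y)) = b + a*(y/sqrt(a))^2 = b + y^2

which is exactly the form of (1) with n = 1. (A worked one-variable instance; the general lemma covers any non-degenerate minimum.)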

Of course to find the g_i you already need to know where the minimum is located, so by itself the Morse lemma is not very useful for this problem. It does however give one an idea of what we are trying to accomplish by reparameterizing. Rescaling is really the last step, in that if

   f(x_1,...,x_n) = b + a_1*x_1^2 + ... + a_n*x_n^2

then setting y_i^2 = a_i*x_i^2, or y_i = sqrt(a_i)*x_i,

provides the required reparameterization.
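
To see what that buys you, take a_1 = 1.0e4 and a_2 = 1.0e-2 (numbers invented for illustration): the curvatures along the two axes differ by a factor of 10^6, which is exactly what the rescaling removes. A minimal C++ sketch of the arithmetic:

   #include <cstdio>

   // f(x1,x2) = b + a1*x1^2 + a2*x2^2 with badly mismatched curvatures.
   int main() {
     double a1 = 1.0e4, a2 = 1.0e-2;

     // Curvatures in the original coordinates: d2f/dx_i^2 = 2*a_i.
     // They differ by a factor of 10^6, the Hessian condition number.
     printf("d2f/dx1^2 = %g, d2f/dx2^2 = %g\n", 2*a1, 2*a2);

     // After y_i = sqrt(a_i)*x_i the function is b + y1^2 + y2^2:
     // both curvatures are 2, the condition number is 1, and equal
     // steps in y1 and y2 change f equally.
     printf("d2f/dy1^2 = %g, d2f/dy2^2 = %g\n", 2.0, 2.0);
     return 0;
   }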

In general reparameterization is a bit of an art. There are however some general principles. The first is to "nondimensionalize" the problem. I'll give you a few examples.

   L_i = Lmin + (Lmax-Lmin)/(1+exp(-b*(t_i-tmid)))     (2)
<br>
Then one should rescale and translate the L_i to go from -1 to 1<br>
and the t_i to go from -1 to 1. This removes the dependence on
the<br>
units used to measure the L's and t's. Having solved the problem<br>
one must transform the estimates for the Lmin, Lmax, b, and tmid<br>
back to the "real" ones. (Left as an exercise for the student.)<br>
Once you know the transformation you can set this up to be done<br>
automatically in ADMB by making the real parameters sdreport
variables.<br>
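
For the logistic model that forward step is just an affine map applied to the data before fitting. A minimal C++ sketch (made-up numbers; the back-transform of the parameter estimates is the exercise, so it is left out here):

   #include <algorithm>
   #include <cstdio>

   // Affine map sending [lo, hi] onto [-1, 1]: removes the measurement units.
   static double to_unit(double v, double lo, double hi) {
     return (2.0 * v - (hi + lo)) / (hi - lo);
   }

   int main() {
     // Invented observation times and lengths, just for illustration.
     double t[] = {1998, 2001, 2005, 2012};
     double L[] = {12.5, 30.0, 55.2, 61.0};
     const int n = 4;

     double tlo = t[0], thi = t[n-1];          // the t_i are ordered
     double Llo = *std::min_element(L, L + n);
     double Lhi = *std::max_element(L, L + n);

     // Fit model (2) to these scaled data; afterwards map the estimates
     // of Lmin, Lmax, b, tmid back to the original units (e.g. as
     // sdreport variables in ADMB).
     for (int i = 0; i < n; ++i)
       printf("ts = %6.3f   Ls = %6.3f\n",
              to_unit(t[i], tlo, thi), to_unit(L[i], Llo, Lhi));
     return 0;
   }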

So say you have done that to the model. You may still have trouble. This is because Lmin and Lmax are the lower and upper asymptotes, so they are not actually observed.

The next principle of parameterization is that one should use parameters which are well identified. A mental test of this is to ask yourself whether you already have a good idea what the values of these parameters are.
For the model above suppose that you have n ordered observations

   t_1 < ... < t_n

Let Lone be the true value for L at t=t_1 and Ln be the true value for L at t=t_n. Clearly we know a lot more about these parameters than we do about Lmin and Lmax. But there is also another advantage. If we use Lone and Ln as the parameters, the predicted values for L_1 and L_n are independent of the estimate for b. In other words, b now just spaces the predicted values between Lone and Ln. Using the original parameters Lmin and Lmax you will find that many different combinations of the parameters produce almost the same predicted values for L_1 and L_n. We say that there are interactions or confounding between the parameters. We want to remove the confounding.
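
You can see the confounding numerically: here are two noticeably different (Lmin, Lmax, b) combinations that produce almost identical predictions at t_1 and t_n. A small C++ sketch (numbers invented for illustration):

   #include <cmath>
   #include <cstdio>

   // Logistic model (2) in the original (Lmin, Lmax, b, tmid) parameters.
   static double Lpred(double t, double Lmin, double Lmax,
                       double b, double tmid) {
     return Lmin + (Lmax - Lmin) / (1.0 + std::exp(-b * (t - tmid)));
   }

   int main() {
     double t1 = 0.0, tn = 10.0, tmid = 5.0;

     // Two rather different parameter combinations...
     printf("set A: L_1 = %.4f  L_n = %.4f\n",
            Lpred(t1, 10.000, 60.000, 1.0, tmid),
            Lpred(tn, 10.000, 60.000, 1.0, tmid));
     // ...whose predictions at the end points nearly coincide, because
     // the asymptotes Lmin and Lmax lie outside the observed range.
     printf("set B: L_1 = %.4f  L_n = %.4f\n",
            Lpred(t1, 10.211, 59.787, 1.2, tmid),
            Lpred(tn, 10.211, 59.787, 1.2, tmid));
     return 0;
   }

Both sets print roughly L_1 = 10.33 and L_n = 59.67, so the data at the end points cannot tell them apart.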

Reducing the problem to the form in equation (1) minimizes the confounding.

To see that you really understand this stuff you should figure out how to rewrite equation (2) in terms of Lone, Ln, b, and tmid.

Now, to see what the set_scalefactor() function does, if anything, what you should do is compile the source with debugging turned on. This enables you to really see what it does, rather than relying on what someone tells you. (Who knows what is really in the code any more?)

Then make a simple example like this:
DATA_SECTION
PARAMETER_SECTION
  init_bounded_number x(-10,10)
  !! x.set_scalefactor(200.);
  init_number y
  objective_function_value f
PROCEDURE_SECTION
  f=0.5*(x*x+y*y);

Bounded numbers are more complicated than unbounded ones, so this is a good example to look at.

If you step through the program you should get to line 172 in df1b2qnm.cpp:

          {
   172      dvariable vf=0.0;
 > 173      vf=initial_params::reset(dvar_vector(x));
   174      *objective_function_value::pobjfun=0.0;
   175      pre_userfunction();
   176      if ( no_stuff ==0 && quadratic_prior::get_num_quadratic_prior()>0)
   177      {

The reset function takes the x vector from the function minimizer and puts the values into the model, rescaling if desired.

Stepping into that code you should eventually get to line 538 in model.cpp:

   533      const int& ii, const dvariable& pen)
   534    {
   535      if (!scalefactor)
   536        ::set_value(*this,x,ii,minb,maxb,pen);
   537      else
 > 538        ::set_value(*this,x,ii,minb,maxb,pen,scalefactor);
   539    }

Note that the field scalefactor is non-zero. It was set by set_scalefactor().

Now step into that line and you get to line 56 in the file set.cpp:

    54  void set_value(const prevariable& _x,const dvar_vector& v,const int& _ii,
    55    double fmin, double fmax,const dvariable& _fpen,double s)
 >  56  {
    57    int& ii=(int&)_ii;
    58    prevariable& x=(prevariable&) _x;
    59    dvariable& fpen=(dvariable&) _fpen;
    60    x=boundp(v(ii++),fmin,fmax,fpen,s);
    61  }

Now finally step into line 60 and you end up at line 76 in boundfun.cpp:

    75  dvariable boundp(const prevariable& x, double fmin, double fmax,const prevariable& _fpen,double s)
    76  {
    77    return boundp(x/s,fmin,fmax,_fpen);
    78  }

You see that at line 77 x is divided by s before being sent to the bounding function boundp. So x comes out of the function minimizer and gets divided by s before going into the model:

   minimizer ---> x --- x/s ---> y --- boundp ---> model

To get the initial x value for the minimizer there must be a corresponding sequence like

   model --- boundpin ---> y --- y*s ---> x ---> minimizer

where boundpin is the inverse function of boundp.
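
To make those two diagrams concrete, here is a toy stand-in for the boundp/boundpin pair (the real formulas are in boundfun.cpp; this just uses a logistic map with the same shape, plain C++ for illustration):

   #include <cmath>
   #include <cstdio>

   // Toy boundp: divide by s (line 77 above), then map R onto (fmin,fmax).
   static double toy_boundp(double x, double fmin, double fmax, double s) {
     double y = x / s;
     return fmin + (fmax - fmin) / (1.0 + std::exp(-y));
   }

   // Toy boundpin: invert the map, then multiply by s on the way out.
   static double toy_boundpin(double m, double fmin, double fmax, double s) {
     double y = -std::log((fmax - fmin) / (m - fmin) - 1.0);
     return y * s;
   }

   int main() {
     double fmin = -10, fmax = 10, s = 200.0, model_value = 3.7;
     double x = toy_boundpin(model_value, fmin, fmax, s);  // minimizer's x
     printf("x = %g, round trip = %g\n", x, toy_boundp(x, fmin, fmax, s));
     return 0;
   }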
<br>
Note that one divides by s, so if yo want to make the<br>
gradient smaller by a factor of 100. one should use<br>
<br>
set_scalefactor(100.);<br>
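
The factor of 100 is just the chain rule: the model parameter is y = x/s, so df/dx = (1/s)*df/dy. A minimal C++ sketch of that arithmetic (ignoring the boundp step, which only composes one more factor into the chain rule):

   #include <cstdio>

   // Objective in terms of the model parameter y, as in the example above.
   static double dfdy(double y) { return y; }   // f(y) = 0.5*y*y

   int main() {
     double s = 100.0;   // scale factor, as in set_scalefactor(100.)
     double x = 50.0;    // value seen by the minimizer
     double y = x / s;   // value seen by the model

     // Chain rule: df/dx = (df/dy)*(dy/dx) = dfdy(y)/s, so the gradient
     // the minimizer sees is 100 times smaller than the model gradient.
     printf("df/dy = %g\n", dfdy(y));
     printf("df/dx = %g\n", dfdy(y) / s);
     return 0;
   }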

Does that make sense?

Dave

<blockquote type="cite">
<div dir="ltr">
<div>Hi Dave, <br>
<br>
You helped me out a few years ago when I was just starting to
use ADMB, so I thought I'd try again. I posted a related
question to the ADMB list, but I don't know if it went
through. I'm trying to understand when and how to scale
parameters. <br>
<br>
From what I gather there are three reasons to do this: 1) when
the likelihood function is highly sensitive to a given
parameter (Nocedal and Wright (1999)) 2) when a parameter has
a high gradient (Hans), and 3) when the condition number of
the Hessian is very large (your post here: <a href="http://www.admb-project.org/examples/function-minimization/parameter-scaling" target="_blank">http://r.789695.n4.nabble.com/Complicated-nls-formula-giving-singular-gradient-message-td3085852.html</a>).
I understand these are all related issues, however for my
simulation exercise a single 'problem parameter' does not
always satisfy all of these (e.g., a parameter with a high
gradient does not have a relatively high/low eigenvalue).<br>
<br>
</div>
<div>So my question is how to apply scaling factors in a
structured way and is it ok to scale many parameters? Also how
do you do this when you're fitting many simulated datasets
(and starting from many different starting points). Finally,
I'd very much appreciate a reference or code where I can find
out what the set_scalefactor function in ADMB does.<br>
</div>
<div><br>
Thank you Dave - any tips would be greatly appreciated!<br>
<br>
Shanae Allen<br>
<br>
<br>
<br>
Nocedal, J., & Wright, S. (1999). Numerical optimization.
New York: Springer.
<div>
<div><img src="https://ssl.gstatic.com/ui/v1/icons/mail/images/cleardot.gif"></div>
</div>
<br clear="all">
<div><br>
-- <br>
<div>
<div dir="ltr">Shanae Allen-Moran<br>
National Marine Fisheries Service<br>
110 Shaffer Rd.<br>
Santa Cruz, CA 95060<br>
Phone: <a href="tel:%28831%29%20420-3970" value="+18314203970" target="_blank">(831) 420-3970</a><br>
Email: <a href="mailto:shanae.allen@noaa.gov" target="_blank">shanae.allen@noaa.gov</a>
<br>
Website: <a href="http://swfsc.noaa.gov/SalmonAssessment/" target="_blank">http://swfsc.noaa.gov/SalmonAssessment/</a></div>
</div>
</div>
</div>
</div>
</blockquote>

--
Shanae Allen-Moran
National Marine Fisheries Service
110 Shaffer Rd.
Santa Cruz, CA 95060
Phone: (831) 420-3970
Email: shanae.allen@noaa.gov
Website: http://swfsc.noaa.gov/SalmonAssessment/