# [ADMB Users] Problems estimating variance component(s)

Chris Gast cmgast at gmail.com
Sun May 2 16:34:36 PDT 2010

Thanks, Hans, for your thoughtful response.

I have tried a number of ways to increase the information about tau,
including those you describe.  Success has been mixed--but this does bring
up a related issue that has caused me great difficulty, and has backed me
into the corner in which I find myself.  When estimating random effects and
their variance component (both of these in a second phase), I frequently
encounter loglikelihood=NaN in this second phase.  It seems to occur when
values of the random effects that are too large (in absolute value) are
tested.  The definition
of "too large", of course, depends on the parameter estimates and the
resulting distribution based on the structure (typically,
normally-distributed) that I have assumed/simulated.  Frequently, my vector
of random effects contains values in the hundreds or even thousands, when I
have assumed (and simulated) a normal distribution centered around 0, and my
estimated variance is ~5 (I simulated it to be about 2.5).

Is anyone aware of a way to exert some control over this search, so I can
prevent NaNs from occurring, even if at the cost of a less-than-full search
of the random effects' distribution?
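
For concreteness, even something as blunt as bounding the random effects
themselves would be acceptable here.  If I read the ADMB-RE manual
correctly, a bounded declaration along the following lines should keep the
inner optimization from ever proposing extreme values (a sketch only --
the bounds are illustrative, and I haven't tested this against my model):

  // hypothetical replacement for the unbounded declaration
  //   random_effects_vector e(0,nyears-2,2);
  // a bounded vector keeps each e[i] inside [-10,10] during the
  // inner optimization:
  random_effects_bounded_vector e(0,nyears-2,-10.0,10.0,2);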

-----------------------------
Chris Gast
cmgast at gmail.com

On Sat, May 1, 2010 at 10:40 PM, H. Skaug <hskaug at gmail.com> wrote:

> Chris,
>
> What you are indirectly doing when using
>
>  totL += (.5)*(-(nyears-1)*log(tau)-.5*norm2(e/tau));
>
> is to place a penalty on tau, favoring large values of tau. (Somewhat
> like a Bayesian prior on tau.) The penalty you have added is
>
> totL += (.5)*(nyears-1)*log(tau);
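>
> (To see this, write the correct contribution as
>
>   L = -(nyears-1)*log(tau) - .5*norm2(e/tau);
>
> then 0.5*L = L + 0.5*(nyears-1)*log(tau) + 0.25*norm2(e/tau), so halving
> the term both adds the log(tau) penalty above and halves the shrinkage
> on e.)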
>
>
> But, your MLE of tau is 0.  Since this is happening repeatedly
> on simulated datasets where you used tau>0 to generate the
> data, the conclusion is that tau is weakly identified under your
> model (given the parameters and amount of data).  This is maybe not
> good news for you, but you should appreciate ADMB telling you this.
> To investigate my claim, you can increase the information about
> tau in the simulations by increasing the value of tau and the amount of data.
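>
> (Concretely: with tau^2 = 2.5 the year effects have an SD of only about
> 1.6 on the logit scale, spread over 11 realized effects.  Doubling tau
> and simulating, say, 50 years instead of 12 should make tau-hat = 0
> rare if the model is otherwise healthy -- a rough suggestion, not a
> guarantee.)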
>
>
> Hans
>
> On Sat, May 1, 2010 at 9:44 PM, Chris Gast <cmgast at gmail.com> wrote:
> > Hello,
> > I'm still relatively new to ADMB--that information might be helpful in
> > diagnosing the problem I seem to be encountering.
> > The setup is this:  I have a relatively simple product-multinomial
> > model that I can fit well (with simulated data) when all effects are
> > considered fixed.  The data are age-specific harvest numbers of a
> > fictional animal.  I am simulating stochasticity in both survival (a
> > logistic transformation of \beta+e_i, e_i~N(0,tau^2)) and in my
> > vulnerability parameter.  I have a single vulnerability coefficient
> > c, such that the probability of harvest is 1-exp(-(c+e_i)*f_i), for
> > hunter-effort level f_i in year i.  I am currently working with 12
> > simulated years of data and three age classes.  For the discussion
> > below, I am limiting myself to fitting the single random effect for
> > survival (rather than both random effects, for survival and
> > vulnerability).  (I am also simulating 4 years of binomial telemetry
> > data to provide information on the probability of harvest.)
> > The problem is this:  ADMB seems to have some trouble estimating the
> > single variance component.  My resulting report either spits out my
> > initial value (I'm using the true simulated values as initial
> > estimates), or ADMB forces the parameter estimate as close to zero
> > as I will allow it (of course, it's constrained to be positive).  I
> > have tried many avenues of attack from the manual and from other
> > examples, and none of them seems to alleviate the problem.  Having
> > run out of options, I tried replacing the "prior" distribution
> > portion of the log-likelihood with
> > totL += (.5)*(-(nyears-1)*log(tau)-.5*norm2(e/tau));
> > instead of the usual
> > totL += (-(nyears-1)*log(tau)-.5*norm2(e/tau));
> > and the variance component estimates are actually OK--they are at
> > least of the correct order of magnitude, and not precisely equal to
> > my initial parameter estimate.  I don't really understand why this
> > down-weighting is successful (or whether it is truly successful and
> > I haven't just gotten lucky somehow).  I would also really like to
> > avoid this seemingly ad hoc procedure, since I think ADMB is
> > powerful enough to fit this model without resorting to such
> > things--it seems to fit more complex models relatively easily.
> > For the record, I have tried the transformation procedure of using
> > a standard N(0,1) density for the random effect and then scaling up
> > to match the true N(0,tau^2) distribution; in this case, the "trick"
> > seems to hasten ADMB's result of loglikelihood=NaN, from which it
> > does not recover.  I have also tried playing with boundary values on
> > constrained parameters, and met with no success.
> > I have pasted below a stripped-down version of my .tpl file (the
> > working version, with the down-weighting intact), with accompanying
> > .dat and .pin files, in the hope that someone will be kind enough to
> > examine them.  I am aware that there are probably some
> > inefficiencies here, but I'm not yet concerned with optimizing the
> > program for run-time.  Of course, I'm only posting one simulated
> > dataset--this phenomenon occurs for hundreds of simulations in a
> > row.
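> > (For clarity, the scaling "trick" mentioned above looks roughly like
> > this -- a sketch only, with a standardized vector u declared in place
> > of e:
> >
> >   random_effects_vector u(0,nyears-2,2);  // u ~ N(0,1)
> >   ...
> >   totL += -.5*norm2(u);   // N(0,1) log-density, constants dropped
> >   // and wherever e[k] enters the survival term, use tau*u[k], e.g.
> >   // s = exp(beta+tau*u[k])/(1+exp(beta+tau*u[k]));
> >
> > This corresponds to the commented-out lines near the end of the
> > PROCEDURE_SECTION below.)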
> > Thanks very much,
> > Chris Gast
> >
> >
> > /********************* start tpl file ******************************/
> > DATA_SECTION
> >   init_int nyears;
> >   init_int nages;
> >   init_int m;
> >   init_vector effdat(0,nyears-1);
> >   init_int telnyears;
> >   init_vector numtagged(0,telnyears-1);
> >   init_vector numrecovered(0,telnyears-1);
> >   init_imatrix dat(0,nyears-1,0,nages-1);
> >   imatrix fcohort(0,nyears-1,0,nages-1);
> >   imatrix pcohort(0,nages-2,0,nages-1);
> >   imatrix fdat(0,nyears+nages-2,0,nages-1);
> >   int i;
> >   int j;
> > PARAMETER_SECTION
> >   init_vector logparms(0,m-1,1);
> >   init_bounded_number beta(-2,1.7,1);
> >   init_bounded_number logcmu(-6,1,1);
> >   init_bounded_number logtau(-10,1.5,2);
> >   random_effects_vector e(0,nyears-2,2);
> >   number tau;
> >   vector parms(0,m-1);
> >   number prob;
> >   number prevq;
> >   number prod;
> >   number datL;
> >   number telL;
> >   number auxL;
> >   number extraL;
> >   number s;
> >   number c;
> >   number sprod;
> >   objective_function_value totL;
> > PROCEDURE_SECTION
> >   const double pi=3.14159265358979323846264338327950288;
> >   double datsum;
> >   int colrank;
> >   datL=0.0; auxL=0.0; telL=0.0; extraL=0.0; tau=mfexp(logtau);
> >   parms=mfexp(logparms);
> >   c=mfexp(logcmu);
> >   cout << "tau=" << tau << ", c=" << c << "\n";
> >   for(i=0;i<(nyears+nages-1);i++){
> >     datsum=0;
> >     datL+=gammln(parms[i]+1);
> >     prevq=1;
> >     prod=0;
> >     colrank=0;
> >     sprod=1;
> >     for(j=0;j<nages;j++){
> >       if(fdat[i][j]!=-1){
> >         if(i<nages) prob=1-exp(-c*effdat[colrank]);
> >         else if(i>=nages) prob=1-exp(-c*effdat[i-nages+1+j]);
> >         if(colrank==0) s=1;
> >         else if(i<nages)
> >           s=exp(beta+e[colrank-1])/(1+exp(beta+e[colrank-1]));
> >         else if(i>=nages)
> >           s=exp(beta+e[i-nages+1+j-1])/(1+exp(beta+e[i-nages+1+j-1]));
> >         sprod*=s;
> >         datsum+=fdat[i][j];
> >         datL-=gammln(fdat[i][j]+1);
> >         if(colrank>0) prod+=prevq*sprod*prob;
> >         else prod=prob;
> >         //cout << "prod=" << prod << ", prob=" << prob << ", s=" << s
> >         //     << ", sprod=" << sprod << ", prevq=" << prevq << "\n";
> >         datL+=fdat[i][j]*log(prevq*sprod*prob);
> >         colrank++;
> >         prevq*=(1-prob);
> >       }
> >     }
> >     datL+=(parms[i]-datsum)*log(1-prod) - gammln(parms[i]-datsum+1);
> >   }
> >   //now compute radio-telemetry likelihood
> >   for(i=0;i<telnyears;i++){
> >     telL += gammln(numtagged[i]+1) - gammln(numrecovered[i]+1)
> >             - gammln(numtagged[i]-numrecovered[i]+1)
> >             + numrecovered[i]*log(1-exp(-c*effdat[nyears-1]))
> >             + (numtagged[i]-numrecovered[i])*log(exp(-c*effdat[nyears-1]));
> >   }
> >   for(i=0;i<(nyears-1);i++){
> >     cout << e[i] << "\t";
> >   }
> >   cout << "\n\ndatL=" << datL << ", telL=" << telL << ", extraL="
> >        << extraL << ", totL=" << -(datL+telL) << "\n";
> >   cout << "cest=" << c << ", betaest=" << beta << ", tauest=" << tau
> >        << "\n\n\n\n\n\n\n\n\n";
> >   totL=(datL+telL);
> >   totL += (.5)*(-(nyears-1)*log(tau)-.5*norm2(e/tau));
> >   /* alternative N(0,1) scaling trick (see discussion above):
> >   totL += -.5*norm2(u);
> >   e=tau*u; */
> >   totL *= -1;
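> >   // (totL is accumulated above as a log-likelihood; ADMB minimizes
> >   //  the objective function, hence the sign flip on the line above)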
> > REPORT_SECTION
> >   report << "s\n" << exp(beta)/(1+exp(beta)) << "\n";
> >   report << "tau\n" << tau << "\n";
> >   report << "cmu\n" << c << "\n";
> >   report << "parms\n" << parms << "\n";
> > PRELIMINARY_CALCS_SECTION
> >   for(i=0;i<nyears;i++){
> >     effdat[i]/=1000;
> >   }
> >   for(i=0;i<(nages-1);i++){
> >     for(j=0;j<nages;j++){
> >       pcohort[i][j]=-1;
> >     }
> >   }
> >   for(i=0;i<nyears;i++){
> >     for(j=0;j<nages;j++){
> >       fcohort[i][j]=-1;
> >     }
> >   }
> >   //get the partial cohorts first
> >   for(i=0;i<(nages-1);i++){
> >     for(j=0;j<nages;j++){
> >       if(j>i){
> >         pcohort[nages-j-1+i][j] = dat[i][j];
> >       }
> >     }
> >   }
> >   //now get the "full" cohorts
> >   for(i=0;i<nyears;i++){
> >     for(j=0;j<nages;j++){
> >       if(i<=(nyears-nages)) fcohort[i][j]=dat[i+j][j];
> >       else if((j+i)<nyears) fcohort[i][j]=dat[i+j][j];
> >     }
> >   }
> >   //now put all the cohort data together in a single matrix
> >   for(i=0;i<(nyears+nages-1);i++){
> >     for(j=0;j<nages;j++){
> >       if(i<(nages-1))  fdat[i][j]=pcohort[i][j];
> >       else             fdat[i][j]=fcohort[i-(nages-1)][j];
> >     }
> >   }
> >
> > /********************* end tpl file ******************************/
> >
> > /********************* start dat file ******************************/
> > #nyears
> > 12
> > #nages
> > 3
> > #parameters
> > 14
> > #effort
> > 379 233 829 1012 807 698 859 137 779 574 460 684
> > #telnyears
> > 4
> > #numtagged
> > 100 100 100 100
> > #numrecovered
> > 13 19 17 11
> > #harvest data
> > 28 13 9
> > 15 8 4
> > 49 24 13
> > 53 27 13
> > 42 24 12
> > 36 16 9
> > 33 18 8
> > 5 3 1
> > 29 13 8
> > 18 12 5
> > 16 7 4
> > 23 12 5
> >
> > /********************* end dat file ******************************/
> > /********************* start pin file ******************************/
> > #logparms
> > 5.45959 5.86647 5.92426 5.85793 5.81114 5.69709 5.5835 5.60212 5.39816
> > 5.30827 5.34711 5.17615 5.32301 5.11799
> > #beta
> > 0.322773
> > #logcmu
> > -1.60944
> > #logtau
> > -1.20397
> > #e
> > 0 0 0 0 0 0 0 0 0 0 0
> >
> > /********************* end pin file ******************************/
> >
> > -----------------------------
> > Chris Gast
> > cmgast at gmail.com
> >