[ADMB Users] ADMB versus R

Wed Nov 7 18:36:58 PST 2012

Hi,

I am continuing on sequence of emails "Integrating R and ADMB to
generalize TPL files".

Dave is worried about the following aspect of glmmADMB/R: 1) lack of flexibility
2) lack of transparency.

Regarding 1) it is quite obvious that you cannot get the same flexibility when
you specify the model in one single line of code (glmmADMB) as get with ADMB.
Our hope when we started out with glmmADMB (before Ben got involved) was
that people who felt that glmmADMB was too rigid would move to ADMB.
THis has not happened (at least that I am aware). Migration to ADMB is
still being pointed
out as an option to to users on email lists.

Regarding 2) users of glmmADMB must know what a factor in R is, and
also they should
know about "contrast". Having said that,  I had not realized myself
what the implication
of the default "independence" of random effects were. That is why I suggest that
by default the full covariance matrix is estimated for the random
effects. Regardless
of what the default becomes in the future: Ben is doing a fantastic
job explaining to
users how the package works, and knowledge should accumulate in the
user community.
I see some other people answering emails on the list.

To Dave's argument that it is easy mis-specify the model in glmmADMB, and hence
get the wrong answer. A sound principle is that every user should generate
data (using R or some other program) from what they think is the
underlying model.
If the parameter estimates from glmmADMB does not match the truth,
something is wrong. A user not able to generate data most likely has
not understood
the model, and should downgrade to a simpler model that he/she understands.

Finally, the tpl-file underlying glmmADMB is probably the one that has been used
by most people (through its .exe). We do not know the exact number of users, but
a minimum estimate is provided by the number of unique individuals that
has occurred on the email list. Given that glmmadmb.tpl has less than 600 lines
of code that number in itself is quite impressive. The clue to the
success is the
R functions that Ben has written on top of the tpl-file. Before these
functions where
written more than 50% of the questions were of the type "how can I get .....".

Hans

On Thu, Nov 8, 2012 at 1:03 AM, dave fournier <davef at otter-rsch.com> wrote:
> On 12-11-07 11:32 AM, Ben Bolker wrote:
>
> Well I don't recall getting personal. I was referring to
> what I feel are problems with the design of R.
>
> Anyway it apears that you don't seem to have the right model here.
>
> This should be pretty simple.  One has 192 random effects.  all I did
> was to use 3 different parameters for the standard deviations of the random
> effects.  sd1 for obs 1-64  sd2 for obs 65-128 and sd3 for obs 129-192.
> You seem to have something completely different.  the par file is
>
> # Number of parameters = 8  Objective function value = 285.193  Maximum
> gradient component = 6.97992e-05
> # pz:
> 0.000100000000000
> # beta:
>  -1.68901008026 -0.817412602915 -4.06986974939 6.84143453412
> # tmpL:
>  -0.114257455234 -0.256577632335 -0.00360094954491
> # tmpL1:
>  0.000100000000000 0.000100000000000 0.000100000000000
> # log_alpha:
> 1.36928124319
>
> so the fit is much worse.
>
>
>
>
>
> On 12-11-07 01:17 PM, dave fournier wrote:
>
>> But that is not right.  The model is to assume that the
>> random effects in different time periods are independent and that their
>> variances are different.  You can do this by figuring out what the
>> correct covariance is for the current design matrix or by
>> writing the model properly which is trivial to do in ADMB.  As usual it
>> is the R interface and these silly design matrices which introduces the
>> problem.
>> _______________________________________________
>> Users mailing list
>> Users at admb-project.org
>> http://lists.admb-project.org/mailman/listinfo/users
>
>   Just to continue this a little bit:
>
> If I do this:
>
> l2 <- read.csv("bigelow_dat.txt")
>
> ## unnecessary picture
> library(ggplot2)
> ggplot(l2,aes(factor(time),snag))+
>     geom_dotplot(binaxis="y",stackdir="center")+
>     facet_wrap(~Ecozone+time,scale="free_x")
>
> with(l2,table(table(plot))) ## check exp. design: 3 samples per plot
> with(l2,table(table(plot,time))) ## check exp. design: 1 sample per
> plot/time
> l2$ftime <- factor(l2$time) ## factor-ize time
>
> library(glmmADMB)
> ## note zero-inflation is FALSE/off throughout (default)
> g1 <- glmmadmb(formula = snag ~ ftime + Ecozone + (1 | plot), data = l2,
> family = "nbinom1")
> VarCorr(g1)
> log(sqrt(VarCorr(g1)[[1]]))  ## log std. dev. =0.4315505  ## matches DF
> logLik(g1)
> ##  'log Lik.' -255.758 (df=6)  ## matches DF
> coef(g1)
> ## (Intercept)      ftime2      ftime3    EcozoneW
> ##   -0.7635323  -0.5884655  -0.6037133   1.2493021
>
> ## ?? The fixed effects don't match DF results: did he orthogonalize the
> design matrix ... ?
> ## never mind
>
> ## besides being the wrong model, this fails on my system (32-bit Linux)
> try(g2 <- glmmadmb(formula = snag ~ ftime + Ecozone + (ftime | plot),
> data = l2, family = "nbinom1"))
>
> ## this works and is (I think) the right model ...
> g3 <- glmmadmb(formula = snag ~ ftime + Ecozone + (ftime-1 | plot), data
> = l2, family = "nbinom1",
>                save.dir="glmmadmb_tmp")
> ## the Z matrix looks like I think it should (an indicator variable for
> time period)
> VarCorr(g3)
>
> ## Random effects:
> ## Structure: Diagonal matrix
> ## Group=plot
> ##       Variance StdDev
> ## time1   0.7957 0.8920
> ## time2   0.5986 0.7737
> ## time3   0.9928 0.9964
>
> log(sqrt(diag(VarCorr(g3)[[1]])))
> ##      ftime1      ftime2      ftime3
> ## -0.11426024 -0.25658084 -0.00360295
>
> ## This doesn't match DF: he gets the log-sd of the variances as
>
> ## tmpL:
> ##  0.352458239856 0.489012745656 0.496221176099
>
> ## i.e. increasing from period 1 to periods 2 and 3
>
> For the overdispersion model (which I haven't run yet -- glmmADMB is not
> flexible enough) -- DF gets
>
> # tmpL:
>  0.436593520224
> # tmpL1:
>  0.000100000000000
> # log_alpha:
>  0.826338727816 0.0677913813376 0.657156858701
>
> i.e. the log-sd among plots (tmpL) is about the same as before, and
> log_alpha is the log of the NB scale parameter.  This suggests *lower*
> variance (at least when scaled by the mean) in the second period,
> *higher* variance in the first period -- which seems (??) to match the
> (ftime-1|plot) glmmADMB results rather than the DF results (??)
>
>   I'd rather be arguing over what the data say than over whose tools are
> better
> or worse, or who's an idiot.  I'm happy to concede that I'm an idiot, so we
> can get that over right at the beginning ...
>
>   cheers
>
>
>
> _______________________________________________
> Users mailing list
> Users at admb-project.org
> http://lists.admb-project.org/mailman/listinfo/users
>