[ADMB Users] comparing highly parameterized models
Fowler, Mark
Mark.Fowler at dfo-mpo.gc.ca
Wed May 1 07:41:15 PDT 2013
I want to compare tweaks to stock assessment models using AIC, but am challenged by deviation parameters. I found a (draft?) PhD thesis by Ian Stewart (reviewers indicated as Ray Hilborn, André Punt and Richard Methot) that tackles the issue on pages 208-210, excerpted below. But I’m not clear on how the data (fdata) and parameter (fparameter) scalars are derived. The data scalar, and I expect the parameter scalar, should range between 0 and 1. Anybody know more about these?
Both AICc and BIC require specification of the number of data points, an area
of some disagreement when diverse data sets are included with multiple error
assumptions in the same model. Specifically, the sample sizes of multinomial likelihoods
applied to compositional data are often tuned during model fitting to be more consistent
with observed residual error (although no tuning of multinomial sample sizes was
employed in the 2005 English sole assessment). Therefore, the number of categories has
been used as a proxy for the dimension of these data in some studies (Helu et al., 2000).
Assuming only that the likelihoods and error structures are correctly specified, the
effective dimension of these data should lie somewhere between the number of
multinomially distributed observations and the sum of the input sample sizes for all such
distributions used in the model. Therefore, for the purposes of calculating measures of
model fit, a range in the dimension of the data is explored via a multiplicative scalar
(bounded by zero and one) describing the additional dimension added through the
number of observations greater than one per multinomial component. All other data
sources, including survey indices, discard and mean weight observations are enumerated
directly for a total count of data points equal to:
$$D = \sum N_{\text{index obs},\,\text{discard obs},\,\ldots} + M + f_{\text{data}} \cdot \left(\sum N_{\text{mult}} - M\right),$$
where D is the dimension of the data, N… is the number of individual data points, or input
sample size, M is the number of multinomially distributed length- or age-frequency
distributions, and fdata is the multiplicative data scalar. This generalized approach
allows full exploration of the role of sample size in model comparison.
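
As a minimal sketch of the data-dimension formula quoted above (not code from the thesis): the function name, argument names, and all counts below are hypothetical, chosen only to show how D moves between its two bounds as fdata goes from 0 to 1.

```python
def data_dimension(n_direct, mult_sample_sizes, f_data):
    """D = sum(directly counted obs) + M + f_data * (sum(N_mult) - M).

    n_direct          -- counts of directly enumerated data points (indices, discards, ...)
    mult_sample_sizes -- input sample size of each multinomial composition
    f_data            -- data scalar, assumed to lie in [0, 1]
    """
    if not 0.0 <= f_data <= 1.0:
        raise ValueError("f_data must lie in [0, 1]")
    m = len(mult_sample_sizes)  # M: number of multinomial distributions
    return sum(n_direct) + m + f_data * (sum(mult_sample_sizes) - m)


# Hypothetical counts, for illustration only.
n_direct = [20, 10]       # e.g. 20 survey index points, 10 discard observations
mult_n = [100, 50, 150]   # input sample sizes of three compositions

d_low = data_dimension(n_direct, mult_n, 0.0)   # each composition counts once
d_mid = data_dimension(n_direct, mult_n, 0.5)
d_high = data_dimension(n_direct, mult_n, 1.0)  # full input sample sizes count
print(d_low, d_mid, d_high)
```

With fdata = 0 each composition contributes a single data point; with fdata = 1 it contributes its full input sample size.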
Both metrics also require specifying the number of parameters estimated in
the model, however, the dimension of model parameters differs from many statistical
applications because many parameters (e.g., recruitment deviations) are explicitly
constrained. This means that the effective number of parameters is a function of the
relative constraint. Although Bayesian approaches for calculating the effective number of
parameters have been developed (e.g., Spiegelhalter et al., 2002), there are currently no
maximum likelihood based tools available. However, the effective number of parameters
must lie within a definable range, so an approach similar to the data scalar is employed here.
In the limit, as the variance (σ²) of the distribution constraining deviation parameters
goes to infinity, the number of constrained parameters is equal to the number of
deviations estimated, while as σ² goes to 0, there are effectively zero estimated
parameters. Another multiplicative scalar is used to describe this range:
$$P_{\text{eff}} = \sum N_{\text{non-dev}} + f_{\text{parameter}} \cdot \sum N_{\text{dev}},$$
where Peff is the effective number of parameters, N is the raw number of parameters,
either deviations or non-deviation parameters, and fparameter is the multiplicative
parameter scalar. This approach allows direct exploration of the relative model weights
conditioned on both the data and parameter dimensions (via the scalars), allowing a
determination of the conditions under which alternate model weighting might occur.
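
The effective-parameter formula above can be sketched the same way. This is an illustration under assumed inputs, not the thesis implementation; the function name and the parameter counts are hypothetical.

```python
def effective_parameters(n_non_dev, n_dev, f_parameter):
    """P_eff = N_non-dev + f_parameter * N_dev.

    n_non_dev   -- number of ordinary (unconstrained) estimated parameters
    n_dev       -- number of constrained deviation parameters
    f_parameter -- parameter scalar, assumed to lie in [0, 1]
    """
    if not 0.0 <= f_parameter <= 1.0:
        raise ValueError("f_parameter must lie in [0, 1]")
    return n_non_dev + f_parameter * n_dev


# Hypothetical model: 12 ordinary parameters and 40 recruitment deviations.
p_low = effective_parameters(12, 40, 0.0)    # deviations fully constrained
p_mid = effective_parameters(12, 40, 0.25)
p_high = effective_parameters(12, 40, 1.0)   # deviations count as free parameters
print(p_low, p_mid, p_high)
```

The two ends of the scalar correspond to the two limits described in the excerpt: σ² going to 0 (fparameter = 0) and σ² going to infinity (fparameter = 1).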
To draw inference from more than one model, it is necessary to obtain the relative
model weights. First, the metrics (AICc or BIC) for all candidate models are standardized
to the relative difference between the value for each model and the minimum value for
any model considered (Burnham and Anderson, 2002):
$$\Delta_i = \mathrm{AICc}_i - \mathrm{AICc}_{\min}$$
The likelihood or probability of the data (D) given the model (Mi), or relative strength of
evidence for each model, is then proportional (after renormalizing) to wi:
$$p(D \mid M_i) \propto w_i = \exp(-0.5\,\Delta_i)$$
Model weights were calculated for each model under values of the data and parameter
scalars ranging from 0 to 1.0. Resulting sensitivity to the assumed dimension of the data
and parameters are then jointly explored.
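
A rough sketch of how the weights might be tabulated over a grid of the two scalars follows. It assumes the standard small-sample form AICc = 2·NLL + 2k + 2k(k+1)/(n − k − 1), with n taken as the data dimension D and k as Peff. The two models, their negative log-likelihoods, and all counts are made up for illustration and do not come from the thesis.

```python
import math


def aicc(neg_log_lik, k, n):
    """Small-sample AIC; k and n may be non-integer effective dimensions."""
    if n <= k + 1:
        raise ValueError("AICc needs n > k + 1")
    return 2.0 * neg_log_lik + 2.0 * k + 2.0 * k * (k + 1.0) / (n - k - 1.0)


def model_weights(scores):
    """Normalized weights: exp(-0.5 * delta_i) / sum_j exp(-0.5 * delta_j)."""
    best = min(scores)
    raw = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical data set shared by both models.
N_DIRECT, M, N_MULT = 60, 10, 1000   # direct obs, compositions, summed input N
# Hypothetical candidate models (negative log-likelihood and parameter counts).
MODELS = {
    "A": {"nll": 1500.0, "non_dev": 10, "dev": 30},
    "B": {"nll": 1495.0, "non_dev": 14, "dev": 30},
}


def weights_for(f_data, f_parameter):
    """Model weights for one combination of the data and parameter scalars."""
    d = N_DIRECT + M + f_data * (N_MULT - M)
    scores = [
        aicc(m["nll"], m["non_dev"] + f_parameter * m["dev"], d)
        for m in MODELS.values()
    ]
    return dict(zip(MODELS, model_weights(scores)))


grid = {
    (fd, fp): weights_for(fd, fp)
    for fd in (0.0, 0.5, 1.0)
    for fp in (0.0, 0.5, 1.0)
}
for (fd, fp), w in grid.items():
    print(f"f_data={fd:.1f} f_parameter={fp:.1f} "
          + " ".join(f"{name}={val:.3f}" for name, val in w.items()))
```

Note that AICc is undefined unless D exceeds Peff + 1, which can fail at low values of the data scalar when many deviations are counted; the sketch raises an error in that case rather than returning a weight.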
Mark Fowler
Population Ecology Division
Bedford Inst of Oceanography
Dept Fisheries & Oceans
Dartmouth NS Canada
B2Y 4A2
Tel. (902) 426-3529
Fax (902) 426-9710
Email Mark.Fowler at dfo-mpo.gc.ca
Home Tel. (902) 461-0708
Home Email mark.fowler at ns.sympatico.ca