[ADMB Users] ADMB-RE fitting failure in 11 but not in 10.1

Thu Dec 6 13:38:35 PST 2012

On 12-12-06 11:47 AM, Tim Miller wrote:
> Dear Users,
> 
> During the fitting of a random effects model in version 11 (Windows 7,
> MinGW GCC-4.5.2), I get a failure with error message:
> 
> Error in matrix inverse -- matrix singular in solve(dmatrix)
> 
> This occurs before a .par file is written. In 10.1 convergence is fine.
> The error doesn't occur in 11 when I use the sparse hessian option
> -shess and in that case I get the same result as 10.1 (with or without
> -shess).
> 
> I think I have traced it to the solve function in ludcmp_solve.cpp just
> by searching for the error text in all the source code files and I think
> the error occurs when checking the determinant.  In version 11 the
> function appears to check whether exp(sum(log(fabs(determinant
> components)))) == 0.0 whereas in 10.1 it checks product(determinant
> components) == 0.0.
> 
> Assuming I have traced the source of the error correctly, why is the
> newer determinant checking method better than that in 10.1? If it is
> better, maybe there is a way to continue on rather than exiting on the
> error?
> 
> I could provide the tpl/dat files, but fitting the model without -shess
> is annoying because it takes a while to get to the error.
> 
> Tim
> 

  Hmm.  I'm trying to track these changes in the SVN repository.

I did

   find . -name "*.cpp" -exec grep -H singular {} \;

to find every instance of "singular" in the code ...

found ludcmp_solve.cpp in contrib/ludcmp/linad99 (took me a little while
since originally I was looking just in src/)

  As Mollie pointed out,

================
r303 | mbrooks | 2012-03-15 21:31:26 -0400 (Thu, 15 Mar 2012) | 1 line

sum log det instead of product of det to check invertibility
===============

  but a less stringent (possibly better??) check is (as implemented
elsewhere in the code) to check individual elements of the products to
make sure none of them are zero.  (The product can underflow even if
none of the individual elements is numerically equal to zero.)

  This is fundamentally a difficult problem (I think; I would be happy
to be corrected if someone has an easy answer!).  The rules

a. check whether product is zero, or close to zero
b. check whether exp(sum(log(...))) is zero, or close to zero
c. check whether individual elements to be multiplied are zero, or close
to zero

  are _mathematically_ equivalent, but they aren't computationally
equivalent. It may be possible to prove that one of them is always more
sensitive (i.e. more likely to detect singularity), but I wouldn't be
surprised if they differ depending on the scenario.  Furthermore (and
this is the important part), **it's not clear what the 'right' answer
is** in terms of sensitivity.  There is a very sensible argument that
says that the software should detect when the user is trying to invert a
matrix that is very ill-conditioned/close to singularity, and at least
warn and possibly give an error -- the argument being that the answers
may be extremely unreliable in these situations, and that it isn't
ethical to allow the user to proceed (thinking they got an answer when
in fact it's rubbish) in this case.

PS: checking (a) vs (b) in R shows that the answers differ for different
sets of random numbers -- sometimes (a) is bigger, sometimes (b) is bigger.

> tmpf <- function() { r <- 10^runif(100,min=-3,max=-1);
prod(r)-exp(sum(log(r))) }
> replicate(20,tmpf())
 [1]  7.556839e-210 -2.392981e-220 -5.453231e-218  2.266101e-208
-1.133139e-216
 [6] -1.728037e-217 -3.965986e-218 -1.867355e-221 -3.146840e-221
9.885676e-210
[11] -9.282674e-214 -1.682003e-218  1.822128e-205 -1.469539e-218
5.105471e-214
[16]  1.827526e-214 -7.008184e-207  4.061170e-215 -1.594751e-202
4.641337e-213

  Which one is bigger is pretty much random:

> table(sign(replicate(1000,tmpf())))

 -1   0   1
507   6 487

  My only argument would be that it might be a good idea if the 23
places that singularity is checked in the ADMB code (find . -name
"*.cpp" -exec grep -H "Error in matrix inverse.*singular" {} \; | wc )
behaved *consistently*, to the extent that was possible.  It might be
nice, *if* it seemed to make sense, to allow some degree of user control
(either provide parallel sloppy versions of the functions, or a global
argument -danger_danger_ignore_singularity_checks ...) to allow the
computation to go forward under these circumstances.

  However, since I'm not volunteering to do the work myself, this is
just a suggestion ...

  cheers
    Ben Bolker