convergence status OK but calculated Hessian with negative eigenvalue

wuhao_osu's picture
Joined: 09/07/2010

Hello,

I just found in a run that the calculated Hessian has a negative eigenvalue (which also results in NAs for some standard errors) and some of the gradient elements seem large. However, the convergence code is 0, so no error or warning is displayed. Should a warning be displayed in this case?

I tried another starting value and the problem goes away, with the objective function decreasing by about 0.28, indicating that the first run did not reach the minimum.

The code and data are in the attachment.

Thanks.

- Hao

Attachment    Size
code.R        1.84 KB
Data.dat      6.6 KB
carey's picture
Joined: 10/19/2009

this is not surprising. the fact that the optimization algorithm behind OpenMx, NPSOL, "converges" does not guarantee that the solution is at a minimum. i always examine the values of the gradient and the status of the hessian (is it positive definite). they are valuable clues about the solution.
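A minimal sketch of the two checks mentioned above (gradient size and positive definiteness of the Hessian), written in plain Python so that it is self-contained. It does not use OpenMx or NPSOL; the function names and the tolerance `grad_tol` are hypothetical, and a sensible tolerance depends on how the problem is scaled.

```python
import math

def is_positive_definite(hessian):
    """Return True if a symmetric matrix (list of lists) is positive definite.

    Attempts a Cholesky factorization, which completes only when every
    pivot is strictly positive.
    """
    n = len(hessian)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = hessian[i][i] - partial
                if pivot <= 0.0:
                    return False  # not positive definite
                lower[i][i] = math.sqrt(pivot)
            else:
                lower[i][j] = (hessian[i][j] - partial) / lower[j][j]
    return True

def check_solution(gradient, hessian, grad_tol=1e-4):
    """Summarize the diagnostics; grad_tol is an arbitrary assumption."""
    return {
        "max_abs_gradient": max(abs(g) for g in gradient),
        "gradient_small": all(abs(g) <= grad_tol for g in gradient),
        "hessian_positive_definite": is_positive_definite(hessian),
    }

# Example: one large gradient element and an indefinite Hessian.
report = check_solution([0.3, 1e-6], [[1.0, 2.0], [2.0, 1.0]])
print(report)
```

A report in which either flag is False suggests rerunning from different starting values, whatever the optimizer's own status code says.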

the GUI that i am writing for OpenMx will flag these conditions, even though OpenMx does not (yet? i think that it would be a good idea to implement these checks and issue a warning).

greg

wuhao_osu's picture
Joined: 09/07/2010

Thanks, Greg. I did not know that.

This is quite regrettable. Why doesn't NPSOL check for optimality conditions?

tbrick's picture
Joined: 07/31/2009

The cases that Greg is talking about are indicators of local minimum conditions. While there are techniques for dealing with local minima, it's often impossible to distinguish a local minimum from a global one.

That said, Greg has listed some excellent and easy-to-automate checks for some obvious cases of local minima, and I agree that OpenMx should warn the user in these conditions. Thanks, Greg, for reminding us about this issue.

I'm going to recommend at the upcoming developer meeting that we test gradient elements to make sure none are very close to zero, and test the hessian to ensure that it is positive definite (provided the user has not elected to turn off the hessian calculation). If either of these is violated, I think we should warn the user.

Specifically, I think these warnings should go in the same section as the optimization status, rather than just being shown as R warnings. As Michael Spiegel is quick to point out, people will generally ignore warnings, but may pay more attention to the convergence status. If they also return as numeric codes in a slot in the summary, users running large numbers of models in a batch mode will be able to access them afterwards for aggregation or retry.

Does that sound like a reasonable solution? Are there other cases (besides zero-gradient and non-positive-definite hessian) that we should be looking for?

wuhao_osu's picture
Joined: 09/07/2010

I think an indefinite Hessian suggests a saddle point instead of a local minimum, if the gradient is close to 0. Gradient elements are supposed to be close to 0, unless constraints are imposed.

It would be good if OpenMx could automate several consecutive runs when such issues arise. Of course, this may not get rid of the problems, but it will reduce the chance of their occurrence, as many such problems go away if the user continues the iterations. This is especially helpful for simulation studies.

It would also be good to use a scale-invariant convergence criterion, so that changing the scale of the variables (and possibly the parametrization of the model) does not change the solution. I found that multiplying all variables by a positive constant may change the solution.

Ryne's picture
Joined: 07/31/2009

While I understand how it would be useful, OpenMx does not automate multiple runs. The reason for this is that we have made a point to do only what users tell us to do, and we have no desire to create criteria for alternative starting values. However, provided a user has a sufficient number of cores on their machine/grid/whatever, someone could write a function or series of functions that pulls alternate starting values from a specified distribution and runs multiple models in parallel. Such a method wouldn't increase run time provided the number of parallel models was sufficiently low relative to parallel computing resources.
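The multi-start idea can be sketched as follows, in plain Python and on a toy one-parameter objective with two minima. Everything here is an assumption made for illustration: the objective, the fixed-step gradient descent standing in for a real optimizer, and the names `draw_starts` and `multi_start`. It is not an OpenMx facility. The runs are independent of one another, so they could be dispatched to separate cores; the sketch runs them serially to stay short.

```python
import random

def objective(x):
    # Toy function with a local minimum near 1.13 and a global one near -1.30.
    return x**4 - 3.0 * x**2 + x

def gradient(x):
    return 4.0 * x**3 - 6.0 * x + 1.0

def descend(start, step=0.01, iterations=5000):
    """Fixed-step gradient descent; a stand-in for a real optimizer."""
    x = start
    for _ in range(iterations):
        x -= step * gradient(x)
    return x

def draw_starts(n, low, high, seed):
    """Draw n starting values from a uniform distribution (seeded)."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]

def multi_start(starts):
    """Run one fit per starting value and return the best fit and all fits."""
    results = []
    for start in starts:
        estimate = descend(start)
        results.append(
            {"start": start, "estimate": estimate, "value": objective(estimate)}
        )
    best = min(results, key=lambda r: r["value"])
    return best, results

best, results = multi_start(draw_starts(8, -2.0, 2.0, seed=1))
print(best)
```

Comparing the fitted values across runs also shows how many distinct solutions the starting values reached, which is useful in simulation studies.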

I do agree with regard to the convergence criteria, but that's a bigger discussion. We've thus far not messed with some NPSOL options regarding convergence; some of these criteria are absolute values that we probably should be varying with the data. However, I don't know what the right values are or the best method to adjust them. For example, pretend that one criterion is that the gradient norm should be below a specific value, which I'll call k. If we rescale the objective function (either explicitly by rescaling it to a value of 1 at the starting values or implicitly by rescaling the data), we are effectively changing that criterion, such that lowering the absolute value of the objective function effectively raises the gradient norm criterion. While I don't know what the right value of that criterion should be, I'll at least point out that making it a function of the initial values "rewards" bad starting values with easier convergence, while penalizing great starting values with a difficult convergence criterion.

wuhao_osu's picture
Joined: 09/07/2010

A convergence criterion suggested by R Jennrich is the residual cosine, max{g/sqrt(diag(H)*F)}, which is scale invariant, but works only for minimizing a discrepancy function F>0. When the model fits perfectly with F=0, a criterion is not needed because F=0 is the minimum possible value.

Rescaling the objective function at the starting value implicitly assumes the function has a meaningful origin "0", and the reasoning about its penalization of good starting values assumes the minimum of the objective function is nonnegative.
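A small numerical illustration of why this criterion is scale invariant, in plain Python. It reads the formula elementwise as max over i of |g_i| / sqrt(H_ii * F); taking absolute values is an assumption on my part, and the function names and numbers are made up for the example. Multiplying the objective by a constant c multiplies g, H and F by c, so the ratio is unchanged, whereas the largest absolute gradient element grows by a factor of c.

```python
import math

def residual_cosine(gradient, hessian_diag, f_value):
    """max_i |g_i| / sqrt(H_ii * F); needs F > 0 and every H_ii > 0."""
    return max(
        abs(g) / math.sqrt(h * f_value)
        for g, h in zip(gradient, hessian_diag)
    )

def rescale_objective(gradient, hessian_diag, f_value, c):
    """Quantities for the objective c * F, with c > 0."""
    return [c * g for g in gradient], [c * h for h in hessian_diag], c * f_value

# Made-up values at some iterate.
g, h, f = [0.02, -0.01], [4.0, 1.0], 0.25
g100, h100, f100 = rescale_objective(g, h, f, 100.0)

print(max(abs(x) for x in g), max(abs(x) for x in g100))  # absolute criterion changes
print(residual_cosine(g, h, f), residual_cosine(g100, h100, f100))  # ratio does not
```

The same cancellation happens when a single parameter is rescaled by a factor s, since g_i becomes s * g_i and H_ii becomes s^2 * H_ii.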

Well, of course, the convergence criterion is an issue with NPSOL and may not be easy to change....

carey's picture
Joined: 10/19/2009

my $.02,

(1) one must distinguish issues of openmx from those of the optimizer NPSOL, although the two are certainly correlated.

(2) openmx should directly assess (or provide mechanisms for a user to assess) convergence to a local minimum, using checks that go beyond the convergence criteria of NPSOL. as i've pointed out to mike neale (? in this forum), NPSOL can "converge" with a completely unidentified model. the tests that i use for convergence are: (a) are all gradient elements close to 0 (ryne is correct in pointing out that "how close to 0" is a function of scaling)? (b) is the numerically estimated hessian positive definite? and (c) are there any gradient elements that are exactly 0? if i have any questions about a solution, then i use a different set of starting values and see whether that run converges on the previous solution. i have R routines for all of these and you are most welcome to use them (see below).
PS. Hao is correct in that a local minimum implies a positive definite hessian.

(3) tim's point on local minima is also well taken. i've been told that some models like factor invariance across several groups using an IRT model can have many local minima. people fitting these models effectively rerun the model many times using a grid to find the global minimum. i think that it is too much to ask openmx to implement strategies for a global minimum. leave that to the user.

(4) this is premature, but if you want to see the way that i implement assessment of convergence, go to http://psych.colorado.edu/~carey/OpenMxGUI/index.php
and download my test GUI for openmx. you can just run the test model and examine the window that assesses convergence. but be forewarned--the R package tcltk (on which this GUI depends) has evaporated from CRAN and all mirror sites. if you do not have tcltk installed, you cannot use the windowing interface.

greg

carey's picture
Joined: 10/19/2009

not certain what you mean by "zero-gradient", but i also check whether a gradient element is exactly 0. this can happen naturally because of numerical rounding, but it is a great diagnostic for an unidentified parameter.
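A minimal sketch of the exact-zero check, in plain Python, assuming the gradient is obtained by central differences. The toy objective and the function names are hypothetical. A parameter that never enters the objective leaves it unchanged under perturbation, so its numerical gradient element is exactly 0.0 rather than merely small.

```python
def numerical_gradient(objective, theta, step=1e-6):
    """Central-difference gradient of objective at theta (list of floats)."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += step
        down[i] -= step
        grad.append((objective(up) - objective(down)) / (2.0 * step))
    return grad

def exactly_zero_elements(gradient):
    """Indices of gradient elements that are exactly zero."""
    return [i for i, g in enumerate(gradient) if g == 0.0]

def toy_objective(theta):
    # The third parameter never enters, so it is not identified.
    return (theta[0] - 1.0) ** 2 + (theta[1] - 2.0) ** 2

grad = numerical_gradient(toy_objective, [0.5, 1.0, 3.0])
print(exactly_zero_elements(grad))  # flags the third parameter (index 2)
```

Since an exact zero can also arise from rounding, a flagged index is a prompt to inspect that parameter rather than proof that it is unidentified.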