potential optimization problem

14 replies
carey
Joined: 10/19/2009

There MAY be a problem with the optimization using some likelihood functions in OpenMx. Specifically, the gradient and the standard errors from the output slots in an MxModel are congruent with minimizing -log(L). The calculatedHessian, however, is congruent with minimizing -2log(L). The attached R code provides an illustration.

The potential problem involves which function (-log(L) or -2log(L)) NPSOL minimizes. If it is -log(L), then everything is fine with respect to optimization, but OpenMx is incorrectly calculating the Hessian. If it is -2log(L), however, things MAY get gnarly. The estimated gradient elements will be twice as big as they would be for -log(L): i.e., d(-2log(L))/dX = 2 d(-log(L))/dX. One of the convergence criteria involves the norm of the gradient. If g is the gradient, then t(g) %*% g (the squared norm) will be FOUR times its -log(L) value when the function minimized is -2log(L), and the norm itself will be twice as large. This could lead to convergence problems, particularly with ill-conditioned problems.
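To make the scaling argument concrete, here is a minimal Python sketch (not the attached R code) that compares finite-difference gradients of -log(L) and -2log(L) for a hypothetical i.i.d. normal model. The model, the made-up data, and the helper names are illustrative assumptions only.

```python
import math

DATA = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7]  # made-up sample for illustration

def neg_log_lik(theta):
    """-log(L) for an i.i.d. normal model with theta = (mu, sigma)."""
    mu, sigma = theta
    n = len(DATA)
    ss = sum((x - mu) ** 2 for x in DATA)
    return n * math.log(sigma) + 0.5 * n * math.log(2 * math.pi) + ss / (2 * sigma ** 2)

def neg_2_log_lik(theta):
    """-2log(L): the same objective multiplied by 2."""
    return 2.0 * neg_log_lik(theta)

def gradient(f, theta, h=1e-6):
    """Central finite-difference gradient of f at theta."""
    g = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        g.append((f(up) - f(down)) / (2 * h))
    return g

def squared_norm(g):
    """t(g) %*% g in R notation, i.e. the squared Euclidean norm."""
    return sum(gi * gi for gi in g)

theta = (4.0, 1.5)  # an arbitrary point away from the optimum
g1 = gradient(neg_log_lik, theta)
g2 = gradient(neg_2_log_lik, theta)

ratio_grad = [b / a for a, b in zip(g1, g2)]          # each element doubles
ratio_sq_norm = squared_norm(g2) / squared_norm(g1)   # t(g) %*% g quadruples
ratio_norm = math.sqrt(ratio_sq_norm)                 # the norm itself doubles
print(ratio_grad, ratio_sq_norm, ratio_norm)
```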

Best,
Greg

tbrick
Joined: 07/31/2009

I apologize for missing this when it came around the first time.

With the standard objective functions (FIML, ML, and RAM), we minimize -2log(L). The Hessian reflects this calculation, and the standard error calculation takes the factor of 2 into account. As a result, the standard errors line up with what you expect, and the Hessian is congruent with a calculation of -2log(L).
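A minimal Python sketch of that bookkeeping, assuming a hypothetical i.i.d. normal model with made-up data: if H is the Hessian of -2log(L), the standard errors are sqrt(diag(2 * H^-1)), which matches sqrt(diag(H^-1)) computed from the Hessian of -log(L). This illustrates the relationship only; it is not OpenMx's internal code, and the helper names are hypothetical.

```python
import math

DATA = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7]  # made-up sample for illustration

def neg_log_lik(theta):
    """-log(L) for an i.i.d. normal model with theta = (mu, sigma)."""
    mu, sigma = theta
    n = len(DATA)
    ss = sum((x - mu) ** 2 for x in DATA)
    return n * math.log(sigma) + 0.5 * n * math.log(2 * math.pi) + ss / (2 * sigma ** 2)

def neg_2_log_lik(theta):
    return 2.0 * neg_log_lik(theta)

def hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of f at theta."""
    k = len(theta)
    H = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            def shifted(di, dj):
                t = list(theta)
                t[i] += di
                t[j] += dj
                return f(t)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

def inverse_2x2(H):
    (a, b), (c, d) = H
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Maximum likelihood estimates for the normal model.
n = len(DATA)
mu_hat = sum(DATA) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in DATA) / n)
theta_hat = (mu_hat, sigma_hat)

inv_a = inverse_2x2(hessian(neg_log_lik, theta_hat))
inv_b = inverse_2x2(hessian(neg_2_log_lik, theta_hat))

se_from_minus_log_l = [math.sqrt(inv_a[i][i]) for i in range(2)]
se_from_minus_2log_l = [math.sqrt(2 * inv_b[i][i]) for i in range(2)]  # note the factor 2
analytic_se = [sigma_hat / math.sqrt(n), sigma_hat / math.sqrt(2 * n)]
print(se_from_minus_log_l, se_from_minus_2log_l, analytic_se)
```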

I think what you're arguing here is that if convergence is based on the gradient norm being smaller than some absolute epsilon, then using -2log(L) will make the convergence check more conservative than using -log(L). More precisely, to maintain the same convergence requirements for -log(L) and -2log(L), the convergence criterion has to be adjusted accordingly.
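The point can be illustrated with two hypothetical convergence tests (neither is claimed to be NPSOL's actual criterion): an absolute test ||g|| <= eps and a test scaled by (1 + |F|). The gradient and function values below are made up.

```python
import math

def norm(g):
    return math.sqrt(sum(x * x for x in g))

def absolute_test(g, eps):
    """Hypothetical convergence check: ||g|| <= eps."""
    return norm(g) <= eps

def relative_test(g, f_value, eps):
    """Hypothetical scaled check: ||g|| <= eps * (1 + |F|)."""
    return norm(g) <= eps * (1.0 + abs(f_value))

# Made-up gradient and function value of -log(L) at a near-optimal point.
g_half = [3e-7, -4e-7]            # ||g|| = 5e-7
f_half = 1250.0
# The same point on the -2log(L) scale: everything doubles.
g_full = [2 * x for x in g_half]  # ||g|| = 1e-6
f_full = 2 * f_half

eps = 8e-7
print(absolute_test(g_half, eps), absolute_test(g_full, eps))
print(relative_test(g_half, f_half, eps), relative_test(g_full, f_full, eps))
```

At the same point, the absolute test accepts the -log(L) scale but rejects the -2log(L) scale, while the scaled test treats the two nearly alike once |F| is large.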

The real question, then, is whether or not we're choosing an appropriate optimality tolerance given that we're calculating the -2log(L) instead of the -log(L). This is one of the optimizer options we don't presently expose to the user, but we could if you think people would want to see and manipulate it.

So I guess I have two questions. First, is there a problem with the existing optimality tolerance? Second, would it be helpful to expose the parameters of the tolerance calculation to the user so that it can be adjusted if needed? In my experience, this is one of the parameters that's rarely adjusted, but that doesn't mean it wouldn't be useful.

carey
Joined: 10/19/2009

thanks for the response, tim

you are correct in surmising that minimizing -2logL is more conservative than minimizing -logL, but i am not certain that adjusting the optimality tolerance is the best solution. equations 8.1 and 8.2a in the NPSOL documentation show that the optimality tolerance influences both the convergence criterion in 8.1 (the difference in parameter values between one iteration and the next) and the "optimality condition" in 8.2a, which is a function of the norm of the gradient. one might simultaneously vary the function precision parameter, given that -2logL can take on large values when sample sizes are large.

with most problems in the behavioral sciences, there will be no substantive difference. some genetic problems, however, can have very large sample sizes and be poorly conditioned. here, i agree with mike that all options should be made available to the user.

a more radical suggestion is to permit the user to specify the function to minimize. many of us have found that minimizing -logL/-logL0 where -logL0 is minus the log likelihood of the initial set of parameter estimates can help solve some gnarly problems.
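A minimal Python sketch of the -logL/-logL0 rescaling, assuming a hypothetical one-parameter normal model (sigma fixed) with made-up data: the rescaled objective equals 1 at the starting value and has the same minimizer, provided -logL0 is positive. A simple golden-section search stands in for the real optimizer; all names here are illustrative.

```python
import math

DATA = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7]  # made-up sample
SIGMA = 1.0  # hypothetical: sigma held fixed so mu is the only parameter

def neg_log_lik(mu):
    n = len(DATA)
    ss = sum((x - mu) ** 2 for x in DATA)
    return n * math.log(SIGMA) + 0.5 * n * math.log(2 * math.pi) + ss / (2 * SIGMA ** 2)

def make_rescaled(f, start):
    """Return theta -> f(theta) / f(start); order-preserving only if f(start) > 0."""
    f0 = f(start)
    if f0 <= 0:
        raise ValueError("rescaling by a non-positive f(start) would flip or break the objective")
    return lambda theta: f(theta) / f0

def golden_section(f, lo, hi, tol=1e-10):
    """Minimize a unimodal 1-D function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2

start = 0.0
rescaled = make_rescaled(neg_log_lik, start)
print(rescaled(start))  # 1.0 at the starting value
print(golden_section(neg_log_lik, -10, 10), golden_section(rescaled, -10, 10))
```

Dividing by a constant also divides every gradient element by -logL0, which is how the rescaling interacts with any absolute tolerance on the gradient. For continuous data -logL can be negative, in which case dividing by it would turn the minimization into a maximization.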

neale
Joined: 07/31/2009

I think we have this covered; mxAlgebraObjective() permits the user to specify the function to minimize.

carey
Joined: 10/19/2009

many thanks, mike & ryne
(1) always wondered what mxAlgebraObjective() was all about. suggest that you make that more specific in the documentation.
(2) just ran a problem with 90 parms and rescaling the function so that it is close to 1 reduced the time by one third.
greg

neale
Joined: 07/31/2009

Interesting. For the record, this is what Classic Mx always does - rescales the function value at the starting values to 1. That we don't do this with OpenMx may explain some differences in optimization performance. This could affect both time to get to the solution and its estimated location.

For even greater flexibility in specifying functions, mxRowObjective() is also available, but it needs a few helper functions to make it convenient to use when the data frame contains NAs.

carey
Joined: 10/19/2009

any documentation on mxRowObjective? the closest i find is mxRObjective, another feature whose purpose eludes me.

tbrick
Joined: 07/31/2009

The docs for mxRowObjective are available in the latest source revision through ?mxRowObjective in the R window. I'll check why there are no docs in 1.0.3, and if it's just an oversight, we'll add them for 1.0.4.

As to why the functions are there:
All three of the user-specified objective functions (mxAlgebraObjective, mxRObjective, and mxRowObjective) are there to cover the things that we haven't implemented directly in OpenMx, for example, the likelihood weighting you just did. Alternatively, if you wanted to use a different objective function entirely (say, weighted least squares), you could implement it as an algebra and proceed the same way.

We also wanted to make a rapid prototyping sequence available for folks who wanted to quickly implement and test new objective functions to expand OpenMx or to develop new techniques for the field.

mxAlgebraObjective and mxRowObjective are for new objective functions that can be expressed algebraically using the OpenMx operators, either for a moment matrix calculation (like ML) or a row-by-row calculation (like FIML). Again, as of 1.0.3, mxRowObjective is still awaiting the completion of a few more functions to handle missing data before it can be used for full information methods. mxRObjective is there for new functions that can't be easily expressed by the OpenMx operators.
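To illustrate what a row-by-row calculation involves, here is a minimal Python/NumPy sketch of a FIML-style -2logL under a multivariate normal model, in which each row uses only its observed (non-NaN) variables. It is an illustration of the technique under those assumptions, not the mxRowObjective interface or the missing-data filters mentioned above; the function names and data are hypothetical.

```python
import math
import numpy as np

def row_minus2ll(row, mean, cov):
    """-2 log-likelihood of one data row under a multivariate normal,
    using only the observed (non-NaN) entries of that row."""
    obs = ~np.isnan(row)
    if not obs.any():
        return 0.0  # an entirely missing row contributes nothing
    x = row[obs] - mean[obs]
    s = cov[np.ix_(obs, obs)]  # covariance of the observed variables only
    sign, logdet = np.linalg.slogdet(s)
    if sign <= 0:
        raise ValueError("covariance submatrix is not positive definite")
    quad = x @ np.linalg.solve(s, x)
    k = int(obs.sum())
    return k * math.log(2 * math.pi) + logdet + quad

def fiml_minus2ll(data, mean, cov):
    """Sum of the row-wise -2logL values: what a FIML-style fit minimizes."""
    return sum(row_minus2ll(r, mean, cov) for r in data)

# Made-up data with missing values, and fixed model-implied moments.
data = np.array([[1.0, 2.0],
                 [0.5, np.nan],
                 [np.nan, 1.5],
                 [1.2, 2.4]])
mean = np.array([1.0, 2.0])
cov = np.array([[1.0, 0.5],
                [0.5, 2.0]])
print(fiml_minus2ll(data, mean, cov))
```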

The hope is that methods developers will be able to use these to quickly build and test their own functions. The functions can then be implemented in C in the kernel to make them faster. We already have a few folks working on some methods now.

mspiegel
Joined: 07/31/2009

The documentation is not available for mxRowObjective in the OpenMx 1.0 series because this feature has been implemented but is not supported in the current release. There is only a single test case in the test suite for row objective functions. As we finish the missing data filters and add more test cases, the feature will be supported in the next major release of OpenMx.

mspiegel
Joined: 07/31/2009

Documentation for mxRowObjective is available in OpenMx pre-release r1527.

carey
Joined: 10/19/2009

many thanks, tim.
that was very informative.
if you are checking on documentation (or can send this to someone who does that), i've noticed that the Code demos link and the demos() link on http://openmx.psyc.virginia.edu/docs/OpenMx/latest/_static/Rdoc/00Index.... are broken.
also, the documentation of an MxModel object (http://openmx.psyc.virginia.edu/docs/OpenMx/latest/_static/Rdoc/MxModel-...) does not include the "intervals" slot.

Ryne
Joined: 07/31/2009

You're right, but this could be another application of weighted likelihoods. Weight the -2LL objective by .5 and you're optimizing -logL; weight it by 1/(-2LL(initialparam)) and you're optimizing -logL/-logL0, which is Greg's recommendation. Standard errors will be off by a factor of weight^-.5, but we can either rescale them by weight^.5 or just suppress standard errors in the case of weighted likelihood functions.
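A minimal Python sketch of the standard-error adjustment, assuming the weighted objective w * (-2LL) is treated as a -2LL when standard errors are computed (SE = sqrt(2 / h_ii) for a diagonal Hessian). The Hessian is the analytic one for a hypothetical i.i.d. normal model at its MLE; n and sigma_hat are made-up values.

```python
import math

def se_from_diag_hessian(hdiag):
    """Standard errors when the objective is treated as -2logL and its Hessian
    is diagonal: SE_i = sqrt(2 / h_ii), i.e. sqrt(diag(2 * H^-1))."""
    return [math.sqrt(2.0 / h) for h in hdiag]

n, sigma_hat = 6, 0.7767  # hypothetical sample size and MLE of sigma

# Analytic Hessian of -2logL for an i.i.d. normal model (mu, sigma) at the MLE.
# The mu/sigma cross term is zero at the MLE, so only the diagonal is kept.
h_m2ll = [2 * n / sigma_hat ** 2, 4 * n / sigma_hat ** 2]

w = 0.5  # weight on -2LL; 0.5 means the optimizer is effectively minimizing -logL
h_weighted = [w * h for h in h_m2ll]  # Hessian of the weighted objective

true_se = se_from_diag_hessian(h_m2ll)
naive_se = se_from_diag_hessian(h_weighted)  # weighted objective treated as -2LL
corrected_se = [s * math.sqrt(w) for s in naive_se]  # undo the weight^-.5 distortion

print(true_se, naive_se, corrected_se)
```

For Greg's rescaling, w would be 1/(-2LL(initialparam)), a constant fixed at the starting values, so the same correction applies.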

Also, Tim and I discovered yesterday that the algebra objective treats the user-specified objective function as a -2LL when calculating standard errors. We'll discuss this in the meeting, but we might want to find a way to disable that while maintaining as much backward compatibility as possible.

neale
Joined: 07/31/2009

Overall, it's a bit difficult to predict which optimization parameters users might, at some point in time, wish to fiddle with. Therefore, I recommend exposing them all.

carey
Joined: 10/19/2009

oops! sorry about that.

Attachment: checkDerivatives.R (2.97 KB)
Ryne
Joined: 07/31/2009

I apologize for the delay in responding to this. However, there's no code attached. Could you put it up again?