Hessian

10 replies [Last post]
HanOud's picture
Offline
Joined: 12/20/2009

What is called the calculated Hessian in OpenMx is that the same as e.g. in LISREL the information matrix (expected Hessian) on which the standard errors are based?

Ryne's picture
Offline
Joined: 07/31/2009
Yes. The calculated Hessian

Yes. The calculated Hessian is the matrix of second derivatives of the likelihood function with respect to the parameters that is inverted and multiplied by 2 to create the parameter covariance matrix.

There is also a slot for the estimated Hessian in OpenMx models. The calculated Hessian is created numerically, though we're working on expanding analytic gradients and hessians for some models. This Hessian is created specifically for standard errors and model evaluation. The estimated Hessian is the approximate Hessian used in the optimization routine, and is iteratively updated throughout the quasi-Newtonian procedure. This Hessian is constrained to be positive definite at all times. As this constraint could bias standard errors, we use the calculated Hessian instead.

HanOud's picture
Offline
Joined: 12/20/2009
Hessian

OK, Ryne. But do I understand well that this calculated Hessian or information matrix (negative of the expectation of the second derivative) is different from the true or observed Hessian as it is derived for SEM, for example, by Neudecker & Satorra (Statistics and Probability Letters, 11, 1991,pp. 57-61)?

Ryne's picture
Offline
Joined: 07/31/2009
I'm looking for the Neudecker

I'm looking for the Neudecker & Satorra paper (my university only has access to 'Statistics and Probability Letters' from '95 to the present), but I'll temporarily assume that this paper refers to and presents the analytic gradient and Hessian of the LISREL model.

The hessian we currently use is not derived, as the technique we're using is built for more general applications than the LISREL SEM model. What we're referring to as the calculated Hessian is in fact a numerical estimation created by sampling the parameter space around the converged parameter values. While something of a misnomer, the word 'calculated' is meant to reflect a difference between this method and the estimation used by the optimizer, which as sufficient constraints to affect its utility as an information matrix.

Alpha testing for our soon-to-be-implemented analytic gradients and Hessians for RAM and LISREL models shows that this numerical method is equivalent to analytic procedures, and thus the calculated hessian we provide should be equivalent to the well-known derivatives of the LISREL model. Once implemented, these analytic methods will likely provide some degree of speed-up, if only by replacing the expensive numerical procedure after optimization has finished. As OpenMx is built to be as flexible as possible, we try to provide methods that do not depend on a particular type of model specification. The numerically-estimated "calculated" Hessian will be equally valid whether you use a defined method like LISREL or RAM or our objectives for arbitrary user-defined covariance algebras that don't have closed-form derivatives.

I'll check back in when I find that paper.

ryne

HanOud's picture
Offline
Joined: 12/20/2009
Hessian

Thanks, Ryne, for helping me out of confusion and all detailed information you provide. Because you refer to the well-known derivatives of the LISREL model, I guess it is better to call it information matrix (see the LISREL manual: probability limit of the negative of the logarithm of a likelihood function) and to reserve the word "Hessian" to the observed Hessian as in Neudecker and Satorra? As you will see, the latter one is also calculated but different from the expected Hessian. Curious what you think about it.
Han

Ryne's picture
Offline
Joined: 07/31/2009
I think this is fundamentally

I think this is fundamentally a question of terminology. For those unfamiliar, quasi-Newtonian optimization methods estimate a vector of first derivatives (called a gradient) of the likelihood function with respect to the free parameters. The Hessian matrix of second derivatives of the likelihood function are estimated iteratively, usually starting optimization with an identity matrix standing in for the Hessian and changing that estimated Hessian at every iteration using information gleaned from the gradient.

The Neudecker and Satorra paper refers to an expected Hessian resulting from a quasi-Newtonian optimization. This expected Hessian matches the definition above, and is used to determine step size in the optimization. This Hessian is used in OpenMx, and is returned to the user in the output slot and called "estimatedHessian".

OpenMx also returns what we're calling the "calculatedHessian", which is not a terribly good name. While Neudecker and Satorra's math is correct, it is specific to the LISREL model. OpenMx can estimate lots of arbitrary models that don't necessarily fit into the LISREL or RAM specification, and thus we cannot depend on a pre-existing derivation for all models. For that reason, we've implemented a calculated Hessian that generates the second derivative of the likelihood function using the same methods that are used for the first derivative: we sample the parameter space in every direction away from the final converged value and fit a curve.

This method is equivalent to the derived Hessian referenced above, with the added benefit of being general beyond the LISREL model and the cost of being slower. This method does not have the problems referenced in the Neudecker and Satorra paper, as this Hessian is not constrained to positive definiteness nor is it simply a by-product of a quasi-Newtonian estimation. Despite the fact that our "calculatedHessian" method requires iterations rather than calculus, it is not what Neudecker and Satorra call an estimated Hessian.

We're currently working to implement what Neudecker and Satorra call observed gradients and Hessians, though we're internally referring to them as analytic due to their derived rather than iterative nature. However, there is no reason to have both an analytic and calculated Hessian in the same model, so users will likely have to choose between them at runtime and have whichever one is chosen occupy the 'calculationHessian' location.

Hope this wall-of-text helps, and I hope that someone with memory of implementing our 'calculatedHessian' object can remind me how far we go in each direction when numerically calculating the observed Hessian.

ryne

prast's picture
Offline
Joined: 08/18/2010
Extract SE from the information matrix

I stumbled upon this thread because I tried to estimate the standard errors from, in this case, the calculated Hessian.
Ryne, you said that the calculated Hessian is equivalent to the information matrix - at least from the standpoint of the calculation of the standard errors. So if this is the case, then I should be able to take the square root of the diagonal elements of the inverted information matrix to get to the standard errors? Is this a naive assumption? I get very different estimates if I do e.g.

calcHess <- test@output$calculatedHessian
sqrt(diag(calcHess))

Compared to the standard errors OpneMx summary(test) writes out.

What did I miss?

Philippe

Ryne's picture
Offline
Joined: 07/31/2009
It looks like you missed the

It looks like you missed the inversion. Try:

calcInf <- solve(test@output$calculatedHessian)
sqrt(diag(calcInf))

prast's picture
Offline
Joined: 08/18/2010
uh sorry, that was actually

uh sorry, that was actually what I was doing.
In fact I did:

calcHess <- test@output$calculatedHessian
sqrt(diag(solve(calcHess)))

but I get different results.

From summary(test):
free parameters:
name matrix row col Estimate Std.Error lbound ubound
1 resY1 S Y1 Y1 22.362416 3.569192 0
2 resY2 S Y2 Y2 19.873494 2.635527 0
3 resY3 S Y3 Y3 18.516428 2.354879 0
4 resY4 S Y4 Y4 16.680493 2.289132 0
5 resY5 S Y5 Y5 22.864692 3.461194 0
....

> sqrt(diag(solve(calcHess)))
resY1 resY2 resY3 resY4 resY5 resXYcov1 resX1 resXYcov2 resX2 resXYcov3 resX3
2.523800 1.863599 1.665151 1.618661 2.447434 2.011534 3.185422 1.449823 2.191473 1.296295 2.040519 ....

Ryne's picture
Offline
Joined: 07/31/2009
I forgot the 2. It's

I forgot the 2. It's sqrt(2*diag(solve(calcHess))). If you multiply the first five elements of your returned sqrt(...), you get:

3.569192 2.635527 2.354879 2.289132 3.461194

which looks right. Sorry for the omission.

prast's picture
Offline
Joined: 08/18/2010
Thanks! sqrt(2*diag(solve(cal

Thanks!

sqrt(2*diag(solve(calcHess))) does the trick, the results are identical