Tue, 12/08/2009 - 11:03 — neale

### Jump to:

Project: | OpenMx |

Component: | Code |

Category: | feature request |

Priority: | normal |

Assigned: | Unassigned |

Status: | active |

Description

Per some recent dialog with Greg Carey, I suggest that we alter the OpenMx Hessian estimation by default. Rather than use the estimates from NPSOL to compute the standard errors, we should use the numDeriv package.

The attached R script, when run, shows that NPSOL's Hessian* is incorrect when the model is underidentified (more free parameters than statistics). As a result a user could be misled into thinking that a model is identified when it is not. A better picture emerges when numDeriv is used.

Exactly how to implement/integrate numDeriv() is not clear to me. The example sets up the fit function directly in R code; we would want to use the built-in code for this purpose, so as to take advantage of computational efficiency. Basically, one supplies the function to evaluate, and a vector of parameter estimates at which the Hessian is required.

I'm not sure that it will work with constraints, that should be investigated.

*The Hessian is the matrix of second partial derivatives of the fit function with respect to the parameters, and is the inverse of the covariance matrix of the parameters.

Attachment | Size |
---|---|

omxTest2.R | 2.34 KB |

- Login or register to post comments
- Printer-friendly version
- Send to friend

## Comments

## #1

Mike,

(a) Is there any way, to flag an underidentified model as such? Perhaps using some heruristic? Will not work in all cases, but may catch most culprits!?

(b) I thought NPSOL Hessian will not be PD if the model is underidentified. Why is that not the case?

paras

## #2

Related to Paras' second point, why would we care about standard errors for an underidentified model? By definition, an underidentified model doesn't have a unique solution, so I'm not sure I understand the utility of the confidence region of any one possible set of parameter values.

## #3

the "hessian" from NPSOL is actually not a true hessian. it is an approximation to a hessian matrix that is deliberately constructed to be positive definite (technically, a Broydon-Fletcher-Goldfarb-Shanno, ? sp, or BFGS update). hence, in unidentified models, save for numerical error, it will be positive definite.

there are two criteria for testing whether a solution is at a local minimum: (1) all gradient elements are close to 0; (2) the hessian--either analytic or numerically estimated--is positive definite. one cannot trust the hessian from NPSOL to examine condition (2). i suspect that it would suffice when a model is truly identified, but it is definitely inappropriate in at least some cases of unidentified models.

to assess convergence, i use a routine that examines OpenMx output for (a) elements in which the gradient is exactly 0 (useful to isolate a parameter that does not contribute to the function value); (2) the maximum absolute value of the gradient elements; (3) the number of gradient elements with an absolute value greater than .001; and (4), hopefully if it is possible to get a numerical estimate of the hessian, a check on whether it is positive definite.

greg

## #4

True, we don't really care about errors for an underidentified model. They would best be returned as NA's. In its present form, however, OpenMx will return incorrect errors in such an instance. For example, adding a second factor to the homepage model:

require(OpenMx)

data(demoOneFactor)

manifests <- names(demoOneFactor)

latents <- c("G","G2")

factorModel <- mxModel("One Factor", type="RAM",

manifestVars = manifests,

latentVars = latents,

mxPath(from=latents, to=manifests, all=TRUE),

mxPath(from=manifests, arrows=2),

mxPath(from=latents, arrows=2,

free=F, values=1.0),

mxData(cov(demoOneFactor), type="cov",

numObs=500))

summary(mxRun(factorModel))

gives us:

name matrix row col Estimate Std.Error A x1 G 0.38992864 0.226214872 A x2 G 0.49334864 0.278522409 A x3 G 0.56885880 0.338047426 A x4 G 0.68604328 0.381755391 A x5 G 1.01690047 2.978002493 A x1 G2 -0.07950142 1.112480039 A x2 G2 -0.09795222 1.408279928 A x3 G2 -0.11894525 1.622112067 A x4 G2 -0.13411732 1.958912336 A x5 G2 1.01822349 2.548480441 S x1 x1 0.04018047 0.002068288 S x2 x2 0.03870883 0.002239603 S x3 x3 0.03628890 0.002446600 S x4 x4 0.04463908 0.003227481 S x5 x5 -1.40056041 0.869262743

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

Clearly the second factor is struggling, and different starting values might yield a different solution. But it would be better if underidentification were automatically detected and flagged in the Summary, rather than giving the user a false sense of security.