Tue, 06/14/2011 - 14:05

Hi all,

Based on a discussion in another thread (http://openmx.psyc.virginia.edu/thread/505), I'm soliciting feature request details for a helper function that builds saturated and independence models.

Saturated and independence models are commonly used comparison models for many SEM fit indices. OpenMx will calculate the likelihoods for these models only in cases where they can be added with trivial impact on estimation time and without making assumptions regarding the user's model. As of OpenMx 1.1, we estimate these models only when covariance matrices are used as input in the ML and RAM objectives, because the final fitted values for each of these models can be determined directly from the data without invoking the estimation procedure. For other types of models, specifically raw data methods, these models must be manually specified.
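To make the "determined directly from the data" point concrete, here is a minimal base-R sketch for complete raw data under multivariate normality. The function name saturatedM2LL is made up for illustration, and this is the textbook closed form, not necessarily what OpenMx computes internally for covariance input.

```r
# Hypothetical helper (not part of OpenMx): closed-form saturated fit
# for complete, continuous data under multivariate normality.
saturatedM2LL <- function(data) {
  data <- as.matrix(data)
  stopifnot(!anyNA(data))            # complete data only
  n <- nrow(data)
  p <- ncol(data)
  mu <- colMeans(data)               # ML means
  S  <- cov(data) * (n - 1) / n      # ML covariance (denominator N, not N-1)
  logDet <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
  # -2 log L at the ML estimates: n * (p*log(2*pi) + log|S| + p)
  m2ll <- n * (p * log(2 * pi) + logDet + p)
  list(means = mu, cov = S, minus2LL = m2ll)
}
```

No optimizer is involved: the means and the N-denominator covariance matrix are the ML estimates, so the -2 log likelihood follows directly. With missing data this shortcut no longer applies and the model has to be estimated.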

I'm proposing a function that takes a dataset (and customization options) as input and returns a model as output that can be run and used as a saturated model for model comparison. It may look something like so:

imxSaturatedHelper(data, useVars=??, labelOption=??, ..., name=??)

The resulting model would contain free parameters for all means, variances and covariances between all included variables.

Questions for the userbase:
-what options would you like to specify?
-how should the function handle definition variables?
-should Independence and Saturated models be separate functions with similar options, or one function with a 'type' option?
-what am I forgetting?

Hi all,

Is there an update on the saturated/independence models helper function?

Sorry, not yet. Some other features have taken priority, and things like analytic hessians and other optimization improvements for the next major release are likely to affect the way that we would create this helper function. Relatedly, independence models may be so fast that we add them to the backend in a future release, further changing how we'd do this function. Thanks for checking in!

I believe that the independence model can be evaluated without optimization. The sample variances (with N rather than N-1 as denominator) for each variable, and the sample means should be the ML estimates also, regardless of the missing data mechanism. This could in principle be handled in the front end. Same goes for ordinal variables - thresholds are estimated from the marginals of each variable, and the covariance matrix is just an identity matrix.
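As a rough sketch of the continuous case described above, the following base-R function computes the per-variable estimates and the resulting -2 log likelihood without any optimization. The name independenceM2LL is hypothetical, it assumes normality, it uses whatever cases are observed for each variable, and it does not cover ordinal variables.

```r
# Hypothetical helper (not part of OpenMx): closed-form independence model
# for continuous variables, using the observed cases of each variable.
independenceM2LL <- function(data) {
  data <- as.data.frame(data)
  perVar <- sapply(data, function(x) {
    x  <- x[!is.na(x)]               # drop missing values for this variable
    n  <- length(x)
    mu <- mean(x)                    # sample mean
    s2 <- sum((x - mu)^2) / n        # variance with N as denominator
    # univariate normal -2 log L at these estimates
    c(mean = mu, variance = s2, minus2LL = n * (log(2 * pi) + log(s2) + 1))
  })
  list(estimates = t(perVar[c("mean", "variance"), , drop = FALSE]),
       minus2LL  = sum(perVar["minus2LL", ]))
}
```

Because the variables are treated as uncorrelated, the total -2 log likelihood is just the sum of the univariate pieces.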

I think some 'let's get in the right ballpark' approach for FIML, using perhaps unweighted least squares as a robust initial fit function, would be a good idea. The advantage of ULS is that it is really hard to break it - no worries about non-invertible matrices etc. Ordinal is a bit harder, but ULS on the Pearson correlations may be sufficient.

Definition variables are more difficult, especially if they are continuously distributed. In principle, all means, variances and covariances could be arbitrarily complex functions of one or more definition variables, so the space of "saturated models" is pretty much infinite. I propose therefore to consider only "First Order Saturated (FOS)", in which every mean, variance and covariance is modeled as a + b*Def, where b is a (row) vector of parameters to be estimated, and Def is a (column) vector of the definition variables. I don't expect that estimating this model would be very easy, and I expect that higher order regressions on, e.g., vech(Def %&% t(Def)) - all squares and cross products of the definition variables - would be even harder. So how about just the FOS for now?
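To illustrate what a + b*Def means row by row, here is a small base-R sketch for a single observed variable. Everything in it (the function name fosM2LL and its arguments) is invented for illustration and is not an OpenMx function; a real FOS model would also need the covariances.

```r
# Hypothetical illustration of the "first order saturated" idea for one
# observed variable y: mean and variance are each a + b*Def, row by row.
fosM2LL <- function(y, Def, aMean, bMean, aVar, bVar) {
  Def <- as.matrix(Def)                       # n rows, k definition variables
  mu  <- aMean + as.vector(Def %*% bMean)     # row-specific expected mean
  s2  <- aVar  + as.vector(Def %*% bVar)      # row-specific expected variance
  if (any(s2 <= 0)) return(Inf)               # linear variances can go negative
  -2 * sum(dnorm(y, mean = mu, sd = sqrt(s2), log = TRUE))
}
```

The early return for non-positive variances hints at one reason this model may be hard to estimate: a linear function of Def is not guaranteed to stay positive.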

Would be nice if users could easily get TLI/RMSEA etc. out of FIML RAM models. I wonder if there's some part-finished code we can work on to get this helper done?

I'm guessing that in most cases a very fast strategy would be something like

Would be nice if summary() could gain a satModel=NA parameter, and if that is not NA, then get what it needs from there to fill in the fit indices and chi^2 of the summarised model.

This is something which I am sure many people need.

I assume it would handle multiple group models?

Why is it an imx function not omx?

If it is going to fit two sorts of model then it needs a more neutral name as I would not expect imxSaturatedHelper to help me with an independence model. I would not expect imxGenerateDog to give me a Cat.

The first thing I would do when it is available would be to write a wrapper which called it, called mxRun, extracted the relevant information, ... Would you consider providing a skeleton of that in the documentation?

I think perhaps omxSaturatedModel() is the correct name (as opposed to imx and helper?)

I kind of favour not proliferating functions, so would support type="saturated|independence" as a parameter. If having saturated in the name is a problem, might call it "omxBaseModel", but I think saturated will enhance discoverability and memorability.

Would be nice if it was smart enough to run a cov model to set start values for the ML fit.
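One cheap way to get such start values, sketched in base R, is to take the observed means and pairwise-complete covariances directly from the raw data. The function name satStartValues is hypothetical, and this is only one possible approach.

```r
# Hypothetical helper (not part of OpenMx): rough start values for a
# saturated raw-data model, taken from the observed data.
satStartValues <- function(data) {
  data <- as.matrix(data)
  list(
    means = colMeans(data, na.rm = TRUE),             # per-variable observed means
    cov   = cov(data, use = "pairwise.complete.obs")  # pairwise-complete covariances
  )
}
```

A pairwise-complete covariance matrix is not guaranteed to be positive definite, so it would be worth checking before using it as a starting point.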

Not sure how to handle definition variables...

The helper might look like

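genEpi_Run_Saturated <- function(data, useVars, labelOption, name) {
  # Build the saturated model with the proposed helper (does not exist yet)
  model <- omxSaturatedHelper(data, useVars, labelOption, name)
  # Fit the saturated model
  fit <- mxRun(model)
  # Return its -2 log likelihood for use in fit indices
  return(summary(fit)$Minus2LogLikelihood)
}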

Are there any updates on this?

The development trunk now contains a function which computes the saturated solution (but not independence).

There's a growing library of helpers ("user mx" or "umx" functions) here. It is not officially supported, but it now contains a function which does both. This will likely become redundant with future versions.

https://github.com/tbates/umx/blob/master/umx.lib.R

look for the function