Mon, 10/04/2010 - 11:17

Is it possible to obtain fit statistics such as CFI, RMSEA, etc. with FIML? The model I'm currently running does not output the desired fit indices (see below). After reading some of the threads, it appears that I would need to use FIML estimation to fit a saturated model, and then compare the predicted model to the saturated model. Is that correct? Could someone explain a little more how to do that? Sorry, I'm new to OpenMx. Thanks so much for your help, and I love the program so far!

Here's the output from my model:

observed statistics: 276

estimated parameters: 23

degrees of freedom: 253

-2 log likelihood: 668.405

saturated -2 log likelihood: NA

number of observations: 67

chi-square: NA

p: NA

AIC (Mx): 162.4051

BIC (Mx): -197.6911

adjusted BIC:

RMSEA: NA

Sorry for posting this in the wrong place. I couldn't figure out how to open a new topic. I am trying to adapt the sample script on FIML estimation for missing data. In the script below how do I change the 2 in log2pi %*% 2 in "firstHalfCalc" expression to represent the number of variables that has nonmissing data for that observation? In the example shown 2 worked as the right number for all observations for likelihood calculation since there was no missing data.

Sorry if this is a silly question. I am quite new to OpenMx.

#row objective specification

bivCorModel <- mxModel(model=bivCorModel,

mxMatrix("Full", 1, 1, values = log(2*pi), name = "log2pi"),

mxAlgebra(

expression=log2pi %*% 2 + log(det(filteredExpCov)),

name ="firstHalfCalc",

),

mxAlgebra(

expression=(filteredDataRow - filteredExpMean) %&% solve(filteredExpCov) ,

name = "secondHalfCalc",

),

mxAlgebra(

expression=(firstHalfCalc + secondHalfCalc),

name="rowAlgebra",

),

mxAlgebra(

expression=sum(rowResults),

name = "reduceAlgebra",

),

mxRowObjective(

rowAlgebra='rowAlgebra',

reduceAlgebra='reduceAlgebra',

dimnames=c('X','Y'),

)

)

It is supposed to be possible to obtain fit statistics with the summary() command. This involves running a saturated model (e.g., saturatedModel) so that it can be compared to a test model (e.g., testModel). Then use summary:

summary(testModel, SaturatedLikelihood=saturatedModel)

or

summary(testModel, SaturatedLikelihood=132) where 132 is the numeric value of the -2 log likelihood from the saturated model.

However, when I used this for a FIML model, the summary() command returned a chi-square difference test, but not adjusted BIC or RMSEA (AIC and BIC values were from the testModel). It may be that RMSEA is not computed for models with missing data. Does anyone know?

~ Angela

Hmm.

Adjusted BIC shouldn't depend on the saturated model. Looks like we just plain forgot to include that one.

I don't know if not having RMSEA computed under FIML was a conscious choice. RMSEA technically isn't defined with missing data, though that doesn't stop people from using it.

Adjusted BIC has not been implemented. I think we were looking for volunteers to implement more fit statistics. This would be a good project for someone to start their involvement with the OpenMx development. I think RMSEA is calculated as:

`sqrt((chi / DoF - 1) / retval[['numObs']])`

If the value inside of the

`sqrt()`

is NaN or negative, then RMSEA is assigned a value of NA. Could that be what's happening for your model?That looks right, with the caveat that if (chi/Dof-1) is less than zero, RMSEA "should" return zero rather than NA. You'll sometimes see the formula stated as one of the following, invoking some type of "maximum of" function to keep chi/df-1 (or equivalently, chi-df) non-negative.

sqrt(max(chi/df-1, 0)/n)

sqrt(max(chi-df, 0)/df/n)

Yes, you are correct regarding saturated model estimation. With missing data under FIML, we have to estimate the saturated model, where we could simply calculate it without missing data. This estimation can get time-consuming, so we don't do it by default.

The process for you:

-run your model as usual,

-create a saturated model, allowing all variances, covariances and means to be freely estimated

-compare the two

The mxCompare function allows for the use of likelihood ratio tests and comparisons on AIC and BIC. RMSEA, CFI, etc must be calculated manually.

Is there anything like a generic fit index helper function? All the fit indices I have seen rely on: chi^2/N/DF from the model, saturated, or null models. Is that oversimplified? Would there be interest in such a function? If there is interest and I can rely on relatively consistent slots on the model object holding the likelihoods, DF, and N, I could write something and post it.

I appreciate that all fit indices may not make sense under all circumstances, particularly when using FIML. If the goal is to discourage use by requiring manual calculation, perhaps a succinct statement indicating this with support could be written?

I am trying to push for change, but I would be interested in some discussion. I have had two clients now come into our universities consulting center for help using OpenMx, and they expect fit indices to be computed. I am a little torn between computing them for them, or trying to explain why they may not make sense, to which I have gotten the response that Mplus, Amos (insert other SEM package), compute it and taht reviewers expect it.

If such a generic way to compute fit indices were added, it seems like it may make sense to be an option in either summary or mxCompare. For example:

set.seed(4)

cor(dat <- prcomp(matrix(rnorm(1000), 200))$x %*%

chol(matrix(c(1, rep(c(rep(.5, 5), 1), 4)), 5, 5,

dimnames = list(NULL, mvars <- paste("X", 1:5, sep = '')))))

dat[1:20, 1:2] <- NA

require(OpenMx)

mSat <- mxRun(mxModel(model = "Saturated Model",

type = "RAM", manifestVars = mvars,

mxData(dat, "raw"), mxPath(mvars, arrows = 2), mxPath("one", mvars),

mxPath(combn(mvars, 2)[1, ], combn(mvars, 2)[2, ])))

mFactor <- mxRun(mxModel(model = "One Factor Model",

type = "RAM", manifestVars = mvars, latentVars = "g",

mxData(dat, "raw"), mxPath("g", mvars), mxPath("one", mvars),

mxPath(c("g", mvars), arrows = 2, free = c(FALSE, rep(TRUE, 5)), values = 1)))

## both of these functions access necessary components

## if something like a NullLikelihood were added for comparative fit indices

## it seems like they could readily be adapted

summary(mFactor, SaturatedLikelihood = mSat)

mxCompare(mSat, mFactor)

I am also a little curious as to why the p differs between summary and mxCompare? Is summary not using deltaDF? findMethods("summary", classes = "MxModel") lead me to try SaturatedDoF, but this did not seem like a fruitful approach.

Apologies if I should have started a new thread---it seemed to fit in well with this thread so I left it here so people searching for information would find it all in one place. Thoughts and comments welcomed, and thanks to the developers for a wonderful, flexible matrix optimizer!

Thanks for the interest, and a huge thank you for "push[ing] for change" in the program! Apologizes in advance for the wall o' text that's coming. I'd first like to point people towards two relevant recent threads. The first thread discusses a variety of fit functions that could in theory be implemented, a decent proportion of which are actually in the OpenMx version of summary: http://openmx.psyc.virginia.edu/thread/765 . Second, I recently fixed a bug in saturated and independence model degrees of freedom that can affect p-values. This is currently available in our source builds, and will be released as a bux fix release soon: http://openmx.psyc.virginia.edu/thread/1104 .

Yes, there is interest in having helper functions to assist with fit index calculation. I know that we've discussed in the past (we=developer's team) having functions to create saturated and independence models to be fit, which would allow users to supply these models as arguments to summary() and have the current slate of fit indices calculated (CFI, TLI, RMSEA, in addition to AIC & BIC variants that don't depend on these models). Descriptions of this functionality should be in the R help file for summary(). This also allows users to estimate the saturated model once, then use it in a whole host of model comparisons rather than have it be re-estimated every time a model is run. As we have users with very large models (taking hours or days to terminate), we do our best not to add extra things to the initial model estimation.

We also (ok, I) think that the fact that the common fit indices aren't appropriate in all circumstances is a great reason not to calculate them. The last thing we (I) want is for users to avail themselves of inappropriate fit indices and make bad modeling decisions because of a convenience feature. When people use definition variables or have clustered/twin data, there can be many definitions of saturated models, and OpenMx tries to make as few assumptions about your desired model as possible. There are certainly a number of fit indices (things like SMRS) that depend on residual correlation matrices that cannot be calculated using FIML (each matrix cell would have a different n), and RMSEA has shown a few problems that are documented in the literature. If users think they're important, they have the tools to add saturated and independence models to OpenMx and get the appropriate indices.

I recognize that this is a different approach than other programs, and that users want things to be easy to get ready for publication. I view it as an extension of the OpenMx philosophy (which every developer has a slightly different version of), that I state as "we only do what you tell us to." OpenMx doesn't do any more than it is instructed to, has as few defaults as possible, and doesn't do anything automatically that could potentially be wrong. This philosophy will undoubtedly not be embraced by all, and some people will still want to use other programs because they can just click and get an answer out, and that's fine. We can and should be better about telling OpenMx to compute fit indices, but that will likely take the form of functions to estimate the relevant models rather than an option like "computeRMSEA=TRUE".

I also encourage you, in your consulting and other publication dealings, to not simply calculate whatever inappropriate statistic people want and encourage them to use whatever method is correct for their particular model. People don't always like being told that they're wrong, or that their software package shouldn't give them a particular stat, or that they need to add a paragraph to their review letter about how RMSEA is completely useless in a particular circumstance. If people are coming to your for your expertise, then you should share it with them.

We should definitely keep talking about this. What other functions are people looking for that we aren't or are undersupporting?

Hi,

I have estimated some models using FIML and want to use the results for a paper. But reading back all threads on fit indices and FIML and particularly this reply, I'm left wondering if it makes sense to include any fit indice at all using FIML? If somebody has good references in the literature, I would be very thankful as well.

Regards Rob

For the most part, we need to compare the fit of two or models of interest. Some of the parsimony-based indices, such as AIC and BIC should do this as well for FIML as for, e.g., covariance matrix analysis, as of course does the likelihood-ratio test. I personally find these indices most useful. Other indices are often based around the idea of absolute fit, by comparing the fit of your model against a saturated model. It is possible to use these indices with FIML also, but it is necessary to fit the saturated model, and this is computationally effortful so one doesn't want to do it every time one fits a hypothesized model. Yet others are based around comparison with a very simple model - that there are variances and means, but no covariances. Such models are quite easy to fit with FIML; indeed they don't require estimation, so they are going to appear in later versions of OpenMx.

Finally, note that for some models - particularly mixture distributions, e.g., latent class models - some alternative model fit comparisons must be used. The computationally intensive bootstrap likelihood ratio test (where one simulates data based on the simpler model parameter estimates, then fits both the simple and the more complex model, and obtains an empirical distribution of difference in fit for such models) can be very useful in such situations. It is also the sort of thing that would be straightforward to implement as an R function, although some thought would be needed about the simulation step. Generic LCA models would be easy enough, but the class of models that OpenMx can specify goes well beyond the usual clearly defined sets (SEM, LCA etc), and simulating data for some class members would need to be hand-tailored.

Oh wow, I meant to say, "I am ****NOT**** trying to push for change", but I would be interested in some discussion." Talk about a bad typo (surprised I didn't get my head bit off for that one). I enjoyed your wall of text---thanks for taking the time to write it!

I really like the OpenMx philosophy of not computing saturated and independence models, both in terms of "I have to know what they are" and in terms of not having the computer do extra work I did not ask for. I think you are right that it is best to educate people and help them craft a paragraph why particular indices are not appropriate in their situation, but I get lazy sometimes.

I like the idea that *if* I have fit and stored a saturated and independent model, I could pass those objects to summary along with the model I am testing and then get more output by default (or with a simple argument). One of the threads you referenced has a function that looks like it does this already.

In terms of other functions, what about something to estimate indirect effects between two variables (with standard errors or confidence intervals)?

I completely agree. As a user, I think there should be an option to calculate fit various fit indices that reviewers expect. If I recall, the reason that the developers didn't want to calculate some fit statistics automatically was that it had to be compared to a saturated model, which can take a long time to fit. My suggestion would be to give the user the option to calculate the saturated model along with the fitted model and to calculate fit indices automatically in those instances. It could be a statement added to mxModel, where saturated = TRUE (set to FALSE by default). That way, the user isn't fitting the saturated model by default, but is fitting it in cases where he or she needs the fit indices.

I think this is an important function for any SEM program. I love OpenMx, but I think that it needs a way to calculate fit indices automatically in order to compete with the other alternatives.

There are, however, some pretty heavy caveats with the use of such indices with missing data structures. One is that RMSEA would not notice differences in effective sample size associated with different statistics. Thus a degenerate dataset such as:

X Y

1.2 NA

3.4 NA

NA 5.6

NA 7.8

NA 9.0

would apparently contain bivariate data, but in fact provide no information whatsoever about the covariance between X and Y. In other less extreme cases, it may provide some but less than for other statistics. Taking RMSEA=sqrt[(Chisq-df)/(df(N-1))] then the problem becomes obvious because N is no longer consistent across variables. Similar issues exist with all indices that use N, which is one reason why I tend to prefer the likelihood ratio test and AIC for model comparisons. Not that these are flawless, mind!

That's very helpful. I'm mostly looking to see how well my original model fits the data (not to do nested model comparisons to improve fit), so that's why I was looking to calculate RMSEA & CFI, but I see the problem with calculating RMSEA when using datasets with different amounts of missingness across variables. Given my aim, would you suggest doing a likelihood ratio test against a saturated model or should I stick with RMSEA?

Also.. In my searches, I've noticed conflicting equations for the calculation of RMSEA:

sqrt[(Chisq-df)/(df(N-1))] -- as you describe, where I assume Chisq represents the chi-square value of the proposed model?

sqrt[((chi/df)-1)/N], where chi is the difference between that model's -2LL and a fully saturated model's -2LL, and df is degrees of freedom (http://openmx.psyc.virginia.edu/thread/47-welcome-documentation-forum#co...)

sqrt[((Chisq/df)-1)/(N-1)] (http://davidakenny.net/cm/fit.htm)

These appear to be non-equivalent formulas, so I'm not sure which one to use, or if there is even a generally accepted one.

If I need to estimate the saturated model to calculate RMSEA or the likelihood ratio test, it doesn't appear that I am correctly specifying the saturated model. If my understanding is correct, a saturated model is one where all variances, covariances, and means are allowed to be freely estimated so that there are as many parameter estimates as there are degrees of freedom. I don't think my model specification is correct, however, as this is not what I get (I obtain fewer estimated parameters than df). Given my code below, can you see what else I would have to specify to obtain a saturated model?

### Saturated Model (allowing all variances, covariances, and means to be freely estimated)

# where 'modeldata' includes all variables in the model

# where 'manifests' includes the names of all variables in the model (in 'modeldata') -- it is a manifest-only model (no latent variables)

saturated_model <- mxModel(paste("Saturated Model"),

type="RAM",

manifestVars=manifests,

#Variances

mxPath(

from=manifests,

arrows=2,

free=TRUE,

values=1,

labels=paste("Var_",manifests,sep="")),

#Covariances

mxPath(

from=manifests,

to=manifests,

arrows=2,

free=TRUE),

#Means

mxPath(

from="one",

to=manifests,

arrows=1,

free=TRUE,

labels=paste("Mean_",manifests,sep="")),

mxData(observed=modeldata, type="raw")

)

Output:

observed statistics: 276

estimated parameters: 12

degrees of freedom: 264

-2 log likelihood: 777.1855

saturated -2 log likelihood: NA

number of observations: 67

chi-square: NA

p: NA

AIC (Mx): 249.1855

BIC (Mx): -166.4267

adjusted BIC:

RMSEA: NA

Thanks so much for your help guys!

Hi dadivr

From the ?mxPath documentation:

mxPath(from, to = NA, all = FALSE, arrows = 1,

free = TRUE, values = NA, labels = NA,

lbound = NA, ubound = NA)

from character vector. these are the sources of the new paths.

to character vector. these are the sinks of the new paths.

all boolean value. If TRUE, then connect all sources to all sinks.

So you need to put all=TRUE as an argument to your first mxPath() call. The second one is then redundant. However, I would not use the same value as a starting value throughout the variances and covariances. You might try building a matrix of starting values with something like:

mxPath(from=manifests, to=manifests, arrows=2,values=as.vector(diag(length(manifests))),free=TRUE,all=TRUE)

or use the sd() function (and square it) to get the variances about right (as opposed to using an identity matrix returned by the hopelessly overloaded diag() function.