Question about RMSEA

6 replies [Last post]
marklhc's picture
Offline
Joined: 10/02/2011

Hi,

I'm trying to run a random intercept, random slope model from the book Latent Growth Curve Modeling by Preacher, Wichman, MacCallum, & Briggs (2008, p.31). All the coefficients, degrees of freedom, chi-square, look the same as the book, but not for RMSEA. The reported RMSEA from the book is 0.07, which I also replicated in LISREL. However, the value in OpenMx is 0.1453412. Does anyone know how that is calculated?

I attached the R script. Thanks in advance.

Mark

degrees of freedom: 14
-2 log likelihood: 11832.83
saturated -2 log likelihood: 11756.92
number of observations: 851
chi-square: 75.90627
p: 1.613149e-10
Information Criteria:
df Penalty Parameters Penalty Sample-Size Adjusted
AIC 47.90627 87.90627 NA
BIC -18.54350 116.38474 97.3305
CFI: 0.9998968
TLI: 0.999742
RMSEA: 0.1453412

AttachmentSize
Model4.r1.77 KB
mspiegel's picture
Offline
Joined: 07/31/2009
This is outside my area, but

This is outside my area, but a common mistake is to leave out lower bounds of 0. The net effect is that free parameters that should be non-negative take on negative values. Could that explain the behavior you're observing?

marklhc's picture
Offline
Joined: 10/02/2011
Thx Michael. It is not

Thx Michael. It is not related, but that's a good suggestion to prevent other conditions.

mhunter's picture
Offline
Joined: 07/31/2009
According to the source code

According to the source code that I found, the RMSEA in OpenMx is calculated as

sqrt( ( (ChiSquare / deltaDF) - 1) / N )

which is exactly equal to

sqrt( (ChiSquare - deltaDF) / (deltaDF*N) )

where ChiSquare is the chi square value, deltaDF is the change in degrees of freedom from the saturated model to the fitted model (i.e. fitDF - satDF), and N is the number of observations.

Elsewhere, I've seen N replaced by N-1, but in your case (851 vs 850) the difference would be tiny.

I think the difference you're seeing is due to our use of deltaDF. Elsewhere, I've seen the above formulas for RMSEA stated in terms of raw degrees of freedom (fitDF), rather than the change in degrees of freedom (deltaDF). The two versions are exactly equal when the saturated model has zero degrees of freedom (deltaDF = fitDF - satDF = fitDF - 0 = fitDF).

When you plug the numbers you reported into the RMSEA formula assuming the saturated model has zero degrees of freedom, you get RMSEA = 0.07208394 using N, and RMSEA = 0.07212633 using N-1, replicating the results you found reported. If however you assume the saturated model has 10 degrees of freedom (arrived at through trial and error), you get RMSEA = 0.1453412 using N, replicating the results given by OpenMx. It seems that OpenMx thinks your saturated model has 10 degree of freedom. I admit I do not know why the saturated model seems to have 10 degrees of freedom in this case.

Does anyone else have a clue about this? Is it a bug or a conceptual difference?

Ryne's picture
Offline
Joined: 07/31/2009
It's both a conceptual

It's both a conceptual difference and a bug. I'll explain one and fix the other.

The conceptual difference comes from the way that OpenMx describes raw data, and how that impacts the rest of our calculations. Unlike other SEM software, which always pretends that any data it sees is a covariance matrix and means, OpenMx treats raw data as raw data. This means that your 5 x 5 covariance matrix and means vector have 20 degrees of freedom, but the raw data it comes from has 5 x n degrees of freedom. If you had modeled a raw dataset, your summary would list 4255 observed statistics, 6 estimated parameters and 4249 degrees of freedom in your model, as opposed to the 20/6/14 split you currently have. This can be very cool (for instance, you could include enough definition variables and non-linear effects to surpass 20 parameters, which would be estimable from raw data but not from a covariance matrix), but also means that all of our models would look like they fit really badly if we just assumed that the saturated model had 0 df.

So we have to calculate saturated and independence model df in a weird way. We take the data df (either nvar x n for raw data, or n * (n-1)/2 + 2*n) for covariance matrices with means) and subtract the number of estimated parameters for each model, which is n * (n-1)/2 covariance parameters, n variances and n means (2 * n total) for the saturated model and n variances and n means for the independence model. This seems like a lot of extra work for covariance matrices, and it is, but it allows us to have one df calculation for all models rather than different formulas for raw and moment matrix data, and means that the summary function doesn't have to know your data input method to calculate fit statistics.

The bug seems to be coming from the way those formulas are applied. There's a little "catch" for ordinal data, such that the means and variances are estimated only for continuous variables, while the thresholds are estimated for ordinal variables. That catch is apparently failing, and not adding in degrees of freedom for your means and variances. With five variables, this yields the 10 df difference that affects your RMSEA calculation. It isn't affecting CFI/TLI because the bug is affecting both the saturated and independence df equally, and 10-10 is zero. This will be fixed in the next binary release, and should be available for source download as soon as I figure out exactly what caused the bug.

Best of luck with your modeling!

Ryne's picture
Offline
Joined: 07/31/2009
Bug is fixed. The following

Bug is fixed. The following is a note for developers, those who care and so I don't forget.

computeOptimizationStatistics function workflow (where bug was: start at line 225 of summary per r1797)
-get estimated parameters
-get data from runstate$datalist. define useMeans (TRUE if raw or cov w/means, FALSE if cov w/o means) and nvar (number of variables)
-get objective from runstate$objective. define thresholdLevels, which is the number of thresholds for each variable (NA=continuous, numeric=ordinal). sum(is.na(thresholdLevels) = number of continuous variables, sum(thresholdLevels) is total number of thresholds
**bug description.
previous version defined new variable thresholdLevels from contents of runstate$objective[[1]]$thresholdLevels if that slot existed, defined thresholdLevels as rep(NA, nvar) otherwise. if the slot is there but empty, thresholdLevels was set to return length zero vector (assume zero variables). now, empty but existent thresholdLevels slot returns rep(NA, nvar), fixing bug
**
-calculate appropriate degrees of freedom
-set NULL DF to NA DF
-return

marklhc's picture
Offline
Joined: 10/02/2011
Thx Ryne. Now I understand

Thx Ryne. Now I understand the problem.