Tue, 08/26/2014 - 17:15

I have a data set with six binary variables, which I am trying to determine the temporal relationship. I was using lavaan R package, where they suggested to use dummy variable for endogenous variables (independent) and use ordered for exogenous (dependent variables). I was using the model as described in pdf file. I have gotten following results:

lavaan (0.5-16) converged normally after 31 iterations

Number of observations 51

Estimator DWLS Robust

Minimum Function Test Statistic 5.699 7.295

Degrees of freedom 7 7

P-value (Chi-square) 0.575 0.399

Scaling correction factor 1.006

Shift parameter 1.632

for simple second-order correction (Mplus variant)

Model test baseline model:

Minimum Function Test Statistic 61.153 45.447

Degrees of freedom 14 14

P-value 0.000 0.000

User model versus baseline model:

Comparative Fit Index (CFI) 1.000 0.991

Tucker-Lewis Index (TLI) 1.055 0.981

Root Mean Square Error of Approximation:

RMSEA 0.000 0.029

90 Percent Confidence Interval 0.000 0.153 0.000 0.178

P-value RMSEA <= 0.05 0.654 0.486

According to lavaan since it was using DWLS, AIC and BIC are calculated. However, we had someone analyze the data before for us was using AIC and BIC to compare which temporal relationship is better at explaining our data. So I decided to try OpenMx, I am very new to path analysis and OpenMx, so I am not quite sure if I am treating the variables correctly. I have attached the code and dataset and the model was run and gave me some estimates, however, RMSEA was not computed. And the estimates for each path was different compared to lavaan. At this point, I am not sure how to compare my results from these two methods and if it's even comparable since OpenMx I turned everything into ordinal but in lavaan I used dummy variables. And why is that OpenMx didn't calcualte RMSEA for categorical data.

rl

Attachment | Size |
---|---|

model.R | 2.79 KB |

Models.pdf | 19 KB |

path input alt1.csv | 707 bytes |

Hi Rufei

We need a saturated model here. One approach is to allow all covariances to be free parameters, while constraining all variances to 1.0. I note also that there were some means free as well as thresholds, so I repaired that by fixing all means to zero. The variance fixing has to be done because the data are binary. So the saturated model looks like this:

We have to add some matrix algebra and nonlinear constraints to force the variances of the dependent (endogenous) variables to 1.0. This is pretty tricky stuff, and I believe that OpenMx should make this much easier than it is. I use a matrix fmat to pluck out the endogenous variables, numbers 2,3,5 and 6.

The model you want to fit looks like this, I think:

The output looks like this to me (and I would like the developers to look at it because I am not sure that the chi-squared df are correct:

Finally, note that OpenMx is using a normal theory threshold model here, effectively working with tetrachoric correlations. These may differ from the statistics being used in other software. Some robust methods use Pearson correlations, which may underestimate the latent correlation relative to those of the threshold model.

Thank you so much for your help.

I'm not totally sure what your question is, but here are a few answers.

First, you do not have a lot of observations (rows of data). Around 50 observations is rather small for SEM, especially with the number of parameters you are estimating (21).

Second, you may get different estimates with lavaan using diagonally weighted least squares (DWLS) vs OpenMx using full information maximum likelihood (FIML). They are using very different methods with different assumptions. I would guess they would be ballpark similar, but nowhere near identical.

Third, after the model you made runs, it comes back with NA in many standard errors. This generally indicates a problem either in starting values or in model specification. My bet here is model specification. I'd have to look more closely at the model, but I suspect it might not be identified.

Fourth, OpenMx is not reporting the RMSEA because it wants to save estimation time. For raw data, RMSEA, TLI, and CFI require some comparison models to be fit and these may take time equal to the model you're interested in, so we don't fit them by default. If you're using the Beta (and please do), then we have a helper function to make and run these comparison models.

Hopefully this helps!

Thank you. Your answer really helps. I will have to look closer at my model and I agree with about the sample size.

So the variances have to be fixed at 1 for binary data. Possibly we could be smarter about it (detecting binary variables)? I would use the fix the first two thresholds at 0 & 1 for 3+ category data.

The Cholesky used for the saturated model helps a lot with stability, but is problematic to constrain the diagonal of ltCov%*%t(ltCov) to equal 1. Non-linear constraints can be used:

The independence model has the same problem, but it's simpler (just give it a Stand matrix of the right dimensions with free=F). For now, since it's diagonal,

Even then, the model isn't very stable unless we make the lbound for the diagonal elements of ltCov say .01 - if they hit closer to zero, non-positive definiteness can cause havoc.