fit indices and df for a simple path model

6 replies [Last post]
dkaplan's picture
Offline
Joined: 08/04/2009

Greetings all,

Now finally getting Mx to work on my Mac, I'm having a good time putting it through its paces. So, I have a simple model I'm playing with

probsolv -> grades -> irtsci

Below is the syntax and output. It's reading the number of obs correctly and the estimates seem correct. However, I am not getting a value for the LR chi-square test nor the RMSEA and the df seem fairly large considering it should only be 1 df. Thanks in advance. David

--------
>
> sciach <- read.table("~/Desktop/Mx stuff/sciach.txt",header=TRUE)
>
> require(OpenMx)
>
> #data(sciach)
>
> sciachsimple <- sciach[,c("irtsci", "grades", "probsolv")]
>
> manifests <- names(sciachsimple)
>
> pathmodel <- mxModel("path model",type="RAM",
+ mxData(observed=sciachsimple,type="raw"),
+ manifestVars = manifests,
+
+ mxPath(from="probsolv", to="grades",arrows=1,free=TRUE),
+
+ mxPath(from="grades", to="irtsci",arrows=1,free=TRUE),
+
+ mxPath(from=c("probsolv", "grades", "irtsci"),
+ arrows=2,
+ free=TRUE,
+ values = c(1, 1, 1),
+ labels=c("varx", "residual1", "residual2")),
+
+ mxPath(from="one",
+ to=c("probsolv", "grades", "irtsci"),
+ arrows=1,
+ free=TRUE,
+ values=c(1, 1, 1),
+ labels=c("meanx","beta0grades", "beta0irtsci")))
>
>
>
> pathmodelfit <- mxRun(pathmodel)
Running path model
> # pathmodelfit@output
>
> summary(mxRun(pathmodel))
Running path model
irtsci grades probsolv
Min. : 4.98 Min. :0.000 Min. : 0.00
1st Qu.:10.90 1st Qu.:2.000 1st Qu.: 7.00
Median :14.50 Median :3.000 Median : 9.00
Mean :14.69 Mean :2.858 Mean : 9.01
3rd Qu.:18.80 3rd Qu.:3.500 3rd Qu.:12.00
Max. :24.90 Max. :4.000 Max. :15.00

name matrix row col Estimate Std.Error
1 A irtsci grades 2.13628016 0.14758041
2 A grades probsolv 0.05409493 0.00729430
3 residual2 S irtsci irtsci 21.36742233 0.88520752
4 residual1 S grades grades 0.80461695 0.03333778
5 varx S probsolv probsolv 12.98102667 0.53787536
6 beta0irtsci M 1 irtsci 8.58666103 0.44305855
7 beta0grades M 1 grades 2.37100192 0.07077779
8 meanx M 1 probsolv 9.00943610 0.10555834

Observed statistics: 3495
Estimated parameters: 8
Degrees of freedom: 3487
-2 log likelihood: 16218.66
Saturated -2 log likelihood: NA
numObs: 1165
Chi-Square: NA
p: NA
AIC (Mx): 9244.662
BIC (Mx): -4200.609
adjusted BIC:
RMSEA: NA
frontend time: 0.2014029 secs
backend time: 3.162092 secs
independent submodels time: 6.508827e-05 secs
wall clock time: 3.36356 secs
cpu time: 3.36356 secs
openmx version number: 0.2.9-1147

mspiegel's picture
Offline
Joined: 07/31/2009
Degrees of freedom is

Degrees of freedom is computed as the difference between the number of observed statistics and the number of free parameters. When using full information maximum likelihood, the number of observed statistics is calculated as the number of non-NA entries in the columns used by the optimization.

There is a bug in 0.2.9 where all columns of the data set are used, instead of only the columns selected for the expected covariance matrix. However, I don't think that bug can manifest in a RAM style model because we currently don't allow subsets of data to be selected in RAM, it's all or nothing. In other words, 'sciachsimple' above doesn't contain unused columns.

Ryne's picture
Offline
Joined: 07/31/2009
The reason that both the df

The reason that both the df look weird and the RMSEA is non-existant is the lack of a saturated model to compare your model to. This is missing because of a difference in the definition of degrees of freedom in SEM and multivariate regression/GLM. A full-information optimizer like FIML approaches the model as having a higher number of degrees of freedom than is typically thought of in SEM (i.e., n*k-p intead of k(k+1)/2-p). By extension, its definition of a saturated model is very different than in SEM, as a saturated model would have k predictors per person. This model is not fit because it is massively large.

If you so choose, you could manually specify a saturated model that would match the definition of a fully saturated model in SEM. From there, the -2LL difference and df differences across the two models would exactly match the standard chi-square test of perfect fit in SEM, and could be turned into RMSEA. See the thread below to hear more about model fit stats and this fully saturated issue for FIML.

http://openmx.psyc.virginia.edu/thread/135

ecmerkle's picture
Offline
Joined: 03/08/2010
Aside from manually

Aside from manually specifying the saturated/unrestricted model through OpenMx, there are a couple R functions that fit unrestricted multivariate normals via ML and return the LL. fit.norm() in the QRMlib package and mvnXXX() in the mclust package both appear to do this.

Unless I'm missing something, you could use these functions to obtain -2LL for the unrestricted model without worrying about manually specifying the model. Then take [-2LL(restricted) - -2LL(unrestricted)] for the chi square, where -2LL(restricted) comes from the OpenMx run. Then manually calculate the difference in degrees of freedom using the traditional SEM framework.

neale's picture
Offline
Joined: 07/31/2009
Agreed, it looks really

Agreed, it looks really promising for fitting saturated models. Unfortunately, at least fit.norm() breaks when the data set includes any missing values:

> data <- rmnorm(1000,rho=0.7,d=10)
> data[1,1]<-NA
> fit.norm(data)
Error in solve.default(cov, ...) :
system is computationally singular: reciprocal condition number = 0

Sadly, it's only the case when fitting to raw data (with missing values) that computing the -2lnL is computationally difficult. Estimating it with complete data, i.e., when there are no missing values, is an easy, non-iterative task (as fit.norm shows). I've not tested mvnXXX; it sounds like a genre of porn so perhaps I should...

ecmerkle's picture
Offline
Joined: 03/08/2010
Sorry it took me so long to

Sorry it took me so long to get back to this thread...

I can assure you that mvnXXX is not as exciting as it sounds. For the missing data piece, I wonder whether em.norm from the norm package would work. I think it is typically used to obtain starting values for multiple imputation, but it seems relevant here.

tbates's picture
Offline
Joined: 07/31/2009
Hello, Easier to help if the

Hello,
Easier to help if the script runs: Can you make the data or a subset available... perhaps attach it here, then replace the file path in your script with an URL to the data here, i.e.,

sciach=read.table("http://openmx.psyc.virginia.edu/sites/default/files/sciach.txt")