Thu, 12/09/2010 - 09:46

This is something that would be really useful if someone who is actually competent in the area would fix it. I did make a start, and I know that some of this is wrong; some is probably not sufficiently general; and some does indeed match the sem output exactly. So it is a start. Someone, please fix it and include with OpenMx, this would be very useful for a whole bunch of ignoramuses like me who occasionally need to run an SEM or path analysis and aren't covered by the sem package (e.g., need multi-group models).

# starter R code to compute the usual assortment of fit indices

#

# based on (what I understood of):

# Tabachnick & Fidell, pp. 697-702

# Klein, pp. 135-144

# also http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap19/sect34.htm

#

fit.index <- function(indepfit,modelfit) {

indep <- summary(indepfit)

indep.chi <- indep$Chi

indep.df <- indep$degreesOfFreedom

model <- summary(modelfit)

model.chi <- model$Chi

model.df <- model$degreesOfFreedom

model.ml <- as.numeric(modelfit$objective@result)

N <- model$numObs # sample size

N.est.param <- model$estimatedParameters # t

N.manifest <- length(modelfit@manifestVars) # n

observed.cov <- modelfit@data@observed # S

estimate.cov <- modelfit@matrices$S@values[1:N.manifest,1:N.manifest] # C, W

observed.cor <- cov2cor(observed.cov)

estimate.cor <- cov2cor(estimate.cov)

residual.cov <- observed.cov-estimate.cov

residual.cor <- observed.cor-estimate.cor

F0 <- (model.chi-model.df)/N

if (F0<0) { F0 <- 0 }

NFI <- (indep.chi-model.chi)/indep.chi

NNFI <- (indep.chi-indep.df/model.df*model.chi)/(indep.chi-indep.df)

PNFI <- (model.df/indep.df)*NFI

IFI <- (indep.chi-model.chi)/(indep.chi-model.df)

CFI <- 1.0-(model.chi-model.df)/(indep.chi-indep.df)

RMSEA<- sqrt(F0/model.df) # need confidence intervals

MFI <- exp(-0.5*(model.chi-model.df)/N)

GFI <- sum(estimate.cov*estimate.cov)/sum(observed.cov*observed.cov) # definitely not right!

AGFI <- 1.0 - (1.0-GFI)/(1.0-N.est.param/N)

PGFI <- (1.0-N.est.param/N)*GFI

AIC <- model.chi-2*model.df

AIC2 <- model.chi+2*N.est.param

CAIC <- model.chi-(log(N)+1.0)*model.df

CAIC2<- model.chi+(log(N)+1.0)*N.est.param

BIC <- model.chi-log(N)*model.df

BIC2 <- model.chi+log(N)*N.est.param

RMR <- sqrt(2.0*sum((residual.cov*residual.cov)/(N.est.param*(N.est.param+1))))

SRMR <- sqrt(2.0*sum((residual.cor*residual.cor)/(N.est.param*(N.est.param+1)))) # not right!

indices <- rbind(NFI,NNFI,PNFI,IFI,CFI,RMSEA,MFI,GFI,AGFI,PGFI,AIC,AIC2,CAIC,CAIC2,BIC,BIC2,RMR,SRMR)

return(indices)

}

Update on fit statistics.

First, I've noted your observation that AIC and BIC "seemed OK or off by a factor of 2." For some reason, we were multiplying BIC by 0.5. This will be checked in for the next binary release; we should support AIC and BIC with both the -2*df and the +2*parameters corrections soon, as well as sample size adjusted BIC.

The bigger issue pertains to fit statistics like CFI that require not just a saturated likelihood, but an independence model likelihood and df for all models as well. This will take a few more additions to the summary function and optimization, and may take a little time to make sure we get the specification right.

Thanks a lot for the update. This is beyond my competence, as you have seen, so I'll just be looking forward to future releases for the fit indices. I appreciate your continued effort.

again on fit statistics:

I updated and extended Oavasns original function (I think I found a couple of minor mistakes and hopefully was able to fix them). Any check by the OpenMx team or anybody else would be appreciated. The code can certainly be simplified. For instance, I ignore how to extract a model's estimated covariance matrix (called 'estimate.cov' below), hence got this with RAM algebra. The Confidence Interval of the RMSEA is still missing.

fit.index <- function(indepfit,modelfit) {

indep <- summary(indepfit)

indep.chi <- indep$Chi

indep.df <- indep$degreesOfFreedom

model <- summary(modelfit)

model.chi <- model$Chi

model.dev <- model$SaturatedLikelihood

model.df <- model$degreesOfFreedom

model.ml <- as.numeric(modelfit$objective@result)

N <- model$numObs

N.est.param <- model$estimatedParameters

N.manifest <- length(modelfit@manifestVars)

N.latent <- length(modelfit@latentVars)

observed.cov <- modelfit@data@observed

A <- modelfit@matrices$A@values

S <- modelfit@matrices$S@values

F <- modelfit@matrices$F@values

I <- diag(N.manifest+N.latent)

estimate.cov <- F %*% (qr.solve(I-A)) %*% S %*% (t(qr.solve(I-A))) %*% t(F)

observed.cor <- cov2cor(observed.cov)

estimate.cor <- cov2cor(estimate.cov)

Id.manifest <-diag(N.manifest)

residual.cov <- observed.cov-estimate.cov

residual.cor <- observed.cor-estimate.cor

F0 <- (model.chi-model.df)/(N-1)

if (F0<0) {F0 <- 0}

NFI <- (indep.chi-model.chi)/indep.chi

NNFI.TLI <- (indep.chi-indep.df/model.df*model.chi)/(indep.chi-indep.df)

PNFI <- (model.df/indep.df)*NFI

IFI <- (indep.chi-model.chi)/(indep.chi-model.df)

CFI <- 1.0-(model.chi-model.df)/(indep.chi-indep.df)

if (CFI>1) {CFI=1}

RMSEA<- sqrt(F0/model.df) # need confidence intervals

MFI <- exp(-0.5*(model.chi-model.df)/N)

GH <- N.manifest / (N.manifest+2*((model.chi-model.df)/(N-1)))

GFI <- 1 - (sum(diag(((solve(estimate.cor)%*%observed.cor)-Id.manifest)%*%((solve(estimate.cor)%*%observed.cor)-Id.manifest))) /

(sum(diag((solve(estimate.cor)%*%observed.cor)%*%(solve(estimate.cor)%*%observed.cor)))))

AGFI <- 1.0 - ((N.manifest*(N.manifest+1))/(2*model.df) * (1-GFI))

PGFI <- (2*model.df)/(N.manifest*(N.manifest+1))*GFI

#AIC <- model.chi-2*model.df

AIC <- model.chi+2*N.est.param

#CAIC <- model.chi-(log(N)+1.0)*model.df

CAIC<- model.chi+(log(N)+1.0)*N.est.param

#BIC <- model.chi-log(N)*model.df

BIC <- model.chi+log(N)*N.est.param

RMR <- sqrt((2*sum(residual.cov^2))/(N.manifest*(N.manifest+1)))

SRMR <- sqrt((2*sum(residual.cor^2))/(N.manifest*(N.manifest+1)))

indices <- rbind(N,model.chi,model.df,NFI,NNFI.TLI,PNFI,IFI,CFI,RMSEA,MFI,GH,GFI,AGFI,PGFI,AIC,CAIC,BIC,RMR,SRMR)

return(indices)

}

Thanks, Paolo. These formulae look correct based on my casual inspection, but I'll put look at them more carefully soon. We're working on adding automatic calculation of the independence model, which should make these a lot easier.

If I am reading Paolo's code correctly, it expects the saturated model likelihood to be in the model being evaluated.

Can I just stick this in there before using his fuction?

So you *are* my god! Thanks so much! I think I need to upgrade things to get this to work for me, I'll try it out and let you know if there are any problems.

This looks good! We already have a few of these in the summary function, but we should add a few of these as well. I'll try to find some time to double-check the equations and define them in terms of likelihoods whenever possible. I'll pose a few question for you and the community:

-Which of these functions is very important to you? Which do you not care to see?

-Are there any additional functions you'd like to see?

If these all get forwarded into summary, I'd propose something like this:

-Information criteria are put in a data frame, making a 4 x 2 (or 2 x 4) table of AIC, BIC, CBIC and sBIC with both the -2*observedStatistics and +2*parameters corrections

-*FI get put in a data frame because there's a lot of them. If there is no saturated model given, an NA or character message is presented instead

-RMSEA (and its CI) remain their own object

-RMR and SRMR are put together, as I understand SRMR to be the standardized version of RMR. (This will have to wait for an explicit output of the estimated and observed covariances; i have no idea what to do with these with ordinal data).

See this thread for a proposal of CI for RMSEA: http://openmx.psyc.virginia.edu/thread/1269#comment-4919

cheers,

Andreas

My impression is that the most important (judging by what seems most reported or recommended in the literature) are CFI, RMSEA (with CI), and SRMR, followed by GFI and NNFI, AIC, and/or BIC. Of these, I think I got CFI and NNFI right, but GFI and SRMR wrong. AIC/BIC seem OK or off by a factor of 2.

By all means it's better to have something that works for the most common cases than wait for something that would be perfectly general for any possible situation (ordinal data or what have you).

Another thing that would be extremely useful is the calculation of standardized coefficients. I have probably misunderstood how that's supposed to be done, because my first attempt is very definitely wrong:

my.std.coef <- function(modelfit) { # accepts result of MxRun

s <- modelfit@matrices$S@values

s[which(modelfit@matrices$A@free)] <- modelfit@matrices$A@values[which(modelfit@matrices$A@free)]

t <- matrix(nrow=nrow(s))

for (i in 1:nrow(s)) {t[i] <- sum(s[i,])}

std <- matrix(nrow=nrow(s),ncol=ncol(s))

rownames(std)=rownames(s)

colnames(std)=colnames(s)

for (i in 1:nrow(s)) {std[i,] <- s[i,]/t[i]}

std <- sqrt(std)

mod <- summary(modelfit)

nparam <- mod$estimatedParameters

stdcoef <- vector(length=nparam)

for (i in 1:nparam) { stdcoef[i] <- std[mod$parameters$row[i], mod$parameters$col[i]] }

z <- mod$parameters$Estimate / mod$parameters$Std.Error

p <- 1-pnorm(z)

augcoef <- cbind(mod$parameters,z,p,stdcoef)

return(augcoef)

}

If you will be my god and fix these so that we can run our usual analyses with OpenMx I'll have one more reason to convince my students to use R for everything!

Thanks for the information.

I've already written a RAM model standardization function, which can be found here: http://openmx.psyc.virginia.edu/thread/718. It shows up as posted by anonymous for some reason, but let me know if you have any questions.

Works like a charm! Except for model[[nM]]@values[,] because my model has M=NA, but I got rid of that line and the rest is perfect. Thanks so much!!

I guess that was a bit too fast... Not everything works like a charm. Specifically, the variance of the endogenous latent is not shown and the loadings of its indicators are incorrect (it seems that the unstandardized loadings are divided by the standard deviation of the latent instead of being multiplied). The rest seem fine in my test case but that's no reassurance because the exogenous latents' variance was fixed to 1.0 for identification. I'll see if I can understand enough of your code to try and address the discrepancies.

That's odd. If you don't mind, post code to generate your model and I'll take a look.

Thanks a lot for looking into it. Code (stdtest.R) and data (rcdata.txt) attached. Here are the results:

(I hope the code shows up correctly with all the angle brackets in there... I don't know how to get the tabs to align in the posting, so I am also attaching the output separately)

The problem is evident in comparing rows 11-13 of semTwo.std to rows 7-8 of mxTwo.std (there's no standardized disturbance of R in mxTwo). The corresponding unstandardized coefficients match fine.

Fixed. I'll update and detail the standardizeRAM change in the original thread.

I've altered your code to make comparison easier. I used R's merge() function to allow for more direct comparison of the sem and OpenMx output.

I've also changed all of your '->' assignment operators to '<-'. Left assignment is the standard in every programming language I'm familiar with. Mixing left and right assignment in the same script is especially bad, as it becomes very easy to misread assignment operators.

Hi,

Thanks to the script below (suggested by Athanassios Protopapas and further developed by Paolo Ghisletta, thank you!!!) I was able to obtain a large number of fit indices, but I still can't get the 90% confidence interval for RMSEA and p close (the test of the null hypothesis that RMSEA (in the population) in less than .05). Can anyone help me?

fit.cov <- function(indepfit,modelfit) {

options(scipen=3)

indep <- summary(indepfit)

model <- summary(modelfit)

deviance <- model$Minus2LogLikelihood

Chi <- model$Chi

indep.chi <- indep$Chi

df <- model$degreesOfFreedom

p.Chi <- 1 - pchisq(Chi, df)

Chi.df <- Chi/df

indep.df <- indep$degreesOfFreedom

N <- model$numObs

N.parms <- model$estimatedParameters

N.manifest <- length(modelfit@manifestVars)

q <- (N.manifest*(N.manifest+1))/2

N.latent <- length(modelfit@latentVars)

observed.cov <- modelfit@data@observed

observed.cor <- cov2cor(observed.cov)

A <- modelfit@matrices$A@values

S <- modelfit@matrices$S@values

F <- modelfit@matrices$F@values

I <- diag(N.manifest+N.latent)

estimate.cov <- F %*% (qr.solve(I-A)) %*% S %*% (t(qr.solve(I-A))) %*% t(F)

estimate.cor <- cov2cor(estimate.cov)

Id.manifest <-diag(N.manifest)

residual.cov <- observed.cov-estimate.cov

residual.cor <- observed.cor-estimate.cor

F0 <- (Chi-df)/(N-1)

if (F0<0) {F0 <- 0}

NFI <- (indep.chi-Chi)/indep.chi

NNFI.TLI <- (indep.chi-indep.df/df*Chi)/(indep.chi-indep.df)

PNFI <- (df/indep.df)*NFI

RFI <- 1 - (Chi/df) / (indep.chi/indep.df)

IFI <- (indep.chi-Chi)/(indep.chi-df)

CFI <- 1.0-(Chi-df)/(indep.chi-indep.df)

if (CFI>1) {CFI=1}

PRATIO <- df/indep.df

PCFI <- PRATIO*CFI

NCP <- max((Chi-df),0)

RMSEA<- sqrt(F0/df) # need confidence intervals

MFI <- exp(-0.5*(Chi-df)/N)

GH <- N.manifest / (N.manifest+2*((Chi-df)/(N-1)))

GFI <- 1-(sum(diag(((solve(estimate.cor)%*%observed.cor)-Id.manifest)%*%((solve(estimate.cor)%*%observed.cor)-Id.manifest))) / sum(diag((solve(estimate.cor)%*%observed.cor)%*%(solve(estimate.cor)%*%observed.cor))))

AGFI <- 1 - (q/df)*(1-GFI)

PGFI <- GFI * df/q

AICchi <- Chi+2*N.parms

AICdev <- deviance+2*N.parms

BCCchi <- Chi + 2*N.parms/(N-N.manifest-2)

BCCdev <- deviance + 2*N.parms/(N-N.manifest-2)

BICchi <- Chi+N.parms*log(N)

BICdev <- deviance+N.parms*log(N)

CAICchi <- Chi+N.parms*(log(N)+1)

CAICdev <- deviance+N.parms*(log(N)+1)

ECVIchi <- 1/N*AICchi

ECVIdev <- 1/N*AICdev

MECVIchi <- 1/BCCchi

MECVIdev <- 1/BCCdev

RMR <- sqrt((2*sum(residual.cov^2))/(2*q))

SRMR <- sqrt((2*sum(residual.cor^2))/(2*q))

indices <-

rbind(N,deviance,N.parms,Chi,df,p.Chi,Chi.df,AICchi,AICdev,BCCchi,BCCdev,BICchi,BICdev,CAICchi,CAICdev,RMSEA,SRMR,RMR,GFI,AGFI,PGFI,NFI,RFI,IFI,NNFI.TLI,CFI,PRATIO,PNFI,PCFI,NCP,ECVIchi,ECVIdev,MECVIchi,MECVIdev,MFI,GH)

return(indices)

}

fit.cov(indepfit,modelfit)

Thank you very much for fixing the standardization code, I greatly appreciate it!

(as for the left/right assignment, it is one of the things I really like about R :-) )