A function to return fit indices

17 replies [Last post]
0avasns's picture
Offline
Joined: 12/09/2010

This is something that would be really useful if someone who is actually competent in the area would fix it. I did make a start, and I know that some of this is wrong; some is probably not sufficiently general; and some does indeed match the sem output exactly. So it is a start. Someone, please fix it and include with OpenMx, this would be very useful for a whole bunch of ignoramuses like me who occasionally need to run an SEM or path analysis and aren't covered by the sem package (e.g., need multi-group models).

# starter R code to compute the usual assortment of fit indices
#
# based on (what I understood of):
# Tabachnick & Fidell, pp. 697-702
# Klein, pp. 135-144
# also http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap19/sect34.htm
#
fit.index <- function(indepfit,modelfit) {

indep <- summary(indepfit)
indep.chi <- indep$Chi
indep.df <- indep$degreesOfFreedom

model <- summary(modelfit)
model.chi <- model$Chi
model.df <- model$degreesOfFreedom

model.ml <- as.numeric(modelfit$objective@result)
N <- model$numObs # sample size
N.est.param <- model$estimatedParameters # t
N.manifest <- length(modelfit@manifestVars) # n
observed.cov <- modelfit@data@observed # S
estimate.cov <- modelfit@matrices$S@values[1:N.manifest,1:N.manifest] # C, W
observed.cor <- cov2cor(observed.cov)
estimate.cor <- cov2cor(estimate.cov)
residual.cov <- observed.cov-estimate.cov
residual.cor <- observed.cor-estimate.cor
F0 <- (model.chi-model.df)/N
if (F0<0) { F0 <- 0 }

NFI <- (indep.chi-model.chi)/indep.chi
NNFI <- (indep.chi-indep.df/model.df*model.chi)/(indep.chi-indep.df)
PNFI <- (model.df/indep.df)*NFI
IFI <- (indep.chi-model.chi)/(indep.chi-model.df)
CFI <- 1.0-(model.chi-model.df)/(indep.chi-indep.df)
RMSEA<- sqrt(F0/model.df) # need confidence intervals
MFI <- exp(-0.5*(model.chi-model.df)/N)
GFI <- sum(estimate.cov*estimate.cov)/sum(observed.cov*observed.cov) # definitely not right!
AGFI <- 1.0 - (1.0-GFI)/(1.0-N.est.param/N)
PGFI <- (1.0-N.est.param/N)*GFI
AIC <- model.chi-2*model.df
AIC2 <- model.chi+2*N.est.param
CAIC <- model.chi-(log(N)+1.0)*model.df
CAIC2<- model.chi+(log(N)+1.0)*N.est.param
BIC <- model.chi-log(N)*model.df
BIC2 <- model.chi+log(N)*N.est.param
RMR <- sqrt(2.0*sum((residual.cov*residual.cov)/(N.est.param*(N.est.param+1))))
SRMR <- sqrt(2.0*sum((residual.cor*residual.cor)/(N.est.param*(N.est.param+1)))) # not right!

indices <- rbind(NFI,NNFI,PNFI,IFI,CFI,RMSEA,MFI,GFI,AGFI,PGFI,AIC,AIC2,CAIC,CAIC2,BIC,BIC2,RMR,SRMR)
return(indices)

}

Ryne's picture
Offline
Joined: 07/31/2009
Update on fit

Update on fit statistics.

First, I've noted your observation that AIC and BIC "seemed OK or off by a factor of 2." For some reason, we were multiplying BIC by 0.5. This will be checked in for the next binary release; we should support AIC and BIC with both the -2*df and the +2*parameters corrections soon, as well as sample size adjusted BIC.

The bigger issue pertains to fit statistics like CFI that require not just a saturated likelihood, but an independence model likelihood and df for all models as well. This will take a few more additions to the summary function and optimization, and may take a little time to make sure we get the specification right.

0avasns's picture
Offline
Joined: 12/09/2010
Thanks a lot for the update.

Thanks a lot for the update. This is beyond my competence, as you have seen, so I'll just be looking forward to future releases for the fit indices. I appreciate your continued effort.

paologhisletta's picture
Offline
Joined: 01/19/2010
again on fit statistics: I

again on fit statistics:

I updated and extended Oavasns original function (I think I found a couple of minor mistakes and hopefully was able to fix them). Any check by the OpenMx team or anybody else would be appreciated. The code can certainly be simplified. For instance, I ignore how to extract a model's estimated covariance matrix (called 'estimate.cov' below), hence got this with RAM algebra. The Confidence Interval of the RMSEA is still missing.

fit.index <- function(indepfit,modelfit) {
indep <- summary(indepfit)
indep.chi <- indep$Chi
indep.df <- indep$degreesOfFreedom
model <- summary(modelfit)
model.chi <- model$Chi
model.dev <- model$SaturatedLikelihood
model.df <- model$degreesOfFreedom
model.ml <- as.numeric(modelfit$objective@result)
N <- model$numObs
N.est.param <- model$estimatedParameters
N.manifest <- length(modelfit@manifestVars)
N.latent <- length(modelfit@latentVars)
observed.cov <- modelfit@data@observed
A <- modelfit@matrices$A@values
S <- modelfit@matrices$S@values
F <- modelfit@matrices$F@values
I <- diag(N.manifest+N.latent)
estimate.cov <- F %*% (qr.solve(I-A)) %*% S %*% (t(qr.solve(I-A))) %*% t(F)
observed.cor <- cov2cor(observed.cov)
estimate.cor <- cov2cor(estimate.cov)
Id.manifest <-diag(N.manifest)
residual.cov <- observed.cov-estimate.cov
residual.cor <- observed.cor-estimate.cor
F0 <- (model.chi-model.df)/(N-1)
if (F0<0) {F0 <- 0}
NFI <- (indep.chi-model.chi)/indep.chi
NNFI.TLI <- (indep.chi-indep.df/model.df*model.chi)/(indep.chi-indep.df)
PNFI <- (model.df/indep.df)*NFI
IFI <- (indep.chi-model.chi)/(indep.chi-model.df)
CFI <- 1.0-(model.chi-model.df)/(indep.chi-indep.df)
if (CFI>1) {CFI=1}
RMSEA<- sqrt(F0/model.df) # need confidence intervals
MFI <- exp(-0.5*(model.chi-model.df)/N)
GH <- N.manifest / (N.manifest+2*((model.chi-model.df)/(N-1)))
GFI <- 1 - (sum(diag(((solve(estimate.cor)%*%observed.cor)-Id.manifest)%*%((solve(estimate.cor)%*%observed.cor)-Id.manifest))) /
(sum(diag((solve(estimate.cor)%*%observed.cor)%*%(solve(estimate.cor)%*%observed.cor)))))
AGFI <- 1.0 - ((N.manifest*(N.manifest+1))/(2*model.df) * (1-GFI))
PGFI <- (2*model.df)/(N.manifest*(N.manifest+1))*GFI
#AIC <- model.chi-2*model.df
AIC <- model.chi+2*N.est.param
#CAIC <- model.chi-(log(N)+1.0)*model.df
CAIC<- model.chi+(log(N)+1.0)*N.est.param
#BIC <- model.chi-log(N)*model.df
BIC <- model.chi+log(N)*N.est.param
RMR <- sqrt((2*sum(residual.cov^2))/(N.manifest*(N.manifest+1)))
SRMR <- sqrt((2*sum(residual.cor^2))/(N.manifest*(N.manifest+1)))
indices <- rbind(N,model.chi,model.df,NFI,NNFI.TLI,PNFI,IFI,CFI,RMSEA,MFI,GH,GFI,AGFI,PGFI,AIC,CAIC,BIC,RMR,SRMR)
return(indices)
}

Ryne's picture
Offline
Joined: 07/31/2009
Thanks, Paolo. These formulae

Thanks, Paolo. These formulae look correct based on my casual inspection, but I'll put look at them more carefully soon. We're working on adding automatic calculation of the independence model, which should make these a lot easier.

deb193's picture
Offline
Joined: 12/01/2010
saturated likelihood

If I am reading Paolo's code correctly, it expects the saturated model likelihood to be in the model being evaluated.

Can I just stick this in there before using his fuction?

0avasns's picture
Offline
Joined: 12/09/2010
So you *are* my god! Thanks

So you *are* my god! Thanks so much! I think I need to upgrade things to get this to work for me, I'll try it out and let you know if there are any problems.

Ryne's picture
Offline
Joined: 07/31/2009
This looks good! We already

This looks good! We already have a few of these in the summary function, but we should add a few of these as well. I'll try to find some time to double-check the equations and define them in terms of likelihoods whenever possible. I'll pose a few question for you and the community:

-Which of these functions is very important to you? Which do you not care to see?
-Are there any additional functions you'd like to see?

If these all get forwarded into summary, I'd propose something like this:
-Information criteria are put in a data frame, making a 4 x 2 (or 2 x 4) table of AIC, BIC, CBIC and sBIC with both the -2*observedStatistics and +2*parameters corrections
-*FI get put in a data frame because there's a lot of them. If there is no saturated model given, an NA or character message is presented instead
-RMSEA (and its CI) remain their own object
-RMR and SRMR are put together, as I understand SRMR to be the standardized version of RMR. (This will have to wait for an explicit output of the estimated and observed covariances; i have no idea what to do with these with ordinal data).

brandmaier's picture
Offline
Joined: 02/04/2010
CIs of RMSEA

See this thread for a proposal of CI for RMSEA: http://openmx.psyc.virginia.edu/thread/1269#comment-4919

cheers,
Andreas

0avasns's picture
Offline
Joined: 12/09/2010
My impression is that the

My impression is that the most important (judging by what seems most reported or recommended in the literature) are CFI, RMSEA (with CI), and SRMR, followed by GFI and NNFI, AIC, and/or BIC. Of these, I think I got CFI and NNFI right, but GFI and SRMR wrong. AIC/BIC seem OK or off by a factor of 2.

By all means it's better to have something that works for the most common cases than wait for something that would be perfectly general for any possible situation (ordinal data or what have you).

Another thing that would be extremely useful is the calculation of standardized coefficients. I have probably misunderstood how that's supposed to be done, because my first attempt is very definitely wrong:

my.std.coef <- function(modelfit) { # accepts result of MxRun

s <- modelfit@matrices$S@values
s[which(modelfit@matrices$A@free)] <- modelfit@matrices$A@values[which(modelfit@matrices$A@free)]
t <- matrix(nrow=nrow(s))
for (i in 1:nrow(s)) {t[i] <- sum(s[i,])}

std <- matrix(nrow=nrow(s),ncol=ncol(s))
rownames(std)=rownames(s)
colnames(std)=colnames(s)
for (i in 1:nrow(s)) {std[i,] <- s[i,]/t[i]}
std <- sqrt(std)

mod <- summary(modelfit)
nparam <- mod$estimatedParameters
stdcoef <- vector(length=nparam)
for (i in 1:nparam) { stdcoef[i] <- std[mod$parameters$row[i], mod$parameters$col[i]] }

z <- mod$parameters$Estimate / mod$parameters$Std.Error
p <- 1-pnorm(z)

augcoef <- cbind(mod$parameters,z,p,stdcoef)
return(augcoef)

}

If you will be my god and fix these so that we can run our usual analyses with OpenMx I'll have one more reason to convince my students to use R for everything!

Ryne's picture
Offline
Joined: 07/31/2009
Thanks for the

Thanks for the information.

I've already written a RAM model standardization function, which can be found here: http://openmx.psyc.virginia.edu/thread/718. It shows up as posted by anonymous for some reason, but let me know if you have any questions.

0avasns's picture
Offline
Joined: 12/09/2010
Works like a charm! Except

Works like a charm! Except for model[[nM]]@values[,] because my model has M=NA, but I got rid of that line and the rest is perfect. Thanks so much!!

0avasns's picture
Offline
Joined: 12/09/2010
I guess that was a bit too

I guess that was a bit too fast... Not everything works like a charm. Specifically, the variance of the endogenous latent is not shown and the loadings of its indicators are incorrect (it seems that the unstandardized loadings are divided by the standard deviation of the latent instead of being multiplied). The rest seem fine in my test case but that's no reassurance because the exogenous latents' variance was fixed to 1.0 for identification. I'll see if I can understand enough of your code to try and address the discrepancies.

Ryne's picture
Offline
Joined: 07/31/2009
That's odd. If you don't

That's odd. If you don't mind, post code to generate your model and I'll take a look.

0avasns's picture
Offline
Joined: 12/09/2010
Thanks a lot for looking into

Thanks a lot for looking into it. Code (stdtest.R) and data (rcdata.txt) attached. Here are the results:

&gt; semTwo$coeff
            Estimate  Std Error   z value     Pr(&gt;|z|)                     
b_wrdflu_F 0.9065400 0.07784689 11.645167 0.000000e+00    WRDFLU3 &lt;--- lv_F
b_pseflu_F 0.8461698 0.07447068 11.362455 0.000000e+00    PSEFLU3 &lt;--- lv_F
eps_wrdflu 0.1781848 0.12684788  1.404713 1.601068e-01 WRDFLU3 &lt;--&gt; WRDFLU3
eps_pseflu 0.2839963 0.11156310  2.545611 1.090866e-02 PSEFLU3 &lt;--&gt; PSEFLU3
b_lc_L     0.5033123 0.06281616  8.012466 1.110223e-15        LC3 &lt;--- lv_L
b_token_L  0.4961814 0.06238059  7.954099 1.776357e-15     TOKEN2 &lt;--- lv_L
eps_lc     0.7466788 0.06687948 11.164543 0.000000e+00         LC3 &lt;--&gt; LC3
eps_token  0.7538076 0.06629075 11.371233 0.000000e+00   TOKEN2 &lt;--&gt; TOKEN2
b_rcabc_R  0.2817331 0.09090740  3.099122 1.940949e-03    zRC3abc &lt;--- lv_R
b_rcdef_R  0.3453483 0.11609160  2.974791 2.931878e-03    zRC3def &lt;--- lv_R
eps_rcabc  0.6383103 0.05663863 11.269875 0.000000e+00 zRC3abc &lt;--&gt; zRC3abc
eps_rcdef  0.4565309 0.06581463  6.936617 4.015899e-12 zRC3def &lt;--&gt; zRC3def
b_L_R      1.7222738 0.71833503  2.397591 1.650327e-02       lv_R &lt;--- lv_L
b_F_R      0.5387619 0.20021772  2.690880 7.126376e-03       lv_R &lt;--- lv_F
&gt; mxTwo$parameters
         name matrix     row     col  Estimate  Std.Error
1  b_wrdflu_F      A WRDFLU3       F 0.9065392 0.07791102
2  b_pseflu_F      A PSEFLU3       F 0.8461700 0.07451795
3       b_F_R      A       R       F 0.5387590 0.19795745
4      b_lc_L      A     LC3       L 0.5033116 0.06266909
5   b_token_L      A  TOKEN2       L 0.4961789 0.06224838
6       b_L_R      A       R       L 1.7222646 0.70604404
7   b_rcabc_R      A zRC3abc       R 0.2817338 0.08940058
8   b_rcdef_R      A zRC3def       R 0.3453491 0.11413211
9  eps_wrdflu      S WRDFLU3 WRDFLU3 0.1781857 0.12691719
10 eps_pseflu      S PSEFLU3 PSEFLU3 0.2839953 0.11162171
11     eps_lc      S     LC3     LC3 0.7466774 0.06672460
12  eps_token      S  TOKEN2  TOKEN2 0.7538065 0.06615317
13  eps_rcabc      S zRC3abc zRC3abc 0.6383107 0.05662865
14  eps_rcdef      S zRC3def zRC3def 0.4565310 0.06571747
&gt; semTwo.std
              Std. Estimate                     
1                 1.0000000       lv_F &lt;--&gt; lv_F
2  b_wrdflu_F     0.9065402    WRDFLU3 &lt;--- lv_F
3  b_pseflu_F     0.8461700    PSEFLU3 &lt;--- lv_F
4  eps_wrdflu     0.1781849 WRDFLU3 &lt;--&gt; WRDFLU3
5  eps_pseflu     0.2839964 PSEFLU3 &lt;--&gt; PSEFLU3
6                 1.0000000       lv_L &lt;--&gt; lv_L
7      b_lc_L     0.5033118        LC3 &lt;--- lv_L
8   b_token_L     0.4961805     TOKEN2 &lt;--- lv_L
9      eps_lc     0.7466772         LC3 &lt;--&gt; LC3
10  eps_token     0.7538049   TOKEN2 &lt;--&gt; TOKEN2
11                0.2349353       lv_R &lt;--&gt; lv_R
12  b_rcabc_R     0.5883051    zRC3abc &lt;--- lv_R
13  b_rcdef_R     0.7256102    zRC3def &lt;--- lv_R
14  eps_rcabc     0.6538971 zRC3abc &lt;--&gt; zRC3abc
15  eps_rcdef     0.4734898 zRC3def &lt;--&gt; zRC3def
16      b_L_R     0.8347882       lv_R &lt;--- lv_L
17      b_F_R     0.2611386       lv_R &lt;--- lv_F
&gt; mxTwo.std
         name matrix     row     col Std. Estimate Std.Std.Error
1  b_wrdflu_F      A WRDFLU3       F     0.9065397    0.07791105
2  b_pseflu_F      A PSEFLU3       F     0.8461704    0.07451799
3       b_F_R      A       R       F     0.2611382    0.09595061
4      b_lc_L      A     LC3       L     0.5033116    0.06266909
5   b_token_L      A  TOKEN2       L     0.4961789    0.06224838
6       b_L_R      A       R       L     0.8347872    0.34222181
7   b_rcabc_R      A zRC3abc       R     0.1382145    0.04385865
8   b_rcdef_R      A zRC3def       R     0.1704727    0.05633838
9  eps_wrdflu      S WRDFLU3 WRDFLU3     0.1781858    0.12691731
10 eps_pseflu      S PSEFLU3 PSEFLU3     0.2839956    0.11162182
11     eps_lc      S     LC3     LC3     0.7466774    0.06672460
12  eps_token      S  TOKEN2  TOKEN2     0.7538065    0.06615317
13  eps_rcabc      S zRC3abc zRC3abc     0.6538981    0.05801151
14  eps_rcdef      S zRC3def zRC3def     0.4734907    0.06815881
&gt; 

(I hope the code shows up correctly with all the angle brackets in there... I don't know how to get the tabs to align in the posting, so I am also attaching the output separately)

The problem is evident in comparing rows 11-13 of semTwo.std to rows 7-8 of mxTwo.std (there's no standardized disturbance of R in mxTwo). The corresponding unstandardized coefficients match fine.

AttachmentSize
stdtest.R 2.27 KB
rcdata.txt 360.81 KB
stdtest_results.txt 3.84 KB
Ryne's picture
Offline
Joined: 07/31/2009
Fixed. I'll update and detail

Fixed. I'll update and detail the standardizeRAM change in the original thread.

I've altered your code to make comparison easier. I used R's merge() function to allow for more direct comparison of the sem and OpenMx output.

I've also changed all of your '->' assignment operators to '<-'. Left assignment is the standard in every programming language I'm familiar with. Mixing left and right assignment in the same script is especially bad, as it becomes very easy to misread assignment operators.

AttachmentSize
stdtest.R 2.52 KB
brauer's picture
Offline
Joined: 01/28/2012
Fit indices - RMSEA confidence interval, p close

Hi,

Thanks to the script below (suggested by Athanassios Protopapas and further developed by Paolo Ghisletta, thank you!!!) I was able to obtain a large number of fit indices, but I still can't get the 90% confidence interval for RMSEA and p close (the test of the null hypothesis that RMSEA (in the population) in less than .05). Can anyone help me?

fit.cov <- function(indepfit,modelfit) {
options(scipen=3)
indep <- summary(indepfit)
model <- summary(modelfit)
deviance <- model$Minus2LogLikelihood
Chi <- model$Chi
indep.chi <- indep$Chi
df <- model$degreesOfFreedom
p.Chi <- 1 - pchisq(Chi, df)
Chi.df <- Chi/df
indep.df <- indep$degreesOfFreedom
N <- model$numObs
N.parms <- model$estimatedParameters
N.manifest <- length(modelfit@manifestVars)
q <- (N.manifest*(N.manifest+1))/2
N.latent <- length(modelfit@latentVars)
observed.cov <- modelfit@data@observed
observed.cor <- cov2cor(observed.cov)
A <- modelfit@matrices$A@values
S <- modelfit@matrices$S@values
F <- modelfit@matrices$F@values
I <- diag(N.manifest+N.latent)
estimate.cov <- F %*% (qr.solve(I-A)) %*% S %*% (t(qr.solve(I-A))) %*% t(F)
estimate.cor <- cov2cor(estimate.cov)
Id.manifest <-diag(N.manifest)
residual.cov <- observed.cov-estimate.cov
residual.cor <- observed.cor-estimate.cor
F0 <- (Chi-df)/(N-1)
if (F0<0) {F0 <- 0}
NFI <- (indep.chi-Chi)/indep.chi
NNFI.TLI <- (indep.chi-indep.df/df*Chi)/(indep.chi-indep.df)
PNFI <- (df/indep.df)*NFI
RFI <- 1 - (Chi/df) / (indep.chi/indep.df)
IFI <- (indep.chi-Chi)/(indep.chi-df)
CFI <- 1.0-(Chi-df)/(indep.chi-indep.df)
if (CFI>1) {CFI=1}
PRATIO <- df/indep.df
PCFI <- PRATIO*CFI
NCP <- max((Chi-df),0)
RMSEA<- sqrt(F0/df) # need confidence intervals
MFI <- exp(-0.5*(Chi-df)/N)
GH <- N.manifest / (N.manifest+2*((Chi-df)/(N-1)))
GFI <- 1-(sum(diag(((solve(estimate.cor)%*%observed.cor)-Id.manifest)%*%((solve(estimate.cor)%*%observed.cor)-Id.manifest))) / sum(diag((solve(estimate.cor)%*%observed.cor)%*%(solve(estimate.cor)%*%observed.cor))))
AGFI <- 1 - (q/df)*(1-GFI)
PGFI <- GFI * df/q
AICchi <- Chi+2*N.parms
AICdev <- deviance+2*N.parms
BCCchi <- Chi + 2*N.parms/(N-N.manifest-2)
BCCdev <- deviance + 2*N.parms/(N-N.manifest-2)
BICchi <- Chi+N.parms*log(N)
BICdev <- deviance+N.parms*log(N)
CAICchi <- Chi+N.parms*(log(N)+1)
CAICdev <- deviance+N.parms*(log(N)+1)
ECVIchi <- 1/N*AICchi
ECVIdev <- 1/N*AICdev
MECVIchi <- 1/BCCchi
MECVIdev <- 1/BCCdev
RMR <- sqrt((2*sum(residual.cov^2))/(2*q))
SRMR <- sqrt((2*sum(residual.cor^2))/(2*q))
indices <-
rbind(N,deviance,N.parms,Chi,df,p.Chi,Chi.df,AICchi,AICdev,BCCchi,BCCdev,BICchi,BICdev,CAICchi,CAICdev,RMSEA,SRMR,RMR,GFI,AGFI,PGFI,NFI,RFI,IFI,NNFI.TLI,CFI,PRATIO,PNFI,PCFI,NCP,ECVIchi,ECVIdev,MECVIchi,MECVIdev,MFI,GH)
return(indices)
}

fit.cov(indepfit,modelfit)

0avasns's picture
Offline
Joined: 12/09/2010
Thank you very much for

Thank you very much for fixing the standardization code, I greatly appreciate it!
(as for the left/right assignment, it is one of the things I really like about R :-) )