Computing and Reporting fit statistics

17 replies [Last post]
tbates's picture
Offline
Joined: 07/31/2009

A place to discuss models current and desired og fit, df, AIC, etc and how to make basic functionality happen

tbates's picture
Offline
Joined: 07/31/2009
Front page factor model in

Front page factor model in pathicCov, matricCov, and matricRaw forms as an example of differences in fit (models attached).

One-factor pathic model of covariance data

	# Observed statistics:  16 
	# Estimated parameters:  10 
	# Degrees of freedom:  6 
	# -2 log likelihood:  -3648.281 
	# Saturated -2 log likelihood:  -3655.665 
	# Chi-Square:  7.384002 
	# p:  0.2867893 
	# AIC (Mx):  -4.615998 
	# BIC (Mx):  -14.95182 
	# adjusted BIC: 
	# RMSEA:  0.02147869

Now a Matrix model of the cov data

	# Observed statistics:  16 
	# Estimated parameters:  0 
	# Degrees of freedom:  16 
	# -2 log likelihood:  972.155 
	# Saturated -2 log likelihood:  -3655.665 
	# Chi-Square:  4627.819 
	# p:  0 
	# AIC (Mx):  4595.819 
	# BIC (Mx):  2264.193 
	# adjusted BIC: 
	# RMSEA:  0.7592611

NB: same model of the same data as the pathic, but RMSEA now suggests terrible fit, p differs etc. etc.

Next try a matrix model with full information

	# Observed statistics:  2500 
	# Estimated parameters:  0 
	# Degrees of freedom:  2500 
	# -2 log likelihood:  216183.3 
	# Saturated -2 log likelihood:  
	# Chi-Square:   
	# p:   
	# AIC (Mx):  211183.3 
	# BIC (Mx):  100323.4 
	# adjusted BIC: 
	# RMSEA:  0

Same model, but raw data. RMSEA now improbably = 0, most statistics not computed

AttachmentSize
ML_vs_FIML.R 3.03 KB
Steve's picture
Offline
Joined: 07/30/2009
I tried running the code in

I tried running the code in ML_vs_FIML.R and the ML matrix example came up with the same results as the pathic example. The FIML example returned an error because "names" was not defined.

Once I set

names <- manifests

your code ran and came up with the same parameter estimates and reasonable fit statistics.

Observed statistics:  2500 
Estimated parameters:  15 
Degrees of freedom:  2485 
-2 log likelihood:  934.096 
Saturated -2 log likelihood:  
Chi-Square:   
p:   
AIC (Mx):  -4035.904 
BIC (Mx):  -7254.603 
adjusted BIC: 
RMSEA:  0 

tbates's picture
Offline
Joined: 07/31/2009
Hi Steve, Thanks for running

Hi Steve,
Thanks for running the script. I did it again from a clean boot at home and all is well. Can't see how the cov matrix version ever got 0 estimated but as you say... it runs fine.

For the FIML, I must have had the "names" variable set from some other model... A lesson in ensuring all used variables are set in the job, not just hanging around. oops.

For the missing Saturated -2 log likelihood in the FIML model, could mxRun() get this by estimating a saturated model (for means, variances, and covariances of the data)? Then we could have Chi-Square in the FIML model?

tbates's picture
Offline
Joined: 07/31/2009
hmm. In at work again and

hmm. In at work again and running the covariance model. Building today's R, shutting down and reloading R.app (today's build)... I get a model with 0 df.

require(OpenMx);
data(demoOneFactor)
factorModel <- mxModel("One Factor",
      mxMatrix("Full", 5, 1, values=0.2, free=T, name="A"),
      mxMatrix("Symm", 1, 1, values=1,   free=F, name="L"),
      mxMatrix("Diag", 5, 5, values=1,   free=T, name="U"),
      mxAlgebra(A %*% L %*% t(A) + U, name="R", dimnames = list(names(demoOneFactor), names(demoOneFactor)) ),
      mxMLObjective("R"),
      mxData(cov(demoOneFactor), type="cov", numObs=500)
)
summary(mxRun(factorModel))
 
Running One Factor 
       x1               x2               x3               x4               x5        
 Min.   :0.1985   Min.   :0.2000   Min.   :0.2312   Min.   :0.2784   Min.   :0.3156  
 1st Qu.:0.2000   1st Qu.:0.2917   1st Qu.:0.2925   1st Qu.:0.3515   1st Qu.:0.4019  
 Median :0.2312   Median :0.2925   Median :0.3740   Median :0.4061   Median :0.4574  
 Mean   :0.2447   Mean   :0.3075   Mean   :0.3522   Mean   :0.4261   Mean   :0.4813  
 3rd Qu.:0.2784   3rd Qu.:0.3515   3rd Qu.:0.4061   3rd Qu.:0.5333   3rd Qu.:0.5611  
 Max.   :0.3156   Max.   :0.4019   Max.   :0.4574   Max.   :0.5611   Max.   :0.6703  
 
Observed statistics:  16 
Estimated parameters:  0 
Degrees of freedom:  16 
-2 log likelihood:  972.155 
Saturated -2 log likelihood:  -3655.665 
Chi-Square:  4627.819 
p:  0 
AIC (Mx):  4595.819 
BIC (Mx):  2264.193 
adjusted BIC: 
RMSEA:  0.7592611 

get the same result just copying the front page ALA'+U model into R and pressing return... Baffled, as it worked at home (intel laptop) last night.
Can some other people try this please?

Steve's picture
Offline
Joined: 07/30/2009
Just did an svn up cd

Just did an

svn up
cd trunk
make clean install
R

That installed rev 803.

Copied the model from your post.

I got exactly what I'd expect:

[6]sboker@localhost:svn/OpenMx/trunk % R
 
R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
 
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
 
  Natural language support but running in an English locale
 
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
 
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
 
> require(OpenMx);
Loading required package: OpenMx
Loading required package: snowfall
Loading required package: snow
snowfall 1.70 initialized: sequential execution, one CPU.
 
> data(demoOneFactor)
> factorModel <- mxModel("One Factor",
+       mxMatrix("Full", 5, 1, values=0.2, free=T, name="A"),
+       mxMatrix("Symm", 1, 1, values=1,   free=F, name="L"),
+       mxMatrix("Diag", 5, 5, values=1,   free=T, name="U"),
+       mxAlgebra(A %*% L %*% t(A) + U, name="R", dimnames = list(names(demoOneFactor), names(demoOneFactor)) ),
+       mxMLObjective("R"),
+       mxData(cov(demoOneFactor), type="cov", numObs=500)
+ )
> summary(mxRun(factorModel))
Running One Factor 
       x1               x2               x3               x4        
 Min.   :0.1985   Min.   :0.2000   Min.   :0.2312   Min.   :0.2784  
 1st Qu.:0.2000   1st Qu.:0.2917   1st Qu.:0.2925   1st Qu.:0.3515  
 Median :0.2312   Median :0.2925   Median :0.3740   Median :0.4061  
 Mean   :0.2447   Mean   :0.3075   Mean   :0.3522   Mean   :0.4261  
 3rd Qu.:0.2784   3rd Qu.:0.3515   3rd Qu.:0.4061   3rd Qu.:0.5333  
 Max.   :0.3156   Max.   :0.4019   Max.   :0.4574   Max.   :0.5611  
       x5        
 Min.   :0.3156  
 1st Qu.:0.4019  
 Median :0.4574  
 Mean   :0.4813  
 3rd Qu.:0.5611  
 Max.   :0.6703  
 
   name matrix row col parameter estimate error estimate
1  <NA>      A   1   1         0.39715251    0.010964307
2  <NA>      A   2   1         0.50366157    0.012783503
3  <NA>      A   3   1         0.57724193    0.014400946
4  <NA>      A   4   1         0.70277434    0.016884658
5  <NA>      A   5   1         0.79625074    0.018766326
6  <NA>      U   1   1         0.04081420    0.001953551
7  <NA>      U   2   2         0.03801999    0.001723759
8  <NA>      U   3   3         0.04082719    0.002185719
9  <NA>      U   4   4         0.03938707    0.002396026
10 <NA>      U   5   5         0.03628712    0.002584026
 
Observed statistics:  16 
Estimated parameters:  10 
Degrees of freedom:  6 
-2 log likelihood:  -3648.281 
Saturated -2 log likelihood:  -3655.665 
Chi-Square:  7.384002 
p:  0.2867893 
AIC (Mx):  -4.615998 
BIC (Mx):  -14.95182 
adjusted BIC: 
RMSEA:  0.02147869 

tbates's picture
Offline
Joined: 07/31/2009
yes, and works fine or me at

yes, and works fine or me at home (intel macbook).

I wonder if it is something about the PowerPC (that's my work machine) NPSOL?

mspiegel's picture
Offline
Joined: 07/31/2009
No it's not a PPC issue.

No it's not a PPC issue. Your model has been specified with no free parameters. I took the original model and replaced all the free parameters with fixed parameters and I got the same summary statistics as you reported when running the model. Try executing rm(list=ls()) in your workspace before running the model (I'm assuming you don't store critical data or code in your workspace). Hmm, I believe you had another forum post where you assigned T <- FALSE as an example of R weirdness. If that assignment was done in your workspace, that would explain the behavior you are seeing.

tbates's picture
Offline
Joined: 07/31/2009
hi mike, just came here to

hi mike, just came here to report that exact solution... beat me too it.

This has to be a _great_ reason for stopping using variables as native booleans in example code...

Steve's picture
Offline
Joined: 07/30/2009
The reason I did it on the

The reason I did it on the front page code is... well... it's the front page and I was trying to fit a whole factor model into one column.

I think we should have a short check script for R newbies (and wicked beta testers who do bad things on purpose).

Something like

T==TRUE
F==FALSE
c(1,2,3)==1:3
t(matrix(c(1,2,3,4), 2, 2))==matrix(c(1,3,2,4),2,2)

and so forth so that if you reassign 'T' or 't' or 'F' or 'c', etc. we can catch it.

I mean, we can be verbose and protect people against 'T', but we can't protect people against 't'. So, I'm not sure I even buy the logic of never ever using 'T'.

Instead, let's put together an omnibus script that we recommend bug reporters run to verify that they have a working instance of .RData.

Steve's picture
Offline
Joined: 07/30/2009
Interesting proposal. Does

Interesting proposal. Does anyone have objections?

neale's picture
Offline
Joined: 07/31/2009
There are reasons that Mx1

There are reasons that Mx1 does not fit the saturated model for FIML. One is because fitting the saturated model (a free parameter for every mean, variance and covariance) is computationally intensive. Ideally it should be done only once. Mx1 has two ways of using the results of such a model: Option issat, which assumes that later in the same script there is a model of interest for which chi-sq fit will be computed; and ii) option saturated=-2lnL,df which allows the user to supply fit statistics which have been computed in a separate job.

Two, when there are definition variables, what the saturated model actually is becomes obscure, since any parameter of the model might be any polynomial function of any or all of the definition variables.

Note that there isn't a shortcut to computing the likelihood when there are missing data, although it is possible to supply decent starting values as long as the data are MCAR; when data are MAR the observed means, variances and covariances are not the same as the MLE's.

I am not sure whether we should just have an mxCompare(model1,model2,...) function to compare models, or if an additional argument to mxModel(compare=list) might allow for comparative statistics to be computed on the fly. I lean towards mxCompare, albeit at a cost of additional commands.

irebollo's picture
Offline
Joined: 09/24/2009
Hi Mike, I'm finally starting

Hi Mike,
I'm finally starting with OpenMx, and I must say that I love it!
Have you given any thoughts to the mxCompare function?
In the meantime, is there a way to get the degrees of freedom using mxEval, in the same way that we get the LL. If that's possible then one can simply compute the LRT and the difference in degrees of freedom, with the corresponding p value using R commands.

Thanks!

tbates's picture
Offline
Joined: 07/31/2009
Dear Irene, You can get the

Dear Irene,
You can get the degrees of freedom from summary().
assuming you have a fitted model in "fit"
> a = summary(fit)
> names(a)
[1] "parameters" "Minus2LogLikelihood" "SaturatedLikelihood" "numObs"
[5] "estimatedParameters" "observedStatistics" "degreesOfFreedom" "Chi"
[9] "p" "AIC.Mx" "BIC.Mx" "RMSEA"
[13] "dataSummary" "frontendTime" "backendTime" "mxVersion"

> a$Minus2LogLikelihood
1066

Steve's picture
Offline
Joined: 07/30/2009
I would like to see an

I would like to see an mxCompare function. As you say, one of the things it could do is compare to a saturated model.

Tim Brick and I had a conversation yesterday about the saturated model problem. I continue to think that the obscure cases are only maybe 5% of actual usage in SEM. So, we should expect to need to override something that can be automatically computed, and we need to throw an error when a person asks for automatic computation and there are potential pitfalls such as definition variables in the model. For the vast majority of SEMers, an auto saturated model computation (expensive though it might be) would be useful. Whether it should be on by default is another question about which I remain unsure.

tbates's picture
Offline
Joined: 07/31/2009
Especially as the saturated

Especially as the saturated model can be computed in parallel, so on machines where the user has a few spare cores, will add no time.

Maybe something like

saturated=(T|F) as default, and if the user provides a numeric value, that is used as the saturated value of the objective?

Steve's picture
Offline
Joined: 07/30/2009
Are you sure these are

Are you sure these are actually the same models? The non-pathic models are reporting 0 estimated parameters. That's just a count of the number of elements in the parameter vector passed to the estimator. Zero free parameters means that the estimator just calculated the -2LL of the starting values and returned. Fit stats will be horrible for such a model (unless you are very lucky with your starting values!)

tbates's picture
Offline
Joined: 07/31/2009
Several models do not report

Several models do not report fit statistics. What should be happening here? Do we need to attach them here the summary function can be improved, or should summary report the problems it found that stopped it computing the fit (couldn't be sure of df etc.) and then users can manually set these parameters in the model (df=xxx, ) to allow the existing summary to work?