Why MLEs in a CFA model are different between analyses of raw data and covariance matrix data?

4 replies [Last post]
Ami's picture
Ami
Offline
Joined: 02/22/2012

I have conducted a large scale numerical simulation study to see small
sample properties of SEM and then have the following question:
I would appreciate it if you could teach me why MLEs in a CFA model are
different between analyses of raw data and covariance matrix data in
OpenMx package of R. Are the optimization methods employed different?
The maximum difference I encountered is 0.084267.

When I compare the results, I adjusted their scales, i.e., multiplied by
sqrt(N/N-1) for factor loadings and by N/N-1 for error variances.
The adjustment can approach the two estimates to each other closely.

The code and sample covariance matrix are in the attachment.
I would be glad, if someone helped me.

AttachmentSize
source.R2.23 KB
sample_covariance_matrix.txt4.43 KB
mhunter's picture
Offline
Joined: 07/31/2009
It seems that you found an

It seems that you found an interesting problem and gone so far as to solve it. Congratulations! Here is some information about why this problem exists and why the solution you found works.

ML is not exactly equal to FIML. The divergence generally only appears in small samples because there is a switch in the ML equation where it should depend on N, but to make calculations easier they use N-1.

If you derive ML estimation of SEMs from the multivariate normal probability density function, you get

-2LL = N * (ln(det(C)) + Trace(P*C^(-1)))

where -2LL is the minus two log likelihood, N is the sample size, ln() is the natural logarithm, det() is the determinant, C is the model implied covariance matrix, Trace() is the sum of the diagonal elements of a matrix, P is the observed covariance matrix calculated using N instead of N-1 in the denominator, and ^(-1) is matrix inversion.

The above equation depends on N because the likelihood is the probability of row 1 given the model multiplied by the probability of row 2 given the model and so on up to row N. Hence it is the product of N terms: r1*r2*...*rN. Note: it is the product of the probabilities of the rows because the rows are assumed independent. Taking the log of this product turns it into the sum of N terms: log(r1) + log(r2) + ... + log(rN). Through some linear algebra, we are able to turn this into the sum of one term N times: N*(log(R)). And this is the form that the above equation takes.

Now, because no one uses N when calculating their covariance matrices and because SEM is traditionally a large sample technique where N/(N-1) is close enough to 1, everyone who writes SEM programs instead uses the standard ML estimation

-2LL = (N-1)*(ln(det(C)) + Trace(S*C^(-1)))

where the term has been multiplied by N-1 instead of N, and observed covariance matrix S has been calculated with N-1. This switch from N to N-1 is long-standing. I first found it clearly documented in Bollen's (1989) SEM book, but I'm sure it goes back to Joreskog and others circa 1970.

FIML does not use the approximation that ML does because FIML does not calculate an observed covariance matrix. This explains why FIML is slightly different from ML even when there is no missing data and why multiplying by N/(N-1) solves the issue.

Hope this helps!

Mike Hunter

Ryne's picture
Offline
Joined: 07/31/2009
This is an important issue,

This is an important issue, and I don't know the answer. I've/we've got some reading and some simulations to do to figure this out, but I wanted to let you know that we take this seriously and will give you a response when we can.

I have a few ideas that I'll need to pursue as I deal with the raw vs. cov issue:
-First, I'm the one that wrote the test scripts that compared these two data formats. I'll have to double-check them all, but we generally used larger sample sizes (>100) that meant that whatever differences might exist were well below the epsilons used in our "checkEqual" functions.
-I'm unaware of comparisons of covariance and FIML versions of ML in very small sample sizes. If this is a known SEM issue rather than an OpenMx issue, I'll let you know.
-One other avenue I'll explore is convergence criterion. The covariance data version of the code you gave converged in many fewer iterations (33 vs 44) and a similar decrease in total function evaluations (1539 vs 2091), so the actual path of the optimizer is clearly different. When there's very little data, it's possible that the criterion for convergence become very "wide". That is, if we have a fixed criteria for convergence/optimality along the lines of "norm of the gradients less than x", and the small amount of data affects the units of the likelihood function, then we could have a wide absolute range of parameter values that all are within "x" of the solution. This is testable via simulation, and I'll see if I can figure out how reasonable this answer is soon.

Thank you so much for pointing this out. You're helping make OpenMx and advanced SEM better!

Ami's picture
Ami
Offline
Joined: 02/22/2012
I sincerely apologize for the delay of my reply.

Thank you very much for taking your time for a quick response and I sincerely apologize for the delay of my reply.

>> If this is a known SEM issue rather than an OpenMx issue, I'll let you know.

This is not a SEM issue. Well, as you know, there are not so many SEM programs which can estimate without computing a sample covariance matrix.
The two estimates should be the same if ignoring the difference made by the coefficient N or (N-1).

>> The covariance data version of the code you gave converged in many fewer iterations (33 vs 44) and a similar decrease in total function evaluations (1539 vs 2091), so the actual path of the optimizer is clearly different.

Your result and mine seem different from each other. Could you tell me the version of R and OpenMx you are using? I want to confirm whether my result corresponds with yours.

I also doubt the convergence criterion. In fact, I tried to change optimality tolerance with the following codes, and then I received the error message below.
---------------------------------------------------------------------------------
> factorModel1 <- mxOption(factorModel1,key="Optimality tolerance",value=1e-20)
Error in optionsNames[[match]] : attempt to select more than one element
---------------------------------------------------------------------------------

Therefore, I checked "getOption". Then

---------------------------------------------------------------------------------
> getOption('mxOptions')
$Nolist
[1] ""


$`Optimality tolerance`
[1] "6.3e-12"

$`Infinite bound size`
[1] "1.0e+15"


$`Optimality tolerance`
[1] "6.30957344480192e-12"

$`Major iterations`
[1] "1000"


$`RAM Max Depth`
[1] NA
---------------------------------------------------------------------------------

I found twice `Optimality tolerance`.
Then, I tried to "fix" as follows.

---------------------------------------------------------------------------------
> fix(mxOption)
>function (model, key, value, reset = FALSE)
{
if (length(model) == 0 && is.null(model)) {
return(processDefaultOptionList(key, value, reset))
}
if (length(model) > 1 || !is(model, "MxModel")) {
stop("argument 'model' must be an MxModel object")
}
if (length(reset) != 1 || !is.logical(reset)) {
stop("argument 'reset' must be TRUE or FALSE")
}
if (reset) {
model@options <- list()
return(model)
}
if (length(key) != 1 || !is.character(key)) {
stop("argument 'key' must be a character string")
}
if (length(value) > 1) {
stop("argument 'value' must be either NULL or of length 1")
}
optionsNames <- names(getOption("mxOptions"))
match <- grep(paste("^", key, "$", sep = ""), optionsNames,
ignore.case = TRUE)
if (length(match) == 0) {
stop(paste("argument 'key' has a value", omxQuotes(key),
"that cannot be found in", "getOption('mxOptions')"))
}
# if (!identical(optionsNames[[match]], key)) {
# stop(paste("argument 'key' has a value", omxQuotes(key),
# "but the option is named", omxQuotes(optionsNames[[match]]),
# ": please correct", "the capitalization and re-run mxOption()."))
# }
model@options[[key]] <- value
return(model)
}
---------------------------------------------------------------------------------

Because of this, I succeeded "mxOption(factorModel1,key="Optimality tolerance",value=1e-20)" as follows.

---------------------------------------------------------------------------------
> factorModel1 <- mxOption(factorModel1,key="Optimality tolerance",value=1e-20)
> factorModel1
MxModel 'Three Factor'
type : default
@matrices : 'expMean', 'A', 'L', and 'U'

@options : 'Major iterations' = '5000', 'Optimality tolerance' = '1e-20'
@output : FALSE
>
---------------------------------------------------------------------------------
However, I did not think "Optimality tolerance" was changed because the results of simulation did not change at all.
I suspect this change was not reflected to "Optimality tolerance" which are really used.

Is this a bug? Could you please tell me what happens if you know?

Ami

rabil's picture
Offline
Joined: 01/14/2010
Raw vs Cov

I didn't realize this until now that I was getting different ML estimates depending on whether I used raw or cov in mxData. I always check the fit of the model (typically calibration models) and have not noticed any obvious problems when I used FIML for datasets with missing data. Most datasets I work with have missing values so I tend to use FIML. I am currently using FIML (raw) for a small dataset fitting a simple common factor CFA with just one factor. I scaled the data to means of 0 and sds of 1 (and fixed the variance of the latent factor to one) and noticed that the estimated factor loadings squared plus the manifest variable variances were not adding to 1. When I switched to using cov (there are no missing data), then these sums equal 1. I thought I read that these sums should equal 1 unless there were certain constraints on the estimates. The FIML estimates are similar to the estimates when using cov, but different enough so that the sums were equalling 0.9 instead of 1.

Last week I did a two-factor CFA with a few missing values (266 observations) using FIML and the factor loadings squared + variances equaled 1.00 or so.

I just checked, and using the N/(N-1) factor fixes the problem with the sums not equaling one.