Wed, 04/13/2011 - 11:37

Hi, I've been playing around a bit with SEM models in OpenMx, and I'm having some trouble deriving the 'saturated -2 log likelihood' output provided by the summary screen.

Shouldn't the saturated model's value always be zero, except for some machine error? Instead it tends to be fairly close to that of a model with 1 degree of freedom.

I might be misunderstanding something, but for the saturated model the implied covariance matrix equals the sample covariance matrix, which should yield a discrepancy function of zero.

So, in slightly more explicit terms: is this output actually the -2 * log(likelihood ratio) for H_0: Sigma equals the saturated model, where Sigma is the covariance matrix of a multivariate normal distribution, or is it something else?

Thanks,

Jonathan

Your conceptual understanding is sound, as there are ways to describe a likelihood function based on the simple discrepancy between observed and expected. However, more general solutions use the trace of the product of the observed covariance and the inverse of the expected covariance, and also include some 'likelihood of the data' aspect as well. The -2 log likelihood for an ML model without means is:

(n - 1) * (log(det(expected)) + tr(observed %*% solve(expected)))

Where 'expected' is the expected covariance matrix and 'observed' is the observed covariance matrix. If you want means, they go in at the end and are multiplied by n rather than n - 1. I used the 'OneFactorMatrixDemo' demo as an example, which has 5 variables, 500 rows, and a saturated -2 log likelihood of -3655.665. As the observed and expected are equal (within machine error) for the saturated model, the trace term tr(observed %*% solve(expected)) should equal the number of variables, which is 5 in this case. I have the log of the determinant of the expected covariance at -12.32598, yielding an expected saturated -2 log likelihood of:

(500-1) * (-12.32598 + 5) = 499 * -7.32598 = -3655.664

which differs only by the limited precision of the determinant when you do this by hand. As always, someone is welcome to check me, as my old notes on this look to be incorrect, and I built this out of reading Tim's C code for mxMLObjective.
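The formula above can be sketched as a small R function; 'minus2LL' is a hypothetical helper name, not anything from OpenMx, and the saturated check below uses a toy covariance matrix rather than the demo data.

```r
# A minimal sketch of the -2 log likelihood described above
# (covariances only, no means). 'observed' and 'expected' are
# covariance matrices; n is the number of data rows.
minus2LL <- function(observed, expected, n) {
  (n - 1) * (log(det(expected)) + sum(diag(observed %*% solve(expected))))
}

# Saturated case: observed == expected, so the trace term equals
# the number of variables and only log(det(expected)) varies.
S <- matrix(c(1.0, 0.5,
              0.5, 1.0), nrow = 2)   # toy 2-variable covariance matrix
minus2LL(S, S, n = 100)              # 99 * (log(det(S)) + 2)
```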

Thanks,

Following the 'OneFactorMatrixDemo':

> output_factorModel <- mxRun(factorModel)
> x <- output_factorModel$A@values[1:5, 6]
> y <- output_factorModel$S@values[1:5, 1:5]
>
> Sigma_0 <- x %*% t(x) + y
> Sigma <- cov(demoOneFactor)
>
> L0 <- 499 * (log(det(Sigma)) + sum(diag(Sigma %*% solve(Sigma))))
> L1 <- 499 * (log(det(Sigma_0)) + sum(diag(Sigma %*% solve(Sigma_0))))
> L0
[1] -3655.665
> L1
[1] -3648.281

This is proportional to the log likelihood of a multivariate normal, which makes great sense. I was a bit confused, since the book I was reading and the documentation for PROC CALIS freely use 'likelihood' for the statistic from an LRT, and I incorrectly assumed this carried over to other software packages.

Jonathan

Glad I could help.

Some programs report the difference in -2 log likelihoods between the fitted and saturated models as either the model -2 log likelihood or the chi-square. It's also important to remember that likelihoods can't be zero, or else any likelihood ratio test involving them would be undefined: the statistic is formally defined as -2 * log(model1likelihood / model2likelihood), and dividing by zero is frowned upon in some circles.
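To make that concrete, here is a small R sketch of the LRT as a difference of -2 log likelihoods, using the saturated and fitted values computed earlier in this thread. The degrees of freedom are my own assumption (15 unique covariance elements minus 10 free parameters for a one-factor model without means), not something reported above.

```r
# Likelihood ratio test as a difference of -2 log likelihoods,
# using the saturated and fitted values from this thread.
saturated_m2ll <- -3655.665   # saturated model
fitted_m2ll    <- -3648.281   # one-factor model

chisq <- fitted_m2ll - saturated_m2ll   # difference of -2 log likelihoods
df    <- 15 - 10   # assumed: 15 unique covariance elements, 10 free parameters
pval  <- pchisq(chisq, df, lower.tail = FALSE)
chisq              # about 7.384
```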