Cfa with more than two factors

5 replies [Last post]
jb's picture
jb
Offline
Joined: 07/27/2010

Hi all,
First, thanks to do this job for R, it's really needed!
I am trying to do a CFA with 6 factors (psychological self report questionnaire with 6 scales with 14 items each).
I tried to adapt the two factors models. Seeing this discussion, I changed
the starting values and got no warning message. I have a few issues though:
1. For the latent covariance and variances of the 2 factors you give the
example of values=c(1, .5,.5, 1) But how can I organize it with 6 factors? Why do we need
to specify starting values, in sem package none is needed. Is it a specificity of the program ?
2.I fitted the same model with the sem package. I have virtually the same RMSEA
(with 0.001 difference) but the estimates are very different. In the SEM package
I got estimates between latent variables (factor 1 with 2 for instance) quite close
to the actual correlation table between the 6 scales (sum of the 14 items of each scale) with + or - 0.1. With OpenMx it was very far from that and
did not seem the same at all. Is it an error in my model, or are they really different and,
if so, what does it mean?
3. Did not succeed to fit the model with raw data, not finished in 15 hours. Is it normal ?

Some things that would be very useful in psychology and psychiatry, not only for
CFA but for every SEM:
1. More fit indices (at least RMSEA, CFI and SRMR. They are in the package sem
so may be easier to implement ? We are asked for them in reviews.
2. Some easier functions for the labels. When they are many it's difficult to name them all.
Maybe for the labels of Means for instance, a simple function which would had a "M" at the beginning
of the name of the latent variable.
2. Binary and Count variable. They are very important in psychiatry and psycho. And will be necessary
to compete with other programs. For instance, I can't use OpenMx now cause I need counts. And maybe more
examples using these kind of observed variables as outcomes.

Many thanks!!!

Here is the code for the model.
DFan is the data.frame with all the 84 items.

library(OpenMx)
##preparation
manifests <- names(DFan)
latents = c("PL","SE","CA","FE","AN","SA")
freemeans=c(rep(TRUE,84),rep(FALSE,6))
meansvalues=mean(DFan)

ANPSModel<-mxModel("ANcfa",
type="RAM",
mxData(
observed=cov(DFan),
type="cov",
numObs=830,
means=meansvalues),
manifestVars=manifests,
latentVars=latents,
# residual variances
mxPath(
from=manifests,
arrows=2,
free=TRUE,
values=0.8,
),
# latent variances and covariance
mxPath(
from=latents,
arrows=2,
all=TRUE,
free=TRUE,
values=0.8
),
mxPath(
from="PLAY",
to=c("an05", "an13", "an21", "an29", "an37",
"an45bis", "an53bis", "an61bis", "an69bis",
"an77bis","an85bis","an93bis","an101bis","an109bis"),
arrows=1,
free=c(FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE),
values=c(1,rep(0.2,13)),
),

mxPath(
from="SEEK",
to=c("an01bis","an09bis","an17bis","an25bis","an33bis",
"an41bis","an49bis","an57bis","an65bis",
"an73bis","an81bis","an89bis","an97bis","an105bis"),
arrows=1,
free=c(FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE),
values=c(1,rep(0.2,13)),
),

mxPath(
from="CARE",
to=c("an03bis","an11bis","an19bis","an27bis","an35bis",
"an43bis","an51bis","an59bis","an67bis",
"an75bis","an83bis","an91bis","an99bis","an107bis"),
arrows=1,
free=c(FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE),
values=c(1,rep(0.2,13)),
),

mxPath(
from="FEAR",
to=c("an02bis","an10bis","an18bis","an26bis","an34bis",
"an42bis","an50bis","an58bis","an66bis",
"an74bis","an82bis","an90bis","an98bis","an106bis"),
arrows=1,
free=c(FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE),
values=c(1,rep(0.2,13)),
),

mxPath(
from="ANGER",
to=c("an04bis","an12bis","an20bis","an28bis","an36bis",
"an44bis","an52bis","an60bis","an68bis",
"an76bis","an84bis","an92bis","an100bis","an108bis"),
arrows=1,
free=c(FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE),
values=c(1,rep(0.2,13)),
),

mxPath(
from="SAD",
to=c("an06bis","an14bis","an22bis","an30bis","an38bis",
"an46bis","an54bis","an62bis","an70bis",
"an78bis","an86bis","an94bis","an102bis","an110bis"),
arrows=1,
free=c(FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE),
values=c(1,rep(0.2,13)),
),

#means
mxPath(
from="one",
to=c(manifests,latents),
arrows=1,
free=freemeans,
values=c(rep(1,84),rep(0,6)),
)
)

Steve's picture
Offline
Joined: 07/30/2009
Hi, JB. I have interspersed

Hi, JB. I have interspersed answers between your questions.

1. For the latent covariance and variances of the 2 factors you give the example of values=c(1, .5,.5, 1) But how can I organize it with 6 factors? Why do we need to specify starting values, in sem package none is needed. Is it a specificity of the program ?

The OpenMx philosopyhy is WYSIWID, that is to say, "what you say is what it does". With that in mind, we don't choose starting values for you. sem() chooses starting values for you. OpenMx requires more of you as a user, but it also gives you more control when the models become complicated or computationally difficult to estimate.

2.I fitted the same model with the sem package. I have virtually the same RMSEA
(with 0.001 difference) but the estimates are very different. In the SEM package
I got estimates between latent variables (factor 1 with 2 for instance) quite close
to the actual correlation table between the 6 scales (sum of the 14 items of each scale) with + or - 0.1. With OpenMx it was very far from that and
did not seem the same at all. Is it an error in my model, or are they really different and,
if so, what does it mean?

There must be a difference between the way you specified the model in OpenMx and in sem. The two programs should obtain the same answers when they can both be used. (some models that OpenMx can run cannot be run with sem)

3. Did not succeed to fit the model with raw data, not finished in 15 hours. Is it normal ?

How many rows in your dataset? The raw data method in the 0.4.1 binary is still fairly inefficient. The version that is currently in testing is showing speedups of up to 20 times faster.

Note that most software that does FIML actually does not calculate the likelihood of each row. The current version of OpenMx does. Thus, OpenMx is slower for these problems, but is making the full calculation. The optimized version that is coming soon will be a hybrid of the row-wise FIML, operating much faster when it can, but making a row-wise calculation when necessary.

With that said, 15 hours sounds like a very long time for this simple CFA model, unless you have many thousands of rows in your dataset.

Some things that would be very useful in psychology and psychiatry, not only for
CFA but for every SEM:
1. More fit indices (at least RMSEA, CFI and SRMR. They are in the package sem
so may be easier to implement ? We are asked for them in reviews.

RMSEA is reported for covariance models, but not for FIML models. We expect to improve the summary() function as time goes on. In the mean time, since you are working in R, you could write a little function for yourself that calculated any fit function you like.

2. Some easier functions for the labels. When they are many it's difficult to name them all.
Maybe for the labels of Means for instance, a simple function which would had a "M" at the beginning
of the name of the latent variable.

myMeanLabels <- paste("M", 1:84, sep="")

2. Binary and Count variable. They are very important in psychiatry and psycho. And will be necessary
to compete with other programs. For instance, I can't use OpenMx now cause I need counts. And maybe more
examples using these kind of observed variables as outcomes.

OpenMx does work with ordinal outcomes. We are currently working on a section of the manual to explain how to do this. In the mean time, a quick search of the OpenMx website for "ordinal" give lots of hits. You might, for instance, want to look at http://openmx.psyc.virginia.edu/thread/614

Good luck and thanks for posting!

jb's picture
jb
Offline
Joined: 07/27/2010
Thanks for the answers! I

Thanks for the answers!
I agree with the philosophy, i think the insight it gives you are important and I appreciate R for that.
For a more than two factors cfa i still don't know the order of specification for the latent covariance and variances if you want to start with 0.5 for covariances and 1 for variances. For two factors the example given in the docs was =c(1, .5,.5, 1), for three is it c(1, .5,.5, 1, .5,.5, 1,.5,.5) it seems weird. Shud it be entered as a matrix ? Maybe if someone implemented one Cfa with more than two factors, it would be good as a OpenMx doc.
Estimation time: I had 830 raws but 6 factors and 14 items each. I shud have specified a maximum iteration number and a different convergence tolerance. Happy to see it will get faster.
I only can write very simple functions so I think I'll wait new indices to be implemented :) I just wanted to signal that it would be a useful development.
Count: yes, i was thinking of poisson distribution. Again, I'm not skilled enough to do that. But this would increase a lot the versatility of the package that would have continuous, binary, ordinal and count (the last three being constantly used in psy disciplines), which, i think, only Mplus have for the moment.
Thanks again for the work, it's a huge work that you are doing, good luck, and I will look at the new versions when they appear.

Ryne's picture
Offline
Joined: 07/31/2009
You can input the factor

You can input the factor covariance matrix as either a matrix or a vector (that will be used to populate a matrix). Regardless of the number of variables, you'll list the variance-covariance matrix one column at a time. The two factor model had a 2 x 2 latent covariance matrix, so the vector c(1, 0.5, 0.5, 1) gave the first column (1, 0.5) then the second column (0.5, 1). A 3 x 3 latent covariance matrix would have three columns: (1, 0.5, 0.5) for the first, (0.5, 1, 0.5) for the second, and (0.5, 0.5, 1) for the third, thus c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1). If you use 1 for the variance starting values and 0.5 for the covariances for a k x k covariance matrix, then you'll alternate 1 a single time with 0.5 k times until you have k 1s or k-1 sets of 0.5. I wrote you a little function to generate those for you, where k is the number of latent variables, var and cov are the starting values for the variances and covariances in the matrix, and vector states whether a vector (TRUE) or a matrix (FALSE) is returned. Change "jb" to something useful; all I could come up with was factorCovStartValue, which is absurdly long.

jb <- function(k, var=1.0, cov=0.5, vector=TRUE){
  a <- matrix(cov, k, k)
  diag(a) <- var
  if(vector==TRUE)(a <- as.vector(a))
  return(a)
}

jb's picture
jb
Offline
Joined: 07/27/2010
This was great, thanks!

This was great, thanks!

neale's picture
Offline
Joined: 07/31/2009
There is a difference between

There is a difference between Ordinal and Count variables, although often counts are treated as ordinal. Ordinal variables are catered for in OpenMx, as Steve says, but their analysis is not well documented at this point. For count variables one might want to consider likelihood modeling based on the Poisson distribution. Since OpenMx has an mxRowObjective() function (also undocumented but you can find some posts about it, e.g. http://openmx.psyc.virginia.edu/thread/300) it would be possible to maximize the likelihood of models based on that distribution, or any other distribution that can be computed with mxAlgebra() function calls. At this point, doing so may require more effort than was initially desired to expend, but at least it's possible.

Thanks for posting - feedback of this sort is very helpful.