Sun, 04/25/2010 - 16:46


Hello to all,

I am very much new to OpenMx and still treading water over here. Specifically, I am having trouble fitting a multiple-group CFA model to test various measurement invariance hypotheses. My main difficulty is that I cannot figure out how to specify my model so that the objective functions are combined across groups. I would be very grateful if someone could provide a very short script to fit a 2-group CFA model in RAM notation -- say, a 1-factor, three-indicator model. No need to set equality constraints across groups -- I can handle that part. I just cannot figure out how to combine the objective functions with the RAM specification. Any and all help very much appreciated. I am trying to move all of my class materials over to OpenMx.

Best, Niels

Niels Waller, Professor

Director of Quantitative/Psychometric Methods

Department of Psychology

N218 Elliott Hall

University of Minnesota

75 East River Road

Minneapolis Minnesota 55455

Phone: (612) 626-8729

Email: nwaller@umn.edu

Homepage: http://www.psych.umn.edu/faculty/waller/

Hi,

I used the script below to run a multi-group CFA. The model runs fine and OpenMx gives me the parameter estimates in the two groups. But it doesn't give me a chi-square or an RMSEA. Why?

# Multigroup CFA with Grnt_fem and Grnt_mal

# read the data into an R dataframe and create covariance matrix - females

library(foreign)  # read.spss() comes from the foreign package

data_intell_f <- read.spss("Grnt_fem.sav", to.data.frame = TRUE)

manifests_f <- c("visperc", "cubes", "lozenges", "paragrap", "sentence", "wordmean")

data_subset_f <- data_intell_f[manifests_f]

covmatrix_f <- cov(data_subset_f)

covmatrix_f

# read the data into an R dataframe and create covariance matrix - males

data_intell_m <- read.spss("Grnt_mal.sav", to.data.frame = TRUE)

manifests_m <- c("visperc", "cubes", "lozenges", "paragrap", "sentence", "wordmean")

data_subset_m <- data_intell_m[manifests_m]

covmatrix_m <- cov(data_subset_m)

covmatrix_m

# Heterogeneity model - different estimates in different groups

fem_model <- mxModel("females",

type="RAM",

manifestVars = manifests_f,

latentVars = c("F1_f","F2_f"),

mxPath(from="F1_f", to="visperc", values=1, free=FALSE, label="a"),

mxPath(from="F1_f", to="cubes", values=.2, label="b"),

mxPath(from="F1_f", to="lozenges", values=.2, label="c"),

mxPath(from="F2_f", to="paragrap", values=1, free=FALSE, label="d"),

mxPath(from="F2_f", to="sentence", values=.2, label="e"),

mxPath(from="F2_f", to="wordmean", values=.2, label="f"),

mxPath(from="F1_f", to="F2_f", arrows=2, values=.6, label="g"),

mxPath(from=manifests_f, arrows=2, free=TRUE, values=10, labels=c("evisperc_f", "ecubes_f", "elozenges_f", "eparagrap_f", "esentence_f", "ewordmean_f")),

mxPath(from="F1_f", arrows=2, free=TRUE, values=10, labels="vF1_f"),

mxPath(from="F2_f", arrows=2, free=TRUE, values=10, labels="vF2_f"),

mxCI(c("b", "c", "e", "f", "g", "evisperc_f", "ecubes_f", "elozenges_f", "eparagrap_f", "esentence_f", "ewordmean_f")),

mxData(covmatrix_f, type="cov", numObs = 73)

)

mal_model <- mxModel("males",

type="RAM",

manifestVars = manifests_m,

latentVars = c("F1_m","F2_m"),

mxPath(from="F1_m", to="visperc", values=1, free=FALSE, label="h"),

mxPath(from="F1_m", to="cubes", values=.2, label="i"),

mxPath(from="F1_m", to="lozenges", values=.2, label="j"),

mxPath(from="F2_m", to="paragrap", values=1, free=FALSE, label="k"),

mxPath(from="F2_m", to="sentence", values=.2, label="l"),

mxPath(from="F2_m", to="wordmean", values=.2, label="m"),

mxPath(from="F1_m", to="F2_m", arrows=2, values=.6, label="n"),

mxPath(from=manifests_m, arrows=2, free=TRUE, values=10, labels=c("evisperc_m", "ecubes_m", "elozenges_m", "eparagrap_m", "esentence_m", "ewordmean_m")),

mxPath(from="F1_m", arrows=2, free=TRUE, values=10, labels="vF1_m"),

mxPath(from="F2_m", arrows=2, free=TRUE, values=10, labels="vF2_m"),

mxCI(c("i", "j", "l", "m", "n", "evisperc_m", "ecubes_m", "elozenges_m", "eparagrap_m", "esentence_m", "ewordmean_m")),

mxData(covmatrix_m, type="cov", numObs = 72)

)

hetmodel <- mxModel("Heterogeneity Model", fem_model, mal_model,

mxAlgebra(expression=females.objective + males.objective, name="h12"),

mxAlgebraObjective("h12")

)

# run the model and get the output

modelfit <- mxRun(hetmodel, intervals=TRUE)

summary(modelfit)

At first I thought this would be the usual raw-data problem: fitting the saturated model is expensive, so we don't do it explicitly. However, you don't seem to have any missing data, and you are using covariance matrices.

You would get a chi-squared if you had a single-group analysis and the objective was a single type=RAM deal. With algebra objectives (necessary for all multi-group analyses) there's a problem: OpenMx cannot be sure that it has something that could be considered a chi-squared. Classic Mx had an option, ISCHI, to tell the software that the user says it jolly well is a chi-squared, so report it as one. In my opinion, OpenMx ought to be a bit smarter in these instances and figure out that the algebra objective is a simple sum of chi-squared objectives, and treat the whole thing as chi-squared. This isn't very simple and may not be foolproof (there are lots of formulae equivalent to summing -- multiplying by a unit vector, etc.), so perhaps we just need an option to summary() or something... summary(model, Is_Too_A_Freaking_ChiSquared=T) or some such argument. As a workaround, you could get the chi-squared via a function call using some of the summary(model) output, but only if you were to obtain the likelihood of the saturated model (one in which all manifest variables have free parameters for variances and covariances, and for means if means are being modeled).

What is lacking from OpenMx seems to be a slot in a RAM group for the saturated model within each group (easily calculated by fixing expected statistics to the observed ones). Then a helper function could grab all the necessary bits and just add them up. But we can't do it unless we can see the saturated model (-2 log-)likelihood within each group. Unless I'm mistaken...

Thank you for these explanations. Am I right in assuming that you are suggesting the following steps: (1) I run a multigroup analysis in which I estimate as many parameters as I have observations in each group. This analysis gives me the -2 log likelihood for the fully saturated (multigroup) model with 0 df. I store this value as "LLSat". (2) I run the heterogeneity model described in my previous post. This analysis gives me the -2 log likelihood for the over-identified (multigroup) model that allows me to test for configural invariance. I store this value as "LLHet". (3) I then use the following script to get the chi-square for the heterogeneity model:

Chi1 <- LLHet - LLSat

LRT1 <- rbind(LLSat, LLHet, Chi1)

LRT1

Correct?

Btw, what is the easiest way to write all possible variances and covariances between the manifest variables (saturated model)? I remember reading about a "to=all" option, but I don't remember if this option only existed in a previous version of OpenMx. Would it be

mxPath(from=manifests, to=all, arrows=2, values=.5),

mxPath(from=manifests, arrows=2, values=1),

?
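For what it's worth, here is a sketch of how the saturated covariance structure can be written with the `connect` argument to mxPath() in newer OpenMx releases (this is an assumption about the newer interface, not the syntax of the release discussed in this thread, and the starting values are arbitrary):

```r
# All unique covariances among the manifest variables...
mxPath(from=manifests, to=manifests, arrows=2,
       connect="unique.bivariate", free=TRUE, values=.5),
# ...plus a free variance for each manifest variable:
mxPath(from=manifests, arrows=2, free=TRUE, values=1),
```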

I think if you post a reply to the top-level post, then the comment will not have a tiny width. I agree that this is Drupal being stupid.

Although you sent me an email, I am going to reply as a post to the forum. Others may have the same problem. I'm happy to reply, but it's much better if I do so on the open forum because then everyone can learn.

Here's the critical piece:

When the first argument to mxModel is an R object that is an mxModel, then it copies that R object in and modifies that mxModel and saves it as a new R object. This is what you were doing in the definition of TwoGroups.

This copied Group1 into a new model, added Group2 as a submodel, renamed "Blacks" to be "Two Groups" and saved it to an R object named TwoGroups. Then when you tried to force "Blacks", a previously defined RAM model, to have an mxAlgebraObjective, the parser didn't know what to do.

The right thing to do is this: When the first argument to mxModel is a string, then it creates a new mxModel with that name and all subsequent arguments are added to that new mxModel.

Also note that the name of the two mxModels within the mxModel namespace are "Whites" and "Blacks", but outside of the mxModel namespace they are Group1 and Group2. I generally use the same name in both the R namespace and Mx namespace in order to avoid confusion, but you used Group1 and Group2 as the R object names and "Whites" and "Blacks" as the mxModel namespace names. You correctly used Group1 and Group2 to add these R objects into the mxModel, but once they are inside you need to use Blacks.objective and Whites.objective to refer to their component pieces, since those were the mxModel namespace names you used when you created those submodels.

What you meant to write was this:
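A reconstruction of the intended code, based on the description below (the original attachment is not shown here, and the algebra name "sumll" is illustrative):

```r
# Create a NEW parent model named "Two Groups" (first argument is a
# string), add the R objects Group1 and Group2 as submodels, and sum
# their objectives using their mxModel namespace names:
TwoGroups <- mxModel("Two Groups", Group1, Group2,
    mxAlgebra(Whites.objective + Blacks.objective, name="sumll"),
    mxAlgebraObjective("sumll"))
TwoGroupsFit <- mxRun(TwoGroups)
summary(TwoGroupsFit)
```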

Now you are creating a new model named "Two Groups", adding the R objects Group1 and Group2, defining an mxAlgebraObjective function, and storing it all in an R object named TwoGroups. This runs and gets the answer you expected.

Speaking of multiple-group CFA, consider the attached R script. I have a two-factor model, where the loadings associated with one factor are allowed to differ across two groups (all other parameters restricted to be equal across groups). The parameter estimates differ depending on whether I use raw data or covariance matrix + means. The -2LL values also differ greatly, but probably because a different likelihood is used in the two cases.

Finally, there is a negative variance estimate in the raw-data case that does not replicate in other software. FWIW, neither Amos nor Mplus returns a negative variance estimate, and their estimates agree with each other (and differ from all estimates supplied by OpenMx).

In case it matters, I am on R 2.10.1 for OS X with the latest version (as of Sept 2) of OpenMx. I've replicated the issue on R 2.11.1 on Ubuntu Linux 10.04.1 LTS (Lucid). On OS X, I get a negative variance estimate regardless of the type of data I use. On Ubuntu, I only get the negative variance for the raw data model.

Hi,

Model design is not really my strong area, but the behavior of the script might be corrected by adding a lower bound of 0 to the free parameters that you would like to be positive. I've been told that other SEM packages automatically assume a 0 lower bound for some of their free parameters. OpenMx tries to make as few assumptions as possible, and follows a WYSIWID philosophy (what you see is what it does). Use the 'lbound' argument to the mxPath() function to assign lower bounds.
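For example, a lower bound on a residual variance might look like this (the path values and labels here are illustrative, following the earlier script in this thread):

```r
# Constrain a residual variance to be non-negative via lbound:
mxPath(from="visperc", arrows=2, free=TRUE, values=10,
       lbound=0, labels="evisperc_f")
```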

Thanks for the quick response.

I will check out the lbound argument as soon as I get a chance. While that might solve the negative variance problem, I don't think it will solve the issue of different parameter estimates for raw data vs means/covariance matrix. I assumed that these would result in the same parameter estimates because the specified models are exactly the same (and I believe the means are modeled in both cases). If this assumption is incorrect, I wonder whether we are to prefer the estimates from the mean/covariance matrix model over those from the raw data model.

Sorry for all the questions. I'm doing a simulation involving nested multiple-group models, and I'm trying to assess the stability of the OpenMx estimates. OpenMx seems like a great platform for simulations, as you get R's power without the need for other programs to perform model estimation.

Ed Merkle

I took a look at this problem, and considered whether the raw data were running into a potential issue that arose with FIML in recent releases, but the solution is the same with an earlier version of OpenMx.

The algebra involved in moving from FIML with raw data to ML with covariances and means is relatively straightforward, and I would expect the two solutions to match better than they do. There appear to be no missing data at all in your sample (which is unusual), so issues of missingness mechanisms (MAR), which might cause divergence of the results, seem moot.

I'll keep looking into this problem, sorry for the delay in your work.

I played around with the script. The changes I made were fairly minor: I condensed the model specification for the model using the observed covariance matrix, to avoid any copy/paste error when modifying the models. I also added 'lbound' arguments in places where the free-parameter estimates were negative. I don't have the intuition to know which free parameters should be positive, so I just assumed any negative number was undesirable. The result of these changes is that the free-parameter estimates are now close to one another, but they're not identical. I'll keep poking at it.

Thanks for looking at this. My data are artificial, which would explain the lack of missingness. Also, the true parameter values are all positive, so it is fine to add lbound arguments for all the negative estimates (indeed, it would be fine to have lbound arguments for all parameters).

The two sets of estimates are looking better, though I agree that they should not differ by this much. Comparing to Mplus and Amos (I know these are not gold standards, but they agree with one another in this situation), I notice a couple things.

(1) Mplus/Amos parameter estimates are similar to OpenMx estimates (do not agree to any decimal places, but whole numbers are mostly the same).

(2) The intercept estimates are nearly the same across the three programs, but it seems that OpenMx may have switched around some unique variance estimates. Rounded OpenMx variance estimates (named e1 to e6) are 46, 13, 24, 3, 9, 23. The corresponding rounded estimates in Mplus and Amos are 23, 13, 46, 2, 9, 24. So it seems as though OpenMx somehow switched e1 and e3.

One possibility is that I did something strange with my data as I moved it from R to Mplus and Amos. I don't think this is the case, though, because the estimated intercepts match closely across all three programs. It is only the estimated variances that seem to switch. If I am missing something obvious, I apologize.

Ed

Hi Ed,

I think there are two problems here.

First, your model has a rotational indeterminacy problem. Note that you are constraining the variances of your latent variables to be 1 in order to provide a scale for the latent variables. However, this does not make for a completely determined solution to your model.

To prove this to yourself, try starting the factor covariance at -.5 instead of .5 for your cov.res model. You will find that you obtain strikingly different estimates (changes in sign!) with nearly the same likelihood. This is a good indicator of rotational indeterminacy.

I suggest you try scaling each factor by constraining a loading to 1.0 but freeing the variances. This provides sufficient constraints so that you have a determined solution. It looks to me as if you were doing this earlier in your exploration and commented out those lines.

Second, when you compare FIML and covariance-model estimates, you must remember that FIML is equivalent to using the population (divide-by-N) covariance, not the sample covariance. R's cov() calculates the bias-corrected covariance by dividing by (N - 1), whereas the FIML estimate divides by N.

You have small sample sizes here (50 in each group) and so this causes some divergence between the two methods when you use R's covariance function.
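A sketch of the rescaling, so both methods target the same matrix (the data-frame name is illustrative; the group size of 50 follows the description above):

```r
# R's cov() divides by (N - 1); rescale to the divide-by-N (ML)
# covariance so it matches what FIML estimates from raw data:
N <- 50                                    # group size in this example
covmatrix_ml <- cov(mydata) * (N - 1) / N
```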

I attach a modified script that corrects these two problems and now the estimates agree between the two methods to 4 or 5 significant digits. Of course, the likelihood is different given the differences in FIML versus ML.

Hope that helps!

Thanks, Steve, that helped greatly. Everything looks good on my end now.

Regarding the "fix a loading at 1" vs "fix the latent variance at 1": I always thought of these as equivalent, possibly because other programs handle the indeterminacy issue behind the scenes (i.e., by restricting associated loadings to be positive). I like fixing the latent variance at 1 because I am less likely to care about estimating the latent variance, vs estimating a factor loading. For my model, at least, this can be accomplished in OpenMx by restricting the factor loadings to be positive via the lbound argument (as Michael suggested earlier in the thread). I get the same results back even when the starting value for the factor covariance is -.5 (though all loadings must have a lower bound of zero).

Finally, I don't know that this multiple-groups model contributes anything new, but feel free to accept it as a "donated model".

Thanks again,

Ed

Thanks for the donated script! We'll add it into our test suite.

While your script is more-or-less determined given your data, I have a similar example script that I wrote for my Intro to SEM class that is wonderfully indeterminate. I ask students to compare answers in class. Different CPUs (and even different versions of operating systems on the same computer) give different rotational values until you either fix loadings rather than fix variances or you fix something about the target rotation. It is great to see the students arguing about who has the "right" answer until they figure out the point about rotational indeterminacy.

Thanks Steve,

Your explanation was helpful and I now have a better grasp on the importance of the namespace. However, I must be missing an important piece as my code still says that I do not have a defined objective function. I've attached my code in case anyone can enlighten me.

Many thanks,

Niels

Ah, yes you've found an unfortunate interaction of components of the OpenMx interface. In the statement:

`TwoGroups <- mxModel(name="TwoGroups", model1, model2, ...)`

By passing the name as a named argument, the first positional argument becomes 'model1'. This tells the mxModel() call to modify 'model1' instead of building a new model. What you want to do is change the function call to the following:

`TwoGroups <- mxModel("TwoGroups", model1, model2, ...)`

When the first argument to mxModel() is a string, a new model is created with the string as the model name.

Thanks!

That did it.

Niels

Thanks to everyone's kind help, my models now run and replicate the results from Mplus (yay, OpenMx team). However, I noticed that some important output was not reported in my 2-group CFA. Do I need to specify additional commands, or are these functions still being coded? Thanks, Niels

# OpenMx fails to report the following info

# observed statistics: 40

# estimated parameters: 18

# degrees of freedom: 22

# -2 log likelihood: 4264.046

# saturated -2 log likelihood: NA

# number of observations: 2252

# chi-square: NA

# p: NA

# AIC (Mx): NA

# BIC (Mx): NA

# adjusted BIC:

# RMSEA: NA

Dear all, I am having the same problem. Fitting a multiple-group latent difference score model, I get similar output (i.e. missing fit indices). The same is true if I run the example code provided at http://openmx.psyc.virginia.edu/docs/OpenMx/latest/MultipleGroups_Path.html after changing the data format to "cov". I must admit that I didn't quite get the final point of neale's summary of this discussion, and so I am left with the question of whether or not there is any way to get the chi-square and the BICs, AICs, and RMSEAs.

I am looking forward to reading your responses.

Hannes

Ah. The problem here is the use of the mxAlgebra objective for multiple groups. While the ML objective returns the necessary information for fit-index calculation, the algebra objective does not. This is because the algebra objective can be used for a great many things, many of which do not use maximum likelihood, so the assumptions underlying the various fit indices can't be assumed to hold when anything other than an ML, RAM or FIML objective is used.

It is true that this is the source of the problem, but it remains inconvenient to the user. In Classic Mx there was an Option ISCHI, which explicitly told the software to treat the resulting fit function value as a chi-squared and to compute fit statistics accordingly. While I believe the mxAlgebra objective behavior is the correct default, it could be overridden with an option. For example,

multipleGroupModel <- mxOption(multipleGroupModel, "Unweighted sum of -2lnLs", "Yes")

might enable forcing of multiple group fit statistics to be computed for mxAlgebra objectives.

Two issues.

First, what is the appropriate saturated model for a multiple group application? Is it one where a single covariance matrix is fit to all data (i.e., constrained to equality across groups), or one with separate free means/covariances for each group? You must repeat this question for independence models as well. The values for all fit indices depend on these questions, and I don't have a good answer. Saturated likelihoods will be lower with constraints across groups, as this solution basically assumes a single covariance matrix is the correct saturated model for comparison. This solution will also require model fitting in the same way a FIML model would, which we avoid for auto-calculation. Alternatively, separate saturated models for each group may be closer to theory, but will add lots of df and make model comparison across varying numbers of groups difficult. However, this is the one we could do automatically, as it's just the sum of the saturated likelihoods assuming independent groups.

Second, where would these go if we could calculate them? This would be much easier if non-independent submodels contained their own unique output statements, but we give a global one. I'm sure it would look like 'modelA.saturatedLikelihood', but we'd have to get a lot more sophisticated with regards to saturated/independent df than we are now. At the moment, these df are calculated in the summary function because they're easy to figure out with a single group. If we support more complex comparison models, we need to move these to the backend.

Appendix:

All the more reason to build a helper function for saturated model generation. Throw in some intuitive options for labels and you could make either of the above saturated models relatively easily.
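A rough sketch of what such a helper might look like (this is hypothetical, not an existing OpenMx function; the name, the `connect` argument, and the starting values are all illustrative):

```r
# Build a per-group saturated RAM model from an observed covariance
# matrix: every variance and unique covariance is a free parameter.
makeSaturated <- function(name, covmatrix, numObs) {
  manifests <- rownames(covmatrix)
  mxModel(name, type="RAM",
          manifestVars = manifests,
          mxPath(from=manifests, arrows=2,
                 connect="unique.pairs", free=TRUE, values=.5),
          mxData(covmatrix, type="cov", numObs=numObs))
}
```

The multi-group saturated model would then just combine one such submodel per group and sum their objectives.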

The appropriate saturated model is one with a collection of group-specific saturated models, i.e., "one with separate free means/covariances for each group". Independent=TRUE could be used, particularly if we are in FIML. Of course, fitting the saturated model takes time - possibly a lot of it if we are in FIML. Accordingly, Classic Mx allowed the user to specify

Option Sat=F,df

and something similar in OpenMx would definitely be handy.

I am unsure about the frontend/backend piece. Obviously it is a job for the backend to estimate the saturated model.

I like your thinking on the Appendix!

I like the idea of building some helper functions to make it easy for the user to specify the saturated model. That leaves the power in the hands of the user, rather than having OpenMx make guesses about what the user thinks is the appropriate saturated model.

Whether they use a helper function or build their saturated model manually, they can use `summary(myModel, SaturatedLikelihood=mySaturatedModel)`. That will provide the summary details and fill in the appropriate fit statistics. We'll need to update this to handle the independence model as well, for people who want the extra fit stats.

Oh, yeah. In the example above, `mySaturatedModel` should be the fitted saturated model, or just the -2 log likelihood for the saturated model instead.

When I updated the summary function for the CFI-like fit stats, I added the support for independence models that you described. I'll add a note to the summary function help file.
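For example, with a fitted heterogeneity model `hetFit` and a fitted saturated model `satFit` (both names illustrative):

```r
# Pass the fitted saturated model (or its -2 log likelihood as a
# number) so summary() can compute chi-square and related statistics:
summary(hetFit, SaturatedLikelihood = satFit)
```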

Also, I can't find a thing in this thread. Drupal seems to get unusable when threads go more than about three or four comments down the tree. I'm starting a new thread to solicit feature requests for a saturated-model helper.

Using group-specific models creates an incongruity between group-specific models and definition-variable-based approaches. Take a simple binary grouping/definition variable case as an example. If we treat everything as two groups, we have a very heavily parameterized saturated model. If we specify the same model as a single-group model with some parameters moderated by a binary definition variable, then we get a much more restricted saturated model. The definition-variable version will likely appear to fit much better, provided the df differences don't overwhelm the differences in fit.

In the binary moderator case, where it essentially is a group indicator, the equivalent saturated model is one where ALL the parameters (every mean/threshold and covariance) are moderated by the group indicator. This should give the same -2lnL as the two-group saturated model. Interestingly, independent=T would help only the multiple model specification.

Hi Niels

With raw data (FIML) see this thread http://openmx.psyc.virginia.edu/thread/135 for rationale as to why OpenMx does not by default fit the saturated model, which is necessary for several of the fit statistics you desire (such as chi-square). If you fit to covariance matrices and means (i.e. you are in the unlikely situation of having no missing data) then this problem should not be encountered.

Admittedly, OpenMx could report AIC, BIC and aBIC for raw data, and similarly RMSEA. However, given that the number of variables isn't necessarily the same across groups (though in your case it is), the best flavor of multiple-group RMSEA has yet to be figured out - by me at least - but perhaps others have good suggestions here.

Hi Mike,

I did fit the model to the previously computed covariance and mean matrices. When you say "this problem should not be encountered", do you mean that OpenMx should have reported these stats? I have attached my script.

Thanks,

Niels

Yes, I do mean that. It would appear to be a problem specific to multiple group analyses, since the summary() function generates relatively complete output for the example on the home page:

observed statistics: 15

estimated parameters: 10

degrees of freedom: 5

-2 log likelihood: -3648.281

saturated -2 log likelihood: -3655.665

number of observations: 500

chi-square: 7.384002

p: 0.1936117

AIC (Mx): -2.615998

BIC (Mx): -11.84452

adjusted BIC:

RMSEA: 0.03088043

timestamp: 2010-04-26 17:49:39

frontend time: 0.1904531 secs

backend time: 0.01826882 secs

independent submodels time: 6.008148e-05 secs

wall clock time: 0.2087820 secs

cpu time: 0.2087820 secs

openmx version number: 0.3.0-1217

Should I report this as a bug? What is the next step to make sure the appropriate folks see this?

It was not very helpful of me to suggest just directly asking Michael about this problem, because the benefit of the ensuing discussion was limited to the three of us. There is an unresolved issue about saturated models that involve definition variables which is worth airing.

Therefore, I'm posting our thread:

neale: Yes, we could report it as a bug, or perhaps just nudge Michael like this...

spiegel: Err, do I just substitute the objective function values generated by the FIML saturated -2 log likelihood into the algebra objective calculation?

neale: Yes, that would do it.

However, with definition variables the saturated model may be undefined. Somewhat vaguely, possibly, if one were to let the definition variables each have an effect on every mean and every variance and covariance, then this might be considered saturated with respect to the definition variables. Yet this is still inadequate, because it is very easy to specify a model which features an arbitrarily complex function of one or more of the definition variables, so the space encompassed by models with definition variables is rather large. The 'vaguely possibly' saturated model conjectured above might be considered a *linear* saturated model with definition variables. If the specified "unsaturated" model was also only linear in its use of definition variables (say, effects on means or residual variances), then it would be of the same class as the linear saturated model, against which comparisons might be meaningful. Or not. This has the scent of a fairly intensive simulation project.

Hi Niels,

A good example of this is in the documentation here:

http://openmx.psyc.virginia.edu/docs/OpenMx/latest/GeneticEpi_Path.html#...

Note that there are two groups specified as models with the RAM-style path notation and named mzModel and dzModel. Then these two models are combined in a "parent model" and their objective functions summed with the following statements:
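The combining statements on that page look roughly like the following (reconstructed from the surrounding description here, not copied from the page; the algebra name "twin" is illustrative):

```r
# Combine the MZ and DZ submodels into a parent model and sum their
# objective functions so both groups are minimized together:
twinACEModel <- mxModel("twinACE", mzModel, dzModel,
    mxAlgebra(mzModel.objective + dzModel.objective, name="twin"),
    mxAlgebraObjective("twin"))
twinACEFit <- mxRun(twinACEModel)
```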

Now you can run twinACEModel; both submodels will be run and their objective functions summed. Everything is minimized together.