Thu, 06/30/2011 - 15:30

Hi :-)

First I'm so happy that this group has written SEM software for R - Mplus is just way out of my budget!! Second I'm sort of new to this stuff, so I have a few beginner sorts of questions and I hope you'll bear with me - sorry.

I've never really understood modeling means in SEM. I know how it's done in OpenMx after reading through Dr. Boker's course notes, but I just don't understand exactly how the means are being estimated and why one would want to estimate them. I can't find any reference to this in any texts, so I would love some help or a point in the right direction.

Thanks so much!!

P.S. I can't seem to upload my picture to my account - I think there's a problem - something about it not being able to write the file to the correct folder? Thanks!!

Hi Sara

Suppose we have a simple one-factor model like that on the front page of the OpenMx website. Suppose however that we have two groups. Further suppose that we would like to know whether mean differences in the observed variables arise because of a) some specific effects on one or more of the observed variables, or b) a mean difference at the factor level. A mean difference at the factor level would imply a pattern of mean differences at the observed variable level which is entirely predicted by the size of the factor loadings. Those variables with bigger factor loadings would have bigger mean differences between groups.
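To make that last point concrete, here is a minimal sketch in plain Python (not OpenMx code). The loadings, intercepts, and factor means are made-up numbers for illustration, and it assumes the intercepts and loadings are the same in both groups, so that the only source of mean differences is the factor mean.

```python
# Hypothetical values, chosen only for illustration (not from any real data set).
loadings = [0.9, 0.6, 0.3]      # factor loadings for three observed variables
intercepts = [2.0, 1.5, 1.0]    # assumed equal across the two groups
factor_mean_group1 = 0.0
factor_mean_group2 = 0.5        # group 2 is half a unit higher on the factor


def implied_means(intercepts, loadings, factor_mean):
    """Model-implied observed means: intercept + loading * factor mean."""
    return [tau + lam * factor_mean for tau, lam in zip(intercepts, loadings)]


means_g1 = implied_means(intercepts, loadings, factor_mean_group1)
means_g2 = implied_means(intercepts, loadings, factor_mean_group2)

# Each observed mean difference equals loading * (difference in factor means),
# so the variable with the biggest loading shows the biggest group difference.
mean_diffs = [m2 - m1 for m1, m2 in zip(means_g1, means_g2)]

for lam, diff in zip(loadings, mean_diffs):
    print(f"loading = {lam:.1f}  implied mean difference = {diff:.2f}")
```

If the observed mean differences did not follow this pattern (for example, a large difference on the variable with the smallest loading), that would point toward a specific effect on that variable rather than a difference at the factor level.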

The above is just one example where modeling the means can be informative. Growth curve and dynamical systems models provide other good examples.

However, I wonder if your question stems from reasoning along the lines of "actually I don't care about the means and don't have different groups etc., but OpenMx seems to be forcing me to supply a model for the means". OpenMx does this when the fit function is FIML, because the raw data likelihood function requires an estimate of the population mean in order to figure out how likely (or unlikely) each observation is. This is easy to see in the univariate case - the likelihood is the height of the normal distribution curve, but the height at a point x, say x=2, will be different if the mean of the distribution is, say, zero or some other value. So raw data methods (which are a great way of handling cases with some missing data) require a model for the means. Just allowing each variable to have its own free parameter for its mean is a useful approach when one doesn't really have a model in mind for the means.
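The univariate case can be checked with a few lines of plain Python (again a sketch, not OpenMx code). The function name and the choice of a standard deviation of 1 are my own assumptions for the illustration; the point x=2 and the mean of zero come from the example above.

```python
import math


def normal_density(x, mean, sd):
    """Height of the normal curve at x for a given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


x = 2.0
sd = 1.0  # assumed for illustration

# Same data point, same variance, two different candidate means.
height_if_mean_is_0 = normal_density(x, mean=0.0, sd=sd)
height_if_mean_is_2 = normal_density(x, mean=2.0, sd=sd)

print(f"likelihood of x=2 when mean=0: {height_if_mean_is_0:.4f}")
print(f"likelihood of x=2 when mean=2: {height_if_mean_is_2:.4f}")
```

The same observation is much more likely under a mean of 2 than under a mean of 0, so the likelihood of raw data cannot be evaluated without some value for the mean, which is why a means model has to be supplied.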