Fri, 06/03/2011 - 13:36

Hello,

I have an SEM model I am happy with, and I am examining its standardized path coefficients. When I split the original data set by Gender into two mutually exclusive subsets of almost equal size and fit my original model to each, I expect one group to have slightly smaller and the other slightly larger standardized path coefficients than the total-sample model, so that the average of the path coefficients for Males and Females would be close to the original larger sample's. However, this is not happening: for both males and females I am getting smaller standardized path coefficients than in the total original sample.

I would like to understand why this is happening and whether it is normal, since for all of the other filters I am applying, the subgroup path coefficients average out close enough to the total-sample values.

Please advise.

Thank you.

Leyla

One likely culprit is mean-induced covariation, which is best explained by example. Let's say that you have two variables, x and y, that you measure in an equal number of males and females. If you look at just the males, both x and y have means of zero, variances of 1 and zero correlation. If you look at just the females, both x and y have means of 5, variances of 1 and zero correlation. If you ignore grouping, x and y will have means of 2.5, variances of about 7.25 and a correlation of about 0.86. The correlation found in the full data is caused entirely by the mean differences between the two groups that underlie it. The same mechanism can inflate path coefficients in a pooled sample, so that both subgroups show smaller coefficients than the total sample does.
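For anyone who wants to see where those pooled numbers come from, here is a short derivation. It is only a sketch for this idealized example (two equal-sized groups, population values rather than sample estimates); g denotes group membership, a symbol introduced here for illustration.

```latex
\begin{align*}
\operatorname{Var}(x) &= \mathrm{E}[\operatorname{Var}(x \mid g)] + \operatorname{Var}(\mathrm{E}[x \mid g])
  = 1 + \tfrac{1}{2}(0 - 2.5)^2 + \tfrac{1}{2}(5 - 2.5)^2 = 7.25 \\
\operatorname{Cov}(x, y) &= \mathrm{E}[\operatorname{Cov}(x, y \mid g)] + \operatorname{Cov}(\mathrm{E}[x \mid g], \mathrm{E}[y \mid g])
  = 0 + 6.25 = 6.25 \\
\operatorname{Cor}(x, y) &= \frac{6.25}{\sqrt{7.25 \cdot 7.25}} \approx 0.862
\end{align*}
```

The within-group covariance contributes nothing; all of the pooled covariance comes from the between-group term.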

Other potential issues come up when you compare standardized coefficients across groups with different variances. In doing so, you're assuming not that the raw values of a parameter are equal across groups, but rather that the ratio of the parameter to one or more standard deviations is equal across groups. That's not necessarily bad, but it makes it very easy to specify one theory and test another.
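To make the raw-versus-standardized distinction concrete, here is a small R sketch using a simple regression as a stand-in for one path. Everything in it is made up for illustration (the slope of 0.5, the standard deviations, the variable names); it is not your model. Both groups share the same raw slope, but the predictor has a larger variance in the second group.

```r
# Hypothetical illustration: identical raw slope, different predictor variances.
set.seed(42)
n <- 5000
b <- 0.5                          # raw slope, the same in both groups

x1 <- rnorm(n, 0, 1)              # group 1: sd(x) = 1
y1 <- b * x1 + rnorm(n, 0, 1)
x2 <- rnorm(n, 0, 3)              # group 2: sd(x) = 3
y2 <- b * x2 + rnorm(n, 0, 1)

# Raw (unstandardized) slopes from ordinary regression
raw1 <- unname(coef(lm(y1 ~ x1))["x1"])
raw2 <- unname(coef(lm(y2 ~ x2))["x2"])

# Standardized slope = raw slope * sd(x) / sd(y)
std1 <- raw1 * sd(x1) / sd(y1)
std2 <- raw2 * sd(x2) / sd(y2)

print(round(c(raw1 = raw1, raw2 = raw2, std1 = std1, std2 = std2), 2))
```

In the population the standardized slopes are 0.5/sqrt(1.25), about 0.45, and 1.5/sqrt(3.25), about 0.83, even though the raw slope is 0.5 in both groups. A test of equal standardized coefficients would reject here while a test of equal raw coefficients would not.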

To find out if these issues affect your problem, plot and summarize your data.

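> # First 100 cases have means of 0 (the males), last 100 have means of 5 (the females)
> x <- c(rnorm(100, 0, 1), rnorm(100, 5, 1))
> y <- c(rnorm(100, 0, 1), rnorm(100, 5, 1))
> # Correlation within each group is near zero
> cor(x[1:100], y[1:100])
[1] 0.007312482
> cor(x[101:200], y[101:200])
[1] -0.007244413
> # Correlation ignoring group is large
> cor(x,y)
[1] 0.8679693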
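For checking your own data, something along these lines might be a starting point. It is a sketch only: the data frame `dat` and its columns `gender`, `x` and `y` are hypothetical names, and the data here are simulated stand-ins that you would replace with your own variables. It compares group means and variances, then contrasts the pooled correlation with the within-group correlations and with the correlation after centering each variable within group.

```r
# Stand-in data; 'dat', 'gender', 'x' and 'y' are hypothetical names.
set.seed(1)
dat <- data.frame(
  gender = rep(c("M", "F"), each = 100),
  x = c(rnorm(100, 0, 1), rnorm(100, 5, 1)),
  y = c(rnorm(100, 0, 1), rnorm(100, 5, 1))
)

# Group means and variances: large mean differences are the warning sign
group_means <- aggregate(cbind(x, y) ~ gender, data = dat, FUN = mean)
group_vars  <- aggregate(cbind(x, y) ~ gender, data = dat, FUN = var)
print(group_means)
print(group_vars)

# Correlation within each group
within_cor <- sapply(split(dat, dat$gender), function(d) cor(d$x, d$y))

# Correlation ignoring gender
pooled_cor <- cor(dat$x, dat$y)

# Correlation after subtracting each group's own mean,
# which removes the covariation induced by mean differences
xc <- dat$x - ave(dat$x, dat$gender)
yc <- dat$y - ave(dat$y, dat$gender)
centered_cor <- cor(xc, yc)

print(round(c(within_cor, pooled = pooled_cor, centered = centered_cor), 3))

# Scatterplot colored by group: two separate clouds suggest mean-induced covariation
plot(dat$x, dat$y, col = ifelse(dat$gender == "M", "blue", "red"),
     xlab = "x", ylab = "y")
```

If the pooled correlation is much larger than both the within-group and the group-centered correlations, mean differences between the groups are probably driving the pooled result.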