Fri, 08/01/2014 - 17:06

Hello,

Does the scale of a latent variable matter in interpreting results? For instance, I am trying to make a latent variable for cost, but none of my observed variables are in dollars (or any monetary value). If the first indicator loading is fixed to 1, but that is measuring # of cars, how would that affect the other indicators, which are measurements of completely different things (but things that would affect this idea of cost)?

Thanks for any help

Great question!

Latent variables don't have an inherent scale or any natural units. As latent variables can be identified in many different ways, they can have many different units. Interpreting them requires researchers to understand how they are identified.

Your latent variable reflects what your indicators have in common. If you identify your cost factor by fixing a factor loading to 1, then you're stating that every unit change in your factor corresponds to an expected unit change in that indicator. If you pick different identification criteria (e.g., a different item used as the reference indicator, or fixing the factor mean and variance), you'll get different parameter estimates, but your model will fit the same. Changing the identification criteria doesn't change the model so much as it linearly rescales the latent variable, and model fit, as in ordinary linear regression, is insensitive to linear transformations of a predictor.
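To make the "same fit" claim concrete, here is a minimal sketch in Python with NumPy. It assumes a one-factor model whose implied covariance matrix is (factor variance) x (loadings)(loadings)' plus a diagonal matrix of residual variances. All numbers and the helper `implied_cov` are made up for illustration; nothing here comes from real data or from any particular SEM package.

```python
import numpy as np

def implied_cov(loadings, factor_var, resid_vars):
    """Model-implied covariance matrix of a one-factor model."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    return factor_var * (lam @ lam.T) + np.diag(resid_vars)

# Hypothetical residual variances for three indicators.
resid = [0.5, 20.0, 4.0]

# Identification A: first loading fixed to 1, factor variance free (0.04 here).
sigma_ref = implied_cov([1.0, 10.0, -5.0], 0.04, resid)

# Identification B: factor variance fixed to 1, all loadings free.
# Each loading from A is multiplied by sqrt(0.04) = 0.2.
sigma_std = implied_cov([0.2, 2.0, -1.0], 1.0, resid)

# Both identifications imply the same covariance matrix, hence the same fit.
same_fit = np.allclose(sigma_ref, sigma_std)
print(same_fit)
```

The parameter values differ between the two identifications, but the covariance matrix the model is trying to reproduce does not, which is why fit statistics are unchanged.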

To sum up, model fit is insensitive to the scale of the latent variable, but you have to be aware of your units when interpreting specific parameters. A regression effect of 3.5 on the latent variable has a very different interpretation if the units of the latent variable are tied to the cost of a car or the cost of a meal.

Let me know if we can answer more questions.

Ryne,

Thanks for the thorough response! I didn't know that changing the identification criteria wouldn't change the model fit.

I'm still having difficulty understanding how to interpret the effect of the different indicators on the latent variable, however. In my model for transportation cost, I have # of vehicles (should increase cost as it increases), commute distance (increase cost), access to retail/food (decrease cost), and a few others. I don't have any specific idea of how much, in a dollar amount, changes in any of these variables affect transportation cost, just the general sense that they do. Is that a problem with identifying and interpreting the model? I get the expected coefficients on the indicators when I fix the loading to # vehicles, but I don't get how to interpret them in terms of "cost" or interpret the effect of the LV in my structural model beyond a general positive or negative correlation.

Thanks for any and all help.

A few thoughts:

-Indicators don't have effects on the latent variable; the latent variable has effects on the indicators. The latent variable predicts the indicators, and in doing so explains why they covary. This is a common confusion among new factor analysts, and it also gets into formative vs. reflective factors, which is a topic for fun reading later.

-The interpretation of factor loadings (the coefficients by which indicators are regressed on the latent variable) reflects the scaling of the factor and the units of the indicator. Let's stick with your original identification criteria, and I'll make up numbers to carry out the example. You identify your latent variable by fixing its mean at zero and the factor loading for the "number of cars" variable to 1.0. You estimate a factor variance, all item intercepts and residual variances, and loadings for all of the other items. Let's say that the loadings for "commute distance" and "access to retail/food" are 10 and -5. This means that for every additional unit of the latent variable (in this case, for every change in the "cost" latent variable that yields an expected change of one car owned), we'd expect commute distance to go up by 10 and access to go down by 5. If we used "commute distance" as the reference indicator, we'd get loadings of .1 on cars owned and -.5 on access. If we identified the latent variable as having a mean of zero and variance of 1, we'd get the same ratio of factor loadings (1 : 10 : -5), but the actual values would depend on the variance of the items in your sample.
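The arithmetic behind those made-up loadings can be sketched in a few lines of Python with NumPy. The loadings 1, 10 and -5 are the invented numbers from the example above; the factor variance of 0.04 is an additional assumption chosen only so the unit-variance case has concrete values.

```python
import numpy as np

# Invented loadings with "number of cars" as the reference indicator.
# Order: cars, commute distance, access to retail/food.
loadings_cars_ref = np.array([1.0, 10.0, -5.0])
factor_var_cars_ref = 0.04  # assumed value, for illustration only

# Switch the reference indicator to commute distance:
# divide every loading by the commute loading so that it becomes 1.
loadings_commute_ref = loadings_cars_ref / loadings_cars_ref[1]
# The factor variance absorbs the rescaling (times 10 squared).
factor_var_commute_ref = factor_var_cars_ref * loadings_cars_ref[1] ** 2

# Fix the factor variance to 1 instead:
# multiply every loading by the factor standard deviation.
loadings_unit_var = loadings_cars_ref * np.sqrt(factor_var_cars_ref)

print(loadings_commute_ref)    # cars, commute, access under the new reference
print(factor_var_commute_ref)
print(loadings_unit_var)
```

In every version the loadings keep the ratio 1 : 10 : -5. Only the unit of the latent variable, and therefore the size of each number, changes.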

Simpler version: factor loadings are regressions, but the units of your factor are whatever you fix them to be. You can rescale dollars into thousands of dollars or commute miles or whatever and not change your model fit just like you can in regression, but interpretation is based on your understanding of the units in your model.
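The regression analogy can be checked directly. The sketch below uses simulated data (the variable names, sample size and coefficients are all hypothetical) and fits the same straight line twice, once with the predictor in dollars and once in thousands of dollars.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated predictor in dollars and an outcome that depends on it.
dollars = rng.uniform(1_000, 50_000, size=200)
outcome = 2.0 + 0.0003 * dollars + rng.normal(0.0, 1.0, size=200)

def fit_line(x, y):
    """Least-squares slope and R-squared for a single predictor."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    r_squared = 1.0 - resid.var() / y.var()
    return slope, r_squared

slope_dollars, r2_dollars = fit_line(dollars, outcome)
slope_thousands, r2_thousands = fit_line(dollars / 1000.0, outcome)

# The slope is 1000 times larger per thousand dollars; the fit is unchanged.
print(slope_dollars, slope_thousands)
print(r2_dollars, r2_thousands)
```

The same logic applies to a latent variable: changing its unit rescales the coefficients attached to it, while the quality of the fit stays the same.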

Does that help?

That was extremely helpful, thank you! You just answered what a lot of hours of research didn't. I did suspect that about the latent variables affecting the indicators, because of path diagrams, but it was never spelled out in anything I came across (nor was anything about latent variables as specifically as you have put it).

I think it is coming together slowly for me and that I had the idea of a latent variable backwards. Let me know if I'm on the right track here: what I am really doing with my model as it stands is using an unobservable construct, "transportation cost", to account for changes in car ownership, commute distance, and access to food. Is this a sensible use, then? Because I originally thought, well, the more you have to travel to get food the greater your transportation cost is (makes sense), but now I am thinking that using a latent variable implies that "transportation cost" is a given and affects the access to food, which doesn't make sense or mean anything when trying to measure the actual transportation cost of a household.

Sorry to be dumping the whole of my modeling problems on you. Maybe if you have a simpler example of latent variables and their effect on indicators you could just use that.

And thanks again for all the help you've been! I'm really happy I came on here.

Glad to help. Your misunderstanding of latent variables is common. Here's a brief overview of reflective vs formative models with some references to get you going: http://www.rasch.org/rmt/rmt221d.htm. This is also a popular discussion in SEMNET at the moment: http://www2.gsu.edu/~mkteer/semnet.html.

You're correct in what you're defining: "transportation cost" is an unobserved variable that causes car ownership, commute distance and access to food, and in doing so explains why those variables are correlated. Once you account for this latent variable, the residual or unique factors that remain in these variables are uncorrelated. These assumptions make a lot of sense in some contexts: symptom items on a depression scale, test items on a cognitive battery, etc. They make less sense in your example: socio-economic status (SES) is the classic example of a formative construct, where the individual economic variables combine to cause SES, not the other way around.
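Here is a small simulation of what the reflective assumptions look like in data. It is only an illustration: the loadings reuse the made-up values 1, 10 and -5, and the residual standard deviations, sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# One latent factor (variance 1) causes all three indicators.
loadings = np.array([1.0, 10.0, -5.0])   # cars, commute, access (made up)
resid_sd = np.array([1.0, 10.0, 5.0])    # arbitrary unique-factor SDs
factor = rng.normal(0.0, 1.0, size=n)
unique = rng.normal(0.0, resid_sd, size=(n, 3))  # independent unique factors
indicators = factor[:, None] * loadings + unique

# The indicators are correlated only because they share the factor.
observed_corr = np.corrcoef(indicators, rowvar=False)

# Remove the factor's contribution: what remains is essentially uncorrelated.
leftover = indicators - factor[:, None] * loadings
residual_corr = np.corrcoef(leftover, rowvar=False)

print(np.round(observed_corr, 2))
print(np.round(residual_corr, 2))
```

In a real analysis the factor is never observed, so this subtraction isn't possible. The simulation only shows the pattern a reflective model assumes: correlated indicators, uncorrelated leftovers. A formative construct makes no such claim about why the indicators correlate.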

You could reconceptualize your use of the factor model. For instance, the latent factor that relates your observed variables appears to be connectedness in some fashion: car ownership, long commutes and large distances to food would indicate not being connected to a local community, where getting by without a car and walking to your job and food sources indicate high connectedness to a very very local community (interactions between income and urban/rural-ness aside). You'd have to rethink how you formulate hypotheses under this variable. Alternatively, discriminant function analysis and canonical correlations are older but still useful techniques for combining multivariate predictors of outcomes. Ultimately, you have to decide whether or not you're defining a theoretical latent variable or just trying to do some data reduction.

Wow thank you, this formative vs. reflective issue totally evaded me in my initial research. You'd think it might have been mentioned in something because there seems to be a lot written about it if you search for it, but it didn't come up in any of the things I read when trying to understand latent variables.

From what I have read so far, it appears to me that a formative model may be appropriate for what I am trying to do, but I am not getting intuitive results when I employ it in practice (the coefficients I was getting with a reflective construct were more what I expected than what I seem to be getting with a formative construct). I will keep researching this and thanks again for all the help.