Hi Mike,

thanks for this lovely function. It has already proven very useful. I suggest a small modification that makes it work on models with a single latent factor, too. The modification is only in the call to fscorereg: when filtering the latents and manifests from A and S, we need drop=FALSE so that one-column or one-row selections stay matrices. See below:

library(OpenMx)  # provides mxEval
fscoreregm <- function(model, data=NA){
    if(any(is.na(data)) && model@data@type != "raw"){
        stop("Giiiiiiiiiiiiirl, no you di'n't!\n",
             "Your model should have raw data or you should give\n",
             "this function a separate raw data matrix with no\n",
             "missing values")
    }
    A <- mxEval(A, model)   # asymmetric paths (loadings)
    S <- mxEval(S, model)   # symmetric paths (variances and covariances)
    F <- mxEval(F, model)   # filter matrix; this masks F as shorthand for FALSE
    lv <- model@latentVars
    mv <- model@manifestVars
    # fscorereg expects variables in rows, so transpose the data
    if(any(is.na(data))){D <- t(model@data@observed)}else{D <- t(data)}
    # drop=FALSE keeps single-factor selections as matrices
    f <- fscorereg(A[mv, lv, drop=FALSE], S[lv, lv, drop=FALSE], S[mv, mv, drop=FALSE], D)
    return(t(f))
}

cheers,
Andreas

Note: This does not work if you write drop=F, which I had tried first, because inside the function F is in fact a matrix. Uh oh.
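For anyone wondering why both details matter, here is a small base-R illustration. It uses a made-up 4 x 4 matrix rather than a real OpenMx model; the names A, lv and mv only mirror the wrapper, and the values are arbitrary.

```r
# Toy stand-in for the RAM A matrix: three manifests (x1..x3), one latent (G).
vars <- c("x1", "x2", "x3", "G")
A <- matrix(0, 4, 4, dimnames = list(vars, vars))
A[c("x1", "x2", "x3"), "G"] <- c(0.8, 0.7, 0.6)
mv <- c("x1", "x2", "x3")
lv <- "G"

# With a single latent, default indexing drops the result to a plain vector,
# so the selection is no longer a 3 x 1 loading matrix.
dropped <- A[mv, lv]
kept    <- A[mv, lv, drop = FALSE]
is.matrix(dropped)   # FALSE
dim(kept)            # 3 1

# F is only a variable that starts out bound to FALSE. Once it is reassigned
# (here to a stand-in for the filter matrix), it no longer means FALSE.
F <- diag(3)
identical(F, FALSE)  # FALSE
```

Spelling out FALSE avoids the problem entirely, since FALSE is a reserved word and cannot be reassigned.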

I've been computing latent factor scores manually. It would be nice if they were automatically computed.

To my knowledge, OpenMx does not calculate factor scores. One possible reason is that there are many different ways of calculating them, and the different methods do not agree with one another. Unfortunately, factor scores are not uniquely determined by a model. Would you like maximum-likelihood-based factor scores? These require iterative estimation. Would you like Bartlett, Thomson, or regression-based factor scores? Or perhaps empirical Bayes estimates of factor scores via an EM algorithm? These methods all give factor scores that could be useful, but each has its own subtleties.

What method of factor score estimation would most interest you, and would you like some helper functions to estimate them?

Hope this helps!

Hi Michael,

A helper would be fantastic: I have never had to do this myself, and don't know how.

From reading around, regression-based methods are the most common, and seem to be the same as the method described by Thomson (1951), with Bartlett (1938) being a linear transform of Thomson. So a regression-based helper would be good. That's also what M+ returns.

It sounds like maximum likelihood would be more robust. My present need is for factor scores for an ordinal model, and it seems that Bayesian posterior methods are typically required for this, but if ML is robust across data types, estimating those would be my preference.

best, tim

Lawley and Maxwell, 1962. Factor Analysis as a Statistical Method. Journal of the Royal Statistical Society. Series D (The Statistician), Vol. 12, No. 3, Factor Analysis (1962), pp. 209-229

PS: The system of previewing, then submitting, then being challenged with a captcha and THEN having to push submit again is irritating… I thought I had submitted this earlier today, but I had only submitted it, responded to the captcha, but not submitted it again :-). Somehow, despite this most of my spam is from people signing up to do just this here ... Oh well :-)

Hi Tim,

A very basic function to calculate regression-based factor scores would be

#----------------------------------------------
# Define Regression Factor Score Calculator
fscorereg <- function(Lam, Phi, Psi, Dat){
    Sig <- Lam %*% Phi %*% t(Lam) + Psi
    return(Phi %*% t(Lam) %*% solve(Sig) %*% Dat)
}
#----------------------------------------------

where Lam is the factor loading matrix, Phi is the factor intercorrelation (covariance) matrix, Psi is the manifest residual covariance matrix, and Dat is the data matrix with variables in rows and cases in columns. This requires the user to give the function the correct matrices. I've attached a file that puts a light wrapper around the above function so you can give it a model and possibly some data and it will calculate factor scores. This function would be quite easy to misuse by giving it any model other than a factor model, so I'm not sure it should be added to OpenMx any time soon.
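As a rough usage sketch, here is the function applied to simulated one-factor data. The loadings, sample size and seed are made up for illustration. The data are mean-centered before scoring, on the assumption that the formula is meant for deviations from the means.

```r
# Regression-based factor scores (same formula as fscorereg),
# repeated here so the example runs on its own.
fscorereg <- function(Lam, Phi, Psi, Dat) {
  Sig <- Lam %*% Phi %*% t(Lam) + Psi
  Phi %*% t(Lam) %*% solve(Sig) %*% Dat
}

set.seed(1)
n   <- 500
lam <- c(0.8, 0.7, 0.6, 0.5)            # made-up loadings: 4 manifests, 1 factor
Lam <- matrix(lam, ncol = 1)
Phi <- matrix(1)                        # factor variance
Psi <- diag(1 - lam^2)                  # residual variances

# Simulate data from the model: rows = people, columns = manifests.
eta <- rnorm(n)
Y   <- outer(eta, lam) + matrix(rnorm(n * 4), n, 4) %*% sqrt(Psi)

# fscorereg wants variables in rows; center each variable first.
Dat    <- t(scale(Y, center = TRUE, scale = FALSE))
scores <- t(fscorereg(Lam, Phi, Psi, Dat))   # n x 1 matrix of scores

dim(scores)
cor(scores[, 1], eta)   # how well the scores track the simulated factor
```

In real use Lam, Phi and Psi would be the estimates from a fitted model, not the known population values used here.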

ML-based factor scores and Bayesian estimation techniques would involve more effort than I'm currently interested in putting forth. But I agree that eventually, it would be nice to have them be part of OpenMx! Any takers?

Cheers,

Mike Hunter

This is a nice start, Mike, and will work well with the LISREL objective.

From a recent simulation, I have a few functions for Bartlett and iterative ML factor scores lying around that will one day end up on the project. They work for single factor models of a specific form, and I'll have to clean them up and document them before sharing. There are a (large) number of issues with factor scoring that will have to be dealt with to make a usable function. Here goes.

General conceptual problems:

-Factor scores aren't part of the factor model. This is not specific to any of the comments above, but it should be said that factor scores come from a second estimation, and are not part of nor necessary for the fitting of a factor model. EFA defines the axes that govern the common factors, while CFA uses latent variables to parsimoniously define expected covariance. Factor scores are a second and separate step. /soapbox

-Factor score estimators have various good and bad properties, which is why we have a bunch of methods. You generally find shrinkage with most estimators, and can find biased factor means and covariances. There's a Psychometrika paper by Tucker ('71 I think) that compares Bartlett, Thomson, Anderson & Rubin and others.
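To make the shrinkage point concrete, here is a small sketch comparing the model-implied variance of regression (Thomson) and Bartlett scores for a made-up one-factor model. The Bartlett weights use the usual weighted-least-squares form, and the loadings are arbitrary.

```r
lam <- c(0.8, 0.7, 0.6, 0.5)            # made-up loadings
Lam <- matrix(lam, ncol = 1)
Phi <- matrix(1)                        # true factor variance is 1
Psi <- diag(1 - lam^2)
Sig <- Lam %*% Phi %*% t(Lam) + Psi

# Weight matrices: scores = W %*% (centered data, variables in rows).
W_reg  <- Phi %*% t(Lam) %*% solve(Sig)
W_bart <- solve(t(Lam) %*% solve(Psi) %*% Lam) %*% t(Lam) %*% solve(Psi)

# Model-implied variance of each set of scores: W Sig W'.
var_reg  <- drop(W_reg  %*% Sig %*% t(W_reg))
var_bart <- drop(W_bart %*% Sig %*% t(W_bart))
c(true = 1, regression = var_reg, bartlett = var_bart)

# With one factor the two sets of weights differ only by a constant.
ratio <- as.vector(W_bart / W_reg)
range(ratio)
```

Under these assumptions the regression scores have less variance than the factor and the Bartlett scores have more. This is one reason the estimators can't simply be swapped for one another in follow-up analyses.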

Questions and Issues with a helper function:

-First, how customizable should it be? Doing a helper for LISREL models is fairly easy. RAM is almost as easy, and supporting arbitrary models (i.e., mxFIMLObjective) is much harder. I only ask because there is a trade-off between flexibility and usability/parsimony.

-One of the difficulties with a RAM version is that not every circle on the path diagram should yield a factor score. Latent residual variance terms are one example: including latent residuals means that there are more factor scores than observed variables.

-Ordinal data is a complex case. You're restricted to either expected posterior or iterative ML methods. Data sorting is important, so that we don't waste time re-estimating scores for the same data pattern over and over.
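A minimal sketch of the data-sorting idea: score each unique response pattern once, then copy the result to everyone who shares it. The data and the score_one function are hypothetical placeholders; a real estimator would replace the row mean.

```r
# Hypothetical ordinal data: 6 people, 3 items, several repeated patterns.
Y <- rbind(c(1, 2, 2),
           c(3, 3, 2),
           c(1, 2, 2),
           c(2, 2, 2),
           c(3, 3, 2),
           c(1, 2, 2))

# Placeholder for an expensive per-pattern estimator (here just the row mean).
score_one <- function(y) mean(y)

# Collapse each row to a key, score each unique pattern once, then expand.
key       <- apply(Y, 1, paste, collapse = "_")
first     <- !duplicated(key)
patterns  <- Y[first, , drop = FALSE]
pat_score <- apply(patterns, 1, score_one)
names(pat_score) <- key[first]
scores    <- unname(pat_score[key])

nrow(patterns)   # number of estimations actually run
scores           # one score per person, in the original row order
```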

Mike, I have some code for the ML version that I was planning on cleaning up while the aforementioned simulation trickles through the review cycle. I can take the lead on that.

Thanks guys.

I think that being able to output factor scores is a very valuable core functionality which most users would expect in an SEM package, so this will be very helpful for the project.

I think that getting something going for RAM would be great, as that is how the majority of factor models will be implemented (and currently the only path interface).

When LISREL syntax comes along, then that would be nice to support.

mxFIMLObjective support might be treated the way diagramming has been for matrix models, i.e., left out :-)

"One of the difficulties with a RAM version is that not every circle on the path diagram should yield a factor score. Latent residual variance terms are one example: including latent residuals means that there are more factor scores than observed variables."

I think supporting Ordinal and mixed data types would be a big plus.

If you've made progress on an iterative ML function, then for my money, getting a version of that going, or even debugging it out here in the wild, would be most valuable, as it solves the real needs of most users.

Data sorting for efficiency, skipping data patterns that have already been estimated, sounds like it would yield good speed-ups, but I guess this has also been solved in the model estimation code: can it be borrowed from there?

ML estimation of factor scores with ordinal data is usually pretty rapid, because the only parameters to estimate are the factor scores of the individuals, and the individuals in a sample are independent of each other. Thus with a sample size of 1000 and a 3 factor model we'd fit 1000 3-parameter models, each of which would have only 1 data vector. While speed-ups would occur if the data vectors are identical, this may be rare if continuous definition variables have been used to deal with measurement invariance.
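To illustrate the "one tiny model per person" idea, here is a sketch for a made-up one-factor ordinal model with known parameters. The loadings, thresholds and search interval are all assumptions for the example, and this is plain base R, not OpenMx code. Given the factor score, the items are independent, so each person's likelihood is a product of category probabilities.

```r
# Made-up one-factor ordinal model: 4 items, 3 categories each.
lam <- c(0.7, 0.7, 0.7, 0.7)        # loadings on the latent response scale
psi <- 1 - lam^2                    # residual variances
tau <- c(-Inf, -0.5, 0.5, Inf)      # thresholds, shared by all items

# Log-likelihood of one response vector y (categories 1..3) given score eta.
loglik_eta <- function(eta, y) {
  upper <- pnorm((tau[y + 1] - lam * eta) / sqrt(psi))
  lower <- pnorm((tau[y]     - lam * eta) / sqrt(psi))
  sum(log(upper - lower))
}

# One small one-parameter optimisation per person.
# The interval c(-4, 4) is an arbitrary bound on the search.
ml_score <- function(y) {
  optimize(loglik_eta, interval = c(-4, 4), y = y, maximum = TRUE)$maximum
}

Y <- rbind(c(2, 2, 2, 2),
           c(3, 3, 2, 2),
           c(1, 1, 2, 2))
f_hat <- apply(Y, 1, ml_score)
f_hat
```

One caveat of pure ML here: a person who answers in the top (or bottom) category on every item has no finite estimate, so the optimiser just runs to the edge of the interval. With more than one factor, optimize would be replaced by a multi-parameter optimiser and the residuals would need the same conditional-independence assumption.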

Ryne & I have been working on factor scores for a while. Note that with missing data it gets quite a bit more complicated. Estimation is ok, but interpretation is a bit odd, with different shrinkage depending on how many & which items are missing. We have a paper about this under construction.

Just had to get this done using another program…

They resample the distribution of factor scores for each subject and return the posterior mode.

Agreed, a Bayesian MCMC approach is in principle better here. A possible equivalent in ML territory would be to bootstrap estimate model parameters, and then estimate factor scores for each bootstrap solution. Then the distribution of the factor score estimates should match that of the MCMC series. I've not tried this but it seems like the right way to go.
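A rough sketch of the bootstrap idea, using base R's factanal as a stand-in for an OpenMx model. That swap, the simulated data and the number of replicates are all assumptions for illustration. Each bootstrap sample gives a new set of parameter estimates, and the same person is re-scored under each one.

```r
set.seed(42)
n        <- 300
lam_true <- c(0.8, 0.7, 0.6, 0.5, 0.4)       # made-up population loadings
eta      <- rnorm(n)
Y <- outer(eta, lam_true) +
     sapply(sqrt(1 - lam_true^2), function(s) rnorm(n, sd = s))

# Regression score for one data row under a fitted one-factor model.
score_row <- function(fit, row, center, scl) {
  L <- matrix(unclass(fit$loadings), ncol = 1)
  if (sum(L) < 0) L <- -L                    # resolve the sign indeterminacy
  Sig <- L %*% t(L) + diag(fit$uniquenesses)
  z   <- (row - center) / scl                # factanal fits standardized data
  drop(t(L) %*% solve(Sig) %*% z)
}

B <- 200
boot_scores <- replicate(B, {
  idx <- sample(n, replace = TRUE)           # resample people with replacement
  Yb  <- Y[idx, ]
  fit <- factanal(Yb, factors = 1)           # re-estimate the model parameters
  score_row(fit, Y[1, ], colMeans(Yb), apply(Yb, 2, sd))
})

# Spread of person 1's score that is due to parameter uncertainty alone.
quantile(boot_scores, c(0.025, 0.975))
```

Note that this captures only the uncertainty coming from the parameter estimates. The uncertainty of a score given fixed parameters would have to be added on top to get something comparable to an MCMC posterior.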

A likelihood-based CI of a single factor score estimated from the un-bootstrapped parameter estimates would underestimate the true CI, unless the sample size was so huge as to make the error on the parameter estimates negligible.

Just FYI Ryne, the wrapper that I wrote takes a RAM specified model, assumes it is a factor model and calculates Bartlett factor scores accordingly.

Cheers,

Mike