I wanted to check whether anyone in the community has experience using SEM for predictive modeling. If so, how does one approach it? Thanks in advance for your help!
We use Stata's gsem for prediction.
The R package lavaan also has a prediction function, lavPredict(), but it is not well documented. It offers two prediction types, 'lv' and 'ov', for latent variables and observed variables respectively.
I could not find any resources on predictive modeling in OpenMx.
The lavaan prediction function lavPredict() does not work correctly for observed variables. For example, if you fit a simple regression model and try to predict the response, it returns the observed response instead of the predicted response. This is noted in the help page, but it is easy to miss.
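To make the distinction concrete, here is a minimal Python sketch (not lavaan code) of what a predicted response means in a simple regression: fit y = a + b*x by ordinary least squares and compare the fitted values with the observed y. The toy data are made up purely for illustration.

```python
# Toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]


def ols_fit(x, y):
    """Closed-form ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = ols_fit(x, y)
# Predicted (fitted) responses: what a prediction function should return.
fitted = [a + b * xi for xi in x]
# Residuals: the gap between each observed response and its prediction.
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print(a, b)
print(fitted)
```

Here the intercept is 0.5 and the slope 1.4, so the fitted values are roughly 1.9, 3.3, 4.7 and 6.1, which differ from the observed 2, 3, 5 and 6. Returning y itself would make every residual zero.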
Does anyone know how to get OpenMx to produce predictions for endogenous variables? That would be very useful.
This isn't an implemented feature, but to the extent your data are complete, you can do it with mxEval(), definition variables, or repeated model evaluations as in the Estabrook & Neale factor-scoring method. The predicted value of y should be the model-implied mean of y given that row of data. If you have a vector or matrix of regression weights B, the predicted value of y is simply X %*% B. For more complex models, you would have to cycle through the data one row at a time, replacing x with the vector of responses for a given individual.
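For the linear case, here is a minimal Python sketch of the row-at-a-time idea: a plain stand-in for the R expression X %*% B, not an OpenMx function. It assumes X already contains a leading column of 1s for the intercept and that B has one row per predictor and one column per outcome; the function name predict_rows and the numbers are hypothetical.

```python
def predict_rows(X, B):
    """Compute X %*% B one row (one individual) at a time.

    X: list of rows; each row holds one individual's predictor values,
       with a leading 1.0 for the intercept (assumed layout).
    B: regression weights as a list of rows, one row per predictor and
       one column per outcome variable.
    """
    n_pred = len(B)
    predictions = []
    for row in X:
        if len(row) != n_pred:
            raise ValueError("each row of X needs one value per row of B")
        # zip(*B) walks the columns of B; column j holds the weights for outcome j.
        predictions.append([sum(x * b for x, b in zip(row, col)) for col in zip(*B)])
    return predictions


# Hypothetical weights: intercept 0.5 and slope 2.0 for a single outcome.
X = [[1.0, 2.0],
     [1.0, 3.0]]
B = [[0.5],
     [2.0]]
preds = predict_rows(X, B)
print(preds)  # [[4.5], [6.5]]
```

A single matrix product would be faster, but the explicit loop keeps each individual's computation separate, which is the natural place to substitute per-row values when a more complex model has to be evaluated one row at a time.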
The developers recently discussed implementing this feature (more specifically, obtaining residuals), but the discussion either stalled or we agreed to disagree about how OpenMx should decide which variables are exogenous and which are endogenous. For that general question and for your specific one, can you tell us more about what output you are looking for?
To my knowledge, OpenMx does not have anything like this yet, but because it has come up on the forums a couple of times now, I've been thinking about it. I'm not sure how latent variable predictions would differ from factor scores, which should already be obtainable for RAM models with the mxFactorScores() function.
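For reference, one common kind of factor score is the regression (Thurstone) score, eta_hat = psi * lambda' * Sigma^-1 * (x - mu), with Sigma = psi * lambda * lambda' + Theta. The Python sketch below is not OpenMx code and is not claimed to match what mxFactorScores() computes; it handles only the simplest case, a single factor with variance psi and uncorrelated residual variances, where the Sherman-Morrison identity reduces the matrix inverse to two sums. The function name and parameter values are hypothetical.

```python
def regression_factor_score(x, mu, loadings, resid_var, psi):
    """Regression (Thurstone) factor score for a one-factor model.

    Model: x = mu + loadings * eta + e, Var(eta) = psi,
    Var(e) = diag(resid_var) (residuals assumed uncorrelated).
    Uses lambda' Sigma^-1 = lambda' Theta^-1 / (1 + psi * lambda' Theta^-1 lambda),
    which follows from the Sherman-Morrison identity.
    """
    # lambda' Theta^-1 (x - mu)
    weighted_dev = sum(l * (xi - m) / t
                       for xi, m, l, t in zip(x, mu, loadings, resid_var))
    # lambda' Theta^-1 lambda
    info = sum(l * l / t for l, t in zip(loadings, resid_var))
    return psi * weighted_dev / (1.0 + psi * info)


# Hypothetical two-indicator example with unit loadings and variances.
score = regression_factor_score(x=[1.0, 1.0], mu=[0.0, 0.0],
                                loadings=[1.0, 1.0], resid_var=[1.0, 1.0],
                                psi=1.0)
print(score)  # about 0.667 (2/3)
```

The score is shrunk toward the latent mean (2/3 rather than 1 here), which is a known property of regression factor scores.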
Similarly, there should be a way to obtain observed-variable predictions as conditional mean estimates given the latent variable predictions.
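Under a linear measurement model, one plausible reading of this is x_hat = nu + Lambda * eta_hat: the model-implied mean of each observed variable given the predicted latent values. A minimal Python sketch, assuming intercepts nu, a loading matrix Lambda (one row per observed variable, one column per factor) and eta_hat from some factor-scoring method; the function name and numbers are hypothetical.

```python
def conditional_means(nu, Lambda, eta_hat):
    """Model-implied means of the observed variables given latent values.

    nu:      intercepts, one per observed variable
    Lambda:  loadings, one row per observed variable, one column per factor
    eta_hat: predicted latent values (e.g. factor scores), one per factor
    Returns nu + Lambda %*% eta_hat as a list.
    """
    if any(len(row) != len(eta_hat) for row in Lambda):
        raise ValueError("each row of Lambda needs one loading per factor")
    return [n + sum(l * e for l, e in zip(row, eta_hat))
            for n, row in zip(nu, Lambda)]


# Hypothetical one-factor model with three indicators.
nu = [0.0, 1.0, 2.0]
Lambda = [[1.0], [0.5], [1.5]]
x_hat = conditional_means(nu, Lambda, eta_hat=[2.0])
print(x_hat)  # [2.0, 2.0, 5.0]
```

Given these conditional means, x - x_hat is one possible definition of observed-variable residuals, which ties back to the residuals question raised earlier in the thread.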
What I'm struggling with is what exactly these predictions are. Does anyone have a reference?
Why would predictions be distinct from the factor scores...?