Categorical and Continuous Data

spanosal
Joined: 06/02/2010

I have a basic question about running models in OpenMx. I am interested in running a trivariate Cholesky using two categorical variables (i.e., the presence or absence of a diagnosis) and one continuous variable (personality). Is this possible? If so, do I need to change my basic trivariate script (which runs when tested with all continuous variables) in some way?

Thank you!

neale
Joined: 07/31/2009
It is possible, but not convenient with the current version. The joint analysis of ordinal and continuous variables is very high - perhaps highest - on the wish list of features for future releases of OpenMx. Once that is implemented it will be relatively straightforward.

So how is it possible in the absence of an explicit feature? I have run such analyses with classic Mx, although only in the bivariate case with one binary and one continuous measure. It is limited in that complete data are required for the continuous measure. The trick is to make use of the product rule for conditional probability, the identity that underlies Bayes' Theorem. We want the joint likelihood of the continuous and ordinal measures, and the likelihood is essentially a probability conditional on the parameters of the model. The product rule states:

p(A & B) = p(A|B) p(B).

Let p(A) be the likelihood of the binary variables and p(B) be the likelihood of the continuous measures. We can write the joint likelihood as the likelihood of the ordinal variables conditional on the values of the continuous measures, p(A|B), multiplied by the likelihood of the continuous variables, p(B). If we read in the continuous variables as definition variables, then for each individual's likelihood we can compute the conditional distribution of the ordinal variables, which can be obtained from a simplified application of the Pearson-Aitken selection formula. This is not really any more complicated than a simple regression model. The conditional distribution changes in its means (and hence thresholds) and in its covariance (the variances and covariances are reduced according to the size of the covariance between the continuous and binary variables). The guts of the classic Mx script for this purpose look like this:

Begin Matrices;
M Full 1 1 Free ! Means of continuous variables
W Full 1 2 ! Continuous variables, as definition variables
K Unit maxcat 1 ! To expand means to equal threshold formula (maxcat rows)
End Matrices;

Begin Algebra;
P = F&S; ! Cov of continuous vars
Q = F*S*J'; ! Cov between cont & ords
R = J&S; ! Cov of ordinal vars
End Algebra;
Thresholds L*(T|T) - K@ (Q*P~*(W'-(M_M)))'; ! model for thresholds conditional on continuous variables
Covariances R - Q&(P~) ; ! model for MZ variance/covariances of ordinal variable conditional on continuous
! = Cyy - [Cyx (Vx^-1 - Vx^-1 newVx Vx^-1) Cxy] but newVx is null (zeroes); ^-1 denotes an inverse
Weight \pdfnor(W_M|M_P) ; ! model for MZ's continuous variables, must be complete data
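
For readers who prefer formulas to script, the conditioning step can be written out as follows. This is only a sketch in notation borrowed from the comment in the script (Cyy, Cyx, Vx); the symbols x (continuous variables), y* (the liabilities underlying the ordinal variables), tau (thresholds) and mu_x (continuous means) are labels assumed here for illustration, and the liabilities are assumed to have mean zero, so that shifting their conditional mean up is the same as shifting the thresholds down.

```latex
\begin{align}
  p(y, x) &= p(y \mid x)\, p(x), \\
  \mathrm{E}[y^{*} \mid x] &= C_{yx} V_x^{-1} (x - \mu_x), \\
  \mathrm{Cov}[y^{*} \mid x] &= C_{yy} - C_{yx} V_x^{-1} C_{xy}, \\
  \tau^{*} &= \tau - C_{yx} V_x^{-1} (x - \mu_x).
\end{align}
```

The last two lines correspond to the Covariances and Thresholds statements in the script, and p(x) corresponds to the Weight statement.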

Possibly this is enough for someone, perhaps you, to hack together an OpenMx script for this purpose. I attach the full classic Mx script.
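
As a rough illustration of the conditioning trick described above, here is a minimal sketch in plain Python (not OpenMx or classic Mx code) for the simplest case: one binary variable and one continuous variable for a single individual. The function and parameter names (`joint_likelihood`, `tau`, `cov_xy`, and so on) are hypothetical, and the sketch assumes a liability-threshold model in which the binary variable is 1 when a normally distributed liability with mean zero exceeds `tau`.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def joint_likelihood(y, x, tau, mu_x, var_x, cov_xy, var_liab=1.0):
    """Likelihood of binary y (0 or 1) and continuous x: p(y | x) * p(x).

    Assumption: y is 1 when a zero-mean normal liability exceeds tau.
    """
    # Conditional mean and variance of the liability given the observed x
    # (the regression-style result of the selection formula).
    cond_mean = cov_xy / var_x * (x - mu_x)
    cond_var = var_liab - cov_xy ** 2 / var_x
    if cond_var <= 0.0:
        raise ValueError("covariance is too large for the given variances")

    # Probability that the liability falls below the threshold, given x.
    p_below = normal_cdf((tau - cond_mean) / math.sqrt(cond_var))
    p_y_given_x = p_below if y == 0 else 1.0 - p_below

    # Multiply by the likelihood of the continuous variable, p(x).
    return p_y_given_x * normal_pdf(x, mu_x, var_x)


if __name__ == "__main__":
    # Arbitrary illustrative values, not estimates from any data set.
    print(joint_likelihood(y=1, x=1.3, tau=0.5, mu_x=1.0, var_x=2.0, cov_xy=0.6))
```

Summing the returned value over y = 0 and y = 1 gives back p(x), which is a quick check that the conditional part behaves like a proper probability. A full model would multiply (or sum the logs of) such terms over individuals and, for twin pairs, replace the univariate normal integral with a multivariate one.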

Attachment (size: 4.22 KB)
spanosal
Joined: 06/02/2010
Thank you so much for your help Dr. Neale. I wonder if you have a guess as to when this type of model may be available in OpenMx, as I have some time before I need to run them?

Thanks again,

neale
Joined: 07/31/2009
This mostly depends on how busy Tim Brick is over the next few months. Prior to the release of 1.0 (Aug 1) most effort will be on performance and robustness of the existing feature set. After that, it's a pretty high priority item, especially judging from the poll on the OpenMx homepage. But, it isn't easy - lots of housekeeping to get everything in the right place at the right time to get the conditional distribution sorted out. Hopefully this fall will see it in beta.