Fri, 02/08/2013 - 10:03

Hi,

I am currently working on a saturated model with 6 groups (mzm, dzm, mzf, dzf, dosfm, dosmf) and a dichotomous outcome variable (with one threshold). See part of the saturated model below.

I have included one dichotomous definition variable in this model. My first attempt failed, and after reading one of the previous threads I figured out that this is because definition variables cannot have missing values in OpenMx.

In my case, is there any way I can work around this without deleting all subjects who have missing values for the definition variable? I only need to take into account the one definition variable.

Thank you in advance!

Jorien

multiTwinSatModel <- mxModel("multiTwinSat",
    mxModel("MZM",
        mxMatrix(type="Stand", nrow=nvar, ncol=nvar, free=TRUE, values=0.02830624, name="expCovMZM"),
        mxMatrix(type="Zero", nrow=1, ncol=nvar, name="expMeanMZM"),
        mxMatrix(type="Full", nrow=nthresh, ncol=nvar, free=TRUE, labels=c("thrMZMt1", "thrMZMt2"), values=thValues, lbound=thLBound, name="ThreMZM"),
        # Matrix with beta effect of definition variables on threshold
        mxMatrix(type="Full", nrow=1, ncol=ndef, free=TRUE, values=-0.01, label="b1", name="b"),
        # Matrices for definition variables
        mxMatrix(type="Full", nrow=ndef, ncol=nvar, free=FALSE, labels=c("data.definition_var_1", "data.definition_var_2"), name="Def"),
        # Matrices to calculate effect on threshold (beta * definition variable)
        mxAlgebra(expression= b %*% Def, name="DefR"),
        # Matrix algebra to specify expected thresholds model
        mxAlgebra(expression= ThreMZM + DefR, name="expThreMZM"),
        mxMatrix(type="Iden", nrow=nvar, ncol=nvar, name="I"),
        mxAlgebra(solve(sqrt(I*expCovMZM)), name="iSDmzm"),
        mxAlgebra(iSDmzm%*%expCovMZM%*%iSDmzm, name="expCorMZM"),
        mxData(observed=mzmData, type="raw"),
        mxFIMLObjective(covariance="expCovMZM", means="expMeanMZM", dimnames=selVars, thresholds="expThreMZM")
    ),

OpenMx has no way of handling missing values in definition variables. The problem is that there are just too many possible ways to use a definition variable for us to try to guess how you would want a missing value handled. With that being said, you still have several good options.

First, you could try to restructure your model without definition variables. Some uses of definition variables can be reduced to multigroup models with a small number of groups: one corresponding to each value of the definition variable.
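As a rough sketch of that first option, the data for one zygosity group could be split by the observed combination of the two twins' definition variable values, with missing kept as its own category. The toy data and the outcome column names `y1` and `y2` are made up for illustration; only `definition_var_1` and `definition_var_2` come from the posted model. How the resulting subsets map onto separate mxModel groups (each with its own threshold matrix) is left open here.

```r
# Toy wide-format data for one zygosity group (hypothetical values).
mzmData <- data.frame(
  y1 = c(0, 1, NA, 1, 0),
  y2 = c(1, NA, 0, 0, 1),
  definition_var_1 = c(0, 1, NA, 1, 0),
  definition_var_2 = c(1, NA, 0, 1, 1)
)

# Build a group key from both twins' definition variables.
# paste() turns NA into the string "NA", so missing values form their own category.
key <- paste(mzmData$definition_var_1, mzmData$definition_var_2, sep = "_")

# One data frame per observed combination; each could feed its own group.
groups <- split(mzmData, key)
print(sapply(groups, nrow))
```

With a dichotomous definition variable this gives at most nine subsets per zygosity group, which may or may not be practical with six zygosity groups.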

Second (and, reading between the lines of your code, maybe a better option), you could fill in the missing values of the definition variable with a value that handles the missingness the way you want. Candidates for the fill-in values are probably 0 and 1. This really depends on your model. What does it mean when your definition variable is missing? How do you want the model to reflect the lack of this information? Again, intuiting from the code you provide, it looks like DefR is supposed to have an additive effect on ThreMZM. That is, you create the expected threshold for the MZM group by adding ThreMZM and DefR. Replacing missing values of your definition variable with zeros would leave the expected threshold unchanged for rows with missing definition variables.
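A minimal sketch of the zero fill-in in base R, assuming wide-format data with one definition variable column per twin. The toy values and the outcome columns `y1` and `y2` are hypothetical. Note that for a dichotomous definition variable, filling with 0 amounts to treating those twins as belonging to the 0 category, which is a substantive assumption.

```r
# Toy wide-format data for one zygosity group (hypothetical values).
mzmData <- data.frame(
  y1 = c(0, 1, NA, 1),
  y2 = c(1, NA, 0, 0),
  definition_var_1 = c(0, 1, NA, 1),
  definition_var_2 = c(1, NA, 0, NA)
)
defVars <- c("definition_var_1", "definition_var_2")

# Replace missing definition variables with 0, so that b %*% Def adds
# nothing to the threshold for those twins.
filled <- mzmData
for (v in defVars) {
  filled[[v]][is.na(filled[[v]])] <- 0
}

print(filled)
```

The filled data frame would then be passed to mxData in place of the original one.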

Third, you could use listwise deletion to drop all rows of data with a missing definition variable.
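For completeness, listwise deletion on the definition variables alone might look like the sketch below. The toy data and the outcome columns `y1` and `y2` are hypothetical. Because the data are in wide format, a row is a twin pair, so a missing definition variable for either twin drops the whole pair.

```r
# Toy wide-format data for one zygosity group (hypothetical values).
mzmData <- data.frame(
  y1 = c(0, 1, NA, 1),
  y2 = c(1, NA, 0, 0),
  definition_var_1 = c(0, 1, NA, 1),
  definition_var_2 = c(1, NA, 0, 0)
)
defVars <- c("definition_var_1", "definition_var_2")

# Keep only pairs where both definition variables are observed.
# Missing outcomes are left alone; FIML can still handle those.
keep <- complete.cases(mzmData[, defVars])
mzmComplete <- mzmData[keep, ]

print(mzmComplete)
```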

Do any of these options sound like the solution you're looking for?

Thank you for the quick response,

I will definitely try out your suggestions.

Best,

Jorien

Sometimes the definition variable is only missing when the data it affects are also missing. Say, for example, we have the age at measurement and a variable of interest (Y) for twin 1, but Y is missing for twin 2, as is their age at measurement. The model simply says that Y should be regressed on the age at measurement. In that case, twin 2's age at measurement is irrelevant, though we may want to keep twin 1 in the sample. Here it would make sense to set the definition variable to some ridiculously large number like 999999. The good thing about this approach is that if you actually have data for twin 2 but mistakenly marked their age as 999999, the model would likely yield silly results or not run at all.
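A sketch of that placeholder approach in base R, with a check that each definition variable is only missing where the same twin's outcome is missing too. The toy data and the outcome columns `y1` and `y2` are hypothetical; 999999 is the placeholder suggested above.

```r
# Toy wide-format data: each twin's definition variable is missing
# only where that twin's outcome is missing (hypothetical values).
mzmData <- data.frame(
  y1 = c(0, 1, NA, 1),
  y2 = c(1, NA, 0, NA),
  definition_var_1 = c(0, 1, NA, 1),
  definition_var_2 = c(1, NA, 0, NA)
)

sentinel <- 999999
filled <- mzmData
for (i in 1:2) {
  y <- paste0("y", i)
  d <- paste0("definition_var_", i)
  defMissing <- is.na(filled[[d]])
  # Stop if a definition variable is missing while the outcome is observed;
  # in that case the placeholder would distort that twin's threshold.
  stopifnot(all(is.na(filled[[y]][defMissing])))
  filled[[d]][defMissing] <- sentinel
}

print(filled)
```

The explicit check is there so that a twin with an observed outcome never silently receives the placeholder.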

Finally, there are other things one could do with a missing definition variable, such as integrating over its distribution in the data, or over its conditional distribution given other variables measured on the same data vector. Such options are really beyond the scope of a forum post, but worthy of investigation if that is your goal.

Actually, that is exactly the case for my data. The definition variable is only missing when the outcome variable for that twin is also missing. So I think I will try this approach as well.

Thanks!