Mon, 04/02/2012 - 09:27

Suppose I have data on n subjects with at most 3 time points. The first subject has responses x1, x2, and x3 at times 0, 1, and 2. The second has responses at times 0 and 3, the third only at time 0, and so forth. Would it make sense to create 3 definition variables, say d1, d2, and d3:

d1  d2  d3
0   1   2
0   3   NA
0   NA  NA

and use data.d1, data.d2, and data.d3 for the slope path labels in a growth model that has a latent intercept and a latent slope?

I created my own data (x1, x2, x3, x4, x5, t1, t2, t3, t4, t5) for measurements and time points and modified the growth curve model example. I replaced the factor loadings with c("data.t1","data.t2","data.t3","data.t4","data.t5"). If I keep the time points 0, 1, 2, 3, 4 as in the original model, the modified code with definition variables yields the same results. If I replace some of the time points with NA's, I get an error message:

Running Linear Growth Curve Model Path Specification

Error: The job for model 'Linear Growth Curve Model Path Specification' exited abnormally with the error message: Expected covariance matrix is not positive-definite in data row 1 at major iteration 0 (minor iteration 1).

In addition: Warning message:

In runHelper(model, frontendStart, intervals, silent, suppressWarnings, :

Not calculating confidence intervals because of error status.

If I have no NA's in the time points but replace some of the x's with NA's, I don't receive this error and the results look OK.

I've attached the modified code.

This is a great problem to be talking about. First, you are absolutely correct that using definition variables to denote individually varying times of observation is a correct and powerful way of attacking this problem. Other software packages use very similar methods, specifically Mplus' TSCORES option, though definition variables are more flexible.

As you've figured out for yourself, NA values for definition variables lead directly to errors. I was under the impression that we were catching this more directly and throwing an error specific to NA definition variables, but I'll look into it. Missing values for the observed data and for definition variables mean very different things. Missing data just means that we have to censor the expected covariance matrix to match the data, so if your expected covariance matrix is for five timepoints and an individual is missing on the last two, we generate the likelihood of that data pattern using only the data they actually have.

Definition variables are quite different, as missing definition variables make it impossible to generate the expected covariance matrix in the first place. As OpenMx doesn't provide a "link" between definition and observed variables, we can't censor in the same way as with observed data.
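To make the contrast concrete, here is a small base R sketch (not OpenMx code). The parameter values, the toy response vector, and the helper name `impliedCov` are all hypothetical, and it only mimics the general idea of building a model-implied covariance from time-valued slope loadings; how OpenMx does this internally is not shown here.

```r
# Linear growth model with hypothetical parameter values.
Psi   <- matrix(c(1.0, 0.2, 0.2, 0.5), 2, 2)  # intercept/slope covariance
theta <- 0.3                                   # residual variance

# Hypothetical helper: implied covariance for one person's observation times.
impliedCov <- function(times) {
  Lambda <- cbind(1, times)                    # loadings: intercept, slope
  Lambda %*% Psi %*% t(Lambda) + diag(theta, length(times))
}

# Case 1: times are known, the last two responses are missing.
y        <- c(9.8, 12.5, 13.9, NA, NA)
Sigma    <- impliedCov(0:4)
obs      <- !is.na(y)
SigmaObs <- Sigma[obs, obs]   # censor to the 3 x 3 block that has data

# Case 2: the last two times are unknown.
SigmaNA <- impliedCov(c(0, 1, 2, NA, NA))
anyNA(SigmaNA)                # NA loadings leak into the covariance matrix
```

In the first case the full matrix exists and can simply be subset. In the second case the matrix cannot be fully computed in the first place, which is the situation described above.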

If you have a missing definition variable that represents an observation time, you're essentially saying "I observed this person, but I don't know when." As such, you can't include that observation in a growth curve without making an assumption about when it occurred. If you're comfortable with that assumption, then make it directly and put in the observation time. If not, then remove the data at those observations and set the definition variables to a constant value like so:
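A minimal sketch of that recoding in base R, assuming a wide data frame with one row per subject, measurement columns x1 to x3, and time columns t1 to t3. The toy values and the object names (`dat`, `clean`) are hypothetical and are not taken from the attached script.

```r
# Toy data (hypothetical): three subjects, up to three occasions each.
dat <- data.frame(
  x1 = c(1.2, 0.8, 1.5),
  x2 = c(2.1, 2.9, NA),
  x3 = c(3.0, NA,  NA),
  t1 = c(0, 0,  0),
  t2 = c(1, 3,  NA),
  t3 = c(2, NA, NA)
)

xCols <- c("x1", "x2", "x3")
tCols <- c("t1", "t2", "t3")

x  <- as.matrix(dat[xCols])
tm <- as.matrix(dat[tCols])

# Wherever the time is unknown, remove the matching measurement ...
unknownTime <- is.na(tm)
x[unknownTime] <- NA

# ... and replace the NA time with an arbitrary constant (0 here), so that
# no definition variable is missing.
tm[unknownTime] <- 0

clean <- data.frame(x, tm)
```

The constant is a placeholder: because the matching measurement is missing, that occasion should not contribute to the subject's likelihood, so the particular value chosen should not matter.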

Ryne wrote:

An error is thrown when NA values for definition variables are encountered. The test script did not have NA values in its definition variables.

`any(is.na(growthCurveModel$data@observed[,c('t1','t2','t3','t4','t5')])) #returns FALSE`

The NA definition variables are added in lines 25 and 26, and are commented out in rabil's script. Uncomment them and the error pops up.

Whoops. You are correct, sir. There was a bug in the handling of NA definition variables. I'm committing a patch.

This should surely not be the cause of the problem, but you probably don't need the five latent variables t1 to t5 that you create in your model.

If I run this script with

, I get this on "1.2.2-1986":

"The job for model 'Linear Growth Curve Model Path Specification' exited abnormally with the error message: Objective function returned a value of NaN."

Could you explain a bit more? How can I represent the sets of times for each subject and at the same time have one row represent each subject? Maybe the regression on time could be represented in a sub-model, but I'm still not clear on how the dataset should be constructed.

Thanks for sharing a better suggestion. In plant pathology we are often interested in studying disease progress over time, where time is modeled as a continuous variable rather than as a discrete variable. Numerous distinct population growth models have been used for modeling disease progress curves.