Wed, 06/06/2012 - 13:00

I've written a simple latent class model (a simplified version of the OpenMx growth mixture model example code). I ran it with 1,000 random starts using the approach shown in the docs and obtained the maximum likelihood estimates. When I use the same starting values that produced the smallest -2LL (with a status code of 0), I get the same estimates, but when I request confidence intervals, the upper and lower bounds equal the estimates. Usually, when I run various OpenMx models, I have few problems estimating the CIs. I have run into this problem (lower bound = upper bound = estimate) before, but I have no idea why it is happening. Here are the first two lines of the CIs:

                 lbound   estimate     ubound
mean_tbut_1  15.0001004  15.000000 15.0001004
mean_sch_1   17.5494846  17.549385 17.5494847

You can see that the upper and lower bounds are approximately equal and differ only slightly from the estimate. The estimates appear very reasonable given the data. Does anyone have an idea of why this is happening?
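For context, I requested the intervals the usual way, along these lines (the parameter labels are the ones from my model, and `model` stands in for the full mixture model):

```r
library(OpenMx)

# Ask for profile-likelihood CIs on the class means, then re-run
# with interval estimation switched on.
model <- mxModel(model, mxCI(c("mean_tbut_1", "mean_sch_1")))
fit   <- mxRun(model, intervals = TRUE)
summary(fit)$CI   # lbound / estimate / ubound table as above
```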

Here are the corresponding estimates and standard errors:

8  mean_tbut_1  Class1.M  1  m_tbut_1  14.9999999  0.004127186
9  mean_sch_1   Class1.M  1  m_sch_1   17.5493846  1.290374369

and the same for the other class:

17 mean_tbut_2  Class2.M  1  m_tbut_2   7.1495123  0.298924928
18 mean_sch_2   Class2.M  1  m_sch_2   11.6191305  0.534036366

The SE is small for mean_tbut_1, but this is the exception. Some of the variance and covariance parameters have NaN.

OK, a big problem seems to have been including covariances between the variables in each cluster. Once I fixed them to zero, the model converged easily to the ML estimates (I used 100 sets of random starts).
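In case it helps anyone else, the fix looked roughly like this (the labels are the covariance labels from my model code; `omxSetParameters` changes them in place):

```r
# Fix the within-class residual covariances to zero rather than
# estimating them freely.
class1 <- omxSetParameters(class1,
                           labels = c("c_tbut_sch_1", "c_tbut_tosm_1",
                                      "c_sch_tosm_1"),
                           free = FALSE, values = 0)
```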

However, I am still having the same problem with the confidence intervals. I've run many (non-mixture) models and have obtained confidence intervals. I've run the growth mixture model (modified to handle irregularly spaced time points for each subject) and computed confidence intervals. Not sure why I can't get them for this LCA.

It's interesting that the lbound and ubound are equal, and both are higher than the estimate. Do any warnings or error messages come back when you run the code that requests the CIs? Can you share code/data that causes this problem?

When you say that some of the variance and covariance parameters are NaN, do you mean that the standard errors/Hessian contains NaNs, or that you have free parameters in your model that are NaN?

I can share the code but I cannot include the data set. The standard errors listed for some model parameters (those representing the variances and covariances among the observed variables' errors within each cluster, that is, the variability remaining after accounting for the latent means) show as NaN.

I modified the code from the growth mixture model example. I have 3 latent variables that represent the means of 3 observed variables in a cluster. The variances of these latent variables are fixed to zero (and hence there are no covariances among the latent variables).

I should also note that I ran similar code on a well-behaved simulated data set and it zeroed in on the cluster means.

Here is the code for class1:

class1 <- mxModel("Class1",
    type="RAM",
    manifestVars=names(df),
    latentVars=c("m_tbut_1", "m_sch_1", "m_tosm_1"),
    # residual variances
    mxPath(from=names(df), arrows=2, free=TRUE, values=10, lbound=0.001,
           labels=c("res_tbut_1", "res_sch_1", "res_tosm_1")),
    # residual covariances
    mxPath(from="tbut", to="schirmer", arrows=2, free=TRUE, values=0,
           labels="c_tbut_sch_1"),
    mxPath(from="tbut", to="tosm", arrows=2, free=TRUE, values=0,
           labels="c_tbut_tosm_1"),
    mxPath(from="schirmer", to="tosm", arrows=2, free=TRUE, values=0,
           labels="c_sch_tosm_1"),
    # latent variances (fixed to zero; hence no latent covariances)
    mxPath(from="m_tbut_1", arrows=2, free=FALSE, values=0, labels="var_tbut_1"),
    mxPath(from="m_sch_1", arrows=2, free=FALSE, values=0, labels="var_sch_1"),
    mxPath(from="m_tosm_1", arrows=2, free=FALSE, values=0, labels="var_tosm_1"),
    # intercept loadings
    mxPath(from="m_tbut_1", to="tbut", arrows=1, free=FALSE, values=1),
    mxPath(from="m_sch_1", to="schirmer", arrows=1, free=FALSE, values=1),
    mxPath(from="m_tosm_1", to="tosm", arrows=1, free=FALSE, values=1),
    # manifest means (fixed to zero)
    mxPath(from="one", to=names(df), arrows=1, free=FALSE, values=0),
    # latent means
    mxPath(from="one", to=c("m_tbut_1", "m_sch_1", "m_tosm_1"), arrows=1,
           free=TRUE, values=colMeans(df, na.rm=TRUE),
           labels=c("mean_tbut_1", "mean_sch_1", "mean_tosm_1")),
    # enable the likelihood vector
    mxRAMObjective(A="A", S="S", F="F", M="M", vector=TRUE)
) # close model

I don't see any problems in the code you've provided. Without some type of data that replicates the error, I/we don't know where to start looking for a fix. Can you throw your data through fakeData (http://openmx.psyc.virginia.edu/wiki/generating-simulated-data)? It'll take your private data and generate multivariate normal data with the same variable names, missingness patterns, and a covariance matrix that's pretty close to yours. If you can do that, or make up other data that also causes the error in some portion of the code, we'll have a starting point for diagnosing this problem.
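If fakeData is inconvenient, a rough substitute is to draw multivariate normal data with the same names, means, and covariance as the real data. This is only a sketch, not the fakeData utility itself, and the three-variable `df` below is made up to stand in for your private data set:

```r
# Sketch: generate synthetic data matching the names, means and
# covariance of a private data set (here, a made-up 'df').
library(MASS)  # for mvrnorm

set.seed(123)
df <- data.frame(tbut     = rnorm(200, 15, 2),
                 schirmer = rnorm(200, 17, 5),
                 tosm     = rnorm(200, 10, 3))  # stand-in for real data

fake <- as.data.frame(mvrnorm(n = nrow(df),
                              mu = colMeans(df, na.rm = TRUE),
                              Sigma = cov(df, use = "pairwise.complete.obs")))
names(fake) <- names(df)  # keep the original variable names
```

Note this does not reproduce missingness patterns the way fakeData does; you would need to blank out the same cells afterwards.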

I'm going to guess that the NaN standard errors are a clue. Those indicate that the Hessian for your model isn't positive definite; that is, at its "final" iteration, the asymptotic variance of a parameter was negative. OpenMx, NPSOL and other quasi-Newtonian methods find new values for future iterations by defining a "step size" that depends on the model's gradient and Hessian (first and second derivatives of the likelihood function). It's possible, though not likely, that the Hessian is of such a shape that it always tries to step in a positive direction for some of your parameters. Until I/we have a model to test it with, that's just a guess, though.

As an aside, please submit future code examples in an attached file. When users submit code inline in forum posts, we have to strip out all of the added formatting prior to helping you.

I think that the lower CI being slightly above the initial estimate is simply a numerical precision issue. Optimization failed because the gradient was too flat for it to get started, and it returned its best estimate, which was essentially equal (to within 3-4 decimal places) to the starting value. Agreed it looks weird, and I think OpenMx should flag apparent failures of optimization when trying to find CIs.

I suspect that increasing the step size or decreasing the numerical precision would improve the chances that optimization gets going when finding the CIs. You said latent class analysis, so I am assuming your variables are binary or ordinal (the usual term for continuous variables would be latent profile analysis). If the variables are ordinal, I'd try setting the function precision to around 1e-8 or 1e-9; if continuous, I'd go with, say, 1e-14 or 1e-15 and see if it helps.
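Concretely, that option can be set on the model before re-running the CI request, something like this (using the continuous-variable value; `model` stands in for your mixture model):

```r
library(OpenMx)

# Loosen the optimizer's function precision so the CI search can take
# steps on a nearly flat likelihood surface, then re-run the intervals.
model <- mxOption(model, "Function precision", 1e-14)
fit   <- mxRun(model, intervals = TRUE)
```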