Thu, 08/23/2012 - 20:03

Hi guys, I wrote up a moderation model script, looking at several scripts from the Boulder workshops as reference. When running it, I'm getting large swings in -2LL and parameter estimates whenever I change start values or bounds, so there has to be an identification problem in my model specification, but after staring at it for an hour I still can't see where.

Could someone glance over the code and let me know if they see any glaring error on my part? I also coded a version that doesn't specify the A1, ..., C12 portions of the covariance matrix separately (instead writing out the whole horrible mess of a covariance matrix directly in terms of the moderators and ACE-level parameters), just to confirm that the decomposition wasn't causing the problem.

univModMeansACEModel <- mxModel("univModMeansACE",

    mxModel("ACE",
        # Matrices a, c, and e to store the a, c, and e path coefficients
        mxMatrix( type="Full", nrow=nv, ncol=nv, free=TRUE, values=.6, name="a" ),
        mxMatrix( type="Full", nrow=nv, ncol=nv, free=TRUE, values=.6, name="c" ),
        mxMatrix( type="Full", nrow=nv, ncol=nv, free=TRUE, values=.6, name="e" ),
        # Matrices aI, cI, and eI to store the moderation effects on the a, c, and e paths
        mxMatrix( type="Full", nrow=nv, ncol=nv, free=TRUE, values=.1, name="aI" ),
        mxMatrix( type="Full", nrow=nv, ncol=nv, free=TRUE, values=.1, name="cI" ),
        mxMatrix( type="Full", nrow=nv, ncol=nv, free=TRUE, values=.1, name="eI" ),
        # Matrices for the expected means: intercept mu and moderator regression weight b
        mxMatrix( type="Full", nrow=1, ncol=nv, free=TRUE, values=0, label="mean", name="mu" ),
        mxMatrix( type="Full", nrow=1, ncol=1, free=TRUE, values=.01, name="b" ),
        mxCI(c("b","aI","cI","eI")) ),

    mxModel("MZ",
        # Definition variables holding each twin's value of the moderator
        mxMatrix( type="Full", nrow=1, ncol=1, free=FALSE, labels=paste("data.",modVars[1],sep=""), name="modA" ),
        mxMatrix( type="Full", nrow=1, ncol=1, free=FALSE, labels=paste("data.",modVars[2],sep=""), name="modB" ),
        # Moderated variance components for twin 1 (modA) and twin 2 (modB)
        mxAlgebra( (ACE.a + modA %*% ACE.aI) %*% t(ACE.a + modA %*% ACE.aI), name="A1" ),
        mxAlgebra( (ACE.c + modA %*% ACE.cI) %*% t(ACE.c + modA %*% ACE.cI), name="C1" ),
        mxAlgebra( (ACE.e + modA %*% ACE.eI) %*% t(ACE.e + modA %*% ACE.eI), name="E1" ),
        mxAlgebra( (ACE.a + modB %*% ACE.aI) %*% t(ACE.a + modB %*% ACE.aI), name="A2" ),
        mxAlgebra( (ACE.c + modB %*% ACE.cI) %*% t(ACE.c + modB %*% ACE.cI), name="C2" ),
        mxAlgebra( (ACE.e + modB %*% ACE.eI) %*% t(ACE.e + modB %*% ACE.eI), name="E2" ),
        # Cross-twin components (E is not shared between twins)
        mxAlgebra( (ACE.a + modA %*% ACE.aI) %*% t(ACE.a + modB %*% ACE.aI), name="A12" ),
        mxAlgebra( (ACE.c + modA %*% ACE.cI) %*% t(ACE.c + modB %*% ACE.cI), name="C12" ),
        # Expected covariance matrix and mean vector for MZ pairs
        mxAlgebra( rbind( cbind(A1+C1+E1, A12+C12),
                          cbind(A12+C12, A2+C2+E2) ), name="expCovMZ" ),
        mxAlgebra( ACE.mu + ACE.b %*% modA, name="meanA" ),
        mxAlgebra( ACE.mu + ACE.b %*% modB, name="meanB" ),
        mxAlgebra( cbind(meanA, meanB), name="expMean" ),
        # Data & objective
        mxData( observed=MZdata, type="raw" ),
        mxFIMLObjective( covariance="expCovMZ", means="expMean", dimnames=selVars ) ),

    mxModel("DZ",
        # Definition variables holding each twin's value of the moderator
        mxMatrix( type="Full", nrow=1, ncol=1, free=FALSE, labels=paste("data.",modVars[1],sep=""), name="modA" ),
        mxMatrix( type="Full", nrow=1, ncol=1, free=FALSE, labels=paste("data.",modVars[2],sep=""), name="modB" ),
        # Moderated variance components for twin 1 (modA) and twin 2 (modB)
        mxAlgebra( (ACE.a + modA %*% ACE.aI) %*% t(ACE.a + modA %*% ACE.aI), name="A1" ),
        mxAlgebra( (ACE.c + modA %*% ACE.cI) %*% t(ACE.c + modA %*% ACE.cI), name="C1" ),
        mxAlgebra( (ACE.e + modA %*% ACE.eI) %*% t(ACE.e + modA %*% ACE.eI), name="E1" ),
        mxAlgebra( (ACE.a + modB %*% ACE.aI) %*% t(ACE.a + modB %*% ACE.aI), name="A2" ),
        mxAlgebra( (ACE.c + modB %*% ACE.cI) %*% t(ACE.c + modB %*% ACE.cI), name="C2" ),
        mxAlgebra( (ACE.e + modB %*% ACE.eI) %*% t(ACE.e + modB %*% ACE.eI), name="E2" ),
        # Cross-twin components (A is weighted by 0.5 for DZ pairs below)
        mxAlgebra( (ACE.a + modA %*% ACE.aI) %*% t(ACE.a + modB %*% ACE.aI), name="A12" ),
        mxAlgebra( (ACE.c + modA %*% ACE.cI) %*% t(ACE.c + modB %*% ACE.cI), name="C12" ),
        # Expected covariance matrix and mean vector for DZ pairs
        mxAlgebra( rbind( cbind(A1+C1+E1, 0.5%x%A12+C12),
                          cbind(0.5%x%A12+C12, A2+C2+E2) ), name="expCovDZ" ),
        mxAlgebra( ACE.mu + ACE.b %*% modA, name="meanA" ),
        mxAlgebra( ACE.mu + ACE.b %*% modB, name="meanB" ),
        mxAlgebra( cbind(meanA, meanB), name="expMean" ),
        # Data & objective
        mxData( observed=DZdata, type="raw" ),
        mxFIMLObjective( covariance="expCovDZ", means="expMean", dimnames=selVars ) ),

    mxAlgebra( expression=MZ.objective + DZ.objective, name="neg2sumll" ),
    mxAlgebraObjective("neg2sumll")
)

I can dummy up some variables if anyone needs them to troubleshoot. I've tried multiple data sets and compared the output to classic Mx, so my model specification is definitely the problem.

As you wish. :) I'm not concerned at the moment about the submodels, just the overall moderation model on means and covariances. The means section is fine; it's the variance components that are all over the place.

Thanks!

Ok, so I ran univModMeansACEModel 50 times with random uniform starting values between 0 and 1. I reached the minimum in 34 of those draws (some start values were absolute crap), and got the same results every time:

summary(best)

data:
[deleted]

free parameters:
         name matrix row col     Estimate  Std.Error lbound ubound
1  ACE.a[1,1]  ACE.a   1   1  0.778963031 0.06471610
2  ACE.c[1,1]  ACE.c   1   1 -0.004968164 0.38187248
3  ACE.e[1,1]  ACE.e   1   1  0.607697030 0.04745381
4 ACE.aI[1,1] ACE.aI   1   1  0.128122707 0.09096693
5 ACE.cI[1,1] ACE.cI   1   1 -0.133247072 0.17128589
6 ACE.eI[1,1] ACE.eI   1   1 -0.030170703 0.06290640
7        mean ACE.mu   1   1 -0.049559136 0.06205693
8  ACE.b[1,1]  ACE.b   1   1 -0.007064058 0.05814635

observed statistics: 364
estimated parameters: 8
degrees of freedom: 356
[deleted]

Information Criteria:
      df Penalty Parameters Penalty Sample-Size Adjusted
AIC:   268.2285          996.2285                     NA
BIC: -1107.2242         1027.1375               1001.758

[deleted]
cpu time: 1.830985 secs
openmx version number: 999.0.0-2111

Does that look right?
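For the curious, a multi-start loop along these lines will do it. This is a sketch rather than my exact script: it assumes the univModMeansACEModel object from the posted code, and that omxSetParameters can address every free parameter by the names omxGetParameters reports (labeled parameters always work; unlabeled ones get bracket-style names that recent OpenMx versions also accept).

```r
# Sketch of the multi-start procedure described above (not the exact script I ran).
# Assumes univModMeansACEModel from the posted code; requires the OpenMx package.
library(OpenMx)

set.seed(1234)
nStarts <- 50
fits <- vector("list", nStarts)
for (i in seq_len(nStarts)) {
    pars  <- omxGetParameters(univModMeansACEModel)
    trial <- omxSetParameters(univModMeansACEModel,
                              labels = names(pars),
                              values = runif(length(pars), 0, 1))
    fits[[i]] <- suppressWarnings(mxRun(trial, silent = TRUE))
}

# Keep the run with the lowest -2LL as the best candidate for the global minimum.
# On older OpenMx versions, use m@output$minimum instead of m$output$minimum.
minus2LL <- sapply(fits, function(m) m$output$minimum)
best <- fits[[which.min(minus2LL)]]
summary(best)
```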

This response was so incredibly helpful. I chose two sets of seemingly reasonable start values, got different results, and immediately assumed my coding was to blame. It makes me feel better to know that I couldn't find my mistake because there wasn't one. It is always a little frightening how much start values can affect the final results.

To further abuse your OpenMx knowledge: how do you generally choose start values so that you're sure you've reached the true minimum? By running something like the code you attached (thank you; it looks like gibberish to me now, but I'll have lots of fun figuring it out)? Is there a command to do something similar in the mxRun statement? The postdoc with the unfortunate role of trying to keep me on task mentioned something called "jiggle" in classic Mx... And is there any validity to building the model more sequentially, using start values obtained from simpler (less failure-prone) models? In this case, starting with a univariate ACE model to get the a, c, and e components, then adding means moderation, and so on?

Thank you again!

Ok, lots of questions all mixed together, so here are a bunch of answers:

(a) Start values usually don't matter as much as they did for your model. The simpler the model, and the more plentiful and well-behaved the data, the more robust and reliable the optimizer will be.

(b) Writing a script that fits the model from multiple sets of start values makes you more confident that you've hit the global minimum.

(c) There's no real "jiggle" command, but there is one situation where OpenMx will move your start values: if your model hits a problem at the very first iteration, OpenMx will try moving all of your zero start values to 0.1. This is designed for parameters of indeterminate sign (i.e., 1 and -1 fit equally well), where the gradient at zero is estimated as zero even though moving in either direction would improve fit.
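If you want jiggle-like behavior yourself, you can perturb the current start values by hand. A minimal sketch (the jiggle() helper and the 0.1 scale are my own choices, not an OpenMx facility):

```r
# Hand-rolled "jiggle": nudge every free parameter's start value by a small
# random amount before running. jiggle() is a hypothetical helper, not part of OpenMx.
library(OpenMx)

jiggle <- function(model, sd = 0.1) {
    pars <- omxGetParameters(model)
    omxSetParameters(model,
                     labels = names(pars),
                     values = pars + rnorm(length(pars), mean = 0, sd = sd))
}

fit <- mxRun(jiggle(univModMeansACEModel))
```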

(d) Building models sequentially is a great way to generate starting values.
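For example, something along these lines (a sketch; univACEModel is a hypothetical plain ACE model whose free parameters share names with the moderation model, and strict = FALSE is meant to skip any labels that don't match):

```r
# Fit the simpler model first, then carry its estimates over as start values.
# Parameters unique to the moderation model (aI, cI, eI, b) keep their
# original start values. Requires the OpenMx package.
library(OpenMx)

aceFit <- mxRun(univACEModel)    # hypothetical plain ACE model fit beforehand
est    <- omxGetParameters(aceFit)
warmStart <- omxSetParameters(univModMeansACEModel,
                              labels = names(est), values = est,
                              strict = FALSE)
modFit <- mxRun(warmStart)
```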

Generally, I try to pick reasonable starting values based on the summary statistics of the data. If it's a simple model and I get reasonable parameter estimates and a status 0 back, then I probably stop there. With status 1 or 6, I look at the gradient and Hessian to see what problems exist. From there, I decide whether to just tinker with starting values or run a script like the one above, depending on how the results look and how long each model takes to run.

Nothing jumps out at me. Mind throwing some data up along with the above code in a .R file?

Wild swings in -2LL don't necessarily indicate an identification problem, but they do indicate a convergence problem. What status codes are you getting when you reach these varying -2LLs and parameter estimates?
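You can pull the status out of a fitted model like this (a sketch; on older OpenMx versions the output slot is reached with @ rather than $):

```r
# Inspect the optimizer status after a run. Code 0 is a clean convergence;
# codes 1 and 6 are warnings worth investigating via the gradient and Hessian.
library(OpenMx)

fit  <- mxRun(univModMeansACEModel)
code <- fit$output$status$code     # fit@output$status[[1]] on older versions
if (code %in% c(1, 6)) {
    print(fit$output$gradient)           # which parameters still have slope?
    print(fit$output$calculatedHessian)  # is the Hessian positive definite?
}
```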