Sun, 09/27/2009 - 09:24

Hi,

Getting my head around how to add a covariate to a script.

In mx1.x, there were two 'data' statements: "SElect" and "DEfinition_variables".

**Select** is equivalent to **mxData**() and selects the variables from which expected covariances (and means) are computed.

**Definition** creates a list of covariates used to populate matrices with individual-level data, such as sex and age. (in old mx this was done using negative indices, but in OpenMx we could use the names() of the definition columns as labels signifying a matrix is to be created for every subject using definition-level data).

I can't see an OpenMx equivalent of "Definition". Hopefully this is so powerfully simple now that I am over-looking an obvious usage in OpenMx but can't see it...

Just adding covariates onto the end of the mxData() statement seems wrong when we don't want to model the covariance of these covariates with the variables in which we are interested, just model them out of the means of our mxData() columns?

Is an mxCovariates(data) statement needed?

So if you had a covariates dataframe

head(covs) age1 age2 1 ... ....

You could then say:

mxModel( ... mxCovariates(covs) # setting a value to a name brings in correct row of data from covariates mxMatrix(nrow=1, ncol=nCov,free=TRUE, values=c('age1'), name="T1covs") mxMatrix(nrow=1, ncol=nCov,free=TRUE, values=c('age2'), name="T2covs") # Algebra evaluated across all subjects individually mxAlgebra(Means J+ T1covs*beta | K+ T2covs*betas, name="expMeans") )

Thanks, that's helpful. My point is just that when specifying definition variables it is restrictive to have to do it column by column.

How else would you do it? Given that a matrix is a series of columns, there is no work to be done on the data. if you think that

a = b1x + b2y + b3z

Then how would you write this apart from something like:

mxPath(from = mycovNames, to = myDV)

Interested to see your suggested syntax

The examples of definition variables that I have been able to find on this site all refer to single columns of data, but when conducting a multiple regression it is common to have a N x K matrix of covariates. I guess the way you do that you supply individual columns of data cov1, ..., covK in the mxData object and then define an mxMatrix of covariates with labels = paste( "data.cov", 1:K, sep = "" ). This is not very elegant though because it requires splitting the covariate matrix into columns. Is there a better way?

Actually, it's easier than that.

First you need an R dataframe or matrix. Each column in the dataframe or matrix must be named. Your mxData statement points to that dataframe or matrrix.

Then you just use those variables directly. If they are covariates in the multiple regression sense, declare them as manifest variables.

If you wish to constrain a path to be equal to the value in one of the data columns you set a label as "data.x" where "x" is the name of the column in your original dataframe or matrix. This is not going to result in a covariate as in a multiple regression. For that, you need to use the data column as a manifest variable, not a constraint.

Here's a quick multiple regression. Suppose I have a csv data file with columns named "x", "y", "z", and "w". As is traditional, I'll have "y" be the outcome variable. Here is a multiple regression with x, z, and w predicting y.

(Posted something but removed it)

Basically, we expect that the cbind is done prior to creating the mxModel. That is to say, the covariates and other data should be in the same dataframe or data matrix. The dimnames of this dataframe (or matrix) are how you identify what columns of data you want to use where.

Basically, that's all there is to it. If you use a data column name as an element in a matrix, it is used as a definition variable. If you use a data column name as a regular variable, it is used in that way. Note that there is no longer a proscription against using a data column as both a definition variable and a regular variable in the same model.

In this way, we have made the old Mx functions of SElect and DEfinition obsolete. Defining the dimnames of your model matrices or using a label for an element of a data matrix that is of the form "data.foo" are all you need.

For instance,

Creates a symmetric mxMatrix named "bar" that could be used to estimate the variances of x and y while holding their covariances constant to the value of foo at each row.

There are demos that show this for either matrix or path-style input:

demo(DefinitionMeans_MatrixRaw, echo=TRUE)

demo(DefinitionMeans_PathRaw, echo=TRUE)

Clever. I had not realized that. An improvement on Mx1 indeed.

Hi Tim

Take a look at the scripts models/passing/DefinitionMeans.R and models/passing/DefinitionMeansPath.R. The key is to name paths or matrix elements as, e.g., data.age where age is the variable in the dataset that you want to be a definition variable. The data. tag alerts OpenMx that this variable should be taken out of the set being analyzed, and reserved as a definition variable.

Can a definition variable appear without data in the same model?

It's very nice to be able to pull the algebras for multiple group twin designs into a supermodel, and just keep the objective and data in the groups... When using definition variables, I have run into a problem in implementing this pattern, which seems to require anything referring to a definition variable to be in the model that contains the data referred too.

Here's a short 1-group example based on modifying DefinitionMeans.R

set.seed(200)

n = 500

Sigma <- matrix(c(1,.5,.5,1),2,2)

group1<-mvrnorm(n=n, c(1,2), Sigma)

group2<-mvrnorm(n=n, c(0,0), Sigma)

y<-rbind(group1,group2)

dimnames(y)[2]<-list(c("x","y"))

def<-rep(c(1,0),each=n)

selvars<-c("x","y")

model<-mxModel("top",

mxMatrix("Symm", nrow=2, ncol=2, free=TRUE, values=c(1, 0, 1) , name="Sigma"),

mxMatrix("Full", nrow=1, ncol=2, free=TRUE, values=c(0, 0) , dimnames=list(NULL, selvars), name="beta"),

mxMatrix("Full", nrow=1, ncol=2, free=FALSE,labels=c("data.def"), dimnames=list(NULL, selvars), name="def"),

mxMatrix("Full", nrow=1, ncol=2, free=TRUE , dimnames=list(NULL, selvars), name = "M"),

mxAlgebra(M+beta*def, name="Mu"),

mxAlgebra(model1.objective, name="alg"),

mxAlgebraObjective("alg"),

mxModel("model1", mxFIMLObjective("top.Sigma", "top.Mu", selvars), mxData((data.frame(y,def)), type="raw"))

)

fit<-mxRun(model)

# Running top

# Error: A definition variable has been declared in model 'top' that does not contain a data set

summary(fit)

The meaning of this pattern becomes unclear when the definition matrix is used anywhere outside of a row-by-row objective function (like FIML).

If you use a definition in an algebra in that top model, it has no meaning, since there's no "current data row" for that model.

In a different submodel with different data, its would have to take on the data row for that model, which is likely different than that of the first submodel. So what value it has at any point in the calculation is unclear.

And if you're not using it in other submodels or in the top model, what's the purpose of putting it in the top model?

What I think you're arguing for is that the definition variable and any matrices and algebras that depend on it should be treated as a duplicate matrix/algebra that's the same, but appropriate to the new submodel. I'd argue that this is an easy way to cause confusion, since the value of a matrix is different when it is used at different places.

Try this specification instead:

Aaaah Tim Brick answer is too complicated! @Tim Bates, yes you can use a definition variable in model A that uses a data set from model B. But you need to tell the definition variable which data set to use. Replace the reference to "data.def" with "model1.data.def" and the script works. I've attached a working version.

Sorry about the overly-complex response.

Should we really be allowing users to reference definition variables from outside the model with the associated data? I think we should at least recommend against it.

If it's legal, we at least have to decide what value the definition variable "model1.data.def" has when it's used outside model1.

I'm less concerned about definition variables being used outside the model with the associated data. After all, the front-end is flattening all the dependent submodes into one giant thingy, so where the user places definition variables is a matter of style not substance.

I am concerned about the following scenario, which does NOT reference definition variables outside the model with the associated data (although I could have written it so that it did).

Does 'alg' see intermediate values of submodel1.A and submodel2.A? Or does it see the values for the last row in the data frames? If it always sees the value of the last row, then we are consistent and that doesn't bother me.

'Alg' should never see intermediate values.

After the initial computation, 'A' in each submodel will be filled in with the appropriate (last row) data value. Those value will only change during the FIML calculation, and will be set to the last row at the end of the FIML Objective, which is before anything else will be calculated. In this case, they actually won't change at all, since the FIML Objective won't be calculated as part of the objective computation.

I wouldn't count on this behavior, though. As OpenMx becomes more parallel, we'll have to be very careful if we want to keep this behavior consistent.

Thanks Tim and Michael... hmmm.

Both answers have similar implications: anywhere a definition variable is needed (G*E designs etc), the algebras have to be per-group: Either written in duplicate in the supermodel with full addresses for the definition variables including their model, or cloned into submodels.

The logic makes sense: it is all per person, so someone is going to have to make a unique structure per group.

I guess I thought when solving for a parameter that has to be equal in all the groups (like a beta weight), that it would be possible to specify that abstractly at the supergroup level (just as shared non-definition values can be solved for, like a value for A that works in all groups). Then that the optimiser would bring in the right data via the group-specific mxData() statement, and solve the algebra locally for the objective function, using group specific definition data for "data.def", along-side shared definition parameter values like "top.A"

OK: forward to the drawing board :-) Thanks again!!

hi steve and mike,

Thanks for that: I had looked at the definition scripts, but didn't grok the special meaning of the "data." prefix.

That's a powerful tool. Nice!

I made a bit of a page about it, will expand as learn more.

I wonder if the summary should collect up and report vars used as individual level? would help with debugging, I think.

What I'd really like to see is the option for the back end to return the parameters and fit for each individual when we are running FIML.

So, instead of a vector of summary-level parameters coming back, we'd get an NxP matrix of parameters. Statistics on these individual-level parameters can be very useful.