# Examples, Matrix Specification¶

## Regression, Matrix Specification¶

Our next example will show how regression can be carried out from structural modeling perspective. This example is in three parts; a simple regression, a multiple regression, and multivariate regression. There are two versions of each example are available; one with raw data, and one where the data is supplied as a covariance matrix and vector of means. These examples are available in the following files:

• SimpleRegression_MatrixCov.R
• SimpleRegression_MatrixRaw.R
• MultipleRegression_MatrixCov.R
• MultipleRegression_MatrixRaw.R
• MultivariateRegression_MatrixCov.R
• MultivariateRegression_MatrixRaw.R

This example will focus on the RAM approach to building structural models. A parallel version of this example, using path-centric rather than matrix specification, is available here link.

### Simple Regression¶

We begin with a single dependent variable (y) and a single independent variable (x). The relationship between these variables takes the following form:

In this model, the mean of y is dependent on both regression coefficients (and by extension, the mean of x). The variance of y depends on both the residual variance and the product of the regression slope and the variance of x. This model contains five parameters from a structural modeling perspective , , , and the mean and variance of x). We are modeling a covariance matrix with three degrees of freedom (two variances and one variance) and a means vector with two degrees of freedom (two means). Because the model has as many parameters (5) as the data have degrees of freedom, this model is fully saturated.

#### Data¶

Our first step to running this model is to put include the data to be analyzed. The data must first be placed in a variable or object. For raw data, this can be done with the read.table function. The data provided has a header row, indicating the names of the variables.

myRegDataRaw <- read.table("myRegData.txt",header=TRUE)


The names fo the variables provided by the header row can be displayed with the names() function.

> names(myRegDataRaw)
[1] "w" "x" "y" "z"


As you can see, our data has four variables in it. However, our model only contains two variables, x and y. To use only them, we’ll select only the variables we want and place them back into our data object. That can be done with the R code below.

SimpleDataRaw <- myRegDataRaw[,c("x","y")]


For covariance data, we do something very similar. We create an object to house our data. Instead of reading in raw data from an external file, we can also include a covariance matrix. This requires the matrix() function, which needs to know what values are in the covariance matrix, how big it is, and what the row and column names are. As our model also references means, we’ll include a vector of means in a separate object. Data is selected in the same way as before.

myRegDataCov <- matrix(
c(0.808,-0.110, 0.089, 0.361,
-0.110, 1.116, 0.539, 0.289,
0.089, 0.539, 0.933, 0.312,
0.361, 0.289, 0.312, 0.836),
nrow=4,
dimnames=list(
c("w","x","y","z"),
c("w","x","y","z"))
)

SimpleDataCov <- myRegDataCov[c("x","y"),c("x","y")]

myRegDataMeans <- c(2.582, 0.054, 2.574, 4.061)

SimpleDataMeans <- myRegDataMeans[c(2,3)]


#### Model Specification¶

The following code contains all of the components of our model. Before running a model, the OpenMx library must be loaded into R using either the require() or library() function. All objects required for estimation (data, matrices, and an objective function) are included in their functions. This code uses the mxModel function to create an MxModel object, which we’ll then run.

uniRegModel <- mxModel("Simple Regression - Matrix Specification",
mxData(
observed=SimpleRegRaw,
type="raw"
),
mxMatrix(
type="Full",
nrow=2,
ncol=2,
free=c(F, F,
F, F),
values=c(0, 0,
1, 0),
labels=c(NA,     NA,
"beta1", NA),
byrow=TRUE,
name="A"
),
mxMatrix(
type="Symm",
nrow=2,
ncol=2,
values=c(1, 0,
0, 1),
free=c(T, F,
F, T),
labels=c("varx", NA,
NA,    "residual"),
byrow=TRUE,
name="S"
),
mxMatrix(
type="Iden",
nrow=2,
ncol=2,
name="F"
),
mxMatrix(
type="Full",
nrow=1,
ncol=2,
free=c(T, T),
values=c(0, 0),
labels=c("meanx", "beta0"),
name="M"),
mxRAMObjective("A", "S", "F", "M")
)


This mxModel function can be split into several parts. First, we give the model a name. The first argument in an mxModel function has a special function. If an object or variable containing an MxModel object is placed here, then mxModel adds to or removes pieces from that model. If a character string (as indicated by double quotes) is placed first, then that becomes the name of the model. Models may also be named by including a name argument. This model is named Simple Regression -- Matrix Specification.

The second component of our code creates an MxData object. The example above, reproduced here, first references the object where our data is, then uses the type argument to specify that this is raw data.

mxData(
observed=SimpleDataRaw,
type="raw"
)


If we were to use a covariance matrix and vector of means as data, we would replace the existing mxData function with this one:

mxData(
observed=SimpleDataCov,
type="cov",
numObs=100,
means=SimpleRegMeans
)


The next four functions specify the four matricies that make up the RAM specified model. Each of these matrices defines part of the relationship between the observed variables. These matrices are then combined by the objective function, which follows the four mxMatrix functions, to define the expected covariances and means for the supplied data. In all of the included matrices, the order of variables matches those in the data. Therefore, the first row and column of all matrices corresponds to the x variable, while the second row and column of all matrices corresponds to the y variable.

The A matrix is created first. This matrix specifies all of the assymetric paths or regressions among the variables. A free parameter in the A matrix defines a regression of the variable represented by that row on the variable represented by that column. For clarity, all matrices are specified with the byrow argument set to TRUE, which allows better correspondence between the matrices as displayed below and their position in mxMatrix objects. In the section of code below, a free parameter is specified as the regression of y on x, with a starting value of 1, and a label of "beta1". This matrix is named "A".

mxMatrix(
type="Full",
nrow=2,
ncol=2,
free=c(F, F,
F, F),
values=c(0, 0,
1, 0),
labels=c(NA,     NA,
"beta1", NA),
byrow=TRUE,
name="A"
)


The second mxMatrix function specifies the S matrix. This matrix specifies all of the symmetric paths or covariances among the variables. By definition, this matrix is symmetric. A free parameter in the S matrix represents a variance or covariance between the variables represented by the row and column that parameter is in. In the code below, two free parameters are specified. The free parameter in the first row and column of the S matrix is the variance of x (labeled "varx"), while the free parameter in the second row and column is the residual variance of y (labeled "residual"). This matrix is named "S".

mxMatrix(
type="Symm",
nrow=2,
ncol=2,
values=c(1, 0,
0, 1),
free=c(T, F,
F, T),
labels=c("varx", NA,
NA,    "residual"),
byrow=TRUE,
name="S"
)


The third mxMatrix function specifies the F matrix. This matrix is used to filter latent variables out of the expected covariance of the manifest variables, or to reorder the manifest variables. When there are no latent variables in a model and the order of manifest variables is the same in the model as in the data, then this filter matrix is simply an identity matrix. The dimnames provided at this matrix should match the names of the data, either the column names for raw data or the dimnames of covariance data. There are no free parameters in any F matrix.

mxMatrix(
type="Iden",
nrow=2,
ncol=2,
dimnames=list(c("x","y"),c("x","y")),
name="F"
)


The fourth and final mxMatrix function specifies the M matrix. This matrix is used to specify the means and intercepts of our model. Exogenous or independent variables receive means, while endogenous or dependent variables get intercepts, or means conditional on regression on other variables. This matrix contains only one row. This matrix consists of two free parameters; the mean of x (labeled "meanx") and the intercept of y (labeled "beta0"). This matrix gives starting values of 1 for both parameters, and is named "M".

mxMatrix(
type="Full",
nrow=1,
ncol=2,
free=c(T, T),
values=c(0, 0),
labels=c("meanx", "beta0"),
dimnames=list(NULL, c("x","y")),
name="M"
)


The final part of this model is the objective function. This defines both how the specified matrices combine to create the expected covariance matrix of the data, as well as the fit function to be minimized. In a RAM specified model, the expected covariance matrix is defined as:

The expected means are defined as:

The free parameters in the model can then be estimated using maximum likelihood for covariance and means data, and full information maximum likelihood for raw data. While users may define their own expected covariance matrices using other objective functions in OpenMx, the mxRAMObjective function yields maximum likelihood estimates of structural equation models when the A, S, F and M matrices are specified. The M matrix is required both for raw data and for covariance or correlation data that includes a means vector. The mxRAMObjective function takes four arguments, which are the names of the A, S, F and M matrices in your model.

The model now includes an observed covariance matrix (i.e., data) and the matrices and objective function required to define the expected covariance matrix and estimate parameters.

#### Model Fitting¶

We’ve created an MxModel object, and placed it into an object or variable named uniRegModel. We can run this model by using the mxRun function, which is placed in the object uniRegFit in the code below. We then view the output by referencing the output slot, as shown here.

uniRegFit <- mxRun(uniRegModel)

uniRegFit@output


The output slot contains a great deal of information, including parameter estimates and information about the matrix operations underlying our model. A more parsimonious report on the results of our model can be viewed using the summary function, as shown here.

summary(uniRegFit)


### Multiple Regression¶

In the next part of this demonstration, we move to multiple regression. The regression equation for our model looks like this:

Our dependent variable y is now predicted from two independent variables, x and z. Our model includes 3 regression parameters (, , ), a residual variance () and the observed means, variances and covariance of x and z, for a total of 9 parameters. Just as with our simple regression, this model is fully saturated.

We prepare our data the same way as before, selecting three variables instead of two.

MultipleDataRaw <- myRegDataRaw[,c("x","y","z")]

MultipleDataCov <- myRegDataCov[c("x","y","z"),c("x","y","z")]

MultipleDataMeans <- myRegDataMeans[c(2,3,4)]


Now, we can move on to our code. It is identical in structure to our simple regression code, containing the same A, S, F and M matrices. With the addition of a third variables, the A, S and F matrices become 3x3, while the M matrix becomes a 1x3 matrix.

multiRegModel<-mxModel("Multiple Regression - Matrix Specification",
mxData(MultipleDataRaw,type="raw"),
mxMatrix("Full",
nrow=3,
ncol=3,
values=c(0,0,0,
1,0,1,
0,0,0),
free=c(F, F, F,
T, F, T,
F, F, F),
labels=c(NA,     NA, NA,
"betax", NA,"betaz",
NA,     NA, NA),
byrow=TRUE,
name = "A"),
mxMatrix("Symm",
nrow=3,
ncol=3,
values=c(1, 0, .5,
0, 1, 0,
.5, 0, 1),
free=c(T, F, T,
F, T, F,
T, F, T),
labels=c("varx",  NA,         "covxz",
NA,    "residual",   NA,
"covxz", NA,         "varz"),
byrow=TRUE,
name="S"),
mxMatrix("Iden",
nrow=3,
ncol=3,
name="F",
dimnames = list(c("x","y","z"), c("x","y","z"))),
mxMatrix("Full",
nrow=1,
ncol=3,
values=c(0,0,0),
free=c(T,T,T),
labels=c("meanx","beta0","meanz"),
dimnames = list(NULL, c("x","y","z")),
name="M"),
mxRAMObjective("A","S","F","M")
)


The mxData function now takes a different data object (MultipleDataRaw replaces SingleDataRaw, adding an additional variable), but is otherwise unchanged. The mxRAMObjective does not change. The only differences between this model and the simple regression script can be found in the A, S, F and M matrices, which have expanded to accomodate a second independent variable.

The A matrix now contains two free parameters, representing the regressions of the dependent variable y on both x and z. As regressions appear on the row of the dependent variable and the column of the independent variable, these two parameters are both on the second (y) row of the A matrix.

mxMatrix("Full",
nrow=3,
ncol=3,
values=c(0,0,0,
1,0,1,
0,0,0),
free=c(F, F, F,
T, F, T,
F, F, F),
labels=c(NA,     NA, NA,
"betax", NA,"betaz",
NA,     NA, NA),
byrow=TRUE,
name = "A")


We’ve made a similar changes in the other matrices. The S matrix includes not only a variance term for the z variable, but also a covariance between the two independent variables. The F matrix still does not contain free parameters, but has expanded in size and made parallel changes in the dimnames arguments. The M matrix includes an additional free parameter for the mean of z.

The model is run and output is viewed just as before, using the mxRun function, @output and the summary function to run, view and summarize the completed model.

### Multivariate Regression¶

The structural modeling approach allows for the inclusion of not only multiple independent variables (i.e., multiple regression), but multiple dependent variables as well (i.e., multivariate regression). Versions of multivariate regression are sometimes fit under the heading of path analysis. This model will extend the simple and multiple regression frameworks we’ve discussed above, adding a second dependent variable “w”.

We now have twice as many regression parameters, a second residual variance, and the same means, variances and covariances of our independent variables. As with all of our other examples, this is a fully saturated model.

Data import for this analysis will actually be slightly simpler than before. The data we imported for the previous examples contains only the four variables we need for this model. We can use myRegDataRaw, myRegDataCov, andmyRegDataMeans in our models.

myRegDataRaw<-read.table("myRegData.txt",header=TRUE)

myRegDataCov <- matrix(
c(0.808,-0.110, 0.089, 0.361,
-0.110, 1.116, 0.539, 0.289,
0.089, 0.539, 0.933, 0.312,
0.361, 0.289, 0.312, 0.836),
nrow=4,
dimnames=list(
c("w","x","y","z"),
c("w","x","y","z"))
)

myRegDataMeans <- c(2.582, 0.054, 2.574, 4.061)


Our code should look very similar to our previous two models. The mxData function will reference the data referenced above, while the mxRAMObjective again refers to the A, S, F and M matrices. Just as with the multiple regression example, the A, S and F expand to order 4x4, and the M matrix now contains one row and four columns.

multivariateRegModel<-mxModel("Multiple Regression - Matrix Specification",
mxData(myRegDataRaw,type="raw"),
mxMatrix("Full", nrow=4, ncol=4,
values=c(0,1,0,1,
0,0,0,0,
0,1,0,1,
0,0,0,0),
free=c(F, T, F, T,
F, F, F, F,
F, T, F, T,
F, F, F, F),
labels=c(NA, "betawx", NA, "betawz",
NA,  NA,     NA,  NA,
NA, "betayx", NA, "betayz",
NA,  NA,     NA,  NA),
byrow=TRUE,
name="A"),
mxMatrix("Symm", nrow=4, ncol=4,
values=c(1,  0, 0,  0,
0,  1, 0, .5,
0,  0, 1,  0,
0, .5, 0,  1),
free=c(T, F, F, F,
F, T, F, T,
F, T, F, T),
labels=c("residualw",  NA,     NA,         NA,
NA,         "varx",  NA,        "covxz",
NA,          NA,    "residualy", NA,
NA,         "covxz", NA,        "varz"),
byrow=TRUE,
name="S"),
mxMatrix("Iden",  nrow=4, ncol=4,
dimnames=list(
c("w","x","y","z"),
c("w","x","y","z")),
name="F"),
mxMatrix("Full", nrow=1, ncol=4,
values=c(0,0,0,0),
free=c(T,T,T,T),
labels=c("betaw","meanx","betay","meanz"),
dimnames=list(
NULL,c("w","x","y","z")),
name="M"),
mxRAMObjective("A","S","F","M")
)


The only additional components to our mxMatrix functions are the inclusion of the “w” variable, which becomes the first row and column of all matrices. The model is run and output is viewed just as before, using the mxRun function, @output and the summary function to run, view and summarize the completed model.

These models may also be specified using paths instead of matrices. See link for matrix specification of these models.

## Factor Analysis, Matrix Specification¶

This example will demonstrate latent variable modeling via the common factor model using RAM matrices for model specification. We’ll walk through two applications of this approach: one with a single latent variable, and one with two latent variables. As with previous examples, these two applications are split into four files, with each application represented separately with raw and covariance data. These examples can be found in the following files:

• OneFactorModel_MatrixCov.R
• OneFactorModel_MatrixRaw.R
• TwoFactorModel_MatrixCov.R
• TwoFactorModel_MatrixRaw.R

A parallel version of this example, using path-centric specification of models rather than matrices, can be found here link.

### Common Factor Model¶

The common factor model is a method for modeling the relationships between observed variables believed to measure or indicate the same latent variable. While there are a number of exploratory approaches to extracting latent factor(s), this example uses structural modeling to fit confirmatory factor models. The model for any person and path diagram of the common factor model for a set of variables - are given below.

While 19 parameters are displayed in the equation and path diagram above (6 manifest variances, six manifest means, six factor loadings and one factor variance), we must constrain either the factor variance or one factor loading to a constant to identify the model and scale the latent variable. As such, this model contains 18 parameters. Unlike the manifest variable examples we’ve run up until now, this model is not fully saturated. The means and covariance matrix for six observed variables contain 27 degrees of freedom, and thus our model contains 9 degrees of freedom.

#### Data¶

Our first step to running this model is to put include the data to be analyzed. The data for this example contain nine variables. We’ll select the six we want for this model using the selection operators used in previous examples. Both raw and covariance data are included below, but only one is required for any model.

myFADataRaw <- read.table("myFAData.txt",header=TRUE)

[1] "x1" "x2" "x3" "x4" "x5" "x6" "y1" "y2" "y3"

oneFactorRaw <- myFADataRaw[,c("x1", "x2", "x3", "x4", "x5", "x6")]

c(0.997, 0.642, 0.611, 0.672, 0.637, 0.677, 0.342, 0.299, 0.337,
0.642, 1.025, 0.608, 0.668, 0.643, 0.676, 0.273, 0.282, 0.287,
0.611, 0.608, 0.984, 0.633, 0.657, 0.626, 0.286, 0.287, 0.264,
0.672, 0.668, 0.633, 1.003, 0.676, 0.665, 0.330, 0.290, 0.274,
0.637, 0.643, 0.657, 0.676, 1.028, 0.654, 0.328, 0.317, 0.331,
0.677, 0.676, 0.626, 0.665, 0.654, 1.020, 0.323, 0.341, 0.349,
0.342, 0.273, 0.286, 0.330, 0.328, 0.323, 0.993, 0.472, 0.467,
0.299, 0.282, 0.287, 0.290, 0.317, 0.341, 0.472, 0.978, 0.507,
0.337, 0.287, 0.264, 0.274, 0.331, 0.349, 0.467, 0.507, 1.059),
nrow=9,
dimnames=list(
c("x1", "x2", "x3", "x4", "x5", "x6", "y1", "y2", "y3"),
c("x1", "x2", "x3", "x4", "x5", "x6", "y1", "y2", "y3")),
)

oneFactorCov <- myFADataCov[c("x1", "x2", "x3", "x4", "x5", "x6"),c("x1", "x2", "x3", "x4", "x5", "x6")]

myFADataMeans <- c(2.988, 3.011, 2.986, 3.053, 3.016, 3.010, 2.955, 2.956, 2.967)



#### Model Specification¶

The following code contains all of the components of our model. Before running a model, the OpenMx library must be loaded into R using either the require() or library() function. All objects required for estimation (data, matrices, and an objective function) are included in their functions. This code uses the mxModel function to create an MxModel object, which we’ll then run.

oneFactorModel<-mxModel("Common Factor Model - Matrix Specification",
mxMatrix(
type="Full",
nrow=7,
ncol=7,
values=c(0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,0),
free=c(F, F, F, F, F, F, F,
F, F, F, F, F, F, T,
F, F, F, F, F, F, T,
F, F, F, F, F, F, T,
F, F, F, F, F, F, T,
F, F, F, F, F, F, T,
F, F, F, F, F, F, F),
labels=c(NA,NA,NA,NA,NA,NA,"l1",
NA,NA,NA,NA,NA,NA,"l2",
NA,NA,NA,NA,NA,NA,"l3",
NA,NA,NA,NA,NA,NA,"l4",
NA,NA,NA,NA,NA,NA,"l5",
NA,NA,NA,NA,NA,NA,"l6",
NA,NA,NA,NA,NA,NA,NA),
byrow=TRUE,
name="A"),
mxMatrix(
type="Symm",
nrow=7,
ncol=7,
values=c(1,0,0,0,0,0,0,
0,1,0,0,0,0,0,
0,0,1,0,0,0,0,
0,0,0,1,0,0,0,
0,0,0,0,1,0,0,
0,0,0,0,0,1,0,
0,0,0,0,0,0,1),
free=c(T, F, F, F, F, F, F,
F, T, F, F, F, F, F,
F, F, T, F, F, F, F,
F, F, F, T, F, F, F,
F, F, F, F, T, F, F,
F, F, F, F, F, T, F,
F, F, F, F, F, F, T),
labels=c("e1", NA,   NA,   NA,   NA,   NA,   NA,
NA, "e2",   NA,   NA,   NA,   NA,   NA,
NA,   NA, "e3",   NA,   NA,   NA,   NA,
NA,   NA,   NA, "e4",   NA,   NA,   NA,
NA,   NA,   NA,   NA, "e5",   NA,   NA,
NA,   NA,   NA,   NA,   NA, "e6",   NA,
NA,   NA,   NA,   NA,   NA,   NA, "varF1"),
byrow=TRUE,
name="S"),
mxMatrix(
type="Full",
nrow=6,
ncol=7,
free=FALSE,
values=c(1,0,0,0,0,0,0,
0,1,0,0,0,0,0,
0,0,1,0,0,0,0,
0,0,0,1,0,0,0,
0,0,0,0,1,0,0,
0,0,0,0,0,1,0),
byrow=TRUE,
dimnames=list(
c("x1","x2","x3","x4","x5","x6"),
c("x1","x2","x3","x4","x5","x6","F1")),
name="F"),
mxMatrix(
type="Full",
nrow=1,
ncol=7,
values=c(1,1,1,1,1,1,0),
free=c(T,T,T,T,T,T,F),
labels=c("meanx1","meanx2","meanx3",
"meanx4","meanx5","meanx6",
NA),
dimnames=list(
NULL,
c("x1","x2","x3","x4","x5","x6","F1")),
name="M"),
mxRAMObjective("A","S","F","M")
)


This mxModel function can be split into several parts. First, we give the model a name. The first argument in an mxModel function has a special function. If an object or variable containing an MxModel object is placed here, then mxModel adds to or removes pieces from that model. If a character string (as indicated by double quotes) is placed first, then that becomes the name of the model. Models may also be named by including a name argument. This model is named "Common Factor Model - Matrix Specification".

The second component of our code creates an MxData object. The example above, reproduced here, first references the object where our data is, then uses the type argument to specify that this is raw data.

mxData(
observed=oneFactorRaw,
type="raw"
)


If we were to use a covariance matrix and vector of means as data, we would replace the existing mxData function with this one:

mxData(
observed=oneFactorCov,
type="cov",
numObs=500,
means=oneFactorMeans
)


Model specification is carried out using mxMatrix functions to create matrices for a RAM specified model. The A matrix specifies all of the assymetric paths or regressions in our model. In the common factor model, these parameters are the factor loadings. This matrix is square, and contains as many rows and columns as variables in the model (manifest and latent, typically in that order). Regressions are specified in the A matrix by placing a free parameter in the row of the dependent variable and the column of independent variable.

The common factor model requires that one parameter (typically either a factor loading or factor variance) be constrained to a constant value. In our model, we’ll constrain the first factor loading to a value of 1, and let all other loadings be freely estimated. All factor loadings have a starting value of one and labels of "l1" - "l6".

mxMatrix(
type="Full",
nrow=7,
ncol=7,
values=c(0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,1,
0,0,0,0,0,0,0),
free=c(F, F, F, F, F, F, F,
F, F, F, F, F, F, T,
F, F, F, F, F, F, T,
F, F, F, F, F, F, T,
F, F, F, F, F, F, T,
F, F, F, F, F, F, T,
F, F, F, F, F, F, F),
labels=c(NA,NA,NA,NA,NA,NA,"l1",
NA,NA,NA,NA,NA,NA,"l2",
NA,NA,NA,NA,NA,NA,"l3",
NA,NA,NA,NA,NA,NA,"l4",
NA,NA,NA,NA,NA,NA,"l5",
NA,NA,NA,NA,NA,NA,"l6",
NA,NA,NA,NA,NA,NA,NA),
byrow=TRUE,
name="A")


The second matrix in a RAM model is the S matrix, which specifies the symmetric or covariance paths in our model. This matrix is symmetric and square, and contains as many rows and columns as variables in the model (manifest and latent, typically in that order). The symmetric paths in our model consist of six residual variances and one factor variance. All of these variances are given starting values of one and labels "e1" - "e6" and "varF1".

mxMatrix(
type="Symm",
nrow=7,
ncol=7,
values=c(1,0,0,0,0,0,0,
0,1,0,0,0,0,0,
0,0,1,0,0,0,0,
0,0,0,1,0,0,0,
0,0,0,0,1,0,0,
0,0,0,0,0,1,0,
0,0,0,0,0,0,1),
free=c(T, F, F, F, F, F, F,
F, T, F, F, F, F, F,
F, F, T, F, F, F, F,
F, F, F, T, F, F, F,
F, F, F, F, T, F, F,
F, F, F, F, F, T, F,
F, F, F, F, F, F, T),
labels=c("e1", NA,   NA,   NA,   NA,   NA,   NA,
NA, "e2",   NA,   NA,   NA,   NA,   NA,
NA,   NA, "e3",   NA,   NA,   NA,   NA,
NA,   NA,   NA, "e4",   NA,   NA,   NA,
NA,   NA,   NA,   NA, "e5",   NA,   NA,
NA,   NA,   NA,   NA,   NA, "e6",   NA,
NA,   NA,   NA,   NA,   NA,   NA, "varF1"),
byrow=TRUE,
name="S")


The third matrix in our RAM model is the F or filter matrix. Our data contains six observed variables, but the A and S matrices contain seven rows and columns. For our model to define the covariances present in our data, we must have some way of projecting the relationships defined in the A and S matrices onto our data. The F matrix filters the latent variables out of the expected covariance matrix, and can also be used to reorder variables.

The F matrix will always contain the same number of rows as manifest variables and columns as total (manifest and latent) variables. If the manifest variables in the A and S matrices precede the latent variables are in the same order as the data, then the F matrix will be the horizontal adhesion of an identity matrix and a zero matrix. This matrix contains no free parameters, and is made with the mxMatrix function below.

mxMatrix(
type="Full",
nrow=6,
ncol=7,
free=FALSE,
values=c(1,0,0,0,0,0,0,
0,1,0,0,0,0,0,
0,0,1,0,0,0,0,
0,0,0,1,0,0,0,
0,0,0,0,1,0,0,
0,0,0,0,0,1,0),
byrow=TRUE,
dimnames=list(
c("x1","x2","x3","x4","x5","x6"),
c("x1","x2","x3","x4","x5","x6","F1")),
name="F"
)


The last matrix of our model is the M matrix, which defines the means and intercepts for our model. This matrix describes all of the regressions on the constant in a path model, or the means conditional on the means of exogenous variables. This matrix contains a single row, and one column for every manifest and latent variable in the model. In our model, the latent variable has a constrained mean of zero, while the manifest variables have freely estimated means, labeled "meanx1"through "meanx6".

mxMatrix(
type="Full",
nrow=1,
ncol=7,
values=c(1,1,1,1,1,1,0),
free=c(T,T,T,T,T,T,F),
labels=c("meanx1","meanx2","meanx3",
"meanx4","meanx5","meanx6",
NA),
dimnames=list(
NULL,
c("x1","x2","x3","x4","x5","x6","F1")),
name="M"
)


The final part of this model is the objective function. This defines both how the specified matrices combine to create the expected covariance matrix of the data, as well as the fit function to be minimized. In a RAM specified model, the expected covariance matrix is defined as:

The expected means are defined as:

The free parameters in the model can then be estimated using maximum likelihood for covariance and means data, and full information maximum likelihood for raw data. While users may define their own expected covariance matrices using other objective functions in OpenMx, the mxRAMObjective function yields maximum likelihood estimates of structural equation models when the A, S, F and M matrices are specified. The M matrix is required both for raw data and for covariance or correlation data that includes a means vector. The mxRAMObjective function takes four arguments, which are the names of the A, S, F and M matrices in your model.

mxRAMObjective("A", "S", "F", "M")


The model now includes an observed covariance matrix (i.e., data) and the matrices and objective function required to define the expected covariance matrix and estimate parameters.

The model can now be run using the mxRun function, and the output of the model can be accessed from the output slot of the resulting model. A summary of the output can be reached using summary().

oneFactorFit <- mxRun(oneFactorModel)

oneFactorFit@output

summary(oneFactorFit)


### Two Factor Model¶

The common factor model can be extended to include multiple latent variables. The model for any person and path diagram of the common factor model for a set of variables - and - are given below.

Our model contains 21 parameters (6 manifest variances, six manifest means, six factor loadings, two factor variances and one factor covariance), but each factor requires one identification constraint. Like in the common factor model above, we’ll constrain one factor loading for each factor to a value of one. As such, this model contains 19 parameters. The means and covariance matrix for six observed variables contain 27 degrees of freedom, and thus our model contains 8 degrees of freedom.

The data for the two factor model can be found in the myFAData files introduced in the common factor model. For this model, we’ll select three x variables (x1-x3) and three y variables (y1-y3).

twoFactorRaw <- myFADataRaw[,c("x1", "x2", "x3", "y1", "y2", "y3")]

twoFactorCov <- myFADataCov[c("x1", "x2", "x3", "y1", "y2", "y3"),c("x1", "x2", "x3", "y1", "y2", "y3")]



Specifying the two factor model is virtually identical to the single factor case. The mxData function has been changed to reference the appropriate data, but is identical in usage. We’ve added a second latent variable, so the A and S matrices are now of order 8x8. Similarly, the F matrix is now of order 6x8 and the M matrix of order 1x8. The mxRAMObjective has not changed. The code for our two factor model looks like this:

twoFactorModel <- mxModel("Two Factor Model - Matrix",
type="RAM",
mxData(
observed=twoFactorRaw,
type="raw",
),
mxMatrix(
type="Full",
nrow=8,
ncol=8,
values=c(0,0,0,0,0,0,1,0,
0,0,0,0,0,0,1,0,
0,0,0,0,0,0,1,0,
0,0,0,0,0,0,0,1,
0,0,0,0,0,0,0,1,
0,0,0,0,0,0,0,1,
0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0),
free=c(F, F, F, F, F, F, F, F,
F, F, F, F, F, F, T, F,
F, F, F, F, F, F, T, F,
F, F, F, F, F, F, F, F,
F, F, F, F, F, F, F, T,
F, F, F, F, F, F, F, T,
F, F, F, F, F, F, F, F,
F, F, F, F, F, F, F, F),
labels=c(NA,NA,NA,NA,NA,NA,"l1", NA,
NA,NA,NA,NA,NA,NA,"l2", NA,
NA,NA,NA,NA,NA,NA,"l3", NA,
NA,NA,NA,NA,NA,NA, NA,"l4",
NA,NA,NA,NA,NA,NA, NA,"l5",
NA,NA,NA,NA,NA,NA, NA,"l6",
NA,NA,NA,NA,NA,NA, NA, NA,
NA,NA,NA,NA,NA,NA, NA, NA),
byrow=TRUE,
name="A"),
mxMatrix(
type="Symm",
nrow=8,
ncol=8,
values=c(1,0,0,0,0,0, 0, 0,
0,1,0,0,0,0, 0, 0,
0,0,1,0,0,0, 0, 0,
0,0,0,1,0,0, 0, 0,
0,0,0,0,1,0, 0, 0,
0,0,0,0,0,1, 0, 0,
0,0,0,0,0,0, 1,.5,
0,0,0,0,0,0,.5, 1),
free=c(T, F, F, F, F, F, F, F,
F, T, F, F, F, F, F, F,
F, F, T, F, F, F, F, F,
F, F, F, T, F, F, F, F,
F, F, F, F, T, F, F, F,
F, F, F, F, F, T, F, F,
F, F, F, F, F, F, T, T,
F, F, F, F, F, F, T, T),
labels=c("e1", NA,   NA,   NA,   NA,   NA,    NA,    NA,
NA, "e2",   NA,   NA,   NA,   NA,    NA,    NA,
NA,   NA, "e3",   NA,   NA,   NA,    NA,    NA,
NA,   NA,   NA, "e4",   NA,   NA,    NA,    NA,
NA,   NA,   NA,   NA, "e5",   NA,    NA,    NA,
NA,   NA,   NA,   NA,   NA, "e6",    NA,    NA,
NA,   NA,   NA,   NA,   NA,   NA, "varF1", "cov",
NA,   NA,   NA,   NA,   NA,   NA, "cov", "varF2"),
byrow=TRUE,
name="S"),
mxMatrix(
type="Full",
nrow=6,
ncol=8,
free=F,
values=c(1,0,0,0,0,0,0,0,
0,1,0,0,0,0,0,0,
0,0,1,0,0,0,0,0,
0,0,0,1,0,0,0,0,
0,0,0,0,1,0,0,0,
0,0,0,0,0,1,0,0),
byrow=T,
name="F"),
mxMatrix(
type="Full",
nrow=1,
ncol=8,
values=c(1,1,1,1,1,1,0,0),
free=c(T,T,T,T,T,T,F,F),
labels=c("meanx1","meanx2","meanx3",
"meanx4","meanx5","meanx6",
NA,NA),
name="M"),
mxRAMObjective("A","S","F","M")
)


The four mxMatrix functions have changed slightly to accomodate the changes in the model. The A matrix, shown below, is used to specify the regressions of the manifest variables on the factors. The first three manifest variables ("x1"-"x3") are regressed on "F1", and the second three manifest variables ("y1"-"y3") are regressed on "F2". We must again constrain the model to identify and scale the latent variables, which we do by constraining the first loading for each latent variable to a value of one.

mxMatrix(
type="Full",
nrow=8,
ncol=8,
values=c(0,0,0,0,0,0,1,0,
0,0,0,0,0,0,1,0,
0,0,0,0,0,0,1,0,
0,0,0,0,0,0,0,1,
0,0,0,0,0,0,0,1,
0,0,0,0,0,0,0,1,
0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0),
free=c(F, F, F, F, F, F, F, F,
F, F, F, F, F, F, T, F,
F, F, F, F, F, F, T, F,
F, F, F, F, F, F, F, F,
F, F, F, F, F, F, F, T,
F, F, F, F, F, F, F, T,
F, F, F, F, F, F, F, F,
F, F, F, F, F, F, F, F),
labels=c(NA,NA,NA,NA,NA,NA,"l1", NA,
NA,NA,NA,NA,NA,NA,"l2", NA,
NA,NA,NA,NA,NA,NA,"l3", NA,
NA,NA,NA,NA,NA,NA, NA,"l4",
NA,NA,NA,NA,NA,NA, NA,"l5",
NA,NA,NA,NA,NA,NA, NA,"l6",
NA,NA,NA,NA,NA,NA, NA, NA,
NA,NA,NA,NA,NA,NA, NA, NA),
byrow=TRUE,
name="A")


The S matrix has an additional row and column, and two additional parameters. For the two factor model, we must add a variance term for the second latent variable and a covariance between the two latent variables.

mxMatrix(
type="Symm",
nrow=8,
ncol=8,
values=c(1,0,0,0,0,0, 0, 0,
0,1,0,0,0,0, 0, 0,
0,0,1,0,0,0, 0, 0,
0,0,0,1,0,0, 0, 0,
0,0,0,0,1,0, 0, 0,
0,0,0,0,0,1, 0, 0,
0,0,0,0,0,0, 1,.5,
0,0,0,0,0,0,.5, 1),
free=c(T, F, F, F, F, F, F, F,
F, T, F, F, F, F, F, F,
F, F, T, F, F, F, F, F,
F, F, F, T, F, F, F, F,
F, F, F, F, T, F, F, F,
F, F, F, F, F, T, F, F,
F, F, F, F, F, F, T, T,
F, F, F, F, F, F, T, T),
labels=c("e1", NA,   NA,   NA,   NA,   NA,    NA,    NA,
NA, "e2",   NA,   NA,   NA,   NA,    NA,    NA,
NA,   NA, "e3",   NA,   NA,   NA,    NA,    NA,
NA,   NA,   NA, "e4",   NA,   NA,    NA,    NA,
NA,   NA,   NA,   NA, "e5",   NA,    NA,    NA,
NA,   NA,   NA,   NA,   NA, "e6",    NA,    NA,
NA,   NA,   NA,   NA,   NA,   NA, "varF1", "cov",
NA,   NA,   NA,   NA,   NA,   NA, "cov", "varF2"),
byrow=TRUE,
name="S")


The F and M matrices contain only minor changes. The F matrix is now of order 6x8, but the additional column is simply a column of zeros. The M matrix contains an additional column (with only a single row), which contains the mean of the second latent variable. As this model does not contain a parameter for that latent variable, this mean is constrained to zero.

The model is now ready to run using the mxRun function, and the output of the model can be accessed from the output slot of the resulting model. A summary of the output can be reached using summary().

These models may also be specified using paths instead of matrices. See link for path specification of these models.

## Time Series, Matrix Specification¶

This example will demonstrate a growth curve model using RAM specified matrices. As with previous examples, this application is split into two files, one each raw and covariance data. These examples can be found in the following files:

• LGC_MatrixCov.R
• LGC_MatrixRaw.R

A parallel version of this example, using path-centric specification of models rather than matrices, can be found here link.

### Latent Growth Curve Model¶

The latent growth curve model is a variation of the factor model for repeated measurements. For a set of manifest variables - measured at five discrete times for people indexed by the letter i, the growth curve model can be expressed both algebraically and via a path diagram as shown here:

. math::

begin{eqnarray*} x_{ij} = Intercept_{i} + lambda_{j} * Slope_{i} + epsilon_{i} end{eqnarray*}

The values and specification of the parameters allow for alterations to the growth curve model. This example will utilize a linear growth curve model, so we will specify to increase linearly with time. If the observations occur at regular intervals in time, then can be specified with any values increasing at a constant rate. For this example, we’ll use [0, 1, 2, 3, 4] so that the intercept represents scores at the first measurement occasion, and the slope represents the rate of change per measurement occasion. Any linear transformation of these values can be used for linear growth curve models.

Our model for any number of variables contains 6 free parameters; two factor means, two factor variances, a factor covariance and a (constant) residual variance for the manifest variables. Our data contains five manifest variables, and so the covariance matrix and means vector contain 20 degrees of freedom. Thus, the linear growth curve model fit to these data has 14 degrees of freedom.

#### Data¶

The first step to running our model is to import data. The code below is used to import both raw data and a covariance matrix and means vector, either of which can be used for our growth curve model. This data contains five variables, which are repeated measurements of the same variable. As growth curve models make specific hypotheses about the variances of the manifest variables, correlation matrices generally aren’t used as data for this model.

myLongitudinalData <- read.table("myLongitudinalData.txt",header=T)

myLongitudinalDataCov<-matrix(
c(6.362, 4.344, 4.915,  5.045,  5.966,
4.344, 7.241, 5.825,  6.181,  7.252,
4.915, 5.825, 9.348,  7.727,  8.968,
5.045, 6.181, 7.727, 10.821, 10.135,
5.966, 7.252, 8.968, 10.135, 14.220),
nrow=5,
dimnames=list(
c("x1","x2","x3","x4","x5"),
c("x1","x2","x3","x4","x5"))
)


myLongitudinalDataMean <- c(9.864, 11.812, 13.612, 15.317, 17.178)

#### Model Specification¶

The following code contains all of the components of our model. Before running a model, the OpenMx library must be loaded into R using either the require() or library() function. All objects required for estimation (data, matrices, and an objective function) are included in their functions. This code uses the mxModel function to create an MxModel object, which we’ll then run.

require(OpenMx)

growthCurveModel <- mxModel("Linear Growth Curve Model, Matrix Specification",
mxData(myLongitudinalDataRaw,
type="raw"),
mxMatrix(
type="Full",
nrow=7,
ncol=7,
free=F,
values=c(0,0,0,0,0,1,0,
0,0,0,0,0,1,1,
0,0,0,0,0,1,2,
0,0,0,0,0,1,3,
0,0,0,0,0,1,4,
0,0,0,0,0,0,0,
0,0,0,0,0,0,0),
byrow=TRUE,
name="A"),
mxMatrix(
type="Symm",
nrow=7,
ncol=7,
free=c(T, F, F, F, F, F, F,
F, T, F, F, F, F, F,
F, F, T, F, F, F, F,
F, F, F, T, F, F, F,
F, F, F, F, T, F, F,
F, F, F, F, F, T, T,
F, F, F, F, F, T, T),
values=c(0,0,0,0,0,  0,  0,
0,0,0,0,0,  0,  0,
0,0,0,0,0,  0,  0,
0,0,0,0,0,  0,  0,
0,0,0,0,0,  0,  0,
0,0,0,0,0,  1,0.5,
0,0,0,0,0,0.5,  1),
labels=c("residual", NA, NA, NA, NA, NA, NA,
NA, "residual", NA, NA, NA, NA, NA,
NA, NA, "residual", NA, NA, NA, NA,
NA, NA, NA, "residual", NA, NA, NA,
NA, NA, NA, NA, "residual", NA, NA,
NA, NA, NA, NA, NA, "vari", "cov",
NA, NA, NA, NA, NA, "cov", "vars"),
byrow= TRUE,
name="S"),
mxMatrix(
type="Full",
nrow=5,
ncol=7,
free=F,
values=c(1,0,0,0,0,0,0,
0,1,0,0,0,0,0,
0,0,1,0,0,0,0,
0,0,0,1,0,0,0,
0,0,0,0,1,0,0),
byrow=T,
name="F"),
mxMatrix(
type="Full",
nrow=1,
ncol=7,
values=c(0,0,0,0,0,1,1),
free=c(F,F,F,F,F,T,T),
labels=c(NA,NA,NA,NA,NA,"meani","means"),
name="M"),
mxRAMObjective("A","S","F","M")
)


The model begins with a name, in this case “Linear Growth Curve Model, Path Specification”. If the first argument is an object containing an MxModel object, then the model created by the mxModel function will contain all of the named entites in the referenced model object.

Data is supplied with the mxData function. This example uses raw data, but the mxData function in the code above could be replaced with the function below to include covariance data.

mxData(myLongitudinalDataCov,
type="cov",
numObs=500,
means=myLongitudinalDataMeans)


The four mxMatrix functions define the A, S, F and M matrices used in RAM specification of models. In all four matrices, the first five rows or columns of any matrix represent the five manifest variables, the sixth the latent intercept variable, and the seventh the slope. The A and S matrices are of order 7x7, the F matrix of order 5x7, and the M matrix 1x7.

The A matrix specifies all of the assymetric paths or regressions among variables. The only assymmetric paths in our model regress the manifest variables on the latent intercept and slope with fixed values. The regressions of the manifest variables on the intercept are in the first five rows and sixth column of the A matrix, all of which have a fixed value of one. The regressions of the manifest variables on the slope are in the first five rows and sixth column of the A matrix with fixed values in this series: [0, 1, 2, 3, 4].

mxMatrix(
type="Full",
nrow=7,
ncol=7,
free=F,
values=c(0,0,0,0,0,1,0,
0,0,0,0,0,1,1,
0,0,0,0,0,1,2,
0,0,0,0,0,1,3,
0,0,0,0,0,1,4,
0,0,0,0,0,0,0,
0,0,0,0,0,0,0),
byrow=TRUE,
name="A")


The S matrix specifies all of the symmetric paths among our variables, representing the variances and covariances in our model. The five manifest variables do not have any covariance parameters with any other variables, and all are restricted to have the same residual variance. This variance term is constrained to equality by specifying five free parameters and giving all five parameters the same label. The variances and covariance of the latent variables are included as free parameters in the sixth and sevenths rows and columns of this matrix as well.

mxMatrix(
type="Symm",
nrow=7,
ncol=7,
free=c(T, F, F, F, F, F, F,
F, T, F, F, F, F, F,
F, F, T, F, F, F, F,
F, F, F, T, F, F, F,
F, F, F, F, T, F, F,
F, F, F, F, F, T, T,
F, F, F, F, F, T, T),
values=c(0,0,0,0,0,  0,  0,
0,0,0,0,0,  0,  0,
0,0,0,0,0,  0,  0,
0,0,0,0,0,  0,  0,
0,0,0,0,0,  0,  0,
0,0,0,0,0,  1,0.5,
0,0,0,0,0,0.5,  1),
labels=c("residual", NA, NA, NA, NA, NA, NA,
NA, "residual", NA, NA, NA, NA, NA,
NA, NA, "residual", NA, NA, NA, NA,
NA, NA, NA, "residual", NA, NA, NA,
NA, NA, NA, NA, "residual", NA, NA,
NA, NA, NA, NA, NA, "vari", "cov",
NA, NA, NA, NA, NA, "cov", "vars"),
byrow= TRUE,
name="S")


The third matrix in our RAM model is the F or filter matrix. This is used to “filter” the latent variables from the expected covariance of the observed data. The F matrix will always contain the same number of rows as manifest variables and columns as total (manifest and latent) variables. If the manifest variables in the A and S matrices precede the latent variables are in the same order as the data, then the F matrix will be the horizontal adhesion of an identity matrix and a zero matrix. This matrix contains no free parameters, and is made with the mxMatrix function below.

mxMatrix(
type="Full",
nrow=5,
ncol=7,
free=F,
values=c(1,0,0,0,0,0,0,
0,1,0,0,0,0,0,
0,0,1,0,0,0,0,
0,0,0,1,0,0,0,
0,0,0,0,1,0,0),
byrow=T,
name="F")


The final matrix in our RAM model is the M or means matrix, which specifies the means and intercepts of the variables in the model. While the manifest variables have expected means in our model, these expected means are entirely dependent on the means of the intercept and slope factors. In the M matrix below, the manifest variables are given fixed intercepts of zero while the latent variables are each given freely estimated means with starting values of 1 and labels of "meani" and "means"

mxMatrix(
type=”Full”, nrow=1, ncol=7, values=c(0,0,0,0,0,1,1), free=c(F,F,F,F,F,T,T), labels=c(NA,NA,NA,NA,NA,”meani”,”means”), name=”M”)

The last piece of our model is the mxRAMObjective function, which defines both how the specified matrices combine to create the expected covariance matrix of the data, as well as the fit function to be minimized. As covered in earlier examples, the expected covariance matrix for a RAM model is defined as:

The expected means are defined as:

The free parameters in the model can then be estimated using maximum likelihood for covariance and means data, and full information maximum likelihood for raw data. The M matrix is required both for raw data and for covariance or correlation data that includes a means vector. The mxRAMObjective function takes four arguments, which are the names of the A, S, F and M matrices in your model.

The model is now ready to run using the mxRun function, and the output of the model can be accessed from the output slot of the resulting model. A summary of the output can be reached using summary().

growthCurveFit <- mxRun(growthCurveModel)

growthCurveFit@output

summary(growthCurveFit)

These models may also be specified using paths instead of matrices. See link for path specification of these models.

## Multiple Groups, Matrix Specification¶

An important aspect of structural equation modeling is the use of multiple groups to compare means and covariances structures between any two (or more) data groups, for example males and females, different ethnic groups, ages etc. Other examples include groups which have different expected covariances matrices as a function of parameters in the model, and need to be evaluated together to estimated together for the parameters to be identified.

The example includes the heterogeneity model as well as its submodel, the homogeneity model and is available in the following file:

• BivariateHeterogeneity_MatrixRaw.R

A parallel version of this example, using paths specification of models rather than matrices, can be found here link.

### Heterogeneity Model¶

We will start with a basic example here, building on modeling means and variances in a saturated model. Assume we have two groups and we want to test whether they have the same mean and covariance structure.

#### Data¶

For this example we simulated two datasets (‘xy1’ and ‘xy2’) each with zero means and unit variances, one with a correlation of .5, and the other with a correlation of .4 with 1000 subjects each. See attached R code for simulation and data summary.

#Simulate Data
require(MASS)
#group 1
set.seed(200)
rs=.5
xy1 <- mvrnorm (1000, c(0,0), matrix(c(1,rs,rs,1),2,2))
set.seed(200)
#group 2
rs=.4
xy2 <- mvrnorm (1000, c(0,0), matrix(c(1,rs,rs,1),2,2))

#Print Descriptive Statistics
selVars <- c('X','Y')
summary(xy1)
cov(xy1)
dimnames(xy1) <- list(NULL, selVars)
summary(xy2)
cov(xy2)
dimnames(xy2) <- list(NULL, selVars)


#### Model Specification¶

We first fit a heterogeneity model, allowing differences in both the mean and covariance structure of the two groups. As we are interested whether the two structures can be equated, we have to specify the models for the two groups, named ‘group1’ and ‘group2’ within another model, named ‘bivHet’. The structure of the job thus look as follows, with two mxModel commands as arguments of another mxModel command. mxModel commands are unlimited in the number of arguments.

bivHetModel <- mxModel("bivHet",
mxModel("group1", ....
mxModel("group2", ....
mxAlgebra(group1.objective + group2.objective, name="h12"),
mxAlgebraObjective("h12")
)


For each of the groups, we fit a saturated model, using a Cholesky decomposition to generate the expected covariance matrix and a row vector for the expected means. Note that we have specified different labels for all the free elements, in the two mxModel statements. For more details, see example 1.

#Fit Heterogeneity Model
bivHetModel <- mxModel("bivHet",
mxModel("group1",
mxMatrix(
type="Full",
nrow=2,
ncol=2,
free=c(T,T,F,T),
values=c(1,.2,0,1),
labels=c("vX1", "cXY1", "zero", "vY1"),
name="Chol1"
),
mxAlgebra(
Chol1 %*% t(Chol1),
name="EC1"
),
mxMatrix(
type="Full",
nrow=1,
ncol=2,
free=T,
values=c(0,0),
labels=c("mX1", "mY1"),
name="EM1"
),
mxData(
xy1,
type="raw"
),
mxFIMLObjective(
"EC1",
"EM1"
selVars)
),
mxModel("group2",
mxMatrix(
type="Full",
nrow=2,
ncol=2,
free=c(T,T,F,T),
values=c(1,.2,0,1),
labels=c("vX2", "cXY2", "zero", "vY2"),
name="Chol2"
),
mxAlgebra(
Chol2 %*% t(Chol2),
name="EC2",
),
mxMatrix(
type="Full",
nrow=1,
ncol=2,
free=T,
values=c(0,0),
labels=c("mX2", "mY2"),
name="EM2"
),
mxData(
xy2,
type="raw"
),
mxFIMLObjective(
"EC2",
"EM2",
selVars)
), ....


As a result, we estimate five parameters (two means, two variances, one covariance) per group for a total of 10 free parameters. We cut the ‘Labels matrix:’ parts from the output generated with bivHetModel$group1@matrices and bivHetModel$group2@matrices

$Chol1 X Y X "vX1" "zero" Y "cXY1" "vY1"$EM1
X     Y
[1,] "mX1" "mY1"

$Chol2 X Y X "vX2" "zero" Y "cXY2" "vY2"$EM2
X     Y
[1,] "mX2" "mY2"


#### Model Fitting¶

To evaluate both models together, we use an mxAlgebra command that adds up the values of the objective functions of the two groups. The objective function to be used here is the mxAlgebraObjective which uses as its argument the sum of the function values of the two groups.

mxAlgebra(
group1.objective + group2.objective,
name="h12"
),
mxAlgebraObjective("h12")
)


The mxRun command is required to actually evaluate the model. Note that we have adopted the following notation of the objects. The result of the mxModel command ends in ‘Model’; the result of the mxRun command ends in ‘Fit’. Of course, these are just suggested naming conventions.

bivHetFit <- mxRun(bivHetModel)


A variety of output can be printed. We chose here to print the expected means and covariance matrices for the two groups and the likelihood of data given the model. The mxEval command takes any R expression, followed by the fitted model name. Given that the model ‘bivHetFit’ included two models (group1 and group2), we need to use the two level names, i.e. ‘group1.EM1’ to refer to the objects in the correct model.

EM1Het <- mxEval(group1.EM1, bivHetFit)
EM2Het <- mxEval(group2.EM2, bivHetFit)
EC1Het <- mxEval(group1.EC1, bivHetFit)
EC2Het <- mxEval(group2.EC2, bivHetFit)
LLHet <- mxEval(objective, bivHetFit)


### Homogeneity Model: a Submodel¶

Next, we fit a model in which the mean and covariance structure of the two groups are equated to one another, to test whether there are significant differences between the groups. Rather than having to specify the entire model again, we copy the previous model ‘bivHetModel’ into a new model ‘bivHomModel’ to represent homogeneous structures.

#Fit Homnogeneity Model
bivHomModel <- bivHetModel


As elements in matrices can be equated by assigning the same label, we now have to equate the labels of the free parameters in group1 to the labels of the corresponding elements in group2. This can be done by referring to the relevant matrices using the ModelName[['MatrixName']] syntax, followed by @labels. Note that in the same way, one can refer to other arguments of the objects in the model. Here we assign the labels from group1 to the labels of group2, separately for the Cholesky matrices used for the expected covariance matrices and for the expected means vectors.

bivHomModel[['group2.Chol2']]@labels <- bivHomModel[['group1.Chol1']]@labels
bivHomModel[['group2.EM2']]@labels <- bivHomModel[['group1.EM1']]@labels


The specification for the submodel is reflected in the names of the labels which are now equal for the corresponding elements of the mean and covariance matrices, as below.

$Chol1 X Y X "vX1" "zero" Y "cXY1" "vY1"$EM1
X     Y
[1,] "mX1" "mY1"

$Chol2 X Y X "vX1" "zero" Y "cXY1" "vY1"$EM2
X     Y
[1,] "mX1" "mY1"


We can produce similar output for the submodel, i.e. expected means and covariances and likelihood, the only difference in the code being the model name. Note that as a result of equating the labels, the expected means and covariances of the two groups should be the same.

bivHomFit <- mxRun(bivHomModel)
EM1Hom <- mxEval(group1.EM1, bivHomFit)
EM2Hom <- mxEval(group2.EM2, bivHomFit)
EC1Hom <- mxEval(group1.EC1, bivHomFit)
EC2Hom <- mxEval(group2.EC2, bivHomFit)
LLHom <- mxEval(objective, bivHomFit)


Finally, to evaluate which model fits the data best, we generate a likelihood ratio test as the difference between -2 times the log-likelihood of the homogeneity model and -2 times the log-likelihood of the heterogeneity model. This statistic is asymptotically distributed as a Chi-square, which can be interpreted with the difference in degrees of freedom of the two models.

Chi= LLHom-LLHet
LRT= rbind(LLHet,LLHom,Chi)
LRT


## Genetic Epidemiology, Matrix Specification¶

Mx is probably most popular in the behavior genetics field, as it was conceived with genetic models in mind, which rely heavily on multiple groups. We introduce here an OpenMx script for the basic genetic model in genetic epidemiologic research, the ACE model. This model assumes that the variability in a phenotype, or observed variable, of interest can be explained by differences in genetic and environmental factors, with A representing additive genetic factors, C shared/common environmental factors and E unique/specific environmental factors (see Neale & Cardon 1992, for a detailed treatment). To estimate these three sources of variance, data have to be collected on relatives with different levels of genetic and environmental similarity to provide sufficient information to identify the parameters. One such design is the classical twin study, which compares the similarity of identical (monozygotic, MZ) and fraternal (dizygotic, DZ) twins to infer the role of A, C and E.

The example starts with the ACE model and includes one submodel, the AE model. It is available in the following file:

• UnivariateTwinAnalysis_MatrixRaw.R

A parallel version of this example, using path specification of models rather than matrices, can be found here link.

### ACE Model: a Twin Analysis¶

#### Data¶

Let us assume you have collected data on a large sample of twin pairs for your phenotype of interest. For illustration purposes, we use Australian data on body mass index (BMI) which are saved in a text file ‘myTwinData.txt’. We use R to read the data into a data.frame and to create two subsets of the data for MZ females (mzfData) and DZ females (dzfData) respectively with the code below.

require(OpenMx)

#Prepare Data
twinVars <- c('fam','age','zyg','part','wt1','wt2','ht1','ht2','htwt1','htwt2','bmi1','bmi2')
summary(twinData)
selVars <- c('bmi1','bmi2')
mzfData <- as.matrix(subset(twinData, zyg==1, c(bmi1,bmi2)))
dzfData <- as.matrix(subset(twinData, zyg==3, c(bmi1,bmi2)))


#### Model Specification¶

There are a variety of ways to set up the ACE model. The most commonly used approach in Mx is to specify three matrices for each of the three sources of variance. The matrix X represents the additive genetic path ‘a’, the Y matrix is used for the shared environmental path ‘c’ and the matrix Z for the unique environmental path ‘e’. The expected variances and covariances between member of twin pairs are typically expressed in variance components (or the square of the path coefficients, i.e. ‘a^2’, ‘c^2’ and ‘e^2’). These quantities can be calculated using matrix algebra, by multiplying the X matrix by its transpose t(A). Note that the transpose is not strictly needed in the univariate case, but will allow easier transition to the multivariate case. We then use matrix algebra again to add the relevant matrices corresponding to the expectations for each of the statistics of the observed covariance matrix. The R functions ‘cbind’ and ‘rbind’ are used to concatenate the resulting matrices in the appropriate way. Let’s go through each of the matrices step by step. They will all form arguments of the mxModel, specified as follows. Note that we left the comma’s at the end of the lines which are necessary when all the arguments are combined prior to running the model. Each line can be pasted into R, and then evaluated together once the whole model is specified.

#Fit ACE Model with RawData and Matrix-style Input
twinACEModel <- mxModel("twinACE",


As the focus is on individual differences, the model for the means is typically simple. We can estimate each of the means, in each of the two groups (MZ & DZ) as free parameters. Alternatively, we can establish whether the means can be equated across order and zygosity by fitting submodels to the saturated model. In this case, we opted to use one ‘grand’ mean, obtained by assigning the same label to the two elements of the matrix ‘expMeanMZ’ and the two elements of the matrix ‘expMeanDZ’, each of which are ‘Full’ 1x2 matrices with free parameters and start values of 20. Note again that dimnames are required for matrices or algebras that generate the expected mean vectors and expected covariance matrices.

mxMatrix(
type="Full",
nrow=1,
ncol=2,
free=T,
values=c(20,20),
labels= c("mean","mean"),
name="expMeanMZ"
),
mxMatrix(
type="Full",
nrow=1,
ncol=2,
free=T,
values=c(20,20),
labels= c("mean","mean"),
name="expMeanDZ"
),


Given the current example is univariate (in the sense that we analyze one variable, even though we have measured it in two members of twin pairs), the matrices for the paths ‘a’, ‘c’ and ‘e’, respectively, X, Y and Z are all ‘Full’ 1x1 matrices assigned the ‘free’ status and given a .6 starting value. We also specify the matrix h to have a fixed value of 0.5, necessary for the expectation of DZ twins.

mxMatrix(
type="Full",
nrow=1,
ncol=1,
free=TRUE,
values=.6,
label="a",
name="X"
),
mxMatrix(
type="Full",
nrow=1,
ncol=1,
free=TRUE,
values=.6,
label="c",
name="Y"
),
mxMatrix(
type="Full",
nrow=1,
ncol=1,
free=TRUE,
values=.6,
label="e",
name="Z"
),
mxMatrix(
type="Full",
nrow=1,
ncol=1,
free=FALSE,
values=.5,
name="h"
),


While the labels in these matrices are given lower case names, similar to the convention that paths have lower case names, the names for the variance component matrices, obtained from multiplying matrices with their transpose have upper case letters ‘A’, ‘C’ and ‘E’ which are distinct (as R is case-sensitive).

mxAlgebra(
expression=X * t(X),
name="A"
),
mxAlgebra(
expression=Y * t(Y),
name="C"
),
mxAlgebra(
expression=Z * t(Z),
name="E"
),


Previous Mx users will likely be familiar with the look of the expected covariance matrices for MZ and DZ twin pairs. These 2x2 matrices are built by horizontal and vertical concatenation of the appropriate matrix expressions for the variance, the MZ and the DZ covariance. In R, concatenation of matrices is accomplished with the ‘rbind’ and ‘cbind’ functions. Thus to represent the matrices in expression ? in R, we use the following code.

mxAlgebra(
expression=rbind (cbind(A + C + E, A + C),
cbind(A + C    , A + C + E)),
name="expCovMZ"
),
mxAlgebra(
expression=rbind (cbind(A + C + E  , h %x% A + C),
cbind(h %x% A + C, A + C + E)),
name="expCovDZ"
),


As the expected covariance matrices are different for the two groups of twins, we specify two mxModel commands within the ‘twinACE’ mxModel command. They are given a name, and arguments for the data and the objective function to be used to optimize the model. We have set the model up for raw data, and thus will use the mxFIMLObjective function to evaluate it. For each model, the mxData command calls up the appropriate data, and provides a type, here ‘raw’, and the mxFIMLObjective command is given the names corresponding to the respective expected covariance matrices and mean vectors, specified above.

mxModel("MZ",
mxData(
observed=mzfData,
type="raw"
),
mxFIMLObjective(
covariances="twinACE.expCovMZ",
means="twinACE.expMeanMZ",
dimnames=selVars
)
),
mxModel("DZ",
mxData(
observed=dzfData,
type="raw"
),
mxFIMLObjective(
covariances="twinACE.expCovDZ",
means="twinACE.expMeanDZ",
dimnames=selVars
)
),


Finally, both models need to be evaluated simultaneously. We first generate the sum of the objective functions for the two groups, using mxAlgebra, and then use that as argument of the mxAlgebraObjective command.

mxAlgebra(
expression=MZ.objective + DZ.objective,
name="twin"
),
mxAlgebraObjective("twin")
)


#### Model Fitting¶

We need to invoke the mxRun command to start the model evaluation and optimization. Detailed output will be available in the resulting object, which can be obtained by a print() statement.

#Run ACE model
twinACEFit <- mxRun(twinACEModel)


Often, however, one is interested in specific parts of the output. In the case of twin modeling, we typically will inspect the expected covariance matrices and mean vectors, the parameter estimates, and possibly some derived quantities, such as the standardized variance components, obtained by dividing each of the components by the total variance. Note in the code below that the mxEval command allows easy extraction of the values in the various matrices/algebras which form the first argument, with the model name as second argument. Once these values have been put in new objects, we can use and regular R expression to derive further quantities or organize them in a convenient format for including in tables. Note that helper functions could (and will likely) easily be written for standard models to produce ‘standard’ output.

MZc <- mxEval(expCovMZ, twinACEFit)
DZc <- mxEval(expCovDZ, twinACEFit)
M <- mxEval(expMeanMZ, twinACEFit)
A <- mxEval(A, twinACEFit)
C <- mxEval(C, twinACEFit)
E <- mxEval(E, twinACEFit)
V <- (A+C+E)
a2 <- A/V
c2 <- C/V
e2 <- E/V
ACEest <- rbind(cbind(A,C,E),cbind(a2,c2,e2))
LL_ACE <- mxEval(objective, twinACEFit)


### Alternative Models: an AE Model¶

To evaluate the significance of each of the model parameters, nested submodels are fit in which these parameters are fixed to zero. If the likelihood ratio test between the two models is significant, the parameter that is dropped from the model significantly contributes to the phenotype in question. Here we show how we can fit the AE model as a submodel with a change in one mxmMatrix command. First, we call up the previous ‘full’ model and save it as a new model ‘twinAEModel’. Next we re-specify the matrix Y to be fixed to zero. We can run this model in the same way as before and generate similar summaries of the results.

#Run AE model
twinAEModel <- mxModel(twinACEModel,
mxMatrix(
type="Full",
nrow=1,
ncol=1,
free=F,
values=0,
label="c",
name="Y"
)
)
twinAEFit <- mxRun(twinAEModel)

MZc <- mxEval(expCovMZ, twinAEFit)
DZc <- mxEval(expCovDZ, twinAEFit)
A <- mxEval(A, twinAEFit)
C <- mxEval(C, twinAEFit)
E <- mxEval(E, twinAEFit)
V <- (A+C+E)
a2 <- A/V
c2 <- C/V
e2 <- E/V
AEest <- rbind(cbind(A,C,E),cbind(a2,c2,e2))
LL_AE <- mxEval(objective, twinAEFit)


We use a likelihood ratio test (or take the difference between -2 times the log-likelihoods of the two models) to determine the best fitting model, and print relevant output.

LRT_ACE_AE <- LL_AE-LL_ACE

#Print relevant output
ACEest
AEest
LRT_ACE_AE


## Definition Variables, Matrix Specification¶

This example will demonstrate the use of OpenMx definition variables with the implementation of a simple two group dataset. What are definition variables? Essentially, definition variables can be thought of as observed variables which are used to change the statistical model on an individual case basis. In essence, it is as though one or more variables in the raw data vectors are used to specify the statistical model for that individual. Many different types of statistical model can be specified in this fashion; some are readily specified in standard fashion, and some that cannot. To illustrate, we implement a two-group model. The groups differ in their means but not in their variances and covariances. This situation could easily be modeled in a regular multiple group fashion - it is only implemented using definition variables to illustrate their use. The results are verified using summary statistics and an Mx 1.0 script for comparison is also available.

### Mean Differences¶

The scripts are presented here

• DefinitionMeans_MatrixRaw.R
• DefinitionMeans_MatrixRaw.mx

#### Statistical Model¶

Algebraically, we are going to fit the following model to the observed x and y variables:

where the residual sources of variance, and covary to the extent . So, the task is to estimate: the two means and ; the deviations from these means due to belonging to the group identified by having def set to 1 (as opposed to zero), and ; and the parameters of the variance covariance matrix: cov() which we will call or simply “S” in the R script.

#### Data Simulation¶

Our first step to running this model is to simulate the data to be analyzed. Each individual is measured on two observed variables, x and y, and a third variable “def” which denotes their group membership with a 1 or a 0. These values for group membership are not accidental, and must be adhered to in order to obtain readily interpretable results. Other values such as 1 and 2 would yield the same model fit, but would make the interpretation more difficult.

library(MASS)

set.seed(200)  # to make the simulation repeatable
n = 500        # sample size, per group

Sigma <- matrix(c(1,.5,.5,1),2,2)
group1<-mvrnorm(n=n, c(1,2), Sigma)
group2<-mvrnorm(n=n, c(0,0), Sigma)


We make use of the superb R function mvrnorm in order to simulate n=500 records of data for each group. These observations correlate .5 and have a variance of 1, per the matrix Sigma. The means of x and y in group 1 are 1.0 and 2.0, respectively; those in group 2 are both zero. The output of the mvrnorm function calls are matrices with 500 rows and 3 columns, which are stored in group 1 and group 2. Now we create the definition variable

# Put the two groups together, create a definition variable,
# and make a list of which variables are to be analyzed (selvars)
y<-rbind(group1,group2)
dimnames(y)[2]<-list(c("x","y"))
def<-rep(c(1,0),each=n)
selvars<-c("x","y")


The objects y and def might be combined in a data frame. However, in this case we won’t bother to do it externally, and simply paste them together in the mxData function call.

#### Model Specification¶

The following code contains all of the components of our model. Before running a model, the OpenMx library must be loaded into R using either the require() or library() function. This code uses the mxModel function to create an mxModel object, which we’ll then run. Note that all the objects required for estimation (data, matrices, and an objective function) are declared within the mxModel function. This type of code structure is recommended for OpenMx scripts generally.

defMeansModel <- mxModel("DefinitionMeans",
mxFIMLObjective(
covariance="Sigma",
means="Mu",
dimnames=selvars
),


The first argument in an mxModel function has a special function. If an object or variable containing an MxModel object is placed here, then mxModel adds to or removes pieces from that model. If a character string (as indicated by double quotes) is placed first, then that becomes the name of the model. Models may also be named by including a name argument. This model is named "DefinitionMeans".

The second argument in this mxModel call is itself a function. It declares that the objective function to be optimized is full information maximum likelihood (FIML) under normal theory, which is tagged as mxFIMLObjective. There are in turn two arguments to this function: the covariance matrix Sigma and the mean vector Mu. These matrices will be defined later in the mxModel function call.

Next, we declare where the data are, and their type, by creating an MxData object with the mxData function. This piece of code creates an MxData object. It first references the object where our data are, then uses the type argument to specify that this is raw data. Analyses using definition variables have to use raw data, so that the model can be specified on an individual data vector level.

mxData((
observed=data.frame(y,def)),
type="raw"
),


Model specification is carried out using mxMatrix functions to create matrices for the model. In the present case, we need four matrices. First is the predicted covariance matrix, Sigma. Next, we use three matrices to specify the model for the means. First is M which corresponds to estimates of the means for individuals with definition variables with values of zero. Individuals with definition variable values of 1 will have the value in M along with the value in the matrix beta. So both matrices are of size 1x2 and both contain two free parameters. There is a separate deviation for each of the variables, which will be estimated in the elements 1,1 and 1,2 of the beta matrix. Last, but by no means least, is the matrix def which contains the definition variable. The variable def in mxData data frame is referred to as data.def. In the present case, the definition variable contains a 1 for group 1, and a zero otherwise.

mxMatrix(
type="Symm",
nrow=2,
ncol=2,
free=TRUE,
values=c(1, 0, 1),
name="Sigma"
),
mxMatrix(
type="Full",
nrow = 1,
ncol = 2,
free=TRUE,
name = "M"
),
mxMatrix(
type="Full",
nrow=1,
ncol=2,
free=TRUE,
values=c(0, 0),
name="beta"
),
mxMatrix(
type="Full",
nrow=1,
ncol=2,
free=FALSE,
labels=c("data.def"),
name="def"
),


The trick - commonly used in regression models - is to multiply the beta matrix by the def matrix. This multiplication is effected using an mxAlgebra function call:

    mxAlgebra(
expression=M+beta*def,
name="Mu"
)
)


The result of this algebra is named Mu, and this handle is referred to in the mxFIMLObjective function call. We can then run the model and examine the output with a few simple commands.

#### Model Fitting¶

# Run the model
defMeansFit <- mxRun(defMeansModel)
defMeansFit@matrices
defMeansFit@algebras


It is possible to compare the estimates from this model to some summary statistics computed from the data:

# Compare OpenMx estimates to summary statistics computed from raw data.
# Note that to calculate the common variance,
# group 1 has the 1 and 2 subtracted from every Xi and Yi in the sample
# data, so as to estimate variance of combined sample without the mean correction.

# First we compute some summary statistics from the data
ObsCovs<-cov(rbind(group1-rep(c(1,2),each=n),group2))
ObsMeansGroup1<-c(mean(group1[,1],mean(group1[,2]))
ObsMeansGroup2<-c(mean(group2[,1],mean(group2[,2]))

# Second we extract the parameter estimates and matrix algebra results from the model
Sigma<-run@matrices$Sigma@values Mu<-run@algebras$Mu@result
M<-run@matrices$M@values beta<-run@matrices$beta@values

# Third, we check to see if things are more or less equal
omxCheckCloseEnough(ObsCovs,Sigma,.01)
omxCheckCloseEnough(ObsMeansGroup1,as.vector(M+beta),.001)
omxCheckCloseEnough(ObsMeansGroup2,as.vector(Mu),.001)
`