mxGREMLDataHandler {OpenMx}    R Documentation

Helper Function for Structuring GREML Data

Description

This function takes a dataframe or matrix and uses it to set up the 'y' and 'X' matrices for a GREML analysis; this includes trimming out NAs from 'X' and 'y.' The resulting matrix has the 'y' vector as its first column, and its remaining columns constitute 'X.'

Usage

mxGREMLDataHandler(data, yvars=character(0), Xvars=list(), addOnes=TRUE, 
                  blockByPheno=TRUE, staggerZeroes=TRUE)


Arguments

data: Either a dataframe or matrix, with column names, containing the variables to be used as phenotypes and covariates in 'y' and 'X,' respectively.


yvars: Character vector. Each string names a column of the raw dataset, to be used as a phenotype.


Xvars: A list of character vectors of data column names, specifying the covariates to be used with each phenotype. The list should have the same length as argument yvars.


addOnes: Logical; should leading columns of ones (for the regression intercepts) be prepended to the covariates when assembling the 'X' matrix? Defaults to TRUE.


blockByPheno: Logical; relevant to polyphenotype analyses. If TRUE (default), then the resulting 'y' will contain phenotype #1 for individuals 1 through n, phenotype #2 for individuals 1 through n, ... If FALSE, then observations are "blocked by individual", and the resulting 'y' will contain individual #1's scores on phenotypes 1 through p, individual #2's scores on phenotypes 1 through p, ... Note that in either case, 'X' will be structured appropriately for 'y.'


staggerZeroes: Logical; relevant to polyphenotype analyses. If TRUE (default), then each phenotype's covariates in 'X' are "staggered," and 'X' is padded out with zeroes. If FALSE, then 'X' is formed simply by stacking the phenotypes' covariates; this requires each phenotype to have the same number of covariates (i.e., each character vector in Xvars must be of the same length). The default (TRUE) is intended for instances where the multiple phenotypes truly are different variables, whereas staggerZeroes=FALSE is intended for instances where the multiple "phenotypes" actually represent multiple observations on the same variable. One example of the latter case is longitudinal data where the multiple "phenotypes" are repeated measures on a single phenotype.
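
The layouts selected by blockByPheno and staggerZeroes can be pictured with a small base-R sketch. It does not call mxGREMLDataHandler() and is not its implementation; it only builds, by hand, the orderings described above for a toy dataset with n = 3 individuals and p = 2 phenotypes. All object names (dat, y_by_pheno, X_staggered, and so on) are illustrative assumptions.

```r
# Toy data: n = 3 individuals, p = 2 phenotypes, one covariate per phenotype.
# All object names here are illustrative, not part of OpenMx.
dat <- data.frame(y1 = c(1, 2, 3), y2 = c(10, 20, 30),
                  x1 = c(0.1, 0.2, 0.3), x2 = c(0.5, 0.6, 0.7))
n <- nrow(dat)

# blockByPheno = TRUE: phenotype 1 for individuals 1..n, then phenotype 2.
y_by_pheno <- c(dat$y1, dat$y2)

# blockByPheno = FALSE: individual 1's phenotypes 1..p, then individual 2's, ...
y_by_indiv <- as.vector(t(as.matrix(dat[, c("y1", "y2")])))

# Per-phenotype covariate blocks, each with a leading column of ones.
X1 <- cbind(1, dat$x1)
X2 <- cbind(1, dat$x2)
zeros <- matrix(0, n, 2)

# staggerZeroes = TRUE: each phenotype gets its own columns of 'X',
# padded with zeroes in the rows belonging to the other phenotype.
X_staggered <- rbind(cbind(X1, zeros), cbind(zeros, X2))

# staggerZeroes = FALSE: the covariate blocks are simply stacked.
X_stacked <- rbind(X1, X2)
```

In this sketch the staggered 'X' has four columns (a separate intercept and slope for each phenotype), while the stacked 'X' has two (a shared intercept and slope), which matches the distinction drawn above between truly different variables and repeated measures of one variable.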

Details

For a monophenotype analysis (only), argument Xvars can be a character vector. In a polyphenotype analysis, if the same covariates are to be used with all phenotypes, then Xvars can be a list of length 1.

Note the synergy between the output of mxGREMLDataHandler() and the arguments of mxExpectationGREML(), in particular casesToDropFromV.

If the dataframe or matrix supplied for argument data has n rows, and argument yvars is of length p, then the resulting 'y' and 'X' matrices will have np rows. Then, if either matrix contains any NA's, the rows containing the NA's are trimmed from both 'X' and 'y' before being returned in the output (in which case they will obviously have fewer than np rows). Function mxGREMLDataHandler() reports which rows of the full-size 'X' and 'y' were trimmed out due to missing observations. These row indices can be provided as argument casesToDropFromV to mxExpectationGREML().
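
The row bookkeeping described above can be sketched in base R. This is only an illustration of the idea (np stacked rows, with any row containing an NA dropped from 'y' and 'X' together), not the function's own code; the object names yX_full, cases_to_drop, and yX_trimmed are assumptions made for the example.

```r
# Toy stacked data for n = 4 individuals and p = 2 phenotypes (np = 8 rows),
# blocked by phenotype. Names are illustrative only.
y <- c(1.2, NA, 0.7, 2.1,   3.3, 3.9, 4.1, 2.8)
X <- cbind(1, c(0.5, 0.1, NA, 0.9,   0.5, 0.1, 0.4, 0.9))
yX_full <- cbind(y, X)

# Indices of rows of the full-size matrix that contain at least one NA.
keep <- complete.cases(yX_full)
cases_to_drop <- which(!keep)

# Trim those rows from 'y' and 'X' together.
yX_trimmed <- yX_full[keep, , drop = FALSE]
```

Here rows 2 and 3 are dropped, leaving 6 of the 8 rows; indices of this kind are what would be supplied as casesToDropFromV so that the corresponding rows and columns of the covariance matrix can be dropped as well.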

Value

A list with these two components:


Numeric matrix. The first column is the phenotype vector, 'y,' while the remaining columns constitute the 'X' matrix of covariates. If this matrix is used as the raw dataset for a model, then the model's GREML expectation can be constructed with mxExpectationGREML().


Numeric vector. Contains the indices of the rows of 'y' and 'X' that were dropped due to containing NA's. Can be provided as argument casesToDropFromV to mxExpectationGREML().

References

The OpenMx User's guide can be found at

See Also

For more information generally concerning GREML analyses, including a complete example, see mxExpectationGREML().

Examples

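library(OpenMx)
# Simulated data: one phenotype "y" and a constant covariate "x" (a column of ones)
dat <- cbind(rnorm(100), rep(1, 100))
colnames(dat) <- c("y", "x")
# Introduce one missing value in each column
dat[42, 1] <- NA
dat[57, 2] <- NA
# "x" already serves as the intercept column, so no extra column of ones is added
dat2 <- mxGREMLDataHandler(data=dat, yvars="y", Xvars=list("x"),
  addOnes = FALSE)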

[Package OpenMx version 2.7.9 Index]