This document was authored by Michael D. Hunter, M.A.
This document is a draft. Please excuse its disorderly state as I continue writing. -MDH
This example will show how full information maximum likelihood can be implemented using a row-by-row evaluation of a likelihood function. This example is in two parts. The first part is a discussion of full information maximum likelihood. The second part is an example implementation of full information maximum likelihood in a row-wise objective function that estimates the saturated model in two variables. The second part refers to the following file:
There is a parallel version of this example that uses the standard full information maximum likelihood implementation here:
The goal of the current document is twofold: to increase users’ understanding of full information maximum likelihood, and to assist users in implementing their own row objective functions.
Full information maximum likelihood is almost universally abbreviated FIML, and it is often pronounced like “fimmle” if “fimmle” was an English word. FIML is often the ideal tool to use when your data contains missing values because FIML uses the raw data as input and hence can use all the available information in the data. This is opposed to other methods which use the observed covariance matrix which necessarily contains less information than the raw data. An observed covariance matrix contains less information than the raw data because one data set will always produce the same observed covariance matrix, but one covariance matrix could be generated by many different raw data sets. Mathematically, the mapping from a data set to a covariance matrix is not one-to-one (i.e. the function is non-injective), but rather many-to-one.
Although there is a loss of information between a raw data set and an observed covariance matrix, in structural equation modeling we are often only modeling the observed covariance matrix and the observed means. We want to adjust the model parameters to make the observed covariance and means matrices as close as possible to the model-implied covariance and means matrices. Therefore, we are usually not concerned with the loss of information from raw data to observed covariance matrix. However, when some raw data is missing, the standard maximum likelihood method for determining how close the observed covariance and means matrices are to the model-expected covariance and means matrices fails to use all of the information available in the raw data. This failure of maximum likelihood (ML) estimation, as opposed to FIML, is due to ML exploiting for the sake of computational efficiency some mathematical properties of matrices that do not hold true in the presence of missing data. The ML estimates are not wrong per se and will converge to the FIML estimates, rather the ML estimates do not use all the information available in the raw data to fit the model.
The intelligent handling of missing data is a primary reason to use FIML over other estimation techniques. The method by which FIML handles missing data involves filtering out missing values when they are present, and using only the data that are not missing in a given row.
If X is the entire data set then the minus two log likelihood of row i of the data is
Equation (1) is identically equal to times the logarithm of the probability density function of the multivariate normal distribution. The minus two log likelihood is quite literally minus two times the log of the probability of the data given the model.
There are several important things to note about Equation (1).
First, the model-implied means vector and the model-implied covariance matrix are for the the manifest variables only. Although your structural equation model may involve both latent and manifest variables, the latent variables are only present to explain the means and covariances of the observed variables in a meaningful and parsimonious way. The free parameters of your model are adjusted to make the model-implied means vector and the model-implied covariance matrix as close as possible to the observed means vector and the observed covariance matrix.
Second, there are several references to filtered vectors and matrices. Filtering is how FIML handles missing data. All filtering is performed based on the pattern of missingness observed in a given row of data. If the observed values of a given row, , of data are where denotes a missing value, then the filtered data row is . The filtered data row is merely the original data row with the missing entries removed. For this row, entries 3, 5, and 6 are removed. Alternatively, entries 1, 2, and 4 are kept.
The filtered, model-implied means row vector is similar. If the original model-implied means row vector is , then the filtered model-implied means row vector for row is , keeping only entries 1, 2, and 4.
The filtered, model-implied covariance matrix is marginally more complicated. It must be selected on both rows and columns. If is the model-implied covariance matrix and is given by
then the filtered covariance matrix selects rows 1, 2, and 4
and columns 1, 2, and 4.
The selection on both rows and columns yields the following filtered expected covariance matrix.
In practical implementations of FIML, the data are first sorted based on their pattern of missingness, so that all the rows missing on variables 3, 5, and 6 are computed together followed by all the rows with a different missingness pattern. This sorting allows fewer filterings to be performed and often accelerates the likelihood computation. In the row objective implementation shown below there is no data sorting because it is for demonstration purposes only. The implementation of FIML in the backend of OpenMx uses this data sorting and other techniques to provide speed ups. The details are in the source code at http://openmx.psyc.virginia.edu/svn/trunk/R/MxFIMLObjective.R and http://openmx.psyc.virginia.edu/svn/trunk/src/omxFIMLObjective.c .
There is one final note to discuss about Equation (1). A very important component to Equation (1) is . It is a quadratic form. Any expression of the form where is a row-vector and is a matrix is called a quadratic form. Equivalently, a quadratic form can be stated as where is a column-vector. In mathematical circles it is typical to express quadratic forms in terms of columns vectors as , whereas in statistical circles it is common to express quadratic forms in terms of row vectors as . The difference is completely arbitrary and due to tradition and convenience. Quadratic forms arise in many disciplines: in engineering as linear quadratic regulators, in physics as potential and kinetic energy, and in economics as utility functions. Quadratic form appear in many optimization problems, so it is no surprise that they appear in the FIML equation.
A quadratic form can also be thought of as a quadratic product of and , so that .
The particular quadratic form in Equation (1) has a special meaning and interpretation. It is the squared Mahalanobis distance from the data row to the mean vector in the multivariate space defined by the covariance matrix . Intuitively, the Mahalanobis distance from the mean vector tells you how far an observation is from the center of the distribution, taking into account the spread of the distribution in all directions.
For well-behaved covariance matrices, the value of the quadratic form in equation (1) (i.e. the squared Mahalanobis distance) is always greater than or equal to zero, and equal to zero only when the observation row vector is exactly equal to the mean vector. The likelihood functions in maximum likelihood (ML) and in FIML are not defined when this is not the case. In general for any row-vector and square, symmetric matrix , if for any , then the quadratic form and the matrix are called positive definite.
Because ML and FIML are not defined when the model-implied covariance matrix is not positive definite, frequent and often cryptic error message that users of any structural equation modeling program receive is something like ERROR: EXPECTED COVARIANCE MATRIX IS NOT POSITIVE DEFINITE. A number of different problems could induce this error. The model may be unidentified; a variable may have zero variance, i.e. be a constant; one variable might be a linear combination of another variable or equal to another variable; the starting values might imply an impossible covariance matrix; a variable may have zero or negative error (i.e. residual) variance. In any case, it is a good idea to check your model specification for theoretical and typographical errors, and if you are expecting a parameter like an error variance to be greater than zero then set zero as that parameter’s lower bound.
Now that the FIML equation for a single row of data has been discussed, it is relevant to see how the full information maximum likelihood of the entire data set is computed.
The minus two log likelihood of the entire data set is the sum of the minus two log likelihoods of the rows.
where there are rows in the data.
We will now implement FIML using a row-wise objective function. The mxRowObjective() function evaluates an mxAlgebra for each row of a data set. It then stores the results of this row-wise evaluation in an mxAlgebra which is by default called “rowResults”. Finally, the row results must be collapsed into a single number. Another mxAlgebra called the “reduceAlgebra” takes the row results and reduces them to a single number which is then minimized.
For this example we will simulate our own data. We will use the mvrnorm() function which lives is the MASS package. The mvrnorm() function generates a multivariate random normal sample with a given vector of means and a given covariance matrix. The following code generates the data.
require(MASS) set.seed(200) rs <- .5 xy <- mvrnorm (1000, c(0,0), matrix(c(1, rs, rs, 1), nrow=2, ncol=2))
The data have 2 variables with 1000 rows. The true means are 0. Each variable has a true variance of 1.0, and a covariance of 0.5.
Some further data processing will prove helpful. First, we recast the generated data as a data.frame object in R. Second, we tell R that what we want the variables names to be. Finally, we look at a summary of the data set and the observed covariance matrix which differs slightly from the covariance matrix used to generate the data.
testData <- as.data.frame(xy) testVars <- c('X','Y') names(testData) <- testVars summary(testData) cov(testData)
Now the data has been generated and we can specify the saturated model.
We generate an mxModel, give it data, and two mxMatrix objects. The first mxMatrix is a row-vector or completely free parameters and is the model-implied means vector. Because we are specifying the saturated model, the means are freely estimated. The second mxMatrix gives the model-implied covariance matrix. Because we are specifying the saturated model, the covariance matrix is freely estimated, however it is still constrained to by symmetric and the starting values are picked so that the variances on the diagonal are in general larger than the covariances.
bivCorModelSpec <- mxModel( name="FIML Saturated Bivariate", mxData( observed=testData, type="raw", ), mxMatrix( type="Full", nrow=1, ncol=2, free=TRUE, values=c(0,0), name="expMean" ), mxMatrix( type="Symm", nrow=2, ncol=2, values=c(.21, .2, .2, .21), free=TRUE, name='expCov' ) )
We create a new mxModel that has everything from the previous model. We then create mxAlgebra objects that filter the expected means vector and the expected covariance matrix. We also create an mxAlgebra that keeps track of the number of variables that are not missing in a given row.
bivCorFiltering <- mxModel( model=bivCorModelSpec, mxAlgebra( expression=omxSelectRowsAndCols(expCov, existenceVector), name="filteredExpCov", ), mxAlgebra( expression=omxSelectCols(expMean, existenceVector), name="filteredExpMean", ), mxAlgebra( expression=sum(existenceVector), name="numVar_i") )
We create a new mxModel that has everything from the previous models.
bivCorCalc <- mxModel( model=bivCorFiltering, mxAlgebra( expression = log(2*pi), name = "log2pi" ), mxAlgebra( expression=log2pi %*% numVar_i + log(det(filteredExpCov)), name ="firstHalfCalc", ), mxAlgebra( expression=(filteredDataRow - filteredExpMean) %&% solve(filteredExpCov), name = "secondHalfCalc", ) )
We create a new mxModel that has everything from the previous models.
bivCorRowObj <- mxModel( model=bivCorCalc, mxAlgebra( expression=(firstHalfCalc + secondHalfCalc), name="rowAlgebra", ), mxAlgebra( expression=sum(rowResults), name = "reduceAlgebra", ), mxRowObjective( rowAlgebra='rowAlgebra', reduceAlgebra='reduceAlgebra', dimnames=c('X','Y'), ) ) bivCorTotal <- bivCorRowObj
bivCorFit <- mxRun(bivCorTotal)