Fri, 12/17/2010 - 19:35

We are implementing the joint continuous and ordinal piece for the FIML objective function. There was a long discussion at this week's developers meeting on how to match expected means/covariances and thresholds to the observed dataset. First, here is some background.

Currently, FIML applied to continuous data requires row and column names for the expected covariance matrix, and row names for the expected means vector. The row names of the covariance matrix, the column names of the covariance matrix, and the row names of the expected means vector must all be identical. These names must be a subset of the column names in the raw dataset. The order of the expected names does not need to match the order of the column names in the dataset. The 'dimnames' argument to the mxFIMLObjective() function is a shortcut for specifying the three vectors.
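The naming rules above can be sketched in base R. This is only an illustration and not OpenMx code: `checkExpectedNames` is a hypothetical helper, and it takes the name vectors directly rather than the matrices themselves.

```r
# Hypothetical helper (not part of OpenMx): applies the naming rules
# described above to plain character vectors of names.
checkExpectedNames <- function(covRowNames, covColNames, meansNames, dataColNames) {
  # All three sets of expected names must be identical.
  if (!identical(covRowNames, covColNames) || !identical(covRowNames, meansNames)) {
    stop("names of the expected covariance and expected means must be identical")
  }
  # The expected names must be a subset of the data column names.
  absent <- setdiff(covRowNames, dataColNames)
  if (length(absent) > 0) {
    stop("names not found in the data: ", paste(absent, collapse = ", "))
  }
  # Order may differ from the data, so return the data column positions to use.
  match(covRowNames, dataColNames)
}

dataCols <- c("id", "x", "y", "z")
modelNames <- c("z", "x")
positions <- checkExpectedNames(modelNames, modelNames, modelNames, dataCols)
```

Here the model uses `z` and `x` in a different order from the dataset, which is allowed; `positions` records where each expected name sits among the data columns.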

Currently, FIML applied to ordinal data has the same requirements as the continuous case. In addition, the thresholds matrix must have column names defined. The column names of the thresholds matrix are not assigned by the 'dimnames' argument of the mxFIMLObjective() function. The column names of the thresholds matrix do not need to match the 'dimnames' argument of the mxFIMLObjective() function.

There were four proposals discussed at the meeting today:

1) Keep the current system. The column names of the thresholds matrix must be specified explicitly.

2) Allow the 'dimnames' argument to apply to the expected covariance, expected means, and thresholds matrices.

2a) Require the thresholds matrix to contain dummy columns for the continuous data.

2b) Use a filtering procedure on the 'dimnames' argument to select the ordinal data when applying the names to the columns of the thresholds matrix.

3) Change the signature of the FIML objective function to `mxFIMLObjective(covariance, means, dimnames = NA, thresholds = NA, vector = FALSE, threshnames = dimnames)`
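For readers unfamiliar with how a default such as `threshnames = dimnames` behaves in R, here is a minimal sketch. `fimlArgsSketch` is a hypothetical stand-in that keeps only the name-related arguments of the proposed signature; it is not the OpenMx implementation.

```r
# Hypothetical stand-in for the proposed signature. R evaluates default
# arguments lazily inside the function, so 'threshnames' picks up whatever
# value 'dimnames' has unless the caller supplies it explicitly.
fimlArgsSketch <- function(dimnames = NA, thresholds = NA, threshnames = dimnames) {
  list(dimnames = dimnames, threshnames = threshnames)
}

# Default: threshnames follows dimnames.
inherited <- fimlArgsSketch(dimnames = c("x", "y"))$threshnames

# Explicit: joint model where only y and z are ordinal.
explicit <- fimlArgsSketch(dimnames = c("x", "y", "z"),
                           threshnames = c("y", "z"))$threshnames

# Opt out: threshnames = NA leaves the thresholds names to the user.
optedOut <- fimlArgsSketch(dimnames = c("x", "y"), threshnames = NA)$threshnames
```

The three calls correspond to the three ways a script could interact with the new argument: rely on the default, name the ordinal variables explicitly, or opt out with NA.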

Has anything been decided on this issue?

This is the only objection that has been raised against proposal (3):

If (a) you have an existing script that uses ordinal data with FIML, and (b) you have declared dimnames for the thresholds MxMatrix or MxAlgebra object, and (c) the order of the column names in your thresholds object is different from the 'dimnames' argument to the mxFIMLObjective() function, then you'll need to change your call to mxFIMLObjective() by adding the argument 'threshnames = NA'. Alternatively, you could set 'threshnames' to match the column names of the thresholds matrix. Otherwise you'll get an error saying that the 'threshnames' argument to the mxFIMLObjective() function is inconsistent with the names in your thresholds matrix. This is because the default value for 'threshnames' will be the value of 'dimnames' in mxFIMLObjective().
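A rough sketch of the consistency check described in this post, in base R. `resolveThreshNames` is a hypothetical helper, and two details are assumptions on my part: that the check reads the column names of the thresholds matrix (as in the background post), and that "inconsistent" means anything other than an exact, same-order match.

```r
# Hypothetical helper (not OpenMx code): decides which names the thresholds
# matrix ends up with, or stops if the two sources disagree.
resolveThreshNames <- function(thresholds, threshnames) {
  existing <- colnames(thresholds)
  # threshnames = NA: the user opted out, keep whatever is on the matrix.
  if (length(threshnames) == 1 && is.na(threshnames)) {
    return(existing)
  }
  # No names on the matrix: apply threshnames.
  if (is.null(existing)) {
    return(threshnames)
  }
  # Both present: they must agree exactly (assumed to include order).
  if (!identical(existing, threshnames)) {
    stop("'threshnames' is inconsistent with the names of the thresholds matrix")
  }
  threshnames
}

thr <- matrix(0, nrow = 2, ncol = 2, dimnames = list(NULL, c("y", "z")))
sameOrder <- resolveThreshNames(thr, c("y", "z"))   # case (a), (b), not (c)
optOut <- resolveThreshNames(thr, NA)               # the 'threshnames = NA' fix
swapped <- try(resolveThreshNames(thr, c("z", "y")), silent = TRUE)  # case (c)
```

Only the last call, where the names are declared in a different order, ends in an error; the other two leave an existing script working as before.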

I will grant that (a) ∧ (b) ∧ ¬ (c) are common. We have not seen a single instance of (a) ∧ (b) ∧ (c) in the wild. In all other cases, existing scripts will continue to work. That includes continuous data, ordinal data, and joint ordinal and continuous data. I don't see another proposal that continues to work in 99% of existing scripts and performs the consistency check that Ryne explained in his post.

2b. Having a filter extract all the ordinal columns from among the names specified in dimnames sounds easiest for the user (although often the row/col names are set prior to relying on mxFIMLObjective() anyway).

Option 3: adding threshnames = dimnames to the FIML objective function has the virtue of making it explicit what the user is doing. But good error messages in 2b would compensate.

2a. Having dummy columns in the thresholds matrix sounds awkward.

I'll relay one issue with 2b that came up repeatedly: by having the dimnames for the thresholds matrix be a filtered version of the cov/means matrix names, the dimnames of the thresholds matrix now depend on the data. We've previously required separate specification of model-implied variable names (dimnames) and data variable/column names, which provided error checking and independence between model and data.

Proposal 2b creates dependence between data and model, such that one could change the data (say, by changing x from factor to continuous and y from continuous to factor in a dataset, or just swap out data sets with similar structures) in such a way that the model itself changes. This also requires that users understand data flow in model trees, especially in cases with multiple datasets or dynamic data. In the basic case, we would go from a system that checks whether the dimnames of each matrix match the data to one where we check whether both model and data have the same number of ordinal variables.
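To make the dependence concrete, here is a small base R sketch of what a 2b-style filter might look like. `filterOrdinalNames` is hypothetical, and treating "ordinal" as "ordered factor" (via `is.ordered`) is an assumption made for the illustration.

```r
# Hypothetical 2b-style filter (not OpenMx code): keeps the names in
# 'dimnames' whose data columns are ordered factors.
filterOrdinalNames <- function(dimnames, data) {
  isOrdinal <- vapply(data[dimnames], is.ordered, logical(1))
  dimnames[isOrdinal]
}

lev <- c("lo", "hi")
# Dataset A: x continuous, y ordinal.
dataA <- data.frame(x = c(1.2, 0.4, 2.1),
                    y = factor(c("lo", "hi", "lo"), levels = lev, ordered = TRUE))
# Dataset B: same column names, but the types are swapped.
dataB <- data.frame(x = factor(c("hi", "lo", "lo"), levels = lev, ordered = TRUE),
                    y = c(0.3, 1.7, 0.9))

namesA <- filterOrdinalNames(c("x", "y"), dataA)
namesB <- filterOrdinalNames(c("x", "y"), dataB)
```

The model specification (`c("x", "y")`) is identical in both calls, yet the names that would land on the thresholds matrix differ, and both datasets would pass a check that only counts ordinal variables.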

A correction to the background above: it's the column names of the means matrix (a 1 x k matrix representing the means vector), not the row names.

I'll also add the TL;DR version: we are close to analyzing both continuous and ordinal data simultaneously under FIML. Please review the 3-4 proposals above and let us know how you would like to specify threshold matrices and their dimnames for joint continuous-ordinal models.