Sat, 08/15/2009 - 16:59

In the interest of moving forward, I have put together a proposal for representing ordinal data in OpenMx. The scope of this proposal shall be limited to ordinal data; further discussion on how to represent other data types in OpenMx shall be released in another forthcoming proposal. Corrections or comments to this proposal are encouraged. Alternative proposals are also encouraged, and should start a new forum topic [Ordinal Data Proposal (#n)].

1. Use R's existing data structures for ordered and unordered factors. Unordered factors are created using factor(). Ordered factors are created using ordered() or factor(ordered = TRUE). The R functions that read data into a data.frame (read.table() and friends) can be configured to control how factors are created.

2. Add an optional argument "threshold" to the mxData function. "threshold" will accept a list, each element of which must be a vector. The name of each list element must match the name of one of the columns of the "observed" argument to mxData, creating a correspondence between the "observed" and "threshold" arguments. Each list element of "threshold" must correspond to either a factor or an ordered factor, and its length must be one less than the number of levels of that factor.
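A minimal base-R sketch of the data layout in points 1 and 2. The "threshold" argument to mxData is only proposed here, so that call appears as a comment; the column names, threshold values, and the helper `checkThresholds()` are hypothetical, written only to spell out the name-matching and length rules.

```r
# Point 1: ordinal columns are ordered factors (base R only)
observed <- data.frame(
  scale1 = ordered(c("low", "mid", "high", "mid"),
                   levels = c("low", "mid", "high")),
  scale2 = factor(c(1, 3, 2, 4), levels = 1:4, ordered = TRUE),
  age    = c(34, 51, 27, 45)
)

# Point 2: one threshold vector per factor column, named after the column.
# A factor with k levels needs k - 1 thresholds.
threshold <- list(
  scale1 = c(-0.5, 0.5),
  scale2 = c(-1, 0, 1)
)

# Hypothetical helper enforcing the proposed rules
checkThresholds <- function(observed, threshold) {
  for (col in names(threshold)) {
    if (!col %in% names(observed)) {
      stop("threshold element '", col, "' matches no column of 'observed'")
    }
    x <- observed[[col]]
    if (!is.factor(x)) {
      stop("column '", col, "' is not a factor")
    }
    if (length(threshold[[col]]) != nlevels(x) - 1) {
      stop("column '", col, "' has ", nlevels(x), " levels and needs ",
           nlevels(x) - 1, " thresholds")
    }
  }
  invisible(TRUE)
}

checkThresholds(observed, threshold)
# Proposed (not existing) call:
# mxData(observed = observed, type = "raw", threshold = threshold)
```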

Here are my thoughts on it. Take 'em as you will.

On point 1: As I understand it, thresholds will require ordinal data (that is, an ordered factor), not just a factor. We'll need to decide whether to assume order or kick back an error in the case of an unordered factor. Maybe we assume it with a warning?
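One way the "assume it with a warning" option could behave, sketched in base R. `asOrdinal()` is a hypothetical helper: it passes ordered factors through, coerces an unordered factor using its existing level order, and warns, since that order is often just alphabetical and may not be the substantive one.

```r
# Hypothetical helper: accept ordered factors as is; coerce unordered
# factors to ordered (keeping the current level order) with a warning.
asOrdinal <- function(x, name = "column") {
  if (is.ordered(x)) {
    return(x)
  }
  if (is.factor(x)) {
    warning("'", name, "' is an unordered factor; assuming its levels are ",
            "ordered as: ", paste(levels(x), collapse = " < "))
    return(factor(x, levels = levels(x), ordered = TRUE))
  }
  stop("'", name, "' is not a factor")
}

f <- factor(c("b", "a", "c"))  # levels sort alphabetically: a, b, c
o <- asOrdinal(f, "item1")     # emits a warning naming the assumed order
```

Replacing the warning with stop() would give the stricter "kick back an error" behaviour instead.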

On point 2: What goes in the vector?

I like the prospect of a list of elements, one for each appropriate data column. But what value goes into that list?

I'm guessing that people will want to be able to either free or fix the thresholds, and probably constrain them to equal other free parameters (for example, across groups). So we'll probably need starting values, free/fixed flags, and labels, as well as a place for the estimates in the output.

This sounds a whole lot like an MxMatrix object, so we might just want to make each list element an MxMatrix instead. That requires a lot of typing (one MxMatrix() statement per ordinal data column), but it fits the framework we already have.
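A base-R sketch of what each per-column specification would have to carry. `thresholdSpec()` and the list layout are hypothetical stand-ins for the values, free, and labels slots of an MxMatrix, not OpenMx objects; the shared-label comment assumes the usual OpenMx convention that equal labels denote the same parameter.

```r
# Hypothetical constructor: one threshold specification per ordinal column,
# holding the same kind of information as the values / free / labels slots
# of an MxMatrix (a plain R list, not an OpenMx object).
thresholdSpec <- function(values, free = TRUE, labels = NA_character_) {
  n <- length(values)
  list(values = values,
       free   = rep_len(free, n),
       labels = rep_len(as.character(labels), n))
}

# scale2 has 4 levels, so it has 3 thresholds. The first is fixed; the
# other two are free and labelled, so another group using the labels
# "t2" and "t3" could be constrained to the same estimates.
thresholds <- list(
  scale2 = thresholdSpec(values = c(-1, 0, 1),
                         free   = c(FALSE, TRUE, TRUE),
                         labels = c(NA, "t2", "t3"))
)

# Per-column syntax: set threshold 3 of scale2 to .1
thresholds$scale2$values[3] <- .1
```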

Alternatively, the "OldMx" way to do it would be to make the thresholds element a single large MxMatrix with appropriate dimnames. This has the benefit of (for example) letting people free all the thresholds at once, but seems conceptually less clean. Compare the syntax for setting threshold 3 of data column scale2 to .1:

`thresholds$values['scale2', 3] <- .1`

versus

`thresholds$scale2$values[3] <- .1`

In some cases, people will want the variance of the underlying distribution to be estimated as well. The model isn't identified if you free all the thresholds and the variance, but people will want to fix the values of some thresholds and free the variance of the distribution instead. I don't know if we'll need this for the first version of ordinal support, but it's something to keep in mind.
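A small base-R check of the identification point, assuming a single ordinal variable produced by cutting a normal latent variable with mean `mu` and standard deviation `sigma` at thresholds `tau`. Category probabilities depend only on `(tau - mu) / sigma`, so shifting and rescaling all three together changes nothing; fixing two thresholds pins down the scale so that the mean and variance can be recovered instead. The helper name and numbers are illustrative only.

```r
# Probability of each category when a latent N(mu, sigma^2) variable is cut
# at the ordered thresholds tau (k - 1 thresholds give k categories).
categoryProbs <- function(tau, mu = 0, sigma = 1) {
  diff(pnorm(c(-Inf, tau, Inf), mean = mu, sd = sigma))
}

tau <- c(-1, 0, 1.5)
p1 <- categoryProbs(tau, mu = 0, sigma = 1)

# Shift by b and scale by a: thresholds, mean and sd all transform together
a <- 2; b <- 3
p2 <- categoryProbs(a * tau + b, mu = a * 0 + b, sigma = a * 1)
all.equal(p1, p2)  # TRUE: the data cannot tell these settings apart

# Fix the first two thresholds at 0 and 1. Since tau_k = mu + sigma * z_k,
# mu and sigma follow from the standard-normal cut points z.
z <- qnorm(cumsum(p1)[1:2])
fixedTau <- c(0, 1)
sigmaHat <- (fixedTau[2] - fixedTau[1]) / (z[2] - z[1])
muHat <- fixedTau[1] - sigmaHat * z[1]
```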

There's also the question of whether this should go in the mxData object or in the model. Theoretically, I'd say the model is a better place, since thresholds make model assumptions about the data. The catch is that we then need a separate mechanism to have them pass down to submodels alongside the data.

Case 1: I am not sure whether the intent of proposal #1 is to provide a means of supplying user-computed sample thresholds (along with estimated polychoric correlations), analogous to supplying a sample covariance matrix and means?

Case 2: On the other hand, if it was intended as a way of defining threshold structure, then, as per tbrick's comments, the specification should be part of the model and thresholds should be an MxMatrix.

In most instances, I would think the user will want OpenMx to compute thresholds from raw data and then estimate a model (case 2). However, case 1 could be useful under some circumstances, so allowing that option would be worthwhile.
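For case 1, or as starting values in case 2, one common way to get sample thresholds for a single ordinal variable is to take standard-normal quantiles of its cumulative category proportions. A base-R sketch; `sampleThresholds()` and the data are hypothetical, and the polychoric correlations that case 1 also needs would require a separate bivariate step not shown here.

```r
# Sample thresholds for one ordinal variable: standard-normal quantiles of
# the cumulative category proportions. The last cumulative proportion is 1
# and is dropped, leaving k - 1 thresholds for k categories.
sampleThresholds <- function(x) {
  x <- as.ordered(x)
  counts <- table(x)              # counts in level order
  cumProp <- cumsum(counts) / sum(counts)
  qnorm(cumProp[-length(cumProp)])
}

x <- ordered(c(rep("low", 20), rep("mid", 50), rep("high", 30)),
             levels = c("low", "mid", "high"))
sampleThresholds(x)  # about -0.84 and 0.52
```

Empty categories make this break down (an empty first category gives -Inf, an empty middle one repeats a threshold), which any real implementation would have to handle.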

Also, for case 1 we would need an option to supply an asymptotic covariance matrix as well, something we do not have for the continuous case. This may occasionally be useful.

Proposal 1: Estimation-related issues

The ordinal threshold proposal also bears on the choice of estimation method.

Original Mx only implemented FIML for ordinal outcomes (far ahead of everybody else, I should add!).

CASE I (user supplies thresholds and polychoric correlations) would lead to limited-information estimation, i.e. some variant of weighted least squares. This may actually be a very useful alternative to full-information estimation.
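For reference, a standard form of the weighted least squares fit function for CASE I. The symbols are introduced here for illustration: $s$ stacks the sample thresholds and polychoric correlations, $\sigma(\theta)$ their model-implied values under parameters $\theta$, and $W$ the asymptotic covariance matrix of $s$, which is where an asymptotic covariance matrix enters. Diagonally weighted variants replace $W$ by its diagonal.

```latex
F_{\mathrm{WLS}}(\theta) =
  \big(s - \sigma(\theta)\big)^{\top} \, W^{-1} \, \big(s - \sigma(\theta)\big)
```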

CASE II (raw data): The user may request either full- or limited-information estimation. My understanding is that the immediate task is to replicate Mx I functionality, in which case only CASE II is relevant from an estimation perspective.
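A univariate base-R sketch of the full-information idea for raw ordinal data: each case contributes the log probability of its observed category under a normal latent variable cut at the thresholds. This is a simplification; the multivariate case replaces the `pnorm()` differences with rectangle probabilities of a multivariate normal. `ordinalLogLik()`, `toTau()`, and the data are hypothetical.

```r
# Univariate ordinal log-likelihood: each observation contributes
# log P(its category) for a latent N(mu, sigma^2) cut at thresholds tau.
ordinalLogLik <- function(tau, x, mu = 0, sigma = 1) {
  x <- as.ordered(x)
  stopifnot(length(tau) == nlevels(x) - 1)
  bounds <- c(-Inf, tau, Inf)
  k <- as.integer(x)                  # category index of each case
  probs <- pnorm(bounds[k + 1], mu, sigma) - pnorm(bounds[k], mu, sigma)
  sum(log(probs))
}

x <- ordered(c(rep("low", 20), rep("mid", 50), rep("high", 30)),
             levels = c("low", "mid", "high"))

# Two thresholds here; keep them increasing: tau = (p1, p1 + exp(p2)).
toTau <- function(p) c(p[1], p[1] + exp(p[2]))
fit <- optim(c(-1, log(2)), function(p) -ordinalLogLik(toTau(p), x))
toTau(fit$par)
```

With mean and variance fixed at 0 and 1, the maximum-likelihood thresholds for a single variable coincide with `qnorm()` of the cumulative proportions (here about -0.84 and 0.52), so the two routes agree in this simple case.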

However, CASE I is still useful (a) to ensure correspondence between the continuous and discrete case specifications, and (b) to keep open the option of implementing LIML estimators.