Ordinal Data Proposal (#2)

tbrick (joined 07/31/2009)

I've been talking to Ryne about this, and I think we've put together a fairly reasonable proposal for the user experience for ordinal data.

We welcome comments and suggestions.

A) From the pathy perspective:
Models of type=RAM have an additional function mxThreshold(), similar to mxPath(). mxThreshold() takes a list of data column names, followed by lists of vectors of starting values, labels, and free/fixed specifiers. Each of these lists has the same length as the list of data columns, and consists of vectors of numerics, strings, and logicals, respectively. These should probably be recycled if you don't specify enough of them, so that an IRT model with 20 variables rated 1-6 (five thresholds each) can be entered as:

 irtModel <- mxModel(irtModel, mxThreshold(names(data), value=c(-2,-1,0,1,2), labels=NA, free=TRUE))

to have free independent thresholds for each of the variables in the data set. Names will have to line up with the names of data columns, or an error will be thrown, either now or at mxRun(). Any data elements that are not already R ordered factors will be coerced with ordered(). After all coercion happens, sanity checks will make sure the number of levels is consistent with the number of thresholds, etc.
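
To make the recycling, coercion, and sanity-check steps concrete, here is a rough base-R sketch. checkThresholds() and its arguments are made up for illustration and are not part of OpenMx; it assumes a column with k thresholds must have k + 1 levels.

```r
# Hypothetical helper (not an OpenMx function): coerce the named columns
# to ordered factors and compare their levels with the thresholds given.
checkThresholds <- function(data, columns, values) {
  missingCols <- setdiff(columns, names(data))
  if (length(missingCols) > 0) {
    stop("not found in data: ", paste(missingCols, collapse = ", "))
  }
  # Recycle the list of threshold vectors so there is one per column.
  values <- rep(values, length.out = length(columns))
  for (i in seq_along(columns)) {
    col <- columns[i]
    if (!is.ordered(data[[col]])) {
      data[[col]] <- ordered(data[[col]])
    }
    nThresh <- length(values[[i]])
    # Assumption: k thresholds split a variable into k + 1 categories.
    if (nlevels(data[[col]]) != nThresh + 1) {
      stop(col, " has ", nlevels(data[[col]]), " levels but ",
           nThresh, " thresholds")
    }
  }
  data
}

items <- data.frame(x1 = c(1, 2, 3, 2), x2 = c(3, 1, 2, 2))
# One threshold vector, recycled across both columns.
checked <- checkThresholds(items, names(items), list(c(-1, 1)))
```

Whether the check runs when the thresholds are specified or only at mxRun() is left open here, as in the proposal.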

B) From the matrix perspective:
Any objective function capable of dealing with ordinal data (right now, RAM and FIML) includes a slot "thresholds". This slot is filled with an mxMatrix in the same way that mxRAMObjective's A slot or mxFIMLObjective's cov slot is. (In implementation, I expect this will be similar to those implementations--the model will contain the mxMatrix object, and the objective function's slot will just hold the name of that matrix.)

The slot "thresholds" contains a rectangular matrix with n rows and m columns, where m is the number of ordinal data columns and n is the maximum number of thresholds needed for any of those columns. The elements of the threshold matrix are the values of the thresholds, specified like any other matrix. Elements where no threshold exists have their values set to NA [for example, if data column 1 needs only two thresholds and data column 2 requires 3, the values matrix might be cbind(c(-1, 1, NA), c(-1, 0, 1))]. The column dimnames() of this matrix must correspond to columns in the data.
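
As a sketch of how such a padded matrix could be built from per-column threshold vectors in plain R (the column names x1 and x2 and the variable names are placeholders, and nothing here is OpenMx API):

```r
# Ragged thresholds: x1 needs two, x2 needs three.
thresholdList <- list(x1 = c(-1, 1), x2 = c(-1, 0, 1))

# Pad every column with NA up to the longest threshold vector.
maxThresh <- max(lengths(thresholdList))
thresholdValues <- sapply(thresholdList, function(v) {
  c(v, rep(NA, maxThresh - length(v)))
})

# Column names come from the list names and should match data columns.
colnames(thresholdValues)

# Number of thresholds actually present for each ordinal column.
nThresholds <- colSums(!is.na(thresholdValues))
```

The result has n = 3 rows and m = 2 columns, with NA in the third row of x1.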

Any data element whose name matches one of the dimnames of the threshold matrix is considered to be an ordinal variable. If it is not already an R ordered factor, it is coerced into one using ordered(). After all coercion happens, sanity checks will make sure the number of levels is consistent with the number of thresholds, etc.

C) Overall Notes:
1. This way of doing things allows anyone in a "standard case", where all levels of their factors are used, to avoid mucking around with R's factors, but allows them to do so if they want to add levels that aren't used or do something else out of the ordinary. Users are protected from some accidents of automatic conversion because their number of thresholds is checked against the number of factor levels, and an error is thrown if nothing is specified. We'll need to make note of the defaults of ordered(), so that we can tell our users how to specify their data columns, but it seems to work quite reasonably if given integers or numerics.

2. It also keeps the thresholds with the model, but in a way that they can be easily transported from one model to another, or simply referenced from a sub- or supermodel.
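
For reference on note 1, this is how ordered() behaves by default in base R. The data values are invented for illustration.

```r
ratings <- c(3, 1, 2, 3, 1)

# By default the levels are the sorted unique values: 1 < 2 < 3.
standard <- ordered(ratings)
levels(standard)

# A user who wants an unused level has to name the levels explicitly.
full <- ordered(ratings, levels = 1:4)
nlevels(full)

# Character data are sorted alphabetically, which is rarely the intended
# ordering, so such columns need explicit levels.
words <- ordered(c("low", "high", "medium"))
levels(words)
fixed <- ordered(c("low", "high", "medium"),
                 levels = c("low", "medium", "high"))
```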

Steve (joined 07/30/2009)

Comments on A)

"Names will have to line up with the names of data columns, or an error will be thrown, either now or at mxRun()." When you say names will have to line up, do you mean that names in the vector given to mxThreshold() will need to exist in the variable names in mxData()? Or do you mean that all names in mxData() must be appear and be in the same order in mxThreshold()? I would vote for "names exist in mxData()".

Otherwise, A) seems quite reasonable to me. We might want to add one or more arguments (poly=, link=) later to allow for LIML. I guess the pathic to mathic machinery would end up placing these in a modified RAM objective function.

Comments on B)

If we coerce to ordered(), we should throw a warning.
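
Something along these lines, as a minimal sketch (coerceOrdinal() is a hypothetical name, not an existing function):

```r
# Hypothetical helper: coerce to an ordered factor, warning when we do.
coerceOrdinal <- function(x, name) {
  if (!is.ordered(x)) {
    warning("column '", name, "' was coerced with ordered()")
    x <- ordered(x)
  }
  x
}

# Already-ordered input passes through silently.
y <- coerceOrdinal(ordered(c(1, 2, 2)), "y")
```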

C) General comments.

In the pathic case, are ordered factors coerced in the same way as in the mathic case?

What happens at the back end if I just use an ordered factor as a predictor?

Paras (joined 07/31/2009)

This sounds good.
As Mike Neale pointed out earlier, in the multiple groups case there will need to be proper error messages. Problems can occur if group 1 has data in categories {1 & 2} and group 2 has data in categories {2 & 3}. In this case, either the user will need to use explicit threshold specification in each group, or we could have default alignment of thresholds across groups and set appropriate thresholds as missing in each group.
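
A rough base-R sketch of the default-alignment option, using the {1 & 2} versus {2 & 3} example. The data, the shared starting values, and usableThresholds() are invented for illustration; the assumption is that a threshold can only be estimated in a group that has observations on both sides of it.

```r
group1 <- c(1, 2, 2, 1)
group2 <- c(2, 3, 3, 2)

# Give both groups the same set of levels so thresholds line up.
allLevels <- sort(union(group1, group2))
g1 <- ordered(group1, levels = allLevels)
g2 <- ordered(group2, levels = allLevels)

# Hypothetical helper: threshold k separates level k from level k + 1 and
# is treated as usable only if the group has data on both sides of it.
usableThresholds <- function(x) {
  counts <- as.vector(table(x))      # zero counts kept for unused levels
  below <- cumsum(counts)[-length(counts)]
  above <- sum(counts) - below
  below > 0 & above > 0
}

# Shared starting values, set to missing where a group cannot use them.
shared <- c(-0.5, 0.5)
t1 <- ifelse(usableThresholds(g1), shared, NA)
t2 <- ifelse(usableThresholds(g2), shared, NA)
```

Here group 1 keeps only the first threshold and group 2 keeps only the second.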