## Ordinal variables with many categories

Joined: 01/14/2010

If an ordinal variable has 20 categories, is it practical to try to handle it with thresholds? Or would it be better to treat it as continuous?

Joined: 07/31/2009
If you have enough data

It is practical, but the difference between a threshold analysis and a continuous one would be slight if the frequencies across the categories are approximately normally distributed. A highly skewed 20-category variable may still be better analyzed as ordinal, depending on how it got that way. Of course, ordinal data with many categories mean many more parameters and a computationally slower analysis.
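A small simulation can illustrate why the two approaches differ little when the categories are roughly normal. This is a sketch under invented assumptions: a standard normal liability cut at 19 equally spaced points (from -2.5 to 2.5) into 20 categories. The thresholds implied by the category proportions recover the cut points and the integer scores correlate almost perfectly with the liability, while the ordinal treatment of this one variable needs 19 threshold parameters instead of a mean and a variance.

```r
# Sketch: a normal liability coarsened into 20 ordered categories.
# The cut points, seed and sample size are arbitrary illustrative choices.
set.seed(2024)
n <- 1e5
liability <- rnorm(n)
cuts <- seq(-2.5, 2.5, length.out = 19)   # 19 cut points -> 20 categories
score <- findInterval(liability, cuts)    # integer scores 0..19

# Thresholds implied by the observed proportions: qnorm of the cumulative
# proportion below each cut point (the final cumulative value, 1, is dropped).
props <- tabulate(score + 1, nbins = 20) / n
est_cuts <- qnorm(cumsum(props)[-20])

length(est_cuts)              # one threshold per cut point
max(abs(est_cuts - cuts))     # small: the thresholds recover the cut points
cor(score, liability)         # close to 1: integer scores lose little
```

With a skewed or censored distribution the integer scores no longer track the liability this closely, which is where keeping the data ordinal pays off.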

Joined: 07/31/2009
lots of levels, but values below a cutoff set to 0

What if the data have many levels but represent only the right-hand side of a normal distribution, with all values below roughly z = 0 set to 0? For example, clinical symptom data, where severity is continuous above the threshold for having any symptoms:

```r
> library(OpenMx)  # provides mxFactor()
> table(mxFactor(ocd$OCI_OBSESSION, levels = c(0:20)))

   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
2813  746  462  295  197  107   84   49   48   37   21   13   14    4    9    7    7    1    2    0    2
```

In that case one needs to stay ordinal, to avoid the model treating everyone with a score of 0 as having a risk similar to that of people scoring 1.
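Fitting a univariate threshold model to the counts in the table above makes this concrete. With the mean fixed at 0 and the variance at 1, each estimated threshold is simply the normal quantile of the cumulative proportion below it. The base R sketch uses only the printed counts for levels 0 to 20. It shows that the whole 0 category maps onto the liability region below the first threshold, and that the empty level 19 shrinks to a zero-width interval, one reason empty categories cause trouble.

```r
# Counts for OCI_OBSESSION levels 0..20, copied from the table above.
oci_counts <- c(2813, 746, 462, 295, 197, 107, 84, 49, 48, 37, 21,
                13, 14, 4, 9, 7, 7, 1, 2, 0, 2)
names(oci_counts) <- 0:20

# With a standard normal liability, the threshold between level k and
# level k + 1 is qnorm(proportion scoring k or less): 20 thresholds in all.
cum_prop <- cumsum(oci_counts) / sum(oci_counts)
thresholds <- qnorm(cum_prop[-length(cum_prop)])

round(thresholds, 2)
# Everyone scoring 0 sits below the first threshold (roughly z = 0.18 here).
# Level 19 is empty, so the thresholds on either side of it coincide.
thresholds[["18"]] == thresholds[["19"]]
```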

Given that the analysis is prohibitive (it can take hundreds of hours, even running multi-core) for a large multivariate model in which variables have this many levels (some with empty categories), is there a logical basis for cutting the data into fewer quantiles? How would one justify a given choice (say 5 vs. 7 vs. 10 or 12 levels)? Should one use a criterion such as no group smaller than about 50?

```r
> library(Hmisc)  # provides cut2()
> table(cut2(ocd$OCI_OBSESSION, m = 75))

      0       1       2       3       4       5       6 [ 7, 9) [ 9,11) [11,29]
   2813     746     462     295     197     107      84      97      58      69
```
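One way to apply a "no group smaller than m" rule is a greedy pass that merges adjacent levels from the bottom up until each group reaches m observations, then folds a too-small top group into the one below it. The helper `collapse_min_count()` is hypothetical: it is not part of OpenMx or Hmisc and is not the algorithm `cut2()` uses, so its groups differ from the `cut2()` output above. It works on the printed counts for levels 0 to 20.

```r
# Hypothetical helper (not from OpenMx or Hmisc): merge adjacent ordinal
# levels, lowest to highest, so that every group has at least m observations.
collapse_min_count <- function(counts, m) {
  group <- integer(length(counts))
  g <- 1L
  running <- 0
  for (i in seq_along(counts)) {
    group[i] <- g
    running <- running + counts[[i]]
    # Close the current group once it is large enough, unless this is the
    # last level, which simply ends the final group.
    if (running >= m && i < length(counts)) {
      g <- g + 1L
      running <- 0
    }
  }
  # A final group that is still too small is folded into the one below it.
  if (running < m && g > 1L) {
    group[group == g] <- g - 1L
  }
  names(group) <- names(counts)
  group
}

oci_counts <- c(2813, 746, 462, 295, 197, 107, 84, 49, 48, 37, 21,
                13, 14, 4, 9, 7, 7, 1, 2, 0, 2)
names(oci_counts) <- 0:20

grp <- collapse_min_count(oci_counts, m = 75)
grp                             # new group for each original level
tapply(oci_counts, grp, sum)    # group sizes, each at least 75
```

To recode raw scores `x`, `grp[as.character(x)]` looks up each score's group. Scanning from the top instead would keep more resolution in the upper tail; which end to favor is a substantive choice.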

Joined: 07/31/2009
Maybe not so large

Groups smaller than 50 can work too, but it depends on the joint distribution with the other variables being analyzed. If those variables are binary or ordinal and the cells of the cross-tabulation are not appreciably large, the polychoric correlation can be indeterminate because of zero cells. So it is hard to give a rule of thumb for the minimum cell frequency. If the category boundaries are fairly arbitrary, you might consider approximate deciles as a compromise between keeping the information of a continuous scale and treating the data as ordinal. If the first cell spans several deciles, that will have to do.
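Approximate deciles can be taken from `quantile()`. Because more than half of the sample scores 0, several decile cut points coincide and `unique()` drops the duplicates, so the first cell spans several deciles as described. The sketch rebuilds individual scores from the printed counts for levels 0 to 20. The helper `has_zero_cells()` is hypothetical and only illustrates checking a cross-tabulation for empty cells before estimating a polychoric correlation; the toy vectors passed to it are made up.

```r
oci_counts <- c(2813, 746, 462, 295, 197, 107, 84, 49, 48, 37, 21,
                13, 14, 4, 9, 7, 7, 1, 2, 0, 2)
x <- rep(0:20, times = oci_counts)   # one score per person

# Decile cut points; duplicated values (here, the many 0s) are dropped.
q <- unique(quantile(x, probs = seq(0.1, 0.9, by = 0.1), names = FALSE))
deciles <- cut(x, breaks = c(-Inf, q, Inf))   # right-closed intervals
table(deciles)

# Hypothetical check: does the cross-tabulation of two ordinal variables
# contain empty cells that could make a polychoric correlation indeterminate?
has_zero_cells <- function(a, b) any(table(a, b) == 0)

# Toy vectors, made up for illustration only.
has_zero_cells(c(1, 1, 2, 2, 3), c(0, 1, 0, 1, 1))
```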

Joined: 05/03/2013

I am quite new to OpenMx. I have not figured out the difference between binary data (with one threshold) and data with more than two categories (say a variable with N categories, so there are N-1 thresholds). What does the script look like? Could you please give me a basic script as an explanation? Thank you so much.
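The binary and N-category cases differ only in the number of thresholds: a standard normal liability is cut at one threshold for a binary variable and at N-1 increasing thresholds for N categories, and each category's probability is the normal area between its two neighboring thresholds. The base R sketch below fits that univariate threshold model by maximum likelihood with `optim()`. It is not OpenMx syntax (in OpenMx the thresholds go in a matrix passed to `mxExpectationNormal(thresholds = ...)`), and the counts are made up. For a single variable the estimates should match `qnorm()` of the cumulative proportions.

```r
# Map unconstrained parameters to strictly increasing thresholds:
# par[1] is the first threshold, later entries are log-gaps between them.
to_thresholds <- function(par) cumsum(c(par[1], exp(par[-1])))

# Average negative log-likelihood per observation of the category counts
# when a standard normal liability (mean 0, variance 1) is cut at the
# thresholds. Averaging keeps gradients moderate; the maximizer is unchanged.
negloglik <- function(par, counts) {
  th <- to_thresholds(par)
  p <- diff(pnorm(c(-Inf, th, Inf)))   # probability of each category
  -sum(counts * log(p)) / sum(counts)
}

# N categories -> N - 1 thresholds, estimated by maximum likelihood.
fit_thresholds <- function(counts) {
  k <- length(counts) - 1
  start <- c(-1, rep(log(0.5), k - 1))
  fit <- optim(start, negloglik, counts = counts, method = "BFGS",
               control = list(reltol = 1e-12))
  to_thresholds(fit$par)
}

# Made-up counts for illustration.
binary <- c(no = 700, yes = 300)                  # 2 categories, 1 threshold
four   <- c(a = 200, b = 300, c = 350, d = 150)   # 4 categories, 3 thresholds

fit_thresholds(binary)
fit_thresholds(four)
# Closed-form check: qnorm of the cumulative proportions.
qnorm(cumsum(four)[1:3] / sum(four))
```

Adding a category adds one threshold parameter and nothing else in the model changes, which is also why a 20-category variable costs 19 thresholds.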

Joined: 07/31/2009