choice of using FIML for binary data

MC

Hello,
my question is quite a general newbie one on whether to use FIML, and therefore OpenMx. I'd appreciate any comments on which direction to take.

I'm considering using it (and therefore using OpenMx) because I understood it is very well adapted to use with binary data, compared to conventional techniques (using tetrachoric correlation is often problematic, I believe, and WLS requires large samples). However, the OpenMx Manual cites handling of missing data (which didn't worry me that much, as I was expecting to use listwise deletion):

'The intelligent handling of missing data is a primary reason to use FIML over other estimation techniques' (4.3.1)

In fact, in the data sets I will be analysing, I have between 3,000 and 30,000 cases (test candidates) and around 30 variables (test items), so my worry was mainly tetrachoric correlation, rather than WLS. My aim is to determine and verify dimensional models of two language tests and then compare them.

many thanks for any enlightenment,

Michael Corrigan

neale
FIML ok if number of variables not large

Hi Michael

FIML becomes prohibitively computationally expensive with a large number of variables. Suppose it takes 10 points to integrate each dimension. Then for an m-variable analysis we need 10^m points, so it's about an order of magnitude slower with each additional variable. It's practical for perhaps a dozen or, with patience, up to 20 variables, but beyond this it's really not feasible. However, there are some alternatives. One is to use weighted least squares; OpenMx can compute the tetrachorics and a weight matrix for this purpose. Easiest is diagonally weighted least squares, in which the reciprocal of the standard error of the correlation is used to form the weight. This is much better than 'pretending' that the correlation was estimated from continuous data, because a tetrachoric's precision depends on where the threshold is, as well as how large the correlation is.
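To put rough numbers on the 10^m growth described above, here is a minimal Python sketch. It takes the "10 points per dimension" figure at face value as a simplifying assumption; the function name `fiml_grid_points` is made up for illustration and is not part of OpenMx, whose actual integration routine may scale differently.

```python
def fiml_grid_points(n_variables, points_per_dimension=10):
    """Grid points needed to integrate over n_variables dimensions,
    assuming a fixed number of points per dimension (an illustrative
    assumption, not how OpenMx necessarily integrates)."""
    return points_per_dimension ** n_variables


# Compare a small analysis, the "practical" range, and a 30-item test.
for m in (5, 12, 20, 30):
    print(f"{m:2d} variables -> {fiml_grid_points(m):.0e} grid points")
```

Each extra variable multiplies the count by the number of points per dimension, which is why a 30-item test is far outside the feasible range on this reckoning.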

I'm not sure what it is that worries you about the tetrachoric?

MC
tetrachorics issues

Hello Neale,
thanks very much for your help.

On the topic of tetrachoric correlation, I think that it is quite common to end up with a non-positive definite matrix and, therefore, something unresolvable. FIML seemed a way to avoid this kind of difficulty.

thanks

neale
Non-pd

For sure FIML is ideal, but only if you have a manageable number of items (say <20 and preferably <12 or so depending on your hardware).

There are two possibilities for non-positive definiteness. One is, as you say, with the observed matrix of tetrachoric correlations when this is compiled through pairwise analyses to obtain the tetrachorics. If one is using diagonally weighted (or unweighted) least squares this isn't a complete non-starter because you can still fit a model to the non-pd correlation matrix. Another possibility is to use a ridge function to bring the matrix into positive definite land (taking the eigenvalues V and eigenvectors W, increasing those eigenvalues that are zero or negative so that they are small and positive to form V2, then computing W %*% diag(V2) %*% t(W)). This approach can yield a correlation matrix which is really quite close (say the same to within a couple of decimal places) to the original, but I must say it feels a bit dishonest - how much should the smallest eigenvalues be increased?
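The eigenvalue adjustment described above can be sketched in a few lines. This is a Python/NumPy illustration rather than OpenMx code; the function name `smooth_to_pd`, the `floor` value of 1e-6, and the example matrix are all assumptions made for the example. The final rescaling back to a unit diagonal is an extra step not mentioned in the post, added so the result is still a correlation matrix.

```python
import numpy as np


def smooth_to_pd(R, floor=1e-6):
    """Raise eigenvalues below `floor` up to `floor`, rebuild the matrix,
    and rescale it to a unit diagonal. `floor` is an arbitrary choice."""
    R = np.asarray(R, dtype=float)
    V, W = np.linalg.eigh(R)            # eigenvalues V, eigenvectors in columns of W
    V2 = np.where(V < floor, floor, V)  # lift zero or negative eigenvalues
    S = W @ np.diag(V2) @ W.T           # same as W %*% diag(V2) %*% t(W) in R
    d = np.sqrt(np.diag(S))
    S = S / np.outer(d, d)              # rescale so the diagonal is exactly 1
    return (S + S.T) / 2                # remove tiny asymmetries from rounding


# A made-up "pairwise" correlation matrix that is not positive definite.
R = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
R_smooth = smooth_to_pd(R)

print("smallest eigenvalue before:", np.linalg.eigvalsh(R).min())
print("smallest eigenvalue after: ", np.linalg.eigvalsh(R_smooth).min())
print("largest change in any entry:", np.abs(R_smooth - R).max())
```

How far the smoothed matrix moves from the original depends on how negative the offending eigenvalues are and on the chosen `floor`, which is exactly the arbitrariness the post raises.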

Another possibility is that the weight matrix of the polychorics (whether these weights are estimated by one big analysis of all the items together, or say by trios of variables) may be non-pd. This would be a non-starter for asymptotic weighted least squares because the weight matrix cannot be inverted. In principle this matrix too could be ridged. Perhaps there are some simulation studies out there that examine how well these approaches work in practice.

mdewey
Methods in other R packages

There are implementations in several R packages. Look at cor.smooth in package psych for some links and references.