Tue, 12/22/2009 - 13:11

Hi, I'm trying to figure out how to specify models with categorical outcomes. Can you get me started with a modification of your simple example below -- how would you change it to work for categorical outcomes?

require(OpenMx)

data(demoOneFactor)

manifests <- names(demoOneFactor)

latents <- c("G")

factorModel <- mxModel("One Factor",

type="RAM",

manifestVars=manifests,

latentVars=latents,

mxPath(from=latents, to=manifests),

mxPath(from=manifests, arrows=2),

mxPath(from=latents, arrows=2, free=F, values=1.0),

mxData(cov(demoOneFactor), type="cov", numObs=500))

summary(mxRun(factorModel))

I am trying to do the same, extending a basic (working) 1 factor example with thresholds. In the example I have three 5-point Likert items as indicators, so I specify a 3x4 threshold matrix with all cells freely estimated.

Unfortunately, running the model results in the error

`Error: The job for model '1factor_model_RAM_thresh' exited abnormally with the error message: Objective function returned an infinite value.`

I get the same error if I start from the mxFIMLObjective specification in the example 'OrdinalTest.R' (without the path-specification part), so I guessing I do not really understand how to specify thresholds...

Hi

You were very close to having it run;

values=cbind( (-2:1),

(-2:1),

(-2:1)),

would have "worked" in that the model would be fitted (see ###1 below), although the model is not identified:

The model does not satisfy the first-order optimality conditions to the required accuracy, and no improved point for the merit function could be found during the final linesearch (Mx status RED)

free parameters: thresh 1 belang -0.68785859 NaN thresh 2 belang -0.13691998 1.4253940 thresh 3 belang 0.89381646 1.8988307 thresh 4 belang 1.61439240 0.4731209 thresh 1 moskee 0.31658257 NaN thresh 2 moskee 0.50944070 NaN thresh 3 moskee 0.98157705 NaN thresh 4 moskee 1.56038768 NaN thresh 1 koran 0.35927457 NaN thresh 2 koran 0.46067897 2.7499008 thresh 3 koran 0.65612570 4.7995829 thresh 4 koran 0.90837487 6.6301938

name matrix row col Estimate Std.Error

1 l21 A 2 4 1.34006721 NaN

2 l31 A 3 4 0.64320758 NaN

3 e1 S 1 1 0.49797100 NaN

4 e2 S 2 2 0.13839618 NaN

5 e3 S 3 3 0.06882259 0.3997287

6 varReligie S 4 4 1.57808455 NaN

7 meanbelang M 1 1 1.00129453 1.8267578

8 meanmoskee M 1 2 -0.63725041 NaN

9 meankoran M 1 3 -0.22759640 NaN

10

11

12

13

14

15

16

17

18

19

20

21

observed statistics: 8370

estimated parameters: 21

degrees of freedom: 8349

-2 log likelihood: 15275.88

To identify the model, it is necessary to fix the first two thresholds of each variable, because the mean and the variance of each variable are being estimated. I prefer to fix these thresholds to zero and one. The attached runs nicely and Std Errors are reported happily:

> summary(model.thresh.run)

data:

$`1factor_model_RAM_thresh.data`

belang moskee koran

1:329 1:1958 1:2072

2:274 2: 89 2: 98

3:733 3: 219 3: 181

4:514 4: 241 4: 189

5:940 5: 283 5: 250

free parameters: thresh 3 belang 2.870869 0.1291761 thresh 4 belang 4.178778 0.2040375 thresh 3 moskee 3.448120 0.3028074 thresh 4 moskee 6.449431 0.6109066 thresh 3 koran 2.927398 0.2374341 thresh 4 koran 5.414895 0.4842495

name matrix row col Estimate Std.Error

1 l21 A 2 4 3.828229 0.4445358

2 l31 A 3 4 3.494584 0.3962465

3 e1 S 1 1 1.640563 0.2130230

4 e2 S 2 2 3.721492 1.1765354

5 e3 S 3 3 6.692506 1.4842776

6 varReligie S 4 4 5.199057 0.6049877

7 meanbelang M 1 1 3.065956 0.1483182

8 meanmoskee M 1 2 -4.945852 0.5853761

9 meankoran M 1 3 -5.787413 0.6382787

10

11

12

13

14

15

observed statistics: 8370

estimated parameters: 15

degrees of freedom: 8355

-2 log likelihood: 15275.88

Note that the -2 log likelihoods are the same with the underidentified and the identified versions; when fitting the underidentified version, the optimizer found one of the infinite number of solutions that fit equally well. Standard errors were a mess, though, as would be expected.

Paras Mehta describes the 0:1 specification of thresholds for ordinal models in this way in this article:

Mehta PD, Neale M.C., Flay BR: Squeezing interval change from ordinal panel data: latent growth curves with ordinal outcomes. Psychol Methods 2004; 9:301-333 http://www.vipbg.vcu.edu/vipbg/Articles/psychmethods-squeezing-2004.pdf

One final comment: I prefer to fix the factor variance to 1.0 and estimate all the factor loadings rather than single out one factor loading to be fixed to 1.0. The solutions are equivalent, and the pros and cons of each have been discussed in the literature.

Thanks for the correction.

... the 0:1 specification of thresholds for ordinal models ...

Is it possible to use the parametrization with all thresholds free, and the mean and variance of the (latent response) variable fixed? I tried the naive approach of just applying those changes to the M, S and thresh matrices, but the results are very different from those that Mplus gives for that parametrization...

Regarding estimation, I seem to remember from class that fully weighted or robust least squares should be used when dealing ordinal indicators. Is this possible in OpenMx, or is FILM adequate?

Regarding output, for the resulting object from mxRun(), holds

`model$covariance@result`

the sample covariance matrix and`model@algebras$covariance@result`

the model implied covariance matrix?Apologies for all the basic questions, I'll promise in return to expand the "OpenMx for Mplus users" wikipage once I get the hang of this...

Hi mhermans

Is it possible to use the parametrization with all thresholds free, and the mean and variance of the (latent response) variable fixed? I tried the naive approach of just applying those changes to the M, S and thresh matrices, but the results are very different from those that Mplus gives for that parametrization...

Yes. However, you need to constrain the variances of the observed variables to equal 1.0. Your "naive" attempt did not do this; I attach a script in which these constraints were implemented. Good agreement with WLSMV estimates from Mplus. There is a problem with MxModels with non-linear constraints - the Std Errors are wrong. Therefore I made yet another version which uses mxAlgebra to calculate the residual variances (1-factorloading^2) instead. Std Errors are ok in this version (also part of attached script).

On the FIML vs. WLSMV issue there are pros and cons of each method. FIML delivers maximum likelihood parameter estimates, which have the usual advantages of i) being asymptotically unbiased, and ii) minimum variance of all estimators with property i). In addition, being a likelihood framework, model comparison via likelihood ratio test is possible. The disadvantage of FIML is that it takes a long time when there are many variables. In my experience around 15 or more gets painfully slow, due to the numerical integration over so many dimensions. It is possible to overcome this in part by use of a marginal maximum likelihood procedure in which one integrates over the factor space, but this is not very straightforward to specify. The good agreement between FIML and WLSMV in this case may or may not be general; there are likely some simulation studies out there. I note that there aren't any missing values in your dataset; I speculate that the divergence between FIML and WLSMV may increase when there missing values, depending on the mechanism for missingness.

Update:

The following code, if run after the thresh.R script, will produce standardized thresholds, A and S matrices, but from the zero/one fixed threshold approach. The real beauty of the 0/1 fixed threshold approach is that the rest of the model is entirely equivalent to the continuous case, containing variances, covariances and means. Thus, e.g., a latent growth curve model is straightforward to modify for use with ordinal data.

#

# Attempt to standardize the 0/1 fixed threshold estimates

# using a nthresholds x 1 Unit vector U to reproduce rows etc.

#

nthresh<-4

model.thresh.run.mod <- mxModel(model.thresh.run,mxMatrix(type="Unit",nrow=nthresh,ncol=1,name="U"))

standardizedThresholds <- mxEval((thresh - ((U %x% M[,1:3])) ) / (U %x% t(sqrt(diag2vec(F%&%(solve(I-A)%&%S))))) ,model.thresh.run.mod)

# Flushed with success at that, we now go after Standardized A and S matrices (loadings & initial covariances)

D <- mxEval(vec2diag(sqrt(diag2vec((solve(I-A)%&%S)))) ,model.thresh.run.mod)

standardizedLoadings <- mxEval(solve(D) %*% A %*% D,model.thresh.run.mod)

Ssd <- mxEval(solve(vec2diag(sqrt(diag2vec(S)))),model.thresh.run.mod)

standardizedSmatrix <- mxEval(Ssd %*% S %*% Ssd,model.thresh.run.mod)

`model$covariance`

and`model@algebras$covariance`

reference the same entity. Try to avoid using the '@' operator, as slot names may change as OpenMx develops over time. The style guide has some tips on using the '$' operator. WRT your question, the entity model$covariance stores the expected covariance matrix. Its final value should be accessed with`mxEval(covariance, model)`

, this avoids using the '@' operator on the 'result' slot.Attached please find a sample file and an associated data file. We are still documenting the ordinal thresholds examples, but this should at least get you going.

Thanks Steve!

But what is the goal for this code? The estimates seem to be correlations among manifest variables (polychorics?) and thresholds. I thought we might be fitting a one-factor model but there are no factor loadings (that I have found). Am I supposed to add code for the factor model part?

Yes, but specifying a factor model should be easy given the factor model specification (by either paths or matrices) on the homepage http://openmx.psyc.virginia.edu/