Degrees of freedom with raw data

12 replies [Last post]
StavrosUt's picture
Offline
Joined: 04/22/2010

Hello everybody,
I am an amateur in OpenMx. I am trying, for a start, to fit a one-factor model with OpenMx. I am willing to use the path version.
I use type=RAM, and mxData(...type="raw")

I have 1064 observations and 8 items.

I compare the summary() with that of factanal(). I get the same loadings and uniquenesses estimations with one weird problem: When I use "raw" and not "cov", I have a total number of observed statistics = 8512 (=1064*8) instead of the correct = 44. Several statistics are NA (chi-square, RMSA) but the estimates of the loadings and uniquenesses are the same (and means) !!!

I use the code found in the documentation as an example for: Factor Analysis, Path Specification.

Please help!!

Stavros

Ryne's picture
Offline
Joined: 07/31/2009
This is not an error, but a

This is not an error, but a difference between OpenMx and other programs that goes back to Mx 1.0. Degrees of freedom are the units by which we say how much information is in some data or matrix. There's a lot more information in raw data than is captured in a covariance matrix.

Most SEM software treats raw data like it is a covariance matrix. They take the raw data (which has n*p pieces of information or df, where p is the number of variables) and project it down into a covariance matrix (p(p+1)/2 df) and a vector of means (p df). This represents information loss, but the covariance matrix retains the linear relations in the data. In SEM, this is likely all you care about most of the time.

OpenMx doesn't do that pre-processing step, instead fitting your model to the raw directly. This reflects OpenMx' status as a general-purpose matrix optimizer rather than as software exclusively for SEM. As such, the data still have n*p df, which is a multivariate regression definition of df rather than an SEM definition. The fit stats you're referencing are undefined because the null model isn't estimated. Whereas the SEM definition of df leads to an obvious null model, the fully-saturated model with n*p df would be very large.

If you think of your data as a covariance matrix, then analyze it as a covariance matrix, and you'll have the df and fit statistics you're looking for.

Good luck!

StavrosUt's picture
Offline
Joined: 04/22/2010
Thanks once again for the

Thanks once again for the reply. My real problem is that I am trying to fit a threshold model, and I assume this can only be done with raw data.
I found in a thread that this can be done with path specification and mxThreshold command, but it does not work. (cant find function mxThreshold).
With matrix specification, I keep getting the error that the number of thresholds is not one less than the levels of my model. All variables in the model have a range from 1-7 (Totally disagree.... fully agree). I attach the code I use in case somebody could help.
Thanks for your time anyway

AttachmentSize
threshlds.R 804 bytes
mspiegel's picture
Offline
Joined: 07/31/2009
Several comments on the

Several comments on the current state of ordinal data analysis in OpenMx (4/25/2010). First, make sure you are using the latest binary release. The ordinal interface has changed to make it more similar to classic Mx. Second, you can use mxFactor() on the ordinal columns of your dataset to specify levels. See this example or this example. Third, the performance of ordinal data analysis is currently poor. This will be improved by the OpenMx 1.0 release.

StavrosUt's picture
Offline
Joined: 04/22/2010
Thank you. I downloaded the

Thank you. I downloaded the latest version, and used mxFactor. Now, instead of the previous error, I get :

Error: In model 'THR One Factor' column 'SS227' is not an ordered factor. Use mxFactor() on this column.
(I get this after using mxFactor, SS227 is just my first variable)

If I use as.ordered, I get that my matrix is not of type 'double'.

mspiegel's picture
Offline
Joined: 07/31/2009
It's difficult to debug

It's difficult to debug without seeing how you used the mxFactor() function. Take a look a the sample problems that were linked in the previous comment. mxFactor() is applied to each column of the data that is ordinal data.

StavrosUt's picture
Offline
Joined: 04/22/2010
I think I do it exactly as

I think I do it exactly as seen on the example you indicated. I also attach the script.

Neg.Self91 is my data sets consisted of 8 items with levels 1 - 7, originally being numeric values, with no NA.
Thanks

AttachmentSize
threshlds.R 824 bytes
mspiegel's picture
Offline
Joined: 07/31/2009
Ah. I think it's not working

Ah. I think it's not working because Neg.Self91 is a matrix, and should be a data.frame. I'll add some checking to mxFactor() that will issue an error on this condition. Nevermind, the right-hand side of the second expression is an ordered factor, but the left-hand side is not. Instead I'll throw an error if thresholds are specified and the input data is not a data.frame object.

foo <- matrix(1:9, 3, 3)
foo[,1] <- mxFactor(foo[,1], levels = c(1:3))

StavrosUt's picture
Offline
Joined: 04/22/2010
Thank you so much! Turning my

Thank you so much! Turning my dataset into a dataframe made it run. However, it took R 1.5 hour to tell me that it is not responding after all... Is this about the poor performance you mentioned (8 vars, 7 categories, 1000 obs, thresholds) or it should sometime run anyway?

Steve's picture
Offline
Joined: 07/30/2009
There are 48 free parameters

There are 48 free parameters in the 8x6 matrix of thresholds. Plus whatever free parameters are in your model. With 1000 obs, that will take some time to converge. Several hours is not out of the range of possible time for a problem that big. Again, we expect to be able to radically improve our computation time in the near future, but for now, a couple of hours is a reasonable estimate of convergence time for a problem of that size.

I just finished a problem with an 8x4 matrix of thresholds and 200 obs which took 15 minutes on my Macbook Pro.

neale's picture
Online
Joined: 07/31/2009
Yes, many are awaiting

Yes, many are awaiting performance improvements with baited breath!

One thought I had for a temporary solution was to use frequency weights in a definition variable. Suppose that one or two patterns of data were relatively common (all zeroes for example). In this case, using a dataset with a data pattern and then a frequency to indicate the number of observations with this pattern might yield a big performance improvement. It would be necessary to use the vector=TRUE argument to mxFIMLObjective(), as used in this script: http://openmx.psyc.virginia.edu/repoview/1/trunk/models/passing/Acemix2.R
in which probabilities of zygosity are used. In our case there would only be the one "model", but the frequency would be applied *after* the logarithm part, so in place of
mxAlgebra(-2*sum(log(pMZ * MZlike.objective + (unit-pMZ) * DZlike.objective)), name="twin"),
we'd use something like
mxAlgebra(-2*sum(frequency * log(MZlike.objective)), name="whatever"),

where frequency is the MxMatrix with the definition variable frequency as its sole element. This approach would avoid calculating the same integral multiple times for each set of parameter estimates.

Dorothy Bishop's picture
Offline
Joined: 02/04/2010
You might find it helpful to

You might find it helpful to download my "simplified manual for beginners" which you can access from the Wiki tab above. It explains about degrees of freedom for raw vs cov models.
It was written for people who haven't got much background in R, Mx, or SEM and I am interested in feedback in how far it is useful.

The short answer is that when fitting a model to raw data, the degrees of freedom will reflect the sample size, being equivalent to the total number of observations, minus the estimated parameters.

StavrosUt's picture
Offline
Joined: 04/22/2010
Thank you very much for your

Thank you very much for your quick answer! The manual seems indeed helpfull, although not in my case.

It still does not make sense, the statistics used in any case are the covariances and the means , why does it calculate the total observations? And of course this results in no calculation of Chi-square and what that follows (p-value, RMSEA etc).
I am trying to fit a model with thresholds and I came to a deadend.
mxThresholds does not work for path specification and for the matrix one, I keep getting the message that my thresholds ar not one less than my levels and they really are!

I am looking forward to any kind of response.