Some suggestions to consider

5 replies [Last post]
kkelley's picture
Offline
Joined: 08/04/2009

Hi OpenMx Team - thanks for a great R package!

After using OpenMx for some time now, I have a few suggestions that I'll put forth as things to think about. There may be reasons not to implement some or all of them, but in the spirit of trying to help I thought I would pass them along. I've included numbered suggestions below, but the numbers are arbitrary.

1.) I think fixed parameters should be included as part of the output. If someone has only the output, they will not necessarily know which paths were fixed and what values they were fixed to. It seems this would be easy to handle by including in the output a row with the name given (where "labels" was used) in the mxPath call, the fixed value as the estimate, and NA (or something else denoting that it is not computed, because it is fixed) as the "Std.Error".

2.) Confidence intervals for the RMSEA should be included. This can be done via the MBESS package as:
require(MBESS)
ci.rmsea(rmsea=.055, df=35, N=500, conf.level = 0.95)
But the functions that ci.rmsea relies on could be pulled out and included in OpenMx so as not to depend on another package.
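For reference, the core of that computation can be reproduced in base R by inverting the noncentral chi-square distribution. This is a sketch of the standard approach, not the actual MBESS internals:

```r
# Sketch of an RMSEA confidence interval in base R, assuming the usual
# definition RMSEA = sqrt(lambda / (df * (N - 1))), where lambda is the
# noncentrality parameter of the model chi-square.
rmsea.ci <- function(rmsea, df, N, conf.level = 0.95) {
  alpha <- 1 - conf.level
  chisq <- rmsea^2 * df * (N - 1) + df      # implied chi-square statistic
  # Find the noncentrality parameter at which the observed statistic sits
  # at tail probability p; pin the bound at zero when no such ncp exists.
  ncp.bound <- function(p) {
    if (pchisq(chisq, df) < p) return(0)
    uniroot(function(ncp) pchisq(chisq, df, ncp = ncp) - p,
            interval = c(0, chisq * 10))$root
  }
  lambda.lower <- ncp.bound(1 - alpha / 2)
  lambda.upper <- ncp.bound(alpha / 2)
  sqrt(c(lower = lambda.lower, upper = lambda.upper) / (df * (N - 1)))
}

rmsea.ci(rmsea = 0.055, df = 35, N = 500)
```

The bounds are the noncentrality parameters at which the observed statistic falls at the 2.5th and 97.5th percentiles, rescaled to the RMSEA metric.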

3.) I don't see much of a benefit of including the "elapsed time" and all of the calculation timers in the output. Perhaps if someone wants to request it, but it seems unnecessary in general.

4.) Easy-to-obtain bootstrap confidence intervals, integrated via the boot package, would be great. For example, options such as ciBoot=TRUE (which would compute both BCa and percentile intervals) and B=1000 (the number of bootstrap replications) could be provided, with the rest of the process handled behind the scenes.
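To illustrate what the behind-the-scenes work might look like with the boot package (ciBoot and B are hypothetical option names, and a simple mean stands in for refitting an mxModel to the resampled rows):

```r
library(boot)  # ships with R as a recommended package

set.seed(1)
x <- rnorm(200, mean = 5)

# In the proposal, the statistic would refit the model to the resampled
# data and return the free-parameter estimates; a mean stands in here.
stat <- function(data, idx) mean(data[idx])

b <- boot(data = x, statistic = stat, R = 1000)      # B = 1000 replications
boot.ci(b, conf = 0.95, type = c("perc", "bca"))     # percentile and BCa
```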

5.) For the summary of the variables that is part of the output, I think the variance or standard deviation should also be included. I believe OpenMx calls the R function summary(), which, for whatever reason, does not include the variance or standard deviation in its output (I've always wondered why not).

6.) Building some intelligence into the labeling of estimates so that so many labels do not have to be inserted manually would be great. Default values could be the variable name(s) with things like "coef" or "var" or "cor" in front to identify what they are, but the users could specify their own labels.

7.) Consider computing the confidence intervals whenever intervals=TRUE. Having to specify a separate mxCI object is awkward (especially since an option already seems to exist via intervals=TRUE, even though that must also be specified when mxCI is used). I know there was some discussion among the team members on the Wiki about how to implement confidence intervals, but there seems to be an extra step here, so simplification would be great.

8.) Including z-values as well as the corresponding p-values for the test of the null hypothesis (that the parameter equals zero) in the output for parameter estimates would be great (and is standard practice in other programs). Or, if not including them by default, a simple option to do this: pvalues=TRUE.
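As a sketch of what that post-processing would involve (the estimates and standard errors below are made up; this is just the Wald computation, not OpenMx output):

```r
# Hypothetical post-processing of a parameter table: given point estimates
# and standard errors, compute Wald z-statistics and two-sided p-values
# for the null hypothesis that each parameter equals zero.
est <- c(a = 0.80, b = 0.15)
se  <- c(a = 0.10, b = 0.12)
z   <- est / se
p   <- 2 * pnorm(-abs(z))        # two-sided p-value
round(cbind(Estimate = est, Std.Error = se, z = z, p = p), 4)
```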

9.) I know some folks don't like fit indices, but others do. It would be nice if there was a way to easily obtain several of the popular fit indices. RMSEA is produced by default, but perhaps some helper function(s) could be used to easily obtain other fit indices.

10.) Perhaps this exists, but if so I don't know about it and couldn't find anything via a search: consider including an easy way to obtain a standardized solution. For example, for a model fitted to raw data or a covariance matrix, an option such as standardizedSolution=TRUE would help folks interpret their output. This is different from using a correlation matrix from the outset, because the distribution theory differs for raw versus standardized data (the process of standardization itself depends on random quantities, namely the sample variances of the variables). Ideally the appropriate standard errors would be given for the standardized solution (RAMONA did this in Systat). But even if only the point estimates were given, that would be helpful.
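To illustrate the point-estimate side of this (a toy sketch; standardizedSolution is a hypothetical option name, and the standard-error issue is ignored entirely here):

```r
# Standardizing a fitted solution rescales estimates by the model-implied
# standard deviations. For a single path y ~ x, the standardized slope is
# beta * sd(x) / sd(y); a whole implied covariance matrix can be rescaled
# with base R's cov2cor().
S <- matrix(c(4.0, 1.2,
              1.2, 9.0), 2, 2,
            dimnames = list(c("x", "y"), c("x", "y")))
beta.raw <- S["x", "y"] / S["x", "x"]               # unstandardized slope
beta.std <- beta.raw * sqrt(S["x", "x"] / S["y", "y"])
cov2cor(S)                                          # correlation-metric form
```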

Thanks so much for your work on OpenMx - the community really appreciates it!
Ken

Ryne's picture
Offline
Joined: 07/31/2009
Thanks for the suggestions,

Thanks for the suggestions, Ken.

1.) This is a good idea, though we'd need a good rule for implementation. While path models provide some information about which parameters are user-specified and which are assumed zero, matrix models provide no such information. A better rule would be to print fixed parameters that are nonzero, though this would get hairy when lots of non-zero paths are included (a dynamical systems or other structured longitudinal model could include dozens or hundreds of unit paths and few free parameters). Definition variables are a separate issue.

The summary function provides a summary of output, not of the model, which is inherently a summary of the optimization and free parameters. We'd have to find a good way of passing in these fixed values, especially when unlabeled.

2.) That should be a relatively straightforward patch to summary. I'll put it in my queue.

3.) Maybe we can print the time information as a table like the IC table so that the spread is horizontal rather than vertical; we should be able to cram everything down to two lines that way.

4.) Sounds interesting. Someone with more bootstrap experience should look into this. Maybe a helper function?

5.) I never really cared for the data summary information, but we should include the variance/sd if we print this stuff.

6.) Default labeling could get very unwieldy, even if the labels are intuitive. Consider a multiple group model: you make one model for each group in separate steps. If the autolabeler just creates labels like "betaV1V2", then the corresponding parameters will be constrained to equality across groups because they share the same label. While non-labeled free parameters are a pain to change, at least there are no accidental equality constraints. Maybe a helper function that keeps a running count of the parameters? myLabel(3) could print "label1", "label2", "label3" and update a system variable that says "start the next myLabel call at 4".
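A minimal sketch of such a helper, using a closure instead of a system variable (makeLabeler and myLabel are illustrative names, not OpenMx functions):

```r
# Closure-based label generator: each call hands out the next n labels and
# remembers where the following call should start.
makeLabeler <- function(prefix = "label") {
  counter <- 0
  function(n = 1) {
    out <- paste0(prefix, counter + seq_len(n))
    counter <<- counter + n        # advance the running count
    out
  }
}

myLabel <- makeLabeler()
myLabel(3)   # "label1" "label2" "label3"
myLabel(2)   # "label4" "label5"
```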

7.) As confidence intervals can be so time consuming, it seems smart to avoid accidentally turning them on at any point. Mike's suggestion about running CI whenever they're included in a model makes sense; this could be achieved by setting the CI flag in mxRun to a default of TRUE.

8.) I may be in the minority here, but I think significance tests for individual parameters in a structural model are inherently biased, as they ignore the covariances between parameters. I want to support the user base by giving them the tools they want, but people will use whatever we put in.

9.) CFI and TLI are in the 1.1 release. Any other requests? We have another forum post about a function to return them. Anything based on comparisons to saturated and independence likelihoods is fairly easy to do at this point; fit indices that compare to an observed covariance matrix will break down under FIML.
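For reference, the likelihood-comparison indices mentioned here need only the model and independence chi-squares and degrees of freedom (the numbers below are made up for illustration):

```r
# Comparative fit indices from fitted-model and independence-model
# chi-square statistics; illustrative values, not real output.
chi.m <- 87.8;  df.m <- 35     # fitted model
chi.i <- 900.0; df.i <- 45     # independence model

cfi <- 1 - max(chi.m - df.m, 0) / max(chi.i - df.i, chi.m - df.m, 0)
tli <- ((chi.i / df.i) - (chi.m / df.m)) / ((chi.i / df.i) - 1)
c(CFI = cfi, TLI = tli)
```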

10.) I have a function on the boards to standardize RAM models (http://openmx.psyc.virginia.edu/thread/718). It gives parameters and standard errors, and outputs either as a parameter list (a la summary), model or the RAM matrices. Outside of RAM (or a future LISREL type), we'd have to know what the latent variables are to standardize, which isn't available in ML or FIML.

Thanks again for the suggestions!
Ryne

neale's picture
Offline
Joined: 07/31/2009
Some thoughts on comments

1) I don't, for several reasons. First, however, a qualifier: is it proposed that only paths that the user has declared would be reported? There are implicit zero paths all over the place so one would have to assume that the path has been declared in an mxPath() statement. So if we ignore all the implicit ones, there remains a bit of an issue with respect to the Hessian - does this also get padded with zero columns and rows wherever there's a fixed path? How would evaluation of this matrix work - say looking at its eigenvalues? Or do we have a different length parameter estimate vector and derivative/Hessian dimensions? This gets a bit messy.

One advantage I would see is that if one were filling out a table of estimates across models in which some of the parameters were fixed in certain models, then the table would be easy to generate. Therefore, a helper function that takes one "Super" model in which all parameters are free, and inserts the values of the "missing" matrix elements, would seem viable without incurring the complexities described above.
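A sketch of what such a helper might look like, with plain data frames standing in for summary() parameter tables (fillEstimates and the column names are illustrative, not proposed API):

```r
# Start from the full parameter list of a "super" model in which everything
# is free, then overlay a restricted run's estimates; the remaining rows
# keep their fixed values and get NA standard errors.
fillEstimates <- function(super, restricted) {
  out <- super
  out$Std.Error <- NA_real_                  # fixed unless re-estimated
  hit <- match(restricted$name, out$name)
  out$Estimate[hit]  <- restricted$Estimate
  out$Std.Error[hit] <- restricted$Std.Error
  out
}

super <- data.frame(name = c("b1", "b2", "v1"),
                    Estimate = c(0.5, 0.0, 1.0))    # b2 fixed at 0 below
model2 <- data.frame(name = c("b1", "v1"),
                     Estimate = c(0.48, 0.97),
                     Std.Error = c(0.06, 0.11))
fillEstimates(super, model2)
```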

2) Ok but I have not found these CI's to be very scintillating, being simply a function of N and df. Also, FIML of raw data with missing values may not have a consistent "N".

3) I like elapsed time a lot - it is a huge deal for users trying to optimize scripts for, e.g., application to many loci.

4) Good suggestion - I have used boot around OpenMx but it isn't as convenient as what you propose, nor as convenient as classic Mx's.

5) Agreed - the summary should include Min, Max, Mean, Median, Var, and SD, and perhaps skew/kurtosis measures.

6) NA labels are a useful way to ensure that parameters are distinct from each other, and I would not like to lose this functionality.

7) The reason that CIs are not computed for all parameters - and for all computed matrix elements, in the event that mxAlgebra() statements generate functions of parameters - is that they are expensive to compute. I would counter-propose that any mxModel() recursively containing an mxCI() should compute the intervals by default, obviating the need for the intervals=TRUE argument to mxRun().

8) True, although there are some technical issues with this type of test (see http://www.vipbg.vcu.edu/vipbg/Articles/behavior-fitting-1989.pdf) even if it is "what has been done before".

9) I think we have helper functions on the way for other indices. There remain issues with indices under FIML.

10) Good idea - but only really viable for models specified via mxPath() function calls. mxAlgebra() models might be figured out for some simple cases.

tbates's picture
Offline
Joined: 07/31/2009
wiki page for collecting thoughts on these ideas and others

As repeated replies get hard to follow, and to encourage consensus, I made a wiki page with these suggestions organized with a section for pro/con thoughts beneath each.

http://openmx.psyc.virginia.edu/wiki/suggestions

kkelley's picture
Offline
Joined: 08/04/2009
Just to clarify my point 1, I

Just to clarify my point 1: I did mean that the output should report only the paths explicitly fixed by the user, not all of the implicitly zero paths. I find it really helpful to verify that what I think I've fixed to a particular value is in fact what the program fixed; if not, I can make adjustments. Without knowing whether what I fed the program is in fact what the program interpreted, there is a bit of a question mark as to whether the model is specified as intended.

On your point 2, the confidence intervals are also a function of the RMSEA point estimate itself, not only of N and df. The point estimate will differ from the population value, which is what is really of interest, so interval bounds on the population parameter are often helpful. Simulation research shows that these confidence intervals tend to work well in many realistic situations, though certainly not all.

On your point 8, I would say that the majority of applications of SEM report the statistical significance of various paths in a model. Correspondingly, it seems natural to me for these to be included.

Thanks for having a look at these.
Ken

neale's picture
Offline
Joined: 07/31/2009
Actually, thank you for

Actually, thank you for making these comments!

I am still not keen on the idea of reporting anything other than the free parameters in the summary of "estimated parameters". It could be rather difficult to get rid of a fixed parameter from such a table. Possibly, only parameters which are both explicitly labeled and free=F might be included; assigning label=NA to such a parameter would then delete it from the table. Technically, it would be difficult - though not impossible - to include fixed parameters in the list of estimated parameters, given that the SEs have to be derived from a reduced covariance matrix of parameter estimates. Questions also remain about the gradient and Hessian for a list augmented and interspersed with fixed parameters. Second-guessing such things is a bit against the OpenMx philosophy, so I still favor a helper function that takes multiple lists of estimates and populates an output table, filling in the value of any element that is fixed in a particular run.

There are some issues with RMSEA in multiple groups, and with missing data, which may have been addressed since I last looked at the literature. CIs for the RMSEA are something that "Classic" Mx used to compute, but I didn't see them reported very often. Perhaps that was because of the rather BG-oriented user group.

It does seem fair to include them; I was merely ranting about their use despite the statistical problems inherent in them. My comment about "it is what we published before" really needed a link: http://www.youtube.com/watch?v=PbODigCZqL8. The general idea is that just because it is what people have done before doesn't make it right, and perhaps software developers should try to alert people to such problems and make it less easy to tread the same path. Only perhaps, though.