Tue, 05/27/2014 - 16:35

Hi All,

I am using OpenMx in conjunction with SEM Trees. As a result of this, I have to include the whole dataset (30,000 x 1200). Right now, it takes about 30 seconds to run the current code:

hpc20 <- mxModel("1 Variable", type="RAM",
    mxData(observed=pt_run1, type="raw"),
    manifestVars="A_AbsRea",
    mxPath(from="A_AbsRea", arrows=2, free=TRUE, values=9, labels="e1"),
    mxPath(from="one", to="A_AbsRea", arrows=1, free=TRUE, values=8, labels="mean1"))

system.time(hpc20Fit <- mxRun(hpc20, suppressWarnings=TRUE, silent=TRUE))
   user  system elapsed
 30.723   2.092  32.806

I am trying to figure out a way to speed this up. Because the computation takes so long in using SEM Trees with such a large dataset, we are starting with the simplest possible model. I suspect the model takes so long to run because of the computation of the descriptives for all of the other variables. When the dataset is changed to just include the one variable, it takes a split second.

I have tried to change this option thus far:

hpc20 <- mxOption(hpc20,'No Sort Data',"A_AbsRea")

although it doesn't change the computation time. Is there a different way to have OpenMx only calculate values for the one variable of interest? Sorry if I am not clear or am missing something quite obvious.

Thanks

Ross

Hi,

Given that you only refer to one variable in the model, why include the others? i.e.

Hi Ross,

Two suggestions. First, I don't think you have the signature for mxOption correct. You shouldn't be listing the names of variables you don't want sorted, but the names of models. It should read:

hpc20 <- mxOption(hpc20,'No Sort Data',"1 Variable")

If that doesn't work, then you've already hit the correct solution: don't include unnecessary variables in your dataset. OpenMx shouldn't be calculating any descriptives on run; that only happens with the summary command, and happens because base R's summary function is applied to the data.
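For the second suggestion, a minimal sketch of trimming the data before it goes into the model might look like the following. It uses base R only; the simulated data frame is a hypothetical stand-in for pt_run1, and the names model_vars and pt_small are made up for illustration.

```r
# Hypothetical stand-in for pt_run1: a wide data frame in which the
# model only uses one column. Replace this with the real dataset.
set.seed(1)
pt_run1 <- as.data.frame(matrix(rnorm(200 * 50), nrow = 200, ncol = 50))
names(pt_run1)[1] <- "A_AbsRea"

# Variables the model actually refers to (its manifestVars).
model_vars <- "A_AbsRea"

# Keep only those columns; drop = FALSE keeps a data frame even when
# a single column is selected.
pt_small <- pt_run1[, model_vars, drop = FALSE]

dim(pt_small)

# The trimmed data frame would then replace the full one in the model,
# e.g. mxData(observed = pt_small, type = "raw").
```

Whether the trimmed data can be used together with SEM Trees depends on how the covariates are supplied there, so treat this as a fix for the stand-alone OpenMx run only.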

Let us know if the change to mxOption helps you.

Thanks a lot for your replies. Changing the mxOption worked and significantly cut down the computation time.

I included the whole dataset in the model because I was under the assumption that this was necessary for SEM Trees to take the excluded variables and use them as covariates. Please correct me if I am wrong in this and there is a better way to program the code.

Thanks again.

Ross