Many many submodels (how S4 builds objects and a request to run lists of models)

Ryne
Joined: 07/31/2009

So I'm following up on some of my factor score work and happened across an old question I had regarding holder or parent models for embarrassingly many submodels.

I'm trying to run the same model on 500 (or some other large n) datasets with the independence flag set to TRUE. I'm doing this by creating one model per dataset, building a list of mxModels, then putting this list into a single mxModel for optimization. However, I'm spending most of my time on the fourth line of this code:

singScore <- transformFactorScores(spRes, 1, "mu", "sigma", "epsilon")
singScore@independent <- TRUE
singSM <- replicate(un, singScore)
singParScore <- mxModel("Singletons", singSM)
singRes <- mxRun(singParScore)

Lines 1-3 together take less than 0.40 seconds, and the optimization of all 500 models (line 5) takes 40 or 50 seconds. The fourth line, however, takes 3 minutes. If I split the fourth and fifth lines into 5 separate models of 100 submodels each, total time drops to 70 seconds for everything, and to 60ish seconds for 10 models with 50 submodels each.
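
The batching described above could be sketched roughly as follows. This uses plain list elements as stand-ins for the mxModel objects so it runs without OpenMx; chunkList is a hypothetical helper, not an OpenMx function, and the commented-out lines only assume that mxModel accepts a list of submodels the same way the fourth line of the original code does.

```r
# Hypothetical helper: split a list into consecutive batches of at most
# `size` elements each.
chunkList <- function(x, size) {
  groups <- ceiling(seq_along(x) / size)
  unname(split(x, groups))
}

# Stand-ins for the 500 submodels (in the real case: singSM from replicate()).
fakeModels <- as.list(seq_len(500))
batches <- chunkList(fakeModels, 100)

length(batches)   # number of batches
lengths(batches)  # size of each batch

# With OpenMx loaded, each batch could then get its own container model
# (assumption: same calling pattern as in the post above):
# containers <- lapply(seq_along(batches), function(i)
#   mxModel(paste0("Singletons", i), batches[[i]]))
# results <- lapply(containers, mxRun)
```

Changing the second argument of chunkList (100, 50, and so on) is then enough to try different batch sizes and time each one.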

I believe this has to do with how S4 builds objects and interacts with apply statements. When I tell mxModel to add these 500 submodels, my understanding of S4 is that they are added one at a time, essentially creating a holder with 1 submodel, then a holder with 2, etc.
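
The one-at-a-time pattern described here can be illustrated with a toy S4 class. This is only a sketch of the hypothesis: the Holder class and both helper functions are made up for illustration, and nothing here shows what mxModel actually does internally. Timings are printed rather than stated, since they depend on the machine and R version.

```r
library(methods)

# Toy container class (hypothetical; not the OpenMx model class).
setClass("Holder", representation(submodels = "list"))

# Grow the container one element at a time: every step builds a new list
# that is one element longer than the previous one.
addOneAtATime <- function(models) {
  h <- new("Holder")
  for (m in models) {
    h@submodels <- c(h@submodels, list(m))
  }
  h
}

# Fill the slot in a single assignment.
addAllAtOnce <- function(models) {
  new("Holder", submodels = models)
}

# Stand-ins for 500 submodels.
models <- replicate(500, list(a = 1, b = "x"), simplify = FALSE)

print(system.time(h1 <- addOneAtATime(models)))
print(system.time(h2 <- addAllAtOnce(models)))

# Both routes should end up with the same contents.
identical(h1@submodels, h2@submodels)
```

Wrapping the real mxModel call in system.time for a few values of n would show whether the container build time grows faster than linearly in the number of submodels.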

My questions are:
-Is there a better way to benefit from parallelism for this approach?
-Is there a smart way to determine exactly how to balance this S4 slowdown with parallelism?
-Here's the feature request: is it a worthwhile endeavor to allow mxRun to optimize and return a list of models? Is an mxList worth the function crawl?

Ryne
Joined: 07/31/2009
To clarify and improve the last question, how could I lapply mxRun and still benefit from parallelism?

tbrick
Joined: 07/31/2009
omxLapply

If you don't need to have them all in one master model, why not use omxLapply to apply mxRun to the list of submodels?

So:

singScore <- transformFactorScores(spRes, 1, "mu", "sigma", "epsilon")
singSM <- replicate(un, singScore)
singRes <- omxLapply(singSM, mxRun)

That way you can avoid building and flattening the large container model.
omxLapply will take advantage of snowfall if you have it installed on your machine so you can benefit from multiprocess parallelism. If snowfall is not installed, it will run sequentially. And, of course, you can still use thread-level parallelism if you're running a FIML model.
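
The "parallel if available, sequential otherwise" behaviour described above can be sketched in general terms like this. It is not omxLapply's implementation: applyMaybeParallel is a hypothetical helper, and it swaps in R's bundled parallel package in place of snowfall so that the example runs without extra installs. The worker count of 2 is an arbitrary assumption.

```r
# Hypothetical helper (not part of OpenMx): apply `fun` to each element of
# `x` on a small cluster when possible, otherwise fall back to lapply().
applyMaybeParallel <- function(x, fun, workers = 2) {
  if (workers > 1 && requireNamespace("parallel", quietly = TRUE)) {
    cl <- parallel::makeCluster(workers)
    # Shut the workers down when the function exits, even on error.
    on.exit(parallel::stopCluster(cl))
    parallel::parLapply(cl, x, fun)
  } else {
    lapply(x, fun)
  }
}

# Stand-in workload; in the real case `x` would be the list of submodels
# and `fun` would be mxRun (with OpenMx loaded on each worker).
res <- applyMaybeParallel(as.list(1:10), function(i) i * 2)
unlist(res)
```

Either branch returns a list in the same order as the input, so the calling code does not need to know which one ran.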

Ryne
Joined: 07/31/2009
I completely forgot about omxLapply. Thanks!

neale
Joined: 07/31/2009
Try doParallel package?

I've used doParallel with some success in the past.
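
For reference, a minimal doParallel sketch might look like the following. It assumes the foreach and doParallel packages are installed, uses 2 workers as an arbitrary choice, and runs a stand-in computation instead of mxRun; the commented-out line shows how the list of submodels from the earlier posts might be plugged in, which is untested here.

```r
library(foreach)
library(doParallel)

# Start a small cluster and register it as the foreach backend.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Stand-in workload: each iteration runs on one of the workers and the
# results come back as a list, in iteration order.
results <- foreach(i = 1:8) %dopar% {
  i^2
}

# Possible real-case form (assumption; needs OpenMx on every worker):
# singRes <- foreach(m = singSM, .packages = "OpenMx") %dopar% mxRun(m)

parallel::stopCluster(cl)

unlist(results)
```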