A design for parallelizing dependent sub-models

13 replies [Last post]
tbates's picture
Offline
Joined: 07/31/2009

Hi,
Just a thought on how to get parallizability going between submodels.

This would be a major saving in time for many users of multi-group models, where the submodels have independent data and could be run as stand-alone independent models if they did not also share some parameters (typically located in a top model).

It's straightforward to compute which models a model is dependent on (just list up algebras etc. where the model name is not mine).

A model can't currently tell if anyone depends on it, but these "master" models need to be able to synchronize the copies of themselves they hadn out to models running on other cores/threads.

To get around this (and rather than ask users to manually maintain dependencies), at mxRun time, dependent models could register their requirements with the model upon which they depend.

To parallelize the supermodel, let's take a common example where a, c, e are in "top" and are used in submodels "mz" and "dz"

On each iteration of mz and dz, they check out a copy of the parameters in top that they need (or, for simplicity, check out a copy of the whole "top" model)

"top" sets a flag saying that iteration has been checked out.

When mz finishes and wants the new copy of parameters to estimate, it simply requests this.

top waits until all depends have requested new copies, and then does its own part of the optimization - jiggling new parameter values based on its fit algebra. It then hands out the new (synchronized) copies of its state to the submodels requesting them.

If something like this would work in the backend, most twin models would speed up by perhaps 50%?

iloo's picture
Offline
Joined: 05/26/2010
Parallellism for dependent submodels again.

This thread has been quiet for some time, and I would like to revive it.
Is there any development ongoing in solving the issue of executing dependent submodels using multiple cores? I'm in dire need of exactly that feature, having ~48 dependent submodels in extended family analyses (>300000 rows of data), and also access to a server cluster with at least as many available cores (not 300000, but 48 that is). Current computations take forever on one core.

A first step would perhaps be to follow tbricks list, and let the user define how to split the data as Mike mentioned. I'm an imbecill when it comes to the technical parts, but it sounds quite good to me.

neale's picture
Offline
Joined: 07/31/2009
There is some

1. We are working on a new C optimizer which would give us more control over farming out sets of parameter estimates.

2. I hope that you know that multicore works very well when you have raw ordinal or raw continuous data to analyze. Given a suitable mxOption for Number of Threads, the data are split into this many chunks. These chunks then have their likelihoods evaluated separately. I'm not exactly sure what happens in the multiple group (mxModel) case (and I think Tim Brick who programmed it isn't either) but it does speed things up then. This may be greater than 2x speedup for MZ and DZ groups.

mxOption(NULL, "Number of Threads", omxDetectCores() - 1)

This would set the number of cores to one less than whatever the system thinks is available, and it would do so for all models.

iloo's picture
Offline
Joined: 05/26/2010
Didn't know

Hmm, I didn't know that it worked like that. Suppose I have to read the forum more closely. But, as I understand it, multicore execution doesn't work in windows with the "chopping-of-raw-data"-approach? Because I've been working on a windows machine and it doesn't seem to work. On the unix system it works perfectly fine, no problems with the dependent subgroups either.
Thanks for the help!

neale's picture
Offline
Joined: 07/31/2009
Windows version?

You do have to have the multicore version of OpenMx, and I think there are issues with some of the Windows operating system versions such that OpenMx won't compile because the compiler is broken.

Glad you can get things working ok under Linux!

neale's picture
Offline
Joined: 07/31/2009
Maybe spoke too soon

Although there are speed-ups to be had with both ordinal and continuous FIML estimation in a single mxModel context, there is no improvement in a two mxModel and container mxModel example, as attached.

This is a serious limitation which we developers need to address.

AttachmentSize
testParallel.R 5.36 KB
neale's picture
Offline
Joined: 07/31/2009
Nice idea

I like the idea here, although the implementation is a bit more tricky. The optimizer software is "deciding" which sets of trial values of the parameters to use, and it does this sequentially, depending on results obtained previously. I don't think this obviates the possibility that multiple groups implemented in submodels could be evaluated independently, and I do see that multiple group models would speed up if this approach could be implemented. It would be especially handy in extended twin family study designs, in which there can be many data groups (well, at least five).

tbrick's picture
Offline
Joined: 07/31/2009
I think this is definitely a

I think this is definitely a possibility, but, like Mike said, it's trickier than it sounds.

When we start moving to parallelizing independent models, one way to start would be to find the best "natural" split points in the data and share those chunks of data out to different places. The most straightforward natural split point is where the models are completely different between almost identically-sized data sets. The next would be if, for example, there was a single definition variable and it was 1 for one half of the data and 0 for the other half. These kinds of splits mean that each processor that's working will have a minimal number of computations to make, since it'll only have to calculate model-implied covariances for one model or for one set of definition variables.

The tricky part is working out when it's faster to share the data out and deal with the associated network delays at every step of optimization, and when it's faster to just do it on a single processor. There's also a lot of error/delay handling messiness to deal with if we're really going to network. And some work that will need to be done on the back-end to make it friendly to multithreading or multicomputer processing. So this may be more an OpenMx 2.0 issue than a revision 1.2 issue.

If there are constraints between the models, they have to be optimized simultaneously, with the results of the models combined for each step of the optimizer. So the workflow would have to run like this:
1) Split up data
2) Share data to nodes; everybody listens for values
3) Send starting values
4) All nodes compute and return likelihoods
5) Main node picks next values
6) Send next values to nodes
7) Repeat 4-6 until converged

If there aren't constraints between them (that is, if they share no parameters), you can just mark them independent, and OpenMx can already snowfall them out to run them independently.

tbates's picture
Offline
Joined: 07/31/2009
followup and multicore

hi txb (we need middle initials - I'm tcb :-) )

Yes, this was just about speeding models with shared constraints by adding some syntactic sugar that would allow the user to inform OpenMx about how to parallelize.

The data are already kindly split up in most multi-group models (though more splitting could happen as you say: but we currently don't use even different datasets as a split, so that would be a start).

The loop through items 4 thru 7 of the sequence you outlined is exactly what I wondered if we could implement - not building up and knocking down a model for every iteration, but letting the OS spool them out on virtual cores, with a light-weight locking mechanism to ensure they run values the main node picks.

Another helpful start would be if we used multicore to automatically run CI jobs locally in parallel on the cores on a machine (much easier in general than going out to a server farm, and often quick enough)

Simon's multicore library seems very nice as a basis for getting this going.

I'm writing a helper to enable using multicore to spool parallel CI jobs: Will post here when done.

mspiegel's picture
Offline
Joined: 07/31/2009
Umm, you can run CI jobs in

Umm, you can run CI jobs in parallel. It's called omxParallelCI().

tbates's picture
Offline
Joined: 07/31/2009
omxParallelCI...

Hi Mike,
I had not used this, mostly because I could not follow the various snippets on getting snowfall working, esp on big server farm we have here...

I ran it now locally, and it ran sequentially (I think it should post a warning if it is called and there is no parallel support), and I can't find the documentation on what parallel libraries it supports/needs called first.
Also, it came back with this error

> omxParallelCI(model)
Running ACE_container 
Running interval1 
Running interval2 
Error in mxRename(submodel, modelname) : 
  The name 'ACE' is already used in the model

I think if omxParallelCI() is to remain omx, then it needs to be rolled into mxRun as being omx precludes documenting it. The documentation for mxRun then needs to include info on the supported parallel libraries and how to set them up: I'm happy to help improve this.

I see snowfall is suggested, I've now installed it...
Tried this:

> sfInit(parallel=T, cpus=2)
R Version:  R version 2.13.0 (2011-04-13) 
 
snowfall 1.84 initialized (using snow 0.3-5): parallel execution on 2 CPUs.

cool (assuming cores = CPUs)

but then:

> omxParallelCI(model)->whatever
Running ACE_container 
Error in checkForRemoteErrors(val) : 
  2 nodes produced errors; first error: no slot of name "metadata" for this object of class "MxFIMLObjective"

Any clues?

Question box: Do/will/can we support multicore? it seems very nice (albeit unix only). Also it is supported under foreach, which is a very nice way to ask for things to be done in parallel

listOfFits <- foreach(i=CIList) %dopar% genEpi_GetCIs(fit1a,i)

mspiegel's picture
Offline
Joined: 07/31/2009
Are you using load() to load

Are you using load() to load a model from disk? Or is this a model that is in your workspace saved in between updates to OpenMx? There is no longer a metadata slot in MxFIMLObjective functions. In general, load() will not work when a change is made to the internal representation of OpenMx objects. Save a copy of the script that generated the model, and then rerun the script to recreate the model.

We can't include omxParallelCI() into mxRun() until more people have tested omxParallelCI(). As you can see, it doesn't always work. And we can't get more people to test omxParallelCI() until they adopt either snowfall, or SwiftR, or any other parallel library we will support. It's fairly easy to add support for a new library, it just needs a few lines added to the implementations of "omxLapply", "omxApply", "omxSapply".

tbates's picture
Offline
Joined: 07/31/2009
not loaded.. new error

> Are you using load() to load a model from disk?
> Or is this a model that is in your workspace saved in between updates to OpenMx?
Hi Mike: not load(ed) but might have been in workspace...

rebuilt a model from scratch just now which churned for a minute before giving the first "running x" message, then gave this error:

> fit1a=mxRun(fit1a)
Running ACE
> omxParallelCI(fit1a)->whatever
Running ACE_container
Error in checkForRemoteErrors(val) :
108 nodes produced errors; first error: no slot of name "metadata" for this object of class "MxFIMLObjective"

> It's fairly easy to add support for a new library, it just needs a few
> lines added to the implementations of "omxLapply", "omxApply", "omxSapply".

OK: would be nice to support multicore then I think for *nix users inc. OS X. I think it will take off with Simon Urbanek's backing.
Best, t

"R version 2.13.0 (2011-04-13)"
> mxVersion()
[1] "999.0.0-1661"

neale's picture
Offline
Joined: 07/31/2009
Exactly. There is a rather

Exactly. There is a rather vital step 4b) in which the main node accumulates the log-likelihoods and returns the overall likelihood to the optimizer which then executes step 5.

I agree that this sounds like 2.0. In addition, it may be very efficient on a multiprocessor/multicore system (like the laptop on which I am lucky enough to be writing this). Also, I note that there is a degree of user control over parallelization. A large raw dataset could be split across multiple mxModels - the model is identical but the data are not - to increase performance.