trying to speed up a gnarly script with lots of non-linear constraints - any ideas?

9 replies [Last post]
mkeller's picture
Offline
Joined: 08/04/2009

Hi all,

This probably won't be easy to answer given how vague the question is, but...

We're trying to run a multivariate 'nuclear twin family' model with lots of parameters to be estimated (~30 free parameters in the simplified version we're trying to run now, where lots of parameters we otherwise would estimate are not free) and lots of non-linear constraints (~10 in the current model). The problem is that the script has been running for 1.5 hours so far, and still going. I will post the script as an attachment in case anyone wants to glance it over for anything obvious, but barring that, here's my question:

Anyone have any ideas on where we can start in trying to make this run faster? What types of issues might make it run this slowly? I've thought of entering in covariance matrices rather than raw data. And I've also thought of changing it so it just runs a univariate model to see if it can run that. Also, thought of trying to use some R parallelization packages that optimize low-level functions, such as "pnmath0". I don't think that the package "snow" will speed anything up here given that it's a single model we're trying to run. Might any of these ideas affect the speed? What other tricks and/or coding alterations (see attached) might people suggest?

Thanks,

Matt

AttachmentSize
Multi.IQ_.Ht_.NTF_.Model-mck4.R17.16 KB
Steve's picture
Offline
Joined: 07/30/2009
Hi Matt, In general, FIML is

Hi Matt,

In general, FIML is extremely slow right now. This is about to change. But for the next month it is going to be slow. We have been focussing on making sure that all the 1.0 features are complete and tested. We are about to shift focus to code optimization. For FIML with no missing data, we are expecting an order of magnitude speed up.

The use of definition variables slows the code down. Again, we will work to speed things up in this case as well, especially when there are only a few integer values for a definition variable. But a continuous valued definition variable is sort of a worst case scenario for the optimizer.

Any time you can run a covariance version of your code, the improved optimization time will be proportional to 1/N*t where N is the number of lines in your data set and t is the time for the FIML optimization. That's a BIG win.

If you have missing data, run a covariance model with pairwise calculation of the covariances in order to find starting values for FIML. That way your FIML calculation needs fewer iterations to converge. With lots of free parameters this can be a big win too.

mkeller's picture
Offline
Joined: 08/04/2009
Great, Steve, thanks a lot.

Great, Steve, thanks a lot. That is very helpful information. We don't have any definition variables. We also have missing data (lots of it), but we can probably first estimate parameters using just the cov matrix, then use the estimates from that model as the start values in the FIML model, as you suggested. I guess those first estimates might also suggest parameters that can be dropped/fixed (I doubt a parameter estimated ~0 in the model run off of the cov matrices will substantially change in the FIML model). Thanks again. You guys have done very impressive work over the last couple of years! I look forward to the optimized FIML code...

mspiegel's picture
Offline
Joined: 07/31/2009
Hi Matt. We took a look at

Hi Matt. We took a look at the script during today's developers meeting. It appears that many of the equality constraints are being used to enforce symmetry on some of the algebras. We came up with two techniques that accomplish the same task, but don't use free parameters to enforce symmetry constraints.

The first technique is, whenever possible, to rewrite the algebras so that they produced symmetric matrices. We don't think you'll be able to apply this technique to all of your instances, but it might work in some of the instances.

The second technique is to use equality constraints to force the symmetry of an algebra without the free parameters. For example, instead of using

mxConstraint(deltam1 == deltam2, name='deltamCon'),

use

mxConstraint(deltam2 == t(deltam2),name='deltamCon').

(In the new syntax for constraints it is possible to use algebras on the left-hand and right-hand sides.)

If you need to use a specific cell with the deltam2 matrix, instead of using a free parameter, instead use the matrix subscript operator deltam2[i,j] to access the cell.

mkeller's picture
Offline
Joined: 08/04/2009
Hi Michael, Thank you for the

Hi Michael,

Thank you for the help. The script as is is taking over 4 hours, so any speed-ups are hugely helpful. I'm not clear on your advice:
1) the first technique says to make algebras produce symmetric matrices, but what do you do with them after that?

2) The second technique seems to only work for symmetric matrices (deltam1 == t(deltam1) is true only when deltam1 is symmetric). So I'm not sure if technique 2 relies on technique 1??

3) You say we can use algebras in the constraints. Do you mean we can do this:
mxConstraint(am %*% q1 + t(wm1) + bm %*% t(r1) == deltam1,name="deltamCon")
?

If so, then that latter syntax is much easier! Best,
Matt

tbrick's picture
Offline
Joined: 07/31/2009
Hi, Matt. I think there may

Hi, Matt.

I think there may be some confusion about what your constraints in this model are doing in the first place.

It looks like you're constraining the results of algebras to be equal to symmetric matrices of free parameters. These constraints seem to do two things:
A) effectively fix the free parameters to be equivalent to the results of the algebra
B) constrain the result of the algebra to be symmetric

Is that right?

If it is, then you should be able to reduce the number of free parameters in the model by:

A) using the results of the algebra directly and eliminating the constraint. For example, referencing the algebra cell deltam2[1,1] instead of the free parameter deltam11. This lets you remove those free symmetric matrices from the model completely.

B) using one of mspiegel's suggestions to constrain the results of the algebra to be symmetric.

What will actually wind up helping speed here is eliminating the symmetric free matrices you set up for those constraints. Removing excess free parameters will often speed things up considerably.

But again, I think I may be confused on what those constraints are there to do. Is there something else that they're enforcing?

mkeller's picture
Offline
Joined: 08/04/2009
Hi Tim, When we tried to just

Hi Tim,

When we tried to just make all these algebra statements, we ran into the problem I suspected:

Running multiHetNTF
Error: A cycle has been detected in model 'multiHetNTF' involving the following elements: 'NTF.deltam1' and 'NTF.q1'

The issue is that deltam1 called up q1 before it was defined. This was the reason for all the constraints & defining matrices beforehand.

mspiegel's picture
Offline
Joined: 07/31/2009
OK, let's begin by replacing

OK, let's begin by replacing all references to 'deltam1' with references to 'deltam2'. Now you can eliminate the declaration for 'deltam1' and add a mxConstraint() that enforces symmetry on 'deltam2'. See if that works and then we can continue from there.

On a related topic, it is possible to reference algebras or matrices before they are declared. OpenMx performs demand-driven evaluation, so it is not necessary for an entity to be declared before it is used. However, the user is prohibited from declaring a cycle in a series of algebra expressions. That is the error that you are seeing.

mkeller's picture
Offline
Joined: 08/04/2009
Hi Tim, The purpose isn't

Hi Tim,

The purpose isn't really to ensure that deltam1 is symmetric; that's true in this case but won't always be true in the future.

What we need to do is use these matrices (e.g., deltam1) in algebra statements, given that these matrices (e.g., deltam1) themselves are algebraic expressions of other matrices. Thus, deltam1 is a function of wm, but wm is a function of deltam1 (and so forth for all of these matrices). In other words, these matrices are a set of nonlinear constraints.

#Are you saying that we can do the following (e.g.)?:
mxAlgebra(am %*% q1 + t(wm1) + bm %*% t(r1),name="deltam1"),

mxAlgebra(.5 %x% t(deltam1) %*% t(k) + .5 %x% t(deltaf1) %*% t(o) + .5 %x% t(deltam1) %*% mu %*% Vf1 %*% t(o) + .5 %x% t(deltaf1) %*% t(mu) %*% Vm1 %*% t(k),name="wm1"),

#and then begin using deltam1 and wm1 in future algebras, e.g.:
mxAlgebra(Vm1 %*% t(k) + theta1 %*% t(o) + .5 %x% deltam1 %*% t(am) + .5 %x% Vm1 %*% mu %*% deltaf1 %*% t(am)+ .5 %x% pim1 %*% t(bm) + .5 %x% Vm1 %*% mu %*% pif1 %*% t(bm),name="CVfason"),

Will that work? It seems somewhat surprising because when openmx gets to the mxAlgebra statement for deltam1, it sees 'wm1', which hasn't been defined yet.

Best,

matt

tbrick's picture
Offline
Joined: 07/31/2009
Hi, Matt. Aha. I see now

Hi, Matt.

Aha. I see now what I was missing. You're using the nonlinear constraints to enforce constraints where the algebras would be circular.

You should still be able to reduce the number of free parameters and constraints in this model, I think. You'll want to reduce this so that you have a single constraint for each algebra that depends on its own output. So r1 and q1, for example, are constrained to be the results of algebras performed on themselves. Unless you can rewrite these in different terms, you're stuck with constraints on this one.

For some of the others, you might be able to reduce the number of free parameter/constrain pairs you're using. For example, if you remove deltam1, and just replace it with deltam2 in all the algebras where it appears, you should still have a working script.

You can do the same for deltaf1. I expect you can probably do the same for pim1 and pif1 as well, and so on.

Similar to what mspiegel suggested below, I'd try removing these one at a time to see if you can get this to reduce to a smaller solution.

The fewer free parameters and the fewer constraints that you include, the faster your job will run.