Wed, 02/17/2010 - 10:30

When seeking to optimize mxAlgebra, would there be a speed benefit (or cost?) to saying instead of, for example:

mxAlgebra(rbind (cbind(A+C+E, .5%x%A+C),

cbind(.5%x%A+C , A+C+E) ), name="DZCov")

instead saying:

mxAlgebra( A+C+E , name="ACE")

mxAlgebra( .5%x%A+C+E, name="halfACE")

mxAlgebra(rbind (cbind(ACE, hACE),

cbind(hACE, ACE) ), name="DZCov")

Especially when the computed result ACE etc might be used not just in a couple of places in an algebra, but in 6 or more algebras as well?

The answer is: it depends.

It looks like you're using common subexpression elimination, which is a reasonable optimization trick. In order to know the cost of the algebra operations, you would need to profile an execution and see how much time is spent evaluating algebras: http://cran.r-project.org/doc/manuals/R-exts.html#Profiling-compiled-code. If we are only spending 5% of our time evaluating algebras, there won't be much of a speedup.

OTOH the algebras have been implemented for functionality, not speed. Matrix addition does not take row or column major order into consideration, and is possibly thrashing across cache boundaries. Correctness first, optimization second.

An algebra is computed only once per minor iteration of the optimizer. I believe that is true, unless this was turned off because it was buggy. So it shouldn't matter how many times the computed result ACE is used.

That's correct--each back-end omxAlgebra object is computed once per optimizer iteration if it's required for the objective computation, regardless of how many times the result is used in the objective computation. The exception is with Full Information methods, which potentially calculate each algebra once per data row per optimizer iteration.

The question here is about duplicating the same calculation in several algebras, though. Without searching for and removing duplicates, there is potential for slowdown. Each operator in a long mxAlgebra specification becomes an omxAlgebra in the back-end. Consider the code

and the code

These are functionally equivalent, but the former will generate 4 omxAlgebras in the back end (three '+' algebras and one 'cbind'), where the latter will generate two. Each of those algebras will need to be evaluated at each optimizer step, which will mean the former will run slower than the latter.

Searching for this sort of duplication is one of several optimizations we have not yet made in OpenMx. As mspiegel said, we've implemented for functionality first, and will be adding optimizations for speed as time goes on.

That said, you can give potential speedups a try and see how well they work--just run both versions and look at the differences in output timing reported in the @output slot of the results. Keep in mind that minor differences in things like starting values and data can result in significant changes in speed, so for these comparisons you might want to run several tests and check the distributions of results.

Thanks guys - I went ahead and did some comparisons and the creation of algebras saves an appreciable (5%-10%) of elapsed time even when it is as simple as addition.

It also makes the more complex algebras much more readable and maintainable!

This also validates the inclusion of clock in the summary: Very handy!