Sun, 02/03/2013 - 15:08

I've been trying to determine if OpenMx can be used to help with my optimization problem.

I'm very impressed with what I've read and am excited about the prospect of using it for my project. However, I'm not yet convinced it will help me do what I need and whether it is the right package for this work.

I'm working on confirming a physiological model for heart rate kinetics during cycling. The model takes heart rate, cadence and power as input data. The model calculates a derived power using heart rate, cadence and a number of physiological values that may be measured (those in pink boxes) and those that are adjusted (in yellow boxes) so as to minimize standard deviation of residuals between derived power and measured power. Here is a link to a screenshot of my spreadsheet:

The derivation uses things like slope calculations in the input data, various time constants and there is also recursion. Roughly speaking, the derivation moves from left to right to eventually derive a power value and the residuals are calculated with the measured power value, using a moving average. By making successive parameter changes from a set of starting values, the model can be converged to local minimums. By changing starting points, I can then look for the global minimum. I never know whether I'm at the global minimum however often the correlation is sufficiently high that I can conclude that the parameters are *very near* their optimal values - hence a close representation of the actual physiological values. (Interestingly, once there has been sufficient data collection to have tested the model, i.e. compare the derived physiological parameters against real parameters from a broader variety of individuals and situations, the model itself can then be evaluated and potentially adjusted if needed. I can see this following a more traditional statistical approach.)

So my first and most important question is whether OpenMx can be used to solve for the optimal parameter values. I'm not sure if algebras can be setup to obtain data longitudinally for calculations (like slopes) for example and whether objectives can be setup to perform global minimum searches. If so, then my question would be whether it is well-suited for this (hoping that it is but suspect that it isn't). Finally, if it is well-suited, if there are any posts and/or sources of information that I can look at that will help me.

Btw, if anyone is interested in understanding more and would like to discuss further, feel free to email me off list.

Yes, it can!

Your two options are the algebra objective and R objective, with the algebra objective being a far better option. Algebra objectives just minimize the value of some user-supplied algebra. Here's the simplest possible version of such a model:

test <- mxModel("algDemo",

mxMatrix("Full", 1, 1, TRUE, 10, "a", name="param"),

mxAlgebra(a^2, name="criterion"),

mxAlgebraObjective("criterion")

)

We made a free parameter ("a", in matrix "param") and picked a value of a where a^2 was at its minimum (obviously zero). Check ?mxAlgebra to see if the current supported algebra functions can generate your model. mxRowObjective is the row-wise extension to mxAlgebraObjective: you specify an algebra for each data row, then specify a second algebra to reduce the vector of row objectives to a single model objective for minimization.

Alternatively, you could use mxRObjective to minimize a function you write in R. However, this is much much much slower than the algebra objective, because OpenMx has to make repeated calls back and forth between R and C, and this takes time.

Let us know if either of these options are feasible for your work.

ryne

Thought I'd post my script and results. Doesn't look like anything out there. With a little trial-and-error, managed to get it to work. Here is the script:

I'm only using an hours worth of data. I would like to be able to analyze an entire file with 2 or more hours worth, but the frontend time increases substantially. As well, this is less than half of the analysis so once I start piling on the remaining algebras, the time to compute might make this approach unfeasible.

Are there any other approaches I could use that would provide better analysis time? Is my script efficient designed?

Thanks for your feedback.

So at this stage, I think I'm at the end of the rope.

In order to complete the optimization process, I'm going to need to calculate a running total or accumulation value down several matrices. There is no way to do that with mxAlgebra as far as I can tell. Functions like "cumsum" are not available. (I guess I could pull out some of my old coding skills and update the source but that'd be pushing it for me.)

The software is very easy to use I was able to get part way through the optimization without much fuss. But I don't think it is meant to address the type of analysis I'm looking to do.

If I wanted to sum elements in a column of a matrix, I would premultiply it by a row vector of 1's. Or if I wanted just some of the elements summed, put a zero in elements I didn't want. I've not read your needs carefully so I don't know if this helps, but hope it does.

Ryne,

As someone new to OpenMx, R, and this type of analysis, I very much appreciate you providing the guidance. Now it's just a matter of working my way up the (steep) learning curve to understand how best to set up the script to work most effectively and efficiently.

I guess what I'll need to work through is to find a way define a series of mxAlgebras to funnel into mxAlgebraObjectives that essentially minimize a st. dev. calculation or if mxRowObjective can be used, it will minimize each individual row residual from a set of algebras and parameters. (Perhaps best to use nested models?) Alternatively, I've been working through how a single mxAlgebra could be used. This creates a lot of complexity in definition but might be the only way.

One challenge in the optimization that I think will be difficult is to work through is the phase offset and slope calculation. That is, the input data vectors are not in phase (mostly from poor synchronization of the measurement devices) and the number of rows to use for the slope depends on the accuracy of the measurement device as well as a calculation from the previous row. Three parameters are used to determine the optimal offset and the number of rows to use in the slope calculation.

Other challenges might also be algebras that calculate running totals which are used in other algebras. These too are parameterized and will also need to be optimized.

Again, thanks and any pointers to material I can read through or samples along these lines would be very much appreciated.