Sorry if this question is answered elsewhere, but does anyone know how the time to compute a full SEM in OpenMx scales with the number of subjects used?

Time should scale linearly with the number of data rows in a full information computation in OpenMx. So twice as many rows = twice as much time.

Now, some caveats:

There's some variance in how long a row can take, so using twice as many rows may take a little more or less than twice as much time. I'd actually expect that more often it will take less than twice as long, but it's tough to say for certain.

I'm assuming here that you're not adding any free parameters and not changing the likelihood space too much by adding these new rows. I would imagine that adding more rows from the same population will not significantly alter the likelihood space, but I don't know that for sure.

I'm also assuming you're not hitting the limits of your machine's main memory or the like; for very large data sets you can run into other bottlenecks, and R has some limits of its own. But those shouldn't kick in until the data get very, very large.

And, of course, if you're using a covariance method, the compute time's the same regardless of the number of participants.
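To make the linear-scaling claim concrete, here's a toy sketch in base R. This is not OpenMx's actual implementation — `row_loglik` and `fiml_loglik` are hypothetical names for illustration — but it shows the structural reason for the scaling: a full-information (raw-data) likelihood is a sum of one term per data row, so N rows means N per-row evaluations.

```r
# Toy sketch, NOT OpenMx internals: a full-information fit sums one
# likelihood term per data row, so total cost grows linearly with N.
row_loglik <- function(x, mu, sigma_inv, log_det) {
  d <- x - mu
  -0.5 * (length(x) * log(2 * pi) + log_det + drop(t(d) %*% sigma_inv %*% d))
}

# Hypothetical multivariate-normal FIML objective for complete data:
fiml_loglik <- function(X, mu, sigma) {
  sigma_inv <- solve(sigma)
  log_det <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  # N rows => N evaluations of row_loglik: twice the rows, twice the work
  sum(apply(X, 1, row_loglik, mu = mu, sigma_inv = sigma_inv, log_det = log_det))
}
```

Doubling the data (stacking the same rows twice) exactly doubles both the work and the resulting log-likelihood, which is the sense in which the cost — and, with unchanged parameters, the objective — scales linearly in rows.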

tim,
question: does OpenMx have (or will it have) a mechanism for sorting individual data vectors by pattern and then solving using the summary statistics for each pattern?

e.g., say there are 1000 nuclear families, but they are of two types: (1) mother, father, one offspring, and (2) mother, father, two offspring. the most efficient way to fit models is to compute t(X) %*% X for each type and then fit a model to the uncorrected SSCP matrix for the two groups. this is easily done with OpenMx.
in real life, those 1000 pedigrees are likely to comprise several dozen types along with a number of unique families. here, it can be very tedious and error-prone to get the correct means and covariance matrices for each type. having OpenMx sort the pedigrees and compute the summary statistics up front would be a great benefit to the user.
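The bookkeeping greg describes can be sketched in a few lines of base R (no OpenMx involved; `sscp_by_type` is a hypothetical helper name). It groups families by configuration and computes the uncorrected SSCP `t(X) %*% X` once per type, along with the group size needed to weight the fit. The sketch assumes each type's rows share a common set of columns.

```r
# Sketch of the summary-statistic idea: split families by type, then
# compute the uncorrected SSCP t(X) %*% X and the count for each group.
sscp_by_type <- function(X, type) {
  lapply(split(as.data.frame(X), type), function(g) {
    G <- as.matrix(g)
    list(n = nrow(G), sscp = crossprod(G))  # crossprod(G) is t(G) %*% G
  })
}
```

Each group's SSCP (or the means and covariances derived from it) could then feed a covariance-style model for that type, with `n` as the group's sample size; the feature request amounts to having OpenMx automate exactly this sorting and summarizing.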
best,
greg
