Tue, 05/25/2010 - 09:19

I wonder if others would like the confidence intervals report to include the estimate in the middle, as a convenience

i.e.,

confidence intervals:

```
              lbound estimate    ubound
top.a[1,1] 0.6275213    0.713 0.7931507
```

rather than

```
              lbound    ubound
top.a[1,1] 0.6275213 0.7931507
```

I ran the "One Factor" demo code and then modified it to obtain confidence intervals for the A and S matrices. But what do you do if you want confidence intervals, say, on the square root of the diagonal of S? I tried:

```r
factorModel <- mxModel("One Factor", type="RAM",
    manifestVars = manifests,
    latentVars = latents,
    mxPath(from=latents, to=manifests),
    mxPath(from=manifests, arrows=2),
    mxPath(from=latents, arrows=2, free=FALSE, values=1.0),
    mxData(cov(demoOneFactor), type="cov", numObs=500),
    mxAlgebra(sqrt(diag2vec(S)), name="P"),
    mxCI(c('A', 'S', 'P'))
)

results <- mxRun(factorModel, intervals=TRUE)
summary(results)
```

But this is what I obtain for the confidence intervals:

```
confidence intervals:
                      lbound     ubound
One Factor.A[1,6] 0.36793418 0.42905221
One Factor.A[2,6] 0.46950344 0.54116463
One Factor.A[3,6] 0.53896150 0.61933438
One Factor.A[4,6] 0.65788322 0.75226558
One Factor.A[5,6] 0.74642245 0.85125822
One Factor.S[1,1] 0.03572462 0.04681232
One Factor.S[2,2] 0.03293715 0.04399791
One Factor.S[3,3] 0.03510875 0.04753553
One Factor.S[4,4] 0.03318743 0.04662735
One Factor.S[5,5] 0.02957416 0.04407938
One Factor.P[1,1] 0.00000000 0.00000000
One Factor.P[2,1] 0.00000000 0.00000000
One Factor.P[3,1] 0.00000000 0.00000000
One Factor.P[4,1] 0.00000000 0.00000000
One Factor.P[5,1] 0.00000000 0.00000000
One Factor.P[6,1] 1.00000000 1.00000000
```

In Mx, if you defined P to be the square root of the diagonal of the S matrix and asked for intervals, they were computed. What am I missing?

This looks like it's specified correctly and should work. I think you've found a bug; I'll take a look and post something here once we have a fix.

OpenMx binary release 0.4.1 should now be correctly calculating confidence intervals.

Thanks for reporting this issue!

Agreed, the CIs look very nice now. I wonder whether we should report the Std Error as NA whenever there are non-linear constraints? Or possibly try an alternative algorithm to obtain the Std Errors? They're quite obviously incorrect in the attached, relatively simple example. Output snippet:

```
Running onelocus

free parameters:
  name matrix row col  Estimate     Std.Error
1           P     1   1 0.2944972 2.304829e-314
2           Q     1   1 0.1540031 2.127637e-314
3           R     1   1 0.5514997 2.308156e-314
```

```
confidence intervals:
                   lbound  estimate    ubound
onelocus.P[1,1] 0.2520800 0.2944972 0.3395616
onelocus.Q[1,1] 0.1229535 0.1540031 0.1889258
onelocus.R[1,1] 0.5028019 0.5514997 0.5991765

observed statistics: 1
Constraint 'equalityConstraint' contributes 1 observed statistic.
estimated parameters: 3
degrees of freedom: -2
-2 log likelihood: 627.1042
saturated -2 log likelihood: NA
```

The CIs look fine, but in fact they are wrong too, because the objective function was -lnL instead of -2lnL. This was not possible for the software to "spot", but it is a buyer-beware issue: one should avoid using negative log-likelihoods and use twice the negative log-likelihood wherever possible.
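The size of that error is easy to quantify in plain R (no OpenMx required): the likelihood-based CI criterion assumes the objective is on the -2lnL scale, so feeding it -lnL makes the implied interval far wider than 95%.

```r
# 95% profile-likelihood CIs are bounded where -2lnL rises by this much:
crit <- qchisq(0.95, df = 1)    # about 3.84

# If the objective is -lnL, the same offset corresponds to a -2lnL
# change of 2 * crit, i.e. an interval with much more than 95% coverage:
implied <- pchisq(2 * crit, df = 1)
round(implied, 4)               # about 0.9944, not 0.95
```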

Note also that the df are incorrect (this is an mxAlgebraObjective). I don't recall whether there's a way to manually change the number of observed statistics in a script - does anyone know? OK, enough stuff for one post... to reiterate:

1. Suggest NA for std errors for models containing non-linear constraints - makes sense?

2. Is there a way to change the number of statistics manually?

`mxAlgebraObjective()` has the optional arguments `numObs = NA` and `numStats = NA`. See `?mxAlgebraObjective` for more details on these arguments.

Ah, thanks - a very polite RTFM response :). An unfortunate side-effect of not programming it myself is that I don't remember everything it can do.
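For the record, a minimal sketch of those arguments in use; the algebra name "fit" and the counts are made up for illustration, so treat this as a hedged fragment rather than a working script:

```r
# Hypothetical usage: tell the objective how many observations and
# observed statistics the data really contribute, so that summary()
# reports sensible df. "fit" is an assumed 1x1 mxAlgebra defined
# elsewhere in the model.
objective <- mxAlgebraObjective("fit", numObs = 500, numStats = 3)
```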

The remaining issue concerns the Std Error column in the event that there are mxConstraints in the model. Either we fix it (difficult at best) or we report NA throughout, methinks. And NA seems much better than nonsense values until we fix it.

There's a candidate fix for this bug in the current repository version of OpenMx. We're still testing it, but it should be available in the next binary release.

If you're interested, you can download the latest version from the subversion repository and give it a try.

Thanks. Yes, I can see the results for the P vector. Sorry, I'm still learning how to use OpenMx.

```
> results$P
mxAlgebra 'P'
@formula:  sqrt(diag2vec(abs(S)))
@result:
          [,1]
[1,] 0.2020252
[2,] 0.1949872
[3,] 0.2020574
[4,] 0.1984617
[5,] 0.1904917
[6,] 1.0000000
dimnames: NULL
```

Most of the SEMs I use are basically just measurement error models for ophthalmic data from multiple laser devices or for air pollution monitoring devices (typically involving data for both eyes in my ophthalmic work). Regardless, the devices have different scales (unit sizes), so it's typically necessary to adjust the standard deviations of the random error by the scale parameters before computing ratios comparing devices. So in Mx I'm always computing intervals for various functions of the model parameters. I'm looking forward to doing the same in OpenMx. Thanks again.

I would say that you are not missing anything, but that the mxCI software appears to be missing a vital feature (or has a bug).

Can we clarify the original intent of the script? Is it to (1) compute the lower and upper confidence intervals for the free parameters in the S matrix and then evaluate the P matrix based on those values, or (2) compute the lower and upper confidence intervals of the P matrix? Are these two options the same behavior? I believe they are not the same behavior, but this is outside my area of training. As it is written, the script currently yields the result of option (2).

If the two behaviors are different and if the desired behavior is option (1), then I think we need to add an argument to the mxEval() function such that either the lbound or ubound values are inserted into the model. And when I think upon that suggestion, I realize it would be non-trivial when the confidence intervals are calculated for algebras.

~~Because what we need are the free parameter values for each lbound or ubound calculation, which are currently thrown away~~. Err, that doesn't quite work: to evaluate the P matrix at the lower bounds, the free parameter values at the lower bound of S[1,1] are different from the free parameter values at the lower bound of S[2,2], etc.

I want to do the same thing I've been doing in Mx. In Mx, I do some algebra to compute the P vector as the square root of the diagonal of the S matrix. When I ask for confidence intervals on the free elements of the P vector, Mx outputs them. In general, I want to be able to compute confidence intervals on any function of the model parameters. It can be tedious to write the code for Mx to do it, but it does it.

I guess I'm at a loss as to what mxAlgebra is supposed to do. I don't see any result of using mxAlgebra for computing P. The output contains no results for P. In Mx, when you used algebra to compute a matrix, it printed the results.

The results for P should be contained in the output of the model--it just may not be included in the summary.

If you save the results of the mxRun command to a variable, you can examine the optimum state of the model using that result.

So if your run command looks like

`modelFit <- mxRun(...)`

and you look at

`modelFit$P`

, you should see the results for P at the fitted values.

I would also like to see this function implemented in OpenMx soon. Computing CIs on arbitrary functions is one of the strengths of Mx. It can be used to construct CIs on statistics like reliability estimates, R2, and indirect effects; all of them are functions of other parameter estimates. If raw data are available, one workaround is to construct bootstrap CIs on the computed functions.
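That bootstrap workaround can be sketched in plain R. Here the "model" is just a variance estimate and the derived quantity is its square root, standing in for sqrt(diag(S)); everything is illustrative and no OpenMx is involved:

```r
# Percentile-bootstrap CI for a function of an estimate: resample the
# raw data, recompute the derived statistic, take empirical quantiles.
set.seed(1)
x <- rnorm(500, mean = 0, sd = 2)        # toy raw data

boot_sd <- replicate(2000, {
  xb <- sample(x, replace = TRUE)        # resample rows of the raw data
  sqrt(var(xb))                          # derived statistic, like sqrt(S[i,i])
})

ci <- quantile(boot_sd, c(0.025, 0.975)) # 95% percentile bootstrap CI
```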

I'm going to implement this, unless there's some terrible reason why I shouldn't.

I think that is useful, especially as a check that nothing weird is happening.

A common issue with CIs is that the optimizer "discovers" that it can use -upperCI as the lowerCI. For example, in a factor model one gets the same fit if all the elements in a column of a factor loading matrix are multiplied by -1. This holds for the CI case as well. The workaround is to bound the first free factor loading in a column to be non-negative. It is not going to be easy for the software to spot this limitation, and to distinguish it from cases where patterning of the elements in a column, or bounding of other elements in a column, makes this workaround inappropriate.

However, if we do see that, within some tolerance eta, the lower CI equals -upperCI, then we could warn the user that this condition has likely been encountered and that something should be done to fix it. I recently rescued a couple of papers from going to press with this type of error (they were using classic Mx, but the same issue applies).
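The warning could be as simple as the following check; `mirrored_ci` is a hypothetical helper, not part of OpenMx:

```r
# Flag an interval whose lower bound is (within tolerance eta) the
# negative of its upper bound: the signature of the optimizer having
# found the sign-flipped solution.
mirrored_ci <- function(lbound, ubound, eta = 1e-3) {
  is.finite(lbound) && is.finite(ubound) && abs(lbound + ubound) < eta
}

mirrored_ci(-0.7930, 0.7931507)     # TRUE: suspiciously mirror-image
mirrored_ci( 0.6275213, 0.7931507)  # FALSE: ordinary interval
```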

Bounding the path at zero doesn't show how far the CI would extend past zero when the path is non-significant. It also seems that you can't just bound things to be non-negative: in many models you have to see which way paths "wanted" to go, and then adjust the lower or upper bound as indicated.

It would be nice to be able to do this for the user. Few scripts will have boundaries too, so using CIs will mean adding several lines of boundary code to most scripts.

I wonder if an automated way is possible that will catch (most) problems... perhaps:

1. Find the bound in the direction of the fitted value

2. Check the fit with path set to zero.

a. If the fit is worse than the criterion, you know that zero is a valid bound: set the (upper or lower, as appropriate) bound to zero and proceed as now.

b. If it is not worse, then either tell the user "warning: CI requested on a non-significant path" or, more helpfully, move the bound to 10% beyond the fitted value minus the already-found boundary (i.e., symmetrical about the fitted value, with some leeway) and search in that range.
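Step 2 can be sketched with OpenMx 1.x-era calls. The parameter label "a1", the `model` object, and the slot access are all assumptions, so treat this as untested pseudocode in spirit:

```r
# Sketch: refit with one free path fixed at zero and compare -2LL values
# to decide whether zero is a valid CI-side bound for that path.
library(OpenMx)

fit1   <- mxRun(model)                               # 'model' assumed to exist
model0 <- omxSetParameters(model, labels = "a1",
                           free = FALSE, values = 0) # fix path 'a1' at zero
fit0   <- mxRun(model0)

delta <- fit0@output$minimum - fit1@output$minimum   # change in -2LL
if (delta > qchisq(0.95, df = 1)) {
  # zero lies outside the 95% region: safe to bound the CI search at zero
} else {
  # warn: CI requested on a path that is not significantly non-zero
}
```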

This is where identification constraints really begin to matter. Were a factor model identified through a fixed factor loading, then this isn't an issue for the (free) loadings, as the fixed loading removes the indeterminacy of the sign of the implied manifest covariances. The trade-off is that there is no CI for the fixed loading. We seem to be talking about factor identification through variance restriction.

An assumption with building confidence intervals is that the likelihood space is monotonically decreasing (and the -2LL monotonically increasing) away from the fitted values, and that the parameters aren't on the boundary of the parameter space. The situation you detail, Mike, is one where the likelihood space is W-shaped with the middle of the W at zero, and that middle point has not reached the criterion value. Depending on model specification, you could argue that the factor loadings as a set have an inherent boundary, and thus confidence intervals and LR tests aren't appropriate as the parameter set approaches that boundary. This gets more confusing when, given the other parameters are fixed, there is no boundary condition. A solution is to constrain the sum of your factor loadings to be greater than zero, though even this has problems when a factor is indicated by a roughly equal mix of positive and negative factor loadings.

Getting back to the topic, does a lower CI of -upper CI (or visa versa) mean that we had to cross zero to look for the boundaries of the confidence interval? I suspect it does, but I could be missing something.

> Getting back to the topic, does a lower CI of -upper CI (or visa versa) mean that

> we had to cross zero to look for the boundaries of the confidence interval?

> I suspect it does, but I could be missing something.

Yes: that's the problem. The CI algorithm is not smart enough to know that if it crossed zero, and the fit was bad somewhere on the way to the mirror-image value, then it needs to report the first crossing of the 95% criterion, not the second.

I'd like to make sure that I understand how mxCI works. From the thread in the wishlist, it sounds like mxCI (1) starts at the optimized value, and (2) continues in either direction until critical values are found. I don't understand how mxCI could "skip" over a critical value en route to a second, unless the iteration steps (or better term) are just too big. Regarding my earlier comments, a lowerCI of -upperCI is not necessarily an indication of either a poorly constructed confidence interval or a valid CI; rather, it could indicate that the parameter is of indeterminate sign, and thus zero represents a boundary condition.

Mike, your comments are part of what made me doubt my understanding of mxCI. The mxNormalInterval proposal thread makes it sound like confidence intervals are built while holding all other parameters constant. Factor loadings are interesting parameters; as a set, they are of indeterminate sign, but holding all others constant, sign is very important.

Tim, a simpler version of your proposal is to report a fit statistic of some kind with the parameter constrained at zero, allowing the user to interpret their confidence interval as they wish. Knowing the first and second derivatives of the likelihood space at zero would probably help as well, but that requires a larger computational demand than a single likelihood calculation with specified values.

A report on significant parameters would be quite handy, and much quicker than computing the CIs...

This could be implemented by adding a parameter to mxRun() to compute only path significance (rather than the expensive CI...)

`mxRun(model, CI=T, intervals=T, sigOnly=T)`

sigOnly=T would just whiz through the list of matrices to be examined, fixing each free parameter at zero and reporting the p value of the change in fit.

That would give users a quick list of paths that may (at least singly) be dropped.

The fit significance when Fixed@0 could supplement the CI table helping the user interpret the CIs.

I think there's some misunderstanding here about how mxCI works.

The CI optimization is not univariate. That is, all the other free parameters in the model are left free during CI calculation.

The likelihood-based confidence interval is calculated by simultaneously minimizing the difference between the objective function value and the target value (for 95%, the target is at 3.84 + the -2LL at optimum) and either minimizing or maximizing the value whose interval is to be calculated. All the model parameters are left free to vary during this computation so that we can catch situations where the slowest ascent of likelihood space is not directly along a parameter. We start from the optimum and walk outward, at present, but we're likely going to be making intelligent guesses as starting points in the near future.
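As a one-parameter toy (plain R, no OpenMx), the criterion looks like this; with more parameters the search is over the full parameter vector, as described above:

```r
# Profile-likelihood 95% CI for a binomial proportion: find where -2lnL
# rises by qchisq(0.95, 1) above its minimum, walking out from the MLE.
n <- 100; k <- 30
m2ll   <- function(p) -2 * (k * log(p) + (n - k) * log(1 - p))
p_hat  <- k / n                                 # MLE
target <- m2ll(p_hat) + qchisq(0.95, df = 1)    # -2LL at optimum + 3.84

lower <- uniroot(function(p) m2ll(p) - target, c(1e-6, p_hat))$root
upper <- uniroot(function(p) m2ll(p) - target, c(p_hat, 1 - 1e-6))$root
c(lower, p_hat, upper)                          # roughly 0.216, 0.300, 0.394
```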

I don't know that simply fixing one of the parameters to zero and calculating the likelihood there will get you a good test of significance. Think about it this way: if you were going to do this by model comparison, you'd need to fix it to zero and optimize the remaining parameters to see if the model could fit just as well without your extra parameter. The same would apply here.

We could walk along the parameter set and fix each one at zero and do a new optimization, but that's likely to take just as long as the CI calculation without actually giving you a confidence interval.

Here's an alternative approach that might work--if so, it'd make a good helper function:

At a glance, at least, this should eliminate the sign-inverted likelihood well from the CI calculations, and should appropriately restrict the CIs to be around one specific fit point, and not around the other. You could think of this as 'see[ing] which way paths "wanted" to go' and then constraining them to that way.
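One way such a helper might look, assuming an OpenMx 1.x RAM model and (as a simplification) bounding every free loading in A on the side of zero it took at the optimum; the function name and slot access are assumptions, not a tested API:

```r
# Hypothetical helper: after an initial fit, bound each free path in A
# away from the sign-flipped solution before requesting CIs.
library(OpenMx)

fixSigns <- function(model, fit) {
  A   <- fit@matrices$A               # fitted asymmetric-path matrix
  pos <- A@free & A@values > 0
  neg <- A@free & A@values < 0
  model@matrices$A@lbound[pos] <- 0   # positive paths stay non-negative
  model@matrices$A@ubound[neg] <- 0   # negative paths stay non-positive
  model
}

# usage sketch:
#   bounded <- fixSigns(model, mxRun(model))
#   ciFit   <- mxRun(mxModel(bounded, mxCI("A")), intervals = TRUE)
```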

This algorithm would have to be modified if you had more than one factor being defined, since you'd need to fix one sign per factor. But it should still be possible, provided at least one path per factor is significant. (Note: I haven't tried, tested, or simulated this procedure, so I don't know if there are other cases where it won't work.)

Would that address your concern?

I may be stating the obvious here: any constraints and lower/upper free parameter bounds that are placed on the original model are also placed on the CI calculations.