## trying to get correct standard errors for standardized regression coefficients

7 replies [Last post]
Offline
Joined: 11/05/2010

Hello all,

I am trying to get the correct standard errors for standardized regression coefficients. According to Bentler and Lee (1983) this can be done using nonlinear constraints. I have tried two different model specifications with nonlinear constraints, both of which yield the correct parameter estimates, but neither give correct standard errors (they are returning zeros). Any help would be appreciated. My R code and output are below.

Thanks,

Jeff

Bentler, P. M. & Lee, S. (1983). Covariance structures under polynomial constraints: Applications to correlation and alpha-type structural models. Journal of Educational Statistics, 8, 207-222.

#################
#################

R <- matrix(c( 1,.3,.4,
.3, 1,.5,
.4,.5, 1),3)

manifests <- c("y","x1","x2")
colnames(R) <- rownames(R) <- manifests

###########
# Model 1 #
###########

m1 <- mxModel("Regression of y on x1 and x2 (standardized)",
type="RAM",
manifestVars=manifests,
mxPath(from=c("x1","x2"), to="y",
arrows=1,
free=TRUE, values=.3, labels=c("b1", "b2")),

mxPath(from=c("x1","x2"),
arrows=2,
free=TRUE, values=1,
labels=c("VarX1", "VarX2")),

mxPath(from="y",
arrows=2,
free=TRUE,values=.8,labels="VarE"),

mxPath(from="x1", to="x2",
arrows=2,
free=TRUE, values=.3,
labels=c("CovX1X2")),

# nonlinear constraint: Y to unit variance

mxAlgebra(expression=b1^2 + b2^2 + 2*b1*b2*CovX1X2+VarE, name="VarY"),
mxMatrix( type="Unit", nrow=1, ncol=1, name="I"),
mxConstraint(I == VarY, name="VarYEqualOne"),

mxData(observed=R, type="cov", numObs=100)
)

m1 <- mxOption(model=m1,key="Standard Errors",value="Yes")

m1.run <- mxRun(m1)
Running Regression of y on x1 and x2 (standardized)
summary(m1.run)
data:
\$`Regression of y on x1 and x2 (standardized).data`
\$`Regression of y on x1 and x2 (standardized).data`\$cov
y x1 x2
y 1.0 0.3 0.4
x1 0.3 1.0 0.5
x2 0.4 0.5 1.0

free parameters:
name matrix row col Estimate Std.Error
1 b1 A y x1 0.1333333 2.128634e-314
2 b2 A y x2 0.3333333 2.128634e-314
3 VarE S y y 0.8266667 2.128634e-314
4 VarX1 S x1 x1 0.9999993 2.128634e-314
5 CovX1X2 S x1 x2 0.4999993 2.128634e-314
6 VarX2 S x2 x2 0.9999993 2.130209e-314

observed statistics: 7
Constraint 'VarYEqualOne' contributes 1 observed statistic.
estimated parameters: 6
degrees of freedom: 1
-2 log likelihood: 249.6745
saturated -2 log likelihood: 249.6745
number of observations: 100
chi-square: 4.672529e-11
p: 0.9999945
AIC (Mx): -2
BIC (Mx): -2.302585
RMSEA: NA
timestamp: 2010-11-05 11:22:50
frontend time: 0.1519690 secs
backend time: 0.001686096 secs
independent submodels time: 7.70092e-05 secs
wall clock time: 0.1537321 secs
cpu time: 0.1537321 secs
openmx version number: 1.0.1-1464

###########
# Model 2 #
###########

# covariance matrix that matches the correlation matrix
# in model 1

COV <- matrix(c(100,60 ,60,
60,400,150,
60,150,225),3)

colnames(COV) <- rownames(COV) <- manifests

D <- diag(sqrt(diag(COV)[2:3]))
sdy <- sqrt(COV[1,1])

m2 <- mxModel("Regression of y on x1 and x2 (standardized)",
type="RAM",
manifestVars=manifests,

mxPath(from=c("x1","x2"), to="y",
arrows=1,
free=TRUE, values=.2, labels=c("b1", "b2")),

mxPath(from=c("x1","x2"),
arrows=2,
free=TRUE, values=300,
labels=c("VarX1", "VarX2")),

mxPath(from="y",
arrows=2,
free=TRUE,values=.8,labels="VarE"),

mxPath(from="x1", to="x2",
arrows=2,
free=TRUE, values=100,
labels=c("CovX1X2")),

# nonlinear constraints

mxAlgebra(exp=sdy*(t(rbind(b1,b2))%*%R[2:3,2:3]%*%rbind(b1,b2)+VarE)*sdy,name="c1"),
mxAlgebra(exp=sdy*(b1+R[2,3]*b2)*D[1,1],name="c2"),
mxAlgebra(exp=sdy*(b2+R[2,3]*b1)*D[2,2],name="c3"),

mxConstraint(c1 == COV[1,1],name="nonlinear1"),
mxConstraint(c2 == COV[1,2],name="nonlinear2"),
mxConstraint(c3 == COV[1,3],name="nonlinear3"),

mxData(observed=COV, type="cov", numObs=100)
)

m2 <- mxOption(m2,"Standard Errors","Yes")
m2 <- mxOption(m2,"Major iterations","10000")

m2 <- mxRun(m2)
Running Regression of y on x1 and x2 (standardized)
summary(m2)
data:
\$`Regression of y on x1 and x2 (standardized).data`
\$`Regression of y on x1 and x2 (standardized).data`\$cov
y x1 x2
y 100 60 60
x1 60 400 150
x2 60 150 225

free parameters:
name matrix row col Estimate Std.Error
1 b1 A y x1 0.1333333 2.121996e-314
2 b2 A y x2 0.3333333 2.121996e-314
3 VarE S y y 0.8266667 0.000000e+00
4 VarX1 S x1 x1 398.2965743 2.121996e-314
5 CovX1X2 S x1 x2 149.1088443 2.121996e-314
6 VarX2 S x2 x2 224.3940013 2.322230e-314

observed statistics: 9
Constraint 'nonlinear1' contributes 1 observed statistic.
Constraint 'nonlinear2' contributes 1 observed statistic.
Constraint 'nonlinear3' contributes 1 observed statistic.
estimated parameters: 6
degrees of freedom: 3
-2 log likelihood: 11991.72
saturated -2 log likelihood: 1834.935
number of observations: 100
chi-square: 10156.78
p: 0
AIC (Mx): 10150.78
BIC (Mx): 5071.484
RMSEA: 5.817727
timestamp: 2010-11-05 11:22:51
frontend time: 0.3465281 secs
backend time: 1.01469 secs
independent submodels time: 6.29425e-05 secs
wall clock time: 1.361281 secs
cpu time: 1.361281 secs
openmx version number: 1.0.1-1464

Offline
Joined: 07/30/2009
The reason that standard

The reason that standard errors come out as NAs when you include constraints is that the inverse Hessian does not, in general, provide correct standard errors in the presence of arbitrary constraints. Thus we do not report them.

When you included:

m1 <- mxOption(model=m1,key="Standard Errors",value="Yes")

you overwrote the default, and sure enough, the standard errors aren't correct.

But there is a way to do this.

1. Remove the mxOption() statement.

2. Add the following to your model to calculate confidence intervals for "b1" and "b2". Of course, you could calculate confidence intervals for all your estimates, if you like.

mxCI(c("b1", "b2")),

3. Then turn on confidence interval calculation in the mxRun statement.

m1.run <- mxRun(m1, intervals=TRUE)

That's all there is to it.

From the mxCI version:

confidence intervals:
lbound  estimate    ubound
b1 -0.07510258 0.1333333 0.3279935
b2  0.12635554 0.3333333 0.5064689

From inverse Hessian model of the correlation matrix without constraints:

# b1
> 0.1333332 - (1.96 * 0.1055161)
[1] -0.07347836
> 0.1333332 + (1.96 * 0.1055161)
[1] 0.3401448

# b2
> 0.3333333 - (1.96 * 0.1055160)
[1] 0.1265219
> 0.3333333 + (1.96 * 0.1055160)
[1] 0.5401447

In this case, the mxCI estimates are a little tighter than the inverse Hessian estimates, but that is not guaranteed to be the case. In general I would be more likely to trust the likelihood based (mxCI) estimates over inverse Hessian estimates of the parameter variability.

Hope this helps!

Offline
Joined: 11/05/2010
Thanks for the help! I have

Thanks for the help!

I have one question. Do you happen to have a reference discussing why the inverse Hessian under arbitrary constraints does not yield correct standard errors?

Thanks again,

Jeff

Offline
Joined: 07/31/2009
This issue is more technical

This issue is more technical than scholarly. There are two 'flavors' of Hessian that OpenMx generates. One is estimated during optimization - it is used by the optimizer to find the minimum. The second is calculated after optimization by directly adjusting the parameter values and recording how much the objective changes. The reason for the second direct step (even though it can be costly in terms of CPU) is that the first method is sometimes inaccurate, whether there are non-linear constraints (specified with mxConstraint()) or not. Unfortunately, the second method does not take mxConstraint()s into consideration, so it too yields an inaccurate Hessian when there are constraints in the model. Hence no SE's for models with this type of constraint. This limitation may be lifted in a future version of OpenMx.

Offline
Joined: 07/31/2009
Surely the Hessian is a

Surely the Hessian is a headless rider in Sleepy Hollow? :-)

Offline
Joined: 10/08/2009
I sometimes also need to have

I sometimes also need to have standard errors rather than confidence intervals. It is because standard errors of the summary statistics can be used in further analyses, e.g., meta-analysis. One possibility that I am thinking is parametric bootstrap. The following code estimates a bootstrapped SE for Jeff's example. It is not optimized for performance. For example, it is better to generate covariance samples directly instead of to generate raw data.

library(MASS)
set.seed(10001)
paraboot <- function(R, n, mx.fit) {
mx.fit@data@observed <- cov(mvrnorm(n=n, mu=c(0,0,0), Sigma=R))
mx.refit <- mxRun(mx.fit, silent=TRUE, suppressWarnings=TRUE)
return(mxEval(c(b1,b2), mx.refit))
}

bootCov <- cov(t(replicate(200, paraboot(R=R, n=100, mx.fit=m1.run))))
dimnames(bootCov) <- list(c("b1","b2"),c("b1","b2"))
bootCov
sqrt(diag(bootCov))

bootCov:
b1 b2
b1 0.010148215 -0.006035671
b2 -0.006035671 0.011095607

Standard errors:
b1 b2
0.1007383 0.1053357

Offline
Joined: 07/30/2009
Nice!

Nice!

Offline
Joined: 07/31/2009
>The reason that standard

>The reason that standard errors come out as NAs when you include constraints is that the inverse
> Hessian does not, in general, provide correct standard errors in the presence of arbitrary
> constraints. Thus we do not report them.

Might be helpful to add a note in the summary when SE <- NA

Something like "Constraints preclude automatic calculation of Standard Errors. To determine CIs, rewrite to avoid constraints (perhaps using standardized data) or, more simply, add a request to mxCI() into your model. e.g.:

mxCI(c("b1", "b2"))