Definition variables for categorical and continuous variables, something's wrong

7 replies [Last post]
iloo's picture
Offline
Joined: 05/26/2010

I've noticed this a while ago but forgot to post it to you developers.

Something goes wrong when you include definition variables in a model where both continuous and categorical variables are included, and use the definition variables for regression on the mean. For example running this code yields on my machine:

library(OpenMx)
N <- 2000
u <- rbinom(N,1,.5)
x <- .5*u+rnorm(N)
y <- mxFactor( rbinom(N,1,pnorm(-2+u)) , levels=c(0,1) )
 
model <- mxModel( 'BinCont',      
  mxMatrix('Full',nrow=1,ncol=2,free=c(T,T),name='Betas'),
  mxMatrix('Full',nrow=1,ncol=1,free=F,label='data.u',name='U'),
  mxMatrix('Full',nrow=1,ncol=2,free=c(T,F),name='Means'),
  mxAlgebra( Means + Betas%x%U , name='eMean'),
  mxMatrix('Full',nrow=1,ncol=1,free=T,name='Thresh'),
  mxMatrix('Symm',nrow=2,ncol=2,free=c(T,T,F),values=c(1,0,1),name='Cov'),
  mxData( data.frame(x,y,u) , type='raw'),
  mxFIMLObjective( means='eMean', covariance='Cov',thresholds='Thresh',threshnames='y',dimnames=c('x','y') )
)                                        
mxRun( model )@output$estimate[1:2]
> mxRun( model )@output$estimate[1:2]
Running BinCont 
BinCont.Betas[1,1] BinCont.Betas[1,2] 
               0.1                0.1 

So nothing has ahppened with the regression coefficients. This is true on my Windows machine, as well as on my Linux server. To make things even more complicated, this happen when changing the number of thresds used for fitting model:
> mxOption( NULL , 'Number of Threads' , 2)
 
> mxRun( model )@output$estimate[1:2]
Running BinCont 
BinCont.Betas[1,1] BinCont.Betas[1,2] 
         0.5015814          0.9228213 
 
> mxOption( NULL , 'Number of Threads' , 3)
 
> mxRun( model )@output$estimate[1:2]
Running BinCont 
BinCont.Betas[1,1] BinCont.Betas[1,2] 
         1.1945377          0.5031962 
 
> mxOption( NULL , 'Number of Threads' , 4)
 
> mxRun( model )@output$estimate[1:2]
Running BinCont 
BinCont.Betas[1,1] BinCont.Betas[1,2] 
         0.5015814          0.9228213 

I.e. the result seems to be correct when stating the use of an even number of threads, but not when using an odd number of threads! To make things even more confusing, this is true on the Linux server (where the number of threads actually is changed) as well as on Windows (where only one thread is used).

Am I doing something wrong? Is the use of FIMLObjective causing something of this?

Ryne's picture
Offline
Joined: 07/31/2009
I ran the same code on a Mac

I ran the same code on a Mac (10.8.4, quad core i7, R 2.15.3, OpenMx 999.0.0-2883) and got the correct answer with 1, 2 and 4 threads, but unrealistic betas for 3 threads. I have no idea what's causing this.

This model is also reporting indepndence and saturated M2LLs of -1.9xxx and -2, when the model M2LL is around 6700. The "bad" solution with 3 threads has a lower M2LL, which is weird. All models status 0: 1, 2 & 4 have 13 npsol iterations and 145 evaluations, while 3 has 11 and 138. If I rerun the model with model 3's start values (model1a <- res3), then 2 and 4 return to their previous solutions, where 1 thread gets stuck at a local solution with a M2Ll of 6900.

I compared the raw likelihoods of model 2 and model 3, and did nothing productive beyond make a cool graph.

Note on models below. res1 has the run model with 1 thread, res2 has 2 threads, etc. As 3 threads had the problem, I reran each model with res3's final values, and called everything res1a through res4a.

> res1@output$estimate
 BinCont.Betas[1,1]  BinCont.Betas[1,2]  BinCont.Means[1,1] BinCont.Thresh[1,1]    BinCont.Cov[1,1]    BinCont.Cov[1,2] 
         0.48753503          0.99342898         -0.01476468          1.97586024          0.96021506         -0.02468846 
> res2@output$estimate
 BinCont.Betas[1,1]  BinCont.Betas[1,2]  BinCont.Means[1,1] BinCont.Thresh[1,1]    BinCont.Cov[1,1]    BinCont.Cov[1,2] 
         0.48753503          0.99342898         -0.01476468          1.97586024          0.96021506         -0.02468846 
> res3@output$estimate
 BinCont.Betas[1,1]  BinCont.Betas[1,2]  BinCont.Means[1,1] BinCont.Thresh[1,1]    BinCont.Cov[1,1]    BinCont.Cov[1,2] 
          1.1840026           0.5934588          -0.1664744           1.5706397           0.7077827          -0.1077638 
> res4@output$estimate
 BinCont.Betas[1,1]  BinCont.Betas[1,2]  BinCont.Means[1,1] BinCont.Thresh[1,1]    BinCont.Cov[1,1]    BinCont.Cov[1,2] 
         0.48753503          0.99342898         -0.01476468          1.97586024          0.96021506         -0.02468846 
> 
> res1@output$Minus2LogLikelihood
[1] 6709.981
> res2@output$Minus2LogLikelihood
[1] 6709.981
> res3@output$Minus2LogLikelihood
[1] 6163.627
> res4@output$Minus2LogLikelihood
[1] 6709.981
 
> res1a@output$Minus2LogLikelihood
[1] 6953.424
> res2a@output$Minus2LogLikelihood
[1] 6709.981
> res3a@output$Minus2LogLikelihood
[1] 6163.627
> res4a@output$Minus2LogLikelihood
[1] 6709.981
 
library(OpenMx)
N <- 2000
u <- rbinom(N,1,.5)
x <- .5*u+rnorm(N)
y <- mxFactor( rbinom(N,1,pnorm(-2+u)) , levels=c(0,1) )
 
model <- mxModel( 'BinCont',      
  mxMatrix('Full',nrow=1,ncol=2,free=c(T,T),name='Betas'),
  mxMatrix('Full',nrow=1,ncol=1,free=F,label='data.u',name='U'),
  mxMatrix('Full',nrow=1,ncol=2,free=c(T,F),name='Means'),
  mxAlgebra( Means + Betas%x%U , name='eMean'),
  mxMatrix('Full',nrow=1,ncol=1,free=T,name='Thresh'),
  mxMatrix('Symm',nrow=2,ncol=2,free=c(T,T,F),values=c(1,0,1),name='Cov'),
  mxData( data.frame(x,y,u) , type='raw'),
  mxFIMLObjective( means='eMean', covariance='Cov',thresholds='Thresh',threshnames='y',dimnames=c('x','y') )
)                                        
res1 <- mxRun( model )
 
mxOption( NULL , 'Number of Threads' , 2)                 
res2 <- mxRun( model )
mxOption( NULL , 'Number of Threads' , 3)                 
res3 <- mxRun( model )
mxOption( NULL , 'Number of Threads' , 4)                 
res4 <- mxRun( model )
 
res1@output$estimate
res2@output$estimate
res3@output$estimate
res4@output$estimate
 
res1@output$Minus2LogLikelihood
res2@output$Minus2LogLikelihood
res3@output$Minus2LogLikelihood
res4@output$Minus2LogLikelihood
 
mxOption( NULL , 'Number of Threads' , 1)   
model1a <- res3
res1a <- mxRun(model1a)
mxOption( NULL , 'Number of Threads' , 2)   
model2a <- res3
res2a <- mxRun(model2a)
mxOption( NULL , 'Number of Threads' , 3)   
model3a <- res3
res3a <- mxRun(model3a)
mxOption( NULL , 'Number of Threads' , 4)   
model4a <- res3
res4a <- mxRun(model4a)
 
res1a@output$estimate
res2a@output$estimate
res3a@output$estimate
res4a@output$estimate
 
res1a@output$Minus2LogLikelihood
res2a@output$Minus2LogLikelihood
res3a@output$Minus2LogLikelihood
res4a@output$Minus2LogLikelihood
 
pdf("rawlikelihoods.pdf")
plot(l2, l3)
abline(0,1)
dev.off()
 
pdf("loglikelihoods.pdf")
plot(log(l2), log(l3))
abline(0,1)
dev.off()

AttachmentSize
loglikelihoods.pdf 17.63 KB
rawlikelihoods.pdf 18.95 KB
neale's picture
Offline
Joined: 07/31/2009
Got wrong answer with 1 thread, correct with 4, on a Mac

Confirm this as a bug. With repository version updated as of today, 999.0.0-2914 I get betas stuck at starting values with 1 thread, and inconsistent results with 2-6 threads

@values
          [,1]      [,2]
[1,] 0.5973846 0.9912441
 
         [,1]      [,2]
[1,] 1.131837 0.6336549
 
         [,1]      [,2]
[1,] 1.415889 0.4357451
 
          [,1]      [,2]
[1,] 0.9577509 0.6713813
 
         [,1]      [,2]
[1,] 1.169111 0.5255151

neale's picture
Offline
Joined: 07/31/2009
Working on it

Here's code to work on

library(OpenMx)
N <- 2000
set.seed(1234)
u <- rbinom(N,1,.5)
x <- .5*u+rnorm(N)
y <- mxFactor( rbinom(N,1,pnorm(-2+u)) , levels=c(0,1) )
 
model <- mxModel( 'BinCont',      
  mxMatrix('Full',nrow=1,ncol=2,free=c(T,T),name='Betas'),
  mxMatrix('Full',nrow=1,ncol=1,free=F,label='data.u',name='U'),
  mxMatrix('Full',nrow=1,ncol=2,free=c(T,F),name='Means'),
  mxAlgebra( Means + Betas%x%U , name='eMean'),
  mxMatrix('Full',nrow=1,ncol=1,free=T,name='Thresh'),
  mxMatrix('Symm',nrow=2,ncol=2,free=c(T,T,F),values=c(1,0,1),name='Cov'),
  mxData( data.frame(x,y,u) , type='raw'),
  mxFIMLObjective( means='eMean', covariance='Cov',thresholds='Thresh',threshnames='y',dimnames=c('x','y') )
)                                        
model <- mxOption(model,'Number of Threads',1)
mxRun( model )@output$estimate[1:2]
model <- mxOption(model,'Number of Threads',2)
mxRun( model )@output$estimate[1:2]
model <- mxOption(model,'Number of Threads',3)
mxRun( model )@output$estimate[1:2]
model <- mxOption(model,'Number of Threads',4)
mxRun( model )@output$estimate[1:2]
model <- mxOption(model,'Number of Threads',5)
mxRun( model )@output$estimate[1:2]

jpritikin's picture
Offline
Joined: 05/23/2012
fixed

Please try to reproduce with SVN 3717.

neale's picture
Offline
Joined: 07/31/2009
Works for me

I now get consistent results with R 3.1.1 and rev3171 under OSX Mavericks. Nice going Joshua! Do you think this change will affect other scripts with ordinal data?

jpritikin's picture
Offline
Joined: 05/23/2012
impact

Joint continuous-ordinal was completely broken with respect to definition variables. Pure ordinal or pure continuous models are not affected.

neale's picture
Offline
Joined: 07/31/2009
Fixed another problem script

Major congratulations on finding and fixing this bug, Josh! A joint ordinal continuous script that had been giving weird SE's and which had parameter estimates of regressions-on-definition-variables stuck at the starting values is now giving very sensible results. Great job!