sample weights

15 replies [Last post]
manu's picture
Offline
Joined: 10/08/2009

Dear all,
I am working with data from a study with a complex sample design. This requires the use of sample weights (which are known). Is this possible in OpenMx?
In a post by neale from 09/24/10 it has been suggested to extend the mxFIMLObjective() function so that sample weights, which are part of the dataframe, are available as an argument whithin the function. I think that is exactly what I am looking for, but I could not find any further information. Can you point me to any?
Thank you!

tbates's picture
Online
Joined: 07/31/2009
unconformable arrays

what you say makes sense mike, but

mxAlgebra(-2 * sum(MZw.data.weight * log(MZ.objective) ), name = "mzWeigh")

generates the unconformable array error. Unfortunately the error doesn't tell one what the noncorforming array's sizes are...

figuring this might just be a column of weights from the data entering a row of likelihoods, i tried

mxAlgebra(-2 * sum(t(MZw.data.weight) * log(MZ.objective) ), name = "mzWeigh")

Also fail...
It shouldn't be necessary to place the weights into a matrix, should it?

OK.. tried making a matrix and placing the weight data in there. A benefit is that the errors are more informative. But same error. Would be great if the back end kicked back something like I was doing "[300,1] %*% [300,1] when…"

Will look more in the morning.
PS: I assume the row objective is padded with NAs for all(is.na(rows)?

tbates's picture
Online
Joined: 07/31/2009
trying a simple example: vector alters estimates in model?

So, building a simple play model. After encountering some no-context errors :-(
This is just estimating the covariance of X and Y, which is around .5

require(OpenMx)
require(MASS)
#Simulate Data
set.seed(200)
rs = .5
nSubs = 1000
selVars <- c('X','Y')
nVar = length(selVars)
xy <- mvrnorm (nSubs, c(0,0), matrix(c(1,rs,rs,1),2,2))
testData <- data.frame(xy) 
names(testData) <- selVars
cov(testData)
 
m1 <- mxModel("vector_is_FALSE", 
	mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, values = var(testData)),
	mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, values = 0),
	mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
	mxFitFunctionML(vector = F),
	mxData(observed = testData, type = "raw")
)
m1 <- mxRun(m1)
# summary shows all parameters recovered fine.
 
# Now: Switch to vector
m1 <- mxModel(m1, mxFitFunctionML(vector = T), name = "vector")
m1 <- mxRun(m1)
umxSummary(m1, showEstimates = "std" )
# what we get
#                  name  Estimate Std.Error
# 1  vector.expCov[1,1]    0.4705   9.0e+07
# 2  vector.expCov[1,2]    0.6148   1.4e+08
# 3  vector.expCov[2,2]    1.0314   2.1e+08
# what it should be...
cov(testData)
# XX 0.9945328
# XY 0.4818317 
# YY 1.0102951

So when vector is on, something needs to change in the model to keep it driving toward good estimates?

mhunter's picture
Offline
Joined: 07/31/2009
Concerning ... but a solution

This simple example reliably causes my R to completely crash. Rolling back a couple of revisions does not fix the problem for me. Going back to r2655 still has the same crashing problem. For r2583, I no longer crash, but I replicate your finding of rather bad estimates only when vector=TRUE. I should also note I haven't observed this problem with other models running vector=TRUE. However, those other models do not optimize the fit function with vector=TRUE; they have that fit function as a component of another fit function. This got me thinking.

I think in this case OpenMx is trying to optimize the likelihood (not even the log likelihood) of the last row of data. Essentially, for the model you specified the row likelihoods do not collapse to a single number for optimization. The developers should discuss if we should catch this, if it should case an error, a warning, or what.

The following code works like a charm.

require(OpenMx)
require(MASS)
#Simulate Data
set.seed(200)
rs = .5
nSubs = 1000
selVars <- c('X','Y')
nVar = length(selVars)
xy <- mvrnorm (nSubs, c(0,0), matrix(c(1,rs,rs,1),2,2))
testData <- data.frame(xy) 
names(testData) <- selVars
cov(testData)
 
m1 <- mxModel("vector_is_FALSE", 
	mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, values = var(testData)),
	mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, values = 0, ubound=1),
	mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
	mxFitFunctionML(vector = FALSE),
	mxData(observed = testData, type = "raw")
)
m1 <- mxRun(m1)
# summary shows all parameters recovered fine.
omxGetParameters(m1)
 
# Now: Switch to vector
m1 <- mxModel(m1, mxFitFunctionML(vector = T), name = "vector")
mc <- mxModel("vector_is_TRUE", m1,
	mxAlgebra(name="thefit", -2*sum(log(vector.fitfunction))),
	mxFitFunctionAlgebra("thefit"))
mc <- mxRun(mc)
omxGetParameters(mc)

tbates's picture
Online
Joined: 07/31/2009
to fit or not to fit, that is the parameter

Very helpful mike H !

If a side effect of vector = TRUE is that the likelihoods are not optimised on, then perhaps mxFitFunctionML(vector = T) should have a fit = TRUE (default) parameter, which causes the ML values to be fit, rather than just optimising one value in the vector?

ie.

mxFitFunctionML(vector = TRUE, optimiseMinus2log_all = TRUE) # note, i will find ML values based on -2 * sum(log(vector))

or perhaps a new function, followed by a fit algebra. Perhaps not... the complexity of this is already in the top 1 % of IQ :-)
mxLikelihoodsML(vector = T, name = "mylike") # note, you need to fit these...
mxFitFunctionAlgebra("mylike") 

tbates's picture
Online
Joined: 07/31/2009
weighted data tutorial

So, I made up a tutorial-type resumé of what I think I gathered here...

https://github.com/tbates/umx/blob/master/developer/examples/weighting/w...

Does that seem correct?

Also, I noticed that these two give the same result: so the

mc <- mxModel("vector_is_TRUE", 
	mxModel("vector",
	mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, labels = c("XX","XY","YY"), values = var(testData)),
	mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, labels = c("meanX","meanY"), values = 0, ubound=1),
	mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
	mxFitFunctionML(vector = TRUE),
	mxData(observed = testData, type = "raw")
	),
	mxAlgebra(name = "minus2LL", -2 * sum(log(vector.fitfunction)) ),
	mxFitFunctionAlgebra("minus2LL")
)

and this version with only one model containing two fit functions.
mxModel("vector",
	mxMatrix(name = "expCov", type = "Symm", nrow = nVar, ncol = nVar, free = T, labels = c("XX","XY","YY"), values = var(testData)),
	mxMatrix(name = "expMean", type = "Full", nrow = 1, ncol = nVar, free = T, labels = c("meanX","meanY"), values = 0, ubound=1),
	mxExpectationNormal(covariance = "expCov", means = "expMean", dimnames = selVars),
	mxAlgebra(name = "minus2LL", -2 * sum(log(vector.fitfunction)) ),
	mxFitFunctionML(vector = TRUE),
	mxFitFunctionAlgebra("minus2LL"),
	mxData(observed = testData, type = "raw")
)

Does that make any sense?

neale's picture
Offline
Joined: 07/31/2009
Cool

I like this. I'm surprised that the one with two fit functions in one model actually works, but pleased it does.

Your github link is incorrect (there's a trailing w).

It would be nice to have an example of selective sampling, and then recovery of population parameters by weighting.

neale's picture
Offline
Joined: 07/31/2009
Note example script in models/passing

Although it uses a mixture distribution, the principle for sample weights is very similar (just have a single component weighted mixture, essentially). It uses the technique of pulling variables out of the data frame to use as weights.

http://openmx.psyc.virginia.edu/svn/trunk/models/passing/Acemix2.R

I think this circumvents the issues with dimensionality, but I agree it is a bit clunky that the likelihoods are specified on an individual-wise basis (a la definition variable) but then you get the whole vector of them back. It might be better to have a weight constructor that would compute an algebra (individual-wise) and weight the likelihoods on the way back.

mhunter's picture
Offline
Joined: 07/31/2009
Three things to try

With this example there aren't a lot of arrays available to be non-conformable. It's either -2, MZw.data.weight, or MZ.objective. First, I'd make sure the objective (fit function) for the MZ model has vector=TRUE. Second, is MZw.data.weight a single column vector? The * operator is for elementwise multiplication, and it will not recycle either argument if they are different lengths like R. Use %*% for matrix multiplication. Third, you could put MZw.data.weight into its own mxAlgebra and have a look at it that way.

neale's picture
Offline
Joined: 07/31/2009
True in this case not many sources of non-conformability

But, it would be SUPER HELPFUL to print out the dimensions of the matrices found not to be conformable along with the error. This is something classic Mx used to do, and it's sorely missed in OpenMx. Pretty please?

Ryne's picture
Offline
Joined: 07/31/2009
Weighted objectives

This is possible in OpenMx; Mike was arguing for a shortcut.

To weight objective functions in OpenMx, you'll have to specify two models. The first model is your standard model as though there were no weights, with the "vector=TRUE" option added to the FIML objective. This option makes the first model return not a single -2LL value for the model, but individual likelihoods for each row of the data.

That model is placed in a second MxModel object, which will also contain the data (if you put the data here, you don't have to put it in the first model, but you still can). That second model will contain an algebra that multiplies the likelihoods from the first model by some weight. I'll assume that you want to weight the log likelihoods by a variable in your dataset. Your second model will then look something like this, where "data.weight" is the weight from your data and "firstModel" is the name you assigned to your first model.

fullModel <- mxModel("ThisIsHowYouDoWeights",
    mxModel("firstModel", 
        ....
        mxFIMLObjective("myCov", "myMeans", dims, vector=TRUE)),
    mxData(myData, "raw"),
    mxAlgebra(-2 * sum(data.weight %x% log(firstModel.objective), name="obj"),
    mxAlgebraObjective("obj")
)

Edit: corrected the multiplication to a Kronecker product, as its a 1 x 1 definition variable matrix and a 1 x n likelihood vector.

tbates's picture
Online
Joined: 07/31/2009
Does this work to implement weighted analysis?

Hi,
Following this thread and a previous one [1], I've been trying to implement differential weighting of subjects in a model by building another model along side which multiplies the vector of fits in the first by the row weight in the data and optimizing that sum.

I thought it was working, but I am getting very different results from a colleague who is using MPlus's weight feature. Just deleting the low-weighted rows gives results more similar to his than to mine...
So,

    wMZ = data.frame(weight = w[twinData$ZYG == "MZFF"]) # get weights for MZs
    wDZ = data.frame(weight = w[twinData$ZYG == "DZFF"]) # and for DZs
    ...
    # An MZ model with vector = T
    mxModel(name = "MZ", mxData(mzData, type = "raw"), mxFIMLObjective("top.mzCov", "top.expMean", vector = bVector))
 
    # model "MZw", which simply weights the MZ model
    mxModel("MZw",
    mxData(wMZ, "raw"),
    mxAlgebra(-2 * sum(data.weight %x% log(MZ.objective) ), name = "mzWeightedCov"),
    mxAlgebraObjective("mzWeightedCov")
    # ...
    # In the container model, target an algebra which sums the weighted objectives
    # of the MZw and DZw, which in turn target weighted functions of the individual row
    # likelihoods of the underlying MZ and DZ ACE models 
    mxAlgebra(MZw.objective + DZw.objective, name = "twin"), mxAlgebraObjective("twin")

What (if anything) is wrong with this approach? (or my implementation)

[1] http://openmx.psyc.virginia.edu/thread/445

neale's picture
Offline
Joined: 07/31/2009
Unsure about product

mxAlgebra(-2 * sum(data.weight %x% log(MZ.objective) ), name = "mzWeightedCov"),

Is this really doing what we want, if MZ.objective is a vector of all the likelihoods? I'd be inclined to go for:

mxAlgebra(-2 * sum(mzDataWeight * log(MZ.objective) ), name = "mzWeightedCov"),

where mzDataWeight is the name of an mxMatrix explicitly loaded with the weights.

mspiegel's picture
Offline
Joined: 07/31/2009
Umm, I'm not following the

Umm, I'm not following the previous explanation. The definition variable "data.weight" is a 1 x 1 matrix, and "firstModel.objective" is a n x 1 matrix when vector=TRUE. Another way to specify this is to put the weights in a n x 1 MxMatrix, and then multiply the weights by the likelihood vector. I think there's a right-parenthesis missing the closing of the model "firstModel".

Ryne's picture
Offline
Joined: 07/31/2009
My fault for writing code

My fault for writing code without checking it. It should be a Kronecker product in this specification. You're right that a 1 x n matrix of weights may be included in lieu of the definition variable approach. I fixed the parenthesis issue as well.

manu's picture
Offline
Joined: 10/08/2009
thank you

that was very helpful!
just a quick follow up question: I know Mplus uses a sandwich estimator to correct the standard errors in addition to using the weighted likelihood function. Is there a similar option in OpenMx? Can I trust the standard errors?