heritability estimates for unequal family sizes in R

5 replies [Last post]
Docherty's picture
Offline
Joined: 02/20/2012

I'm working with pedigrees where one family member has schizophrenia and all family members share 1/2 DNA. The program SOLAR seems to use a separate matrix for mother and father id when samples have families of varying sizes. Does anyone know of sample R/OpenMx syntax for this? I have a smaller sample and want to use all family data possible, but my first-degree family members are unequal--it would be nice to eyeball what matrices look like in this type of analysis.

neale's picture
Offline
Joined: 07/31/2009
Set up a model for the maximum family size

One approach to this problem would be to treat the smaller pedigrees as having missing data for those relatives that don't exist. Suppose the maximum family size was 6, then the data might look like this (I assume binary data here but the same applies if the data are ordinal or continuous):

Proband Rel1 Rel2 Rel3 Rel4 Rel5
1 0 1 1 1 0
1 0 1 0 0 0
1 0 0 0 0 0
1 0 0 NA NA NA
1 NA NA 1 0 0

where the NA's indicate 'missing' family members. The predicted covariance matrix would require as many rows and columns as there are in the largest family size, six in this case. Likewise the predicted mean and threshold matrices would require six columns. If we assume that all relatives are equally correlated, and have the same variance, then I would set a covariance structure using matrix algebra to get VG+VE on the diagonal and .5VG otherwise. Assuming the data are in a file called szData.txt, a script might look like this.

szData <- read.table("szData.txt", header=T)
szData <- mxFactor(szData, 0:1)
varNames <- names(szData)
maxfamsize <- 6
Identity <- mxMatrix(type="Iden", nrow=maxfamsize, ncol=maxfamsize,  name="identity")
Units <- mxMatrix(type="Unit", nrow=maxfamsize, ncol=maxfamsize,  name="unit")
VE <- mxMatrix(type="Full", nrow=1, ncol=1, free=T, name="E")
VG <- mxMatrix(type="Full", nrow=1, ncol=1, free=T, name="G")
famCov <- mxAlgebra( E %x% identity + ((.5 * G) %x% (identity + unit)), name="famcovs")
famMean <- mxMatrix(type="Full", nrow=1, ncol=maxfamsize, free=F, labels="mean", name="famMean")
famThresh <- mxMatrix(type="Full", nrow=1, ncol=maxfamsize, free=T, labels="threshold", name="famThresh")
myData <- mxData(szData, type="raw")
myObj <- mxFIMLObjective(covariance="famcovs", means="famMean", thresholds="famThresh", dimnames=varNames, threshnames=varNames)
pedModel <- mxModel("PedigreeModel", Identity, Units, VE, VG, famCov, famMean, famThresh, myData, myObj)
summary(fittedpedModel <- mxRun(pedModel))

The %x% are Kronecker products to get G+E on the diagonal, and .5G on the off-diagonal. Two free parameters is all!

There are, however, some gotcha's here. Since the data have been ascertained via an affected proband, the likelihood is not the same as if the families had been sampled at random. In essence we are missing the chance to observe those families in which none of the family members is affected. Unsurprisingly, the probability of observing such a family varies according to its size, so the correction should depend on family size. This issue could be resolved using vector=T to get individual pedigree likelihoods and weights that would depend on a definition variable for the family size, and the covariance structure of the pedigree. Overall this isn't simple, I realize. Scripting a solution would take me a bit longer...

Docherty's picture
Offline
Joined: 02/20/2012
This is very helpful, thank

This is very helpful, thank you. We have collected data on family members available to participate, but the numbers of members in each family do not necessarily correlate with the actual size of the family. Does this still present a problem? In every family there is one proband (it is not multiplex).

neale's picture
Offline
Joined: 07/31/2009
several things to consider

When you say there is one proband per family, is this because you never ascertained any families in which there were two probands - this could have happened but just didn't. I assume the former. By multiplex, one might ordinarily think that it applies regardless of proband status, that is, if the number of schizophrenics in any pedigree is greater than 1, the pedigree might be called multiplex. I assume that this is not the case - that there are pedigrees with multiple affected individuals, but only a single proband.

Next, while it would be better to have a measure of family size, we might suppose that numbers available to participate are an imperfect index of family size and could work with that.

The next direction I would go would be towards omxMnor() to compute a weight for each pedigree. We want the likelihood of finding a pedigree of size n with at least 1 affected, which is equal to the 1 - likelihood(none affected). The second is easier to compute, it will be the result of omxMnor(covariance, means, lbound, ubound) where lbound=-Inf and ubound=t (the threshold in standard normal units, say 2.326 for 1% prevalence) the covariance matrix will be the predicted covariance matrix based on the current parameter estimate of the resemblance between relatives, and the means will be zero. The dimensions of all the arguments to omxMnor would depend on the family size. To be honest, I am treading new OpenMx ground here and need to think how it might best be arranged - mxRowObjective is coming to mind but it might work in the usual ordinal FIML function with vector=TRUE and some cunning use of omxSelectRowsAndCols() and omxSelectCols(). Or (duh, should have thought of this sooner) collect pedigrees of equal size and make different models for each, but use the same ascertainment correction for all pedigrees of this type. I note that this isn't particularly (i.e. in the least bit) general and won't easily generalize to more complex pedigrees.

Docherty's picture
Offline
Joined: 02/20/2012
To the first part of your

To the first part of your comment--yes, the former. The only concern is that theoretically there could be less likelihood of collecting relatives when endophenotype levels are high within families...So in families with more severe schizotypal symptoms or traits/genetic liability, relatives could be less available.

Using participant ns as an index of family size is interesting, and there probably aren't many variations in these data (~2-6 family members in each group). Correct me if I'm wrong--Is a model for each family size what you're thinking? So the ascertainment correction would be unique for each type of family/model, based on those means? Thanks!

neale's picture
Offline
Joined: 07/31/2009
Yes

Yes, that is what I was thinking. Sorry for the delay - have been away!