heritability estimation from gene expression data

pinki999
Joined: 01/17/2013

Hi,

I am new to the field of twin studies and am hoping for help and support from this forum.

I want to analyse gene expression data from twins in order to estimate heritability. I have seen many studies with other phenotypes but not many with gene expression data. Has anyone worked with expression data? Can anyone point me in the right direction?

Thanks

pinki999
Joined: 01/17/2013
Nobody knows :(

neale
Joined: 07/31/2009
Good Thing!

To be doing something new in science is definitely good!

What form do the expression data take? I imagine a very large number of observations from an expression array, rather like the analysis of voxel-level data from MRI scans of the brain. Most of the task is to feed the expression measures through a pipeline (an R function) and summarize the results (perhaps as a heatmap plot, like those used for individual expression arrays).

Do you have a specific question?

pinki999
Joined: 01/17/2013
Hi,

The main focus is to carry out the heritability analysis. With gene expression data, I think we will get one estimate for each gene. I am not sure what the data should look like for a heritability analysis. Currently my data is in matrix format, with each row corresponding to a gene and each column corresponding to a sample. I also have zygosity, age and gender information. It would be great if anyone could suggest how to proceed. An example in this direction would be a great start for me.
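As a rough starting point for the data layout described above, here is a minimal sketch in plain Python (no packages) that pairs up the columns of a gene-by-sample matrix by twin pair and computes Falconer's estimate, h2 = 2 * (r_MZ - r_DZ), for each gene. It is not the method used in any of the papers mentioned in this thread. The function names, the `pairs` table and the toy values are all made up for illustration, and it assumes age and sex effects have already been regressed out of the expression values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def falconer_h2(expr, samples, pairs):
    """Per-gene Falconer estimate h2 = 2 * (r_MZ - r_DZ).

    expr    : {gene: [value per sample]}, one matrix row per gene
    samples : sample IDs in the column order of the matrix
    pairs   : [(twin1_id, twin2_id, "MZ" or "DZ")] (hypothetical pair table)
    """
    col = {s: i for i, s in enumerate(samples)}
    out = {}
    for gene, row in expr.items():
        r = {}
        for zyg in ("MZ", "DZ"):
            # Twin 1 and twin 2 values for all pairs of this zygosity
            t1 = [row[col[a]] for a, b, z in pairs if z == zyg]
            t2 = [row[col[b]] for a, b, z in pairs if z == zyg]
            r[zyg] = pearson(t1, t2)
        out[gene] = 2 * (r["MZ"] - r["DZ"])
    return out

# Toy data: 3 MZ pairs and 3 DZ pairs, one gene (values are invented)
samples = ["m1a", "m1b", "m2a", "m2b", "m3a", "m3b",
           "d1a", "d1b", "d2a", "d2b", "d3a", "d3b"]
pairs = [("m1a", "m1b", "MZ"), ("m2a", "m2b", "MZ"), ("m3a", "m3b", "MZ"),
         ("d1a", "d1b", "DZ"), ("d2a", "d2b", "DZ"), ("d3a", "d3b", "DZ")]
expr = {"GENE1": [1, 1, 2, 2, 3, 3, 1, 2, 2, 1, 3, 3]}

h2 = falconer_h2(expr, samples, pairs)
print(h2)
```

Falconer's estimate is crude: it can fall below 0 or above 1 and gives no standard errors, so a maximum-likelihood twin model is usually preferable for real analyses. The point here is only the reshaping from one column per sample to twin 1 / twin 2 vectors per zygosity group.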

Thank you

anav
Joined: 05/09/2013
It has been done, but not in OpenMx. Check Grundberg et al. (2012): http://www.nature.com/ng/journal/v44/n10/abs/ng.2394.html

The implementation was done in R using lmer(). The methodology is based on:

Visscher, P.M., Benyamin, B. & White, I. The use of linear mixed models to estimate variance components from data on twin pairs by maximum likelihood. Twin Res. 7, 670–674 (2004).
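For readers new to this approach, the link between mixed-model variance components and heritability can be sketched as follows. The ACE expectations in the first three lines are standard. The mixed-model parameterization is an assumption made here for illustration and is not taken from the paper: a random intercept shared by every twin pair (variance $\sigma^2_f$), a second random effect that is shared within MZ pairs but individual-specific in DZ pairs (variance $\sigma^2_m$), and a residual (variance $\sigma^2_e$).

```latex
\begin{align*}
\operatorname{Var}(y) &= \sigma^2_A + \sigma^2_C + \sigma^2_E \\
\operatorname{Cov}_{MZ} &= \sigma^2_A + \sigma^2_C \\
\operatorname{Cov}_{DZ} &= \tfrac{1}{2}\sigma^2_A + \sigma^2_C \\[4pt]
\text{Mixed model:}\quad
\operatorname{Cov}_{DZ} &= \sigma^2_f, \qquad
\operatorname{Cov}_{MZ} = \sigma^2_f + \sigma^2_m \\[4pt]
\Rightarrow\quad
\sigma^2_A &= 2\sigma^2_m, \qquad
\sigma^2_C = \sigma^2_f - \sigma^2_m, \qquad
\sigma^2_E = \sigma^2_e \\[4pt]
h^2 &= \frac{\sigma^2_A}{\operatorname{Var}(y)}
     = \frac{2\sigma^2_m}{\sigma^2_f + \sigma^2_m + \sigma^2_e}
\end{align*}
```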

Cheers,
Ana.

neale
Joined: 07/31/2009
It could be done in OpenMx though

There are many OpenMx scripts for estimating heritability and common environment effects while controlling for covariates such as age and sex. Some are in the OpenMx documentation, others at the Boulder CO workshop site or the VCU VIPBG site for a recently taught class. For a large number of transcripts, it would probably be best to set up a function to which you pass the model along with a list of the variables you wish to analyze, so that the only thing the function has to do is fit the model with the right variable.
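The pattern of one function, one model and a list of variables might look roughly like the sketch below. It is written in plain Python only to show the control flow. `fit_all`, `toy_fit` and the data are hypothetical, and `toy_fit` is a placeholder that returns a mean rather than fitting a twin model. In practice `fit_one` would wrap whatever fits the model to a single transcript.

```python
def fit_all(fit_one, data, variables):
    """Apply the same model-fitting function to each variable in turn.

    fit_one   : callable (data, variable_name) -> result; hypothetical
                stand-in for the code that fits the model to one transcript
    data      : {variable_name: [values]}
    variables : names of the variables to analyze
    """
    results, failures = {}, {}
    for name in variables:
        try:
            results[name] = fit_one(data, name)
        except Exception as err:
            # Record the problem and keep going, so one bad transcript
            # does not stop a run over thousands of them
            failures[name] = str(err)
    return results, failures

def toy_fit(data, name):
    """Placeholder 'model': the mean of the named variable."""
    values = data[name]
    return sum(values) / len(values)

data = {"GENE1": [1.0, 2.0, 3.0], "GENE2": [4.0, 6.0]}
results, failures = fit_all(toy_fit, data, ["GENE1", "GENE2", "GENE3"])
print(results)
print(failures)
```

Collecting failures separately from results is a design choice: with many transcripts, some fits will likely fail or not converge, and it is easier to inspect those afterwards than to restart the whole run.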

We have discussed keeping the backend objects (a sort of pre-compiled mxModel) to skip the time spent sanity-checking the model in the front end for every analysis. This would be efficient with either MRI data, or say fitting the same model with a large number of SNPs along the genome, or your case of expression data. The advent of big data does make this more of a priority.

The localization step in Visscher et al, which uses pairwise identity-by-descent (IBD) sharing estimates for particular regions of the genome, could also be done in OpenMx using a definition variable for the IBD sharing (see this script of the aforementioned course). Also possible would be the slightly better (and statistically more powerful) mixture distribution approach, in which the probabilities that sib pairs are IBD 0, 1 or 2 are used as definition variables (see Eaves et al 1996). In either approach it is possible to test for joint linkage and association by using Identity by State (IBS) measures as a covariate on the phenotype. This would effectively separate the association with the measured locus from the linkage with nearby loci.
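In symbols, the two approaches can be sketched as follows, using the usual variance components linkage notation rather than anything copied from the linked script. The symbols are assumptions of this sketch: $\sigma^2_Q$ is the variance due to the locus being tested, $\sigma^2_A$ the residual additive genetic variance, $\sigma^2_C$ the common environment variance, $p_k$ the probability that a sib pair shares $k$ alleles IBD at the locus, and $\phi$ the bivariate normal density.

```latex
\begin{align*}
\text{pihat approach:}\quad
\operatorname{Cov}(y_1, y_2 \mid \hat{\pi})
  &= \hat{\pi}\,\sigma^2_Q + \tfrac{1}{2}\sigma^2_A + \sigma^2_C,
  \qquad \hat{\pi} = \tfrac{1}{2}p_1 + p_2 \\[6pt]
\text{mixture approach:}\quad
L(y_1, y_2)
  &= \sum_{k=0}^{2} p_k \, \phi\!\left(\mathbf{y};\, \boldsymbol{\mu},\, \boldsymbol{\Sigma}_k\right),
  \qquad
  \left[\boldsymbol{\Sigma}_k\right]_{12}
  = \tfrac{k}{2}\,\sigma^2_Q + \tfrac{1}{2}\sigma^2_A + \sigma^2_C
\end{align*}
```

In the first case $\hat{\pi}$ enters the expected covariance as a single definition variable per pair. In the second, $p_0$, $p_1$ and $p_2$ weight three pair likelihoods that differ only in the sib covariance.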

The relative merits of variance components estimation via structural equation modeling or via lmer() are worth discussing, but I haven't the time to fill up "this margin" with elements of that.

anav
Joined: 05/09/2013
The link to the script doesn't work.

The link to the script to use IBD sharing from the course doesn't work.

Also, maybe you could help me, as I am trying to replicate the analysis from the Grundberg paper in OpenMx. However, the main advantage of lmer() is that we can include random variables in the model. I have searched around, but I am not sure whether that is possible in OpenMx yet. Could you confirm whether random variables can be added to the models?

Thank you,

neale
Joined: 07/31/2009
Oops

Sorry about the link! Here's an IBD script, although it uses "pihat" instead of the more powerful mixture distribution approach. A number of loci are crudely included in the same datafile, and only one locus is tested. There are better ways to do this of course, but there are only so many hours in the day.

http://www.vipbg.vcu.edu/HGEN619_2013/twinAeqg.R

This is from the April 22nd session of this course: http://www.vipbg.vcu.edu/HGEN619_2013/HGEN619_OpenMx.shtml

Yes, covariates can be included. I'd say that an advantage of a regression-based method is likely to be speed, compared to numerical optimization in OpenMx. The disadvantage would seem to be that one can't specify models for linkage/association with latent variables.