Importing correlation matrices (beginner)

nickz
Joined: 10/02/2013

Dear all,

I recently started exploring the metaSEM package and worked my way through some of the examples found online.

However, I am new to R, OpenMx, and metaSEM and have some difficulty with importing my own data.

I would like to import a number of correlation matrices, corresponding sample sizes, and clusters, but I don't know how to structure the input.

Is it possible to input the data using a csv file? If so, how should the file be structured? If not, what is the best way to input a large number of correlation matrices?

In a different post I saw a similar solution for the standardized mean difference and its sampling variance (http://openmx.psyc.virginia.edu/thread/1521), but not for correlation matrices.

I hope you can point me in the right direction.

Thanks,

Nick

nickz
importing matrices and missing data

Dear all,

I came across another issue when trying to import correlation matrices from various studies. Only a few of the matrices are complete; some have missing variables and some have missing correlations. I would like to know how to handle missing correlations, as opposed to missing variables.

Missing variables need to be flagged on the diagonal as NA. The next example shows the second variable to be missing and flagged on the diagonal.

1 NA 0.39
NA NA NA
0.39 NA 1

However, I am not sure how to handle missing correlations. In the following example there are no missing variables, but the correlation between variables one and two is missing.

1 NA 0.31
NA 1 0.10
0.31 0.10 1

Inputting the correlation matrix above as is results in an error due to the missing correlation.

> fixed1 <- tssem1(data,n,method="FEM")
Error in eigen(x, only.values = TRUE) : infinite or missing values in 'x'
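
To see which studies will trigger this error before calling tssem1(), it can help to separate the two kinds of NA. The following is only a sketch in base R: find_missing_cors() is a hypothetical helper, not a metaSEM function, and it assumes the convention used in this thread that a missing variable has NA on the diagonal.

```r
# Hypothetical helper (not part of metaSEM): return the (row, col) positions of
# off-diagonal cells that are NA even though both variables are present,
# i.e. both have a non-NA diagonal entry.
find_missing_cors <- function(m) {
  present <- !is.na(diag(m))
  # both[i, j] is TRUE when variables i and j are both present
  both <- outer(present, present, "&")
  idx <- which(is.na(m) & both & lower.tri(m), arr.ind = TRUE)
  unname(idx)
}

# Matrix from the post with variable 2 flagged as missing on the diagonal
missing_var <- matrix(c(1,    NA, 0.39,
                        NA,   NA, NA,
                        0.39, NA, 1), nrow = 3, byrow = TRUE)

# Matrix from the post with all variables present but r(1,2) missing
missing_cor <- matrix(c(1,    NA,   0.31,
                        NA,   1,    0.10,
                        0.31, 0.10, 1), nrow = 3, byrow = TRUE)

find_missing_cors(missing_var)  # no rows: only a missing variable
find_missing_cors(missing_cor)  # one row (2, 1): a missing correlation
```

Applying the helper to every matrix in the list, for example with lapply(data, find_missing_cors), would show which studies contain missing correlations rather than missing variables.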

If I flag the first variable as missing, as above, the error goes away, but I lose the information in its other correlations: every correlation involving that variable is treated as a missing value.

data[[1]][1,1]<-NA

The following two matrices provide the same result.

NA NA 0.31
NA 1 0.10
0.31 0.10 1

NA NA NA
NA 1 0.10
NA 0.10 1

The problem is also mentioned here (Cheung and Chan, 2005, p.45):

"Missing correlation coefficients, rather than missing variables, are observed in MASEM sometimes. It is not easy to handle missing correlation coefficients in SEM."

Is there a way to handle missing correlations without disregarding them? Is there a suggested workaround?
Disregarding the correlations or the study's coefficients seems arbitrary, and I would rather not do that. Substituting an average value also seems a bit crude.

I've attached the dataset (fullmat_example1.dat) and code file (example1.r) to clarify the issue.

Thanks,

Nick

Reference:
Cheung, M.W.-L., & Chan, W. (2005). Meta-analytic structural equation modeling: A two-stage approach. Psychological Methods, 10, 40-64.

Attachments: example1.r (323 bytes), fullmat_example1.dat (214 bytes)

Mike Cheung
Joined: 10/08/2009

Hi Nick,

There are a couple of options. My preferred approach is to use a random-effects model (Cheung, 2013).
## Symmetric matrix for the random effects
random1 <- tssem1(data,n,method="REM")
summary(random1)

# OR diagonal matrix for the random effects
random2 <- tssem1(data,n,method="REM", RE.type="Diag")
summary(random2)

If you still prefer a fixed-effects model, you may fix the variance component of the random effects to zero. This is equivalent to the conventional GLS approach.
fixed4 <- tssem1(data,n,method="REM", RE.type="Zero")
summary(fixed4)

Mike

Cheung, M. W.-L. (2013, June 27). Fixed- and random-effects meta-analytic structural equation modeling: Examples and analyses in R. Behavior Research Methods. Advance online publication. doi:10.3758/s13428-013-0361-y

Nicola
Joined: 11/15/2013
Error when running both random effects options.

Hi Mike,

I have the same issue that Nick described. However, when I ran the two random-effects options, neither ran successfully; I get the errors below. My data set is attached, and my sample size matrix is below.

##MATRIX FOR SAMPLE SIZE OF EACH STUDY
TAMn = matrix (c(425, 73, 964, 225, 137, 243, 275, 102, 128, 362, 109), nrow=1, ncol=11, byrow = TRUE)

When I run with the first option, I get the following error.
## Symmetric matrix for the random effects
random1 <- tssem1(TAMdata,TAMn,method="REM")

Error in running mxModel:

When I run with the second option, I get the following error.
# OR diagonal matrix for the random effects
random2 <- tssem1(TAMdata,TAMn,method="REM", RE.type="Diag")
summary(random2)

Error in solve.default(t(X) %*% V_inv %*% X) :
Lapack routine dgesv: system is exactly singular: U[9,9] = 0

I have tried searching for these errors, but have not found how to resolve the issue. Thank you for your assistance.

Nicola

Attachment: TAMdata.dat (1.49 KB)

Mike Cheung

Hi Nicola,

There are 10 correlation coefficients in total in the model. Even when a diagonal matrix is imposed on the variance component, there are still 20 parameters (10 for the mean correlations and 10 for their variances). I don't think that 11 studies (with missing values) are sufficient to fit this model.
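
The parameter count above can be checked with a little arithmetic. This is a sketch only: count_rem_params() is an illustrative helper, not a metaSEM function, and it assumes five variables, which is what 10 distinct correlations implies.

```r
# Illustrative helper: count stage 1 random-effects parameters for p variables.
count_rem_params <- function(p, re_type = c("Diag", "Symm")) {
  re_type <- match.arg(re_type)
  k <- p * (p - 1) / 2  # number of distinct correlations (mean parameters)
  # Diagonal: one variance per correlation.
  # Symmetric: all variances and covariances among the k correlations.
  n_var <- if (re_type == "Diag") k else k * (k + 1) / 2
  c(means = k, variance_components = n_var, total = k + n_var)
}

count_rem_params(5, "Diag")  # 10 means + 10 variances = 20 parameters
count_rem_params(5, "Symm")  # 10 means + 55 variances/covariances = 65 parameters
```

With only 11 studies, several of them incomplete, even the 20 parameters of the diagonal structure are a lot to estimate, and the symmetric structure asks for far more.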

A fixed-effects model does not work in this example either, because there are no data for the correlation between X3 and X5.
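
A quick way to spot a pair of variables with no data at all is to count, cell by cell, how many studies report each correlation. The sketch below uses base R and made-up toy matrices (not the TAM data); count_coverage() is a hypothetical helper, not part of metaSEM.

```r
# Hypothetical helper: for each cell, count the studies with a non-NA value.
count_coverage <- function(cor_list) {
  Reduce(`+`, lapply(cor_list, function(m) (!is.na(m)) + 0L))
}

# Toy data, invented for illustration only
s1 <- matrix(c(1,   0.3, NA,
               0.3, 1,   NA,
               NA,  NA,  NA), nrow = 3, byrow = TRUE)  # variable 3 missing
s2 <- matrix(c(1,   NA, 0.4,
               NA,  NA, NA,
               0.4, NA, 1), nrow = 3, byrow = TRUE)    # variable 2 missing
s3 <- matrix(c(1,   0.2, 0.5,
               0.2, 1,   NA,
               0.5, NA,  1), nrow = 3, byrow = TRUE)   # r(2,3) missing

coverage <- count_coverage(list(s1, s2, s3))

# Pairs of variables for which no study reports a correlation
uncovered <- which(coverage == 0 & lower.tri(coverage), arr.ind = TRUE)
uncovered
```

In this toy example the only uncovered pair is variables 3 and 2, which is the same kind of gap as the X3 and X5 correlation mentioned above.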

Hope it helps.

Mike

Attachment: TAMdata.txt (17.65 KB)

Nicola
Thank you!

Hi Mike,

Thank you for your response. I really appreciate the time you have taken to answer my questions. I'm learning a lot. Instead of pooling the 11 studies together, I will split them into four groups that have no missing correlations and run a fixed-effects MASEM for each of the four groups. I assume this is reasonable. Thanks again for your help.
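
One possible way to find candidate groups is to group studies that share the same pattern of missing correlations. This is a rough sketch in base R with invented toy matrices; na_pattern() is a hypothetical helper, and whether a given group is usable still has to be checked by hand.

```r
# Hypothetical helper: encode which lower-triangle correlations are missing
# as a string of 0s (present) and 1s (missing).
na_pattern <- function(m) {
  paste(as.integer(is.na(m[lower.tri(m)])), collapse = "")
}

# Toy data, invented for illustration only
s1 <- matrix(c(1, 0.3, NA,  0.3, 1, NA,  NA, NA, NA), nrow = 3, byrow = TRUE)
s2 <- matrix(c(1, NA, 0.4,  NA, NA, NA,  0.4, NA, 1), nrow = 3, byrow = TRUE)
s3 <- matrix(c(1, 0.2, 0.5,  0.2, 1, NA,  0.5, NA, 1), nrow = 3, byrow = TRUE)
s4 <- matrix(c(1, 0.1, NA,  0.1, 1, NA,  NA, NA, NA), nrow = 3, byrow = TRUE)
cor_list <- list(s1, s2, s3, s4)

# Study indices grouped by missingness pattern
patterns <- vapply(cor_list, na_pattern, character(1))
groups <- split(seq_along(cor_list), patterns)
groups
```

Here studies 1 and 4 fall into the same group because they report the same correlations; the other two studies each form their own group.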

Kindest regards,
Nicola

nickz
reply

Hello Mike,

Thank you for your response. I really appreciate the time you have taken to answer my questions. I'm learning a lot.

I do think that the random-effects model is more appropriate, and it runs fine. However, I would like to cluster the studies into groups based on study-level moderators to see if I can explain at least part of the heterogeneity. The optional argument 'cluster' is disabled when method = 'REM' is chosen.

I suppose the solution would be to partition the sample and run the random-effects analysis twice (and assess the difference in the Q and I2 statistics). However, if I understand correctly, you lose the ability to use fit statistics to assess homogeneity, because they are not available in the stage 1 analysis under the random-effects model (Cheung, 2013). Why is this?

One of the advantages of a TSSEM is the ability to evaluate heterogeneity using multiple goodness-of-fit indices under the fixed effects model.

Is there a way to do this if missing correlations are present? I would like to run the fixed-effects model to evaluate whether or not the results are homogeneous (using goodness-of-fit indices) and then continue (and repeat stage 1) under the random-effects model if the results are not homogeneous.

Kind regards,

Nick

Cheung, M. W.-L. (2013, June 27). Fixed- and random-effects meta-analytic structural equation modeling: Examples and analyses in R. Behavior Research Methods. Advance online publication. doi:10.3758/s13428-013-0361-y

Mike Cheung

Hi, Nick.

Fit indices are available for the fixed-effects model because it uses a multiple-group SEM approach.

For the random-effects model, the effect sizes of each study are treated as data points with known sampling variances/covariances. Multivariate meta-analysis (meta() in the metaSEM package) is used to estimate the means and variance components of the effect sizes under a random-effects model. Suppose there are 3 effect sizes per study: there are then 3 means and 6 variances/covariances in total. Thus, the model is saturated, and the so-called fit indices do not work here.

If you really want to do it within the fixed-effects model, you may contact Suzanne Jak (http://www.uva.nl/over-de-uva/organisatie/medewerkers/content/j/a/s.jak/...). She has done some work on how to handle missing correlations within the TSSEM.

Mike

Jak, S., Roorda, D. L., Oort, F. J., & Koomen, H. M. Y. (2013). Meta-analytic structural equation modelling with missing correlations. Netherlands Journal of Psychology, 67, 132-139.

Mike Cheung

Hi Nick,

The best way is to import the correlation matrices, the sample sizes and the clusters separately. The sample sizes and the clusters are just vectors, e.g.,
n <- c(100, 200, 300, 100)
cluster <- c("A", "A", "B", "B")

Suppose there are two correlation matrices to import:
1.0 0.3 0.4
0.3 1.0 0.5
0.4 0.5 1.0

1.0 NA 0.4
NA NA NA
0.4 NA 1.0

metaSEM provides three functions to read correlation matrices.

readFullMat() reads the full matrices, e.g.,
1.0 0.3 0.4
0.3 1.0 0.5
0.4 0.5 1.0
1.0 NA 0.4
NA NA NA
0.4 NA 1.0

readLowTriMat() reads the lower triangle matrices, e.g.,
1.0
0.3 1.0
0.4 0.5 1.0
1.0
NA NA
0.4 NA 1.0

readStackVec() reads the vectors of the correlation matrices, e.g.,
1.0 0.3 0.4 1.0 0.5 1.0
1.0 NA 0.4 NA NA 1.0

You may refer to the examples in the manual by typing ?readFullMat
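
On the original question about csv input: the following base R sketch shows one way to build the list of matrices from a csv-style table with one row per study. The layout and column names (study, n, cluster, r21, r31, r32) are assumptions made for this example, and vec_to_cor() is a hypothetical helper, not a metaSEM function. The two rows reproduce the two matrices shown above.

```r
# Stand-in for a csv file; in practice use read.csv("yourfile.csv").
# Empty fields are read as NA.
csv_text <- "study,n,cluster,r21,r31,r32
S1,100,A,0.3,0.4,0.5
S2,200,A,,0.4,
"
studies <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hypothetical helper: rebuild a p x p correlation matrix from its
# lower-triangle correlations given column by column (r21, r31, r32, ...).
vec_to_cor <- function(r, p) {
  m <- diag(p)
  m[lower.tri(m)] <- r
  m[upper.tri(m)] <- t(m)[upper.tri(m)]
  # Flag a variable as missing (NA on the diagonal) when every correlation
  # involving it is NA, following the convention described in this thread.
  off <- m
  diag(off) <- NA
  all_missing <- rowSums(!is.na(off)) == 0
  diag(m)[all_missing] <- NA
  m
}

r_mat <- as.matrix(studies[, c("r21", "r31", "r32")])
cor_list <- lapply(seq_len(nrow(r_mat)), function(i) vec_to_cor(r_mat[i, ], p = 3))
names(cor_list) <- studies$study

# Sample sizes and clusters are plain vectors, as described above
n <- studies$n
cluster <- studies$cluster

cor_list
```

The resulting list and the n vector have the same shape as the data and n objects passed to tssem1() elsewhere in this thread. Unlike this sketch, the input for readStackVec() shown above includes the diagonal elements.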

Mike

nickz

Hi Mike,

Thank you for taking the time to point me in the right direction and also for your quick response.

I can't believe I overlooked the manual (didn't know how to find it...).

I managed to import my own data and am now working on figuring out how to import a large number of (fairly large) matrices efficiently. I'm sure I'll manage with the directions you gave above.

Thanks again,

Nick