Revision of Ideas for simplified manual for beginners (SMB) from Tue, 03/02/2010 - 11:26

The revisions let you track differences between multiple versions of a post.

Some thoughts on how to structure such a manual:

1. Getting started in R - to be written
To include: Explain how to get R loaded, set up directory, brief explanation of console vs editor screens, how to escape if program in loop, what the little + signs in left hand column mean, how to comment; Most could be taken from any introduction to R; we can refer to this, but nevertheless, I think a couple of paragraphs of the essentials should be included in SMB

2. Generating simulated data in R - see below
Based on Optimization script p 11, but heavily commented.
User-defined variables to have names such as mydata, mylabel.
This would avoid difficulties for beginners in working out what bits of a script are variables and what are functions.
Also revised script shows how to compute covariances, SDs etc in R, how to plot scatterplot of data, and how to write data to a file and read it back.

3. Matrix operations in R - to be written
This could be lifted from existing sources. Should be kept pretty simple at this stage, but need to explain basic operations of matrix multiplication, transpose, inverse,etc. An appendix similar to that in Mx Manual, with exercises for doing matrix operations, would be useful.

4. Introduction to optimization and maximum likelihood - partly done - but got stuck: see below!
First encounter with OpenMx, in very simplified analysis based on script on p 12-14, but computing likelihoods for fixed values of correlation from .1 to .9, so these can be plotted.
User encouraged to repeat analysis with different sample size for simulated data, to show how this affects likelihood estimation. Although this introduces user to OpenMx, main aim here is to get them to understand likelihood estimation.

5. Introduction to SEM using RAM path method - to be done
Could be based on material on p 4, but with more explanation (based on excellent account in Mx manual), and I think better to have example where one-factor vs two-factor models are compared, and user encouraged to modify data to see how this affects likelihood estimation.

6. Introduction to SEM approach to analysis of twin data: univariate ACE model. - to be done
Should be able to use material from Mx manual here. Also worth looking at good didactic material on Shaun Purcell's website. Include path tracing rules to explain how formulae arrived at.

7. Run ACE model in OpenMx - to be done
Include how to read in data that start in SPSS or xls format. Plot scatterplots for MZ and DZ pairs.

Here is my simplified version of script re point 2. NB I am totally new to R and so would welcome suggestions for improving this.

#---------------------------------------------------------------------------------
# Modified version of generate correlated data Script for real beginners (based on script on p 11)
#---------------------------------------------------------------------------------

# Simulate data on X and Y from 50 cases, with correlation of .5 between them
# (NB using smaller N than in original example, so user can see mismatch between obtained and predicted)

require(MASS) # MASS is a R package that you need for generating multivariate normal data

set.seed(2) # A seed is a value used in creating random numbers;
# You don't need to understand how it works
# Keep the seed the same if you want the same random numbers every time you run
# Change the seed to any other number to run the script and get different random numbers

rs=.5 # Correlation between variables
mydata=mvrnorm(50, # Create a matrix of multivariate random normal deviates, called mydata, and specify number of XY pairs to generate
c(0,0), # Mean values for X and Y
# NB. In R, c denotes concatenate
matrix(c(1,rs,rs,1),2,2)) # Covariance matrix to be simulated, 2 rows, 2 columns, i.e.
# [1 .5
# .5 1]

mylabels=c('X','Y') # Put labels for the two variables in a vector
dimnames(mydata)=list(NULL,mylabels) # Allocate the labels to mydata (our created dataset)
# Just accept that NULL is needed here: too complicated to explain
summary(mydata) # Print means for mydata
print('Covariances')
cov(mydata) # Print covariance for mydata
print('Correlations')
cor(mydata) # Print correlation for mydata
print('Note that actual values may differ from specified value of .5, especially with small sample size')
print('SDs')
sd(mydata) # Print SD for mydata
print('Note that the correlation = covariance/(SD_X * SD_Y)')

plot(mydata) # Plots a scatterplot of X vs Y
write.table(mydata,"myfile") # Saves a copy of mydata in your R directory

# If you want to re-read your data another time, you can use a command such as
# newdata=read.table("myfile"); this will create a matrix called newdata containing the data

#--------------------------------------------------------------------------------------

Script for point 4: attempt to write script to show different log likelihood for simulated correlated data, relative to different levels of covariance. Stuck because can't get value of expected covariance to cycle through values as desired. Can someone advise as to what I am doing wrong?

#-----------------------------------------------------------------------------------
# Modified version of 1.2.2 Optimization Script for real beginners
#---------------------------------------------------------------------------------
# Based on OpenMxUserGuide, p 13-14.
# But with expCov values Fixed. - Idea is to step through different values, but can't work out
# how to do this.

require(OpenMx) # This step needed because we are going to use OpenMx routines

bivCorModel=mxModel("Biv corr", #Name given by user to the model
#----------------------------------------------------------------------------------
#create a matrix called expMean with values [0 0 corresponding to expected means
#----------------------------------------------------------------------------------

mxMatrix(type="Full", # for a Full matrix, all cells are filled
nrow=1,ncol=2,
free=TRUE, # if free=TRUE, then values are estimated, rather than fixed at start values
values=c(0,0), # starting values for matrix
name="expMean"), #create a matrix called expMean with values [0 0]
#corresponding to expected means
#--------------------------------------------------------------------------------------------------
#create a 2 x 2 matrix called expCov with values [1 mycor mycor 1] corresponding to expected covariances
#--------------------------------------------------------------------------------------------------

mxMatrix(type="Full",nrow=2,ncol=2,free=FALSE,values=c(1, mycor,mycor,1),name="temp"),
mxAlgebra(expression=temp, name="expCov"), # here just reassign temp to expCov
mxData(observed=mydata,
type="raw"), # use the raw data created in our simulation
mxFIMLObjective( # Function used to obtain likelihood of data
covariance="expCov", # FIML stands for full information maximum likelihood
means="expMean", # FIML needs to be told which values to use for covariance and means
dimnames=mylabels)
)
#------------------------------------------------------------------------------------------------
# End of model specification section; this just sets up the model, it does not run it
#------------------------------------------------------------------------------------------------
myLL=matrix(0,nrow=9,ncol=4) # set up a matrix with 10 rows to hold a) correlation; b) LL values; c) AIC, d)BIC
for (myx in c(1:9))
{
assign("mycor",myx/10)
myLL[myx,1]=mycor
bivCorFit=mxRun(bivCorModel) # Output of running the model is written to bivCovFit
EM = bivCorFit[['expMean']]@values
EC = bivCorFit[['expCov']]
myLL[myx,2] = mxEval(objective,bivCorFit); #Log likelihood

summary(bivCorFit) #How do I access AIC and BIC values, to add to myLL table?
}
print(myLL)