Revision of Ideas for simplified manual for beginners (SMB) from Tue, 03/02/2010 - 03:17

The revisions let you track differences between multiple versions of a post.

Some thoughts on how to structure such a manual:

1. Getting started in R - to be written
To include: Explain how to get R loaded, set up directory, brief explanation of console vs editor screens, how to escape if program in loop, what the little + signs in left hand column mean, how to comment; Most could be taken from any introduction to R; we can refer to this, but nevertheless, I think a couple of paragraphs of the essentials should be included in SMB

2. Generating simulated data in R - see below
Based on Optimization script p 11, but heavily commented.
User-defined variables to have names such as mydata, mylabel.
This would avoid difficulties for beginners in working out what bits of a script are variables and what are functions.
Also revised script shows how to compute covariances, SDs etc in R, how to plot scatterplot of data, and how to write data to a file and read it back.

3. Matrix operations in R - to be written
This could be lifted from existing sources. Should be kept pretty simple at this stage, but need to explain basic operations of matrix multiplication, transpose, inverse,etc. An appendix similar to that in Mx Manual, with exercises for doing matrix operations, would be useful.

4. Introduction to optimization and maximum likelihood - partly done
First encounter with OpenMx, in very simplified analysis based on script on p 12-14, but computing likelihoods for fixed values of correlation from .1 to .9, so these can be plotted.
User encouraged to repeat analysis with different sample size for simulated data, to show how this affects likelihood estimation. Although this introduces user to OpenMx, main aim here is to get them to understand likelihood estimation.

5. Introduction to SEM using RAM path method - to be done
Could be based on material on p 4, but with more explanation (based on excellent account in Mx manual), and I think better to have example where one-factor vs two-factor models are compared, and user encouraged to modify data to see how this affects likelihood estimation.

6. Introduction to SEM approach to analysis of twin data: univariate ACE model. - to be done
Should be able to use material from Mx manual here. Also worth looking at good didactic material on Shaun Purcell's website. Include path tracing rules to explain how formulae arrived at.

7. Run ACE model in OpenMx - to be done
Include how to read in data that start in SPSS or xls format. Plot scatterplots for MZ and DZ pairs.

Here is my simplified version of script re point 2. NB I am totally new to R and so would welcome suggestions for improving this.

#---------------------------------------------------------------------------------
# Modified version of generate correlated data Script for real beginners (based on script on p 11)
#---------------------------------------------------------------------------------

# Simulate data on X and Y from 50 cases, with correlation of .5 between them
# (NB using smaller N than in original example, so user can see mismatch between obtained and predicted)

require(MASS) # MASS is a R package that you need for generating multivariate normal data

set.seed(2) # A seed is a value used in creating random numbers;
# You don't need to understand how it works
# Keep the seed the same if you want the same random numbers every time you run
# Change the seed to any other number to run the script and get different random numbers

rs=.5 # Correlation between variables
mydata=mvrnorm(50, # Create a matrix of multivariate random normal deviates, called mydata, and specify number of XY pairs to generate
c(0,0), # Mean values for X and Y
# NB. In R, c denotes columns of a matrix
matrix(c(1,rs,rs,1),2,2)) # Covariance matrix to be simulated, 2 rows, 2 columns, i.e.
# [1 .5
# .5 1]

mylabels=c('X','Y') # Put labels for the two variables in a vector
dimnames(mydata)=list(NULL,mylabels) # Allocate the labels to mydata (our created dataset)
# Just accept that NULL is needed here: too complicated to explain
summary(mydata) # Print means for mydata
print('Covariances')
cov(mydata) # Print covariance for mydata
print('Correlations')
cor(mydata) # Print correlation for mydata
print('Note that actual values may differ from specified value of .5, especially with small sample size')
print('SDs')
sd(mydata) # Print SD for mydata
print('Note that the correlation = covariance/(SD_X * SD_Y)')

plot(mydata) # Plots a scatterplot of X vs Y
write.table(mydata,"myfile") # Saves a copy of mydata in your R directory

# If you want to re-read your data another time, you can use a command such as
# newdata=read.table("myfile"); this will create a matrix called newdata containing the data

#--------------------------------------------------------------------------------------