## select sample

3 replies
Joined: 11/24/2013

I have a real data set of 6148 students, with six observed variables and two latent variables. I calculated the covariance matrix from the data for all 6148 students. Now I want to select data for 100 students whose covariance matrix is the same as that of the full population. If you could help me with this, I would be very grateful.

Joined: 07/31/2009

Any randomly selected sub-sample will have a covariance matrix that is "pretty close" to the population covariance matrix. The two will differ only because of sampling variability.
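A rough way to see this is to simulate it. The sketch below assumes a hypothetical population of 6148 rows and 6 variables drawn from a multivariate normal with an arbitrary covariance structure (not the real data), repeatedly draws random sub-samples of 100 without replacement, and compares them with the full-data covariance matrix. The helper `subsample_cov` is an illustrative name, not part of any library.

```python
import numpy as np

# Hypothetical stand-in for the real data: 6148 students x 6 observed variables.
# The covariance structure (unit variances, 0.5 correlations) is an arbitrary assumption.
rng = np.random.default_rng(2013)
n_pop, n_vars, n_sub = 6148, 6, 100
true_sigma = 0.5 * np.eye(n_vars) + 0.5
population = rng.multivariate_normal(np.zeros(n_vars), true_sigma, size=n_pop)

# "Population" covariance of the full data set (ddof=1, numpy's default).
pop_cov = np.cov(population, rowvar=False)

def subsample_cov(data, size, rng):
    """Covariance matrix of a simple random sample of rows, drawn without replacement."""
    idx = rng.choice(data.shape[0], size=size, replace=False)
    return np.cov(data[idx], rowvar=False)

# Repeat the draw many times to see how sub-sample covariances scatter around pop_cov.
draws = np.array([subsample_cov(population, n_sub, rng) for _ in range(2000)])
mean_cov = draws.mean(axis=0)   # averages out to pop_cov: differences are pure sampling noise
sd_cov = draws.std(axis=0)      # typical sampling error of each covariance element

print("max |mean sub-sample cov - pop cov|:", np.abs(mean_cov - pop_cov).max())
print("typical SD of a variance element:", sd_cov.diagonal().mean())
```

The average of the sub-sample matrices sits very close to the population matrix, while any single draw of 100 deviates from it by roughly the standard deviation printed on the second line.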

What do I mean by "pretty close"? I mean that the sampling variability of the covariance matrix is not large. A familiar result from undergraduate statistics is that the sampling distribution of the mean has mean μ and variance v/N, where μ is the population mean, v is the population variance, and N is the sample size. Similarly, for normally distributed data, the sampling distribution of the sample variance (computed with divisor N−1) has mean v and variance 2v²/(N−1). Dividing both by v shows that if 2v/(N−1) is less than 1.0, the variance of the sampling distribution of the variance is smaller than the population variance. Analogous results hold for several variables: under multivariate normality, (N−1) times the sample covariance matrix follows a Wishart distribution, and each sample covariance s_ij has variance (σ_ij² + σ_ii σ_jj)/(N−1).
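These formulas can be checked by Monte Carlo simulation. The sketch below assumes normally distributed scores, a sample size of N = 100, and a hypothetical 2×2 population covariance matrix chosen only for illustration; it compares the simulated variances of the sample mean, sample variance, and sample covariance with the theoretical values.

```python
import numpy as np

# Monte Carlo check of the sampling-variance formulas, assuming normal data.
rng = np.random.default_rng(0)
N, reps = 100, 10000
sigma = np.array([[4.0, 1.2],
                  [1.2, 9.0]])   # hypothetical population covariance of 2 variables
v = sigma[0, 0]                  # population variance of the first variable

samples = rng.multivariate_normal([0.0, 0.0], sigma, size=(reps, N))  # shape (reps, N, 2)
means = samples[:, :, 0].mean(axis=1)
variances = samples[:, :, 0].var(axis=1, ddof=1)   # sample variance with divisor N-1
xc = samples - samples.mean(axis=1, keepdims=True)
covs01 = (xc[:, :, 0] * xc[:, :, 1]).sum(axis=1) / (N - 1)  # sample covariance s_01

# Theory vs simulation
theory_mean = v / N
theory_var = 2 * v * v / (N - 1)
# Multivariate analogue (Wishart): Var(s_ij) = (sigma_ij^2 + sigma_ii*sigma_jj) / (N-1)
theory_cov = (sigma[0, 1] ** 2 + sigma[0, 0] * sigma[1, 1]) / (N - 1)

print("Var(mean):     theory", theory_mean, "sim", means.var())
print("Var(variance): theory", theory_var, "sim", variances.var())
print("Var(cov):      theory", theory_cov, "sim", covs01.var())
```

Each simulated value should land within a few percent of its theoretical counterpart; with non-normal data the variance formulas would need a kurtosis correction.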

The take-home message is that the covariance matrix of a random sample is almost always close enough to the population covariance matrix. Since you already have the population covariance matrix, you can take any random sample of 100 students, and its covariance matrix should be sufficiently close to the population covariance matrix.
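In practice, drawing the sample takes only a few lines. The sketch below assumes the scores are held in a NumPy array with one row per student (here replaced by hypothetical random data) and uses illustrative helper names `draw_subsample` and `standardized_gaps`. It keeps the selected row indices so the same students can be pulled from the original file, fixes the seed so the draw is reproducible, and expresses each covariance discrepancy in approximate standard-error units using the Wishart-based formula (which assumes multivariate normality and ignores the finite-population correction).

```python
import numpy as np

def draw_subsample(data, size=100, seed=None):
    """Simple random sample of `size` rows without replacement.
    Returns the selected row indices and the sub-sample covariance matrix."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(data.shape[0], size=size, replace=False)
    return idx, np.cov(data[idx], rowvar=False)

def standardized_gaps(sub_cov, full_cov, size):
    """|sub - full| divided by an approximate standard error of each element,
    using Var(s_ij) ~ (sigma_ij**2 + sigma_ii*sigma_jj) / (size - 1).
    Assumes multivariate normality; ignores the finite-population correction."""
    d = np.diag(full_cov)
    se = np.sqrt((full_cov ** 2 + np.outer(d, d)) / (size - 1))
    return np.abs(sub_cov - full_cov) / se

# Hypothetical data standing in for the 6148 x 6 matrix of observed scores.
scores = np.random.default_rng(1).normal(size=(6148, 6))
full_cov = np.cov(scores, rowvar=False)

idx, sub_cov = draw_subsample(scores, size=100, seed=42)  # fixed seed: reproducible draw
gaps = standardized_gaps(sub_cov, full_cov, size=100)
print("largest gap in standard-error units:", gaps.max().round(2))
```

Gaps of one or two standard errors are what ordinary sampling variability produces; re-drawing until the gaps are tiny would no longer give a simple random sample.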

Joined: 11/24/2013