Wed, 10/20/2010 - 14:10

Here's an example growth mixture model from the NIDA workshop this week. I've copied the two functions I wrote and placed them in the text of this post for further discussion. Everything else in the code has been added to the GMM demo and documentation pages for the next binary release.

The first function calculates posterior class probabilities for a mixture model based on the following equation:

P(class=k|i, p_k, L_k) = p_k * L_ik / sum_k=1^m (p_k * L_ik)

where classes 1 through M are indexed by k, persons are indexed by i, p_k is the model estimated probability of being in class k and L_k is a vector of person specific likelihood for class k. Posterior class probabilities are given by Bayes theorum, with the equation above coming from Ramaswamy, Desarbo, Reibstein, and Robinson, 1993. In words, the individual class probabilities are the ratio of that person's raw likelihood for any one class divided by the sum of their likelihoods across all classes, each weighted by the model's estimated class probabilities for the sample.

The function below takes two arguments: an MxModel that was used to fit a mixture model, and the name of the matrix or algebra in that model that defines the class probabilities. It is assumed that the submodels in the provided model are the models for each of the classes that use FIML or RowObjective and return a vector of likelihoods. For instance, if you fit a mixture model with two classes, your structure should be that some object (''gmm'') that contains a model ("GrowthMixtureModel") with two submodels ("class1" and "class2") representing the models for each class. There should be a matrix in the parent model ("GrowthMixtureModel") that contains class probabilities.

indClassProbs <- function(model, classProbs, round=NA){ # this function takes a mixture model in OpenMx # and returns the posterior class probabilities # using Bayes rule, individual person-class likelihoods # and the model class probability matrix, as described in # Ramaswamy, Desarbo, Reibstein, and Robinson, 1993 cp <- mxEval(classProbs, model) cp2 <- as.vector(cp) cps <- diag(length(cp2)) diag(cps) <- cp2 subs <- model@submodels if(min(dim(cp))!=1)stop("Class probabilities matrix must be a row or column vector.") if(max(dim(cp))==1)stop("Class probabilities matrix must contain two or more classes.") of <- function(num){ return(mxEval(objective, subs[[num]])) } rl <- sapply(1:length(names(subs)), of) raw <- (rl%*%cps) tot <- 1/apply(raw, 1, sum) div <- matrix(rep(tot, length(cp2)), ncol=length(cp2)) icp <- raw * div if (is.numeric(round)){icp <- round(icp, round)} return(icp) } The output of this function is a persons x classes matrix of posterior class probabilities. Next is a function for entropy (again, formula cribbed from Ramaswamy, Desarbo, Reibstein, and Robinson, 1993), which takes a persons x classes matrix of posterior class probabilities and calculates entropy. That equation is: 1-sum_i=1^n sum_k=1^M (-P_ik * ln(P_ik))/(n*ln(k)) And the code: <code> entropy <- function(classProbs){ # this function takes a matrix of class probabilities from a # mixture model with people in rows and classes in columns # and returns the entropy statistic, as given by # Ramaswamy, Desarbo, Reibstein, and Robinson 1993 n <- dim(classProbs)[1] k <- dim(classProbs)[2] e <- 1-sum(-classProbs*log(classProbs))/(n*log(k)) return(e) } Entropy is scaled between zero and one, and I don't believe there are consistent criterion for what constitutes "good" entropy, but maybe some users can chime in. I have questions to prompt discussion: -Does everything look correct? -What other mixture-specific stats do we want? -Is the assumed structure for mixture models tenable? -Is there a better way to make math in forum posts? The equations above are pretty unreadable.

Attachment | Size |
---|---|

gmm.R | 6.77 KB |

Answering the last question: Maybe your forum software supports inline LaTex? If not, we can use a LaTeX generator like http://www.codecogs.com/latex/eqneditor.php to generate formulas and attach them as a picture. For example, I can simply add the formula as linked image (I allowed to alter the formula a little bit to avoid the double usage of 'k'):

This image is generated by the following link:

http://latex.codecogs.com/gif.latex?P(class=k|i,%20p_k,%20L_k)%20=%20\frac{%20\left%20(%20p_k%20\cdot%20L_{ik}%20\right%20)}{\sum_{j=1}^m%20(p_j%20\cdot%20L_{ij})

The drawbacks with this method are, that this image is now rendered every time someone sees this post and whenever they decide to stop the service, all image links will be broken. So it might be more helpful to generate the picture, download it, and attach it to the post. Nevertheless, inline LaTeX in the forum would be awesome!

NIDA workshop to here?

http://openmx.psyc.virginia.edu/workshops

Thanks for this contribution. It works as advertised.

A small remark: variable t1 in line 140 is uninitialized.

Thanks! t1 should be icp[,1]. I'll replace the file.

Does Ryne's function 'gmm.R' need updated? It used to work with older version of OpenMx, but it doesn't anymore with v. 1.2.2-1986.

Here is the error message I now get:

> indClassProbs(gmm.fit)

Error in mxEval(classProbs, model) :

'expression' argument is mandatory in call to mxEval function

Oops. There was insufficient magic to the internal mxEval() call. Don't forget that the function indClassProbs requires two arguments, and the second argument is the name of the classProbs MxMatrix or MxAlgebra in your model. Here's an updated version of the script: