Fri, 07/26/2013 - 01:50

I am wondering if someone can answer a nagging question I have about the best way to deal with outliers in heritability modeling.

It has been suggested to me that, working with z scores, anyone outside the cut-off of +/- 2.5 should have their score altered to the next most extreme value (e.g., scores of 2.5, 2.9, 3.2 would become 2.5, 2.6, 2.7). I am concerned that this will really alter the heritability results. Any thoughts?
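The recoding described above (sometimes called a rank-preserving Winsorizing) can be sketched roughly as follows. This is just an illustration of the rule as I understand it, not a recommendation; the function name, the 0.1 step, and the 2.5 cut-off are assumptions from the example:

```python
import numpy as np

def recode_tail(z, cutoff=2.5, step=0.1):
    """Replace z-scores beyond +/- cutoff with cutoff + step,
    cutoff + 2*step, ..., preserving their rank order.
    (Illustrative sketch of the rule described in the question.)"""
    z = np.asarray(z, dtype=float)
    out = z.copy()
    for sign in (1.0, -1.0):
        idx = np.where(sign * z > cutoff)[0]        # offenders in this tail
        order = idx[np.argsort(sign * z[idx])]      # mildest offender first
        out[order] = sign * (cutoff + step * (np.arange(len(order)) + 1))
    return out

# e.g. recode_tail([1.0, 2.5, 2.9, 3.2]) -> [1.0, 2.5, 2.6, 2.7]
```

Note that a score exactly at the cut-off (2.5) is left alone; only scores strictly beyond it are laddered inward.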

Is it better simply to remove any outliers from the analyses?

And, if the latter, is the standard +/- 2.5 standard deviations the best cut-off to use, or is there a better criterion?

Thank you so much

Karen

IMO it is best to figure out why an outlier is an outlier. Can coding errors and the like be ruled out? Is it possible that the scale isn't normally distributed? And if not, why not? Is there a rationale for transformation (other than "well, they look more normal when I use this transformation")? In the behavioral sciences, measures often derive from summing a set of items, and floor or ceiling effects (typically not both) can affect "outlieriness".

If reasonably obvious explanations can be eliminated, I still don't favor rules of thumb for deleting or (perhaps worse) adulterating the scores. What counts as an outlier in z-score terms depends heavily on sample size. A QQ plot can help to illustrate how bad the outlier is.

I hope these musings help.