Fixelcfestats not robust to outliers?

rgrazioplene · October 29, 2018, 1:33pm

Hello MRtrix community, I am encountering an issue in which the results of fixelcfestats appear strongly driven by a single outlier (1 observation out of 159). The FWE-corrected results are highly significant when that observation is included, but nonsignificant when it is removed. I remember seeing a paper or conference presentation at some point in the last year that mentioned this problem, but I can’t seem to track it down. So, my question is: am I correct in remembering that this problem has been addressed somewhere in the literature? If so, does anyone remember where? If not, has anyone else encountered this issue, and are there any ways of avoiding it (other than post-hoc checks)?

Thanks in advance,
Rachael

jdtournier · October 31, 2018, 9:01pm

I have to admit that I’m not all that familiar with the statistics involved, but I would naively have assumed that the permutation testing would have somehow dealt with it. However, a quick search does pull up a decent explanation of how an outlier could inflate the family-wise error (FWE) rate, on Twitter of all places. Based on that little video, if you’re doing a regression, and your outlier is at the end of the range in your outcome measure and is also an outlier in terms of its AFD, then this could lead to artificially low p-values. Have a look at the video and see if I’ve got that right…
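To make the mechanism concrete, here is a minimal single-variable sketch in Python. It is not the fixelcfestats pipeline (no CFE, no max-statistic correction); the sample size, the outlier value of 8 and the helper name `perm_pvalue` are all assumptions chosen for illustration. The metric is built to have zero in-sample correlation with the predictor, so any association that appears after adding one extra subject is due to that subject alone.

```python
import numpy as np

def perm_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(x, y)[0, 1])
    exceed = 0
    for _ in range(n_perm):
        # Shuffling y breaks the subject-wise pairing with x.
        permuted = abs(np.corrcoef(x, rng.permutation(y))[0, 1])
        exceed += int(permuted >= observed)
    return (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(42)
n = 158
x = rng.standard_normal(n)  # hypothetical predictor

# Hypothetical fixel metric: noise with the linear effect of x regressed out,
# so its in-sample correlation with x is zero (up to rounding).
noise = rng.standard_normal(n)
design = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(design, noise, rcond=None)[0]
y = noise - design @ beta

# Add one subject that is extreme on both variables (the value 8 is arbitrary).
x_with = np.append(x, 8.0)
y_with = np.append(y, 8.0)

p_without = perm_pvalue(x, y)
p_with = perm_pvalue(x_with, y_with)
print(f"p without outlier: {p_without:.3f}")
print(f"p with outlier:    {p_with:.4f}")
```

In this toy setting the permutation test does not rescue the result: only the few shuffles that happen to keep the two extreme values paired together produce a statistic as large as the observed one, so the p-value stays small.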

As to whether there are ways of avoiding this: not as far as I know in the current software. Maybe others have solutions that they’re working on?

rgrazioplene · November 5, 2018, 3:51pm

Fascinating: this bivariate outlier situation is exactly what was going on in my data, which was obvious as soon as I started making scatterplots of the results. The troublesome case wasn’t an exclusion-worthy outlier in the predictor variable alone, which is why I went ahead and included it in the whole-brain regressions. Frustrating to find out that some seemingly cool results weren’t so strong after all, but a good lesson in why it’s important to visualize the data post hoc using values from FWE-significant clusters.

Looks like Dr. Mumford did follow up on suggesting a fix for this issue:

The goal of this video is a simple fix that will at least help you avoid the scenario that inflates your Type I errors. It will only take a few minutes….look at the distribution of your explanatory variables! Are there outliers? If so, run your model with and without those subjects and report both results. More about why this works and why modeling strategies such as robust regression and flame 1 may miss these types of outliers in the video.

Link to the video explanation
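The "with and without" check could be scripted along these lines. This is only a sketch: the modified z-score rule (median and MAD, cutoff 3.5) is one common convention rather than anything prescribed in the video, and the function names, subject IDs and scores are hypothetical.

```python
import numpy as np

def flag_outliers(values, threshold=3.5):
    """Boolean mask of values whose modified z-score exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # At least half the values equal the median; the rule is undefined,
        # so flag nothing and leave the decision to visual inspection.
        return np.zeros(values.shape, dtype=bool)
    modified_z = 0.6745 * (values - median) / mad
    return np.abs(modified_z) > threshold

def split_subjects(subject_ids, values, threshold=3.5):
    """Return (all subjects, subjects with flagged outliers removed)."""
    mask = flag_outliers(values, threshold)
    all_subjects = list(subject_ids)
    trimmed = [s for s, is_out in zip(all_subjects, mask) if not is_out]
    return all_subjects, trimmed

# Hypothetical explanatory variable for seven subjects.
ids = ["sub-01", "sub-02", "sub-03", "sub-04", "sub-05", "sub-06", "sub-07"]
scores = [1.0, 1.2, 0.9, 1.1, 1.05, 0.95, 6.0]

full_list, trimmed_list = split_subjects(ids, scores)
print("flagged:", sorted(set(full_list) - set(trimmed_list)))
```

The two lists would then feed two separate runs of the same model, with both sets of results reported. Note that a univariate rule like this can miss a case that only stands out bivariately, as happened here, so it complements scatterplots rather than replacing them.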

rsmith · November 25, 2018, 5:55am

jdtournier wrote: “As to whether there are ways of avoiding this: not as far as I know in the current software. Maybe others have solutions that they’re working on?”

The solution I have on the way is only applicable when outliers have been explicitly labelled beforehand; those observations are then excluded from the model on a per-fixel basis. It’s intended for dealing with brain cropping / inconsistent fixel correspondence rather than “outliers” more generically.
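As a rough illustration of what per-fixel exclusion means in general, the sketch below fits an ordinary least-squares model separately for each fixel, using only the subjects marked valid for that fixel. It is not the MRtrix3 implementation; the function name `fit_per_fixel`, the array layout and the toy data are assumptions.

```python
import numpy as np

def fit_per_fixel(data, design, valid):
    """OLS per fixel using only the subjects marked valid for that fixel.

    data:   (n_subjects, n_fixels) fixel metric values
    design: (n_subjects, n_regressors) design matrix
    valid:  (n_subjects, n_fixels) bool, False = pre-labelled exclusion
    Returns betas of shape (n_regressors, n_fixels); NaN where a fixel has
    fewer valid subjects than regressors.
    """
    n_regressors = design.shape[1]
    n_fixels = data.shape[1]
    betas = np.full((n_regressors, n_fixels), np.nan)
    for f in range(n_fixels):
        keep = valid[:, f]
        if keep.sum() < n_regressors:
            continue  # not enough observations to fit this fixel
        betas[:, f] = np.linalg.lstsq(design[keep], data[keep, f], rcond=None)[0]
    return betas

# Toy data: 20 subjects, 3 fixels, noiseless so the true betas are recoverable.
rng = np.random.default_rng(0)
n_subjects, n_fixels = 20, 3
design = np.column_stack([np.ones(n_subjects), rng.standard_normal(n_subjects)])
true_betas = np.array([[1.0, 2.0, 3.0], [0.5, -0.5, 0.0]])
data = design @ true_betas

# Subject 4 has a corrupted value in fixel 1 (e.g. cropped field of view),
# and has been labelled as such in advance.
valid = np.ones((n_subjects, n_fixels), dtype=bool)
data[4, 1] = 100.0
valid[4, 1] = False

betas = fit_per_fixel(data, design, valid)
naive = np.linalg.lstsq(design, data, rcond=None)[0]  # ignores the labels
print(np.round(betas, 3))
```

The key limitation carries over from the description above: the exclusions must be known in advance, so this does nothing for an unlabelled high-leverage subject like the one in the original question.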