DWI quality check method, handling DWI volumes with artifacts

Dear experts,

We have artifacts in some of our DWI volumes. Most of them are “spikes” that cause a periodic modulation of pixel intensity, or even an almost complete signal dropout. Some of them are caused by rapid subject movement and cannot be corrected by volume-by-volume realignment.

What do you generally recommend for artifact detection and quality checking of DWI data?

We currently assess data quality by visual inspection:
We examine the data slice by slice, and when we spot a volume in which some slice has an artifact, we exclude that whole volume from subsequent FSL DWI preprocessing. Since the FSL and MRtrix tools process data volume-wise, it is not possible to exclude only that particular slice.
This approach is quite time-consuming, subjective, and possibly not sensitive enough for less apparent artifacts. It also does not provide any quantitative measure of data quality.

Or do you think that current state-of-the-art tools such as eddy, dwi2tensor (with the iteratively reweighted least squares method), dwi2response and dwi2fod are sufficiently robust to such artifacts (which would probably manifest themselves as outliers in the respective model fit), so that manually excluding the affected volume before processing is not necessary?

I think you’ll find that the most recent version of FSL’s EDDY will do outlier rejection for you - it’ll replace your corrupted slices with their predictions. Otherwise, CSD (dwi2fod) is inherently so strongly constrained that it’s generally pretty robust to the odd outlier. This is true especially for lower SNR data, when it becomes inherently more difficult to distinguish artefact from noise.

On the other hand, I expect the tensor fit (dwi2tensor) to be strongly affected by outliers, unless expressly dealt with (the log transform will tend to exaggerate signal dropout problems).
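To illustrate the point about the log transform, here is a minimal sketch with made-up numbers (not the dwi2tensor code): fitting a constant to repeated measurements by least squares gives the arithmetic mean in the linear domain, but the geometric mean when the fit is done on the log of the signal. A single dropout pulls the log-domain estimate down much further.

```python
import math

# Hypothetical repeated measurements of the same signal; the last one is a dropout.
signals = [100.0, 100.0, 100.0, 100.0, 1.0]

# Least-squares fit of a constant in the linear domain: the arithmetic mean.
linear_fit = sum(signals) / len(signals)

# Least-squares fit of a constant to log(signal), mapped back with exp():
# this is the geometric mean.
log_fit = math.exp(sum(math.log(s) for s in signals) / len(signals))

print(f"linear-domain estimate: {linear_fit:.1f}")
print(f"log-domain estimate:    {log_fit:.1f}")
```

With these values the linear-domain estimate stays around 80, while the log-domain estimate drops to about 40, even though four of the five measurements are exactly 100.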

dwi2response may also be affected, but hopefully not to any great extent: outliers will hopefully not turn a non-single-fibre voxel into a single-fibre voxel, which would lead to some impact on the response - but bear in mind that with the current recommended approach, the response is estimated from the best 300 voxels, so it should tolerate the odd outlier.

We have also been discussing recently the possibility of including modifications to CSD that I’d presented a while back as an ISMRM abstract. It would certainly be relatively simple to add the Rician bias and outlier rejection modifications, I’ll see if I can find the time to work on that over the next few weeks. Note that this would be a voxel-wise outlier rejection strategy, which won’t be as sensitive to detecting outliers as more bespoke, slice-wise outlier rejection strategies (as implemented in EDDY, for instance).

I tried the most recent version of eddy (5.0.9) with the --repol option (briefly documented in https://github.com/Washington-University/Pipelines/blob/master/DiffusionPreprocessing/scripts/run_eddy.sh )

and it ends with error message:

eddy: msg=--rep_ol cannot be used for this version of eddy
terminate called after throwing an instance of 'EDDY::EddyException'
  what():  eddy: msg=--rep_ol cannot be used for this version of eddy

I tried the 5.0.9 eddy binary from NeuroDebian and the 5.0.9 versions of eddy_gpu and eddy_openmp for CentOS, and it seems that this feature is currently disabled in all publicly available versions (despite the note in the HCP pipelines that --repol works in the eddy_cuda version). Presumably they are holding back the public release until the method is validated and published. Or do you have a working version available?

This is quite surprising to me, since the iteratively reweighted least squares (IRLLS) method, according to the paper

should work well in the presence of outliers, much better than classical linear least squares or weighted least squares (as available, e.g., in FSL’s dtifit). According to their validation, this method performs even better than iRESTORE:

Do you think that in practice the IRLLS method in dwi2tensor does not handle outliers sufficiently well, i.e. that it is no better than FSL’s traditional dtifit with linear least squares or with weighted linear least squares?

Dear Antonin,

dwi2tensor does not implement the method by Collier et al. which is robust to outliers. This method should have been dubbed ‘robust iteratively reweighted least squares’ instead of just ‘iteratively reweighted least squares’ as was done in the paper.

dwi2tensor does implement the paper by Veraart et al. that is mentioned in the References part of the documentation and which does not address outliers. This method should have been called ‘iteratively reweighted least squares’ instead of just ‘weighted least squares’.

I understand that the naming might have caused some confusion…

We are also in the process of adding the robust variant by Collier et al. to the dwi2tensor command.

For info, the current state of outlier handling in eddy version 5.0.9 is summarized in the following post by Jesper on the FSL list:


Thank you, Ben, for the clarification. Which method would you currently recommend for a tensor fit that is robust to outliers? iRESTORE?

I would like to follow up on this thread:
My current experience is that eddy, even with outlier replacement, does not handle all artifacts in volumes well. So I think that inspecting the data and excluding specific volumes with artifacts is still needed to improve the precision of the results. Of course, when the number of volumes with artifacts is too high, the whole dataset has to be excluded. What is your usual approach to artifacts in data? Do you check volume by volume and remove the affected volumes?

In practice, I write the indices of the bad volumes to a text file and then run a custom script that uses awk and mrconvert -coord. Inside that script I had to do some extra work to obtain the complement, i.e. the volumes that should be retained.
I think it would be useful for this purpose to have an option directly in mrconvert to remove specific volumes. What do you think?
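For what it's worth, the complement step is short in Python. This is only a sketch under stated assumptions: the function names and file names are hypothetical, volume indices are taken to be zero-based, and the command is built as an argument list around the existing `mrconvert -coord 3 <indices>` usage mentioned above.

```python
def volumes_to_keep(n_volumes, rejected):
    """Return the sorted zero-based indices of all volumes not in `rejected`."""
    rejected = set(rejected)
    invalid = sorted(v for v in rejected if not 0 <= v < n_volumes)
    if invalid:
        raise ValueError(f"volume indices out of range: {invalid}")
    return [v for v in range(n_volumes) if v not in rejected]


def mrconvert_command(src, dst, keep):
    """Build the argument list for an mrconvert call selecting `keep` along axis 3."""
    if not keep:
        raise ValueError("no volumes left to keep")
    coords = ",".join(str(v) for v in keep)
    return ["mrconvert", src, "-coord", "3", coords, dst]


# Hypothetical example: 6 volumes, volumes 1 and 4 flagged as corrupted.
keep = volumes_to_keep(6, [1, 4])
cmd = mrconvert_command("dwi.mif", "dwi_clean.mif", keep)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where MRtrix3 is installed.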



I check volume by volume and remove them if necessary. As this is very tedious, we’ve trained an image classifier to do the job for us (ISMRM abstract). For neonates with lots of motion artefacts this works really well.

For our training, we hacked mrview to write the current volume and the label to stdout.

git clone https://github.com/MRtrix3/mrtrix3.git mrtrix_annotate
cd mrtrix_annotate
git checkout 7fcffaea52f0
wget https://gist.githubusercontent.com/maxpietsch/da9733be4610da86746e30d8c1ce523a/raw/0b6e36380a4d1de3c09d9c6782cd585e3db31f59/annotate_mrview_patch.txt
git apply annotate_mrview_patch.txt
./build release/bin/mrview

Annotating: press , to start annotating then press x for reject, . for unsure, p for keep, repeat for every volume. mrview writes the current volume and the key you pressed to the terminal. I parse the output with a python script. You can use any key that is not taken by mrview and of course only label the reject volumes.
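A parsing script along these lines could look like the sketch below. It is not the script used here, and the output format is an assumption: each relevant line is taken to be `<volume index> <key>`, with the keys mapped as described above (x reject, . unsure, p keep). Anything else on the terminal is ignored, and a later label for the same volume overrides an earlier one.

```python
# Assumed key-to-label mapping, following the description above.
LABELS = {"x": "reject", ".": "unsure", "p": "keep"}


def parse_annotations(lines):
    """Map volume index -> label from lines of the assumed form '<volume> <key>'."""
    labels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # not an annotation line
        volume, key = parts
        if not volume.isdigit() or key not in LABELS:
            continue  # unrelated terminal output
        labels[int(volume)] = LABELS[key]  # the last label for a volume wins
    return labels


def rejected_volumes(labels):
    """Return the sorted indices of all volumes labelled 'reject'."""
    return sorted(v for v, label in labels.items() if label == "reject")


# Hypothetical captured terminal output.
log = ["0 p", "1 x", "2 .", "mrview: some other message", "1 p", "3 x"]
labels = parse_annotations(log)
print(rejected_volumes(labels))
```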

Hope that helps.


Too bad that I missed your abstract.

Why did you choose outlier rejection instead of trying to replace the outliers with an interpolation from their q-space neighbours?
Did you compare both?

Is there a plan to release the code for the classifier?
I guess it will have to be retrained to work on normal subjects. Any plans for that?

Many thanks


Dear @maxpietsch,

thank you for the feedback. The mrtrix_annotate tool seems handy. I think that all of this (including the parsing script) deserves inclusion in the official MRtrix code. What do you and the other MRtrix experts think?

I would also be very interested in the code for the classifier. I think that automatic QC (and not only of DWI) is a much-overlooked field. We work with quite large datasets, and at least semi-automatic QC methods would be very helpful for us.


We did not try to replace the volumes with qspace neighbours as our acquisition is designed to cope with the removal of motion corrupted volumes (Hutter et al., DOI: 10.1002/mrm.26765).

However, I used outlier rejection for the severely corrupted volumes and used Eddy’s outlier slice replacement for the remaining volumes for this ISMRM abstract. I went for this approach as I found that Eddy’s outlier replacement otherwise introduced artefacts in some datasets. We haven’t quantitatively assessed the impact of that decision though. Note that our data can be very motion corrupted depending on the compliance of the baby. Also note that the dHCP preprocessing pipeline uses Eddy with slice to volume registration which helps recovering corrupted volumes.

We might publish the classifier in the foreseeable future but there are no plans for implementing the classifier in MRtrix as it relies heavily on other tools and is tailored to our acquisition and reconstruction. Lacking “normal” data with ground truth annotations, I haven’t tested the applicability to other data. Happy to give it a try if someone can point me to such data.

It would be great if someone wants to contribute to an annotation tool in mrview! I would probably try to attach labels to individual volumes and store them in the .mif header. That obviates the need for a script. Viewing those labels could then be done in the 4D lightbox which I think also still needs some debugging. If someone in the community wants to help out, let us know!

Just to chip in briefly, this ISMRM 2017 abstract may also be of interest in this context: http://cds.ismrm.org/protected/17MPresentations/abstracts/1786.html (by @Kerstin).

Just for the sake of discussion, I wonder whether an automatic threshold is a good idea:

For instance, a similar strategy is used by the --repol option of eddy: it computes a threshold based on 4 standard deviations. I noticed that a very good-looking dataset had some data flagged as corrupted, because with a high SNR the standard deviation was very small. Four standard deviations then correspond to a very small deviation, and false positives were detected. Maybe I am wrong and the corruption was there but could not be seen by eye…
The opposite effect can be seen with very bad series, where the standard deviation is too high and no outliers are detected.

Hmm, yes, can make sense in some scenarios indeed. Generally, it’s safer to go with a multiple of the interquartile range for the purpose of detecting outliers… going with a multiple of the standard deviation already assumes more about the distribution, and may yield such unexpected outcomes if those assumptions don’t hold.
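As a rough illustration of the difference, here is a minimal sketch with made-up numbers. It is not eddy's actual algorithm; the function names, the multipliers (4 for the standard deviation, 1.5 for the interquartile range) and the per-slice values are assumptions chosen only for this example.

```python
import statistics


def sd_outliers(values, k=4.0):
    """Indices of values more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > k * sd]


def iqr_outliers(values, k=1.5):
    """Indices of values more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [i for i, v in enumerate(values) if v < q1 - k * iqr or v > q3 + k * iqr]


# Hypothetical per-slice mean intensities: 16 good slices, then 4 dropouts.
good = [98, 99, 100, 101, 102, 99, 100, 101, 100, 99, 101, 100, 98, 102, 100, 100]
bad = [20, 25, 18, 22]
values = good + bad

print("flagged with 4 SD:   ", sd_outliers(values))
print("flagged with 1.5 IQR:", iqr_outliers(values))
```

In this "very bad series" case the dropouts inflate the standard deviation so much that none of them exceeds 4 SD, whereas the quartiles are barely affected and the IQR rule flags all four.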