Dwiintensitynorm on too many subjects (HCP)

DorianP · December 25, 2016, 7:48pm

Hi all and happy holidays,

I am following the steps inside the dwiintensitynorm script to normalize the HCP data (~860 subjects). The script is designed mostly for small groups and does not make use of a computing cluster, so I am trying to parallelize the work myself. So far I have computed FA for all subjects and created a cleanup script to remove the typical rim of high FA outside the brain. But I am stuck at population_template. I tried this script on a subset of 10 subjects and it took 1-2 hours. I have also read on this list about cases of 3-day computations for 40 subjects. This means the script will take “forever” on 860 subjects.

Can you give me some advice on how to tackle this problem? I have thought of various solutions:

1. Find a way to parallelize the exact computations inside the population_template script in a cluster environment. But I am not sure exactly what computations are being run.
2. Use the 5TT map and the respective FA map of each subject to select voxels with high FA within white matter. My guess is that this should still give a good estimate of the signal we are trying to find with dwiintensitynorm.
3. Use alternative methods for building the common template. ANTs has scripts for building population templates, although that may take a while, too. Another idea I had is to make an average FA map, register all subjects to that map using a quick linear or greedy algorithm, then repeat this procedure a couple of times to get a good map in 1-2 days.

If I had to choose, option 2 is the most straightforward, but I am not sure what the implications are of selecting slightly different voxels for each subject. The idea is to take the median value anyway…

Also, for an eventual fixel analysis the population template must come from FODs. In the documentation you advise creating it from a subsample of ~40 subjects. Does this advice still stand for our HCP dataset? Again, I wish these maps were already computed and made publicly available. It would be a great contribution to open source science.

Thank you for any help.
Dorian

DorianP · December 25, 2016, 8:54pm

Forgot another idea I had:

Use existing transformations in MNI space to bring FA maps in MNI. The downside here is that MNI registration was (probably) performed on T1. So the internal WM structures might not be perfectly matched.

ThijsDhollander · December 29, 2016, 12:30am

Happy Holidays to you too @DorianP,

Just quickly chipping in, so you’re not stuck on this: the template in that context doesn’t have to come from all subjects, but all subjects will have to eventually be registered to template space of course. In group studies, you realistically won’t need more than 40 subjects to build a very representative template. For example, a group study with 2 groups of 50 subjects (so 100 in total), should work just fine by picking a reasonably representative subset of 30 subjects from all 100 to build the template. Next, you would then register all 100 (even those 30 still!) individually to the template. The reason why those 30 should also still be registered to the template (even though you already have warps of them to the template space at the point the template was made), is that this registration may come with different regularisation options, different initialisation conditions, etc… All 100 subjects should be treated equally in this regard. It doesn’t matter that those 30 were used in the template creation though: there being 30 of them, no single one will leave a significant footprint on the template itself, it’s just an average smooth template at that stage.

In the context of dwiintensitynorm, the same essentially holds: you can replace that template creation step by one with just a subset of all subjects (so 40 would be ok), followed by 860 individual registrations of each subject to the template. The latter one should then be trivial to parallelise (while the former one with 40 subjects should be doable in a realistic computation time)!
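
As a rough sketch of the first half of that split (not part of MRtrix itself): assuming a plain-text file with one subject ID per line, a random subset for template building could be drawn as below. The function name and file names are hypothetical, and `shuf` from GNU coreutils is assumed to be available.

```shell
# Draw a random subset of subject IDs for template building.
# Usage: pick_template_subset <all_subjects_file> <n_subjects>
# Prints <n_subjects> distinct lines of the input file in random order.
pick_template_subset() {
    shuf -n "$2" "$1"
}

# Example (hypothetical file names):
#   pick_template_subset subjects.txt 40 > template_subset.txt
```

The template is then built from the subset only, while the full list (subset included) is kept for the per-subject registrations.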

Cheers,
Thijs

DorianP · December 29, 2016, 4:16am

Great, thank you @Thijs.
I am thinking of using buildtemplateparallel.sh from ANTs to create the template quickly in a cluster environment. Would this be OK? Is there anything special that MRtrix does, besides the average -> register iteration, that is required to build a template?

Thank you.

ThijsDhollander · January 10, 2017, 4:20am

For non-FOD images, there’s not much difference in the overall idea and strategy, as far as I’m aware. Of course, the actual parameters and registration algorithms, spaces, scales and levels, etc… differ between both. Registration is quite a flexible thing (that’s not per se a positive thing).

For your FOD template later on in a study, you should definitely go for the MRtrix population_template script (taking into account my above comments about numbers of subjects, in order to keep that processing time realistic and reasonable): the information the FODs (i.e. their SH coefficients) provide to the template building (and registration in general) process is of important value to align those FODs to the extent needed to match fixels more accurately.

jjmcfadyen · January 31, 2017, 7:36am

I’m also trying to run dwiintensitynorm on the HCP dataset. I’ve managed to create a population template (fa_template.mif and template_wm_mask.mif) using a subsample of 60 subjects, but I’m unsure how to now individually normalise each person’s DWI.mif image to the template.

I’m guessing that I’ll need to run each subject through the last lines of dwiintensitynorm:

mrtransform template_wm_mask.mif -interp nearest -warp_full warps/999999.mif wm_mask_warped/999999.mif -from 2 -template fa/FA_999999.mif

dwinormalise Original_DWI/999999_DWI.mif wm_mask_warped/999999.mif Normalised/999999_nDWI.mif

Only how do I create the files within the “warps” directory for each subject? Is it mrregister with some specific parameters?

Dave · February 1, 2017, 1:24am

Hi Jessica,

Hidden at the bottom of the dwiintensitynorm step in the docs is a command to normalise any other subject using the existing template and WM mask. From the docs:

Keeping the FA template image and white matter mask is also handy if additional subjects are added to the study at a later date. New subjects can be intensity normalised in a single step by piping the following commands together:

dwi2tensor <input_dwi> -mask <input_brain_mask> - | tensor2metric - -fa - | mrregister <fa_template> - -mask2 <input_brain_mask> -nl_scale 0.5,0.75,1.0 -nl_niter 5,5,15 -nl_warp - tmp.mif | mrtransform <input_template_wm_mask> -template <input_dwi> -warp - - | dwinormalise <input_dwi> - <output_normalised_dwi>; rm tmp.mif
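
For many subjects, one way to use the piped command above on a cluster is to write it out once per subject into a job list and let the scheduler run one line per task. The following is only a sketch under assumptions: the directory layout, the file names and the two function names are hypothetical, and only the MRtrix commands and options already shown in the docs command are used. One deliberate change is a per-subject temporary file instead of `tmp.mif`, so that jobs running in the same directory do not overwrite each other's warp.

```shell
# Print the docs' single-subject normalisation pipeline for one subject ID.
# The file layout (dwi/, mask/, normalised/) is a hypothetical example.
normalise_cmd() {
    id=$1
    dwi="dwi/${id}_DWI.mif"
    mask="mask/${id}_mask.mif"
    out="normalised/${id}_nDWI.mif"
    tmp="tmp_${id}.mif"   # per-subject name, safe for parallel jobs
    printf '%s\n' "dwi2tensor $dwi -mask $mask - | tensor2metric - -fa - | mrregister fa_template.mif - -mask2 $mask -nl_scale 0.5,0.75,1.0 -nl_niter 5,5,15 -nl_warp - $tmp | mrtransform template_wm_mask.mif -template $dwi -warp - - | dwinormalise $dwi - $out; rm $tmp"
}

# Write one command line per subject ID read from the given file.
write_joblist() {
    while IFS= read -r id; do
        normalise_cmd "$id"
    done < "$1"
}

# Example (hypothetical file names):
#   write_joblist subjects.txt > jobs.txt
```

Nothing is executed here; the functions only print command lines, so the list can be inspected before any job is submitted.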

As a side note, we are nearly ready to announce a new release of MRtrix, which includes many updates. This includes a novel method to intensity normalise and bias correct multi-tissue CSD data, which would be appropriate for the HCP data set. This novel approach is applied per individual, and does not require any group-based registration. You can check out a sneak preview of the new documentation here.

Cheers,
Dave

jjmcfadyen · February 2, 2017, 12:34am

Hi Dave,

Great, thanks for that! And very exciting about the new update - will keep an eye out.

Cheers,

Jess

DorianP · February 2, 2017, 12:45am

I ended up unpacking the dwiintensitynorm script and running it bit by bit. To contribute to open science, here is the FA template I used from 300 random subjects, created with ANTs. I am attaching also an average FA of all subjects.

https://drive.google.com/drive/folders/0BxHeqEv37qqDZ3Q2TmtkY0s3aVk?usp=sharing

Note 1: I ran a cleanup script to remove high FA voxels outside the brain, so the final maps don’t have that ugly rim that may bias the registration.

Note 2: The average FA also includes one subject (150* something) that I later removed because, for some reason, it has fewer orientations than the others and CSD cannot reach lmax=8.

DorianP · March 12, 2018, 1:17am

Hello again.

As I mentioned 13 months ago, I used the classic dwiintensitynorm approach on the HCP dataset to extract a single tract of interest. After extracting both AFD and FA of the tract, I computed correlations with a couple of behavioral scores that I think the tract should be related to. Yet nothing came out significant (I also tried segments of the tract).

Now I notice the new command mtnormalise, which uses a different normalization strategy. I was wondering: is mtnormalise more accurate than dwiintensitynorm? Have you performed comparisons of the two methods in some way (i.e., what is the correlation of AFD obtained with dwiintensitynorm and mtnormalise for the same white matter tract)?

Thanks again for this software.

P.s. A side legal question: is MRtrix free for use outside academia, i.e., for commercial/industry purposes? I know that FSL needs special licensing for the industry, and I am aware it might be a problem to use MRtrix without being able to use ‘eddy’, but that’s another story.

rsmith · April 4, 2018, 7:29am

Hi Dorian,

Both mtnormalise and dwiintensitynorm aim to perform “intensity” normalisation in order to make quantitative values more comparable across subjects. However I’m not sure if I would describe the former as more “accurate”: not only would that require an explicit demonstration of such, but the actual “normalisation” that they perform is somewhat different (both in terms of the mechanisms used, and the resulting quantitative measures). Certainly we would say that mtnormalise makes use of more information, gives the possibility of performing superior bias field correction to dwibiascorrect alone, and does not suffer from particular weaknesses that are intrinsic to the dwiintensitynorm approach; so we would generally consider it to be the “superior” option.

I’m not aware of any investigation into correlations between the two approaches. This might be an interesting experiment to pursue.

Regarding the legal question: We have required separate MRtrix3 licensing for our own collaborations with Siemens. While I’m very much not an expert on licensing, if there’s potential commercial / industry uses of MRtrix3, I think it would be best to open a private dialogue with us directly.

ThijsDhollander · April 4, 2018, 8:23am

@DorianP: I’m working on it. Note that mtnormalise requires a multi-tissue CSD output to be able to do its job. If your data allows for a decent multi-tissue CSD output, then you have my word mtnormalise is much more robust, both at correcting (spatial) intensity inhomogeneities (“bias fields”) as well as global intensity normalisation for quantitative AFD analysis, e.g. as part of FBA or other analysis strategies.

About the interpretation of the resulting quantitative metrics: I’m working on that as well; no worries, you’ll soon hear more about this. mtnormalise also changes things for the better here, eliminating certain sources of T2 shine-through that dwiintensitynorm may still leave in (and which, if left in, cause unwanted variance or even an unwanted effect in later analysis). It depends on the scenario though, as well as (in case of a “traditional” dwiintensitynorm strategy) the region used as a reference for intensity normalisation. mtnormalise eliminates some of these worries. In most scenarios, it’ll bring you closer to the “intended” original interpretation of AFD, when observing relative differences between populations.
However, beyond all of this, there’s still other ways to normalise (some of which do change interpretation). Again, I’m working on this and parts of it will be released…

Yes. Entirely free for use.

That’s true indeed. MRtrix does not require industry users to buy a license.

Indeed, when referring to “MRtrix” here, it doesn’t include external tools that some of our scripts happen to call upon at times. Given FSL’s licensing strategy, you’re entirely right that you’ll need to speak to the FSL folks for use of FSL tools such as eddy and topup.

For MRtrix, refer to the bit of text you’ll see popup under the help of each command (or in the source code headers, etc…). At the moment, for most commands, on the master branch, this reads:

So the MPL link in there will tell you all the details; but the gist is that you can use it without worries. If you (change and) distribute (such changed) versions, you’ll have to refer to that link again. The gist there is that such distribution of (changed) versions requires a clear description (link) of where the original MPL licensed software can be found (on our GitHub page/repository). But again, definitely go and read the details if you require them!

Note furthermore also the “without warranty” bit (and beyond)… so if you kill someone with the software… we’ve got nothing to do with it!

This was for a past version of MRtrix that was licensed under the GPL, which was much more restrictive. And of course also because this includes re-distribution on one of Siemens’ platforms. But due to the GPL context of that, things were very different.

@DorianP: if you’re in doubt, definitely be in touch. But as far as simply “using” goes (not re-distributing, etc…), there isn’t much to worry about.