Intensity normalisation of DWI data from control and TBI rats


Dear all,

I am currently analysing DWI data from control and TBI rats, and I have a question about dwinormalise for the rat brain. I first tried to run dwiintensitynorm to perform global intensity normalisation across subjects (step 5), after having completed steps 1 to 4. The script failed partway through, apparently during registration (possibly because the rat brain differs so much from the human brain in size and shape). I therefore did some steps manually: (1) I made my own study-specific template, (2) registered all FA images to the template, (3) created a WM mask by thresholding, and (4) projected the WM mask back into subject space. Now everything should be ready to apply dwinormalise, since it’s stated that “The mask is then transformed back into the space of each subject image and used in the dwinormalise command to normalise the input DW images to have the same b=0 white matter median value.” Elsewhere on the forum (response June 2017) it’s also stated that inter-subject normalisation is provided by dwinormalise.

So I applied dwinormalise to every rat DWI image, using the ‘back projected subject specific WM mask’ as a mask and the default value of ‘1000’. To check the result, I compared the median b=0 white matter values of two rats, and they were slightly different, so they don’t have ‘the same b=0 white matter median value’ as is stated above.

So my question is, does the dwiintensitynorm script apply the dwinormalise step in a ‘special way’, e.g. by normalising every subject by its own median b=0 white matter value? In that case every median b=0 white matter value would be 1000 and indeed ‘the same’. How does the script deal with multiple b=0 images? Maybe I missed something; I just want to know if I’m on the right track before I start with ‘Fixel Based Analysis’ (another 17 steps)… :wink:

And a related question, which might also be important for the inter-subject global intensity normalisation. We have a dataset of rats with and without traumatic brain injury (TBI). TBI causes widespread white-matter damage, so should some steps (like normalisation or average FOD estimation) be performed separately for control and TBI rats? Or should there be one study-specific template etc.? Furthermore, data acquisition is longitudinal, which raises similar questions. Does anyone have experience with a similar setup?

Thanks in advance for your advice and suggestions!

Kind regards,


I am still waiting for a response. Thanks in advance.


But it’s only been a month… :flushed:

Sorry, that one slipped under the radar. Looking at the code for dwinormalise, it really doesn’t do anything too unexpected. It simply scales the entire input DWI series wholesale by a single scalar, and that scalar is (by default) computed as the median b=0 intensity across the specified WM mask. However, when multiple b=0 volumes are present, then each voxel’s b=0 intensity is computed as the mean over the b=0 volumes. So technically, what it’s scaling to is the median over the mask of the mean over volumes of the b=0 intensities… :crazy_face:

So the natural question is how did you assess the median value exactly? I presume using mrstats -mask? If so, can you provide some example output from that command? If you have multiple b=0 volumes, then they’ll each show up with values slightly different from the expected 1,000, but their average should still be (close to) 1,000. But if you were to average them and compute the median of that within the WM mask, you should get 1,000. This should do it:

dwiextract -bzero dwi_normalised.mif - | mrmath - mean -axis 3 - | mrstats - -mask wm_mask.mif -output median

If that still isn’t 1,000, then I’m not sure what’s going on…
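To make "the median over the mask of the mean over volumes" concrete, here is a minimal pure-Python sketch of that calculation on toy data. It is not the dwinormalise source code: the function name `normalisation_scale`, the flat-list image representation and the intensities are all made up for illustration, and only the order of operations follows the description above.

```python
from statistics import mean, median

def normalisation_scale(bzero_volumes, mask, target=1000.0):
    """Scalar that maps the masked median of the voxel-wise mean b=0 signal to `target`.

    bzero_volumes: list of b=0 volumes, each a flat list of voxel intensities.
    mask: flat list of booleans (True = voxel inside the WM mask).
    Hypothetical helper for illustration only.
    """
    # Step 1: average the b=0 volumes voxel by voxel.
    mean_b0 = [mean(vol[i] for vol in bzero_volumes) for i in range(len(mask))]
    # Step 2: take the median of that mean image inside the mask.
    masked = [v for v, inside in zip(mean_b0, mask) if inside]
    # Step 3: one scalar for the whole DWI series.
    return target / median(masked)

def masked_median(volume, mask):
    """Median intensity of a single volume inside the mask."""
    return median(v for v, inside in zip(volume, mask) if inside)

# Toy data (made up): two b=0 volumes, five voxels, last voxel outside the mask.
b0_a = [90.0, 100.0, 130.0, 80.0, 500.0]
b0_b = [110.0, 104.0, 110.0, 100.0, 700.0]
mask = [True, True, True, True, False]

scale = normalisation_scale([b0_a, b0_b], mask)
norm_a = [v * scale for v in b0_a]
norm_b = [v * scale for v in b0_b]
norm_mean = [(a + b) / 2 for a, b in zip(norm_a, norm_b)]

print("median of first b=0 :", masked_median(norm_a, mask))
print("median of second b=0:", masked_median(norm_b, mask))
print("median of mean b=0  :", masked_median(norm_mean, mask))
```

In this toy case the per-volume medians land on either side of 1,000, while the median of the voxel-wise mean comes out at 1,000 (up to floating-point rounding), which is the quantity the command line above checks.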

It’s important you keep everything the same to avoid introducing biases between the groups. I would recommend you treat both groups as one, ignoring the difference between them during the entire processing pipeline. You definitely need to use the same average response function for all cases. Most other steps are per-subject anyway (apart from dwiintensitynorm since it requires a group template). I’d only consider doing something else if the patient group was so abnormal that it caused outright failures – but then, if the pathology was so severe, I’m not sure there’d be any point in performing an FBA…?

This isn’t something I’ve had much of a chance to really think about recently, unfortunately… Maybe others will be able to provide you with more principled advice. But for what it’s worth:

In terms of pre-processing, I think you should also treat them all as equivalent for the same reason: to avoid any biases.

On the stats front, it’s been a while since I looked into this, but I have a feeling our implementation of the statistics in fixelcfestats isn’t really suitable for a proper longitudinal analysis as it stands. No doubt @rsmith or @ThijsDhollander will correct me if I’m wrong…


Yes, in general, I agree strongly with this (and would like to emphasise it as much as possible). There’s no strong need for separate templates in general. Everything, in the end, is expressed relative to a template, or indirectly via different templates if you use multiple ones. All metrics depending on warps, e.g. FC, are relative to this template, but that doesn’t matter: they are relative to the template only as a means to be relative to each other. Which template is chosen doesn’t matter for the theory. The only arguments for choices on this front are pragmatic, and concern the performance of the actual registration algorithm: even though people sometimes call a population template “unbiased” because it’s the average of all subjects or something, the only benefit is that it makes the individual registrations of subjects to this template less likely to fail. But for most, if not all, populations I’ve been involved with studying, I’m pretty confident we could just as well have built the template from healthy control subjects only, or even patients only (as long as it’s not just patients with gross pathology, e.g. more outlier than actual data point).

More important for dealing with the effects of TBI (or other neurodegeneration, but TBI in particular because potentially large areas are affected) is to use a model that accounts for them. 3-tissue CSD has proven successful for us in this regard for a while now, with the CSF-like compartment dealing with free water, and the GM-like compartment dealing with a range of other effects, quite often leaving you with a cleaned-up WM FOD that should hopefully be more specific to the remaining intra-axonal signal. See these two works in this context:

Also, see this work (one example among several) for how 3-tissue CSD can pay off in a fixel-based analysis: because it inherently deals with the presence of lesions (neurodegeneration), they are less of a problem for steps such as template building and registration:

Again a very clear and important recommendation: I can’t emphasise enough how strictly you need to adhere to this. All your results are expressed directly as a function of these response functions; if “(apparent) Fibre Density” is your metric, the accompanying response function(s) are the units used to relate this metric to the measured signals. Don’t compare things with different units; hence, use only a single (set of) response function(s): either a single WM one for single-tissue CSD, or a single fixed set of WM, (GM) and CSF for 2- or 3-tissue CSD. But essentially always use the same ones for all subjects.

A few simplified things are possible, but truly in general: no indeed. @rsmith has been working on extending the functionality to better allow for more advanced designs, but it’s not been released yet. @rsmith : maybe time to start looking into this… if it’s close to being ready of course? I’ve noticed this question comes up every now and then on the forum.


Dear @jdtournier and @ThijsDhollander

Thank you both for your response and clear explanation!

I didn’t exactly know what to expect, but last week I saw responses to more recent posts, therefore I decided to post a ‘short reminder’ (just to be sure)… :slightly_smiling_face:

Previously, I calculated the median of the first b=0, as well as the median over all b=0 volumes. But now I have run your command and calculated “the median over the mask of the mean over volumes of the b=0 intensities…” and it worked! So, now all values are 1,000.

No, the damage in our TBI model is ‘subtle’ and widespread.

Okay, that’s clear, I’ll use one template image for all cases.

Thanks for the suggestions and references! I’ll read these works…

Clear, I’ll use one average response function for all cases.

Would it be possible to ‘extract’ data to perform the analyses and statistics with different software (e.g. R)?


Dear @rsmith,

I read this recent paper, which describes a longitudinal fixel-based analysis. I think this would also be very helpful for me. When will this longitudinal FBA be implemented in MRtrix? Or would it be possible for me to use the adjusted scripts?

We have a pre-scan, followed by TBI induction and scans at 1h, 24h, 1 week, 1 month, 2 months and 3 months post-TBI. In addition we would like to relate potential structural changes to behavioral measures as well as other (biological) metrics.

Thanks in advance.

Kind regards,


When will this longitudinal FBA be implemented in MRtrix?

Well, it’s already implemented; I think the question is when it will be available… These are part of a large suite of changes that will be published and released as a single coherent batch of “technical improvements to FBA” that are all kind of intertwined; however I need to actually write the publication first…

We have a pre-scan, followed by TBI induction and scans at 1h, 24h, 1 week, 1 month, 2 months and 3 months post-TBI.

The capability to “perform longitudinal FBA” as in @sgenc’s latest paper is principally the ability to perform sign-flipping of model residuals within the Freedman-Lane method as described in this manuscript. However that only deals with a “one-sample t-test” as described in the FSL GLM wiki. In your example you not only have more than two data points per subject, but you additionally do not have compound symmetry. You will need to very carefully consider exactly upon which parameter(s) you wish to perform statistical inference, and whether or not the requirements / assumptions of such statistical testing are satisfied. Depending on your choices here the experiment may additionally require the definition of non-trivial variance groups within the GLM, which I have not yet had a chance to implement.
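For readers unfamiliar with sign-flipping, the idea behind the "one-sample t-test" case can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the MRtrix or FSL implementation: it covers a single measurement per subject (for example a post-minus-pre difference at one fixel), has no nuisance regressors (so there is no Freedman-Lane step), assumes errors symmetric about zero, and uses made-up numbers. The names `one_sample_t` and `sign_flip_p_value` are hypothetical.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values):
    """Ordinary one-sample t-statistic against a mean of zero."""
    return mean(values) / (stdev(values) / sqrt(len(values)))

def sign_flip_p_value(differences):
    """Exact one-sided p-value obtained by enumerating all 2^n sign flips.

    Under the null hypothesis (differences symmetric about zero), flipping
    the sign of any subject's difference is equally likely, so the observed
    t-statistic is compared against the statistic under every sign pattern.
    """
    observed = one_sample_t(differences)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(differences)):
        flipped = [s * d for s, d in zip(signs, differences)]
        if one_sample_t(flipped) >= observed:
            count += 1
        total += 1
    return count / total

# Made-up per-subject differences (time point 2 minus time point 1).
diffs = [0.8, 1.1, 0.4, 0.9, 1.3, 0.6]
p = sign_flip_p_value(diffs)
print("one-sided p-value:", p)
```

With six subjects whose differences all have the same sign, only the unflipped pattern reaches the observed statistic, so the p-value is 1/64, the smallest attainable with this sample size. The sketch does not extend to seven time points per subject; that is exactly where the caveats above about compound symmetry and variance groups apply.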