Response function for group analysis



Dear MRtrix experts,

I would like to create an atlas from a group of N controls. All the acquisitions have the same parameters (number of directions, b-value, etc.).
Do I need to estimate the response function for each subject with the dwi2response command, and then estimate the CSD with the N different response functions? Or is it best to use the same average response function for everyone (and is there a command to average response functions)?

Thanks in advance!
Best regards


Taken from the docs:

…using the same response function when estimating FOD images for all subjects enables differences in the intra-axonal volume (and therefore DW signal) across subjects to be detected as differences in the FOD amplitude (the AFD). To ensure the response function is representative of your study population, a group average response function can be computed by first estimating a response function per subject, then averaging with the script:

foreach * : dwi2response tournier IN/dwi_denoised_preproc_bias_norm.mif IN/response.txt
average_response */response.txt ../group_average_response.txt
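If the script isn't available on your system, the averaging step can be sketched in bash/awk as below. This is a minimal sketch, assuming every response file has the same layout (one row per b-value shell, one column per zonal spherical harmonic coefficient), with `#` comment lines and blank lines ignored; the function name `average_response_files` is hypothetical. It takes a plain coefficient-wise mean, which is how I understand the older `average_response` script to behave; more recent MRtrix3 releases provide `responsemean` instead, which may additionally compensate for global magnitude differences between the inputs, so check its documentation before relying on either behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical helper: coefficient-wise mean of several response function files.
# Assumes every input has the same layout (rows = shells, columns = zonal SH
# coefficients); lines starting with '#' and blank lines are skipped.
# Usage: average_response_files OUTPUT INPUT1 [INPUT2 ...]
average_response_files() {
  local out="$1"
  shift
  awk '
    FNR == 1 { nfiles++; row = 0 }          # a new input file starts: reset row counter
    /^[ \t]*#/ || NF == 0 { next }          # skip comment and blank lines
    {
      row++
      if (row > nrows) nrows = row
      if (NF > ncols[row]) ncols[row] = NF
      for (i = 1; i <= NF; i++) sum[row, i] += $i   # accumulate coefficient sums
    }
    END {
      for (r = 1; r <= nrows; r++) {
        line = ""
        for (i = 1; i <= ncols[r]; i++)
          line = line (i > 1 ? " " : "") sprintf("%.8g", sum[r, i] / nfiles)
        print line                          # one averaged row per shell
      }
    }
  ' "$@" > "$out"
}
```

Note that the output file comes first here, unlike the quoted script: e.g. `average_response_files ../group_average_response.txt */response.txt`.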



I second that. The manual doesn't elaborate much on why, though, and “to ensure the response function is representative of your study population” is slightly vague. I reckon you’d also be fine with, for instance, the average response of only your controls. Or even only your patients. Or even just one subject. The more important point to emphasise is that you want to use just 1 response (or set of tissue responses if doing multi-tissue CSD) for all subjects when doing CSD. In a way, the response function is the unit of the FODs that result from CSD: FOD amplitudes are expressed in a unit that is something like “times your response function”. When doing any subsequent quantitative analysis across your subjects, it’s important that their FODs are expressed in the same units. You can’t compare apples and oranges!
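To make the “unit” argument concrete, here is a toy per-degree deconvolution in bash/awk. It is not how CSD is actually solved: MRtrix3 fits the FOD with a non-negativity constraint, and the spherical convolution also carries an l-dependent normalisation constant, both omitted here. It simply divides each zonal signal coefficient s_l by the matching response coefficient r_l. The function name and the numbers are hypothetical. The point is that the same signal deconvolved with a response twice as large yields an FOD half as large, so FOD amplitudes only mean something relative to the response that was used.

```shell
#!/usr/bin/env bash
# Toy, unconstrained deconvolution: fod_l = signal_l / response_l for each
# harmonic degree l. Real CSD solves a constrained fit and includes an
# l-dependent scaling constant; both are omitted in this illustration.
toy_deconvolve() {
  # $1: signal coefficients, $2: response coefficients (space-separated, same length)
  awk -v s="$1" -v r="$2" 'BEGIN {
    n = split(s, sig, " ")
    if (split(r, resp, " ") != n) { print "length mismatch" > "/dev/stderr"; exit 1 }
    out = ""
    for (i = 1; i <= n; i++)
      out = out (i > 1 ? " " : "") sprintf("%g", sig[i] / resp[i])
    print out
  }'
}

toy_deconvolve "900 -300 60" "1000 -400 100"   # response R:  prints 0.9 0.75 0.6
toy_deconvolve "900 -300 60" "2000 -800 200"   # response 2R: prints 0.45 0.375 0.3
```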


Dear Thijs and Max,

Thanks for this perfect explanation of why to average the response function. My turn to second Felix (friend and French colleague) on problems we encountered when using the same response across a group of subjects.
In our experience (mainly focused on diffusion gradient schemes and TWI adaptation for computing cranial and peripheral nerve atlases), using an average response text file instead of those yielded from each individual dataset led to a “denoising” effect on the tractogram, i.e. visible on the TWI map regardless of contrast type (FOD amplitude, length, …) or use of super-resolution properties.

In other words, small distal tracts or nerves could “disappear” when averaging the response function, which could be problematic at the group level, depending on the disease model.
I assume this could be less important for brain white matter fascicles… but any help would be appreciated!

Best, Arnaud


A bit late to chip in with my 2 cents, but better than never, I guess…

OK, what this sounds like to me is that the FODs might have ended up scaled differently in different subjects, so that the effective tracking threshold varies: this would indeed ‘hide’ smaller-amplitude, more minor tracts in those subjects where the FODs are smaller than they would be with the subject-specific response. Conversely, I’d expect messier tracking in those subjects where the FODs ended up larger than expected.

In my experience, the response is remarkably stable across subjects (assuming the same acquisition protocol, particularly the b-value, is used). What does change is the data scaling: that is determined by the coil loading, the scanner’s calibration, internal FFT scaling, etc. When using the subject-specific response, these global differences in scaling are inherently accounted for, since the response is derived from the same data, and ends up scaled to the same extent. If however this response is then averaged and used to process the same subjects without any attempt at adjusting the scaling in each subject’s raw data, then this will introduce differences in the scaling of the output FODs. This is the same issue that needs to be accounted for in fixel-based analyses, and is a sufficiently important topic that it has its own page in the documentation.
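The mechanism described above can be sketched with a toy single-voxel model in bash/awk. Assume the voxel’s intrinsic FOD peaks are 0.5, 0.12 and 0.08, and the subject’s raw data are globally scaled by a factor c (coil loading, calibration, etc.). Deconvolving with the subject’s own response, which is scaled by the same c, cancels it out; deconvolving with an unscaled group response leaves the FOD scaled by c. The function name, peak values and the cutoff of 0.1 are all hypothetical, chosen only to show how a fixed amplitude threshold then keeps or drops the minor peaks.

```shell
#!/usr/bin/env bash
# Toy model: FOD peak amplitude after deconvolution = intrinsic peak * c / k,
# where c is the global scaling of the subject's raw data and k the scaling
# of the response used (k = c for a subject-specific response, k = 1 for an
# unscaled group response). Prints the peaks that exceed the tracking cutoff.
surviving_peaks() {
  # $1: cutoff, $2: data scaling c, $3: response scaling k, $4...: intrinsic peaks
  local cutoff="$1" c="$2" k="$3"
  shift 3
  awk -v cutoff="$cutoff" -v c="$c" -v k="$k" 'BEGIN {
    out = ""
    for (i = 1; i < ARGC; i++) {
      amp = ARGV[i] * c / k                 # FOD amplitude seen by the tracker
      if (amp > cutoff) out = out (out == "" ? "" : " ") sprintf("%g", amp)
    }
    print out
  }' "$@"
}

# Subject-specific response (k = c): the scaling cancels.
surviving_peaks 0.1 0.7 0.7 0.5 0.12 0.08   # prints: 0.5 0.12
# Group response (k = 1), subject with low data scaling: the minor peak is lost.
surviving_peaks 0.1 0.7 1   0.5 0.12 0.08   # prints: 0.35
# Group response (k = 1), subject with high data scaling: an extra peak passes.
surviving_peaks 0.1 1.4 1   0.5 0.12 0.08   # prints: 0.7 0.168 0.112
```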

If you were already performing some kind of subject-wise global intensity normalisation, then your experience is unexpected, and I’d like to figure out what the problem might be. But first we’d need to rule out the much simpler explanation above…

That sounds very interesting! I look forward to the results; will this be made available at some point?


Hi everyone,

Following on from this conversation, I am interested in modelling subject-specific intrapair differences in monozygotic twins. Would the group average response function be recommended for such an approach as well?



Hi @emmanuelpua,

Yep, it would certainly be equally recommended. As I mentioned in a reply above, the important thing is mostly that you use just a single response function (or a single one per tissue, if performing multi-tissue CSD) for all subjects. Since all your twins are humans, you’re essentially just after a “human single-fibre white matter response for your particular scanner and acquisition protocol”. So, as I mentioned above, that could in principle even come from just 1 (random) subject. But since it’d be odd to pick one at random, the easy thing is to just take the average. If you’re talking about a very diverse group, or e.g. a comparison between populations where one population is severely affected by neurodegeneration, then in certain scenarios it may be wiser not to average over all subjects per se, but to use the average response of the healthy population only. But even then, it’s probably not going to differ a whole lot from the overall average… so no worries in practice either way.

So, in summary: as long as you use a single response (or single set of tissue responses) for all subjects, you’re fine. That’s what allows you to compare those subjects’ CSD outcomes. Using different responses for different subjects renders any comparison problematic.



Thanks Thijs!


Hi everyone,

regarding this convo, would you suggest using one single RF (response function) even if only tractography, and no quantitative analysis, is to be done? Do you have experience of how much the RF changes across subjects, and whether this variation differs across b-values? Also, does FOD scaling affect tractography reconstruction?




I have some doubts regarding this topic as well. I always assumed that you only need an average response function or some kind of normalisation if you are interested in some measure derived from SIFT or SIFT2; am I right? For example, if I would like to use FA-weighted matrices (or matrices weighted by any metric), then I believe it should be fine to calculate each matrix independently and compare them; or if I’m interested in some graph metrics, they should be quite robust regardless of the response function used or the absence of normalisation, am I right? Thanks in advance!

Best regards,



Hi @Chiara_Maffei1,

It matters less indeed if you’re not after quantitative analysis. However, if you’re still working with “a group of subjects”, in the sense that the goal is to compare or even just “do” something across them, and as long as they’ve of course been acquired using the exact same protocol, I’d still recommend using one single unique (set of, in the case of multi-tissue) response function. In my experience, the response functions vary very little across subjects in shape/contrast (not size, see below!); and where they do, it’s also due to data quality, the amount of certain tissues present, and, in the end, the performance of the response function selection algorithm, none of which is per se uniquely valuable to a single subject. Also, the kinds of variation across subjects I’ve seen (which are very small indeed) seem to barely affect the CSD outcome at all. So there are no real worries, I’d say.

As for whether FOD scaling affects tractography: it sure does, since all our FOD-based tractography algorithms have an amplitude threshold to cut off streamlines ("-cutoff"). However, using a single (set of, in the case of multi-tissue) response function is only half of what’s required to make sure this doesn’t affect consistency across subjects. The other half is mtnormalise, which accounts for the intensity differences that directly affect the size (amplitude) of the FODs.
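To illustrate the idea behind that second half, here is a heavily simplified, global-only stand-in for intensity normalisation in bash/awk. It is not mtnormalise’s algorithm: as far as I understand, mtnormalise also estimates a spatially smooth intensity field, works in the log domain on all tissue compartments jointly, and rejects outlier voxels. This sketch only computes one scale factor per subject such that the geometric mean, over brain-mask voxels, of the summed tissue l=0 coefficients matches a reference value. The function name and the input format (one voxel per line: WM, GM, CSF l=0 values) are hypothetical. Multiplying all of a subject’s tissue images by this factor puts every subject on a common scale, so one -cutoff means the same thing everywhere.

```shell
#!/usr/bin/env bash
# Hypothetical global-only normalisation factor (NOT the mtnormalise algorithm).
# stdin: one voxel per line, whitespace-separated l=0 values of each tissue
#        (e.g. WM GM CSF) for voxels inside a brain mask.
# $1   : reference value the summed tissue signal should map to.
# Prints the factor by which all tissue images of this subject would be scaled.
global_norm_factor() {
  awk -v ref="$1" '
    NF == 0 { next }                          # ignore empty lines
    {
      total = 0
      for (i = 1; i <= NF; i++) total += $i   # sum over tissue compartments
      if (total <= 0) next                    # log undefined: skip voxel
      logsum += log(total); n++
    }
    END {
      if (n == 0) { print "no valid voxels" > "/dev/stderr"; exit 1 }
      printf "%g\n", exp(log(ref) - logsum / n)   # ref / geometric mean
    }'
}
```

For example, `printf '0.1 0.02 0.005\n0.4 0.08 0.02\n' | global_norm_factor 0.5` gives a factor of 2, and the same voxels with all values tripled give a factor three times smaller, which is exactly the subject-wise scaling difference being removed.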

That said, in a strange way there is something to be said for using the subject’s own response function(s) in a scenario where you’re really only after single-subject tractography, since the size of the response functions will also scale with the data; so performing CSD will, to a certain extent, normalise the data to its own scale. So, e.g., in a clinical scenario where you perform tractography on individual subjects for their own sake (e.g. delineating a bundle for surgical workup), you can probably stick to running dwi2response on the individual subject, CSD using its own response function(s), and then going straight to tractography with “known” good values for -cutoff that suit your scenario (e.g. a specific bundle of interest).

But, strictly speaking, if you want to establish a well-controlled processing protocol for a given fixed acquisition protocol, I’d argue for computing (“averaging”) response function(s) once, based on (a group of) healthy subjects, and always using these for similar subjects (e.g. responses of healthy adult humans, to be used for adult humans in general, both healthy and with a condition / pathology), but also always following up with mtnormalise, so your -cutoff thresholds generalise in a consistent manner.
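The well-controlled protocol described above (one fixed set of group responses, then mtnormalise, then tracking with a fixed -cutoff) could look roughly like the dry-run sketch below, which only prints the MRtrix3 commands for each subject so they can be inspected, or piped to `sh` to run them. The directory layout, file names, the 100000 streamline count and the cutoff value are hypothetical placeholders; paths are assumed to contain no spaces, the data are assumed to be multi-shell (needed for three-tissue msmt_csd), and the group response files are assumed to have been computed once beforehand (e.g. dwi2response dhollander on healthy subjects, then averaged). Please check each command’s options against the documentation of your MRtrix3 version.

```shell
#!/usr/bin/env bash
# Dry run: print (do not execute) a per-subject pipeline that uses ONE fixed
# set of group-level responses, then mtnormalise, then tracking with a fixed cutoff.
# All file names below are hypothetical placeholders.
print_subject_pipeline() {
  # $1: subject directory, $2/$3/$4: group WM/GM/CSF response files, $5: cutoff
  local dir="$1" wm="$2" gm="$3" csf="$4" cutoff="$5"
  # 1) CSD with the same group responses for every subject (assumes multi-shell data)
  printf '%s\n' "dwi2fod msmt_csd $dir/dwi.mif $wm $dir/wmfod.mif $gm $dir/gm.mif $csf $dir/csf.mif -mask $dir/mask.mif"
  # 2) Remove intensity differences between subjects on all tissue compartments
  printf '%s\n' "mtnormalise $dir/wmfod.mif $dir/wmfod_norm.mif $dir/gm.mif $dir/gm_norm.mif $dir/csf.mif $dir/csf_norm.mif -mask $dir/mask.mif"
  # 3) Track on the normalised FODs, so the cutoff means the same for every subject
  printf '%s\n' "tckgen $dir/wmfod_norm.mif $dir/tracks.tck -seed_image $dir/mask.mif -select 100000 -cutoff $cutoff"
}

for subj in sub01 sub02; do
  print_subject_pipeline "$subj" group_wm.txt group_gm.txt group_csf.txt 0.1
done
```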

So, in conclusion: if you’re going to do things “across subjects”, or want to set up a well-controlled, “standardised” processing pipeline to be used consistently, I’d recommend a fixed set of responses + mtnormalise. However, in some tractography scenarios you could in practice be fine with dwi2response per subject and CSD using their own response(s); the need for mtnormalise is then much smaller in practice. The latter strategy is, in another way, also potentially a bit “safer”, since it’s robust against unexpected changes to the acquisition protocol. But of course, if you’re doing anything across a group of subjects, that should never, ever (ever, ever) happen. In a clinical scenario though, I’ve seen it happen, for diverse reasons…


Regarding only needing this for SIFT/SIFT2-derived measures: see my answer above; it depends on where you get your tractograms from, and how. Due to -cutoff, the amplitude of the FODs (and hence normalisation issues of all kinds) does affect the outcome of your tractogram itself.

As for FA-weighted matrices: the FA metric itself is of course not affected, but that’s because it is derived from the tensor model, which models the ADC values, which in turn are derived from a value “normalised” by the b=0 image(s).
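Why FA is immune to global scaling can be checked with the mono-exponential model underlying the tensor fit: ADC = ln(S0/S)/b, so any factor multiplying both S0 and S cancels in the ratio. An FOD-like amplitude (here a toy value computed as signal divided by a fixed group response amplitude) has no such b=0 reference and scales along with the data. The function names, signal values and b-value below are hypothetical, and the “FOD amplitude” is a deliberately crude stand-in, not an actual CSD output.

```shell
#!/usr/bin/env bash
# ADC from the mono-exponential model S = S0 * exp(-b * ADC).
adc() {
  # $1: S0 (b=0 signal), $2: S (diffusion-weighted signal), $3: b-value in s/mm^2
  awk -v s0="$1" -v s="$2" -v b="$3" 'BEGIN { printf "%g\n", log(s0 / s) / b }'
}

# Toy "FOD amplitude": signal divided by a fixed (group) response amplitude.
toy_fod_amplitude() {
  # $1: signal, $2: fixed response amplitude
  awk -v s="$1" -v r="$2" 'BEGIN { printf "%g\n", s / r }'
}

adc 1000 500 1000             # prints: 0.000693147
adc 2000 1000 1000            # same voxel, data scaled x2: prints 0.000693147
toy_fod_amplitude 500 1000    # prints: 0.5
toy_fod_amplitude 1000 1000   # same voxel, data scaled x2: prints 1
```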

As for weighting by any metric: well yes, but then again no, if that metric is the apparent fibre density (from the FODs) itself.

So in conclusion, always check your entire pipeline for any uses of the FOD amplitude (or even shape, to be honest); if you rely on them being consistent (i.e. “normalised”), then it’d be recommended to use a fixed (set of) response function across your subjects, combined with mtnormalise.


Also, to emphasise this once more (it can’t be emphasised enough): this doesn’t mean it has to be an “average” (set of) response function(s), just a single unique (set of) response function(s). In more and more scenarios I’m witnessing myself, I see value in deriving the response functions from, e.g., healthy subjects only. But in practice, the difference is often very (very) small compared to using an average across “all” subjects (in a study that contains non-healthy subjects).

On a completely separate note, in studies related to development, I’d also get the response function(s) only from the most developed subject(s) in the spectrum/range being studied, for other reasons. And within those subjects too, I’d argue you’re after the response functions representing the most developed tissue. There are different ways of looking at this latter scenario though; several of which are “ok”, but they all mean something different…

But well, in conclusion here: a single (set of) response function(s) doesn’t per se strictly mean an “average” set of response functions. Just a single, unique one; so there’s a fixed point of reference to express the results of CSD techniques relative to.