Systematic FD offset between baseline and post-intervention timepoints in longitudinal FBA

emmatinney · February 19, 2026, 4:02pm

Hi all,

I’m running a longitudinal fixel-based analysis on a large exercise intervention trial and I’m seeing a systematic increase in mean fiber density (FD) of approximately 0.1 at the post-intervention timepoint relative to baseline. This offset is present across virtually all 72 TractSeg-defined white matter tracts and across all subjects regardless of intervention group, which makes me think it’s a preprocessing artifact rather than a biological effect. I’ve been trying to track down the source and would appreciate any guidance.

Some information:

Post-test FD range across tracts: 0.4223 to 0.6335
Range of difference: -0.0689 0.2984
SD/Mean ratio: 1.162

Study design:

Two-timepoint longitudinal study (baseline and 12-month post-intervention)
~636 participants at baseline, ~480 with post-test diffusion data (attrition)
3 acquisition sites
Multi-shell diffusion data, already eddy/distortion corrected before entering the MRtrix pipeline

Pipeline (both timepoints):

mrconvert → dwidenoise → mrdegibbs → dwibiascorrect ants
mrgrid regrid to 1mm isotropic
dwi2response dhollander (per subject)
responsemean to compute group-average RF
SynthStrip for brain masking
dwi2fod msmt_csd using the group-average RF
mtnormalise
population_template (separate template for post-test data)
mrregister → mrtransform → fod2fixel → fixelreorient → fixelcorrespondence → warp2metric
fixelfilter smooth
Mean FD/FC/FDC extracted per TractSeg tract using mrstats

The problem: Post-intervention mean FD values are systematically ~0.1 higher than baseline across all subjects and tracts. The offset is not uniform, when I plot baseline tract-level mean FD against the offset (post minus pre), I get a strong negative correlation (r = −0.886, p < 2e-16). Tracts with lower baseline FD show larger increases, and tracts with higher baseline FD show smaller increases.

What I’ve investigated:

Different group-average response functions: Because of attrition, the baseline and post-test group-average RFs were computed from different subject pools (636 vs 480 subjects). I tried re-running dwi2fod and all downstream steps for the post-test data using the baseline group-average RF, but this did not substantially reduce the offset.
Separate population templates: Baseline and post-test data were registered to separate population templates (one built from baseline FODs, one from post-test FODs). I’m wondering whether different templates could produce systematically different FD values even when the same RF is used?
mtnormalise: Both timepoints had mtnormalise applied independently. Could differences in the normalization (due to different subject pools or different template masks) produce this kind of multiplicative scaling effect?

My questions:

Given that using the same RF didn’t fix the issue, what else in the pipeline could produce a multiplicative offset (strong negative correlation between baseline FD and the shift)?
For longitudinal FBA, is it necessary to use a single population template built from both timepoints together, rather than separate templates? Could template differences explain this pattern?
Is there a recommended approach for mtnormalise in longitudinal designs? should both timepoints be normalized together, or is independent normalization acceptable?
Are there any other common pitfalls in longitudinal FBA that could produce this kind of systematic offset?

Any advice would be greatly appreciated. Happy to share more details about the pipeline or diagnostic outputs.

Thanks!