Longitudinal FBA: small absolute / large standard effect

Hi MRtrix’ers,

Following the paper of Genc et al. (2018), we performed a longitudinal FBA comparing an intervention group with a control group. In this analysis, the population template is generated in a two-step approach: first an affine registration to the intra-subject template, followed by a nonlinear registration from the intra-subject template to the group template. To assess longitudinal changes in FD, log-FC and FDC within and between both groups, we subtracted each time-point 1 image from the time-point 2 and time-point 3 images and divided these difference images by the time interval (in weeks). The statistical model covaried for age and TIV.
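For concreteness, the change-score computation described above can be sketched as follows. This is only an illustration in plain Python on per-fixel lists; the function name `rate_of_change` and the example values are hypothetical, and in practice the same arithmetic would be applied to the fixel data files themselves.

```python
def rate_of_change(baseline, followup, interval_weeks):
    """Per-fixel change per week: (follow-up - baseline) / interval."""
    if interval_weeks <= 0:
        raise ValueError("interval_weeks must be positive")
    if len(baseline) != len(followup):
        raise ValueError("fixel lists must have the same length")
    return [(f - b) / interval_weeks for b, f in zip(baseline, followup)]


# Hypothetical log-FC values for three fixels of one subject
tp1 = [0.10, -0.25, 0.40]
tp2 = [0.12, -0.20, 0.37]

# Time-point 2 minus time-point 1, scanned 10 weeks apart (made-up interval)
change_tp2 = rate_of_change(tp1, tp2, interval_weeks=10.0)
print(change_tp2)
```

With raw log-FC values of this size, change scores on the order of 1e-3 per week would be unsurprising.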

We found a significant effect of logFC in 3 fixels in the mPFC (pFWE<0.05).
When looking at the absolute effect sizes (mrstats abs_effect.mif -mask fwe_signmask.mif), you can appreciate that they are very small:
volume mean median std min max count
[ 0 ] 2.06935e-08 1.97287e-08 3.17442e-09 1.81134e-08 2.42384e-08 3

However, the standard effect is much bigger:
volume mean median std min max count
[ 0 ] 1.84473 1.89786 0.153643 1.67158 1.96476 3

If we calculate the percentage effect as described on the "Expressing the effect size relative to controls" page of the MRtrix 3.0 documentation, we get the following result:
volume mean median std min max count
[ 0 ] 0 0 0 0 0 3

We’re not sure how to explain these null findings, while we did find a significant difference between both groups using fixelcfestats.

Thanks in advance!

Best,
Michelle and @JeroenBlommaert

Hi Michelle,

I’m rather sceptical of the numbers you’re reporting, so would encourage some very critical assessment of the data as a whole.

What those absolute effect statistics are claiming is that, in that region, the rate of change of log(FC) with respect to time in weeks is ~ 2e-08. So over the course of a year, log(FC) is predicted to change by ~ 1e-06. That’s a tiny number, well below the precision of image registration that the FC metric is derived from. Yet this is supposedly the region with a statistically significant effect. So something foul is afoot.

  1. You need to confirm that your usage of the fixelcfestats command is faithful to your intended hypothesis test. There are a lot of ways in which a GLM other than the correct one can be invoked that won’t result in an error, but produce all sorts of weird outputs.

  2. Check the raw fixel data input, make sure that they make sense and are within the expected numerical ranges.

  3. Make sure that you are performing fixel data smoothing. In previous software versions, this was performed internally within the fixelcfestats command; as of 3.0.0, it is instead done prior to fixelcfestats using the fixelfilter command. Failing to do so can result in very small statistically significant regions, as there are greater opportunities to obtain very large test statistics due to large effect / small variance by chance alone.

    Indeed based on your data I would suggest that it is an exceptionally small standard deviation in those fixels that has led to the significant result. You will need to try to determine whether that is due to chance, or maybe some other factor in your processing has resulted in your subjects containing equivalent or almost-equivalent values in those fixels. You could address whatever caused such, or you could omit those fixels from your statistical inference.

  4. Look at the template streamlines tractogram in the region of the significant result, and compare it to the fixel analysis mask. Fixels that possess very little fixel-fixel connectivity can be problematic due to intrinsic non-stationarity correction (essay on forum). Deriving a fixel mask for statistical inference that necessitates adequate fixel-fixel connectivity can be beneficial; indeed I’ll likely explicitly recommend this when I get around to revising the pipeline documentation.
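As a rough numerical check on the figures discussed above (the predicted annual change, and the suspicion of a very small standard deviation), the arithmetic can be sketched in a few lines. This assumes that the standard effect is the absolute effect divided by the standard deviation; because it uses the reported means over three fixels rather than per-fixel values, the implied standard deviation is only approximate.

```python
WEEKS_PER_YEAR = 52.0

abs_effect_per_week = 2.06935e-08  # mean absolute effect reported above
std_effect = 1.84473               # mean standard effect reported above

# Predicted change in log(FC) over one year at this weekly rate
annual_change = abs_effect_per_week * WEEKS_PER_YEAR

# Assumption: standard effect = absolute effect / standard deviation,
# so the standard deviation can be backed out from the two reported values
implied_std = abs_effect_per_week / std_effect

print(f"predicted change in log(FC) per year: {annual_change:.3e}")
print(f"implied standard deviation:           {implied_std:.3e}")
```

Both numbers are several orders of magnitude below the range of the raw log(FC) values, which is why the input data deserve scrutiny before the statistics do.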

If we calculate the percentage effect as described we get the following result:
volume mean median std min max count
[ 0 ] 0 0 0 0 0 3

I can only hypothesize that this is a floating-point precision issue. That calculation for log(FC) only depends on the absolute effect, nothing else. Those calculations should all be being done using double-precision, in which case I’d have expected a non-zero result; but if the numbers bottle-neck to single-precision at any point, then e^(2.0 x 10^-8) will be so close to 1 that the result of the total computation is zero.
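A minimal sketch of that precision argument, assuming the percentage effect for log(FC) is computed as 100 x (e^(abs_effect) - 1). The helper `to_float32` is a hypothetical stand-in for an intermediate result being stored in single precision; it is not a description of what the MRtrix3 commands do internally.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def percentage_effect_double(abs_effect):
    # Everything kept in double precision
    return (math.exp(abs_effect) - 1.0) * 100.0


def percentage_effect_single(abs_effect):
    # Every intermediate result is rounded to single precision
    e = to_float32(math.exp(to_float32(abs_effect)))
    return to_float32(to_float32(e - 1.0) * 100.0)


abs_effect = 2.06935e-08  # mean absolute effect reported above

print(percentage_effect_double(abs_effect))  # small but non-zero
print(percentage_effect_single(abs_effect))  # exactly zero
```

The spacing between 1 and the next representable single-precision number is about 1.2e-07, so e^(2e-08) rounds to exactly 1 and the subtraction yields zero.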

Rob

Hi Rob,

thanks for the quick response, we have been fighting a bit with this analysis already :sweat_smile:
Hopefully your tips can give a clue where to look.

I was wondering about the interpretation of FC in this analysis.
The longitudinal analysis, as described by Genc et al., first uses affine registration for the intra-subject registration and non-linear registration for inter-subject registration.
For log-FC this means that the assessed longitudinal differences in log-FC are based on differences in the Jacobian of the affine registration, since the non-linear part will be the same.

How should I interpret this?

Kind regards,

Jeroen

Hi Rob,

Thank you for your input!

We double-checked whether we correctly performed fixelcfestats and fixelfilter, as per your suggestions (1) and (3), and this seems to be the case.

We already accounted for the issue you propose in your fourth suggestion, as we performed the analysis with a mask where fixels were traversed by at least 150 streamlines.

The problem seems to be in the difference images that we computed as input for the statistical analysis.
The raw log-FC values are within the range of -1.08 to 1.02.
The change scores of log-FC are within the range of -9e-08 to 1.1e-07. These values are thus a lot smaller than the ones of the raw data.
Side note: the raw FD values range between 0 and 1.81. The change scores of FD range between -0.1 and 0.1.
Is it possible that the values in log-FC are driven by the affine transformation like @JeroenBlommaert suggested and that this is causing the small differences? Or do you think something else is at play?

Any suggestions about how to deal with this issue?

PS: Is it normal that for each person the minimum FD is 0 (as long as this is only the case for a few fixels each time)? Or is this something we should look into as well?

Best,
Michelle

Hi Jeroen / Michelle,

There’s a principal issue with FC that I think stems from a misunderstanding of the referenced text. Quoting @sgenc:

In order to build an unbiased longitudinal template, we selected 22 individuals (11 female) to first generate intra-subject templates. For each of these individuals, the time-point 1 and time-point 2 FOD maps were rigidly transformed to their midway space and subsequently averaged to generate an unbiased intra-subject template. The 22 intra-subject FOD templates were used as input for the population template generation step. Following generation of the population template, each individual’s FOD image was registered to this longitudinal template, …

Let’s enumerate this for clarity:

  1. Template generation:

    1. Select a subset of 22 individuals

    2. For each of these individuals:

      1. Perform rigid-body registration between two time points

      2. Compute mean of two time-points in midway space

    3. Use non-linear registration of results of 1.2.2 to produce template.

  2. Transformation of data to template:

    1. For every individual:

      1. For both time points:

        1. Perform non-linear registration of image to template

        2. Transform FOD data to template space

        3. Segment FODs into fixels in template space

        4. Compute FC for that time point based on results of 2.1.1.2-3.

While for a subset of subjects, a rigid-body averaging of two time points is performed (1.2.2), this is done for template construction only. When it comes to producing quantitative data in template space (2.1.1.3-4), this is done independently per time point.

If you have utilised some other pipeline structure, where FC is derived from a composition of a within-subject transformation and then a transformation of a per-subject mean to the template, then yes, the difference in FC between the two time points will be driven entirely by any non-rigid component of the within-subject transformation. So if that’s what you’ve done, that’s almost certainly what’s leading to your suspiciously small values.
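To illustrate why, here is a toy sketch that uses the log of the Jacobian determinant (local volume change) as a simplified stand-in for log(FC). FC proper concerns the change in cross-section perpendicular to the fixel direction, so this is an analogy only, and all matrices below are hypothetical. When the subject-to-template warp is shared by both time points, its contribution cancels in the difference, leaving only the within-subject transformation; if that transformation is rigid, the difference is zero up to numerical noise.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def matmul3(x, y):
    """Product of two 3x3 matrices."""
    return [[sum(x[r][k] * y[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def rotation_z(theta):
    """Rigid rotation about the z axis (determinant 1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def log_volume_change(shared_warp, within_subject):
    """log|J| of the composed transformation at one location."""
    return math.log(det3(matmul3(shared_warp, within_subject)))


# Hypothetical local Jacobian of the warp from subject mean to group template,
# identical for both time points
J_shared = [[1.2, 0.1, 0.0], [0.0, 0.9, 0.05], [0.0, 0.0, 1.1]]

# Case 1: rigid within-subject transformations
diff_rigid = (log_volume_change(J_shared, rotation_z(-0.02))
              - log_volume_change(J_shared, rotation_z(+0.02)))

# Case 2: affine within-subject transformation with 1% isotropic scaling
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scaled = [[1.01, 0.0, 0.0], [0.0, 1.01, 0.0], [0.0, 0.0, 1.01]]
diff_affine = (log_volume_change(J_shared, scaled)
               - log_volume_change(J_shared, identity))

print(diff_rigid)   # zero up to rounding error
print(diff_affine)  # 3 * log(1.01), independent of J_shared
```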


Side note: the raw FD values range between 0 and 1.81. The change scores of FD range between -0.1 and 0.1.

PS: Is it normal that for each person the minimum FD is 0 (as long as this is only the case for a few fixels each time)? Or is this something we should look into as well?

This is quite common. If, for any given template fixel, when establishing fixel correspondence for a particular subject, there does not exist a fixel in that subject whose orientation is within 45 degrees of the template fixel, then the value of FD for that subject in that fixel will be zero. I seem to recall this thread some years back being the first public reporting of such, but it’s been in the back of my head for a long time. I’m hoping that this effect can be mitigated in the future (:crossed_fingers: for 3.1.0), using a combination of reduced fixel segmentation thresholds, more sophisticated fixel correspondence, and fixel-wise regressors / subject exclusion. For now you could probably think about excluding from the analysis mask those fixels for which the number of subjects with FD=0 is too large.
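One way that exclusion could look, as a sketch in plain Python. The function name `zero_fd_mask`, the threshold and the data are all hypothetical; the tolerated fraction of subjects with FD=0 is a judgment call rather than an established recommendation.

```python
def zero_fd_mask(fd_per_subject, max_zero_fraction=0.1):
    """Return one boolean per fixel: True if the fixel stays in the analysis mask.

    fd_per_subject: list of per-subject lists of FD values in template fixel order.
    max_zero_fraction: largest tolerated fraction of subjects with FD == 0.
    """
    n_subjects = len(fd_per_subject)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    n_fixels = len(fd_per_subject[0])
    mask = []
    for fixel in range(n_fixels):
        n_zero = sum(1 for subject in fd_per_subject if subject[fixel] == 0.0)
        mask.append(n_zero / n_subjects <= max_zero_fraction)
    return mask


# Made-up FD values: 4 subjects x 4 template fixels
fd = [
    [0.52, 0.00, 0.31, 0.00],
    [0.48, 0.00, 0.29, 0.44],
    [0.55, 0.12, 0.33, 0.41],
    [0.50, 0.00, 0.30, 0.39],
]

mask = zero_fd_mask(fd, max_zero_fraction=0.25)
print(mask)
```

Here the second fixel is dropped because three of the four subjects have no corresponding fixel, while the fourth fixel is kept because only one subject does.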

Rob

Hi @rsmith ,

It is indeed this point that we had misunderstood.

However, if you want to ensure that longitudinal effects stem from intra-subject differences, then the only way this can be ensured is through a 2-step registration (non-linear to within-subject template + non-linear to general template).
As I understand from your explanation, this is not the case for the longitudinal pipeline proposed by Genc et al. There, the longitudinal design originates solely from the template creation, but registration is performed in the traditional FBA pipeline way. Is this correct? Also, what is the reasoning to go for the 1-step registration instead of the 2-step approach? Of course this time doing both steps using non-linear deformation.

Anyway, thanks a lot for your help so far!

Cheers,

Jeroen and @Michelle

However, if you want to ensure longitudinal effects to stem from intrasubject differences then the only way this can be ensured is through a 2-step registration (non-linear to within-subject template + non-linear to general template).

I’m not sure I would say so definitively. Imagine that you have images of newborns, then the same subjects imaged at 18 years. You generate some template, and then independently register each time point to that template. You then quantify FC and find a huge longitudinal difference between the newborn and 18-year time points. Would it be accurate to say that one “would not be sure if that effect stemmed from intra-subject differences”? Perhaps if you literally did a group-wise comparison, completely ignoring the fact that the same subjects were included in both time-point “groups”, one could philosophically make that argument, even if absurd for this extreme example. But that’s also not what’s done in @sgenc’s paper. The within-subject difference over time is explicitly computed, and it is these values upon which statistical inference is performed.

Fundamentally what I think you’re suggesting is that, instead of:

  1. Non-linearly register each time point independently to template;
  2. Quantify FC for each fixel in template space independently for each time point;
  3. Calculate the difference over time per subject;

, one could instead:

  1. Non-linearly register the two time points per participant to one another;
  2. Non-linearly register the participant template to the group template;
  3. For each fixel in the group template, quantify FC from step 1 at the spatial location determined by step 2.

This is theoretically possible. It would depend on the trustworthiness of non-linear image registration between time points, for which the intrinsic variance will differ from that of two independent non-linear registrations to a smooth group template. You would also need to decide whether FC gets quantified based on the group-template fixel reoriented into participant template space, or based on the orientation of the “corresponding” fixel in the participant template. I would expect that @Dave probably made this decision at the time based on the existing VBA literature, but I’m not as familiar with that literature myself.

Maybe there’s a student project in here?

Note also that your proposal would be highly dependent on the registration process being perfectly symmetrical. If, for every subject, you were to asymmetrically register time point 2 to time point 1, quantify within-subject longitudinal changes from that, then warp those data to a group template and perform statistical analysis, any statistical effect observable across the group could be a manifestation of internal biases of the asymmetric registration algorithm and/or the presence of image interpolation effects at one time point only.

There, the longitudinal design originates solely from the template creation, but registration is performed in the traditional FBA pipeline way. Is this correct?

Well, the experiment overall has a longitudinal design, both in data acquisition and analysis. It’s a question of where the nature of that experimental design comes into play relative to image registration. One could I suppose indeed say that in that study “registration was performed in the traditional FBA pipeline way”, but that is only because deviation of the example pipeline to handle longitudinal data occurred primarily (i.e. ignoring details of group template creation) after registration to group template (by explicitly quantifying within-subject differences between time points) rather than before registration to group template.

I hope that clarifies rather than obfuscates :grimacing:
Rob

Hi @rsmith ,

This was indeed more or less the method I was suggesting:

  1. Non-linearly register the two time points per participant to one another;
  2. Non-linearly register the participant template to the group template;
  3. For each fixel in the group template, quantify FC from step 1 at the spatial location determined by step 2.

I would avoid asymmetries in the registrations by performing step 1 to an intra-subject template instead of to the first time point. Also, I might simplify step 3 by calculating FC on the combined registration of steps 1 and 2.

However, I do share your concern that we should check the stability of this process. Such an analysis would sadly be beyond the scope of our current research project. And indeed, the main innovation (method-wise) of the paper by Genc et al. is the necessary adaptations made to the statistical model.

For now we will thus stick with the previous approach, but your input has really helped us gain a deeper understanding of the methodology.

Kind regards,

Jeroen & @Michelle