SIFT2 and fiber count

Hi again MRtrix experts,
I wanted to check whether I have got all the procedures and concepts right before reporting.
I have multishell data with reverse phase encoding for 97 subjects, which I corrected and then normalized following the procedures in the DWI Pre-processing for Quantitative Analysis documentation.

Using ACT, I ran dwi2fod msmt_csd, then tckgen with iFOD2, then SIFT2. From the whole tractogram I wanted to obtain 3 tracts, each between a pair of ROIs, so I ran tckedit 3 times with -tck_weights_in and -tck_weights_out, each time with two -include options using sphere ROIs of radius 12 (and the -ends_only option).
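For reference, the three tckedit runs could be scripted. The Python sketch below only assembles each command line (it does not execute anything); the file names and sphere coordinates are hypothetical placeholders, and the option names are those of MRtrix3 tckedit (-include with x,y,z,radius, -ends_only, -tck_weights_in, -tck_weights_out).

```python
from typing import List, Tuple

Sphere = Tuple[float, float, float, float]  # (x, y, z, radius) in scanner space


def tckedit_argv(tracks_in: str, weights_in: str, tract_out: str,
                 weights_out: str, roi_a: Sphere, roi_b: Sphere) -> List[str]:
    """Build one tckedit command keeping streamlines whose ends hit both spheres."""
    def sphere_spec(s: Sphere) -> str:
        # tckedit accepts a sphere ROI as four comma-separated values: x,y,z,radius
        return ",".join(str(v) for v in s)

    return [
        "tckedit", tracks_in, tract_out,
        "-include", sphere_spec(roi_a),
        "-include", sphere_spec(roi_b),
        "-ends_only",                     # test only the streamline endpoints
        "-tck_weights_in", weights_in,    # SIFT2 weights of the whole tractogram
        "-tck_weights_out", weights_out,  # weights of the retained streamlines
    ]


# Hypothetical file names and sphere centres, for illustration only.
pairs = {
    "tract1": ((12.0, -30.0, 40.0, 12), (35.0, -60.0, 20.0, 12)),
}
for name, (a, b) in pairs.items():
    argv = tckedit_argv("tracks.tck", "sift2_weights.txt",
                        f"{name}.tck", f"{name}_weights.txt", a, b)
    print(" ".join(argv))
```

Each list could then be passed to subprocess.run(argv, check=True) to actually run it.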

I am interested in obtaining an indicator of ‘connectivity’ between the ROIs, to be used in regression analysis, and I read that after normalization and SIFT, I can use the fiber count directly (e.g. 100 for subject 1, 120 for subject 2). But, with SIFT2, should I do a weighted sum? i.e. the sum of all the weights in the -tck_weights_out file for the tract? Is there something else I am missing? With this objective in mind, when does AFD make sense?

In this scenario, what would be your recommendation for the tckgen -num option? I was using 500,000 but I think it is not enough for what I am reading…

Thank you very much again for this great tool and your responses,

Hi Gari,

Some points:

I read that after normalization and SIFT, I can use the fiber count directly

Essentially. By doing the intensity normalisation / group average response function steps in pre-processing, you guarantee that “one unit of AFD” is equivalent across subjects. Here, however, we actually want “one unit of connection density” to be equivalent between subjects. Matching the total number of reconstructed streamlines across subjects (in combination with the AFD normalisation) is adequate for this; at least until I eventually demonstrate properly in an article how it should be done (so that you don’t get stuck trying to justify it in your own manuscript).

Sorry, I should have published that one a long time ago…

But, with SIFT2, should I do a weighted sum?

Yep. Well, I prefer “sum of weights” rather than “weighted sum”. Recall that each streamline “weight” is something proportional to cross-sectional area, so it makes sense to add these up for the total pathway. With “streamline count”, it’s essentially the same process, only each streamline has unit weight.
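A minimal Python sketch of the “sum of weights” for one pathway. It assumes the weights file written by tckedit -tck_weights_out is plain text with whitespace-separated numbers, possibly with lines starting with '#', which are skipped; the function names are illustrative only.

```python
from typing import List


def parse_weights(text: str) -> List[float]:
    """Parse a streamline-weights text file into a list of floats.

    Assumption: whitespace-separated numbers; lines starting with '#'
    (comments / headers) and blank lines are skipped.
    """
    weights: List[float] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        weights.extend(float(tok) for tok in line.split())
    return weights


def sum_of_weights(weights: List[float]) -> float:
    """Connectivity of a pathway: the sum of its streamline weights."""
    return sum(weights)


def streamline_count(n_streamlines: int) -> float:
    """Plain streamline count is the special case where every weight is 1."""
    return sum_of_weights([1.0] * n_streamlines)


example = "# hypothetical header line\n0.5 1.25 0.75\n1.5\n"
print(sum_of_weights(parse_weights(example)))  # 4.0
print(streamline_count(100))                   # 100.0
```

For a real file this would be something like `with open("tract1_weights.txt") as f: total = sum_of_weights(parse_weights(f.read()))`, where the file name is hypothetical.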

In this scenario, what would be your recommendation for the tckgen -num option? I was using 500,000 but I think it is not enough for what I am reading…

This is always a difficult one: I’ve given a similar answer a few times, but I’m not sure I should add it to the documentation FAQ, given it’s really an experimental question rather than a software question (maybe we need a forum FAQ?).

The streamlines tractography reconstruction is stochastic and discrete, and therefore repeating tracking and SIFT2 on the same image data will give slightly different results. The fewer streamlines you generate, the more variance that stochastic behaviour will contribute to your result, and the more the discreteness of streamlines (as opposed to a continuous field of connectivity) will quantise the result.

E.g. if you get 10 streamlines in one subject and 12 in another, do you trust that to be a genuine difference? What about 1,000 vs. 1,200?
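One way to build intuition for this question is a deliberately simplified model: assume each generated streamline independently ends up in the tract with some fixed probability p, so the count is binomial. This ignores SIFT2, imaging noise and everything else, and the value of p is made up; it only illustrates how the relative spread shrinks as the expected count grows (roughly as 1/sqrt(expected count)).

```python
import math


def count_cv(n_total: int, p: float) -> float:
    """Coefficient of variation (SD / mean) of a binomial streamline count.

    Toy model: each of n_total streamlines independently lands in the
    tract with probability p, so count ~ Binomial(n_total, p).
    """
    mean = n_total * p
    sd = math.sqrt(n_total * p * (1.0 - p))
    return sd / mean


# Same hypothetical tract (p = 0.001), two tractogram sizes:
# expected counts of about 10 and about 1,000 streamlines.
for n in (10_000, 1_000_000):
    print(n, n * 0.001, count_cv(n, 0.001))
```

Under this model a difference of 2 on an expected count of 10 is smaller than one standard deviation (CV about 0.32), whereas a difference of 200 on an expected count of 1,000 is roughly six standard deviations (CV about 0.032).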

Personally, the only recommendation I can make is to actually do this experiment: see how much variance you get from repeating the tracking on the same image data, and ideally contrast this against scan-rescan variance if possible. If there are computation-time limitations as well, incorporate them into the final decision as you see fit. But it’s better to have some understanding of the influence of such parameters than to pick a number and hope for the best.
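The experiment suggested above can be summarised with a within-subject coefficient of variation (SD / mean): compute it across repeated tracking runs on the same image data, and across scan-rescan sessions, then compare the two. A minimal Python sketch; the subject IDs and numbers are made up purely to show the calling convention.

```python
import statistics
from typing import Dict, List


def cv(values: List[float]) -> float:
    """Coefficient of variation: sample SD divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def mean_within_subject_cv(runs: Dict[str, List[float]]) -> float:
    """Average over subjects of the CV across repeated measurements.

    `runs` maps a subject ID to repeated sums of weights for one tract,
    e.g. from re-running tracking + SIFT2 on the same image data, or
    from the two sessions of a scan-rescan pair.
    """
    return statistics.mean(cv(v) for v in runs.values())


# Made-up values, for illustration only.
repeat_tracking = {"sub01": [100.0, 104.0, 96.0], "sub02": [50.0, 52.0, 48.0]}
scan_rescan = {"sub01": [100.0, 120.0], "sub02": [50.0, 40.0]}

print("repeat tracking CV:", mean_within_subject_cv(repeat_tracking))
print("scan-rescan CV:    ", mean_within_subject_cv(scan_rescan))
```

One reasonable reading: if the repeat-tracking CV at a given number of streamlines is small compared with the scan-rescan CV, generating more streamlines is unlikely to change the results much. This is a rule of thumb, not a fixed threshold.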


Thank you very much for your answer.
Yes, you are right, in this case it is a sum of weights :slight_smile:

I have repeated scans, so I am going to try 2M and 10M streamlines (I already have 500K), and I will post the results here in case they are useful to somebody else.

Thanks again!

Hi again @rsmith,
I have generated whole-brain tractograms with 500k and 2M streamlines, plus 5M for some subjects (I have some storage-space problems to solve… I will post all the results once I have solved them).

As a check, for each subject I read the weights of all 2M streamlines in the whole tractogram and summed them; the variability is small (mean: 2180042, SD: 53226.64, i.e. the SD is about 2.4% of the mean).

As explained above, I created some tracts with the spheres and obtained their sums of weights (the data below are for one tract obtained from the 2M tractograms). The variability is huge: mean 784.43, SD 554.6892 (about 70% of the mean), min 102.2971, max 3450.907. Removing outliers beyond 2 SD only affects 3 subjects, giving a new mean of 701 and a new SD of 384 (55% of the mean).
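The summary statistics above (mean, SD, SD as a fraction of the mean, and a 2 SD outlier exclusion) can be reproduced with the Python standard library. A minimal sketch; the input values are made up, and the outlier rule is assumed to be a single pass dropping values with |x - mean| > 2 SD.

```python
import statistics
from typing import List, Tuple


def summarize(values: List[float]) -> Tuple[float, float, float]:
    """Return (mean, sample SD, SD as a fraction of the mean)."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return m, sd, sd / m


def drop_outliers(values: List[float], k: float = 2.0) -> List[float]:
    """Keep values within k sample SDs of the mean (single pass)."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - m) <= k * sd]


# Made-up per-subject sums of weights, for illustration only.
sums = [1.0] * 9 + [10.0]
print(summarize(sums))
print(summarize(drop_outliers(sums)[:-1] + [2.0]))
```

The same two functions can be applied both to the whole-tractogram sums and to the per-tract sums of weights.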

I have repeated scans (day 1 and day 2) for 31 of the 66 subjects, so I computed some correlations:

  • Whole tractogram (day 1 vs day 2): cor coef = 0.37 (p = 0.04) *
  • Tract1 created with spheres (day 1 vs day 2): cor coef = 0.7 (p = 1.223e-05) ***
  • Correlation of tract1 with tract2: cor coef = 0.11 (p = 0.26)

The image below shows tract1 for the subjects with the maximum and minimum values (day 1 and day 2):

So, it seems that SIFT2 is working well: it has high intra-subject test-retest reproducibility. But I think I can’t compare the tract values, given the huge inter-subject variability? What variability would you consider ‘normal’? Is this something that could be solved with 5M or 10M fibers?
I think the problem is the location of the spheres: each one is centred on a surface coordinate transformed from an average space. Do you think this could be the problem? In your experience, what would your approach be here? Could creating cortical ROIs instead of the spheres be a solution?

Thanks again for your help!


There’s a lot going on here, and it’s getting beyond the realm of software and more into research. I would strongly recommend reading this paper, including the supplementary material (which in hindsight should have been an appendix…). In it I disentangle which factors contribute to reproducibility in the connectome (which, ultimately, can be thought of as just an exhaustive list of tracts of interest).

You’re right in that the inter-subject variability is quite high; and it probably always will be in diffusion MRI tractography. Getting consistent endpoint-to-endpoint tracking with a streamlines algorithm and noisy image data is hard. I show in that paper that applying SIFT actually decreases inter-subject variability in the connectome; so the variance isn’t coming from the SIFT algorithm, but from the tracking itself. The data in the supplementary material additionally show (at least in the intra-subject, scan-rescan case) that the majority of that variance comes from the diffusion imaging & modelling.

Obviously, how the tracking targets / parcels are defined could influence streamline selection and hence variance. I’m not 100% sure exactly what you’ve done; but if you’ve taken a surface coordinate in average space and performed an affine transformation to subject space, rather than transforming to the subject-specific surface, then the area of GM-WM interface enclosed by that sphere will differ between subjects, which directly affects the number of streamlines selected. A cortical ROI with a surface-based transformation would probably be more robust.

However you choose to proceed here, bear in mind that when using ACT, streamlines are terminated right at the GM-WM interface; so if you define an ROI that’s entirely within the cortical surface, streamlines won’t actually intersect that ROI, and hence tckedit won’t select those streamlines the way you expect it to (we’re working on better mechanisms for this…). tck2connectome by default uses a simple heuristic to account for this offset.
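As far as I understand it, the tck2connectome heuristic is a radial search from each streamline endpoint for the nearest labelled voxel, up to a maximum distance (the -assignment_radial_search option, 4 mm by default). The Python sketch below only illustrates that idea and is not MRtrix3's implementation; it assumes 1 mm isotropic voxels with endpoint coordinates already in voxel units, and the function name and dictionary-of-labels representation are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Voxel = Tuple[int, int, int]


def assign_endpoint(endpoint: Tuple[float, float, float],
                    labels: Dict[Voxel, int],
                    radius: float = 4.0) -> Optional[int]:
    """Return the label of the nearest labelled voxel within `radius`, else None.

    Simplifications: 1 mm isotropic voxels, voxel (i, j, k) centred at
    (i, j, k), and `labels` holds only non-zero (parcel) voxels.
    """
    ci, cj, ck = (round(c) for c in endpoint)
    r = math.ceil(radius)  # half-width of the cube of voxels to examine
    best: Optional[Tuple[float, int]] = None
    for i in range(ci - r, ci + r + 1):
        for j in range(cj - r, cj + r + 1):
            for k in range(ck - r, ck + r + 1):
                label = labels.get((i, j, k))
                if label is None:
                    continue
                d = math.dist(endpoint, (i, j, k))
                if d <= radius and (best is None or d < best[0]):
                    best = (d, label)
    return None if best is None else best[1]


# Hypothetical parcellation with two labelled voxels.
parcels = {(5, 0, 0): 3, (0, 3, 0): 7}
print(assign_endpoint((2.0, 0.0, 0.0), parcels))
```

An endpoint that stops at the GM-WM interface, just short of a cortical parcel, can still be assigned to that parcel this way, which is what a plain "does the streamline intersect the ROI" test would miss.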



Thanks Rob,
I will continue with the testing.

I used FreeSurfer to do a surface-to-surface transform (mri_surf2surf) and then used the correspondence between the white matter surface and volume coordinates to select the center of each sphere, but I am afraid that in some cases this coordinate can fall in a sulcus and in others on a gyrus. I will have to test whether there is a relationship between this and the number of fibers.

Nevertheless, I think the way to go is to create surface ROIs first and then derive the volumetric ROIs from them (I usually sample 1-2 mm below the surface to avoid the problem you mention). That way I can also obtain the depth or curvature of each ROI, to know whether it lies in a sulcus or on a gyrus, and check its correlation with the sum of weights.
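Sampling 1-2 mm below the surface amounts to moving each white-surface vertex along the inward surface normal. A minimal Python sketch, assuming the normal points outward (from WM towards GM) and coordinates are in mm; the function name, the example vertex and the 1.5 mm default depth are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def point_below_surface(vertex: Vec3, outward_normal: Vec3,
                        depth_mm: float = 1.5) -> Vec3:
    """Move a white-surface vertex `depth_mm` into the white matter.

    Assumes `outward_normal` points from WM towards GM (any non-zero
    length), so the inward direction is its negation.
    """
    norm = math.sqrt(sum(n * n for n in outward_normal))
    if norm == 0.0:
        raise ValueError("normal must be non-zero")
    return tuple(v - depth_mm * n / norm for v, n in zip(vertex, outward_normal))


# Hypothetical vertex whose outward normal points along +z.
print(point_below_surface((30.0, -20.0, 45.0), (0.0, 0.0, 1.0)))
```

The per-ROI curvature or depth values could then be correlated with the sums of weights, e.g. with statistics.correlation.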

Thanks again for your feedback, I will post results as soon as I have them. In the meantime, any comment will be welcome :slight_smile:
