Expansion on the not-related-to-seeding bias:
The description of this specific bias originates from targeted tracking experiments. Suppose a specific seed point is genuinely biologically connected to two different ROIs, one very close to the seed and the other very far away. The number of probabilistic streamlines reaching the closer ROI would then be greater than the number reaching the distant ROI, as streamlines travelling the shorter distance have fewer opportunities to deviate from the genuine underlying trajectory.
Firstly, this is a fundamentally different interpretation of “connectivity” from the one we use in the MRtrix3 world. The type of experiment described above purports to provide a “probability of connectivity” between the seed location and the target region, based on the number of seeded streamlines that reach said target region. What SIFT / SIFT2 (and others) purport to provide is not a measure of probability of connectivity, but of density.
These two approaches are actually incompatible even before you get to the interpretation. The first is only possible based on targeted tracking (and to be fully precise, is only appropriate if seeding from an infinitesimally small seed location, i.e. all streamline seeds are precisely the same). The second, at least for the quantification techniques referenced, is only possible based on whole-brain tractography (as one must be able to compare the relative streamline densities in different locations in the brain with the relative fibre densities / diffusion-weighted signals in those locations).
So let’s consider the “probability of intersection” of ROI pairs in the context of whole-brain tractography. Moreover, let’s ignore streamline seeding effects, and let’s even ignore the specifics of SIFT / SIFT2. But we need more regions.
      --- B                           --------------- E
     /                               /
A ---               D ---------------
          C                                           F
A is connected to B but not C; D is connected to E but not F. But when we do a streamlines reconstruction, we’re not going to reconstruct the precise underlying trajectories perfectly: some streamlines intersecting A will intersect C instead of B; some streamlines intersecting D will intersect F instead of E.
Because the pathway from A to B is short, probably most streamlines intersecting A will successfully reach B, and few will instead go to C. Because the pathway from D to E is long, probably many of the streamlines intersecting D will hit F instead of E (a fraction that approaches 50% as the length increases).
Now, are “tracts more likely to be connected to closer ROIs”? No: D-E will most likely still be a denser connection than A-C (assuming the number of streamlines intersecting A is equal to the number of streamlines intersecting D), despite A-C being shorter than D-E. It’s also not as simple as a “bias toward shorter tracts”, since D-F is estimated to be more densely connected than A-C despite being longer.
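To put rough numbers on the four connections, here is a minimal sketch in Python. The model is purely an assumption for illustration, not something taken from SIFT / SIFT2 or any tracking algorithm: the fraction of streamlines that reach the correct target is taken to decay exponentially with pathway length, from 100% toward 50%. The function names, the pathway lengths and the decay constant are all hypothetical.

```python
import math

def fraction_correct(length_mm, decay_mm=50.0):
    """Hypothetical model: fraction of streamlines reaching the true target.

    Equals 1.0 at zero length and approaches 0.5 as the length grows.
    """
    return 0.5 + 0.5 * math.exp(-length_mm / decay_mm)

def estimated_counts(n_streamlines, length_mm):
    """Split streamlines between the true target and the erroneous neighbour."""
    p = fraction_correct(length_mm)
    return n_streamlines * p, n_streamlines * (1.0 - p)

N = 1000  # assumed: the same number of streamlines intersect A and D

A_B, A_C = estimated_counts(N, length_mm=10.0)    # short pathway (assumed length)
D_E, D_F = estimated_counts(N, length_mm=150.0)   # long pathway (assumed length)

for name, value in [("A-B", A_B), ("A-C", A_C), ("D-E", D_E), ("D-F", D_F)]:
    print(f"{name}: {value:.0f} streamlines")
```

Under these assumed values the ordering comes out as A-B > D-E > D-F > A-C: the long true connection D-E still outweighs the short false connection A-C, and the long false connection D-F outweighs it too, so neither “closer ROIs win” nor “shorter tracts win” describes the result.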
I have on a number of occasions described this as a “distance-dependent blurring of the connectome”. Long streamlines are more likely to traverse erroneous trajectories. Suppose a region of interest X is biologically connected specifically to some regions and not others. The further those regions are from X, the more the tractography-estimated connectivity from X will be spread evenly across all of them, rather than being attributed only to the subset that is genuinely connected.
The above is of course assuming probabilistic tractography. For deterministic tractography, the effect of increasing distance is not that the estimated connectivity becomes erroneously non-specific; rather, it becomes increasingly likely that the connectivity will be attributed specifically, but erroneously, to the wrong regions.
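The contrast between the two cases can be sketched with a toy simulation. Everything here is an assumption for illustration: the exponential error model, the pathway lengths, and the idea that a deterministic reconstruction sends all of its streamlines to a single target that is correct with the same probability. None of it comes from an actual tracking algorithm.

```python
import math
import random

def fraction_correct(length_mm, decay_mm=50.0):
    # Hypothetical model: 1.0 at zero length, approaching 0.5 for long pathways.
    return 0.5 + 0.5 * math.exp(-length_mm / decay_mm)

def probabilistic(n, length_mm):
    # Streamlines are spread over both targets: non-specific, but partly right.
    p = fraction_correct(length_mm)
    return n * p, n * (1.0 - p)

def deterministic(n, length_mm, rng):
    # All streamlines follow one trajectory: fully specific, but possibly wrong.
    p = fraction_correct(length_mm)
    return (n, 0) if rng.random() < p else (0, n)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n, trials = 1000, 10000
for length in (10.0, 150.0):  # assumed short and long pathway lengths
    wrong = sum(deterministic(n, length, rng)[0] == 0 for _ in range(trials))
    true_n, false_n = probabilistic(n, length)
    print(f"length {length:5.1f} mm | probabilistic: {true_n:.0f} true / "
          f"{false_n:.0f} false | deterministic: wrong target in "
          f"{wrong / trials:.1%} of reconstructions")
```

In this sketch every deterministic reconstruction is all-or-nothing, and the share of reconstructions that pick the wrong target grows with the pathway length, whereas the probabilistic counts simply drift toward an even split.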