Invnodevol connectome still depends on brain size


I am finding that even after getting a measure of brainwide streamline density corrected for parcel sizes (with tck2connectome -invnodevol), those values still correlate highly with brain size (across subjects) which doesn’t seem to make sense.

In detail:

  1. I am using HCP diffusion data, unprocessed. 50M streamlines.
    …preprocessing steps completed…
  2. tckgen … -act r5TT.nii -backtrack -crop_at_gmwmi -seed_dynamic
  3. tcksift2 trackfile.tck … sift_weightfactor.txt -act r5TT.nii -fd_scale_gm
  4. tck2connectome trackfile.tck {Schaefer 400 parcellation in volume space} output_streamlineweights.csv -tck_weights_in sift_weightfactor.txt -assignment_radial_search 2
  5. tck2connectome trackfile.tck {Schaefer 400 parcellation in volume space} output_invnodevol.csv -tck_weights_in sift_weightfactor.txt -assignment_radial_search 2 -scale_invnodevol

My measure of brainwide streamline density is the mean of all edges in output_invnodevol.csv
My measure of brain size (quite indirectly) is the mean of (output_streamlineweights.csv divided by output_invnodevol.csv)
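Concretely, the computation is equivalent to the following sketch (synthetic symmetric matrices stand in for the two CSVs; in practice they would be loaded with e.g. `np.loadtxt(..., delimiter=",")`):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400  # Schaefer 400 parcellation

# Synthetic stand-ins for the two tck2connectome outputs:
sw = rng.random((n, n)); sw = sw + sw.T            # raw streamline weights
vol = rng.uniform(500, 1500, n)                    # node volumes
inv = 2 * sw / (vol[:, None] + vol[None, :])       # invnodevol-scaled edges

iu = np.triu_indices(n, k=1)  # count each edge once

density = inv[iu].mean()                # "brainwide streamline density"
size_proxy = (sw[iu] / inv[iu]).mean()  # indirect "brain size" measure
```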

These two measures correlate strongly inversely across subjects (R=-.85), so that big-brained subjects have reduced streamline density. This means that comparing streamline density for a particular connection across clinical groups is going to be confounded by brain size. I thought invnodevol was meant to correct for brain size effects? Am I misinterpreting something?



“invnodevol” controls for node-wise volume bias - that is, the bias that larger nodes will inherently have more streamlines connecting to/from them. Larger brains may well have larger nodes, but this option will not necessarily control for global effects, such as anything introduced by larger brain volumes.
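To make that concrete, here is a sketch of the per-edge scaling effectively applied, using the 2N/(V1+V2) form stated later in this thread (function name and numbers are mine):

```python
import numpy as np

def scale_invnodevol(counts, volumes):
    """Sketch of inverse-node-volume scaling: an edge with N streamlines
    between nodes of volume v1 and v2 becomes 2 * N / (v1 + v2),
    i.e. each edge is divided by the mean volume of its two nodes."""
    v = np.asarray(volumes, dtype=float)
    return 2.0 * counts / (v[:, None] + v[None, :])

# Toy two-node connectome: 100 streamlines between the two nodes.
counts = np.array([[0.0, 100.0], [100.0, 0.0]])
scaled = scale_invnodevol(counts, [400.0, 600.0])
# scaled[0, 1] == 2 * 100 / (400 + 600) == 0.2
```

Note that the scaling is purely node-wise: a uniform doubling of all node volumes halves every edge, but any global confound acting on the streamline counts themselves passes straight through.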

I would recommend using FreeSurfer to get an estimated intracranial volume, presuming you have access to the T1 images.


So I’ve confirmed that brain size (from freesurfer intracranial volume) correlates strongly with edge weights (streamline counts) after connectome construction. It’s a weird pattern. In particular, the strongest 2% of edges are stronger in bigger brains, and the weakest 98% of edges are weaker in bigger brains. This then goes on to significantly confound most summary weighted network measures.

Do you guys know why this is happening?

Hi Jayson,

The ideas you’re getting into here are strongly related to what I’ve called inter-subject connection density normalisation; I talk about it here, though you’re proposing augmentations thereof that I could not possibly have covered there.

tcksift2 trackfile.tck … sift_weightfactor.txt -act r5TT.nii -fd_scale_gm

If you’re using multi-shell HCP data, you shouldn’t use -fd_scale_gm; that heuristic is intended for cases where the non-zero GM signal artificially inflates the WM FOD size due to not having separate GM and WM compartments in the deconvolution.

My measure of brainwide streamline density is the mean of all edges in output_invnodevol.csv

I honestly … don’t quite know what to do with this. :exploding_head:

I hope it can be inferred from the article linked above that with appropriate inter-subject normalisation, one does indeed obtain an estimate of “total white matter connectivity” per subject that can be compared across subjects. One would expect this to be inflated in subjects with greater WM volume, greater fibre density per voxel, and additionally depends on the relative pathway lengths. But if, instead of simply summing FBC across all streamlines, you first compute ((2 x FBC) / (V1 + V2)), and then sum those values across all edges… I just don’t find that quantity to have a dimensionality that is amenable to a summation operation.
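A toy illustration of that dimensionality problem (invented numbers): splitting one node into two equal halves leaves total FBC unchanged, but changes the sum of the volume-scaled edges, so that sum depends on parcellation granularity rather than quantifying "total connectivity".

```python
# One edge with fibre bundle capacity F between nodes of volume v1, v2.
F, v1, v2 = 100.0, 800.0, 1000.0

total_fbc_before = F
scaled_sum_before = 2 * F / (v1 + v2)

# Split node 1 into two equal halves; streamlines split evenly.
total_fbc_after = F / 2 + F / 2
scaled_sum_after = (2 * (F / 2) / (v1 / 2 + v2)
                    + 2 * (F / 2) / (v1 / 2 + v2))

# Total FBC is invariant to the split; the volume-scaled sum is not.
```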

The whole “divide by node volumes” thing is IMO based on pretty naive logic: “bigger nodes will have more streamlines, so divide by the volume”. It’s based on more of an interpretation of probability of connectivity, rather than a density of connectivity (which is what the SIFT model aims for); I talk about that conflict a bit here, though I need to at some point finish a public-domain article where I’ll talk about that again. So using the two in conjunction in my own humble subjective opinion does not make a lot of sense.

My measure of brain size (quite indirectly) is the mean of (output_streamlineweights.csv divided by output_invnodevol.csv)

I think there’s maybe not been enough thought put into what’s actually being quantified here, prior to analysing the resulting values.

For any given edge, the values of those two matrices are:

  • output_streamlineweights.csv: N (ignoring SIFT2 weights for simplicity)
  • output_invnodevol.csv: (2 x N) / (V1 + V2)

The value of their ratio is therefore (V1 + V2) / 2. By taking the mean of this across all edges, what you’re getting is precisely the average node volume. You could argue that average node volume is a reasonable proxy for brain volume, but the fact that it was obtained in such a roundabout way suggests that it wasn’t appreciated that this is what was going on.
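A quick numeric check of that algebra (invented numbers):

```python
# One edge: N streamlines between nodes of volume v1 and v2.
N, v1, v2 = 125.0, 800.0, 1200.0

streamlineweights = N                 # output_streamlineweights.csv entry
invnodevol = 2 * N / (v1 + v2)        # output_invnodevol.csv entry

ratio = streamlineweights / invnodevol
# ratio == (v1 + v2) / 2 == 1000.0: the mean volume of the two nodes,
# with N cancelling out entirely.
```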

These two measures correlate strongly inversely across subjects (R=-.85)

This depends a great deal on details of the tckgen call that are currently elided. This is another point that I aim to delve into further in a future article, so I’ll have to try to restrain myself here.

Let us presuppose that you generated for each subject a fixed total number of streamlines. Each subject has the same number of streamlines, regardless of brain volume. Let’s assume that 100% of streamlines contribute to the connectome in all subjects. And let’s assume, for the sake of simplicity, that streamlines are distributed precisely evenly across all edges in the connectome, regardless of brain size. If you generate 10M streamlines per subject, each edge contains N = 10M / (400 x 399 / 2) ≈ 125 streamlines; again, identical for all edges for all subjects.

Your “brainwide streamline density” is the mean of (2 x N / (V1 + V2)) across all edges.
Your “brain size” is the mean of ((V1 + V2) / 2).
Recall that N is a constant.
So ultimately what you’ve discovered is that if you perform a regression between a value and its own reciprocal, you get a strong negative correlation. :upside_down_face:
(It’s slightly more complicated in that the former is a sum of reciprocals rather than a reciprocal of sums, but the point stands)
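That scenario can be simulated directly (a sketch; the volume distributions are invented, and only a global size factor varies across subjects):

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_subjects = 400, 50
N = 10_000_000 / (n_nodes * (n_nodes - 1) / 2)  # ~125 streamlines per edge

iu = np.triu_indices(n_nodes, k=1)
densities, sizes = [], []
for _ in range(n_subjects):
    scale = rng.uniform(0.8, 1.2)                  # subject "brain size" factor
    vol = scale * rng.uniform(500, 1500, n_nodes)  # node volumes
    pairsum = vol[iu[0]] + vol[iu[1]]
    densities.append(np.mean(2 * N / pairsum))     # mean invnodevol edge
    sizes.append(np.mean(pairsum / 2))             # mean of the edge-wise ratio

r = np.corrcoef(densities, sizes)[0, 1]
# r comes out strongly negative: one measure is (approximately) a sum of
# reciprocals of the other, with N constant across subjects.
```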

I hope this pushes you in the right direction for a more thorough understanding of how these numerical manipulations influence the final observations.


Thank you for the comprehensive reply; the preprint paper really helped. In addition to what you’ve said so far, I realise we should have performed intensity normalisation, used a common response function across subjects (either a group average or that of a single subject), and multiplied the streamline weights by the proportionality coefficient.