Hi Jayson,
The ideas you’re getting into here are strongly related to what I’ve called inter-subject connection density normalisation; I talk about it here, though you’re proposing augmentations thereof that I could not possibly have covered there.
tcksift2 trackfile.tck … sift_weightfactor.txt -act r5TT.nii -fd_scale_gm
If you’re using multi-shell HCP data, you shouldn’t use -fd_scale_gm; that heuristic is intended for cases where the non-zero GM signal artificially inflates the WM FOD size because the deconvolution has no separate GM and WM compartments.
My measure of brainwide streamline density is the mean of all edges in output_invnodevol.csv
I honestly … don’t quite know what to do with this. 
I hope it can be inferred from the article linked above that with appropriate inter-subject normalisation, one does indeed obtain an estimate of “total white matter connectivity” per subject that can be compared across subjects. One would expect this to be inflated in subjects with greater WM volume or greater fibre density per voxel, and it additionally depends on the relative pathway lengths. But if, instead of simply summing FBC across all streamlines, you first compute ((2 x FBC) / (V1 + V2)) for each edge, and then sum those values across all edges… I just don’t find that quantity to have a dimensionality that is amenable to a summation operation.
The whole “divide by node volumes” thing is IMO based on pretty naive logic: “bigger nodes will have more streamlines, so divide by the volume”. It’s based more on an interpretation of connectivity as a probability than as a density (the latter being what the SIFT model aims for); I talk about that conflict a bit here, though I need to at some point finish a public-domain article where I’ll talk about that again. So using the two in conjunction does not, in my own humble subjective opinion, make a lot of sense.
My measure of brain size (quite indirectly) is the mean of (output_streamlineweights.csv divided by output_invnodevol.csv)
I think there’s maybe not been enough thought put into what’s actually being quantified here, prior to analysing the resulting values.
For any given edge, the values of those two matrices are:
output_streamlineweights.csv: N (ignoring SIFT2 weights for simplicity)
output_invnodevol.csv: (2 x N) / (V1 + V2)
The value of their ratio is therefore (V1 + V2) / 2. By taking the mean of this across all edges, what you’re getting is precisely the average node volume. You could argue that average node volume is a reasonable proxy for brain volume, but the fact that it was obtained in such a roundabout way suggests that it wasn’t appreciated that this is what was going on.
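To make that cancellation concrete, here is a minimal sketch in plain Python. The node volumes and the four-node parcellation are made up purely for illustration, and every edge is assumed to be non-zero; it only shows that the per-edge ratio collapses to (V1 + V2) / 2 and that its mean over all edges matches the mean node volume.

```python
from itertools import combinations

# Hypothetical node volumes (arbitrary units) for a toy 4-node parcellation.
volumes = [10.0, 20.0, 30.0, 40.0]
N = 125.0  # streamline count on each edge; SIFT2 weights ignored for simplicity

ratios = []
for i, j in combinations(range(len(volumes)), 2):
    count = N                                          # plain streamline-count entry
    invnodevol = 2.0 * N / (volumes[i] + volumes[j])   # node-volume-scaled entry
    ratios.append(count / invnodevol)                  # reduces to (V_i + V_j) / 2

mean_ratio = sum(ratios) / len(ratios)
mean_volume = sum(volumes) / len(volumes)
print(mean_ratio, mean_volume)  # the two agree up to floating-point rounding
```

Changing N, or using a different N per edge, makes no difference to the ratio, since N cancels within each edge.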
These two measures correlate strongly inversely across subjects (R=-.85)
This depends a great deal on details of the tckgen call that are currently elided. This is another point that I aim to delve into further in a future article, so I’ll have to try to restrain myself here.
Let us presuppose that you generated for each subject a fixed total number of streamlines. Each subject has the same number of streamlines, regardless of brain volume. Let’s assume that 100% of streamlines contribute to the connectome in all subjects. And let’s assume, for the sake of simplicity, that streamlines are distributed precisely evenly across all edges in the connectome, regardless of brain size. If you generate 10M streamlines per subject, each edge contains N = 10M / (400 x 399 / 2) ≈ 125 streamlines; again, identical for all edges for all subjects.
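For anyone wanting to check the arithmetic, a quick sketch follows. It assumes a 400-node parcellation with self-connections excluded, which is how I read the 400 x 399 / 2 above; the variable names are just illustrative.

```python
n_nodes = 400
n_streamlines = 10_000_000

# Number of unique node pairs, excluding self-connections.
n_edges = n_nodes * (n_nodes - 1) // 2

# Streamlines per edge if they were spread perfectly evenly.
per_edge = n_streamlines / n_edges

print(n_edges, round(per_edge, 1))
```

The result is a little over 125 streamlines per edge; the exact figure does not matter for the argument, only that it is the same constant everywhere.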
Your “brainwide streamline density” is the mean of (2 x N / (V1 + V2)) across all edges.
Your “brain size” is the mean of ((V1 + V2) / 2).
Recall that N is a constant.
So ultimately what you’ve discovered is that if you perform a regression between a value and its own reciprocal, you get a strong negative correlation. 
(It’s slightly more complicated in that the former is a sum of reciprocals rather than a reciprocal of sums, but the point stands)
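The thought experiment can be run as a small simulation. This is only a sketch under the simplifying assumptions above (the same N on every edge in every subject); the base node volumes, the range of brain-size factors and the per-node jitter are all invented for illustration, and nothing here uses real data or any MRtrix3 output.

```python
import random

random.seed(0)
N = 125.0          # identical streamline count on every edge, for every subject
n_nodes = 20       # small hypothetical parcellation to keep the sketch fast
n_subjects = 50

# Hypothetical relative node volumes shared by all subjects.
base = [random.uniform(0.5, 1.5) for _ in range(n_nodes)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

density, size = [], []
for _ in range(n_subjects):
    scale = random.uniform(0.8, 1.2)  # hypothetical overall brain-size factor
    vols = [scale * b * random.uniform(0.9, 1.1) for b in base]
    pairs = [(vols[i], vols[j])
             for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    # "Brainwide streamline density": mean of 2N / (V1 + V2) over edges.
    density.append(sum(2.0 * N / (a + b) for a, b in pairs) / len(pairs))
    # "Brain size": mean of (V1 + V2) / 2 over edges.
    size.append(sum((a + b) / 2.0 for a, b in pairs) / len(pairs))

r = pearson(density, size)
print(r)
```

No connectivity information goes into this at all, since N is fixed, yet the two measures still come out strongly negatively correlated across the simulated subjects.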
I hope this pushes you in the right direction for a more thorough understanding of how these numerical manipulations influence the final observations.
Cheers
Rob