Hi Andy,
I’ve had a go at comprehensively explaining these concepts in this preprint. I’ve learned over time that trying to explain one piece of the puzzle or another never really worked, and the same questions kept repeating, so I’m hoping that article helps people properly understand what’s going on here by covering all bases at once.
In the absence of any special command-line options in the tck2connectome call, the value in each edge of the connectome will be the sum of the weights of those streamlines ascribed to that edge. This has intrinsic dimensionality of L^{2}. When properly normalised (as explained in the linked article), the units are in fact AFD/mm; which looks kinda weird because of the quantitative properties of the AFD measure, but it’s a fibre cross-sectional area.
Lastly, while most of the numbers from the connectome seem reasonable (i.e., within the range of 0-1), a few of the numbers are greater than 1 (e.g., 1.0-3.0). Is this normal?
The purpose of the proportionality coefficient in SIFT2 is that a “typical” streamline will have approximately unity weight. So normally the magnitude of the values would be comparable to that of a matrix generated using raw streamline count. Can I therefore conclude that you are already normalising by the proportionality coefficient?
If that’s the case, then sure, it’s perfectly reasonable to have some connectome edge values greater than 1.0. Consider the following. If you have a bundle that is precisely 1mm voxel x 1mm voxel in cross-section, every voxel in that bundle has a single fixel with a fibre density of 1.0, and there is no partial volume with any other bundle, then the resulting structural connectivity measure (termed Fibre Bundle Capacity (FBC) from now on) will be 1.0. So if you’re obtaining values greater than 1.0 in your connectome, that simply means that there are some edges for which the structural pathway is wider in cross-section than an image voxel (or 1mm^{2}, depending on how stringently you follow the instructions in the preprint); which we know a priori to be the case in DWI data.
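To put rough numbers on that thought experiment, here is a minimal sketch. It assumes the idealised case described above (uniform fibre density, no partial volume), in which FBC reduces to cross-sectional area multiplied by fibre density; the function name `idealised_fbc` is purely illustrative and is not part of MRtrix3.

```python
def idealised_fbc(width_mm, height_mm, fibre_density=1.0):
    """FBC of an idealised bundle: cross-sectional area (mm^2) x fibre density.

    Assumes uniform fibre density and no partial volume with other bundles.
    """
    return width_mm * height_mm * fibre_density


# The 1mm x 1mm bundle with fibre density 1.0 from the text.
print(idealised_fbc(1.0, 1.0))  # 1.0

# A bundle that is wider than that in cross-section exceeds 1.0.
print(idealised_fbc(2.0, 1.5))  # 3.0
```

So an edge value of 3.0 would just correspond to a pathway with roughly three times that reference cross-section, under these simplifying assumptions.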
If they need to all be scaled to be within a range of 0-1 to be used with an external graph theory program, would there be any problem with dividing all of the numbers by a constant (such as 10) to reduce the absolute values but keep the relative proportions constant?
This all depends on the nature of the subsequent calculation.
Consider an absurd case, where you choose this number independently for each subject, and then want to compare the raw FBC values in individual edges across subjects: clearly here, the fact that the scaling is occurring, and how it is being done, is of great importance.
Now consider the converse extreme case, where some graph theory metric arbitrarily demands that all values lie within the range [0.0, 1.0], but as long as the data satisfy that requirement, you get precisely the same answer regardless of the exact magnitude of the multiplicative factor you choose: here the scaling is of no consequence, as long as you get the maximum value in the connectome below 1.0.
Almost certainly, what you’re doing lies somewhere between these two extremes. But exactly where it lies within that spectrum is important, given the stark difference in outcomes at either end of the spectrum.
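The two extremes can be sketched with toy data. Everything here is hypothetical: the edge values are made up, and `normalised_strength` is just a stand-in for "some metric that only depends on relative edge values", not a metric from any particular graph theory package.

```python
def scale(edges, factor):
    """Multiply every edge value by the same factor."""
    return [w * factor for w in edges]


def normalised_strength(edges):
    """Toy scale-invariant metric: each edge as a fraction of the total."""
    total = sum(edges)
    return [w / total for w in edges]


# Hypothetical FBC values; every edge in subject B is twice that in subject A.
subject_a = [0.5, 1.0, 2.0, 0.5]
subject_b = [1.0, 2.0, 4.0, 1.0]

# Extreme 1: scale each subject by its own maximum, then compare raw edges.
a_scaled = scale(subject_a, 1.0 / max(subject_a))
b_scaled = scale(subject_b, 1.0 / max(subject_b))
print(a_scaled == b_scaled)  # True: the two-fold group difference has vanished

# Extreme 2: a metric that only sees relative values ignores the factor.
print(normalised_strength(subject_a) == normalised_strength(scale(subject_a, 0.25)))  # True
```

In the first case the choice of scaling silently removes a real difference between subjects; in the second it changes nothing at all.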
So some questions to consider:

Why are the values capped between 0.0 and 1.0? Is this because the operation is intended for functional correlation data, and hence is it even appropriate for structural connectivity data (particularly given the wildly different distributions of edgewise connectivity values)?

Is it an arbitrary constraint put in place by the programmer? Would the internal calculations actually work if values exceeded 1.0?

Does the outcome of the analysis vary wildly depending on the multiplicative factor you apply to force the values within that range? If so, you need to very carefully consider every factor that contributes to the determination of that scaling.

If you were to determine this scaling factor independently for each subject, would this introduce unwanted inter-subject variance, due to the calculation of that scaling depending on the maximal value of FBC throughout the connectome for each subject?

Is there some other way that you could e.g. determine a single multiplicative factor to use across all subjects, perhaps with some truncation of values that are still above 1.0?

For an analysis that is intended to operate on data that are constrained as described, would the analysis be better-behaved if you were to use the logarithm of FBC in each edge, rather than FBC itself? Bear in mind that FBC will typically vary by ~ 5 orders of magnitude across different edges, which an analysis designed to work on data distributed between 0.0 and 1.0 is almost certainly not intended to handle.
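As a rough illustration of the last two questions, the sketch below applies one multiplicative factor to all subjects with truncation at 1.0, and separately takes the logarithm of each edge. The data, the factor of 0.25, and the `floor` used to keep zero-valued edges finite under the logarithm are all assumptions made for the example; how best to treat empty edges is a decision you would need to make for your own analysis.

```python
import math


def global_scale_and_clip(connectomes, factor):
    """Apply the same factor to every subject, truncating values above 1.0."""
    return [[min(w * factor, 1.0) for w in edges] for edges in connectomes]


def log_fbc(edges, floor=1e-6):
    """log10 of each edge value; values below `floor` (including 0) are clamped.

    The floor is a hypothetical choice, needed only because log10(0) is undefined.
    """
    return [math.log10(max(w, floor)) for w in edges]


# Hypothetical per-subject edge values spanning several orders of magnitude.
subjects = [
    [0.0, 0.001, 0.5, 2.0],
    [0.0, 0.002, 1.0, 8.0],
]

# One factor for everyone, so inter-subject differences survive the scaling
# (except where truncation kicks in, as for the 8.0 edge here).
scaled = global_scale_and_clip(subjects, 0.25)
print(scaled)

# Log-transformed values are far more evenly spread than the raw ones.
print([log_fbc(edges) for edges in subjects])
```

Note that truncation and the log floor both discard information at the extremes of the distribution, so either would need to be reported alongside the results.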
Plenty of food for thought there…
Rob