Connectomestats issue

Dear experts,

I am running a simple correlation between a DWI connectome matrix and a behavioral measurement using connectomestats (command: connectomestats input tfnbs design contrast output). However, I get the following warning during the analysis:
“design matrix conditioning is poor (condition number: 623.695); model fitting may be highly influenced by noise”

My design matrix is very simple, one column of 1s and one column of behavioral measurements.

So I was wondering what the possible reasons for this warning could be. I didn’t do distortion correction in preprocessing, but I did apply ACT in tckgen (using -seed_gmwmi) and in tcksift. Do you think this could be the source of the errors or noise?

I would appreciate any of your advice or comment!

Welcome @XL258W!

There’s been a little bit of discussion regarding this warning in this thread, and while I’ve contemplated making changes to the relevant code, I’m still not confident in exactly how I want to classify these things.

Firstly, the design matrix condition number is wholly independent of the data being provided as input to the command; it depends solely on the design matrix.

The purpose of this warning is to identify cases where there has been an outright error in the construction of the design matrix. However, it seems that it can also flag cases where the design matrix is perhaps not optimal, but is nevertheless functional. Hence I’m trying to decide whether I should revise the rules around when a warning is or is not issued.

In your case:

My design matrix is very simple, one column of 1s and one column of behavioral measurements.

My best guess is that you have entered your behavioural measurements as-is (in whatever units they are quantified), and have not demeaned the data. The consequence of this is that the model will “have a hard time” determining, based on the input data values, what to ascribe to the first explanatory variable (the column of ones) and what to ascribe to the second explanatory variable (the behavioural measurements). You could think of this “difficulty” as being quantified via the condition number.
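To illustrate (a quick numpy sketch, entirely separate from connectomestats, with made-up behavioural scores): compare the condition number of a design matrix whose regressor is entered as-is against one where it has been demeaned.

```python
import numpy as np

# Hypothetical behavioural scores; note they all sit far from zero in raw units
behaviour = np.array([101.0, 97.0, 104.0, 99.0, 103.0, 96.0])
ones = np.ones_like(behaviour)

raw = np.column_stack([ones, behaviour])                          # intercept + raw scores
demeaned = np.column_stack([ones, behaviour - behaviour.mean()])  # intercept + demeaned scores

# The raw regressor points almost exactly along the column of ones, so the
# two columns are hard to disentangle and the condition number is large
print(np.linalg.cond(raw))       # in the thousands for these values

# After demeaning, the two columns are orthogonal and the condition number is small
print(np.linalg.cond(demeaned))  # ~3 for these values
```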

Alternatively, the condition number can be large when the magnitudes of values within different columns of the design matrix vary substantially. E.g. if you had two behavioural measures, but one was quantified in the range [0.0, 1.0] and the other in the range [0, 1,000,000], the condition number would be similarly high, because the model “has difficulty” calculating both the very large rate of change of the input values as a function of the first measure and the very small rate of change of the input values as a function of the second measure.
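Again as a sketch (random, made-up regressor values): two demeaned columns on wildly different scales give a huge condition number, and rescaling each column to unit variance brings it back down.

```python
import numpy as np

rng = np.random.default_rng(0)
small = rng.uniform(0.0, 1.0, size=20)          # measure in the range [0.0, 1.0]
large = rng.uniform(0.0, 1_000_000.0, size=20)  # measure in the range [0, 1,000,000]

# Demeaned but not rescaled: column magnitudes differ by ~6 orders of magnitude
X = np.column_stack([small - small.mean(), large - large.mean()])
print(np.linalg.cond(X))   # enormous, on the order of 1e6

# Demeaned and scaled to unit variance: the columns are now comparable
Xs = X / X.std(axis=0)
print(np.linalg.cond(Xs))  # near 1
```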

If my hunch is right, demeaning the behavioural data (and possibly also scaling to unit variance if necessary) should yield a drastically reduced condition number. If that’s the case, please report back to us with your results, as this will hopefully be a good reference for others.


Hi Rob,

Thank you very much for the suggestion and clear explanation. I z-transformed the behavioral measurement and a covariate, and it ran smoothly without warning. The condition number is 1.82185 for this demeaned model. The final results are almost the same between the two models.
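For anyone landing here later: the z-transform referred to is just demeaning plus scaling to unit standard deviation. A minimal numpy sketch of preparing such a design matrix (all column values hypothetical; the output filename is just an example):

```python
import numpy as np

def zscore(x):
    """Demean and scale to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

behaviour = np.array([101.0, 97.0, 104.0, 99.0, 103.0, 96.0])  # made-up scores
covariate = np.array([55.0, 61.0, 48.0, 52.0, 59.0, 50.0])     # made-up covariate

# Intercept column of ones, plus the two z-scored regressors
design = np.column_stack([np.ones(6), zscore(behaviour), zscore(covariate)])
print(np.linalg.cond(design))  # small, close to 1 for these values

# np.savetxt('design.txt', design)  # then pass design.txt to connectomestats
```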

I have another question regarding the analysis. In my study, I did a whole-brain analysis (since I don’t have any ROI) using two approaches: TBSS and the tractography-based connectome analysis. Using TBSS, I found strong effects across the brain. However, no effect was found using the connectome analysis (very low corrected 1-p values) in the same sample.

One possible reason I can think of is that I didn’t do distortion correction for the DWI images (since I have no opposite-PE acquisition), which will introduce errors whether or not ACT is used. May I ask whether you have any suggestions for other options or analyses that I could try to improve the results? Do you think fixel-based analysis would work here?

To be honest, this is entirely unsurprising. While failure to perform distortion correction might contribute towards this, if you had used ACT and obtained the same result I would be similarly unsurprised.

It’s a matter of variance. You have some underlying biological effect in your cohort, and you want to detect it. Competing against that is the entire gamut of unwanted sources of variance, from image noise & artifacts, to misregistration, to conditioning of the diffusion model, to the way quantitative information is projected from the individual subject data into the space in which statistical inference is performed, and everything in between. Now if you have one statistical analysis pipeline that is not dependent on subject-specific tractography, and another that is, which is likely to possess more intrinsic variance? While it wasn’t the primary purpose of such, Figure 4 in this manuscript shows just how much variance there is in connectome data (at least when quantifying the density of each connection, i.e. Fibre Bundle Capacity (FBC); for alternative “connectivity” metrics the relative variance may differ). Reducing that intrinsic variance technologically is going to be extremely difficult; tractography is just so fundamentally ill-posed.

This is also why my own efforts have been invested more so in FBA than tractography these last few years…

Do you think fixel-based analysis would work here?

Definitely try it. If you observed a statistically significant difference in TBSS (presumably quantifying FA), then there’s a good chance that FBA will achieve statistical significance.


Thank you very much for the explanation and suggestion! Those are very helpful to me as a beginner.