[WARNING] A total of 31890 fixels do not possess any streamlines-based connectivity; these will not be enhanced by CFE, and hence cannot be tested for statistical significance.
This warning keeps coming up for users because it has become more important to define a good fixel analysis mask, but I have not yet made the corresponding change to the online documentation to describe how to generate and utilise such a mask. If a fixel is not intersected by any streamlines, there’s a pretty good chance that it’s not of interest to you. Such disconnected fixels have always been present in the absence of an explicit fixel mask, but unlike in the original implementation, they are now highly detrimental if included. Maybe I should just change the warning message to “recommend using -mask” so that it doesn’t get reported so much?
Design matrix conditioning is poor (condition number: 699.547); model fitting may be highly influenced by noise.
Firstly, I’m probably using the wrong linear algebra metric to assess the conditioning of the design matrix (quantification of which has nevertheless successfully caught multiple different user errors in the past) (GitHub issue). I think I’ve seen other software packages quote the estimability of individual factors rather than that of the matrix as a whole, which might help identify where poor conditioning is or is not consequential. Secondly, there’s an arbitrary thresholding issue in terms of whether that quantification is simply reported at the command-line or escalated to a WARNING-level message. But explaining whether or not it’s a problem, and what to look for, requires an understanding of the relevant linear algebra. So it may be another example where I’ve programmed a precise message that doesn’t serve the purpose for which it was intended…
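For the curious, the quantity in that warning can be reproduced with a couple of lines of numpy (a sketch, not the actual C++ implementation inside MRtrix3; `design.txt` here is just a placeholder for whatever whitespace-delimited design matrix file you are providing to the command):

```python
import numpy as np

# Condition number of the design matrix: the ratio of its largest to its
# smallest singular value. Large values indicate that the model fit is
# sensitive to small perturbations in the data.
design = np.loadtxt("design.txt")   # placeholder path for your design matrix
print("Condition number:", np.linalg.cond(design))
```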
My last concern is that I run into out-of-memory issues even after devoting 256 GB of memory to the process. Do you have any advice for memory concerns in large cohorts?
Do you happen to be getting a console message (not a warning) regarding the presence of non-finite values in the data? This engages a different GLM implementation, and it seems that post-3.0.0 my implementation of such is failing to re-use RAM across permutations. Regardless, you could try the code here, which changes the memory handling in either case.
I have since changed my design matrix / contrasts to be:
These designs explicitly enforce a zero intercept. My suspicion is that you do not actually expect to observe an FD of zero when your cognitive metric is zero. I would advise everyone to exclude the global intercept column only if they genuinely understand the ramifications of doing so.
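To make concrete what the missing intercept implies, here is a hedged numpy sketch with made-up FD and cognition values (illustrative numbers only, not anyone’s actual data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: FD values that sit around ~0.6 and vary weakly with a
# cognitive score in the range 0..1.
cognition = rng.uniform(0.0, 1.0, 30)
fd = 0.6 + 0.05 * cognition + rng.normal(0.0, 0.01, 30)

# Design with a global intercept: the model is FD = b0 + b1 * cognition.
X_with = np.column_stack([np.ones_like(cognition), cognition])
# Design without an intercept: the model is forced through the origin,
# i.e. it asserts that FD is exactly zero when cognition is zero.
X_without = cognition[:, np.newaxis]

beta_with, *_ = np.linalg.lstsq(X_with, fd, rcond=None)
beta_without, *_ = np.linalg.lstsq(X_without, fd, rcond=None)

print("With intercept:    intercept =", beta_with[0], " slope =", beta_with[1])
print("Without intercept: slope =", beta_without[0])  # slope absorbs the offset
```

The slope in the intercept-free model bears little relation to the true effect of the cognitive metric, because it also has to account for the baseline FD that the intercept would otherwise capture.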
I have read somewhere in here that when values range too differently across variables (e.g. from 0 to 1 for your cognitive metrics and between -100 and 100 for your gender nuisance variable), you will get rank deficiency, and the effect found is driven by noise.
Poor conditioning and rank deficiency are not quite equivalent. The condition number is kind of like the precision of the system: if the values were to change slightly, how stably or unstably would the fitted model respond? Having values in different columns that are of drastically different magnitudes can hurt this, because finite machine precision influences the intermediate calculations. If two factors become quite collinear / have a high covariance, it becomes harder to determine what to attribute to one factor versus the other. As they become perfectly collinear, the condition number goes to infinity, since it is impossible to solve for those two factors unambiguously; that is rank deficiency.
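To make that distinction concrete, here is a small numpy sketch (illustrative column names and values only) showing how column magnitudes and collinearity each affect the condition number:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
intercept = np.ones(n)
cognition = rng.uniform(0.0, 1.0, n)   # small-magnitude regressor
age = rng.uniform(20.0, 80.0, n)       # large-magnitude regressor

# Drastically different column magnitudes inflate the condition number...
X_raw = np.column_stack([intercept, cognition, age])
# ...while demeaning the covariates brings it back down.
X_demeaned = np.column_stack([intercept,
                              cognition - cognition.mean(),
                              age - age.mean()])
print("Raw:           ", np.linalg.cond(X_raw))
print("Demeaned:      ", np.linalg.cond(X_demeaned))

# Near-collinearity: a column that is almost a copy of another drives the
# condition number up; at perfect collinearity the matrix is rank deficient
# and the condition number is effectively infinite.
near_copy = cognition + rng.normal(0.0, 1e-6, n)
X_collinear = np.column_stack([intercept, cognition, near_copy])
print("Near-collinear:", np.linalg.cond(X_collinear))
```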
I know this will affect the interpretability of beta values, but is this a valid approach?
Yes; indeed it’s recommended, if anything (I’ve even contemplated performing this transformation automatically inside the MRtrix3 code). Historically I’ve done this manually, and then, if I’m interested in the beta coefficients, I simply apply the reverse transformation to get from “rate of change of the exploratory variable with respect to column 1” to “rate of change of the exploratory variable with respect to the variable of interest from which column 1 was generated”.
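A sketch of that manual workflow, assuming a simple demean-and-rescale of the variable of interest (variable names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
cognition = rng.uniform(0.0, 1.0, n)            # variable of interest
fd = 0.6 + 0.05 * cognition + rng.normal(0.0, 0.01, n)

# Transform the variable of interest before building the design matrix:
# demean and rescale to unit standard deviation.
scale = cognition.std()
column1 = (cognition - cognition.mean()) / scale

X = np.column_stack([np.ones(n), column1])
beta, *_ = np.linalg.lstsq(X, fd, rcond=None)

# beta[1] is the rate of change with respect to the transformed column;
# dividing by the scale factor recovers the rate of change with respect to
# the original variable of interest (the demean does not affect the slope).
print("Beta w.r.t. column 1:          ", beta[1])
print("Beta w.r.t. original variable: ", beta[1] / scale)
```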