Fixelcfestats concerns

Hello,

I am running a whole-brain FBA on a large subject cohort (n ≈ 700), looking for correlations between logFC and a cognitive score, controlling for sex and handedness (age is accounted for in the cognitive score). My template was made from 40 subjects chosen to represent a wide range of ages among the participants.

I get two warnings. The first is: [WARNING] A total of 31890 fixels do not possess any streamlines-based connectivity; these will not be enhanced by CFE, and hence cannot be tested for statistical significance. I’ve seen this discussed elsewhere, but the number seems especially high given that there are 517176 fixels in the template space. Is this something I should be particularly concerned about, and if so, what do you suggest for QC?

The second is: Design matrix conditioning is poor (condition number: 699.547); model fitting may be highly influenced by noise.
My design matrix is a text file in which each line looks like:
“0 METRIC COVAR1 COVAR2” (without the quotes, and with that subject’s values filled in).
The contrast is a one-line text file containing “0 1 0 0” (without the quotes), to run a regression against the metric at fixed levels of the two covariates.
Similar to the first point, is this warning something I should be concerned about?

I did my best to do manual QC given the large cohort size. I ran TractSeg on the FOD template and got good-looking results. Before beginning FBA I removed subjects with poor brain masks (missing brain, or brain mask not defined), and after warping to template space I removed subjects with poor warps. The 2-million-streamline SIFT-ed tractogram looks complete too (see pic below).

My last concern is that I run into out of memory issues after devoting 256 GB of memory to the process. Do you have any advice for memory concerns in large cohorts?

Any guidance is appreciated, thank you.
Steven

I’ve run fixelcfestats with a subset of subjects and 1000 permutations and it didn’t crash! I thresholded the resulting index.mif file by the fwe_1mpvalue image at 0.95 to visualize only the significant results. Everything looks great! However, I want to see whether the relationship between log(FC) and my behavioral metric is positive or negative. Given that my contrasts/design matrix, as explained above, were organized as [0 1 0 0] / [1 METRIC COVAR1 COVAR2], I would expect that beta1.mif (indexed from 0) would correspond to the beta weights for the column I am interested in. However, the effect sizes are really small, with beta ranging from about -0.001 to 0.003. Is my intuition about the beta1.mif file correct?

EDIT: Given the presumably small values of log(FC), it probably makes sense for the beta values to be small.

Thanks,
Steven

Hi Steven,

I wouldn’t be overly concerned about this. It’ll depend on what thresholds were used in the fixel segmentation step (fod2fixel), and in the tractography step. For example, if fixels are included in the fixel mask that have smaller amplitudes than allowed in the tractography, it’s not surprising to find many fixels without any corresponding streamlines, which would give you this warning.

If you’re really concerned about this, you could try using the suggestion from @rsmith, and generate an alternative mask using tck2fixel & mrthreshold, and then use mrcalc to check the difference between the mask you used and the tractography-derived one to see which fixels were excluded, and satisfy yourself that they’re not in locations you’d consider important for your analysis.
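If it helps, here’s a rough sketch of that check scripted via Python’s subprocess module. All file and fixel-directory names below are placeholders – substitute your own template fixel directory, analysis mask and tractogram:

```python
import subprocess

def run(*args):
    # Run an MRtrix3 command, raising an exception if it fails
    subprocess.run(args, check=True)

# Streamline count per fixel for the template tractogram
run("tck2fixel", "template_tracks_sift.tck",
    "template_fixel_dir", "template_fixel_dir", "streamline_count.mif")

# Binarise: fixels traversed by at least one streamline
run("mrthreshold", "template_fixel_dir/streamline_count.mif",
    "-abs", "0.5", "template_fixel_dir/tracto_mask.mif")

# Fixels present in your analysis mask but not traversed by any streamline
run("mrcalc", "template_fixel_dir/analysis_mask.mif",
    "template_fixel_dir/tracto_mask.mif", "-not", "-multiply",
    "template_fixel_dir/excluded_fixels.mif")
```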

Is your first column genuinely filled with zeros? If so, that would definitely lead to extremely poor conditioning!

I assume (hope?) this column is actually coded as zeros & ones to represent group membership? If so, then the poor condition number is unexpected, and suggests that at least two of your columns are very strongly correlated. Definitely worth investigating if that’s the case…

Finally, if you have a column of zeros & ones, you’ll probably want to add a column of ones in there. I’d also suggest you might want to change that group membership column to -1 & 1 respectively, as that tends to help with interpretation of the coefficients. See this response from @rsmith for a more detailed explanation of these issues.

Actually, I just spotted that your later post suggests this is actually a column of ones (is that correct)? In which case, it’s still very much worth trying to figure out what is causing the rank deficiency. Assuming your columns are [ 1 cog.score sex handedness ], then I really wouldn’t expect such strong correlations between the columns…

Wow. Not sure what to do here, that’s a lot of subjects! @rsmith has already done a huge amount of work to reduce memory requirements, so assuming you’re using the latest version of the code, I’m not sure what more we can do. We’ll need to think long and hard about this… But it’s at least reassuring that it runs for a subset of subjects.

Given the way your design matrix is constructed, your beta coefficients should correspond to the impact of a unit increase in cognitive score on logFC. So it implicitly depends on the units of your cognitive score, and what kind of range of values you expect to see. For example, if you expect to see differences in cognitive score of the order of 30 points in your cohort, then that corresponds to a 0.003 × 30 = 0.09 difference in logFC (which translates to e^0.09 ≈ 1.09, i.e. roughly a 10% change in FC). Hard to know what’s right without knowing more about your cognitive score – but hopefully this will be enough for you to figure out whether your values are reasonable.
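Just to spell that arithmetic out (using the example numbers above only, not anything from your actual fit):

```python
import numpy as np

beta = 0.003                       # example coefficient: change in logFC per score point
delta_score = 30                   # plausible spread in cognitive score across the cohort
delta_logfc = beta * delta_score   # = 0.09 difference in logFC
print(np.exp(delta_logfc))         # ≈ 1.09, i.e. roughly a 10% change in FC
```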

Worth also pointing out that these effect sizes are probably not to be trusted until you fix that rank deficiency issue…

Hi @jdtournier,

Thanks for the reply!

I looked at some results and found that the coverage was what I suspected given the brain mask, so I think it is safe to ignore it.

I have since changed my design matrix / contrasts to be:
[ cog.metric sex handedness ] / [ 1 0 0 ] for FD analyses, and
[ cog.metric sex handedness logICV ] / [ 1 0 0 0 ] for FC and FDC analyses.
That is, I removed the mean intercept term (sorry for the confusion of 1s vs. 0s before :grimacing:). There is no group column since I am just running a regression against the cog metric at fixed levels of the confounds. Might be worth noting that handedness is not binary, but a score from -100 to 100 based on a handedness questionnaire.

The design matrix condition number is still poor (~250), but better than previously reported.
I’ve made sure the variables are encoded correctly, so I am unsure how to further debug. ICV does share some variance with the cognitive metric and sex variables.

Also thanks for the explanation of the beta coefficients! The small values make sense given the difference in scales of the cognitive score and log(FC).

Regarding memory, there might be some bootstrapping approach I could finagle together to study the whole population, but for now working with subsets of subjects is working well for me.

Hi! I solved this problem by demeaning the covariates. I have read somewhere on here that when values range too differently across variables (e.g. from 0 to 1 for your cognitive metric and between -100 and 100 for your handedness nuisance regressor) you will get rank deficiency, and the effect found is driven by noise.

Hope it is useful,

Best


Good idea, will give this a shot and report back!

For each design matrix column, I demeaned and rescaled to unit variance, and now the condition number is only 2! I know this will affect the interpretability of the beta values, but is this a valid approach?
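Roughly along these lines (the column values below are made up purely for illustration, not my actual data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 700
design = np.column_stack([
    rng.normal(100, 15, n),      # cognitive metric
    rng.integers(0, 2, n),       # sex (0 / 1)
    rng.uniform(-100, 100, n),   # handedness score
    rng.normal(14.2, 0.1, n),    # log(ICV)
])
print("condition number before:", np.linalg.cond(design))

# demean each column and rescale to unit variance
design = (design - design.mean(axis=0)) / design.std(axis=0)
print("condition number after:", np.linalg.cond(design))
```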

Thanks,
Steven

No column of ones? I think you need it – otherwise your model can’t cope with no effect. I.e. if there is no effect of cog.metric, sex, handedness or logICV (their coefficients are zero), then the model predicts a value of zero for FD/FC/FDC. Having that column of ones allows the model to use that coefficient as the overall mean – which I really think you need here.
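To illustrate with a toy simulation (nothing to do with your actual data): without the column of ones, the model simply cannot reproduce the overall mean of the data, even when there is no effect of any regressor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(0, 1, n)               # a demeaned covariate with no real effect
y = 0.5 + rng.normal(0, 0.01, n)      # data with a non-zero mean and no x effect

# Fit without and with an intercept column
b_no, *_  = np.linalg.lstsq(x[:, None], y, rcond=None)
X = np.column_stack([np.ones(n), x])
b_yes, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.mean((y - x[:, None] @ b_no) ** 2))   # large: no way to represent the mean of y
print(np.mean((y - X @ b_yes) ** 2))           # tiny: the intercept absorbs the mean (~0.5)
```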

Great to hear! Yes, this is a valid approach. As you suggest, it just means you need to be careful when interpreting the beta coefficients.


[WARNING] A total of 31890 fixels do not possess any streamlines-based connectivity; these will not be enhanced by CFE, and hence cannot be tested for statistical significance.

This warning keeps coming up for users because it has become more important to define a good fixel analysis mask, but I have not yet made the corresponding change to the online documentation on how to generate and utilise such a mask. If a fixel is not intersected by any streamlines, there’s a pretty good chance that it’s not of interest to you. Such disconnected fixels have always been present in the absence of an explicit fixel mask, but unlike in the original implementation, those fixels are now highly detrimental if included. Maybe I should just change the warning message to “recommend using -mask” and it won’t get reported so much?

Design matrix conditioning is poor (condition number: 699.547); model fitting may be highly influenced by noise.

Firstly, I’m probably using the wrong linear algebra metric to assess the conditioning of the design matrix (quantification of which has successfully caught multiple different user errors in the past) (GitHub issue). I think I’ve seen other software packages quoting the estimability of individual factors rather than of the matrix as a whole, which might help identify where poor conditioning is or is not consequential. Secondly, there’s an arbitrary thresholding issue in terms of whether that quantification is simply reported at the command line or escalated to a WARNING-level message. But explaining whether or not it’s a problem, and what to look for, requires an understanding of the relevant linear algebra. So it may be another example where I’ve programmed a precise message that doesn’t serve the purpose for which it was intended…

My last concern is that I run into out of memory issues after devoting 256 GB of memory to the process. Do you have any advice for memory concerns in large cohorts?

Do you happen to be getting a console message (not warning) regarding the presence of non-finite values in the data? This engages a different GLM implementation, and it seems post-3.0.0 that my implementation of such is failing to re-use RAM across permutations. Regardless, you could try the code here, which changes the memory handling in either case.

I have since changed my design matrix / contrasts to be:

These designs explicitly enforce a zero intercept. My suspicion is that you do not actually expect to observe an FD of zero when your cognitive metric is zero. I would advise only excluding the global intercept column if you genuinely understand the ramifications of doing so.

I have read somewhere on here that when values range too differently across variables (e.g. from 0 to 1 for your cognitive metric and between -100 and 100 for your handedness nuisance regressor) you will get rank deficiency, and the effect found is driven by noise.

Poor conditioning and rank deficiency are not quite equivalent. The condition number is kind of like the precision of the system: if values were to change, how stably or unstably would the system respond to that. Having values in different columns that are of drastically different magnitudes can hurt this because of finite machine precision influencing the intermediate calculations. If two factors become quite collinear / have a high covariance, it becomes harder to determine what to attribute to one factor vs. the other. As they become perfectly collinear, the condition number goes to infinity as it’s impossible to solve those two factors unambiguously; this is rank deficiency.
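A toy illustration of that distinction, with arbitrary simulated columns:

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(0, 1, 100)
b = rng.normal(0, 1, 100)

print(np.linalg.cond(np.column_stack([a, b])))         # ~1: well conditioned
print(np.linalg.cond(np.column_stack([a, 1e4 * b])))   # ~1e4: poorly conditioned (scale mismatch)
print(np.linalg.cond(np.column_stack([a, 2.0 * a])))   # enormous: collinear columns, i.e. rank deficient
```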

I know this will effect interpretability of beta values, but is this a valid approach?

Yes; indeed recommended if anything (I’ve even contemplated automatically doing this transformation internally in the MRtrix3 code). Historically, I’ve done this manually, and then if I’m interested in beta coefficients, I simply apply the reverse transformation to get from “rate of change of exploratory variable with respect to column 1” to “rate of change of exploratory variable with respect to variable of interest from which column 1 was generated”.
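As a concrete example (numbers purely hypothetical): if the cognitive-score column was demeaned and divided by its standard deviation before fitting, the beta for that column converts back to the original units like so:

```python
beta_standardised = 0.05    # hypothetical: change in logFC per standard deviation of cognitive score
cog_score_sd = 15.0         # hypothetical: standard deviation of the original cognitive score
beta_per_point = beta_standardised / cog_score_sd
print(beta_per_point)       # change in logFC per point of cognitive score
```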


Thanks @rsmith and @jdtournier (and all of your other answers to my posts)!

I do not get a console message as you’ve described. I get a SLURM out-of-memory event, always during the permutations. It is no longer an issue, since 1) I upped my memory to 400 GB (thank goodness for HPCs) and 2) it turns out I don’t need as many subjects as originally anticipated.

Understood, thanks.
DM: [ 1 cog_metric covar1 covar2 ], Con: [ 0 1 0 0 ]
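i.e. written out as the plain-text files fixelcfestats expects: one design row per subject, in the same order as the input file list, plus a one-row contrast file (the values below are made up):

```python
import numpy as np

# Hypothetical per-subject values, one entry per subject
cog_metric = np.array([12.3,  9.7, 15.1])
covar1     = np.array([ 0.0,  1.0,  1.0])
covar2     = np.array([40.0, -80.0, 95.0])

design = np.column_stack([np.ones(len(cog_metric)), cog_metric, covar1, covar2])
np.savetxt("design_matrix.txt", design, fmt="%.6f")
np.savetxt("contrast.txt", [[0, 1, 0, 0]], fmt="%d")
```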

Perfect!