Fixelcfestats "running permutations" step takes several days to complete in v3.0.1?

Hi, I wanted to test out the newest version of fixelcfestats, which seemed like it would be sleeker given that fixel-fixel connectivity doesn’t need to be computed each time the command is executed. When I run my model in the older version, it takes about 1 hr per contrast (to calculate fixel-fixel connectivity and run the permutations for that contrast). I reran the same analysis in the new version of MRtrix3, allocating the same 40 GB of memory that I always do for fixelcfestats (with all the contrasts represented in a single matrix), and the process has been crawling through the first set of permutation calculations for over four days. The process isn’t “hung”: the progress bar is still creeping forward, just very slowly. Any ideas why this might be? Full info below…

##command

fixelcfestats -force -nshuffles 1000 $DIR/fd_smooth/ $DIR/scripts_fixel/fixel_1208_fd.txt $DIR/scripts_fixel/design_1208_ones_sex_age_raceAA_raceOT_qc_wrat.txt $DIR/scripts_fixel/contrast_matrix_0000000b.txt $DIR/population_template/matrix $DIR/fixelcfe_output_smooth/fixelcfestats_1208_ones_sex_age_raceAA_raceOT_qc_wrat_fdB_2 -nthreads 2

##contrast matrix

cat contrast_matrix_0000000b.txt
0 0 0 0 0 0 1
0 0 0 0 0 0 -1
0 0 0 0 0 1 0
0 0 0 0 0 -1 0
0 0 0 0 1 0 0
0 0 0 0 -1 0 0
0 0 0 1 0 0 0
0 0 0 -1 0 0 0
0 0 1 0 0 0 0
0 0 -1 0 0 0 0
0 1 0 0 0 0 0
0 -1 0 0 0 0 0
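
For anyone wanting to reproduce a contrast file of this shape, here is a minimal sketch that regenerates the 12 rows above. It only mirrors the pattern visible in the listing (a +1 row and a -1 row per factor, last column first, first column left untested); the variable names are made up for illustration, and nothing here comes from MRtrix3 itself.

```python
# Rebuild the contrast matrix shown above: a positive and a negative
# one-hot row for each of columns 7 down to 2 (column 1 is left untested,
# matching the listing).
n_factors = 7  # "Number of factors: 7" in the fixelcfestats output

rows = []
for col in range(n_factors - 1, 0, -1):  # zero-based column indices 6, 5, ..., 1
    for sign in (1, -1):
        row = [0] * n_factors
        row[col] = sign
        rows.append(row)

# Space-separated text, one hypothesis per line, as in the file above.
contrast_text = "\n".join(" ".join(str(v) for v in row) for row in rows)
print(contrast_text)
```

Writing `contrast_text` to a file gives one row per hypothesis, which matches the "Number of hypotheses: 12" line in the output.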

##output after running for over four days:

cat fixelcfestats_1208_ones_sex_age_raceAA_raceOT_qc_wrat_fdB.txt
fixelcfestats: [WARNING] existing output files will be overwritten
fixelcfestats: Number of fixels in template: 173239
fixelcfestats: Importing data from files listed in “fixel_1208_fd.txt” as found relative to directory “/gpfs/ysm/scratch60/pittenger/rgg27/PNC/fixel//fd_smooth/”… …done
fixelcfestats: Number of inputs: 1208
fixelcfestats: Number of factors: 7
fixelcfestats: Design matrix condition number: 13.7053
fixelcfestats: Number of hypotheses: 12
fixelcfestats: [WARNING] A total of 5180 fixels do not possess any streamlines-based connectivity; these will not be enhanced by CFE, and hence cannot be tested for statistical significance
fixelcfestats: Loading fixel data (no smoothing)… [==================================================]
fixelcfestats: Calculating basic properties of default permutation… [========================================]
fixelcfestats: Outputting beta coefficients, effect size and standard deviation… [=================================================]
fixelcfestats: Running GLM and enhancement algorithm for default permutation… [======================================]
fixelcfestats: Running permutations… [========================================

Hi Rachel,

I’ve not observed this kind of behaviour, and there’s no particular reason why the command should be running that much slower than the previous code. I’d have expected execution to take less than 12 hours, given that the total amount of processing should be less than in your prior 12 separate executions (about 1 hour each), every one of which had to build the fixel-fixel connectivity matrix. The internals of the GLM have changed quite a lot, but I didn’t observe any major slowdowns in my own testing.

I can only generate a couple of hypotheses that are consistent with the information provided (there are new capabilities that will slow down execution, but they’re not relevant given their absence from your terminal output):

  1. Your number of inputs is much larger than anything I’ve tested on. The change in empirical null distribution generation from Manly to Freedman-Lane will involve an additional 1208x1208 matrix multiplication compared to the old code, which won’t be super cheap.

  2. The old code handled generation of t-values in batches, whereas the new code does the whole matrix multiplication for all fixels in one go. It’s possible that this was fine with the data I tested on, but that with your very large number of inputs the matrix data are becoming too large, leading to cache misses that slow down execution. I can probably re-introduce some manual buffering to the execution which might help (and if I do I now know who to ask to test it :stuck_out_tongue:). But I’d have hoped that Eigen would have made the appropriate decisions here…

  3. The fixel-fixel connectivity matrix should either be memory-mapped, or explicitly loaded into RAM if this is not possible. There’s some chance that in your situation these data are not explicitly loaded into RAM at the commencement of execution, but access to the data is also slow: this would lead to the CFE portion taking a long time to execute due to delays in acquiring the fixel connectivity information for each fixel to be enhanced.
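
To put rough numbers on hypotheses 1 and 2, here is a back-of-the-envelope sketch using the input and fixel counts from the terminal output. It assumes double-precision storage and that the extra 1208x1208 matrix is multiplied against the full inputs-by-fixels data matrix in one dense operation; both are assumptions for illustration, not a description of what the fixelcfestats code actually does.

```python
# Rough size and cost estimates from the numbers in the terminal output.
n_inputs = 1208        # "Number of inputs: 1208"
n_fixels = 173239      # "Number of fixels in template: 173239"
bytes_per_value = 8    # ASSUMPTION: double-precision storage

# Full data matrix (inputs x fixels) and one 1208x1208 square matrix.
data_bytes = n_inputs * n_fixels * bytes_per_value
perm_bytes = n_inputs * n_inputs * bytes_per_value

# ASSUMPTION: one dense (1208x1208) x (1208x173239) product, counted as
# 2*m*k*n floating-point operations (one multiply and one add per term).
flops_per_multiply = 2 * n_inputs * n_inputs * n_fixels

print(f"data matrix:   {data_bytes / 1e9:.2f} GB")
print(f"square matrix: {perm_bytes / 1e6:.1f} MB")
print(f"one dense product: {flops_per_multiply:.2e} floating-point operations")
```

Under those assumptions the data matrix alone is far larger than any CPU cache, which is at least consistent with the cache-miss idea in point 2; it says nothing about which hypothesis is actually responsible.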

Given the length of the permutation progress bar I’m hoping that execution has completed in the time since you made the post. But thanks for flagging, and I’ll have to think for a little while about where effort needs to be invested here.

Cheers
Rob

Hi Rob, thank you for the thoughtful answer. I do wonder if maybe #3 is occurring on my HPC system. There are ~180k fixels in my input data, so nothing too crazy, right? But yes–happy to stress test things if you make any changes :grin:

I had to cancel the original processes because they were going to time out on the cluster, but I was able to speed things up considerably by reconfiguring memory/CPU allocation (I bumped CPUs up to “-nthreads 10” and allocated less memory per CPU). Now it’s taking a couple of days instead of longer than 1 week. Probably there is still something weird going on; I will continue to tweak, and I should probably also chat with our IT folks to see if they have ideas.