Recommended cluster allocation (e.g., # of cores, RAM) for Fixel-Based Analysis

#1

Hello,
I’m planning to run a fixel-based analysis on ~40 subjects, with b = 0, 1000, and 2000 shells and 1.5 mm isotropic voxels. Because fixelcfestats is very RAM-hungry, I am planning to run at least that portion on our university computing cluster, and I was wondering whether the community/developers had any recommendations for what kind of resources to request. A few specific questions:

  • How much RAM am I likely to need for a whole brain analysis of 40 subjects, looking at a single or two potential behavioral correlates?
  • Is there any advantage to requesting multiple CPU cores? How many would you recommend? Any way of estimating effects on overall compute time?
  • Does it matter if sub-parts of the RAM are assigned to separate nodes that may be performing the computation? Or is it ideal for the RAM to all be assigned to one compute node (possibly with multiple cores)?
  • Are any of the other steps computationally expensive/slow? Would it be advantageous to assign those steps to the computing cluster as well? e.g., would it be worth it to run dwipreproc on the cluster, given the potential headache of also installing FSL there?

Thank you for any help you can provide!

John


#2

Hi John,

How much RAM am I likely to need for a whole brain analysis of 40 subjects, looking at a single or two potential behavioral correlates?

The number of subjects / size of the design matrix has little influence on the RAM usage of the command. The major storage requirement is the fixel-fixel connectivity matrix, which scales primarily with the number of fixels in your analysis. In my limited experience, a 1.25 mm template with typical thresholds gives ~400,000 fixels, and with 2 million tracks in template space this requires ~80 GB of RAM. This will scale somewhere between quadratically and cubically as the spatial resolution of the template changes, and sub-linearly with the number of tracks (since additional tracks tend to connect the same fixels as earlier tracks).
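As a rough sanity check only, you can extrapolate from that reference point to your own template resolution; the quadratic and cubic exponents bound the estimate (the 80 GB at 1.25 mm figure is the anecdotal one above, so treat the result as a ballpark, not a guarantee):

```shell
#!/bin/sh
# Back-of-envelope RAM bound for fixelcfestats at a different template resolution.
ref_res=1.25    # mm: reference template resolution
ref_ram=80      # GB: anecdotal RAM usage at the reference resolution
target_res=1.5  # mm: your planned template resolution

awk -v r0="$ref_res" -v m0="$ref_ram" -v r1="$target_res" 'BEGIN {
  # Larger voxels -> fewer fixels -> less RAM; scaling lies between
  # quadratic and cubic in the ratio of resolutions.
  lo = m0 * (r0 / r1) ^ 3;   # cubic scaling (lower bound here, since r1 > r0)
  hi = m0 * (r0 / r1) ^ 2;   # quadratic scaling (upper bound here)
  printf "Estimated RAM at %.2f mm: %.0f-%.0f GB\n", r1, lo, hi
}'
# → Estimated RAM at 1.50 mm: 46-56 GB
```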

Is there any advantage to requesting multiple CPU cores?

The majority of the runtime in fixelcfestats is spent on permutation testing, and this will happily flog as many threads as you offer up; on our dual-socket 16-core Xeon systems the CPU usage sits at around 3196%. Since you’ll also be requesting a large chunk of RAM, the easiest solution is simply to request all cores and all memory available on a single node. The permutation stage in particular, which again accounts for the majority of the runtime, scales almost perfectly inverse-linearly with the number of threads.

Does it matter if sub-parts of the RAM are assigned to separate nodes that may be performing the computation? Or is it ideal for the RAM to all be assigned to one compute node (possibly with multiple cores)

You can’t run fixelcfestats across nodes, only across cores within a single node. When requesting resources through e.g. SLURM, there may be separate options for requesting memory per core versus total memory for the job; but since you should be requesting an entire node anyway, you should also request a fixed amount of memory according to how much the job needs. RAM is not “split” between the cores of a node (except in the job scheduler / resource allocator, where per-core accounting is only used to pack multiple single-core jobs onto one node without running out of memory; that shouldn’t apply in your case).
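For instance, a SLURM submission along these lines would claim a whole node; the partition name, wall time, and file names are placeholders for whatever your cluster and pipeline use, and the positional arguments to fixelcfestats vary between MRtrix3 versions, so check its -help page rather than copying them verbatim:

```shell
#!/bin/bash
#SBATCH --job-name=fba_stats
#SBATCH --nodes=1             # fixelcfestats runs within a single node only
#SBATCH --exclusive           # reserve every core on the node
#SBATCH --mem=0               # with --exclusive, claim all of the node's memory
#SBATCH --time=48:00:00       # placeholder wall time; adjust for your data
#SBATCH --partition=highmem   # placeholder: your cluster's large-memory partition

# All file/directory names below are placeholders; argument order follows the
# fixel-directory FBA pipeline, but verify against fixelcfestats -help for
# your installed version.
fixelcfestats fd_smooth/ files.txt design.txt contrast.txt \
    tracks_2_million.tck stats_fd/ \
    -nthreads "$SLURM_CPUS_ON_NODE"   # uses all cores by default; explicit here
```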

Are any of the other steps computationally expensive/slow? Would it be advantageous to assign those steps to the computing cluster as well? e.g., would it be worth it to run dwipreproc on the cluster, given the potential headache of also installing FSL there?

It’s all relative. Anything that’s moderately expensive will be faster if you can run it in parallel for different subjects across nodes rather than serially on your own system. dwipreproc is a good example, though you should also look into using the CUDA version of eddy. If you can script, it’s sometimes easier to just run everything on the cluster and only pull whatever data you need locally for QA. Ultimately there’s no unambiguously correct answer here; it depends on your own priorities.
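As a sketch of the per-subject parallelism, a SLURM job array works well; the subject list file, file-naming scheme, resource figures, and phase-encoding options below are all placeholders you’d swap for whatever matches your data and acquisition:

```shell
#!/bin/bash
#SBATCH --array=1-40          # one array task per subject (40 subjects here)
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G              # placeholder; preprocessing needs far less RAM
#SBATCH --time=12:00:00       # placeholder wall time per subject

# Look up this task's subject ID from a one-ID-per-line list
# ("subjects.txt" is a placeholder name).
subject=$(sed -n "${SLURM_ARRAY_TASK_ID}p" subjects.txt)

# The phase-encoding options here are placeholders; use whichever
# -rpe_* / -pe_dir scheme matches your acquisition.
dwipreproc "${subject}_dwi_denoised.mif" "${subject}_dwi_preproc.mif" \
    -rpe_none -pe_dir AP -nthreads "$SLURM_CPUS_PER_TASK"
```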

Cheers
Rob


#3

Hi Rob,
Thank you so much for your detailed reply. This is very helpful.

I had one last question: about how long can I expect the command to take? For example, how long has it taken on the system you described?

John


#4

My most recent testing took 8.5 hours on that system, for the current master branch at least :smiling_imp:


#5

That’s pretty quick!

Thanks, Rob. I appreciate your help!