Recommended cluster allocation (e.g., # of cores, RAM) for Fixel-Based Analysis

johncplass · July 11, 2017, 4:06pm

Hello,
I’m planning to run a fixel-based analysis on ~40 subjects with b=0, 1000, and 2000 shells and 1.5 mm isotropic voxels. Because fixelcfestats is very RAM-hungry, I am planning to run at least that portion on our university computing cluster. I was wondering whether the community/developers had any recommendations for what kind of resources to request. A few specific questions:

How much RAM am I likely to need for a whole-brain analysis of 40 subjects, looking at one or two potential behavioral correlates?
Is there any advantage to requesting multiple CPU cores? How many would you recommend? Any way of estimating effects on overall compute time?
Does it matter if sub-parts of the RAM are assigned to separate nodes that may be performing the computation? Or is it ideal for the RAM to all be assigned to one compute node (possibly with multiple cores)?
Are any of the other steps computationally expensive/slow? Would it be advantageous to assign those steps to the computing cluster as well? e.g., would it be worth it to run dwipreproc on the cluster, given the potential headache of also installing FSL there?

Thank you for any help you can provide!

John

rsmith · July 18, 2017, 5:32am

Hi John,

How much RAM am I likely to need for a whole-brain analysis of 40 subjects, looking at one or two potential behavioral correlates?

The number of subjects (i.e. the size of the design matrix) has little influence on the RAM usage of the command. The major storage requirement comes from the fixel-fixel connectivity matrix, which primarily scales with the number of fixels in your analysis. In my limited experience, a 1.25mm template with typical thresholds gives ~400,000 fixels, and with 2 million tracks in template space this requires ~80GB of RAM. RAM usage will scale somewhere between quadratically and cubically with the spatial resolution of the template, and sub-linearly with the number of tracks (since additional tracks tend to connect the same fixels as prior tracks).
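
To put rough numbers on that scaling, here is a minimal Python sketch. It takes the 1.25mm / ~80GB figure above as the reference point and assumes that "between quadratically and cubically" means RAM goes as (reference voxel size / new voxel size) raised to a power between 2 and 3, with the track count and thresholds unchanged. The function name and parameters are made up for illustration; treat the result as a ballpark range, not a measurement.

```python
# Reference point quoted in this thread: 1.25 mm template, ~80 GB of RAM.
REF_VOXEL_MM = 1.25
REF_RAM_GB = 80.0


def estimate_ram_range_gb(voxel_mm, ref_voxel_mm=REF_VOXEL_MM, ref_ram_gb=REF_RAM_GB):
    """Return (low, high) RAM estimates in GB for a template with voxel size voxel_mm.

    Assumption: RAM scales as (ref_voxel_mm / voxel_mm) ** p with p between 2 and 3,
    and everything else (track count, thresholds) stays the same.
    """
    if voxel_mm <= 0:
        raise ValueError("voxel_mm must be positive")
    ratio = ref_voxel_mm / voxel_mm
    quadratic = ref_ram_gb * ratio ** 2
    cubic = ref_ram_gb * ratio ** 3
    # Which exponent gives the larger value depends on whether ratio is above or below 1.
    return min(quadratic, cubic), max(quadratic, cubic)


if __name__ == "__main__":
    low, high = estimate_ram_range_gb(1.5)
    print(f"1.5 mm template: roughly {low:.0f} to {high:.0f} GB")
```

Under those assumptions, a template built at 1.5mm instead of 1.25mm would land somewhere around 46 to 56GB. If your template is upsampled to 1.25mm, the ~80GB figure is the more relevant one.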

Is there any advantage to requesting multiple CPU cores?

The majority of the runtime in fixelcfestats is spent on permutation testing, and this will happily flog as many threads as you offer up; on our dual-Xeon 16-core systems the usage sits at around 3196% (i.e. close to 32 fully loaded threads). Since you’ll also be requesting a large chunk of RAM, the easiest solution is just to request all cores and all memory available on a single node. Specifically, the runtime of the permutation stage, which again is the majority of the total, will scale almost perfectly inversely with the number of threads.

Does it matter if sub-parts of the RAM are assigned to separate nodes that may be performing the computation? Or is it ideal for the RAM to all be assigned to one compute node (possibly with multiple cores)?

You can’t run fixelcfestats across nodes, only across cores within a node. When requesting the resources using e.g. SLURM, there may be separate options for requesting memory per core versus total memory for the job; but given that you should be requesting an entire node, you should request a fixed total amount of memory according to how much the job needs. RAM is not “split” between cores on a node (except in the job queue / resource allocator, where it’s only used to manage running multiple single-core jobs on a single node without encountering memory issues; but this shouldn’t apply in your case).
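
As an illustration of what such a request could look like, here is a small Python sketch that writes out a single-node SLURM batch script. The helper name and the example values (32 CPUs, 96GB, 12 hours) are assumptions for illustration only; check your cluster's documentation for its actual limits and any required partition or account options. The fixelcfestats arguments are left as a placeholder string supplied by you; the sketch only appends the standard MRtrix3 `-nthreads` option.

```python
def make_sbatch_script(command, cpus, mem_gb, hours, job_name="fixelcfestats"):
    """Build the text of a single-node, multi-threaded SLURM batch script.

    `command` is the full fixelcfestats invocation supplied by the caller;
    its arguments are not checked here. Only -nthreads is appended.
    """
    if cpus < 1 or mem_gb < 1 or hours < 1:
        raise ValueError("cpus, mem_gb and hours must all be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",                # the job cannot span nodes
        "#SBATCH --ntasks=1",               # one process that runs many threads
        f"#SBATCH --cpus-per-task={cpus}",  # cores made available to that process
        f"#SBATCH --mem={mem_gb}G",         # memory per node (one node here), not per core
        f"#SBATCH --time={hours}:00:00",    # wall-clock limit as hours:minutes:seconds
        "",
        f"{command} -nthreads {cpus}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder command: substitute your own fixelcfestats arguments.
    print(make_sbatch_script("fixelcfestats <your arguments>", cpus=32, mem_gb=96, hours=12))
```

Using `--mem` rather than `--mem-per-cpu` keeps the request as one fixed total, which matches how the job actually uses memory.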

Are any of the other steps computationally expensive/slow? Would it be advantageous to assign those steps to the computing cluster as well? e.g., would it be worth it to run dwipreproc on the cluster, given the potential headache of also installing FSL there?

It’s all relative. Anything that’s moderately expensive will be faster if you can run it in parallel for different subjects across nodes rather than serially on your own system. dwipreproc is a good example, though you should also look into using the CUDA version of eddy. If you can script, sometimes it’s easier to just run everything on a cluster, and only pull what data you need locally for QA. Ultimately there’s no unambiguously correct answer here; it depends on your own priorities.

Cheers
Rob

johncplass · July 20, 2017, 3:07pm

Hi Rob,
Thank you so much for your detailed reply. This is very helpful.

I had one last question: about how long can I expect the command to take? For example, how long has it taken on the system you described?

John

rsmith · July 21, 2017, 12:58am

My most recent testing took 8.5 hours on that system - for the current master branch at least…
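
For a rough idea of what to expect on a different machine, the 8.5 hour figure can be combined with the near-inverse-linear thread scaling mentioned earlier in the thread. The Python sketch below makes several assumptions that are not established here: that the reference run used 32 threads (inferred from the ~3196% usage), that per-thread speed is the same on your hardware, that the fixel and permutation counts are comparable, and that the whole runtime scales with threads when in reality only the permutation stage does. The function name is made up for illustration.

```python
REF_HOURS = 8.5    # runtime reported in this thread
REF_THREADS = 32   # assumption: ~3196% CPU usage corresponds to 32 busy threads


def estimate_hours(threads, ref_hours=REF_HOURS, ref_threads=REF_THREADS):
    """Estimate runtime in hours, assuming it is inversely proportional to thread count."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ref_hours * ref_threads / threads


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(f"{n} threads: about {estimate_hours(n):.1f} hours")
```

Since the non-permutation stages will not speed up in the same way, it is safer to request a wall-clock limit with some headroom above this estimate.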

johncplass · August 7, 2017, 2:59pm

That’s pretty quick!

Thanks, Rob. I appreciate your help!

AmirHussein · June 10, 2019, 5:33pm

Hi all,

I’m doing an FBA on ~380,000 fixels with 2 million tracks, and currently don’t have enough RAM to do so (32GB). Is there a way to allocate a 300GB SSD as RAM for fixelcfestats? The analysis is being done on a system with Intel Xeon processors running Ubuntu Linux.

Best,
Amir

rsmith · June 16, 2019, 1:07pm

Hi Amir,

Rather than trying to pull that off, I would suggest instead giving this branch a go. There are a substantial number of changes in there to be aware of, but it should get your RAM usage down below 32GB.

Rob