How much RAM am I likely to need for a whole brain analysis of 40 subjects, looking at a single or two potential behavioral correlates?
The number of subjects / size of the design matrix has little influence on the RAM usage of the command. The major storage requirement is the fixel-fixel connectivity matrix, which scales primarily with the number of fixels in your analysis. In my limited experience, a 1.25mm template with typical thresholds gives ~400,000 fixels, and with 2 million tracks in template space this requires ~80GB of RAM. Memory usage will scale somewhere between quadratically and cubically with the spatial resolution of the template, and sub-linearly with the number of tracks (since additional tracks tend to connect the same fixels as prior tracks).
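As a rough illustration, the scaling described above can be turned into a back-of-envelope estimator. The reference point (~400,000 fixels at 1.25mm with 2 million tracks, ~80GB) comes from the figures quoted; the exponents are illustrative guesses within the stated quadratic-to-cubic and sub-linear ranges, not measured values:

```python
# Back-of-envelope RAM estimate for fixelcfestats, extrapolated from the
# reference point quoted above: 1.25 mm template, 2 million tracks, ~80 GB.
# The exponents (2.5 for resolution, 0.5 for tracks) are illustrative
# guesses within the "quadratic-to-cubic" / "sub-linear" ranges stated.

REF_VOXEL_MM = 1.25      # reference template resolution
REF_TRACKS = 2_000_000   # reference track count
REF_RAM_GB = 80.0        # observed RAM at the reference point

def estimate_ram_gb(voxel_mm, n_tracks,
                    res_exponent=2.5, track_exponent=0.5):
    """Scale the reference RAM figure to a new resolution / track count."""
    res_factor = (REF_VOXEL_MM / voxel_mm) ** res_exponent
    track_factor = (n_tracks / REF_TRACKS) ** track_exponent
    return REF_RAM_GB * res_factor * track_factor

# At the reference settings this reproduces the reference figure:
print(round(estimate_ram_gb(1.25, 2_000_000)))  # 80
```

Treat the output as an order-of-magnitude guide only; the safest approach is still to over-request memory on the first run and note the actual peak usage.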
Is there any advantage to requesting multiple CPU cores?
The majority of the runtime in `fixelcfestats` is spent on permutation testing, and this will happily flog as many threads as you offer up; on our dual-Xeon 16-core systems CPU usage sits at around 3196%. Since you'll also be requesting a large chunk of RAM, the easiest solution is just to request all cores and all memory available on a single node. The permutation stage specifically, which again is the majority of the runtime, will scale almost perfectly inverse-linearly with the number of threads.
Does it matter if portions of the RAM are assigned to separate nodes that may be performing the computation, or is it ideal for the RAM to all be assigned to one compute node (possibly with multiple cores)?
You can't run `fixelcfestats` across nodes, only across cores within a node. When requesting resources using e.g. SLURM, there may be separate options for requesting memory per core versus total memory for the job; but since you should be requesting an entire node, you should also request a fixed total amount of memory according to how much the job needs. RAM is not "split" between cores on a node (except in the job queue / resource allocator, where per-core accounting is only used to manage running multiple single-core jobs on a single node without memory contention; this shouldn't apply in your case).
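For concreteness, a SLURM submission might look like the sketch below. The node size, walltime, and the `fixelcfestats` arguments are all placeholders to adapt to your cluster and analysis:

```shell
#!/bin/bash
# Hypothetical SLURM script: one whole node, all its cores, fixed total
# memory. The 32-core / 180 GB figures are placeholders for your hardware.
#SBATCH --nodes=1             # fixelcfestats cannot span multiple nodes
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32    # all cores on the node
#SBATCH --mem=180G            # total memory for the job, not per-core
#SBATCH --time=48:00:00

# Placeholder arguments; substitute your actual fixel directory, file
# list, design/contrast matrices, connectivity matrix and output names.
fixelcfestats fixel_dir files.txt design.txt contrast.txt matrix_dir stats_out
```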
Are any of the other steps computationally expensive/slow? Would it be advantageous to assign those steps to the computing cluster as well? e.g., would it be worth it to run dwipreproc on the cluster, given the potential headache of also installing FSL there?
It's all relative. Anything that's moderately expensive will be faster if you can run it in parallel for different subjects across nodes rather than serially on your own system.
`dwipreproc` is a good example, though you should also look into using the CUDA version of `eddy`. If you can script, sometimes it's easier to just run everything on a cluster, and only pull the data you need locally for QA. Ultimately there's no unambiguously correct answer here; it depends on your own priorities.
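If you do move preprocessing to the cluster, submitting one job per subject is the usual pattern, since subjects are independent. A minimal sketch, where the wrapper script name and subject IDs are hypothetical:

```shell
#!/bin/bash
# Build the submission command for one subject. "preproc_one_subject.sh"
# is a hypothetical wrapper that would run dwipreproc for that subject.
preproc_cmd() {
    echo "sbatch preproc_one_subject.sh $1"
}

# Print (rather than submit) one job per subject; replace the echo inside
# preproc_cmd with the real sbatch call to actually submit.
for subj in sub-01 sub-02 sub-03; do
    preproc_cmd "$subj"
done
```

Each subject then runs concurrently on the cluster rather than serially on your workstation, which is where most of the wall-clock saving comes from.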