Quick question: I saw that this has already been answered for fixel-based analysis, i.e. that ideally you would max out the number of CPUs but only need one node. Does this also apply to tckgen and tcksift, or would it be more productive to spread the work across more than one node? I intend to run this on a cluster that has MPI capabilities (using SLURM).
Approximately how much of an allocation is necessary for one dataset (e.g. an HCP participant) with 100M streamlines, sifted down to 10M?
OK, in general all MRtrix3 commands are heavily multi-threaded, so they will always benefit from compute nodes with higher core counts. Also, it’s generally preferable to run sequential, fully multi-threaded jobs per compute node rather than concurrent single-threaded jobs, since this reduces the competition for finite compute resources (RAM, CPU cache, IO bandwidth, etc).
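As a rough illustration of the sequential, fully multi-threaded approach, here is a minimal Python sketch that processes subjects one after another, giving each command every core allocated to the job. The subject IDs, file names (`wmfod.mif`, `mask.mif`) and directory layout are hypothetical placeholders. `-seed_image`, `-select`, `-term_number` and `-nthreads` are standard tckgen/tcksift options, but do check `tckgen -help` and `tcksift -help` for your installed version.

```python
import os
import subprocess


def build_commands(subject_ids, nthreads):
    """Return, per subject, a list of argv lists that each use all threads.

    File names (wmfod.mif, mask.mif, ...) are hypothetical placeholders.
    """
    plan = []
    for sid in subject_ids:
        fod = f"{sid}/wmfod.mif"
        plan.append([
            # Generate 100M streamlines seeded from a mask image
            ["tckgen", fod, f"{sid}/tracks_100M.tck",
             "-seed_image", f"{sid}/mask.mif",
             "-select", "100000000",
             "-nthreads", str(nthreads)],
            # SIFT the result down to 10M streamlines
            ["tcksift", f"{sid}/tracks_100M.tck", fod, f"{sid}/tracks_10M.tck",
             "-term_number", "10000000",
             "-nthreads", str(nthreads)],
        ])
    return plan


def run_sequentially(plan, dry_run=True):
    """Run each command to completion before starting the next one."""
    executed = []
    for subject_steps in plan:
        for argv in subject_steps:
            executed.append(argv)
            if not dry_run:
                # check=True stops the whole batch if any step fails
                subprocess.run(argv, check=True)
    return executed


if __name__ == "__main__":
    # SLURM exports SLURM_CPUS_PER_TASK when --cpus-per-task is set;
    # otherwise fall back to the core count visible to this process.
    n = int(os.environ.get("SLURM_CPUS_PER_TASK", os.cpu_count() or 1))
    for argv in run_sequentially(build_commands(["sub-01", "sub-02"], n)):
        print(" ".join(argv))
```

With `dry_run=True` (the default) it only prints the commands; set `dry_run=False` inside a single-node SLURM job (e.g. requested with `--nodes=1 --cpus-per-task=N`) to actually run them.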
Otherwise, no MRtrix3 command uses MPI, so there’s no point in pursuing that avenue. On top of that, given the task that tcksift performs, there is no reasonable way that I can think of to parallelise this across compute nodes. On the other hand, tckgen is relatively trivial to spread across nodes, as long as you don’t use the grid or dynamic seeding mechanisms - see this old post for details.
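A minimal sketch of spreading tckgen across nodes, assuming plain random seeding from a mask image (not grid or dynamic seeding, per the caveat above) and assuming each job draws its own independent random seed: split the target streamline count into near-equal chunks, run one tckgen per node, then concatenate the parts with tckedit, which accepts multiple input track files. The file names and the `node_commands` helper are hypothetical, and actually submitting each command as its own SLURM job is left out.

```python
def split_streamlines(total, n_jobs):
    """Divide `total` streamlines into n_jobs near-equal chunks."""
    base, extra = divmod(total, n_jobs)
    # The first `extra` jobs take one additional streamline each
    return [base + 1 if i < extra else base for i in range(n_jobs)]


def node_commands(fod, mask, total, n_jobs, prefix="tracks_part"):
    """Hypothetical per-node tckgen calls plus a final merge with tckedit."""
    parts = []
    cmds = []
    for i, count in enumerate(split_streamlines(total, n_jobs)):
        out = f"{prefix}{i:03d}.tck"
        parts.append(out)
        # Random seeding within a mask only: grid and dynamic seeding
        # should not be split up this way
        cmds.append(["tckgen", fod, out, "-seed_image", mask,
                     "-select", str(count)])
    # tckedit concatenates all input track files into one output
    merge = ["tckedit", *parts, "tracks_merged.tck"]
    return cmds, merge


if __name__ == "__main__":
    per_node, merge = node_commands("wmfod.mif", "mask.mif", 100_000_000, 4)
    for argv in per_node + [merge]:
        print(" ".join(argv))
```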
In terms of allocation, tckgen requires very little RAM, but will produce large files that scale with the number of streamlines. As a ballpark figure, I reckon you should be looking at ~½ GB per 1M streamlines. tcksift, on the other hand, requires a lot of live RAM, and that also scales with both the number of streamlines and the size of the FOD image. To SIFT a 100M-streamline tractogram, I reckon you’ll need a minimum of 64 GB, if not 128 GB…
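To put the storage side into numbers, here is a small Python sketch. `ballpark_gb` simply scales the ~½ GB per 1M rule of thumb: roughly 50 GB for the 100M tractogram and 5 GB after sifting to 10M. `tck_size_gb` estimates the same thing from the .tck layout, where each vertex is stored as three 32-bit floats, each streamline is terminated by a NaN triplet and the file by an Inf triplet. The mean of ~40 vertices per streamline is an assumption, picked because it reproduces the ballpark; the real figure depends on step size and streamline lengths. This only covers disk space, not tcksift's RAM usage.

```python
BYTES_PER_POINT = 3 * 4  # x, y, z stored as 32-bit floats


def tck_size_gb(n_streamlines, mean_vertices):
    """Approximate .tck data size in GB, ignoring the small text header.

    Each streamline stores its vertices plus one NaN delimiter triplet;
    the file ends with a single Inf triplet.
    """
    n_triplets = n_streamlines * (mean_vertices + 1) + 1
    return n_triplets * BYTES_PER_POINT / 1e9


def ballpark_gb(n_streamlines, gb_per_million=0.5):
    """Scale the rule of thumb of ~0.5 GB per 1M streamlines."""
    return n_streamlines / 1e6 * gb_per_million


if __name__ == "__main__":
    print(f"100M streamlines: ~{ballpark_gb(100_000_000):.0f} GB")
    print(f"10M after SIFT:   ~{ballpark_gb(10_000_000):.0f} GB")
    # Assumed mean of 40 vertices per streamline (hypothetical)
    print(f"1M at 40 vertices: ~{tck_size_gb(1_000_000, 40):.2f} GB")
```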