I am building an MRtrix3 pipeline on our HPC, but when I run it, it takes much longer than on my own laptop (for example, dwi2fod takes over 2 hours). I have already tried switching between CPU- and GPU-based nodes and assigning more threads and memory, but nothing seems to solve the problem. Am I overlooking something simple?
Run mrconvert with the -debug option to see how many threads MRtrix3 reports using, so you can be sure it is actually using the resources you're allocating, as per this post.
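For example, a quick check could look something like this (a sketch; `dwi.mif` and `check.mif` are just placeholder filenames):

```bash
# Run a trivial conversion with full debug output, and filter for
# thread-related messages (MRtrix3 logs to stderr, hence 2>&1).
mrconvert dwi.mif check.mif -debug 2>&1 | grep -i thread
```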
Are you specifying the number of threads within each command (e.g. -nthreads X), or relying on MRtrix3 to auto-detect the maximum number of available threads? In my experience, the latter sometimes does not work.
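Pinning the thread count explicitly would look something like this (a sketch; the filenames and thread count are placeholders, and I'm assuming the standard `csd` algorithm for dwi2fod):

```bash
# Explicitly request 16 threads rather than relying on auto-detection.
dwi2fod csd dwi.mif response.txt fod.mif -nthreads 16
```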
Did you set up the MRtrix3 environment variables, like MRTRIX_NTHREADS? If MRtrix3 is indeed failing to auto-detect the available threads, you could set this in your job script to the number of CPUs your cluster's job manager reports as allocated. In SLURM, this would be $SLURM_CPUS_PER_TASK. See setup docs here.
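A minimal SLURM job script along those lines might look like this (a sketch; the job name, resource amounts, and the final pipeline command are placeholders for whatever your pipeline actually runs):

```bash
#!/bin/bash
#SBATCH --job-name=mrtrix_pipeline
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G

# Tell MRtrix3 explicitly how many threads SLURM allocated to this job,
# instead of relying on auto-detection.
export MRTRIX_NTHREADS=$SLURM_CPUS_PER_TASK

dwi2fod csd dwi.mif response.txt fod.mif
```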
Another possible issue is the filesystem. Your HPC nodes might be running a somewhat esoteric networked filesystem that MRtrix3 does not detect as networked, which causes very slow performance for commands that use memory mapping (i.e. the majority of commands). More info here: Very slow performance when using network file system
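If that turns out to be the culprit, one common workaround is to stage the data onto node-local disk before processing. A sketch, assuming your cluster exposes node-local scratch via $TMPDIR (check your site's docs for the actual path) and that you're running under SLURM so $SLURM_SUBMIT_DIR points back to your submission directory:

```bash
# Copy input data from the networked filesystem to node-local scratch,
# run the memory-mapping-heavy commands there, then copy results back.
cp dwi.mif response.txt "$TMPDIR"/
cd "$TMPDIR"
dwi2fod csd dwi.mif response.txt fod.mif
cp fod.mif "$SLURM_SUBMIT_DIR"/
```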
Thank you both for your replies! It (luckily) doesn't seem to be the problem you are describing, @bjeurissen. I can imagine that would've been a lot harder to fix. I have looked at optimizing all the MRtrix3 env variables and setting up the SLURM job variables, and now the full pipeline is running in quite a reasonable time per patient.