I’m running connectome generation for 54 subjects on a cluster with 48 cores. When I submitted 48 jobs for 48 subjects at once, I noticed that the ‘tckgen’ and ‘tcksift’ steps took an incredibly long time to run. Is there a more efficient way to go about this? I’ve read elsewhere that by default these commands use the maximum number of threads available. Should I limit that explicitly with the ‘-nthreads’ option?
You will probably get the highest throughput by processing a single image per machine, with the number of threads matching the number of physical + virtual CPU cores on that machine, which is the default in MRtrix3.
By launching 54 instances of tckgen, each starting 48 threads, your system is probably busy with thread management and fetching and writing data. If your cluster is a single machine, it might be faster to process two images in parallel with -nthreads 24.
Note that you can use an environment variable to control the number of threads on each worker: link
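To put rough numbers on the oversubscription described above, here is a minimal Python sketch. The helper name `oversubscription` is made up for illustration, and the ratio it returns is only a crude indicator: it ignores memory and disk contention, which also matter here.

```python
def oversubscription(jobs: int, threads_per_job: int, cores: int) -> float:
    """Ratio of requested threads to available cores (1.0 = one thread per core)."""
    return jobs * threads_per_job / cores


CORES = 48  # core count from the question

# 54 instances, each starting the default 48 threads
print(oversubscription(54, 48, CORES))  # 54.0

# 2 instances with -nthreads 24 each
print(oversubscription(2, 24, CORES))  # 1.0
```

A ratio far above 1 means the operating system spends much of its time switching between threads rather than running them.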
Thank you very much for the reply.
If I disable multithreading with -nthreads 0 and submit all 48 jobs, each one would of course be slower than running 2 subjects with -nthreads 24, but by the time they finish, will I have processed more subjects? I’m sorry if I’m getting the concepts confused here.
Assuming it is a single machine, the total run time of processing 48 images in parallel without multithreading might be longer than processing each image consecutively with 48 threads or than running 2 images at a time launching 24 threads each. It depends on a number of factors but in general, I’d process one or two images in parallel with the total number of threads roughly equal to the number of cores.
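One way to follow that advice is a small driver script that keeps only one or two commands running at a time and splits the cores between them. The Python sketch below is an illustration under stated assumptions, not an MRtrix3 tool: the function names and file names are hypothetical, and the tckgen arguments are placeholders (a real call also needs its seeding and other options). Only the -nthreads option comes from the discussion above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def threads_per_job(cores: int, parallel_jobs: int) -> int:
    """Split the cores evenly so the total thread count roughly equals the core count."""
    return max(1, cores // parallel_jobs)


def with_nthreads(command: list[str], nthreads: int) -> list[str]:
    """Return a copy of the command with an explicit -nthreads option appended."""
    return [*command, "-nthreads", str(nthreads)]


def run_all(commands, cores: int, parallel_jobs: int, runner=subprocess.run):
    """Run the commands with at most `parallel_jobs` of them active at once."""
    nthreads = threads_per_job(cores, parallel_jobs)
    with ThreadPoolExecutor(max_workers=parallel_jobs) as pool:
        return list(pool.map(lambda cmd: runner(with_nthreads(cmd, nthreads)), commands))


if __name__ == "__main__":
    # Hypothetical subject IDs and file names; placeholder tckgen arguments.
    subjects = [f"sub-{i:02d}" for i in range(1, 5)]
    commands = [["tckgen", f"{s}_fod.mif", f"{s}_tracks.tck"] for s in subjects]
    # Dry run: the runner returns each command instead of executing it.
    for cmd in run_all(commands, cores=48, parallel_jobs=2, runner=lambda c: c):
        print(" ".join(cmd))
```

Dropping the `runner` argument makes the script execute the commands through `subprocess.run`. The worker threads only wait on the external processes, so a thread pool is enough here.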
Thank you very much for the clarification.