Hi Experts,
I’m running connectome generation for 54 subjects on a cluster with 48 cores. When I submitted 48 jobs for 48 subjects at once, I noticed that the ‘tckgen’ and ‘tcksift’ steps took an incredibly long time to run. Is there a more efficient way to go about this? I’ve read elsewhere that by default these commands use the maximum number of threads available. Should I limit that explicitly with the ‘-nthreads’ option?
Regards,
Archith Rajan
You will probably get the highest throughput by processing a single image per machine, with the number of threads matching the number of physical + virtual CPU cores on that machine, which is the default in MRtrix3.
By launching 54 instances of tckgen, each starting 48 threads, your system is probably busy with thread management and with fetching and writing data. If your cluster is a single machine, it might be faster to process two images in parallel with -nthreads 24.
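As a rough illustration of the "two images in parallel with -nthreads 24" suggestion, here is a minimal Python sketch that caps the number of simultaneous tckgen processes and splits the cores between them. The subject names, the file layout (`wmfod.mif`, `tracks.tck`) and the helper names are hypothetical, and a real tckgen call would also need your own seeding and other options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def threads_per_job(total_cores, concurrent_jobs):
    """Split the cores evenly across concurrent jobs (at least 1 thread each)."""
    return max(1, total_cores // concurrent_jobs)


def build_command(subject, nthreads):
    # Hypothetical file layout; add your own seeding and other tckgen options.
    return [
        "tckgen",
        f"{subject}/wmfod.mif",
        f"{subject}/tracks.tck",
        "-nthreads", str(nthreads),
    ]


def run_all(subjects, total_cores=48, concurrent_jobs=2, runner=None):
    """Run one command per subject, at most `concurrent_jobs` at a time."""
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd, check=True)
    nthreads = threads_per_job(total_cores, concurrent_jobs)
    commands = [build_command(s, nthreads) for s in subjects]
    # The pool size caps how many processes run at once.
    with ThreadPoolExecutor(max_workers=concurrent_jobs) as pool:
        list(pool.map(runner, commands))
    return commands
```

Calling `run_all(["sub-01", "sub-02", "sub-03"])` would start at most two tckgen processes at any time, each with `-nthreads 24`. A cluster scheduler with a job-concurrency limit would achieve the same effect.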
Note that you can use an environment variable to control the number of threads on each worker: link
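A minimal Python sketch of the environment-variable approach, assuming the variable meant here is `MRTRIX_NTHREADS` (the name is not given in this thread, so please check it against the linked documentation). The helper names are hypothetical.

```python
import os
import subprocess


def worker_env(nthreads):
    """Copy the current environment and set the thread limit for child processes."""
    env = dict(os.environ)
    # Assumption: MRtrix3 commands read MRTRIX_NTHREADS when -nthreads is not given.
    env["MRTRIX_NTHREADS"] = str(nthreads)
    return env


def run_with_threads(cmd, nthreads):
    """Run one command with the thread limit set only for that process."""
    return subprocess.run(cmd, env=worker_env(nthreads), check=True)
```

Setting the variable per worker this way leaves the parent environment untouched, so different jobs can be given different limits.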
Thank you very much for the reply.
If I disable threading with -nthreads 0 and submit all 48 jobs at once, each subject would of course run slower than when running 2 subjects at a time with -nthreads 24. But by the time it’s done, will I have processed a larger number of subjects? I’m sorry if I’m confusing the concepts here.
Assuming it is a single machine, the total run time of processing 48 images in parallel without multithreading might be longer than processing the images one after another with 48 threads each, or than running 2 images at a time with 24 threads each. It depends on a number of factors, but in general I’d process one or two images in parallel, with the total number of threads roughly equal to the number of cores.
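To make the thread counts in this discussion concrete, here is a small Python sketch of the arithmetic for a 48-core machine. It treats a job run with -nthreads 0 as one working thread, which is a simplification, and the thread-to-core ratio alone does not predict run time, since memory use and disk access also differ between the scenarios.

```python
def oversubscription(jobs, threads_per_job, cores=48):
    """Return the total threads requested and the ratio of threads to cores."""
    total = jobs * threads_per_job
    return total, total / cores


scenarios = {
    "48 jobs x 48 threads (as submitted)": (48, 48),
    "48 jobs x 1 thread (no multithreading)": (48, 1),
    "2 jobs x 24 threads": (2, 24),
    "1 job x 48 threads": (1, 48),
}

for name, (jobs, threads) in scenarios.items():
    total, ratio = oversubscription(jobs, threads)
    print(f"{name}: {total} threads, {ratio:.0f}x the core count")
```

Only the original submission asks for far more threads than there are cores (2304 on 48 cores). The other three all request 48 threads in total, and differ in how many images are held in memory and read from disk at the same time.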
Thank you very much for the clarification.