Tckgen on HCP Dataset takes excessively long

For a lot of MRtrix3 commands we've seen effectively 100% CPU utilisation on a 16-core / 32-thread system, but I've never had access to anything higher than that. It's entirely feasible that a particular section of code could bottleneck at around that point. For tckgen, I can see a few points that could conceivably lead to less than 100% usage:

  • Although tracks are generated in parallel, they still need to be written to file sequentially. This is done by passing the track data from the tractography threads into a queue structure, from which a single thread is responsible for writing that data to file (a simplified sketch of this pattern appears after this list). This requires mutual exclusion locking, which may prevent the tracking threads from running at full speed once a certain number of threads is reached. You may simply have found this number.

  • If you are writing to a network-based file system, it is conceivable that the file system I/O is at its limit, but I find this fairly unlikely. You could try setting the config file entry TrackWriterBufferSize to a larger number and see what happens (an example configuration entry is shown after this list).

  • With dynamic seeding specifically, determining streamline seed points is not entirely independent between threads: all threads are both reading from fixel data to determine seed probabilities, and writing to that fixel data to dynamically update those probabilities, and these data are common across all threads. This is done using the C++11 atomics library rather than explicit mutex locking, and I've deliberately used the most relaxed memory synchronisation rules I could, but it could conceivably hit a multi-threading limit (a minimal sketch of the access pattern follows this list). Running tckgen with some other seeding mechanism should tell you whether or not it's the dynamic seeding that's preventing 100% utilisation.
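
To make the first point more concrete, below is a minimal sketch of the producer-consumer pattern described above: many tracking threads push streamlines into a mutex-protected queue, and a single writer thread drains it. This is not the actual MRtrix3 queue implementation, and all names in it are made up for illustration; it simply shows why the single consumer and the lock contention around the queue could cap scaling once the number of producer threads gets high enough.

    // Simplified illustration only: many "tracking" threads generate
    // streamlines in parallel, but all of them push into one mutex-protected
    // queue that a single writer thread drains. The names are hypothetical;
    // this is not MRtrix3 code.
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    struct Streamline { std::vector<float> points; };

    class TrackQueue {
      public:
        void push (Streamline&& s) {
          std::lock_guard<std::mutex> lock (mutex_);   // producers contend here
          items_.push (std::move (s));
          cond_.notify_one();
        }
        bool pop (Streamline& s) {
          std::unique_lock<std::mutex> lock (mutex_);
          cond_.wait (lock, [this] { return !items_.empty() || done_; });
          if (items_.empty())
            return false;                              // drained and closed
          s = std::move (items_.front());
          items_.pop();
          return true;
        }
        void close() {
          std::lock_guard<std::mutex> lock (mutex_);
          done_ = true;
          cond_.notify_all();
        }
      private:
        std::queue<Streamline> items_;
        std::mutex mutex_;
        std::condition_variable cond_;
        bool done_ = false;
    };

    int main() {
      TrackQueue queue;
      const int num_producers = 8;
      const int tracks_per_producer = 1000;

      // single consumer: "writes" tracks sequentially (just counts them here)
      std::thread writer ([&] {
        Streamline s;
        size_t count = 0;
        while (queue.pop (s))
          ++count;
        std::printf ("wrote %zu tracks\n", count);
      });

      // many producers: generate tracks in parallel
      std::vector<std::thread> producers;
      for (int p = 0; p < num_producers; ++p)
        producers.emplace_back ([&] {
          for (int n = 0; n < tracks_per_producer; ++n)
            queue.push (Streamline { std::vector<float> (300, 0.0f) });
        });

      for (auto& t : producers)
        t.join();
      queue.close();
      writer.join();
    }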
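
For the second point, TrackWriterBufferSize is set in the MRtrix3 configuration file (typically ~/.mrtrix.conf for a per-user setting, or /etc/mrtrix.conf system-wide), using one "key: value" entry per line. Assuming the value is interpreted in bytes, an entry along these lines would enlarge the write-back buffer; the 64 MiB figure is an arbitrary example, not a recommendation:

    # in ~/.mrtrix.conf (or /etc/mrtrix.conf for a system-wide setting)
    # value assumed to be in bytes; 67108864 = 64 MiB, purely as an example
    TrackWriterBufferSize: 67108864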
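
And for the third point, here is a minimal sketch of the kind of shared, atomically updated per-fixel tallies involved: every thread reads the tally to bias its seed selection and writes back updates as streamlines are generated, all via relaxed atomics rather than mutexes. Again, this is not the actual tckgen dynamic seeding code; the names and the "seeding" logic are invented purely to illustrate the access pattern.

    // Illustration only: shared per-fixel counters updated lock-free with
    // relaxed atomics by many threads; not the actual tckgen implementation.
    #include <atomic>
    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <thread>
    #include <vector>

    int main() {
      const size_t num_fixels = 100000;
      const int num_threads = 8;
      const int streamlines_per_thread = 50000;

      // shared across all threads: how many streamlines have hit each fixel
      std::vector<std::atomic<uint32_t>> track_count (num_fixels);
      for (auto& c : track_count)
        c.store (0, std::memory_order_relaxed);

      auto worker = [&] (unsigned rng_seed) {
        std::mt19937 rng (rng_seed);
        std::uniform_int_distribution<size_t> pick (0, num_fixels - 1);
        for (int n = 0; n < streamlines_per_thread; ++n) {
          size_t fixel = pick (rng);
          // "seeding": read the current tally for a candidate fixel (a real
          // implementation would compare it against a target density to
          // derive a seeding probability)
          uint32_t current = track_count[fixel].load (std::memory_order_relaxed);
          (void) current;
          // "update": record that another streamline traversed this fixel
          track_count[fixel].fetch_add (1, std::memory_order_relaxed);
        }
      };

      std::vector<std::thread> threads;
      for (int t = 0; t < num_threads; ++t)
        threads.emplace_back (worker, unsigned (t + 1));
      for (auto& t : threads)
        t.join();

      uint64_t total = 0;
      for (auto& c : track_count)
        total += c.load (std::memory_order_relaxed);
      std::printf ("total fixel visits: %llu\n", (unsigned long long) total);
    }

As a practical check, re-running the same tckgen call with a different seeding mechanism (for instance -seed_image with a whole-brain mask rather than -seed_dynamic) and comparing CPU utilisation should show whether the dynamic seeding is indeed what is holding things back.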