Why does dwi2response take so long with multi-shell data?

I’m currently running dwi2response on an HCP diffusion dataset (~3.9 GB file; b = 1000, 2000 and 3000 s/mm² plus b0 volumes, with 90 directions for each non-b0 shell). When I use the debug flag, I can see that one of the first steps is to mrconvert (and therefore copy) the file into the temporary directory with the -strides 0,0,0,1 option.

This takes a very long time, and I wonder why. Can someone explain this to me? Is the data somehow rearranged in memory to make later computations faster? Is there a way to speed things up?

Is the data somehow rearranged in memory to make later computations faster?

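Yes, the data are copied into volume-contiguous format (the volume axis gets the smallest stride, so all diffusion-weighted values for a given voxel sit next to each other in memory) for faster subsequent processing.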
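To illustrate what this reordering involves, here is a minimal C++ sketch (not MRtrix3's actual code; the function name and plain std::vector storage are assumptions for illustration). It copies a 4D image stored with the x axis fastest-varying (strides 1,2,3,4) into a layout where the volume axis is fastest-varying, which is what -strides 0,0,0,1 asks for along the fourth axis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reorder a 4D image from "x fastest" layout
//   src offset = x + nx*(y + ny*(z + nz*v))
// to "volume fastest" (volume-contiguous) layout
//   dst offset = v + nv*(x + nx*(y + ny*z)),
// so that each voxel's diffusion-weighted signals are stored contiguously.
std::vector<float> to_volume_contiguous(const std::vector<float>& src,
                                        std::size_t nx, std::size_t ny,
                                        std::size_t nz, std::size_t nv) {
  const std::size_t nvox = nx * ny * nz;  // distance between volumes in src
  assert(src.size() == nvox * nv);
  std::vector<float> dst(src.size());
  std::size_t out = 0;
  for (std::size_t z = 0; z < nz; ++z)
    for (std::size_t y = 0; y < ny; ++y)
      for (std::size_t x = 0; x < nx; ++x) {
        const std::size_t voxel = x + nx * (y + ny * z);
        // Writes are sequential, but each read jumps nvox elements ahead.
        for (std::size_t v = 0; v < nv; ++v)
          dst[out++] = src[voxel + v * nvox];
      }
  return dst;
}
```

In the inner loop, consecutive reads are nx·ny·nz·sizeof(float) bytes apart, so for a multi-gigabyte dataset nearly every read lands on a different memory page. That is plausibly one reason the conversion is slow when the input is not already in the requested order.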

It might be CPU-dependent, but I see similar behaviour on my machine, for instance when using mrcat to concatenate two files with different strides along a new axis. Tweaking the disk caching settings might help. You could also try changing the temporary directory location (don’t use networked storage; preferably use an SSD or a RAM-backed file system) and see if that makes a difference. My workaround is to use a different machine (with a larger CPU cache) to bring the data into the right stride order (mrconvert -strides) before feeding it to dwi2response.
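On Linux, the kernel's disk write-back behaviour is governed by the vm.dirty_* settings exposed under /proc/sys/vm/. As a small, Linux-only sketch (the helper names are hypothetical), the following C++ reads a few of those values so you can see what you are starting from before tweaking anything. Changing them requires root privileges, and sensible values depend on how much RAM the machine has.

```cpp
#include <fstream>
#include <initializer_list>
#include <iostream>
#include <optional>
#include <string>

// Hypothetical helper: read one integer-valued Linux sysctl from /proc/sys/vm.
// Returns std::nullopt if the file is missing (e.g. not Linux) or unreadable.
std::optional<long> read_vm_setting(const std::string& name) {
  std::ifstream in("/proc/sys/vm/" + name);
  long value = 0;
  if (in >> value) return value;
  return std::nullopt;
}

// Print the settings that control when dirty (modified but not yet written)
// page-cache pages are flushed to disk.
void print_dirty_page_settings() {
  for (const char* name : {"dirty_ratio", "dirty_background_ratio",
                           "dirty_bytes", "dirty_background_bytes",
                           "dirty_expire_centisecs",
                           "dirty_writeback_centisecs"}) {
    if (auto v = read_vm_setting(name))
      std::cout << "vm." << name << " = " << *v << '\n';
    else
      std::cout << "vm." << name << " unavailable\n";
  }
}
```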

For large datasets, this is most likely related to the kernel’s handling of dirty pages that @maxpietsch linked to. It’s a problem that comes up quite regularly, and that I’ve been trying to find a workaround for. If you’re feeling in an investigative mood, maybe you could try the following and report back:

Change line 63 in the file core/file/mmap.cpp to set the delayed_writeback variable to true (it is normally set to false):

       bool delayed_writeback = true;

then recompile:

       ./build

… and try again. I’d be interested to know if that solves the problem without causing any other issues…

Thanks for your help @maxpietsch!
@jdtournier, that helped a lot. Thanks! Can I leave the setting like that, or does it have important implications?

That’s what I’d like to know! Let me know if anything funny happens…

Just to expand on that: I don’t expect this to have any unexpected or undesirable side effects, other than the possibility that it might require twice the RAM it would otherwise need just at the point where it’s writing the output file back to disk. But there’s good reason to suggest that a modern OS would actually handle this more intelligently than I’m giving it credit for… If this holds up, I’m hoping to make this the default behaviour in future releases of MRtrix3.
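To make the trade-off concrete, here is a minimal C++ sketch of the general idea behind delayed write-back. It is not MRtrix3's implementation, and the class name is hypothetical. Instead of writing voxel values straight into a memory-mapped output file (which creates dirty pages that the kernel flushes on its own schedule), the image is filled in an ordinary RAM buffer and written to disk in one sequential pass when it is closed. The extra memory mentioned above would plausibly come from that buffer coexisting with the page-cache copy of the data being written.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class illustrating delayed write-back: all writes go to an
// in-memory buffer, and the file is written in one sequential pass on close().
class DelayedWritebackImage {
 public:
  DelayedWritebackImage(std::string path, std::size_t num_values)
      : path_(std::move(path)), buffer_(num_values, 0.0f) {}

  // Random-access writes touch only RAM, so no dirty file pages exist yet.
  void set(std::size_t index, float value) {
    assert(index < buffer_.size());
    buffer_[index] = value;
  }

  // One large sequential write of the whole buffer, then close the file.
  void close() {
    std::ofstream out(path_, std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open " + path_);
    out.write(reinterpret_cast<const char*>(buffer_.data()),
              static_cast<std::streamsize>(buffer_.size() * sizeof(float)));
    out.close();
    if (!out) throw std::runtime_error("write failed: " + path_);
  }

 private:
  std::string path_;
  std::vector<float> buffer_;
};
```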


Sounds great, thanks! I’ll keep an eye out.