Stuck when importing DWI data

I am running dwi2response on the HCP Diffusion dataset to estimate the response functions needed for fibre tracking. However, the command has been stuck at the step shown below for over 4 hours, which seems abnormal.

dwi2response msmt_5tt degibbs.mif 5ttseg.mif ms_5tt_wm.txt ms_5tt_gm.txt \
             ms_5tt_csf.txt -voxels ms_5tt_voxels.mif

dwi2response: 
dwi2response: Note that this script makes use of commands / algorithms that have relevant articles for citation. Please consult the help page (-help option) for more information.
dwi2response: 
dwi2response: Generated scratch directory: /Users/star_volcano/Desktop/100206/T1w/Diffusion/dwi2response-tmp-LP1BO3/
dwi2response: Importing DWI data (/Users/star_volcano/Desktop/100206/T1w/Diffusion/degibbs.mif)...

I am using a MacBook Pro with an M1 Pro chip and 16 GB of RAM.
The header information of the DWI image I would like to process (4.1 GB) is as below:

Dimensions:        145 x 174 x 145 x 288
  Voxel size:        1.25 x 1.25 x 1.25 x 1
  Data strides:      [ -1 2 3 4 ]
  Format:            MRtrix
  Data type:         32 bit float (little endian)
  Intensity scaling: offset = 0, multiplier = 1
  Transform:                    1           0           0         -90
                               -0           1           0        -126
                               -0           0           1         -72
  command_history:   mrconvert -fslgrad bvecs bvals data.nii.gz DTI.mif  (version=3.0.3)
                     dwidenoise DTI.mif denoise.mif -noise noiselevel.mif -mask preproc_mask.mif  (version=3.0.3)
                     mrdegibbs denoise.mif degibbs.mif  (version=3.0.3)
  comments:          FSL5.0
  dw_scheme:         0.5421861165,0.6720491444,-0.5043651084,5
  [288 entries]      -0.918106027,-0.306174009,-0.2516720074,1000
                     ...
                     -0.9878630087,0.008873000078,-0.1550740014,1995
                     -0.46236328,0.6267413795,0.6272283798,3000
  mrtrix_version:    3.0.3

I have seen a similar problem discussed in this topic: Why does dwi2response take so long with multi shell data.
However, the MRtrix3 installation method has changed a lot since 2019, so I can’t apply the fix mentioned in that topic.

Is there any solution?

By the way, I am going to process about 1,000 subjects from the HCP dataset. Can I accelerate the computation with a GPU or by parallelising?

Thanks a lot!

Hi @StarVolcano,

Yes, unfortunately this is an ongoing issue; I’ve not had a chance to look into it properly (or had access to a Mac for testing, though I do have access to a little Intel Mac mini now). However:

Actually, that change is still applicable – though the line to change is now line 64. However, if you originally installed one of the precompiled packages, then yes, this won’t be as simple to do: you would indeed need to compile from source.

I think the above is actually the best solution, as far as I can tell…

An alternative, if you have enough RAM on your system, is to create a RAM-backed filesystem (e.g. via ramfs) and use that to host the temporary scratch folder that the script creates. This should avoid the issue of intermittent RAM → HD sync events, which is what causes these massive slowdowns.

There are instructions on how to do this in various places online (e.g. this GitHub gist). Once you have a RAM disk set up and mounted, you should be able to tell dwi2response to place its temporary folder in it via its -scratch option.
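As a minimal sketch, on macOS a RAM disk can be created with hdiutil / diskutil; the size and volume name below are just illustrative (pick something that fits comfortably in your RAM while still being large enough to hold the scratch folder):

# Create and mount an ~8 GiB RAM disk (16777216 sectors × 512 bytes);
# the volume name 'RAMDisk' is arbitrary.
diskutil erasevolume HFS+ 'RAMDisk' $(hdiutil attach -nomount ram://16777216)

# Point the script's scratch folder at the RAM disk:
dwi2response msmt_5tt degibbs.mif 5ttseg.mif ms_5tt_wm.txt ms_5tt_gm.txt \
             ms_5tt_csf.txt -voxels ms_5tt_voxels.mif \
             -scratch /Volumes/RAMDisk

Remember to eject the RAM disk (e.g. diskutil eject /Volumes/RAMDisk) once you’re done, since its contents are lost on reboot anyway.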

You can certainly parallelise by distributing subjects across different systems / nodes, as sketched below. There is no point in parallelising on the same system, since all MRtrix commands already multi-thread to the maximum extent by default; any attempt at further parallelisation will only result in performance degradation, as the different tasks will compete for limited resources. There is no scope for GPU acceleration unfortunately, since MRtrix does not include any support for that (yet…).
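As a rough sketch of that per-subject distribution, each node could simply work through its own share of the subject list, one subject at a time (the subjects.txt file and the directory layout are just placeholders, not an HCP convention):

# subjects.txt: one subject ID per line, a different list on each node
while read subj; do
  dwi2response msmt_5tt ${subj}/T1w/Diffusion/degibbs.mif ${subj}/T1w/Diffusion/5ttseg.mif \
               ${subj}/ms_5tt_wm.txt ${subj}/ms_5tt_gm.txt ${subj}/ms_5tt_csf.txt \
               -voxels ${subj}/ms_5tt_voxels.mif
done < subjects.txt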

Hope this helps…
Cheers,

Donald.


I very much appreciate your detailed reply and help!

Based on your description, can I summarise the problem as follows?
My laptop’s RAM is not big enough to hold the temporary files produced while running this command.

So maybe there are three kinds of solutions: the first is the method from the previous topic, which delays the writeback; the second is the alternative you mentioned, creating a RAM-backed filesystem to avoid the intermittent RAM → HD sync events; and the last is reducing the size of the DWI image by bringing it into a coarser structural space (maybe 2 mm) and cutting down the number of shells. That last one is just my own guess, and I have no idea whether it would have a big impact on the final tracking results.

Do you think this is feasible given that I need to process 1,000 subjects?
I hope to get your advice.

In any case, I will try the first solution and see how it goes.
Thank you again for the detailed suggestion.

No, that’s not the issue – 16 GB should be plenty to process the data (though these are indeed big files and it will take some time). The issue is a bit technical and relates to our use of memory-mapping, which allows us to instruct the OS to transparently ‘insert’ the whole file as-is into system memory. This makes it easy to read from and write to the file directly, with very little overhead.

The problem is that behind the scenes, the OS needs to manage which bits of the file are loaded from disk into RAM (by default, it will only ‘page in’ those bits that the program explicitly tries to access), and more importantly, when we write to the output file (i.e. write to those memory locations), the OS needs to eventually ‘commit’ these changes back to the hard drive. Most OSes won’t immediately write all the changes to disk, but delay the writeback till later – which is a good idea, since a write to a specific memory location is likely to be followed soon after by a write to an adjacent location, so it’s better to keep things in RAM for a reasonable amount of time and only commit them to disk once all changes to that bit of the file are likely to be complete. There are various policies that the OS might pick for exactly how it handles this, and on Linux these things can be adjusted. No such flexibility on macOS unfortunately (at least not as far as I can tell).
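For the curious, these are the sorts of knobs I mean on Linux (shown purely for illustration; the values are not a recommendation, and none of this exists on macOS):

# Inspect the current dirty-page thresholds (Linux only):
sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_writeback_centisecs

# Example of relaxing them so more dirty pages can accumulate before a
# forced writeback (illustrative values, system-wide, requires root):
sudo sysctl -w vm.dirty_ratio=60 vm.dirty_background_ratio=30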

But the upshot of it is that for some workloads where lots of memory locations are changed all over the file (which happens in this particular case), the OS is more likely to reach the point where it feels the need to commit (technically, the number of ‘dirty pages’ exceeds its tolerance), and it may then decide to commit all changes to disk – halting all execution in the meantime. And to compound the issue, once the program is allowed to run again, it’ll carry on modifying memory locations close to the ones that just got committed, which means the OS will then have to commit the exact same memory pages again.

What I would like to do is to find a way to tell the OS to delay committing the data back to file until the file is closed, but unfortunately there doesn’t seem to be a way of doing this (at least, I couldn’t find an option to do this anywhere when I looked into it). In the absence of such an option, the alternative is the explicit delayed writeback that we have implemented in the backend to avoid issues like this on certain types of filesystems (such as network file shares, since if left unchecked, this can generate a huge amount of needless network traffic).

My alternative suggestion was to store the file directly on a memory-backed filesystem, since in this case the OS doesn’t need to load or commit any memory pages at all – they’re already in the system RAM. Hopefully that will avoid the issue altogether.


I wouldn’t advocate reducing the quality of your data unless that happens to be the only way to proceed: it will just make it harder to publish your findings. However, if you need to process that many HCP subjects, then you really don’t want to be doing this on your MacBook… Depending on what you want to do, I wouldn’t be surprised if the full processing took on the order of a day per subject (again, very dependent on what you’re planning), which would take you about 3 years on a single machine running full time… Downsampling the data would definitely help with processing time, but as I said, reviewers may question your approach if & when you try to publish your results (again, it depends entirely on what you’re going to do).

You should get access to an HPC cluster, or at least a dedicated workstation (if not several…) if you’re going to do this – in which case this particular problem may be a non-issue: you probably won’t encounter it on a Linux HPC system with enough RAM (its dirty-page handling policy is different from macOS’s, and doesn’t force a writeback so easily).
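If the cluster runs a scheduler such as SLURM, a job array is a natural fit, one subject per task. A minimal sketch, assuming a subjects.txt list as above and with purely placeholder paths and resource requests:

#!/bin/bash
#SBATCH --array=1-1000        # one task per subject
#SBATCH --cpus-per-task=8     # placeholder resource request
#SBATCH --mem=16G

# Pick the subject ID corresponding to this array task:
subj=$(sed -n "${SLURM_ARRAY_TASK_ID}p" subjects.txt)

dwi2response msmt_5tt ${subj}/T1w/Diffusion/degibbs.mif ${subj}/T1w/Diffusion/5ttseg.mif \
             ${subj}/ms_5tt_wm.txt ${subj}/ms_5tt_gm.txt ${subj}/ms_5tt_csf.txt \
             -voxels ${subj}/ms_5tt_voxels.mif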

All the best,
Donald.


I’m very grateful for your explanation and advice! I now have a much better understanding of this problem.

I was using the laptop to run a test when I ran into this problem, which confused me. Your advice is really helpful, and I’ll follow your suggestion to get access to an HPC cluster for future work.

I’ll report back soon!