Stuck when importing DWI data

I am running dwi2response on the HCP Diffusion dataset to estimate the response functions needed for fibre tracking. However, the command has been stuck at the step shown below for over 4 hours, which seems abnormal.

dwi2response msmt_5tt degibbs.mif 5ttseg.mif ms_5tt_wm.txt ms_5tt_gm.txt \
             ms_5tt_csf.txt -voxels ms_5tt_voxels.mif

dwi2response: 
dwi2response: Note that this script makes use of commands / algorithms that have relevant articles for citation. Please consult the help page (-help option) for more information.
dwi2response: 
dwi2response: Generated scratch directory: /Users/star_volcano/Desktop/100206/T1w/Diffusion/dwi2response-tmp-LP1BO3/
dwi2response: Importing DWI data (/Users/star_volcano/Desktop/100206/T1w/Diffusion/degibbs.mif)...

I am using a MacBook Pro with an M1 Pro chip and 16 GB of RAM.
The header information of the DWI image I would like to process (4.1 GB) is as below:

Dimensions:        145 x 174 x 145 x 288
  Voxel size:        1.25 x 1.25 x 1.25 x 1
  Data strides:      [ -1 2 3 4 ]
  Format:            MRtrix
  Data type:         32 bit float (little endian)
  Intensity scaling: offset = 0, multiplier = 1
  Transform:                    1           0           0         -90
                               -0           1           0        -126
                               -0           0           1         -72
  command_history:   mrconvert -fslgrad bvecs bvals data.nii.gz DTI.mif  (version=3.0.3)
                     dwidenoise DTI.mif denoise.mif -noise noiselevel.mif -mask preproc_mask.mif  (version=3.0.3)
                     mrdegibbs denoise.mif degibbs.mif  (version=3.0.3)
  comments:          FSL5.0
  dw_scheme:         0.5421861165,0.6720491444,-0.5043651084,5
  [288 entries]      -0.918106027,-0.306174009,-0.2516720074,1000
                     ...
                     -0.9878630087,0.008873000078,-0.1550740014,1995
                     -0.46236328,0.6267413795,0.6272283798,3000
  mrtrix_version:    3.0.3

I have seen a similar problem discussed in this topic: Why does dwi2response take so long with multi shell data.
However, the MRtrix3 installation method has changed a lot since 2019, so I can’t apply the fix mentioned in that topic.

Is there any solution?

By the way, I am going to process about 1,000 subjects from the HCP dataset. Can I accelerate the computation with a GPU or by parallelising?

Thanks a lot!

Hi @StarVolcano,

Yes, unfortunately this is an ongoing issue; I’ve not had a chance to look into it properly (or had access to a Mac for testing, though I do have access to a little Intel Mac mini now). However:

Actually, that change is still applicable – though the line to change is now line 64. However, if you originally installed one of the precompiled packages, then yes, this won’t be as simple to do: you would indeed need to compile from source.

I think the above is actually the best solution, as far as I can tell…

An alternative, if you have enough RAM on your system, is to create a RAM-backed filesystem (e.g. via ramfs) and use that to host the temporary scratch folder that the script creates. This should avoid the issue of intermittent RAM → HD sync events, which is what causes these massive slowdowns.

There are instructions on how to do this in various places online (e.g. this GitHub gist). Once you have a RAM disk set up and mounted, you should be able to tell dwi2response to place its temporary folder in it via its -scratch option.
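As a minimal sketch, on macOS a RAM disk can be created with hdiutil / diskutil; the size and volume name below are just illustrative (pick something that fits comfortably in your RAM while still being large enough to hold the scratch folder):

# Create and mount an ~8 GiB RAM disk (16777216 sectors × 512 bytes);
# the volume name 'RAMDisk' is arbitrary.
diskutil erasevolume HFS+ 'RAMDisk' $(hdiutil attach -nomount ram://16777216)

# Point the script's scratch folder at the RAM disk:
dwi2response msmt_5tt degibbs.mif 5ttseg.mif ms_5tt_wm.txt ms_5tt_gm.txt \
             ms_5tt_csf.txt -voxels ms_5tt_voxels.mif \
             -scratch /Volumes/RAMDisk

Remember to eject the RAM disk (e.g. diskutil eject /Volumes/RAMDisk) once you’re done, since its contents are lost on reboot anyway.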

You can certainly parallelise by distributing subjects across different systems / nodes, as sketched below. There is no point in parallelising on the same system, since all MRtrix commands already multi-thread to the maximum extent by default; any attempt at further parallelisation will only result in performance degradation, as the different tasks will compete for limited resources. There is no scope for GPU acceleration unfortunately, since MRtrix does not include any support for that (yet…).
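As a rough sketch of that per-subject distribution, each node could simply work through its own share of the subject list, one subject at a time (the subjects.txt file and the directory layout are just placeholders, not an HCP convention):

# subjects.txt: one subject ID per line, a different list on each node
while read subj; do
  dwi2response msmt_5tt ${subj}/T1w/Diffusion/degibbs.mif ${subj}/T1w/Diffusion/5ttseg.mif \
               ${subj}/ms_5tt_wm.txt ${subj}/ms_5tt_gm.txt ${subj}/ms_5tt_csf.txt \
               -voxels ${subj}/ms_5tt_voxels.mif
done < subjects.txt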

Hope this helps…
Cheers,

Donald.


I very much appreciate your detailed reply and help!

Based on your description, can I summarise the problem as follows?
My laptop’s RAM is not big enough to hold the temporary files produced while running this command.

So maybe there are three kinds of solutions: the first is the method from the previous topic, which delays the writeback; the second is the alternative you mentioned, creating a RAM-backed filesystem to avoid the intermittent RAM → HD sync events; and the last is reducing the size of the DWI image by bringing it into a coarser structural space (maybe 2 mm) and cutting down the number of shells. That last one is just my own guess, and I have no idea whether it would have a big impact on the final tracking results.

Do you think this is feasible given that I need to process 1,000 subjects?
I hope to get your advice.

In any case, I will try the first solution and see how it goes.
Thank you again for the detailed suggestion.

No, that’s not the issue – 16 GB should be plenty to process the data (though these are indeed big files and it will take some time). The issue is a bit technical and relates to our use of memory-mapping, which allows us to instruct the OS to transparently ‘insert’ the whole file as-is into system memory. This makes it easy to read from and write to the file directly, with very little overhead.

The problem is that behind the scenes, the OS needs to manage which bits of the file are loaded from disk into RAM (by default, it will only ‘page in’ those bits that the program explicitly tries to access), and more importantly, when we write to the output file (i.e. write to those memory locations), the OS needs to eventually ‘commit’ these changes back to the hard drive. Most OSes won’t immediately write all the changes to disk, but delay the writeback till later – which is a good idea, since a write to a specific memory location is likely to be followed soon after by a write to an adjacent location, so it’s better to keep things in RAM for a reasonable amount of time and only commit them to disk once all changes to that bit of the file are likely to be complete. There are various policies that the OS might pick for exactly how it handles this, and on Linux these things can be adjusted. No such flexibility on macOS unfortunately (at least not as far as I can tell).
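For the curious, these are the sorts of knobs I mean on Linux (shown purely for illustration; the values are not a recommendation, and none of this exists on macOS):

# Inspect the current dirty-page thresholds (Linux only):
sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_writeback_centisecs

# Example of relaxing them so more dirty pages can accumulate before a
# forced writeback (illustrative values, system-wide, requires root):
sudo sysctl -w vm.dirty_ratio=60 vm.dirty_background_ratio=30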

But the upshot of it is that for some workloads where lots of memory locations are changed all over the file (which happens in this particular case), the OS is more likely to reach the point where it feels the need to commit (technically, the number of ‘dirty pages’ exceeds its tolerance), and it may then decide to commit all changes to disk – halting all execution in the meantime. And to compound the issue, once the program is allowed to run again, it’ll carry on modifying memory locations close to the ones that just got committed, which means the OS will then have to commit the exact same memory pages again.

What I would like to do is to find a way to tell the OS to delay committing the data back to file until the file is closed, but unfortunately there doesn’t seem to be a way of doing this (at least, I couldn’t find an option to do this anywhere when I looked into it). In the absence of such an option, the alternative is the explicit delayed writeback that we have implemented in the backend to avoid issues like this on certain types of filesystems (such as network file shares, since if left unchecked, this can generate a huge amount of needless network traffic).

My alternative suggestion was to store the file directly on a memory-backed filesystem, since in this case the OS doesn’t need to load or commit any memory pages at all – they’re already in the system RAM. Hopefully that will avoid the issue altogether.


I wouldn’t advocate reducing the quality of your data unless that happens to be the only way to proceed: it will just make it harder to publish your findings. However, if you need to process that many HCP subjects, then you really don’t want to be doing this on your MacBook… Depending on what you want to do, I wouldn’t be surprised if the full processing took on the order of a day per subject (again, very dependent on what you’re planning), which would take you about 3 years on a single machine running full time… Downsampling the data would definitely help with processing time, but as I said, reviewers may question your approach if & when you try to publish your results (again, it depends entirely on what you’re going to do).

You should get access to an HPC cluster, or at least a dedicated workstation (if not several…) if you’re going to do this – in which case this particular problem may be a non-issue: you probably won’t encounter it on a Linux HPC system with enough RAM (its dirty-page handling policy is different from macOS’s, and doesn’t force a writeback so easily).
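If the cluster runs a scheduler such as SLURM, a job array is a natural fit, one subject per task. A minimal sketch, assuming a subjects.txt list as above and with purely placeholder paths and resource requests:

#!/bin/bash
#SBATCH --array=1-1000        # one task per subject
#SBATCH --cpus-per-task=8     # placeholder resource request
#SBATCH --mem=16G

# Pick the subject ID corresponding to this array task:
subj=$(sed -n "${SLURM_ARRAY_TASK_ID}p" subjects.txt)

dwi2response msmt_5tt ${subj}/T1w/Diffusion/degibbs.mif ${subj}/T1w/Diffusion/5ttseg.mif \
             ${subj}/ms_5tt_wm.txt ${subj}/ms_5tt_gm.txt ${subj}/ms_5tt_csf.txt \
             -voxels ${subj}/ms_5tt_voxels.mif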

All the best,
Donald.


I’m very grateful for your explanation and advice! I now have a much better understanding of this problem.

I was using the laptop to run a test when I ran into this problem, which confused me. Your advice is really helpful, and I’ll follow your suggestion to get access to an HPC cluster for future work.

I’ll report back soon!