Dwipreproc command error; memory issue?

phmag · September 6, 2016, 4:45am

I’m getting a strange error when executing a dwipreproc command, for example:

dwipreproc -rpe_none AP 040/dwidn.mif 040/dwipp.mif
dwipreproc:
dwipreproc: Note that this script makes use of commands / algorithms that have relevant articles for citation; INCLUDING FROM EXTERNAL SOFTWARE PACKAGES. Please consult the help page (-help option) for more information.
dwipreproc:
dwipreproc: Generated temporary directory: /tmp/dwipreproc-tmp-NFNIWJ/
Command: mrconvert /mridata/workingdata/BIN/subjects/040/dwidn.mif /tmp/dwipreproc-tmp-NFNIWJ/series.mif
dwipreproc: Changing to temporary directory (/tmp/dwipreproc-tmp-NFNIWJ/)
Command: mrconvert series.mif dwi_pre_topup.nii -stride -1,+2,+3,+4
dwipreproc: Creating phase-encoding configuration file
Command: dwi2mask series.mif - | maskfilter - dilate - | mrconvert - mask.nii -datatype float32 -stride -1,+2,+3
Command: mrconvert series.mif - -stride -1,+2,+3,+4 | mrinfo - -export_grad_fsl bvecs bvals
dwipreproc:
dwipreproc: [ERROR] Command failed: mrconvert series.mif - -stride -1,+2,+3,+4 | mrinfo - -export_grad_fsl bvecs bvals
dwipreproc: Output of failed command:
mrinfo: [ERROR] no filename supplied to standard input (broken pipe?)
mrinfo: [ERROR] error opening image "-"
dwipreproc: Changing back to original directory (/mridata/workingdata/BIN/subjects)
dwipreproc: Deleting temporary directory /tmp/dwipreproc-tmp-NFNIWJ/

I suspect it may be a memory issue, because if I first execute the following command (i.e. deleting the mrtrix temporary files in /tmp) it works OK.

rm /tmp/mrtrix-tmp-*

then:

dwipreproc -rpe_none AP 040/dwidn.mif 040/dwipp.mif
dwipreproc:
dwipreproc: Note that this script makes use of commands / algorithms that have relevant articles for citation; INCLUDING FROM EXTERNAL SOFTWARE PACKAGES. Please consult the help page (-help option) for more information.
dwipreproc:
dwipreproc: Generated temporary directory: /tmp/dwipreproc-tmp-AG3CZA/
Command: mrconvert /mridata/workingdata/BIN/subjects/040/dwidn.mif /tmp/dwipreproc-tmp-AG3CZA/series.mif
dwipreproc: Changing to temporary directory (/tmp/dwipreproc-tmp-AG3CZA/)
Command: mrconvert series.mif dwi_pre_topup.nii -stride -1,+2,+3,+4
dwipreproc: Creating phase-encoding configuration file
Command: dwi2mask series.mif - | maskfilter - dilate - | mrconvert - mask.nii -datatype float32 -stride -1,+2,+3
Command: mrconvert series.mif - -stride -1,+2,+3,+4 | mrinfo - -export_grad_fsl bvecs bvals
Command: eddy --imain=dwi_pre_topup.nii --mask=mask.nii --index=indices.txt --acqp=config.txt --bvecs=bvecs --bvals=bvals --out=dwi_post_eddy
dwipreproc: [WARNING] eddy has not provided rotated bvecs file; using original gradient table
Command: mrconvert dwi_post_eddy.nii.gz result.mif -stride -2,-3,4,1 -fslgrad bvecs bvals
Command: mrconvert result.mif /mridata/workingdata/BIN/subjects/040/dwipp.mif
dwipreproc: Changing back to original directory (/mridata/workingdata/BIN/subjects)
dwipreproc: Deleting temporary directory /tmp/dwipreproc-tmp-AG3CZA/

Just wondered if you had any thoughts on this?

rsmith · September 6, 2016, 5:30am

Yes, I would definitely suspect a memory issue in this instance, as previously experienced here and here. The clue is in the output provided after this line:

dwipreproc: Output of failed command:

It subsequently gives details of mrinfo not receiving the piped image it expects, but it’s mrconvert that is genuinely failing, and that command is not providing any terminal output at all.

Technical details of what’s going on, if you’re interested:

The issue is that it’s a very low-level system operation that goes awry. The memory-mapped output file gets truncated by the system without the memory-mapping code being made aware of it. When the code subsequently tries to access beyond the actual memory-mapped file size, it generates a bus error signal, which calls terminate().

When running a binary command directly, this will generate an error at the command line such as “Bus Error (core dumped)”, which is useful but maybe not verbose enough. However, by default glibc writes this message to the current tty rather than to stderr, so when the command is executed within one of the Python scripts using subprocess, the message is lost.
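To illustrate the point about subprocess: even when the glibc message never reaches stderr, the calling script can still tell that a child died from a signal, because on POSIX the return code is then the negative signal number. The following is only a minimal sketch of that idea, not the code used in the MRtrix3 scripts; the function names describe_exit and run_and_report are made up for this example.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable message."""
    if returncode == 0:
        return "command succeeded"
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as -<signal number>.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return "command killed by signal {} ({})".format(-returncode, name)
    return "command failed with exit code {}".format(returncode)


def run_and_report(argv):
    """Run argv (hypothetical helper) and report on stderr how it ended."""
    proc = subprocess.run(argv, stderr=subprocess.PIPE)
    message = describe_exit(proc.returncode)
    if proc.returncode != 0:
        sys.stderr.write(message + "\n")
        # Forward whatever the command itself managed to write to stderr.
        sys.stderr.write(proc.stderr.decode(errors="replace"))
    return proc.returncode, message
```

Note that this only helps for a command launched directly. When several commands are chained with a shell pipe, the shell normally reports only the status of the last command, which is consistent with the log above blaming mrinfo rather than mrconvert.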

Theoretically, setting the environment variable LIBC_FATAL_STDERR_ is supposed to instruct the signal handler to instead write the error to stderr; but in my experience this didn’t work.

Therefore, the actual solution is to register our own signal handlers that will be called in the event of a fatal system signal. Apart from customising the error messages, and making sure that they are written to stderr such that they can be piped appropriately, we can also perform limited cleanup in the event of catastrophic command failure: for instance, deleting any temporary image files used in piping.

Once we have confirmed and merged in the code changes here, both the MRtrix3 binaries and scripts should give more meaningful error messages when such a system error is encountered. Note however that this will require re-running the configure script to activate the changes, and is not guaranteed to be supported on all systems.

We are also considering modifying the Python scripts to create the temporary working directory in the current working directory by default, rather than in /tmp/, in the hope that such memory issues are encountered less frequently. It will still be possible to set the temporary directory location manually using the -tempdir option if you know you have enough space in e.g. /tmp, and we’ll probably add a config file entry to control this behaviour as well.
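As a rough sketch of how that precedence could work (command-line option first, then a config entry, then the current working directory), under the assumption that the config is available as a dictionary. The key name "ScriptTmpDir" and both function names are invented for this example and are not actual MRtrix3 names.

```python
import os
import tempfile


def choose_temp_parent(cli_tempdir=None, config=None):
    """Pick the parent directory for a script's temporary directory.

    Priority: explicit command-line value, then a config entry, then the
    current working directory. 'ScriptTmpDir' is a made-up key name.
    """
    if cli_tempdir:
        return os.path.abspath(cli_tempdir)
    if config and config.get("ScriptTmpDir"):
        return os.path.abspath(config["ScriptTmpDir"])
    return os.getcwd()


def make_temp_dir(script_name, cli_tempdir=None, config=None):
    """Create a uniquely named '<script>-tmp-XXXX' directory and return it."""
    parent = choose_temp_parent(cli_tempdir, config)
    # mkdtemp creates the directory with access restricted to the user.
    return tempfile.mkdtemp(prefix=script_name + "-tmp-", dir=parent)
```

For example, make_temp_dir("dwipreproc", cli_tempdir="/tmp") would keep the old behaviour for a user who knows /tmp has enough space.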

Cheers
Rob

jdtournier · September 6, 2016, 8:17am

Just a word of caution about this, since doing this might have an impact on performance. Image piping used to write temporary images in the current folder, and while this worked fine when the folder was local to the machine, it often slowed things down drastically when operating over networked filesystems, since the kernel would typically ensure the file was fully written before reading it back in the next command, all over the network. This is rarely an issue with local filesystems, since the kernel can make stronger assumptions and cache the data more aggressively in RAM; the data probably never actually get written to disk. Writing to the current folder might alleviate some of these out-of-RAM issues, but it might also perform really poorly on HPC clusters, since users’ home folders typically reside on networked filesystems…

rsmith · September 6, 2016, 8:52am

Just a word of caution about this, since doing this might have an impact on performance.

I was the one who didn’t want to do this. But ultimately performance lost out against having a significant number of users encountering script failures due to running out of memory, which is much more likely for a full-blown script than a single piped image. The performance issue will need a prominent recommendation in the documentation about setting the config file entry to redirect the temporary directory (there will be a config file entry, separate from TmpFileDir, that specifically affects the scripts).

jdtournier · September 6, 2016, 10:24am

Sorry, bit confused…

What does ‘this’ refer to here? Did you want temp files in current folder or /tmp?

Not sure I follow: how did performance lose out? I do agree we’re encountering a lot of failures due to limited tmp space, so it’s definitely worth doing something about. We just need to handle this right.

One option we might have with python is to actually check how much space is left on the relevant filesystem. Maybe there’s a cross-platform module for this…? If so, we can check before running and issue a warning about the likely failure and a suggestion as to how to avoid it?

rsmith · September 6, 2016, 11:28am

I was the one who didn’t want to do this.

What does ‘this’ refer to here? Did you want temp files in current folder or /tmp?

I didn’t want to change away from what I had put in place already, which was defaulting to /tmp/. I noticed a definite performance improvement when I made that the default. But some of the newer scripts are bigger than what I had at that time.

But ultimately performance lost out against having a significant number of users encountering script failures due to running out of memory

Not sure I follow: how did performance lose out?

In the discussion of changing the default from /tmp/ to the current working directory, the superior performance of /tmp/ lost out to the fact that using the current working directory does not result in complete failure of scripts.

One option we might have with python is to actually check how much space is left on the relevant filesystem. Maybe there’s a cross-platform module for this…? If so, we can check before running and issue a warning about the likely failure and a suggestion as to how to avoid it?

os.statvfs() is consistent between Python 2 and 3, but is only available on Unix in both cases, so we would need to branch for Windows.
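A minimal sketch of such a check, for illustration only. On Python 3.3 and later, shutil.disk_usage() works on both Unix and Windows, so the branch is only needed for older interpreters; the os.statvfs() fallback below is Unix-only, and a Python 2 Windows branch (e.g. via the Win32 API) is left out. The function names free_bytes and warn_if_low_space, and the idea of passing in a required_bytes estimate, are assumptions for this example.

```python
import os
import shutil


def free_bytes(path):
    """Return the number of bytes available to the current user at path."""
    if hasattr(shutil, "disk_usage"):
        # Python >= 3.3: available on both Unix and Windows.
        return shutil.disk_usage(path).free
    # Older Pythons on Unix: available blocks times fragment size.
    stats = os.statvfs(path)
    return stats.f_bavail * stats.f_frsize


def warn_if_low_space(path, required_bytes):
    """Return a warning string if path looks too full, otherwise None."""
    available = free_bytes(path)
    if available < required_bytes:
        return ("[WARNING] only {:.1f} MB free in {}; about {:.1f} MB may be "
                "needed. Consider a different temporary directory."
                .format(available / 1e6, path, required_bytes / 1e6))
    return None
```

The check itself is the easy part; supplying a sensible required_bytes for a given script and input is the hard part.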

Trouble is the space required depends on both the script (and potentially even the specific script algorithm being invoked at the command-line) and the input data. So each script (/ algorithm) would need to provide a function to estimate how much space it requires for a given input - which may include e.g. compressed images.

I thought about categorizing the scripts into ‘light’ and ‘heavy’ disk usage, and changing the default temporary directory location based on that; but it’d probably just lead to confusion as to why one script does one thing and one does another.

phmag · September 7, 2016, 12:23am

So we’ve resolved the issue. It turns out that / was full…

i.e.

df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg_samuraix-lv_root
50574012 47643584 364904 100% /

This was due to me interrupting the process a number of times earlier this week while trying to get some batch processing working. In addition to the "mrtrix-tmp-*" files it was creating and (quite rightly) not deleting after an interrupt command, it was also creating "dwipreproc-tmp-*" folders, which also weren’t being deleted.

I’m sure you’re aware of these files; it was me inadvertently interrupting the commands that caused the issue. We’ve now deleted all the "dwipreproc-tmp-*" folders from the / filesystem and freed up ~2 GB of space.

In short, it was me interrupting the process that ultimately led to the “memory” issue, which was really a full disk.
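For anyone else who ends up in the same situation, here is a small sketch for listing leftovers and how much space they occupy before deciding what to delete. It only reports and deletes nothing. The function names are made up for this example, and the default patterns are simply the two prefixes seen in this thread.

```python
import glob
import os


def path_size(path):
    """Total size in bytes of a file, or of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def find_stale_temp(tmp_root="/tmp",
                    patterns=("mrtrix-tmp-*", "dwipreproc-tmp-*")):
    """Return (path, size) pairs for leftovers, largest first."""
    found = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(tmp_root, pattern)):
            found.append((path, path_size(path)))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Make sure no MRtrix3 command is still running before removing anything it reports, since a live command may be using those files.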

rsmith · September 7, 2016, 1:03am

Good to hear that you’re up and running again! We kind of hijacked the thread with our technical discussion; but anyone with thoughts on how they think this issue could be better managed by the software is welcome to comment.

As an aside:

In addition to the "mrtrix-tmp-*" files it was creating and (quite rightly) not deleting after an interrupt command …

Apart from the more informative error messages, the other benefit we’ll get from defining our own ‘signal handlers’ is that we can actually delete any temporary images used for piping if a command fails or is terminated manually.
Note however that script temporary directories are now retained when a script fails, so that you get a chance to investigate rather than immediately wiping away the evidence. @maxpietsch We may need to look into altering this behaviour to force temporary directory cleanup if it’s a manual user termination…

archithrajan1 · March 28, 2017, 9:49am

Hello experts,
Although only remotely related to this thread, I have also received a similar WARNING:

dwipreproc: [WARNING] eddy has not provided rotated bvecs file; using original gradient table

How big an issue would this be at the tckgen stage? I didn’t have reverse phase-encoding direction data and hence used the following options:

 dwipreproc -rpe_none -export_grad_mrtrix Gradtable Test.mif Testpreproc.mif

jdtournier · March 28, 2017, 12:30pm

That would depend on how much motion there was during the scan, and hence how much rotation needs to be corrected for. It can have an effect, but it’s hard to give any indication without looking into the specifics of your cohort and how much head motion is present.

By the way, the reason you get this message is presumably because you have an older installation of FSL that doesn’t include eddy’s bvecs correction.

Not having reverse phase-encoding data (hence -rpe_none) is unrelated to the gradient rotation issue. It means eddy will perform the motion and eddy-current correction, but no EPI distortion correction. Again, how big a problem this is depends on the specifics of your acquisition (i.e. how bad the distortions are), where you are going to focus your analysis (i.e. how close to distorted areas), and what kind of analysis you want to do (e.g. you can’t really do ACT without correcting for distortions).

So I don’t think we can give a one-size-fits-all answer to your question about how big an issue it is to run without the DW gradient rotation and EPI distortion correction…