dwi2response bus error

Hello,

I am trying to follow the Structural connectome for Human Connectome Project (HCP) tutorial, but am running into some problems at the dwi2response stage.

The given example with the msmt_5tt algorithm results in a “bus error” somewhere down the line.
I have tried the script with the “fa” option as well, and that one runs fine.
The “tournier” option also gives a “bus error” somewhere down the line.

I am running MRtrix3 from a VM running Ubuntu 16.04. I had some trouble getting it to run at first, but after installing an older version of Eigen it runs fine for the most part (I found the solution elsewhere on this list). My system has an Intel Xeon E5-1650 CPU and 8GB of RAM available in the VM.

Thanks for any help!

-Maarten

Output of msmt_5tt algo:

dwi2response: Changing to temporary directory (/tmp/dwi2response-tmp-7N6A84/)
Command: dwi2mask dwi.mif mask.mif  -quiet
Command: dwi2tensor dwi.mif - -mask mask.mif -quiet | tensor2metric - -fa fa.mif -vector vector.mif  -quiet
Command: mrtransform 5tt.mif 5tt_regrid.mif -template fa.mif -interp linear  -quiet
Command: mrconvert 5tt_regrid.mif - -coord 3 0 -axes 0,1,2 -quiet | mrcalc - 0.95 -gt fa.mif 0.2 -lt -mult mask.mif -mult gm_mask.mif  -quiet
Command: mrconvert 5tt_regrid.mif - -coord 3 2 -axes 0,1,2 -quiet | mrcalc - 0.95 -gt mask.mif -mult wm_mask.mif  -quiet
Command: mrconvert 5tt_regrid.mif - -coord 3 3 -axes 0,1,2 -quiet | mrcalc - 0.95 -gt fa.mif 0.2 -lt -mult mask.mif -mult csf_mask.mif  -quiet
dwi2response: Calling dwi2response recursively to select WM single-fibre voxels using 'tournier' algorithm
Command: dwi2response -quiet tournier dwi.mif wm_ss_response.txt -mask wm_mask.mif -voxels wm_sf_mask.mif
Bus error (core dumped)
dwi2response: [ERROR] Command failed: mrcalc iter4_first_peaks.mif -sqrt 1 iter4_second_peaks.mif iter4_first_peaks.mif -div -sub 2 -pow -mult iter4_CF.mif  -quiet
dwi2response: [ERROR] Command failed: dwi2response -quiet tournier dwi.mif wm_ss_response.txt -mask wm_mask.mif -voxels wm_sf_mask.mif

Output of tournier algo (last part):

Command: mrthreshold iter7_CF.mif -top 3000 - -quiet | maskfilter iter7_SF.mif dilate iter7_SF_dilated.mif -npass 1  -quiet
Command: dwi2fod dwi.mif iter7_RF.txt iter8_FOD.mif -mask iter7_SF_dilated.mif  -quiet
Bus error (core dumped)
dwi2response: [ERROR] Command failed: dwi2fod dwi.mif iter7_RF.txt iter8_FOD.mif -mask iter7_SF_dilated.mif  -quiet
dwi2response: Changing back to original directory (/mnt/hgfs/D/996782/T1w/Diffusion)
dwi2response: Deleting temporary directory /tmp/dwi2response-tmp-BD6TD0/

Darn, bus errors are really obscure… my guess is it’s due to not enough RAM in your VM. All the intermediate files are being written to /tmp, which will reside in RAM on a modern distribution. Maybe you can try running the script with a proper drive-backed temp folder, using the -tempdir option (I think). If that works, it would confirm that you do need more RAM for this work - the HCP data are huge…
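For example, something along these lines (the scratch location here is just a placeholder - point it at any drive-backed folder with plenty of space, and substitute your own file names):

$ mkdir -p ~/mrtrix_scratch
$ dwi2response msmt_5tt dwi.mif 5tt.mif wm.txt gm.txt csf.txt -tempdir ~/mrtrix_scratch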

Hello

I ran into the same error recently on one specific Ubuntu machine (whereas the same command ran fine on other similar machines, even on a machine with less RAM)…

I just did a system update (apt-get upgrade) and this solved the problem. I do not understand why, but it is worth a try.

cheers

Romain

Hi,

Thanks for the quick replies!

I tried several things:

  • updating the system (it was almost fully updated, although there was one Python module update which might be relevant)
  • running with -tempdir located at the same location as the data (a mapped Windows drive)
  • copying the HCP data to a local VM drive and running with -tempdir on the local VM drive
  • increasing the RAM to 16GB. I watched the memory usage during the process, but it never seems to exceed 1GB.

Running it from the local VM drive seemed to go well for a bit longer, as I now get the bus error somewhat later in the process.

dwi2response: Calling dwi2response recursively to select WM single-fibre voxels using 'tournier' algorithm
Command: dwi2response -quiet tournier dwi.mif wm_ss_response.txt -mask wm_mask.mif -voxels wm_sf_mask.mif
Bus error (core dumped)
dwi2response: [ERROR] Command failed: dwi2fod dwi.mif iter7_RF.txt iter8_FOD.mif -mask iter7_SF_dilated.mif  -quiet
dwi2response: [ERROR] Command failed: dwi2response -quiet tournier dwi.mif wm_ss_response.txt -mask wm_mask.mif -voxels wm_sf_mask.mif

Just wondering: what happens to the intermediate files from this step (the recursive iterations with the tournier algo)? I don’t see them in the temp dir.

Any suggestions are welcome.

Thx,

-Maarten

I had the same problem, and I have discovered the reason for the “Bus error” message.
It is related to a possible shortage of space in the “temp” directory that some commands like mrconvert rely upon. If there is no more space left on /tmp, the program stops with the “Bus error” message. This is the reason for the apparent “randomness” of the error: it depends on the file size and on the free space available on /tmp.

The management of the “temp” folder is twofold in the latest version of MRtrix3: one mechanism for the commands implemented in C++, the other for the Python scripts. Some Python scripts allow inline specification of the temp location (see the -tempdir option). The system also offers the alternative of defining the location in the .mrtrix.conf file with the property:
TmpFileDir: /your/path
but the policy for where the program looks for the conf file is not clear.

Despite the option for the temporary folder, the issue is that the Python scripts may call other commands, like mrconvert, that don’t inherit the specification for the tmp folder. mrconvert addresses the temporary directory using the operating system policy: ‘/tmp’ for Linux and ‘.’ for Windows. There is no way to modify this directive at run time; you have to change it in the source code and then recompile.

I hope it may help.

The -nocleanup option is not currently ‘propagated’ to the dwi2response tournier call within dwi2response msmt_5tt; it will generate its own temporary directory, and delete it upon completion. I suppose it should be possible, in cases where -nocleanup is provided to dwi2response msmt_5tt, to propagate the temporary directory path and the -nocleanup flag in to the recursive call. But for now, running the tournier method directly is probably simpler in terms of chasing down the source of the problem.
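That is, invoking it directly with something like (re-using the file names from your log; the -tempdir location is a placeholder):

$ dwi2response tournier dwi.mif wm_ss_response.txt -mask wm_mask.mif -voxels wm_sf_mask.mif -tempdir /big/drive/scratch -nocleanup

That way the temporary directory, and all the per-iteration files within it, will survive for inspection.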

It is related to a possible shortage of space in the “temp” directory that some commands like mrconvert rely upon.

That’s a good point actually: the -tempdir option only specifies the location where the script will create its own temporary working directory; it does not influence the location used by MRtrix3 commands when creating temporary files for e.g. piping. I think I might have hit this snag before myself due to an explicit RAMFS being used for /tmp/.

Rather than using the -tempdir option, try setting the TmpFileDir key within your MRtrix config file to something that’s guaranteed to have more storage space.
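For example (assuming /scratch is a location on your system with plenty of free space):

$ echo "TmpFileDir: /scratch" >> ~/.mrtrix.conf

Any MRtrix3 command run after that should then create its piped temporary images under /scratch instead of /tmp.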

Thanks all, I was able to solve the problem!
Although just setting TmpFileDir to somewhere with enough space was not enough, as this apparently only influences where the piped intermediate files are stored.
The subcommand from the script with the recursive tournier algo makes its temp dir in the system default temp dir, and this step produces A LOT of data. The files from individual iterations are not cleared, but kept until it’s finished. I had to symlink my /tmp to somewhere else, and then it worked just fine!
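For anyone wanting to do the same, it was something along these lines (run as root on my VM; /bigdisk/tmp is a placeholder for wherever you have space):

$ sudo mkdir -p /bigdisk/tmp && sudo chmod 1777 /bigdisk/tmp
$ sudo mv /tmp /tmp.orig
$ sudo ln -s /bigdisk/tmp /tmp

(the chmod 1777 reproduces the usual world-writable sticky-bit permissions of /tmp)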

Perhaps it would be nice to pass the -tempdir setting on to the recursive call for the msmt_5tt option?

-M

OK, sounds like we really need to forward that -tempdir option on in recursive calls. I also just noticed that all files produced for every iteration are stored until completion (and beyond, if -nocleanup is supplied), and for HCP data that would equate to a massive amount of storage… So we should remove intermediate files as soon as they’re no longer needed, keeping only the data from the previous iteration unless the -nocleanup option has been supplied. @rsmith, what do you reckon?

Yep, both of those should be possible. I’ll get on to that.

Hi Rob,

I am facing an issue where dwi2fod fails at around the 5th iteration within the dwi2response script. I did notice that dwiextract within dwi2response etc. seems to be using /tmp/dwi* despite me specifying -tempdir, and my /tmp currently has only about 14 GB on it.

Has the -tempdir option forwarding been implemented yet?

Thanks so much!
Nagesh

Yes, but it hasn’t been merged to master yet - the changes are sitting on this pull request. The reason for the delay is that there are a few other changes in there that break backward compatibility somewhat, so the plan is to merge this (along with a bunch of other stuff) as part of a version number increase, and to publicise the main changes on the forum at that point. Hopefully this’ll happen at some point next week, assuming everything goes to plan…

Ah, I see. Thanks so much for the update, Donald! I will wait for it next week, fingers crossed 🙂

Hi Mrtrixers,

I am running into the same issue with the dwi2response dhollander algorithm. Specifying -tempdir is not enough for the script, as it is still trying to copy data into the system /tmp folder. I was wondering whether you have modified this.

Thank you !

Chiara

Hey @Chiara_Maffei1,

The temp folders for the scripts all end up in (or potentially nested within) the folder specified via -tempdir.
I believe the temporary data (images) for things like piping between MRtrix3 binaries (which is used extensively in some scripts, notably dwi2response) by default end up in the current working directory. I’m not sure where exactly that would be when this happens within a script (either the directory from which the script is run, or maybe the temporary directory generated by the script).
In any case, there are also config file options to set both of these explicitly; see http://mrtrix.readthedocs.io/en/latest/reference/config_file_options.html and search for “temp” within that page to find anything relevant.

Maybe @jdtournier or @rsmith can comment further on what happens (by default) with all the different cases of temporary files generated by scripts and/or piping between binaries?

Yes, I was going to suggest it would most likely be related to the piping of temporary images. You’ll note that the relevant details are described in the implementation section. By default, these will be written to /tmp on any OS other than Windows. Also check whether that location contains obsolete temporary files left behind by failed MRtrix3 commands:

ls /tmp/mrtrix-tmp-*

and remove them if you find any (assuming there are no jobs currently running…). This might explain why your commands are failing in the first place (no space left on /tmp). If /tmp is genuinely too small, try setting the TmpFileDir option in the config file to e.g. . (current folder), as suggested by @ThijsDhollander (instructions here).
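To check how much free space is actually left there, and to clear out any such stale files in one go (again, make sure no jobs are currently running), something like:

$ df -h /tmp
$ rm -f /tmp/mrtrix-tmp-*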

I do however note that the documentation for the TmpFileDir config option doesn’t match the current implementation – but that’ll be fixed shortly.

Hi Donald and Thijs,

I already tried setting the temp directory in the user config file to the current folder, but that did not fix the problem.

I was just wondering if you have implemented a way of explicitly setting the /tmp folder when running the script.

Thank you for your suggestions,

Chiara

Ok, so this might be some other problem. Can you post the exact command that you used (ideally with the -debug option), along with the full command-line output produced? That might give us a bit more of a clue as to where exactly the failure is occurring.

Aha, that explains a lot. I must admit I was also slightly confused when spotting that one; so my suggestion that these would by default end up in the working directory was indeed incorrect.

Hi, have there been any updates since the last post?

I am running into the same issues when running dwi2response on our compute cluster. Our /tmp directory is only 2GB, so I think it’s getting filled up and the script crashes.

I tried making a configuration file with TmpFileDir pointing to a 10TB volume in my home directory (~/.), but I’m not clear on how to tell MRtrix3 to use this file. Is it done automatically, or do I need to specify it explicitly when running dwi2response? Or do I need to re-run the ./configure script? The documentation says that MRtrix3 will look in ~/. for this file (mrtrix.conf), but it still seems to be writing to the default /tmp even after creating this file. I tried the -tempdir flag as well, but that didn’t help.

From talking to our IT department, it definitely sounds possible to symlink /tmp to somewhere else; I just thought I would try to get this working first.

Thanks in advance!
Antoni

Hi Antoni,

There are two options that influence temporary files:

  • TmpFileDir controls the location of temporary files used when piping images between commands. By default, this is set to /tmp, so if you find you’re running out of space on that device, then this is the likely culprit.

  • ScriptTmpDir controls the location of temporary files created by MRtrix3 scripts. By default this is the current folder, so it is unlikely to cause issues on /tmp (see the example config after this list).
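A minimal ~/.mrtrix.conf covering both might look like this (the paths here are placeholders for locations with plenty of free space):

TmpFileDir: /scratch/mrtrix_pipes
ScriptTmpDir: /scratch/mrtrix_scripts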

If you’ve already set TmpFileDir in your config file and yet you’re still encountering errors, I recommend you try running a command like mrinfo -debug x: this should report which locations it looks in for the config file, and whether it loaded the file when found. On my system, this gives:

$ mrinfo -debug x
mrinfo: [DEBUG] No config file found at "/etc/mrtrix.conf"
mrinfo: [INFO] reading config file "/home/jdt13/.mrtrix.conf"...
mrinfo: [DEBUG] reading key/value file "/home/jdt13/.mrtrix.conf"...
mrinfo: [INFO] opening image "x"...
mrinfo: [ERROR] unknown format for image "x"
mrinfo: [ERROR] error opening image "x"

So it found no system-wide config in /etc/mrtrix.conf, but it did find my user config.

If the file was being read properly, try running a command like:

$ mrconvert test.mif -
mrconvert: [100%] copying from "test.mif" to "/tmp/mrtrix-tmp-tOULV0.mif"
/tmp/mrtrix-tmp-tOULV0.mif

This actually prints out the name of the temporary file on the terminal, so you can verify whether the option did anything (you’ll want to delete that file afterwards, by the way).
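So, for instance, if your config file sets TmpFileDir: /scratch, you would expect the path printed above to start with /scratch/mrtrix-tmp- rather than /tmp/mrtrix-tmp-.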

If everything is working as expected, but you’re still having issues, then we’d need a lot more information to get to the bottom of it…