Trouble with MRtrix3 and MRtrix2

MattRowe · March 22, 2017, 11:43am

Hello

I am new to MRtrix3, but have used MRtrix2 in the past. I’ve recently come back to neuro after a break.

Currently, I am having a lot of problems getting MRtrix3 running. I have downloaded and compiled MRtrix3 successfully.

I have checked some commands (mrview, mrconvert), and these work fine with default settings.

I use mrconvert with the -fslgrad bvec bval option to convert to .mif format, then I use dwi2response with the tournier or fa algorithm to get the response. I cannot get past this point: the dwi2response script gets stuck on the first command, which is something like “mrconvert … - stride 0,0,0,1 | dwiextract /tmp-dwi2response…/dwi.mif”. This command seems to take forever; I have left it overnight and it has not completed. I can find the folder in the /tmp/ directory and the dwi.mif file.

The data is multishell (30xB0, 200xB1000, 480xB2000, 640xB3000), 1.5 GB compressed and ~2.9 GB uncompressed in mif format, from the ISMRM TRACED challenge. It is a big dataset, but I would expect this to be typical for multishell data.

I’ve also tried extracting the B2000 and B3000 shells and running the same commands, and I get the same thing: left overnight, no progress. I’ve tried inputting in gz format, converting to mif format, or inputting as raw nifti; nothing changes.

Is this dataset only applicable for a multi-shell multi-tissue workflow? It seems to get stuck at a very early stage, just parsing the data.

Can you advise, please?

Many thanks

Matt

jdtournier · March 22, 2017, 12:48pm

OK, I appreciate this may have tainted your first impression…

Is there any indication of what is going on? Is the system stuck at full CPU usage, or is the disk constantly churning? My first guess would be that your system is low on RAM for these kinds of data: for this command to succeed, you’d need at least 2×2.9 GB of free memory: one copy to uncompress the image into RAM, the second to hold the temporary output. The latter doesn’t necessarily need to be in live RAM if /tmp is an on-disk filesystem, or if your system has enough swap space, but if there isn’t enough live RAM, the need to constantly swap data between RAM and spinning disks will make everything slow to a crawl. So if your system doesn’t have at least 6 GB of free RAM, that would be the obvious explanation.

You might be able to get around this to some extent by manually uncompressing the data first (e.g. using gunzip), since MRtrix3 will then access the data via memory-mapping (as it does for MRtrix 0.2), which doesn’t require an explicit RAM allocation as such. But it will still be slow for the same reason: the system can’t hold all of the data needed in RAM concurrently, so it will need to swap bits in & out during processing.

If this isn’t the issue, then I’m a bit stumped. Maybe you could tell us the specifics of your system (hardware, OS, and specific versions of dependencies) - the contents of your release/config file will already have a lot of that information in it. Then run the commands with the -debug option and post the full output, and tell us what the system is doing when it stalls (is the hard drive light on constantly, can you hear the drive heads seeking constantly, is the CPU at max usage, is the system responsive, etc). Might also be worth copy/pasting the problematic command and running it directly at the terminal, to see if that works as expected.
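The RAM estimate earlier in this post (at least twice the uncompressed image size) can be checked before launching a long job. The following is a minimal sketch, assuming a Linux system that exposes MemAvailable in /proc/meminfo; the function names and the optional file argument are hypothetical and only there for illustration.

```shell
# Print MemAvailable in MiB, read from a meminfo-style file
# (default: /proc/meminfo; the kernel reports the value in kB).
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeed if available memory covers two copies of an image of the given
# size in MiB: one for the uncompressed input, one for the temporary output.
enough_ram_for() {
    image_mib=$1
    avail_mib=$(mem_available_mib "${2:-/proc/meminfo}")
    [ "$avail_mib" -ge $((2 * image_mib)) ]
}

# Example for a ~2.9 GB image, taken here as roughly 2970 MiB:
# enough_ram_for 2970 && echo "should fit in RAM" || echo "expect heavy swapping"
```

MemAvailable is used rather than MemFree because it also counts page cache that the kernel can reclaim, which is closer to what a new process can actually obtain.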

maxpietsch · March 22, 2017, 12:57pm

I have similar issues with HCP data on my machine if I try to change the strides to volume contiguous. I have 10GB RAM free during conversion.


mrinfo ../tmp/data.nii.gz 
************************************************
Image:               "../tmp/data.nii.gz"
************************************************
  Dimensions:        145 x 174 x 145 x 288
  Voxel size:        1.25 x 1.25 x 1.25 x 1
  Data strides:      [ -1 2 3 4 ]
  Format:            NIfTI-1.1 (GZip compressed)
  Data type:         32 bit float (little endian)
  Intensity scaling: offset = 0, multiplier = 1
  Transform:                    1           0           0         -90
                               -0           1           0        -126
                               -0           0           1         -72
  comments:          FSL5.0

mrconvert ../tmp/data.nii.gz  data.mif
mrconvert: [100%] uncompressing image "../tmp/data.nii.gz"
mrconvert: [ 50%] copying from "../tmp/data.nii.gz" to "data.mif"..


mrconvert ../tmp/data.nii.gz -stride 0,0,0,1 0001.mif
mrconvert: [100%] uncompressing image "../tmp/data.nii.gz"
mrconvert: [  0%] copying from "../tmp/data.nii.gz" to "0001.mif"... # VERY slow and system unresponsive


mrconvert data.mif  -stride 0,0,0,1 0001.mif 
mrconvert: [  0%] copying from "data.mif" to "0001.mif"... # VERY slow and system unresponsive

My workaround was to convert the files on another machine where stride conversion runs smoothly.
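For scale, the uncompressed size of the image in the mrinfo output above follows directly from its dimensions and data type. This is a small arithmetic sketch; the function name is hypothetical.

```shell
# Uncompressed size in bytes of an image with the given dimensions
# and number of bytes per voxel.
image_bytes() {
    nx=$1; ny=$2; nz=$3; nvol=$4; bytes_per_voxel=$5
    echo $((nx * ny * nz * nvol * bytes_per_voxel))
}

# Dimensions 145 x 174 x 145 x 288 at 32-bit float (4 bytes per voxel)
image_bytes 145 174 145 288 4    # prints 4214419200
```

That is about 3.9 GiB, so two copies come to roughly 7.9 GiB (8.4 GB), which is below the 10 GB reported free.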

jdtournier · March 22, 2017, 1:05pm

So this is specific to your system…? Even though you have enough RAM? Can you post the output of mount so we can have a look at the filesystem used and its settings, both on your system and the one where this runs smoothly? I have a feeling this may have something to do with it.

The other potential culprit is the kernel’s handling of dirty pages - might be worth looking at that too…

maxpietsch · March 22, 2017, 1:13pm

We had a look at it a while ago and I remember you saying something about dirty pages and cache misses but my memory is a bit foggy… It takes some time for the system to become fully responsive again after aborting mrtransform on my machine.

my machine:

mount
/dev/mapper/systemvg-rootlv on / type ext4 (rw,noatime,nodiratime,discard,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/cgroup type tmpfs (rw)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
none on /sys/fs/pstore type pstore (rw)
/dev/mapper/home-crypt on /home type ext4 (rw,noatime,nodiratime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)
gvfsd-fuse on /run/user/9074/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,user=mp14)
perinatal-nas01:/perinatal-projectdata on /projects/perinatal type nfs4 (rw,proto=tcp,port=2049,sloppy,addr=159.92.151.44,clientaddr=159.92.151.97)
/dev/mapper/luks-5c673ed4-ba64-40a6-94e3-74240c294f75 on /media/mp14/ext type ext4 (rw,nosuid,nodev,uhelper=udisks2)

cat /proc/vmstat | egrep "dirty|writeback" ## during mrconvert
nr_dirty 184362
nr_writeback 327420
nr_writeback_temp 0
nr_dirty_threshold 653073
nr_dirty_background_threshold 326536

cat /proc/vmstat | egrep "dirty|writeback" ### a minute after aborting
nr_dirty 52
nr_writeback 0
nr_writeback_temp 0
nr_dirty_threshold 681880
nr_dirty_background_threshold 340940

The machine where it works is beastie01.
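The counters in /proc/vmstat are page counts, so they are easier to compare with image sizes once converted to MiB. The following is a minimal sketch, assuming 4096-byte pages (the usual value on x86-64, but worth confirming with getconf PAGESIZE); the function name is hypothetical.

```shell
# Convert a page count from /proc/vmstat to whole MiB. The page size in
# bytes defaults to the value reported by getconf.
pages_to_mib() {
    pages=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $((pages * page_size / 1048576))
}

# Counters sampled during mrconvert above, assuming 4096-byte pages
pages_to_mib 184362 4096    # nr_dirty           -> 720
pages_to_mib 327420 4096    # nr_writeback       -> 1278
pages_to_mib 653073 4096    # nr_dirty_threshold -> 2551
```

Under that assumption, about 720 MiB was dirty and about 1278 MiB was under writeback during the conversion, against a dirty threshold of about 2551 MiB.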

MattRowe · March 22, 2017, 1:26pm

Hi, thanks for your quick response Donald and Max

For me, I am running Ubuntu 16.04, quad core i5 Intel® Core™ i5-6400 CPU @ 2.70GHz × 4

15.6 GB RAM, 2 TB HDD

Here is the config file in release and then here is a picture of the system monitor just after launch of “dwi2response tournier dwi.mif response.txt”:

There is one other process running in this particular case, a download, but the first steps of MRtrix dwi2response kick in at about 40 seconds on the timeline.

Neither the RAM nor the CPUs ever seem to get saturated. The memory usage seems to go in steps which decay over a few minutes; max memory usage during dwi2response in the first few minutes seems to get to about 4.7 GB.

I will do some longer term monitoring when I get a chance

Many thanks

Matt

MattRowe · March 22, 2017, 1:39pm

And also, to add a couple more details: I also find, as Max does, that it freezes my system up somewhat, and other windows become unresponsive. An additional detail: we use full disk encryption here for data security reasons. Could this cause a problem?

For me also, the output of mount:

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=8171592k,nr_inodes=2042898,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=1638792k,mode=755)
/dev/mapper/ubuntu--vg-root on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=13492)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
/dev/sda2 on /boot type ext2 (rw,relatime,block_validity,barrier,user_xattr,acl,stripe=4)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=1638792k,mode=700,uid=1000,gid=1000)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
/dev/mapper/ubuntu--vg-root on /var/lib/lxd/shmounts type ext4 (rw,relatime,errors=remount-ro,data=ordered)

maxpietsch · March 22, 2017, 1:42pm

It is IO related as my CPU is nearly idle but my system load is >20 (I have 4 cores + 4HT).

You could try increasing your dirty page thresholds by adding

vm.dirty_background_ratio = 5
vm.dirty_ratio = 80

to /etc/sysctl.conf followed by sudo sysctl -p

This seems to make it a bit faster on my machine but not much. Also, it might help to use single threaded conversion: mrconvert data.mif -stride 0,0,0,1 0001.mif -nthreads 0. With 0 threads I manage to convert 6% of the HCP file in 2 minutes.

@jdtournier Aborting and rerunning mrconvert with the same output using -force makes it quickly catch up to the point where I aborted.

EDIT:

Performance seems to decrease over time. -nthr 0 after 64 minutes: 56% converted.
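The dirty-page ratios suggested earlier in this post can also be inspected, and tried out on the running kernel, before committing them to /etc/sysctl.conf. The following is a minimal sketch, assuming a Linux system with /proc/sys/vm and the standard sysctl utility; the function names and the optional directory argument are hypothetical.

```shell
# Print the current dirty-page ratios; readable without admin rights.
# The optional directory argument exists only to make the function testable.
show_dirty_settings() {
    dir=${1:-/proc/sys/vm}
    for name in dirty_background_ratio dirty_ratio; do
        printf 'vm.%s = %s\n' "$name" "$(cat "$dir/$name")"
    done
}

# Apply new ratios to the running kernel only (needs admin rights).
# Values set this way do not survive a reboot unless they are also
# written to /etc/sysctl.conf.
set_dirty_settings() {
    sudo sysctl -w vm.dirty_background_ratio="$1" vm.dirty_ratio="$2"
}

# Example with the values suggested above:
# show_dirty_settings
# set_dirty_settings 5 80
```

Trying values with sysctl -w first makes it easy to revert: a reboot, or a second call with the previous values, restores the old behaviour.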

maxpietsch · March 22, 2017, 3:28pm

Just tried:

vm.dirty_background_ratio = 60
vm.dirty_ratio = 80

aaand:

time mrconvert ../tmp/data.nii.gz -stride 0,0,0,1 0001_0.mif
mrconvert: [100%] uncompressing image "../tmp/data.nii.gz"
mrconvert: [100%] copying from "../tmp/data.nii.gz" to "0001_0.mif"

real	0m39.572s

cat /proc/vmstat | egrep "dirty|writeback"
nr_dirty 201594
nr_writeback 612374
nr_writeback_temp 0
nr_dirty_threshold 3000027
nr_dirty_background_threshold 225002

Quick

rsmith · March 23, 2017, 12:17am

Just a friendly reminder that once we’ve settled on the specifics of a solution, this should be added to the documentation troubleshooting page.

jdtournier · March 23, 2017, 8:44am

Good point. But before we close this off, I was wondering whether it might be worth adding a config file option to disable the use of memory-mapping on output - it’s just a trivial two-line change to force the use of delayed write-back regardless of the filesystem type (this was actually suggested by @rsmith a while back). That might be a more appropriate setting for users who can’t modify the kernel’s dirty page handling (requires admin rights), and where the amount of RAM isn’t a concern. Would that be useful?

This is in fact how things used to work (at least for a while), until users reported issues with RAM utilisation on memory-constrained systems. The problem with forcing the use of delayed write-back is that it will force an explicit RAM allocation, so can fail outright in cases that might otherwise have been OK to run. With memory-mapping, we rely on the kernel’s virtual memory management to handle the data, which allows it to run using less hardware RAM - even though, as you noticed, it will definitely perform a lot better when there is enough RAM to run without swapping data around…

Lots of discussion around this issue here and here, if anyone is interested…

MattRowe · March 29, 2017, 9:43am

Hi Donald/Max

Thank you both for your help. With the configuration adjustments Max suggested, MRtrix3 is now working very nicely, and the processing is pretty quick.

Many thanks

Matt

maxpietsch · March 29, 2017, 10:45am

Thanks Matt for getting back to us. Pull request for changed troubleshooting page is here.

MattRowe · April 7, 2017, 6:19am

Just a quick note on this.

I have found recently that the vm.dirty_background_ratio and vm.dirty_ratio parameters were at some point set back to default and caused the same original performance issues.

Not sure what prompted this; I haven’t done any reinstallation or major system reconfiguration. I fairly rarely reboot my machine (mainly I sleep it), so it could have happened on reboot, in which case maybe some configuration is needed to ensure these parameters are always as expected. I will look into it.

Might be worth mentioning on the troubleshooting page.

Matt

maxpietsch · April 7, 2017, 9:57am

Might be this bug in Ubuntu, or if you use a laptop, it might be something like /usr/lib/pm-utils/power.d/laptop-mode overwriting the settings at boot.
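One way to notice when the settings have silently reverted is to compare the running kernel values with what the configuration file asks for. The following is a minimal sketch, assuming the values live in /etc/sysctl.conf as described earlier in the thread; the function names are hypothetical, and it does not identify which component reset the values.

```shell
# Print the value assigned to a key in a sysctl-style config file.
# The last assignment wins; comment lines (# or ;) are skipped.
configured_value() {
    key=$1
    file=${2:-/etc/sysctl.conf}
    awk -F= -v key="$key" '
        /^[ \t]*[#;]/ { next }
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/[ \t]/, "", v); found = v }
        END { if (found != "") print found }
    ' "$file"
}

# Report whether the running kernel value matches the configured one.
check_setting() {
    key=$1
    file=${2:-/etc/sysctl.conf}
    want=$(configured_value "$key" "$file")
    have=$(sysctl -n "$key")
    if [ "$want" = "$have" ]; then
        echo "$key: ok ($have)"
    else
        echo "$key: running value $have, configured ${want:-<unset>}"
    fi
}

# Example:
# check_setting vm.dirty_background_ratio
# check_setting vm.dirty_ratio
```

If the two differ after a reboot or a resume from sleep, running sudo sysctl -p reloads the file, as mentioned earlier in the thread.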