ERRORs from ./run_tests on MRtrix 3.1.5 on Ubuntu 16.04


#1

Dear Experts,

I have just pulled and built the latest MRtrix (3.1.5). Having just upgraded to Ubuntu 16.04, I was forced to build with Eigen 3.2.8, as in this post. Functionality like mrview seems to work fine.

Here is the **release/config** file
#!/usr/bin/python
#
# autogenerated by MRtrix configure script
#
# configure output:
# 
# MRtrix build type requested: release
# 
# Detecting OS: linux
# Checking for C++11 compliant compiler [g++]: 5.3.1 - tested ok
# Detecting pointer size: 64 bit
# Detecting byte order: little-endian
# Checking for variable-length array support: yes
# Checking for non-POD variable-length array support: yes
# Checking for zlib compression library: 1.2.8
# Checking for Eigen 3 library: 3.2.8
# Checking shared library generation: yes
# Checking for Qt moc: moc (version 4.8.7)
# Checking for Qt qmake: qmake (version 4.8.7)
# Checking for Qt rcc: rcc (version 4.8.7)
# Checking for Qt: 4.8.7


PATH = r'/home/finn/Software/MIRTK/mirtk-1.1/bin:/usr/lib/fsl/5.0:/home/finn/Software/freesurfer/bin:/home/finn/Software/freesurfer/fsfast/bin:/home/finn/Software/freesurfer/tktools:/home/finn/Software/freesurfer/mni/bin:/usr/lib/ants:/home/finn/Software/mrtrix3/scripts:/home/finn/Software/mrtrix3/release/bin:/home/finn/scripts:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/finn/Software/scripts:/home/finn/Software/niftyreg/bin'
obj_suffix = '.o'
exe_suffix = ''
lib_prefix = 'lib'
lib_suffix = '.so'
cpp = [ 'g++', '-c', 'CFLAGS', 'SRC', '-o', 'OBJECT' ]
cpp_flags = [ '-std=c++11', '-pthread', '-fPIC', '-march=native', '-DMRTRIX_WORD64', '-isystem', '/home/finn/Software/mrtrix3/eigen', '-Wall', '-O2', '-DNDEBUG' ]
ld = [ 'g++', 'OBJECTS', 'LDFLAGS', '-o', 'EXECUTABLE' ]
ld_flags = [ '-pthread', '-lz' ]
runpath = '-Wl,-rpath,$ORIGIN/'
ld_enabled = True
ld_lib = [ 'g++', 'OBJECTS', 'LDLIB_FLAGS', '-o', 'LIB' ]
ld_lib_flags = [ '-pthread', '-shared', '-pthread', '-lz' ]
eigen_cflags = [ '-isystem', '/home/finn/Software/mrtrix3/eigen' ]
moc = 'moc'
rcc = 'rcc'
qt_cflags = [ '-m64', '-pipe', '-O2', '-Wall', '-W', '-D_REENTRANT', '-DQT_NO_DEBUG', '-DQT_SVG_LIB', '-DQT_OPENGL_LIB', '-DQT_GUI_LIB', '-DQT_CORE_LIB', '-DQT_SHARED', '-isystem', '/usr/share/qt4/mkspecs/linux-g++-64', '-isystem', '/usr/include/qt4/QtCore', '-isystem', '/usr/include/qt4/QtGui', '-isystem', '/usr/include/qt4/QtOpenGL', '-isystem', '/usr/include/qt4/QtSvg', '-isystem', '/usr/include/qt4', '-isystem', '/usr/X11R6/include' ]
qt_ldflags = [ '-m64', '-Wl,-O1', '-L/usr/lib/x86_64-linux-gnu', '-L/usr/X11R6/lib64', '-lQtSvg', '-lQtOpenGL', '-lQtGui', '-lQtCore', '-lGL', '-lpthread' ]
nogui = False

My problems came when running the tests, which give some “scattered” failures:

./run_tests 
logging to "testing.log"
fetching test data... OK
building testing commands... OK
running "5tt2gmwmi"... 1 of 1 passed
running "5tt2vis"... 1 of 1 passed
running "5ttedit"... 2 of 2 passed
running "amp2sh"... 1 of 1 passed
running "dirgen"... 4 of 4 passed
running "dwi2adc"... 1 of 1 passed
running "dwi2fod"... 3 of 3 passed
running "dwi2mask"... 1 of 1 passed
running "dwi2noise"... 1 of 1 passed
running "dwi2tensor"... 8 of 8 passed
running "dwidenoise"... 0 of 6 passed    <-------- ERROR
running "dwiextract"... 2 of 2 passed
running "fixel2sh"... 1 of 1 passed
running "fixel2tsf"... 1 of 1 passed
running "fixelcalc"... 4 of 4 passed
running "fixelthreshold"... 2 of 2 passed
running "fod2fixel"... 3 of 3 passed
running "label2colour"... 2 of 2 passed
running "label2mesh"... 2 of 2 passed
running "labelconvert"... 2 of 2 passed
running "maskfilter"... 9 of 9 passed
running "mesh2pve"... 1 of 1 passed
running "meshconvert"... 10 of 10 passed
running "mrcalc"... 4 of 4 passed
running "mrcat"... 6 of 6 passed
running "mrconvert"... 11 of 11 passed
running "mrcrop"... 2 of 2 passed
running "mrfilter"... 8 of 8 passed
running "mrmath"... 3 of 3 passed
running "mrpad"... 4 of 4 passed
running "mrresize"... 8 of 8 passed
running "mrstats"... 5 of 5 passed
running "mrthreshold"... 10 of 10 passed
running "mrtransform"... 6 of 6 passed
running "peaks2amp"... 1 of 1 passed
running "sh2amp"... 3 of 3 passed
running "sh2peaks"... 1 of 1 passed
running "sh2power"... 1 of 1 passed
running "sh2response"... 1 of 1 passed
running "shbasis"... 4 of 4 passed
running "shconv"... 2 of 2 passed
running "tck2connectome"... 3 of 3 passed
running "tckconvert"... 3 of 4 passed    <-------- ERROR
running "tckgen"... 14 of 14 passed
running "tckmap"... 4 of 4 passed
running "tcknormalise"... 1 of 1 passed
running "tckresample"... 4 of 5 passed    <-------- ERROR
running "tcksample"... 7 of 8 passed    <-------- ERROR
running "tcksift"... 1 of 1 passed
running "tcksift2"... 1 of 1 passed
running "tensor2metric"... 8 of 8 passed
running "transformcalc"... 5 of 5 passed
running "transformconvert"... 4 of 4 passed
running "tsfdivide"... 1 of 1 passed
running "tsfmult"... 1 of 1 passed
running "tsfsmooth"... 0 of 1 passed    <-------- ERROR
running "tsfthreshold"... 2 of 2 passed
running "voxel2fixel"... 1 of 1 passed
running "warpcorrect"... 1 of 1 passed
running "warpinit"... 1 of 1 passed

When inspecting testing.log, all the ERRORs in dwidenoise, tcksample and tsfsmooth seem related to borderline precision issues.

One example for dwidenoise.

# command: dwidenoise dwi.mif - | testing_diff_data - dwidenoise/dwi.mif 1e-6 [ ERROR ]
testing_diff_data: [ERROR] images "/tmp/mrtrix-tmp-YJQj8d.mif" and "dwidenoise/dwi.mif" do not match within specified precision of 9.9999999999999995e-07 (144.999 vs 144.999)
testing_diff_data: [ERROR] images "/tmp/mrtrix-tmp-YJQj8d.mif" and "dwidenoise/dwi.mif" do not match within specified precision of 9.9999999999999995e-07 (167.968 vs 167.968)
testing_diff_data: [ERROR] images "/tmp/mrtrix-tmp-YJQj8d.mif" and "dwidenoise/dwi.mif" do not match within specified precision of 9.9999999999999995e-07 (153.085 vs 153.085)
testing_diff_data: [ERROR] images "/tmp/mrtrix-tmp-YJQj8d.mif" and "dwidenoise/dwi.mif" do not match within specified precision of 9.9999999999999995e-07 (144.977 vs 144.978)

and here are the failing tests for the other two:

# command: tcksample tracks.tck tcksample/fa.mif tmp.csv -nthreads 0 -stat_tck median -precise -use_tdi_fraction -force && testing_diff_matrix tmp.csv tcksample/tdidiv.csv 1e-5 [ ERROR ]
testing_diff_matrix: [ERROR] matrices "tmp.csv" and "tdidiv.csv do not match within specified precision of 1.0000000000000001e-05 (0.0129299127 vs 0.0129419165)
## ERROR: 1 tests failed for "tcksample"
# command: tsfsmooth afd.tsf -stdev 2 tmp.tsf -force; testing_diff_tsf tmp.tsf tsfsmooth/out.tsf 0 [ ERROR ]
testing_diff_tsf: [ERROR] track scalar files "tmp.tsf" and "tsfsmooth/out.tsf" do not match within specified precision of 0 (0.282232 vs 0.282232)
## ERROR: 1 tests failed for "tsfsmooth"

but the ERRORs from the tckconvert and tckresample tests are harder for me to understand:

# command: tckconvert tracks.tck -scanner2voxel dwi.mif tmp.vtk -force && diff tmp.vtk tckconvert/out1.vtk [ ERROR ]
## ERROR: 1 tests failed for "tckconvert"

# command: tckresample tracks.tck tmp.tck -step_size 0.9 -force && testing_diff_tck tmp.tck tckresample/stepsize.tck 1e-5 [ ERROR ]
testing_diff_tck: [ERROR] 226 mismatched streamlines - test FAILED
## ERROR: 1 tests failed for "tckresample"

I guess everything is related. What should/could I do?

Cheers,
Finn


#2

Hello Finn,

Thanks for reporting that. There could be multiple reasons for the loss of precision. Could you please report the output of cat /proc/cpuinfo?

Cheers,
Max


#3

Here we go

cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 61
model name	: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
stepping	: 4
microcode	: 0x22
cpu MHz		: 2404.593
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt dtherm ida arat pln pts
bugs		:
bogomips	: 4789.04
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 61
model name	: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
stepping	: 4
microcode	: 0x22
cpu MHz		: 2400.281
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt dtherm ida arat pln pts
bugs		:
bogomips	: 4789.04
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 61
model name	: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
stepping	: 4
microcode	: 0x22
cpu MHz		: 2400.093
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt dtherm ida arat pln pts
bugs		:
bogomips	: 4789.04
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 61
model name	: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
stepping	: 4
microcode	: 0x22
cpu MHz		: 2536.125
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt dtherm ida arat pln pts
bugs		:
bogomips	: 4789.04
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

#4

I’ve had test failures like these at various times myself, but haven’t put the time into doing thorough testing across multiple machines to figure out what is acceptable and what is not. I can give a few indicators though:

  • A lot of testing_diff_* calls use absolute thresholds of 1e-6 (I think I even saw one with 1e-7). This isn’t ideal: firstly because the accuracy of single-precision floating-point operations is only approximately 1e-6 (and not guaranteed), and secondly because that accuracy is relative, not absolute. So if single precision is used anywhere (either in MRtrix3 or an invoked function), or there is any change in library implementation between systems, that can be enough to throw an error here.

  • The tsfsmooth test is actually applying a threshold of 0. Given that the command involves a fair bit of floating-point addition & multiplication, it’s unsurprising that a nonzero difference crops up somewhere.

  • The tckconvert test uses a raw diff call. So if it’s the ASCII VTK format, this could be throwing an error simply because the vertex positions are printed with a different precision: implementations can differ in how they interpret the requested number of digits when converting number types to strings. I’m pretty sure I’ve seen this at some point.

  • Initially I thought the tckresample failure might have been a threading problem, but I haven’t actually multi-threaded that command yet (primarily so it preserves the streamline order). So I might have to dig further into what’s going on there.

In general though, these tests were primarily set up to validate the porting of commands to the new syntax that came with tag 0.3.13, and are now used to test for regressions using TravisCI; they haven’t been fine-tuned to accept the variability between platforms whilst still flagging minor regressions, and there are still a lot of commands that don’t have tests. Also, for a lot of the failures, you can see that the numerical differences are close to, or even less than, the precision to which those values are reported in the text. So from your point of view, I wouldn’t be concerned by these; but thanks for reporting them, we’ll chase up the slightly odd ones and look into tweaking those thresholds.

Cheers
Rob


#5

I thought it would be a straightforward thing with respect to floating-point operations, though it was slightly strange that all the tests for the new dwidenoise failed. So thanks for explaining. I will not be using tckresample for the moment, and will try to remember this one if I use it in the future.

So on the whole, not concerned given your feedback.

Thanks,
Finn


#6

Yes, the odd failure is nothing to worry about when the difference is that small. Still, it’s very odd that you would see these failures when the tests run fine on all the systems that I use. Your processor isn’t much different from mine, and some of these computations really should be fully deterministic (dwidenoise for example). But it might be that the compiler on your system does something subtly different in the order of execution, or any number of other things pointed out in the article linked to by @maxpietsch.

One possibility is that your floating-point unit is somehow set up differently, defaulting to a different rounding mode for example. That would be a bit more worrying. @bjeurissen had a similar problem a long time ago, which we eventually convinced ourselves must have been due to the rounding mode, although we never really got to the bottom of it.

There are facilities in libc to check and set the rounding mode, but I’m not sure that they affect all the different possible instructions that the CPU might use: a lot of the computations performed by Eigen are vectorised using instructions made available via CPU extensions when those are available, particularly SSE/SSE2/SSE4 and others. I’ve no idea whether the rounding mode used for those instructions can be manipulated with the libc functions, if at all…

If you want to check this aspect of things, you can try adding this line to cmd/dwidenoise.cpp at line 21:

#include <fenv.h>

and at line 232 (before the ThreadedLoop() call):

VAR (fe_getround());

If you run the tests again and check the logs, you should see a line like this:

dwidenoise [cmd/dwidenoise.cpp: 233]: fegetround() = 0

If you don’t get a zero (corresponding to FE_TONEAREST, i.e. round to nearest), then that would definitely be something to worry about…


#7

Thanks Donald. I am by no means a computer expert. At first it just didn’t compile, but I understood that it should be

VAR (fegetround());

So I ran the tests, and luckily I got

dwidenoise [cmd/dwidenoise.cpp: 233]: fegetround() = 0


#8

Good to hear. I’m also getting similar precision issues depending on the exact version of Eigen used… It’s probably using different CPU instructions in different contexts, and that can affect the precision of the results. So we’ll need to relax the tests a little bit…