Tck2connectome: different results between mac and linux

mbergamino · January 7, 2022, 5:07pm

Dear MRtrix community,
I am writing here because I found a problem in the output from tck2connectome.
For my work, I have always used Mac computers. Last week I installed Ubuntu 20.04 and ran my pre-processing script with MRtrix3.
The version of MRtrix3 is the same on both computers:

e.g., from Ubuntu:

== tck2connectome 3.0.3 ==
64 bit release version, built Jul 19 2021, using Eigen 3.3.7
Author(s): Robert E. Smith (robert.smith@florey.edu.au)
Copyright (c) 2008-2021 the MRtrix3 contributors.

This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.

Covered Software is provided under this License on an "as is"
basis, without warranty of any kind, either expressed, implied, or
statutory, including, without limitation, warranties that the
Covered Software is free of defects, merchantable, fit for a
particular purpose or non-infringing.
See the Mozilla Public License v. 2.0 for more details.

For more details, see http://www.mrtrix.org/.

When I ran tck2connectome with the same files (same tracks, same atlas), the outputs were completely different.
I know that there are differences between versions (see the thread "Tck2connectome different outputs"), but I cannot understand why there is such a big difference with the same version on a different OS.

Thanks,
Maurizio

orencivier · January 8, 2022, 4:26am

Dear @mbergamino

Although things should work similarly on different operating systems, there are too many factors involved. Usually there is something different in how you set up the environment, which can be tracked down and hopefully corrected, but in some cases the underlying libraries simply work a bit differently on different operating systems (e.g., different numerical precision), which is not really under our control.

My best recommendation is to put your whole pipeline inside a container, and then reproducibility is guaranteed. If you’re not familiar with containers, or if you want a simpler solution, I’d recommend NeuroDesk, the neuroimaging analysis virtual desktop that I co-develop. It is basically a completely reproducible environment (which can be run on any operating system), and it includes MRtrix: https://neurodesk.github.io/ (to see what MRtrix versions are available in NeuroDesk, check Applications | NeuroDesk).

If you use NeuroDesk, you can keep your existing scripts (even if they call other packages in addition to MRtrix), but under the hood, everything will run within containers that guarantee 100% reproducible results across operating systems. I contribute to both MRtrix and NeuroDesk, so you’re welcome to post here any questions you might have on this solution.

All the best,
Oren

jdtournier · January 8, 2022, 3:29pm

As @orencivier mentioned, it’s not always trivial to guarantee consistent behaviour across platforms. Version changes in the libraries we use can also introduce some minor differences, but that’s generally not a problem. We rely on very few external dependencies, and I don’t expect these to cause any trouble.

That said, we do run checks to ensure results are consistent to numerical precision. I’m surprised there would be differences for that command in particular, given that all tests are required to pass on all platforms before we merge - and that includes 3 tests for the tck2connectome command, checking that results match exactly with those expected for all 3 platforms we support (Linux, macOS, Windows).

With that in mind, I’m surprised to hear that you would end up with different results – let alone very large discrepancies. Can you be more specific as to what the differences are, how large the errors are relative to the expected values, and what the commands that you used were exactly?

Can you also state what hardware you’re running this on? We have definitely had issues with the newer ARM-based mac M1 range, though they’ve all been related to OpenGL rather than numerical accuracy…

mbergamino · January 8, 2022, 6:03pm

Hi,
Thank you so much for your answers.
I can understand that some small differences can exist between different OS versions; however, here I have very big differences.

The Ubuntu version is:

Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.3 LTS
Release:	20.04

The Mac version (Mojave):

ProductName:	Mac OS X
ProductVersion:	10.14.6
BuildVersion:	18G103

I tried to upload the matrices here, but I had some difficulties, so I have inserted only part of each matrix (11 rows × 11 columns):

Command on Ubuntu:
# tck2connectome tracks.tck NODES.mif connectome_original_symmetric.csv -out_assignments assignments_symmetric.txt -symmetric -zero_diagonal -force (version=3.0.3)


0	218	1621	208	2	0	3652	128	4	0	1763
218	0	50	2776	0	1	29	4270	0	5	4
1621	50	0	393	573	15	7903	357	69	24	379
208	2776	393	0	24	298	368	10086	10	365	41
2	0	573	24	0	17	377	7	920	14	3
0	1	15	298	17	0	10	80	4	1013	0
3652	29	7903	368	377	10	0	388	503	19	1000
128	4270	357	10086	7	80	388	0	7	889	42
4	0	69	10	920	4	503	7	0	8	1
0	5	24	365	14	1013	19	889	8	0	0
1763	4	379	41	3	0	1000	42	1	0	0

Command on Mac:
tck2connectome tracks.tck NODES.mif connectome_original_symmetric.csv -out_assignments assignments_symmetric.txt -symmetric -zero_diagonal -force (version=3.0.3)


0	113	834	123	2	0	1941	71	2	0	939
113	0	20	1475	0	0	14	2273	0	4	3
834	20	0	194	293	8	4288	197	38	18	191
123	1475	194	0	14	162	204	5394	6	192	24
2	0	293	14	0	8	194	2	512	7	0
0	0	8	162	8	0	3	44	2	540	0
1941	14	4288	204	194	3	0	200	282	15	520
71	2273	197	5394	2	44	200	0	3	461	22
2	0	38	6	512	2	282	3	0	4	1
0	4	18	192	7	540	15	461	4	0	0
939	3	191	24	0	0	520	22	1	0	0

As the atlas file, I tried both .nii.gz and .mif (here NODES.mif), but the results are the same (same differences between the matrices).

Maurizio

jdtournier · January 13, 2022, 11:43am

Sorry about the slow response – trying to get all my teaching in order for next week…

Looking at your results, it’s clear this is not a numerical precision issue – we did see some differences introduced in the past due to the use of different rounding modes on the CPU floating-point unit, but that’s clearly not it.

What’s interesting here is that most values in the macOS results seem to be consistently just over half those in the Linux results. What I’m thinking is that for some reason, the tracks.tck file may have got corrupted / truncated on the mac. Is the output of tckinfo -count tracks.tck identical on both platforms?
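The "just over half" observation can be checked directly against the two excerpts posted above. What follows is a minimal Python sketch, not part of MRtrix3, with illustrative variable names. It computes the Mac/Linux ratio for every entry that is non-zero on Linux, plus the ratio of the matrix totals:

```python
from statistics import median

# Matrix excerpts copied from the post above (Ubuntu first, then Mac).
LINUX = """
0 218 1621 208 2 0 3652 128 4 0 1763
218 0 50 2776 0 1 29 4270 0 5 4
1621 50 0 393 573 15 7903 357 69 24 379
208 2776 393 0 24 298 368 10086 10 365 41
2 0 573 24 0 17 377 7 920 14 3
0 1 15 298 17 0 10 80 4 1013 0
3652 29 7903 368 377 10 0 388 503 19 1000
128 4270 357 10086 7 80 388 0 7 889 42
4 0 69 10 920 4 503 7 0 8 1
0 5 24 365 14 1013 19 889 8 0 0
1763 4 379 41 3 0 1000 42 1 0 0
"""

MAC = """
0 113 834 123 2 0 1941 71 2 0 939
113 0 20 1475 0 0 14 2273 0 4 3
834 20 0 194 293 8 4288 197 38 18 191
123 1475 194 0 14 162 204 5394 6 192 24
2 0 293 14 0 8 194 2 512 7 0
0 0 8 162 8 0 3 44 2 540 0
1941 14 4288 204 194 3 0 200 282 15 520
71 2273 197 5394 2 44 200 0 3 461 22
2 0 38 6 512 2 282 3 0 4 1
0 4 18 192 7 540 15 461 4 0 0
939 3 191 24 0 0 520 22 1 0 0
"""


def parse(text):
    """Turn whitespace-separated rows into a list of lists of ints."""
    return [[int(v) for v in row.split()] for row in text.strip().splitlines()]


linux, mac = parse(LINUX), parse(MAC)

# Mac/Linux ratio wherever the Linux count is non-zero.
ratios = [
    mac_val / lin_val
    for lin_row, mac_row in zip(linux, mac)
    for lin_val, mac_val in zip(lin_row, mac_row)
    if lin_val > 0
]
median_ratio = median(ratios)

# Ratio of the summed counts, which is dominated by the large entries.
total_ratio = sum(map(sum, mac)) / sum(map(sum, linux))

print(f"median ratio: {median_ratio:.3f}, ratio of totals: {total_ratio:.3f}")
```

For these excerpts both figures come out a little above 0.5. That is consistent with only part of the streamlines contributing on the Mac, but it does not by itself say why.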

mbergamino · January 17, 2022, 10:35pm

Hi,
Sorry for my late reply, but I wanted to try several options before continuing the discussion.
I tried the same command on a Mac with High Sierra and (I do not understand why…) I got the same matrix as on Ubuntu. The matrix with the different values was generated on a Mac 15.1 laptop with Catalina. Really, I do not have any explanation for that.
However, I ran:
tckinfo -count tracks.tck
and I got:

***********************************
  Tracks file: "tracks.tck"
    act:                  5tt_to_FA.nii.gz
    backtrack:            1
    count:                2000000
    downsample_factor:    3
    fod_power:            0.25
    init_threshold:       0.0599999987
    lmax:                 8
    max_angle:            45
    max_dist:             150
    max_num_seeds:        2000000000
    max_num_tracks:       2000000
    max_seed_attempts:    1000
    max_trials:           1000
    method:               iFOD2
    min_dist:             2.5
    mrtrix_version:       unknown
    output_step_size:     0.625
    rk4:                  0
    samples_per_step:     4
    sh_precomputed:       1
    source:               wmfod_norm.mif
    step_size:            0.625
    stop_on_all_include:  0
    threshold:            0.06
    timestamp:            1641421735.7773880959
    total_count:          3091897
    unidirectional:       0
    ROI:                  mask data_bias_final_nodif_brain_mask.nii.gz
    ROI:                  mask data_bias_final_nodif_brain_mask.nii.gz
    ROI:                  seed gmwmSeed_coreg.mif
tckinfo: [done] counting tracks in file
actual count in file: 2000000
***********************************

on both systems, which is logical because I used the same tractography.
Honestly, it would be fantastic if someone could help me understand the differences between Catalina and High Sierra/Ubuntu.

Thanks
Maurizio

rsmith · February 16, 2022, 2:17am

There’s no way a difference of that magnitude, arising only from execution on different OSs, could have remained undetected to this point. A truncation of the track file is the most likely culprit, but in general you want to do as much testing as you can to make sure that you’re performing a fair comparison. Personally, I would:

1. Transfer files from both machines into a common location and execute the diff command to make sure that the files that you are using on the two systems are indeed precisely identical.
2. Run tck2connectome with the -keep_unassigned option and without the -symmetric and -zero_diagonal options. This will yield the unmodified matrix, on which you can e.g. calculate the sum across all rows / columns to see how many streamlines were read (since in this use case all read streamlines are assigned to the matrix, even if they’re not successfully assigned to a node). I’m expecting that on the Mac the matrix will contain fewer streamlines than are expected to be present in the input track file.
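Both checks can be sketched in a few lines of Python. This is an illustration only and not part of MRtrix3: the file names in the usage comment are hypothetical, the checksum comparison stands in for the diff command, and the parser assumes the matrix is plain text with comma- or whitespace-separated values and optional comment lines starting with #.

```python
import hashlib
import re
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matrix_total(path):
    """Sum every entry of a text matrix (comma- or whitespace-delimited).

    Blank lines and lines starting with '#' are skipped (assumed comments).
    """
    total = 0.0
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        total += sum(float(tok) for tok in re.split(r"[,\s]+", line) if tok)
    return total


# Example usage (hypothetical file names):
#   sha256_of("linux/tracks.tck") == sha256_of("mac/tracks.tck")
#   matrix_total("connectome_unmodified.csv")  # compare with the tckinfo count
```

If the digests differ, the inputs are not identical and the comparison is not a fair one. If they match, the total from each machine's unmodified matrix can be compared with the 2000000 streamlines reported by tckinfo -count.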

We can discuss further diagnosis strategies beyond that point once both of these have been done.

Cheers
Rob