Problem extracting a subset of fibers with tckedit

wamigy · February 28, 2017, 10:15am

Hello all,

I was playing around with tckedit to reduce the size of my tck files and I noticed a strange behaviour, probably something I don't fully understand.

I generated a 2M-fiber tck file with the usual command for HCP (without ACT):
tckgen WM_FODs.mif tracs_2000000.tck -seed_dynamic WM_FODs.mif -maxlength 250 -number 2000000 -cutoff 0.06

Then I split it into two packets of 1M fibers each:
tckedit tracs_2000000.tck tracs_part1.tck -number 1000000
tckedit tracs_2000000.tck tracs_part2.tck -number 1000000 -skip 1000000

Is it correct to assume that tracs_part1.tck will contain the first 1M fibers and tracs_part2.tck the next 1M fibers? I assumed that this is what these files contained, so I checked it with MATLAB:
t1 = read_mrtrix_tracks('tracs_part1.tck');
t2 = read_mrtrix_tracks('tracs_part2.tck');
tall = read_mrtrix_tracks('tracs_2000000.tck');
n_fib_all = length(tall.data);
n1 = length(t1.data);
n2 = length(t2.data);
diff_all = zeros(n_fib_all, 3);
for ii = 1 : length(tall.data)
    if ii <= n1
        diff_all( ii,: ) = max(abs(t1.data{ii} - tall.data{ii}));
    else
        diff_all( ii,: ) = max(abs(t2.data{ii-n1} - tall.data{ii}));
    end
end

I therefore expected diff_all to be nearly all zeros, allowing for rounding errors. However, this piece of MATLAB code stopped with an error before finishing, complaining that, at ii = 322689, t1.data{ii} did not have the same number of elements as tall.data{ii}.

I then checked how many times this inconsistency happened, and there are 1148 occurrences for the 1st packet.
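
A count like this can be obtained by tallying mismatches instead of stopping at the first one. What follows is only a minimal Python sketch, not the MATLAB code that was actually run: it assumes the streamlines are already loaded as lists of (x, y, z) tuples, and `count_mismatches` and its `tol` parameter are hypothetical names introduced for illustration.

```python
def count_mismatches(full, part, offset=0, tol=1e-4):
    """Count streamlines in `part` that differ from full[offset + i].

    Each streamline is a list of (x, y, z) tuples. A streamline counts as a
    mismatch if its number of points differs from the reference, or if any
    coordinate differs by more than `tol`.
    """
    mismatches = 0
    for i, streamline in enumerate(part):
        reference = full[offset + i]
        if len(streamline) != len(reference):
            mismatches += 1
            continue
        if any(abs(a - b) > tol
               for p, q in zip(streamline, reference)
               for a, b in zip(p, q)):
            mismatches += 1
    return mismatches


# Toy data: four streamlines with 1, 2, 3 and 4 points respectively.
full = [[(float(i), 0.0, 0.0)] * (i + 1) for i in range(4)]
part1 = [full[0], full[1]]
part2_swapped = [full[3], full[2]]  # same streamlines, wrong order

print(count_mismatches(full, part1))                    # 0
print(count_mismatches(full, part2_swapped, offset=2))  # 2
```

Checking the point count first avoids the dimension error that a direct subtraction raises when two streamlines have different lengths.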

Am I doing something wrong?

Thank you!

jdtournier · February 28, 2017, 10:39am

I have a feeling this might be because tckedit is multi-threaded, so it won't guarantee that ordering is preserved. Can you run all this again with the -nthreads 0 option to tckedit to disable multi-threading, and see whether that resolves the issue?

Assuming that was the problem, we may need to do something about it. At the very least, we should document the fact that ordering is only guaranteed with multi-threading disabled. The reason we typically don't worry about ordering is that in most cases tracking is random, so there is no particular order anyway. But this issue also suggests that two separate calls to tckedit like this might not split the file perfectly: streamlines near the split point may end up in either batch, depending on which thread happens to get there first. I'm not sure whether this actually happens, but it's certainly something to verify. This would also be relevant for this recent discussion.
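
To make the concern about the split concrete, here is a toy Python sketch. It rests on an assumption that has not been verified here, namely that the -number and -skip counts are applied to streamlines in the order they arrive from the worker threads. The arrival orders `run1` and `run2` and the helper `take` are made up for illustration and are not output from tckedit.

```python
def take(arrival_order, number, skip=0):
    """Items kept when `skip` and `number` are counted in arrival order."""
    return arrival_order[skip:skip + number]


# Input streamline indices 0..5. Suppose the two calls see slightly
# different arrival orders near the split point (hypothetical values).
run1 = [0, 1, 3, 2, 4, 5]  # first call: streamline 3 overtakes streamline 2
run2 = [0, 1, 2, 3, 4, 5]  # second call: input order happens to be kept

part1 = take(run1, number=3)          # [0, 1, 3]
part2 = take(run2, number=3, skip=3)  # [3, 4, 5]

duplicated = sorted(set(part1) & set(part2))
missing = sorted(set(run2) - set(part1) - set(part2))
print(duplicated)  # [3]
print(missing)     # [2]
```

Under that assumption, streamline 3 ends up in both batches and streamline 2 in neither, even though each batch has the requested size.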

wamigy · February 28, 2017, 10:51am

Thank you for your quick answer.

I can confirm that using -nthreads 0 fixed the problem, great!

jdtournier · February 28, 2017, 10:59am

Well, maybe not so great… I think this might need to be fixed properly to avoid further issues of this nature - particularly if people do rely on this to split their input tractograms…

@rsmith, any thoughts…?

wamigy · March 1, 2017, 10:00am

Hello again,

I just have a quick question: I noticed that tcksample also has the -nthreads option. Do you think the order of the output samples may also be changed?

Thanks!

jdtournier · March 1, 2017, 10:12am

Good point. Yes, it is multi-threaded (code here), so it could also be affected. Note that the presence of the -nthreads option on the help page is not necessarily an indication that the command will use multi-threading. It's a standard option shared across all commands, so it is included by default on all help pages in case any part of the command does make use of it.

I’ll wait for @rsmith to comment on whether this is indeed a problem. I remember a discussion we had some time ago about keeping track of the order of the incoming streamlines so as to preserve it in the output, but I can’t remember whether anything was done about it. Based on a quick glance through the code, I’m assuming not. Maybe now is the time…

rsmith · March 1, 2017, 11:58am

Yes, tckedit will only exactly preserve track order if -nthreads 0 is used. This would be a useful note to add to the command’s documentation.

tcksample is thread-safe since most of its applications require order-coherence with the input tracks, unlike tckedit where only some applications require it. The clue is the line after @jdtournier’s highlight: When a single scalar statistic value is generated per streamline, these are stored in a vector of data with ordering according to the incoming streamline indices (as opposed to processing order), and only written to the output file once all streamlines have been processed.
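
The idea of storing results by incoming index rather than by processing order can be sketched in a few lines of Python. This is an illustration of the general pattern only, not the tcksample source: `mean_value` and `sample_all` are hypothetical names, and each "streamline" here is simply a list of already-sampled values.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def mean_value(streamline):
    """Stand-in for a per-streamline statistic: mean of the sampled values."""
    time.sleep(random.uniform(0, 0.005))  # make completion order unpredictable
    return sum(streamline) / len(streamline)


def sample_all(streamlines, n_threads=4):
    results = [None] * len(streamlines)  # one slot per input index

    def work(index):
        # Each worker writes only to the slot of the streamline it was given,
        # so the finishing order of the threads does not matter.
        results[index] = mean_value(streamlines[index])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() waits for all tasks and re-raises any worker exception.
        list(pool.map(work, range(len(streamlines))))
    return results  # handed back only once everything has been processed


streamlines = [[float(i), float(i) + 2.0] for i in range(20)]
print(sample_all(streamlines) == [float(i) + 1.0 for i in range(20)])  # True
```

The cost of this approach is that every result must be held in memory until the end, which is cheap for one scalar per streamline but less so for whole streamlines.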

tckresample is not multi-threaded for this very reason, but a better solution would probably be to multi-thread it and add the same note to its documentation as the one tckedit needs.

Longer-term, yes, I would like to have an order-preserving track file writer; there’s already an issue listed. However, given that -nthreads 0 is an immediate solution and isn’t prohibitively slow, I’m not sure I can justify spending the time on it over everything else on my to-do list.
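
For readers wondering what an order-preserving writer could look like, one common pattern is a reorder buffer: items that arrive early are held back until every earlier index has been written. The Python sketch below is a generic illustration of that pattern, not a proposal for the MRtrix3 code; the `OrderedWriter` class and its `sink` argument are hypothetical.

```python
import heapq


class OrderedWriter:
    """Emit (index, item) pairs in index order, buffering early arrivals."""

    def __init__(self, sink):
        self.sink = sink     # any callable that consumes one item
        self.next_index = 0  # index of the next item allowed out
        self.pending = []    # min-heap of (index, item) waiting their turn

    def push(self, index, item):
        heapq.heappush(self.pending, (index, item))
        # Flush every buffered item that is now next in line.
        while self.pending and self.pending[0][0] == self.next_index:
            _, ready = heapq.heappop(self.pending)
            self.sink(ready)
            self.next_index += 1


out = []
writer = OrderedWriter(out.append)
# Items arrive out of order, as they might from several worker threads.
for index, item in [(2, "c"), (0, "a"), (1, "b"), (4, "e"), (3, "d")]:
    writer.push(index, item)
print(out)  # ['a', 'b', 'c', 'd', 'e']
```

Unlike storing everything until the end, this writes items as soon as their turn comes, at the price of a buffer that grows with how far ahead the fastest thread gets. In a real multi-threaded writer, `push` would also need to be protected by a lock or called from a single writer thread.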

wamigy · March 1, 2017, 1:09pm

Ok perfect! Thank you.