Msmt CSD wholebrain tracking: not enough streamlines

gmkerbler · May 12, 2016, 6:00am

Hi MRtrix team,

I followed your tutorial on “Multi-tissue CSD” using a single subject from the HCP data set.
I tried to generate a 10million streamlines wholebrain tractogram, but it seems only ~3.5 million were successfully created, while using up the maximum number of attempts (from looking at the ‘tckinfo’ of the track file).
Have you encountered this error before; and could you give me some pointers to figure out what’s going wrong?

thank you,
Georg

p.s. I have processed the same subject (+ 9 others) from the HCP data set using the usual single-shell method and everything worked fine there.

rsmith · May 12, 2016, 8:38am

Hi Georg,

If you run tckgen with the -info option, it will provide statistics regarding how streamlines were terminated, and the criteria by which generated streamlines were accepted or rejected. This should provide some insight as to why so few of the generated streamlines are being written to file.

My suspicion is that the combination of the use of MSMT CSD, and the new default mechanism for generating the WM response function, is making the FOD amplitudes slightly smaller, such that streamlines are terminating due to the FOD amplitude threshold just before they reach the GM. This would be corroborated if your proportion of streamlines rejected by ACT for terminating in WM is very high. In that case I would suggest reducing the FOD amplitude threshold from the default 0.1 to e.g. 0.05.

Cheers
Rob

ThijsDhollander · May 16, 2016, 3:38am

This may indeed partially be due to the fact that MSMT-CSD FOD amplitudes are a bit lower, due to either other tissue types being filtered out, as well as the hard constraint that renders the lobes slightly wider, but the peak amplitude quite a bit lower to compensate. Yet another reason may be the Tournier algorithm for the response function selection, which aims for (almost) the largest response function magnitude, so the FODs end up being even smaller than before.

Since the introduction of the latter response function selection algorithm, it may actually make sense to revise that default threshold of 0.1 to something else altogether… I keep hearing from people that they need to lower it to get good tractograms… so “by default”, a lot of people are in fact using a lower threshold…

ThijsDhollander · May 16, 2016, 3:42am

Just to slightly elaborate: I actually tend to use thresholds with typical values in the range of 0.06 to 0.08. I find that 0.05 may in some cases be pushing it a bit too far (i.e., introducing a few clearly false positive streamlines “hopping over” sulci…).

jdtournier · May 16, 2016, 8:21am

@gmkerbler, it would be great if you could follow @rsmith’s suggestions so we can get to the bottom of this. If there is an issue with the new response function estimation that I proposed, it needs to be fixed ASAP.

@ThijsDhollander, not sure changing the default cutoff for tracking is necessarily the best way to deal with this if it does turn out to be the problem - although it is a perfectly sensible workaround for now. I’m thinking we might be better off correcting the scaling of the response function based on the ratio of the mean intensity within the voxels used for the response function estimation to the mean intensity in the WM overall. That should bring it more in line with what we were getting before - although I doubt it’ll bring it completely in line with the FA-based approach. One of the problems with the FA-based approach is that it tends to favour low b=0 intensities introduced by e.g. Gibbs ringing, which artificially increases the FA. So that tends to bring the scaling of the response function down. And that was the approach that was used to select the 0.1 threshold in the first place…

gmkerbler · May 18, 2016, 1:03am

Thanks for your input @ThijsDhollander @rsmith @jdtournier

I’m re-running the tckgen step with ‘-info’ and ‘-cutoff 0.05’ and let you know outcome asap.
sorry for taking so long…
Scaling the response function might be a good idea if small FOD amplitudes are the culprit; or, as @ThijsDhollander was pointing out, maybe simply changing the default threshold parameter is an even simpler fix?

rsmith · May 18, 2016, 1:55am

Hi again Georg,

Donald has looked a little into the response function scaling here: it doesn’t look like that aspect should be influencing the tracking results too much. My suspicion is that it’s the combination of MSMT with ACT that’s causing grief, and that a lower FOD threshold is more appropriate just for that specific use case. Let us know how you go.

gmkerbler · May 23, 2016, 12:35am

Hello again,

The tracking finished now with the lower cutoff (0.05), and the track count is now about double what it was before, but still didn’t reach 10.000.000 all up; here’s the tckinfo output of the tck-file:

Tracks file: "10M_cutoff_05.tck"
act: /home/uqgkerbl/NCI11081_scratch/results_DWI/100307/Diffusion/5TT.mif
backtrack: 1
count: 7858161
crop_at_gmwmi: 1
downsample_factor: 3
fod_power: 0.25
init_threshold: 0.0500000007
lmax: 8
max_angle: 45
max_dist: 250
max_num_attempts: 1000000000
max_num_tracks: 10000000
max_seed_attempts: 50
max_trials: 1000
method: iFOD2
min_dist: 2.5
mrtrix_version: 0.3.13-834-g53858ec8
output_step_size: 0.625
rk4: 0
samples_per_step: 4
seed_dynamic: /home/uqgkerbl/NCI11081_scratch/results_DWI/100307/Diffusion/WM_FODs.mif
sh_precomputed: 1
source: /home/uqgkerbl/NCI11081_scratch/results_DWI/100307/Diffusion/WM_FODs.mif
step_size: 0.625
stop_on_all_include: 0
threshold: 0.05
timestamp: 1463532980.89874053
total_count: 8730072
unidirectional: 0
ROI: seed WM_FODs.mif

If I don’t reach a high enough count using msmt-csd I’m tempted to go back to the single-shell approach for now.
Or would you suggest to lower the cutoff even more to achieve a higher count; or will that introduce too many ‘false’ streamlines in the output?

rsmith · May 23, 2016, 2:39am

OK, I think you may have actually hit a different type of limitation here. Look at the relative numbers of streamlines generated vs. streamlines written to file:

count: 7858161
total_count: 8730072

So the premature termination of tckgen this time around was not caused by the default limit on the total number of streamlines to generate (which can be bypassed using -maxnum 0, by the way).

I don’t suppose you happen to be running on a Windows machine, and the size of the output file is exactly either 2GB or 4GB? If so, you’ve run into a maximum file size problem, probably due to using a FAT32 file system. If this is the case:

Check the terminal to see if some kind of error message was generated when tckgen terminated.
You can try either increasing your step size (currently 0.625, so I’m guessing you’re running HCP 1.25mm data), or increasing the downsample factor from the default value (3 for iFOD2), to get to your 10m target.
Write to a different file system that doesn’t have this size restriction (e.g. NTFS).

jdtournier · May 23, 2016, 8:41am

@rsmith: well spotted! Could also be due to running out of space on the storage device? I thought we’d tried to catch and report these types of errors, but maybe that’s not working quite as well as it should…

gmkerbler · May 27, 2016, 5:31am

Thanks for pointing out the obvious @rsmith

I’m actually running this on MASSIVE, I assume you are familiar with this cluster, and I believe it would be your usual CentOS system. There should be plenty of space @jdtournier
I just looked at my .err log file, and it’s quite obvious what happened, see:
22050526 generated, 7524721 selected [ 75%]slurmstepd: *** JOB 6220875 CANCELLED AT 2016-05-26T11:42:11 DUE TO TIME LIMIT on m2026 ***

I don’t have a lot of experience with using clusters like MASSIVE and restrictions like the above; do you maybe have some advice how to get around this issue? Maybe I can:

Request more resources i.e. CPUs for computation
Reduce number of streamlines (create 5M, SIFT down to 1M)

ThijsDhollander · May 27, 2016, 5:40am

Aha, ok, that makes things clear though… you’ll have to find a way to split things in smaller jobs indeed. Doing the tracking in smaller batches and combining them at the end is ok for most tckgen settings (i.e. where the seeding as well as the generation of tracks is completely independent for each track). SIFTing in smaller batches may in principle be similar-ish to SIFTING the whole lot, but may probably end up not really being (exactly) the same at all… you’ll at least need a good density in your tractogram for an algorithm such as SIFT to be well informed. Maybe a (slightly ad hoc) strategy could be to indeed SIFT in smaller batches, but still up to a “reasonable” number per batch (i.e. don’t “SIFT down” that much), and then combine everything, and then SIFT the final (smaller to begin with) lot again. A sort of multi-level SIFT…

ThijsDhollander · May 27, 2016, 5:41am

Some further advice: depending on the setup of your cluster, you may be able to request more time when you submit a job. That may fix the problem altogether without shady fiddling with batches of tracks.

rsmith · May 27, 2016, 6:00am

I doubt you’d be able to get more CPUs for computation; once you’re using all of the cores on a given node, using additional cores requires e.g. MPI programming. Though since you’re digging around on this topic, I would confirm that you are in fact using all cores on the assigned node, and that hyper-threading is being activated. Been a while since I used MASSIVE, but you should receive some form of usage report at job completion indicating your % utilisation of the cores requested. For tckgen this should be close to (200% x number of cores) - two threads per core, both running at close to 100%.

If it’s definitely a time limit issue you’re encountering, then appropriately setting your time per job is a much cleaner solution than splitting track files / reducing track counts and the like. Your jobs may be receiving whatever the ‘default’ amount of time is per job if you haven’t explicitly set that parameter in your SLURM script. There’s some basic instructions here for setting the fundamental parameters for a processing job.

gmkerbler · May 31, 2016, 1:30am

Thanks for the feedback.
I was only requesting a single CPU so far for the tracking for a start; I now changed my script to use 12 CPUs instead. That should result in speeding up the tracking by a lot and hopefully staying within the generic time-limit (given the time constraint is still the same while using more CPUs now). I think the SIFTing will be next hurdle I will come across given the tracking goes smoothly now with the multi-threading job.