SIFT2 - Processing Speed

theNeuropsychologist · June 14, 2021, 2:29pm

Hi everyone,

I have a (maybe weird) question regarding the SIFT2 algorithm. For my analysis, I previously used SIFT1, which usually took quite some time per subject (more than 2 hours, I think). Because I now see the advantages of SIFT2, I tried filtering my tractography data with it. What surprised me was the processing speed: SIFT2 took only 15 minutes per subject, with 10 million streamlines. Because of this large time difference, I am now afraid that something is going wrong. Does anyone know whether this time difference between SIFT1 and SIFT2 is normal? The output data looks okay to me.

I run MRtrix3 on a Linux computer (Ubuntu 20.04.2) with an Intel Core i7 processor and 16 GB RAM. Could someone estimate whether 15 minutes for 10 million streamlines is normal for SIFT2 on this hardware? Thanks in advance!

Best regards,
Lars

rsmith · July 1, 2021, 3:17am

Hi Lars,

It’s difficult to do a one-to-one computational comparison between SIFT and SIFT2. For instance, if I were comparing pipelines rather than the methods themselves, I’d probably compare generating 10M streamlines and running SIFT2 against generating maybe 50M streamlines and using SIFT to filter down to 10M, which makes the pipeline endpoints comparable. In that case, the primary processing time difference would actually come from the larger number of streamlines generated, not from the SIFT / SIFT2 methods themselves.

If you’re comparing the two algorithms with the same number of input streamlines, I would still expect SIFT2 to run faster, because the complexity of the calculations to be performed per iteration is comparable (within an order of magnitude at least) but the number of iterations required is much smaller for SIFT2.

First, consider the calculations per iteration. SIFT has to estimate, for every streamline not yet removed, how the cost function would change if that streamline were removed; the complexity of this scales approximately as the number of fixels traversed per streamline, summed across all streamlines. For SIFT2, there are two primary calculations per iteration: first, each streamline weight is optimised based on the set of fixels it traverses; then, the streamline density per fixel is updated based on these new weights. Both of these scale in complexity with the number of fixels traversed per streamline, summed across all streamlines. While the actual underlying calculations are more complex in the latter case, leading to a longer computation time per iteration, the scaling properties relative to the sizes of the input data are equivalent.
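To make the scaling argument concrete, here is a minimal toy sketch in Python. It is not the MRtrix3 implementation: the data layout (each streamline as a list of fixel indices), the function names, and the placeholder weight update are all assumptions made purely for illustration. It only counts how many fixel visits one SIFT2-like pass needs under the two-step description above.

```python
# Toy model only: NOT the MRtrix3 implementation of SIFT or SIFT2.
# Each streamline is represented as the list of fixel indices it traverses.
import random


def make_toy_tractogram(n_streamlines, n_fixels, max_len, seed=0):
    """Build random streamlines, each a list of traversed fixel indices."""
    rng = random.Random(seed)
    return [
        [rng.randrange(n_fixels) for _ in range(rng.randint(1, max_len))]
        for _ in range(n_streamlines)
    ]


def total_visits(streamlines):
    """Fixels traversed per streamline, summed across all streamlines."""
    return sum(len(fixels) for fixels in streamlines)


def sift2_like_pass(streamlines, n_fixels):
    """Count the fixel visits made by one two-step, SIFT2-like pass."""
    visits = 0
    weights = [1.0] * len(streamlines)
    density = [0.0] * n_fixels

    # Step 1: each streamline inspects every fixel it traverses.
    # The real weight optimisation is replaced by a placeholder here.
    for i, fixels in enumerate(streamlines):
        for _ in fixels:
            visits += 1
        weights[i] = 1.0  # placeholder, no actual optimisation

    # Step 2: re-accumulate per-fixel density from the (new) weights.
    for i, fixels in enumerate(streamlines):
        for f in fixels:
            density[f] += weights[i]
            visits += 1

    return visits, density


streamlines = make_toy_tractogram(n_streamlines=1000, n_fixels=200, max_len=50)
visits, _ = sift2_like_pass(streamlines, n_fixels=200)
print(visits, 2 * total_visits(streamlines))  # the two numbers are equal
```

In this toy model the work per pass is exactly twice the summed fixel count, so it grows linearly with that sum. The real per-visit cost differs between SIFT and SIFT2, but that changes the constant factor rather than the scaling.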

Where SIFT2 cuts down on processing time is in the number of iterations. While I’ve never been completely happy with the performance of the optimisation algorithm in later iterations, it does converge to a half-decent solution in a very small number of iterations, and the total number of iterations is not too large. SIFT, on the other hand, is limited in how much progress it can make in a single iteration: as it removes more and more streamlines, the calculations it made at the start of the iteration become inaccurate, and so it has to recalculate. And when that happens, it recalculates everything, including those streamlines that are almost certainly never going to be removed.

(Indeed even with this constant recalculation, and a huge number of iterations, it’s still possible for artifacts to manifest, because with the way the algorithm is designed, it’s persistently operating based on outdated calculations)

I’ve no concern with a 15 minute run time for 10 million streamlines with a typical number of threads / number of fixels. If you’re uncertain, try running with the -output_debug flag, which dumps into the working directory a bunch of images that show the influence of the modulated streamline weights on the model fit.
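As a rough sketch of what such a run could look like when scripted from Python: the helper below only assembles the command line. The helper name and the file names are hypothetical placeholders; only the tcksift2 command and the -output_debug flag come from the discussion above. Option syntax can differ between MRtrix3 releases, so check `tcksift2 -help` for your installed version before relying on it.

```python
# Hypothetical helper: only tcksift2 and -output_debug come from the thread;
# the function name and all file names are placeholders.
import shlex


def build_tcksift2_command(tracks, fod, weights_out, debug=True):
    """Assemble a tcksift2 call: input tracks, input FOD image, output weights."""
    cmd = ["tcksift2", tracks, fod, weights_out]
    if debug:
        # Assumed here to be a plain flag, as described in the reply above.
        cmd.append("-output_debug")
    return cmd


cmd = build_tcksift2_command("tracks_10M.tck", "wmfod.mif", "sift2_weights.txt")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory where the debug images should be written.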

Cheers
Rob

theNeuropsychologist · July 7, 2021, 12:58pm

Hi Rob,

Thank you so much for this very comprehensive reply. That was incredibly helpful for me! I am glad to hear that you have no concerns regarding the run time. In fact, I have analyzed the results of SIFT2 by now and it all looks very good. It is great how much time one can save with SIFT2, with the same number of streamlines in the end. Thanks again!

Best regards,
Lars