It’s difficult to do a one-to-one computational comparison between SIFT and SIFT2. For instance, if I were comparing pipelines rather than the methods themselves, I’d probably compare generating 10M streamlines and running SIFT2 against generating maybe 50M streamlines and using SIFT to filter down to 10M, so that the pipeline endpoints are comparable. In that case the primary difference in processing time would actually come from the larger number of streamlines generated, not from the SIFT / SIFT2 methods themselves.
If you’re comparing the two algorithms with the same number of input streamlines, I would still expect SIFT2 to run faster, because the complexity of the calculations performed per iteration is comparable (within an order of magnitude at least), but the number of iterations required is much smaller for SIFT2.
First, consider the calculations per iteration. SIFT has to estimate, for every streamline not yet removed, how the cost function would change if that streamline were removed; the complexity of this scales approximately as the number of fixels traversed per streamline, summed across all streamlines. For SIFT2, there are two primary calculations per iteration: firstly, each streamline weight is optimised based on the set of fixels it traverses; then the streamline density per fixel is updated based on these new weights. Both of these scale in complexity with the number of fixels traversed per streamline, summed across all streamlines. While the actual underlying calculations are more complex in the latter case, leading to a longer computation time per iteration, the scaling properties relative to the sizes of the input data are equivalent.
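To make the scaling argument a bit more concrete, here is a rough Python sketch. It is not the actual MRtrix3 implementation: the data layout (each streamline as a list of `(fixel_index, length)` pairs) and the function names are made up purely for illustration. It only shows that one full pass over the data, whether for SIFT's per-streamline cost estimates or for SIFT2's density update, touches every streamline-fixel intersection once.

```python
from typing import List, Tuple

# Hypothetical layout: a streamline is a list of (fixel_index, length_through_fixel).
Streamline = List[Tuple[int, float]]


def per_iteration_work(streamlines: List[Streamline]) -> int:
    """Count streamline-fixel visits in one full pass over the tractogram.

    This is the quantity that the per-iteration cost of both methods
    scales with: fixels traversed per streamline, summed over streamlines.
    """
    return sum(len(s) for s in streamlines)


def fixel_densities(streamlines: List[Streamline],
                    weights: List[float],
                    n_fixels: int) -> List[float]:
    """Accumulate weighted streamline density per fixel (SIFT2-style update).

    Each streamline contributes weight * length to every fixel it traverses,
    so the loop body runs exactly per_iteration_work(streamlines) times.
    """
    density = [0.0] * n_fixels
    for streamline, weight in zip(streamlines, weights):
        for fixel, length in streamline:
            density[fixel] += weight * length
    return density


# Toy example: two streamlines, three fixels.
toy = [[(0, 1.0), (1, 2.0)], [(1, 1.0)]]
visits = per_iteration_work(toy)
densities = fixel_densities(toy, [1.0, 0.5], n_fixels=3)
```

With unit weights for every streamline this reduces to the unweighted density that SIFT works with; the per-visit arithmetic differs between the two methods, but the number of visits per pass does not.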
Where SIFT2 cuts down on processing time is in the number of iterations. While I’ve never been completely happy with the performance of the optimisation algorithm in later iterations, it does converge to a half-decent solution in a very small number of iterations, and the total number of iterations is not too large. SIFT, on the other hand, is limited in how much progress it can make in a single iteration: as it removes more and more streamlines, the calculations it made at the start of the iteration become inaccurate, and so it has to recalculate. And when that happens, it recalculates everything, including those streamlines that are almost certainly never going to be removed.
(Indeed, even with this constant recalculation and a huge number of iterations, it’s still possible for artifacts to manifest, because the way the algorithm is designed means it’s persistently operating on outdated calculations.)
I’ve no concern with a 15 minute run time for 10 million streamlines with a typical number of threads / number of fixels. If you’re uncertain, try running with the -output_debug flag, which dumps into the working directory a bunch of images that show the influence of the modulated streamline weights on the model fit.