Can I down-sample a larger tractogram to mimic a smaller one?

Xiaoping_Wu · October 11, 2018, 6:20pm

Hello MRtrix experts,

I have a large set of tracks that was generated with tckgen -select 100M .... Now I want a smaller set, say one containing 10M tracks. Could I just run tckedit 100M.tck -number 10M 10M.tck, or should I run tckgen again (i.e. apply tckgen -select 10M ...)?

Thanks much,
xiaoping

jdtournier · October 12, 2018, 9:08am

Assuming you’re using a fully random seeding mechanism, that’s fine: since all streamlines are generated from random locations, there should be no systematic difference between the first and last 10M streamlines (or indeed any other arbitrary selection of 10M streamlines).

On the other hand, if you used a deterministic seeding strategy (i.e. -seed_random_per_voxel or -seed_grid_per_voxel), then there will be a systematic difference between earlier and later streamlines. Not too sure how to address that with the current functionality offered in tckedit, unfortunately…
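
To make the contrast concrete, here is a toy Python sketch. It is not MRtrix code: the voxel count, seeds-per-voxel figure and function names are all made up for illustration, and it assumes that per-voxel seeding visits voxels in a fixed order. It counts how many distinct voxels are represented among the first 10% of seeds under each scheme.

```python
import random

def per_voxel_seeds(n_voxels, per_voxel):
    """Seed voxel IDs in a fixed voxel-by-voxel order (assumed ordering)."""
    return [v for v in range(n_voxels) for _ in range(per_voxel)]

def random_seeds(n_voxels, n_seeds, rng):
    """Each seed voxel is drawn independently and uniformly at random."""
    return [rng.randrange(n_voxels) for _ in range(n_seeds)]

def voxels_covered(seeds, n_keep):
    """Number of distinct voxels among the first n_keep seeds."""
    return len(set(seeds[:n_keep]))

rng = random.Random(0)           # fixed seed so the run is repeatable
n_voxels, per_voxel = 100, 50    # hypothetical toy sizes
total = n_voxels * per_voxel
keep = total // 10               # keep the first 10%

ordered = per_voxel_seeds(n_voxels, per_voxel)
independent = random_seeds(n_voxels, total, rng)

print("per-voxel order:", voxels_covered(ordered, keep), "voxels covered")
print("random seeding: ", voxels_covered(independent, keep), "voxels covered")
```

Under the ordered scheme the first 10% of seeds come from only the first 10% of voxels, whereas under independent random seeding the same prefix touches nearly every voxel.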

Xiaoping_Wu · October 12, 2018, 12:54pm

Thanks for your response.
Is it true that the seeding strategies -seed_gmwmi and -seed_dynamic are random?
Thanks,
xiaoping

jdtournier · October 12, 2018, 2:19pm

I’m pretty sure they are, but @rsmith would be better placed to confirm this…

rsmith · November 25, 2018, 3:49am

Not too sure how to address that with the current functionality offered in tckedit, unfortunately…

I did at some point (I think prior to tckedit even) have a command that extracted a random subset of a tractogram; i.e. for a fixed number of desired output streamlines, the indices of those streamlines that would be extracted were randomised. Not sure how difficult it would be to incorporate into tckedit…
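
The index-selection step of such a random-subset command could look roughly like the Python sketch below. This is only an illustration of the idea, not the command mentioned above and not part of tckedit; the function name and the toy sizes are hypothetical, and reading or writing the .tck file itself is left out.

```python
import random

def random_subset_indices(n_total, n_keep, seed=None):
    """Pick n_keep distinct streamline indices out of n_total, without replacement.

    The result is sorted so that the selected streamlines could be
    copied out in a single pass through the input, in original order.
    """
    if not 0 <= n_keep <= n_total:
        raise ValueError("n_keep must be between 0 and n_total")
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_keep))

# Toy sizes; a real tractogram would have millions of streamlines.
indices = random_subset_indices(n_total=1000, n_keep=100, seed=42)
print(len(indices), "indices selected; first few:", indices[:5])
```

Because every index has the same chance of being selected regardless of its position, the subset does not depend on the order in which the streamlines were generated.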

rsmith · November 25, 2018, 3:55am

Is it true that the seeding strategies -seed_gmwmi and -seed_dynamic are random?

“Random” is not quite the right word to be using here. What’s requisite for the streamlines extracted via tckedit -number to be a representative subset of the larger tractogram is for the streamlines to be order-independent.

This is the case for -seed_gmwmi, since each streamline seed point is derived entirely independently of any other tractogram information.

While -seed_dynamic contains stochastic elements, and hence could be described as “random”, it is not order-independent: the probability of seeding from each fixel constantly evolves over time as the tractogram is generated. Initially, there is no track density information, and so seeding behaves comparably to the -seed_image option; but as the tractogram gets increasingly dense, the probabilities of seeding in densely reconstructed vs. sparsely reconstructed fixels diverge. So the seed locations of streamlines in the latter part of the tractogram are not independent of those streamlines that were generated in the earlier part of the tractogram, and thus using tckedit -number will not yield an entirely representative subset. Combining it with the -skip option to sample a batch of streamlines from later in the tractogram may be more representative, but there are no guarantees.
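
As a rough illustration of why order-dependence matters, the Python sketch below uses a toy model in which the probability of seeding in a "sparse" region drifts upward as more streamlines are generated. The drift curve, its parameters and the two region labels are invented stand-ins, not the actual -seed_dynamic rule. It compares the first batch (analogous to -number alone) and a late batch (analogous to -skip plus -number) against the whole set, using total variation distance between label frequencies.

```python
import math
import random
from collections import Counter

def total_variation(sample, reference):
    """Total variation distance between the label frequencies of two sequences."""
    p, q = Counter(sample), Counter(reference)
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p[l] / len(sample) - q[l] / len(reference)) for l in labels)

def drifting_seeds(n, rng, start=0.5, end=0.9, tau_fraction=0.05):
    """Toy seed labels: P('sparse') moves from `start` toward `end` over time.

    The exponential drift is an assumption made purely for illustration.
    """
    tau = n * tau_fraction
    seeds = []
    for t in range(n):
        p_sparse = end - (end - start) * math.exp(-t / tau)
        seeds.append("sparse" if rng.random() < p_sparse else "dense")
    return seeds

rng = random.Random(1)             # fixed seed so the run is repeatable
n_total, n_keep = 20000, 2000      # hypothetical toy sizes
seeds = drifting_seeds(n_total, rng)

first_batch = seeds[:n_keep]       # analogous to -number alone
late_batch = seeds[-n_keep:]       # analogous to -skip plus -number

tv_first = total_variation(first_batch, seeds)
tv_late = total_variation(late_batch, seeds)
print(f"first batch vs whole: {tv_first:.3f}")
print(f"late batch vs whole:  {tv_late:.3f}")
```

In this toy model the late batch is closer to the whole set than the first batch, simply because the seeding probabilities have mostly settled by then. How well that carries over to a real tractogram depends on how the seeding probabilities actually evolve, so it should be read as a plausibility argument only.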