Extract pathway after Sift2

Hi All,
I am trying to run targeted tractography using a pathway of interest and a whole-brain tractogram. I generated the pathway between two ROIs and removed the outliers. Those final streamlines were then merged with the whole-brain tractogram of 2M streamlines (after SIFT). However, after running sift2, the initial fibers cannot be extracted with tckedit using the same ROIs, and the number of extracted streamlines is not equal to that of the initial pathway. I tried to follow the instructions provided in this post by upsampling the generated tracts, but it did not change the output. Below is a summary of the commands and the results:

tckedit wholeBrain.tck pathway.tck combined.tck
(number of streamlines : 2000511)
sift2 combined.tck wmfod_norm.mif -tck_weights_out all_weights
tckedit combined.tck -include ROI1 -include ROI2 -tck_weights_in all_weights -tck_weights_out weights_pathway pathway_sift2.tck ( number of streamlines : 359 )


The green streamlines are the tracts after removing outliers, which I aim to extract again after sift2. The other set, shown in direction-encoded color, is what tckedit produced after sift2; it includes some irrelevant streamlines.

tckstats combined.tck

         mean       median    std. dev.          min          max        count
      23.4063      10.4075      28.1432      2.46828          125      2000511

tckstats pathway_sift2.tck

         mean       median    std. dev.          min          max        count
      102.897      102.496      8.01789      81.0354      123.331          359

Even though I specified the -number option for tckedit, it could not generate the streamlines and gave me the warning message that " User requested 511 streamlines, but only 359 were written to file". Even if it can write the specified number of streamlines in the output file, those are not from the main pathway of interest.
How can I improve the result of tckedit in terms of streamline number and location?

One more question. How does sift2 allocate weights to each streamline in tckedit command? Is it based on one to one correspondence between each streamline in tck file and weighting factor in the text file? In this case, should the providing .tck and txt file be in the same size? I came up with a quick and dirty idea to resolve the problem with unwanted streamlines. I was thinking of running tcksift2 on combined.tck (the whole-brain tractogram plus the pathway of interest before removing outliers) first, and then extracting the pathway together with the output weights (for instance, 1900 tracts). Following that, outliers would be removed to get the final pathway (e.g. 500 tracts). To create weights for the new pathway, I would need to rerun tckedit:
tckedit newPathway_OutliersRemoved.tck -include ROI1 -include ROI2 -tck_weights_in Pathway_weights_beforeRemovingOutliers -tck_weights_out finalweights
However, with this approach, the input weights file (1900 weights) is larger than the new .tck file. I tested it. It created the output weights file, but I don’t know whether it computes correctly. For example, if it allocates the first 500 weighting factors to the output file, then it doesn’t reflect the correct weights. Am I right?

Cheers

Hi @NeuroSh,

However, after running sift2, the initial fibers can not be extracted with tckedit by using the same ROIs.

You do not want to simply extract the weights of the manually-delineated streamlines. Some streamlines from the whole-brain tractogram may additionally satisfy the criteria to be attributed to your pathway of interest, and will contribute to the reconstruction-based representation of the underlying fibre densities, and must therefore contribute to the quantification of connectivity of that bundle. Basically, whatever criteria you have applied to generate the reconstruction of the pathway of interest need to be re-applied to the whole concatenated tractogram post-SIFT2.

(number of streamlines : 2000511)
( number of streamlines : 359 )

So to be clear, the target tractography output - after removal of outliers - contains 511 streamlines?

Even though I specified the -number option for tckedit, it could not generate the streamlines and gave me the warning message that " User requested 511 streamlines, but only 359 were written to file".

Bear in mind that tckedit cannot generate streamlines; it only selects them. All the -number option does is terminate the command once that number of streamlines have been selected from the input file, such that any remaining streamlines in the input track file that would otherwise satisfy your criteria for selection are just ignored, and the output track file contains precisely the number of streamlines that you requested.
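To illustrate the behaviour described above, here is a minimal Python sketch of "select, then stop early". It is not tckedit's implementation; the integer "streamlines" and the `passes_rois` test are hypothetical stand-ins for real track data and ROI criteria.

```python
from itertools import islice

def select_streamlines(streamlines, keep, number=None):
    """Return the streamlines that pass `keep`, stopping once `number` are found.

    Nothing is ever generated: if fewer than `number` pass, fewer are returned.
    """
    passing = (s for s in streamlines if keep(s))
    return list(islice(passing, number))

# Toy input: 1000 "streamlines", of which only the first 359 pass the test.
streamlines = list(range(1000))
passes_rois = lambda s: s < 359  # hypothetical ROI criterion

selected = select_streamlines(streamlines, passes_rois, number=511)
capped = select_streamlines(streamlines, passes_rois, number=100)

print(len(selected))  # fewer than the 511 requested
print(len(capped))    # exactly the 100 requested; later matches are ignored
```

Requesting more than exist simply yields everything that passes, which is what the warning in the original post is reporting.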

As a sanity check, you should be taking your tckedit call that you are performing on the whole concatenated tractogram post-SIFT2, and running it on your targeted tracking file. I’m expecting that to result in selection of 359 or fewer streamlines, at which point you need to diagnose why your tckedit call is not selecting all of the streamlines within your bundle of interest, in a manner that is entirely independent of concatenation with the whole-brain reconstruction or SIFT2.

How does sift2 allocate weights to each streamline in tckedit command? Is it based on one to one correspondence between each streamline in tck file and weighting factor in the text file? In this case, should the providing .tck and txt file be in the same size?

Yes and yes. The streamlines data are in a fixed order in the track file, and the numerical values in the SIFT2 output text file are in a fixed order, and so they should contain the same number. If tckedit runs to completion with the -tck_weights_in option, and in the course of scanning through the entire track file it discovers that these two files do not contain the same number of entries, it will issue a warning, as this is highly suggestive of a data management problem (i.e. some processing step has been applied to one of these files but not the other).
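The correspondence is purely positional, so a count check is the first thing worth automating. The sketch below assumes the weights file is plain text with whitespace-separated numbers and possibly `#` comment lines; that layout, and the function names, are assumptions for illustration. The streamline count would come from tckinfo on the track file and is passed in here as a plain integer.

```python
def read_weights(text):
    """Parse the contents of a weights file into a list of floats.

    Assumes whitespace-separated numbers; lines starting with '#' are skipped.
    """
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values.extend(float(token) for token in line.split())
    return values

def check_correspondence(n_streamlines, weights):
    """Raise if the track file and weights file cannot correspond one-to-one."""
    if n_streamlines != len(weights):
        raise ValueError(
            f"{n_streamlines} streamlines but {len(weights)} weights: "
            "one of the two files was probably processed without the other"
        )

example = "# hypothetical header comment\n0.5\n1.5\n2.0\n"
weights = read_weights(example)
check_correspondence(3, weights)  # passes silently: 3 streamlines, 3 weights
```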

For example, if it allocates the first 500 weighting factors to the output file, then it doesn’t reflect the correct weights.

As above, there is no way by which the tckedit command can locate the appropriate streamline weights that correspond to streamlines that you have extracted in a previous step. However, if all of the steps in going from the whole concatenated tractogram to the subset of streamlines of interest are done using both -tck_weights_in and -tck_weights_out, then the final output weights file should contain those weights corresponding to those streamlines in the final output track file. You can sanity-check this by simply comparing the number of streamlines in a track file (tckinfo) with the number of lines in a weights file (wc -l).
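A small Python sketch of why the weights must be filtered in the same step as the streamlines. The streamline labels, weights and `keep` test are made up for illustration; the point is only that positions in the two lists must stay paired.

```python
def filter_with_weights(streamlines, weights, keep):
    """Select streamlines and carry each one's weight along with it."""
    if len(streamlines) != len(weights):
        raise ValueError("streamline and weight counts differ")
    pairs = [(s, w) for s, w in zip(streamlines, weights) if keep(s)]
    return [s for s, _ in pairs], [w for _, w in pairs]

# Five toy streamlines with their weights; positions 1, 3 and 4 are in the bundle.
streamlines = ["s0", "s1", "s2", "s3", "s4"]
weights = [0.1, 0.2, 0.3, 0.4, 0.5]
in_bundle = {"s1", "s3", "s4"}

kept, kept_weights = filter_with_weights(streamlines, weights, lambda s: s in in_bundle)

# What would happen if the first len(kept) weights were used instead:
naive_weights = weights[:len(kept)]

print(kept_weights)   # weights that belong to s1, s3, s4
print(naive_weights)  # weights that belong to s0, s1, s2
```

Once a track file has been edited without its weights, the pairing is lost and cannot be recovered from the two files alone.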

I’ve added a GitHub issue to try to catch misuse in this context and immediately provide feedback that the two files don’t correspond if that is the case.

Cheers
Rob

Hi all and @rsmith,

I have tried to replicate the method above for a targeted tracking experiment (ROI to ROI). First, I generated my pathway of interest with

tckgen FOD pathway.tck ACT seed_image target_image -seeds 0 -seed_unidirectional -stop -select 50000 -nthreads 6 -force.

Then I followed the above method with a whole-brain (WB) tractogram (5M streamlines). In my case, the number of streamlines in the pathway post-SIFT2 (50227) was larger than in the original pathway of interest (50k). You mentioned that "Some streamlines from the whole-brain tractogram may additionally satisfy the criteria to be attributed to your pathway of interest". With that explanation, I further used the post-SIFT2 pathway to generate a connectivity matrix and pathway assignment. I expected the pathway assignment to contain only one column for the target region (2 in my case); however, there is a mixture of index numbers (0, 1, 2) from the 1st row to the 226th row, and from the 226th row onward the index number was always 2. I suspect that these correspond to the 227 streamlines that I got after SIFT2. With regard to the connectivity matrix, it contains 0.0001538481 and 0.1167868869; which is the correct SC in this case?

tck2connectome pathway_sift2.tck combined_ROI_template pathway_connmat.csv -assignment_radial_search 2 -scale_invlength -scale_invnodevol -out_assignments pathway_assignment_v1 -tck_weights_in pathway_weight.txt -vector

Thank you,

Thomas

I further used the post-SIFT2 pathway to generate a connectivity matrix and pathway assignment. I expected the pathway assignment to contain only one column for the target region (2 in my case); however, there is a mixture of index numbers (0, 1, 2) from the 1st row to the 226th row, and from the 226th row onward the index number was always 2. I suspect that these correspond to the 227 streamlines that I got after SIFT2. With regard to the connectivity matrix, it contains 0.0001538481 and 0.1167868869; which is the correct SC in this case?

I think there’s a bit going on behind the scenes here that’s not being fully explained, and I’m not entirely sure whether or not any of it is actually required.

I suspect what you’ve done is produced some sort of modified parcellation image that you are then using to try to generate a “simplified” connectivity matrix from which the pathway of interest can then be trivially isolated. However the details of exactly what you’re doing here are requisite information for interpreting the rest of your description.

Moreover, it would seem to me that whatever mechanism you are using to obtain the number of 50227 streamlines corresponding to the pathway of interest is the selection of the pathway of interest: the sum of the weights of those 50227 streamlines is the estimated connectivity of the pathway, and subsequent generation of a connectome matrix is entirely superfluous. In many instances I would advocate using connectivity matrix generation in this context only because it provides a mechanism by which to perform that selection; but from your description this seems to not be a necessary step.
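In other words, once the weights of the selected streamlines are in hand, the estimate is a single sum. A minimal Python sketch with made-up weights (not values from this thread):

```python
import math

def pathway_connectivity(weights):
    """Estimated connectivity of a bundle: the sum of its streamline weights."""
    return math.fsum(weights)

# Hypothetical weights for a three-streamline bundle.
bundle_weights = [0.5, 1.25, 0.75]

connectivity = pathway_connectivity(bundle_weights)
streamline_count = len(bundle_weights)

print(connectivity)      # the quantity of interest after SIFT2
print(streamline_count)  # the raw count, which is not the estimate
```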

If your only concern is the use of -scale_invlength and -scale_invnodevol:

  • -scale_invlength should absolutely not be used in conjunction with SIFT / SIFT2. This is an incomplete heuristic correction of one specific reconstruction bias, which SIFT / SIFT2 remove in a data-driven way; applying this scaling factor after SIFT / SIFT2 will therefore introduce a bias in the opposite direction.

  • -scale_invnodevol, if you insist on using it, can be applied post hoc by just manually calculating the volumes of the two relevant nodes. While tck2connectome internally applies this scaling factor to each streamline individually, the scaling factor is identical for all streamlines within a given edge, and so can instead be applied once to the whole edge.
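The second point can be checked numerically: a factor shared by every streamline in an edge can be pulled outside the sum. In the Python sketch below the node volumes and weights are hypothetical, and the form of the factor, 2 / (V_a + V_b), is an assumption made for illustration; the argument holds for any factor that is constant within the edge.

```python
import math

def edge_scaled_per_streamline(weights, scale):
    """Apply the scaling factor to each streamline, then sum."""
    return math.fsum(w * scale for w in weights)

def edge_scaled_post_hoc(weights, scale):
    """Sum the weights first, then apply the scaling factor once."""
    return math.fsum(weights) * scale

# Hypothetical node volumes and streamline weights for one edge.
vol_a, vol_b = 1200.0, 800.0
scale = 2.0 / (vol_a + vol_b)  # ASSUMED form of the inverse-node-volume factor
weights = [0.8, 1.1, 0.4, 2.3]

per_streamline = edge_scaled_per_streamline(weights, scale)
post_hoc = edge_scaled_post_hoc(weights, scale)

print(math.isclose(per_streamline, post_hoc))
```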

Rob

Hi again,

Thank you for your input. My plan was to estimate the connectivity of a pathway. Can you confirm whether my understanding of the selection and quantification of streamlines is correct?

a). Selection of pathway of interest

1. tckgen WB
2. SIFT2
3. tck2connectome atlas
4. connectome2tck

I can directly extract the structural connectivity between 2 ROIs in the connectome matrix

b). Quantification of connection strength

1. Pathway of interest by tckgen -unidirectional -include -seed_image -select
2. Merge pathway of interest + WB
3. SIFT2
4. Extract pathway of interest by tckedit
5. Structural connectivity of pathway is the sum of all streamline weights
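As a rough sketch, steps 1 to 4 of method (b) could be assembled as the commands below. All file names (ROI1.mif, wmfod_norm.mif and so on) are placeholders, only the options relevant to the listed steps are shown, and the commands are built and printed rather than executed. Step 5 is then the sum of the values in the output weights file.

```python
import shlex

def method_b_commands(fod, roi1, roi2, whole_brain, n_select=50000):
    """Return the command lines for steps 1-4 of method (b) as argument lists.

    File names are hypothetical placeholders; nothing is run here.
    """
    return [
        # 1. Targeted tracking from ROI 1 towards ROI 2
        ["tckgen", fod, "pathway.tck", "-seed_image", roi1, "-include", roi2,
         "-seed_unidirectional", "-select", str(n_select)],
        # 2. Concatenate the targeted streamlines with the whole-brain tractogram
        ["tckedit", whole_brain, "pathway.tck", "combined.tck"],
        # 3. SIFT2 on the concatenated tractogram; weights go to a text file
        ["tcksift2", "combined.tck", fod, "all_weights.txt"],
        # 4. Re-apply the bundle criteria, carrying the weights along
        ["tckedit", "combined.tck", "pathway_sift2.tck",
         "-include", roi1, "-include", roi2,
         "-tck_weights_in", "all_weights.txt",
         "-tck_weights_out", "pathway_weights.txt"],
    ]

commands = method_b_commands("wmfod_norm.mif", "ROI1.mif", "ROI2.mif", "wholeBrain.tck")
for argv in commands:
    print(shlex.join(argv))
```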

The option -unidirectional in method (b) would quantify only streamlines from ROI_1 and terminate ROI_2, hence the structural connectivity is computed by all streamlines from ROI_1 to ROI_2 as opposed to method (a) in which the structural connectivity represent track counts from both direction. In other posts, you have stated that method (a) would provide track count to be more biological meaningful than method (b), is the term biological meaningful in this context suggesting the true number of streamlines corresponding to a bundle between two regions? If so what are the benefits of method (b) in which number of streamlines corresponding to a bundle can be larger compared to method (a).

Thomas

a). Selection of pathway of interest

The fact that you’re not combining targeted tracking with a whole-brain tractogram here does make this process entirely standard. I’m therefore not sure why it’s actually being presented here. What purpose does this “selection of pathway of interest” serve? Obviously for part b1 you need to have at least some knowledge of where your pathway is and what parcels those streamlines are assigned to, but the mapping from this list of commands to “selection of pathway of interest” is not clear.

It’s also not clear why:

I can directly extract the structural connectivity between 2 ROIs in the connectome matrix

needs to be stated if your principal interest is part b?

The option -unidirectional in method (b) would quantify only streamlines from ROI_1 and terminate ROI_2, hence the structural connectivity is computed by all streamlines from ROI_1 to ROI_2 as opposed to method (a) in which the structural connectivity represent track counts from both direction.

Firstly, I wouldn’t use the word “quantify” when discussing tckgen -unidirectional, especially if SIFT2 is also a part of the discussion. Given SIFT2 is about obtaining quantitative properties in the context of endpoint-to-endpoint connectivity, using the word “quantify” in less appropriate ways within the same discussion is only likely to lead to confusion.

There’s still a couple of issues with your description that I’m concerned are indicative of misunderstanding, so I’ll be as explicit as I can to be safe.

It’s true that a pathway of interest extracted from a whole-brain tractogram (let’s imagine GM-WM interface seeding for simplicity) will include streamlines that have been generated through traversing in both directions, whereas targeted tracking from ROI 1 to ROI 2 will obviously only include streamlines that traversed in one direction. However the former should not be thought of as “representing track counts from both directions”, nor should the latter be thought of as contrary to this. While it’s possible that streamline trajectories generated from ROI 1 to ROI 2 are not identical to those generated from ROI 2 to ROI 1 (for that reason quite often both experiments are performed and then the streamlines are combined), this is non-ideal behaviour; similarly, a measure of pathway connection strength should be as independent as possible from the direction of propagation of the underlying streamlines from which the quantity is derived.

This all leads to “the structural connectivity represent track counts from both direction”. It would be accurate to say that structural connectivity is calculated based on streamlines that were generated in both directions. However the two directions should not be treated as having individual track counts. Not only is targeted tracking in isolation non-quantitative, but also the diffusion signal is symmetric, streamlines trajectory reconstruction would ideally also be symmetric, and the process of extraction of the connectivity strength (whether by manually summing streamlines weights or by generating a symmetric connectivity matrix*) is also direction-agnostic.
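To make "direction-agnostic" concrete, here is a minimal Python sketch of accumulating streamline weights into edges keyed by an unordered node pair. The node assignments and weights are invented for illustration, and this is not how tck2connectome is implemented internally.

```python
from collections import defaultdict

def symmetric_connectome(assignments, weights):
    """Sum weights per edge, treating (a, b) and (b, a) as the same edge."""
    edges = defaultdict(float)
    for (start, end), w in zip(assignments, weights):
        edge = tuple(sorted((start, end)))  # order of the endpoints is discarded
        edges[edge] += w
    return dict(edges)

# Three toy streamlines between nodes 1 and 2, generated in different directions.
assignments = [(1, 2), (2, 1), (1, 2)]
weights = [0.5, 0.25, 1.0]

edges = symmetric_connectome(assignments, weights)
print(edges)  # a single edge for the pair (1, 2)
```

The direction in which each streamline happened to be propagated leaves no trace in the result.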

Now this might have just been clumsy wording, in which case I’ve probably gone overboard, but I wanted to give the full answer anyway.

In other posts, you have stated that method (a) would provide track count to be more biological meaningful than method (b)

I certainly hope I’ve never said that; please provide a link if I’ve given a description somewhere that might give this impression.

Methods a and b are based on precisely the same model and target quantity. The differences are:

  • Method b includes a more dense reconstruction of the pathway of interest than method a. This means that the number of streamlines within the pathway of interest will be larger, and hence the issues associated with streamlines being discrete entities may be reduced. E.g. One would have more confidence in drawing conclusions about a difference of 1000 to 1100 streamlines than a difference of 10 to 11 streamlines. It does however introduce complex interactions due to regularisation, but I’ll spare the details of that here.

  • For method a, the mechanism of selection of streamlines corresponding to the pathway of interest is restricted to the ROIs at the streamlines endpoints. In method b, use of tckedit means that conditions other than the endpoints of the streamlines can be used in determining which streamlines are attributed to the pathway of interest and which are not.
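One way to put rough numbers on the 1000-versus-1100 example in the first point is to treat each count as an independent Poisson variable. That noise model is an assumption introduced here purely for illustration; it is not something SIFT2 or this thread asserts, and it ignores streamline weights and the regularisation effects mentioned above.

```python
import math

def difference_in_sd_units(count_a, count_b):
    """Size of a count difference relative to its standard deviation.

    ASSUMES independent Poisson counts, so var(a - b) = a + b.
    """
    return abs(count_a - count_b) / math.sqrt(count_a + count_b)

dense = difference_in_sd_units(1000, 1100)  # same 10% change, many streamlines
sparse = difference_in_sd_units(10, 11)     # same 10% change, few streamlines

print(round(dense, 2), round(sparse, 2))
```

Under this toy model the same relative change stands out ten times more clearly in the denser reconstruction.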

is the term biological meaningful in this context suggesting the true number of streamlines corresponding to a bundle between two regions?

There’s no such thing as “the true number of streamlines”. Streamlines exist only in the domain of digital reconstruction. What we purport to provide is a streamline count (or, for SIFT2, sum of streamline weights) that is proportional to the intra-cellular cross-sectional area of the white matter pathway reconstructed by those streamlines, which we purport is a reasonably useful marker of biological “connectivity”.

If so what are the benefits of method (b) in which number of streamlines corresponding to a bundle can be larger compared to method (a).

The main benefit is for small bundles where selection of streamlines from a whole-brain reconstruction yields a streamline count that is too small to be trustworthy. If your pathway of interest is reasonably large, I probably wouldn’t bother with the added complexity.

Rob