Extract pathway after Sift2

NeuroSh · December 3, 2019, 12:13pm

Hi All,
I am trying to run targeted tractography using a pathway of interest and whole brain tractogram. I generated the pathway between two ROIs and removed the outliers. Then, those final streamlines were merged with the whole brain tractogram with 2m streamlines (after sift). However, after running sift2, the initial fibers can not be extracted with tckedit by using the same ROIs. Besides, the number of extracted streamlines is not equal to the initial pathway. I tried to follow the instruction provided in this post by upsampling the generated tracts, but it did not change the output. Below is the summary of commands and the results:

tckedit wholeBrain.tck pathway.tck combined.tck
(number of streamlines : 2000511)
sift2 combined.tck wmfod_norm.mif -tck_weights_out all_weights
tckedit combined.tck -include ROI1 -include ROI2 -tck_weights_in all_weights -tck_weights_out weights_pathway pathway_sift2.tck ( number of streamlines : 359 )

The green streamlines are the tracts after removing outliers, which I aim to extract again after sift2. The streamlines shown in direction-specific colours are what tckedit produced after sift2, and they include some irrelevant streamlines.

tckstats combined.tck

        mean      median   std. dev.         min         max       count
     23.4063     10.4075     28.1432     2.46828         125     2000511

tckstats pathway_sift2.tck

        mean      median   std. dev.         min         max       count
     102.897     102.496     8.01789     81.0354     123.331         359

Even though I specified the -number option for tckedit, it could not generate the streamlines and gave me the warning message that " User requested 511 streamlines, but only 359 were written to file". Even if it can write the specified number of streamlines in the output file, those are not from the main pathway of interest.
How can I improve the result of tckedit in terms of streamline number and location?

One more question. How does sift2 allocate weights to each streamline in the tckedit command? Is it based on a one-to-one correspondence between each streamline in the .tck file and each weighting factor in the text file? In that case, should the provided .tck and .txt files be the same size?

I came up with a quick and dirty idea to resolve the problem with unwanted streamlines. I was thinking of running tcksift2 on combined.tck (whole-brain tractogram plus the pathway of interest before removing outliers) first, and then extracting the pathway with the output weights (for instance, 1900 tracts). Following that, outliers would be removed to get the final pathway (e.g. 500 tracts). To create weights for the new pathway, I need to rerun tckedit:
tckedit newPathway_OutliersRemoved.tck -include ROI1 -include ROI2 -tck_weights_in Pathway_weights_beforeRemovingOutliers -tck_weights_out finalweights
However, with this approach, the input weights file (1900 weights) is larger than the new .tck file. I tested it. It created the output weights file, but I don’t know whether it computes correctly. For example, if it allocates the first 500 weighting factors to the output file, then it doesn’t reflect the correct weights. Am I right?

Cheers

rsmith · December 15, 2019, 1:23am

Hi @NeuroSh,

However, after running sift2, the initial fibers can not be extracted with tckedit by using the same ROIs.

You do not want to simply extract the weights of the manually-delineated streamlines. Some streamlines from the whole-brain tractogram may additionally satisfy the criteria to be attributed to your pathway of interest, and will contribute to the reconstruction-based representation of the underlying fibre densities, and must therefore contribute to the quantification of connectivity of that bundle. Basically, whatever criteria you have applied to generate the reconstruction of the pathway of interest, need to be re-applied to the whole concatenated tractogram post-SIFT2.

(number of streamlines : 2000511)
( number of streamlines : 359 )

So to be clear, the target tractography output - after removal of outliers - contains 511 streamlines?

Even though I specified the -number option for tckedit, it could not generate the streamlines and gave me the warning message that " User requested 511 streamlines, but only 359 were written to file".

Bear in mind that tckedit cannot generate streamlines; it only selects them. All the -number option does is terminate the command once that number of streamlines have been selected from the input file, such that any remaining streamlines in the input track file that would otherwise satisfy your criteria for selection are just ignored, and the output track file contains precisely the number of streamlines that you requested.
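To make the "select, don't generate" behaviour concrete, here is a minimal sketch in plain shell/awk. It is not tckedit itself: it assumes a hypothetical text file with one line per streamline, holding 1 if that streamline satisfies the selection criteria and 0 otherwise, and `select_up_to` is an illustrative name.

```shell
# select_up_to FLAGS_FILE N
# Print the 1-based indices of the first N lines whose flag is 1, then stop.
# If fewer than N lines pass, all passing lines are printed and nothing more.
select_up_to() {
    awk -v n="$2" '
        $1 == 1 { print NR; if (++kept >= n) exit }
    ' "$1"
}

# Demo: six hypothetical "streamlines", three of which satisfy the criteria.
flags=$(mktemp)
printf '%s\n' 1 0 1 0 0 1 > "$flags"

select_up_to "$flags" 2   # stops early once two matches have been found
select_up_to "$flags" 5   # only three matches exist, so only three are returned
```

The second call mirrors the warning quoted above: asking for more streamlines than satisfy the criteria cannot make the extra ones appear.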

As a sanity check, you should be taking your tckedit call that you are performing on the whole concatenated tractogram post-SIFT2, and running it on your targeted tracking file. I’m expecting that to result in selection of 359 or fewer streamlines, at which point you need to diagnose why your tckedit call is not selecting all of the streamlines within your bundle of interest, in a manner that is entirely independent of concatenation with the whole-brain reconstruction or SIFT2.

How does sift2 allocate weights to each streamline in the tckedit command? Is it based on a one-to-one correspondence between each streamline in the .tck file and each weighting factor in the text file? In that case, should the provided .tck and .txt files be the same size?

Yes and yes. The streamlines data are in a fixed order in the track file, and the numerical values in the SIFT2 output text file are in a fixed order, and so they should contain the same number. If tckedit runs to completion with the -tck_weights_in option, and in the course of scanning through the entire track file it discovers that these two files do not contain the same number of entries, it will issue a warning, as this is highly suggestive of a data management problem (i.e. some processing step has been applied to one of these files but not the other).
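The positional correspondence can be illustrated with a small sketch. It assumes a weights file with one value per line (optionally preceded by header lines starting with `#`) and a hypothetical file listing the 1-based indices of the streamlines that were kept; tckedit does not write such an index file, and `weights_for_indices` is an illustrative name only.

```shell
# weights_for_indices WEIGHTS_FILE INDEX_FILE
# Print the weights whose position matches the listed streamline indices.
weights_for_indices() {
    awk '
        FILENAME == ARGV[1] { keep[$1] = 1; next }   # first file read: indices to keep
        /^#/                { next }                  # skip header/comment lines
                            { i++; if (i in keep) print $1 }
    ' "$2" "$1"
}

# Demo: four weights, of which streamlines 2 and 4 were selected.
weights=$(mktemp); idx=$(mktemp)
printf '%s\n' '# hypothetical header line' 0.5 1.2 0.8 2.0 > "$weights"
printf '%s\n' 2 4 > "$idx"

weights_for_indices "$weights" "$idx"   # weights at positions 2 and 4
```

Taking the first two values of the file instead would return the weights of streamlines 1 and 2, which is why a weights file only makes sense alongside the exact track file it was produced for.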

For example, if it allocates the first 500 weighting factors to the output file, then it doesn’t reflect the correct weights.

As above, there is no way by which the tckedit command can locate the appropriate streamline weights that correspond to streamlines that you have extracted in a previous step. However, if all of the steps in going from the whole concatenated tractogram to the subset of streamlines of interest are done using both -tck_weights_in and -tck_weights_out, then the final output weights file should contain those weights corresponding to those streamlines in the final output track file. You can sanity-check this by simply comparing the number of streamlines in a track file (tckinfo) with the number of lines in a weights file (wc -l).
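A minimal sketch of that sanity check follows. It counts values rather than raw lines, on the assumption that a weights file may begin with header lines starting with `#` (which plain `wc -l` would include) and may hold the values either one per line or space-separated. The function names are illustrative, and the expected count is whatever tckinfo reports for the matching track file.

```shell
# count_weights WEIGHTS_FILE
# Number of weight values, ignoring lines that start with '#'.
count_weights() {
    awk '!/^#/ { n += NF } END { print n + 0 }' "$1"
}

# weights_match_count WEIGHTS_FILE EXPECTED_COUNT
# Succeeds only if the file holds exactly EXPECTED_COUNT weights.
weights_match_count() {
    [ "$(count_weights "$1")" -eq "$2" ]
}

# Demo with a hypothetical three-streamline weights file.
demo=$(mktemp)
printf '%s\n' '# hypothetical header line' 0.7 1.1 0.9 > "$demo"
if weights_match_count "$demo" 3; then
    echo "weights and streamline count agree"
else
    echo "mismatch: track and weights files do not correspond"
fi
```

In practice the second argument would be the count for the track file in question, e.g. 359 for pathway_sift2.tck above.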

I’ve added a GitHub issue to try to catch misuse in this context and immediately provide feedback that the two files don’t correspond if that is the case.

Cheers
Rob

ThomasHMAC · December 19, 2019, 8:26pm

Hi all and @rsmith,

I have tried to replicate the method above for targeted track experiment (ROI to ROI). First, I generated my pathway of interest by

tckgen FOD pathway.tck ACT seed_image target_image -seeds 0 -seed_unidirectional -stop -select 50000 -nthreads 6 -force.

Then I followed the above method with a WB tractogram (5M). In my case, the number of streamlines post-SIFT2 (50227) was larger than the original pathway of interest (50k). You mentioned that "Some streamlines from the whole-brain tractogram may additionally satisfy the criteria to be attributed to your pathway of interest". With that explanation, I further used the post-SIFT2 pathway to generate a connectivity matrix and pathway assignment. I expected the pathway assignment to contain only one column of the target region (2 in my case); however, there is a mixture of index numbers (0, 1, 2) from the 1st row to the 226th row, and from the 226th row onward the index number is always 2. I suspect that these correspond to the 227 streamlines that I got after SIFT2. With regard to the connectivity matrix, it contains 0.0001538481 and 0.1167868869; which is the correct SC in this case?

tck2connectome pathway_sift2.tck combined_ROI_template pathway_connmat.csv -assignment_radial_search 2 -scale_invlength -scale_invnodevol -out_assignments pathway_assignment_v1 -tck_weights_in pathway_weight.txt -vector

Thank you,

Thomas

rsmith · January 2, 2020, 12:27am

I further used the post-SIFT2 pathway to generate a connectivity matrix and pathway assignment. I expected the pathway assignment to contain only one column of the target region (2 in my case); however, there is a mixture of index numbers (0, 1, 2) from the 1st row to the 226th row, and from the 226th row onward the index number is always 2. I suspect that these correspond to the 227 streamlines that I got after SIFT2. With regard to the connectivity matrix, it contains 0.0001538481 and 0.1167868869; which is the correct SC in this case?

I think there’s a bit going on behind the scenes here that’s not being fully explained, and I’m not entirely sure whether or not any of it is actually required.

I suspect what you’ve done is produced some sort of modified parcellation image that you are then using to try to generate a “simplified” connectivity matrix from which the pathway of interest can then be trivially isolated. However the details of exactly what you’re doing here are requisite information for interpreting the rest of your description.

Moreover, it would seem to me that whatever mechanism you are using to obtain the number of 50227 streamlines corresponding to the pathway of interest is the selection of the pathway of interest: the sum of the weights of those 50227 streamlines is the estimated connectivity of the pathway, and subsequent generation of a connectome matrix is entirely superfluous. In many instances I would advocate using connectivity matrix generation in this context only because it provides a mechanism by which to perform that selection; but from your description this seems to not be a necessary step.
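Summing the weights needs nothing more than a text tool. A minimal sketch, assuming the weights file produced by -tck_weights_out is plain text, with any header lines starting with `#` (`sum_weights` is an illustrative name):

```shell
# sum_weights WEIGHTS_FILE
# Sum every numeric value in the file, skipping lines that start with '#'.
sum_weights() {
    awk '!/^#/ { for (i = 1; i <= NF; i++) s += $i }
         END   { printf "%.6f\n", s }' "$1"
}

# Demo with a hypothetical three-streamline weights file.
demo=$(mktemp)
printf '%s\n' '# hypothetical header line' 0.5 1.25 2 > "$demo"
sum_weights "$demo"
```

Applied to the weights file of the extracted pathway (pathway_weight.txt in the command above), the printed value is the connectivity estimate for that bundle.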

If your only concern is the use of -scale_invlength and -scale_invnodevol:

-scale_invlength should absolutely not be used in conjunction with SIFT / SIFT2. This is an incomplete heuristic correction of one specific reconstruction bias, which SIFT / SIFT2 remove in a data-driven way; applying this scaling factor after SIFT / SIFT2 will therefore introduce a bias in the opposite direction.
-scale_invnodevol, if you insist on using it, can be applied post hoc by just manually calculating the volumes of the two relevant nodes. While tck2connectome internally applies this scaling factor to each streamline individually, the scaling factor is identical for all streamlines within a given edge, and so can instead be applied once to the whole edge.
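As a sketch of that post hoc scaling: the snippet below assumes the per-edge factor is the reciprocal of the mean of the two node volumes, i.e. 2 / (V_a + V_b), with volumes expressed as voxel counts. That exact form is an assumption here and is worth checking against the tck2connectome documentation for your version; `scale_edge_invnodevol` is an illustrative name.

```shell
# scale_edge_invnodevol WEIGHT_SUM VOL_A VOL_B
# Scale a summed edge weight by the assumed factor 2 / (VOL_A + VOL_B).
scale_edge_invnodevol() {
    awk -v s="$1" -v a="$2" -v b="$3" 'BEGIN {
        if (a + b == 0) { print "0.000000"; exit }   # guard against empty nodes
        printf "%.6f\n", s * 2 / (a + b)
    }'
}

# Demo: hypothetical weight sum of 120 for nodes of 300 and 500 voxels.
scale_edge_invnodevol 120 300 500
```

Because the factor depends only on the two nodes, applying it once to the summed weights is equivalent to applying it to every streamline before summing.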

Rob

ThomasHMAC · January 8, 2020, 7:44pm

Hi again,

Thank you for your input. My plan was to estimate the connectivity of a pathway; can you confirm whether my understanding of the selection and quantification of streamlines is correct:

a). Selection of pathway of interest

1.tckgen WB
2.SIFT2 
3.tck2connectome atlas
4.connectome2tck

I can directly extract the structural connectivity between 2 ROIs in the connectome matrix

b). Quantification of connection strength

1.pathway of interest by tckgen -unidirectional -include -seed_image -select
2.Merge pathway of interest + WB
3.SIFT2
4.Extract pathway of interest by tckedit
5.Structural connectivity of pathway is the sum of all streamlines weight

The option -unidirectional in method (b) would quantify only streamlines from ROI_1 and terminate ROI_2, hence the structural connectivity is computed by all streamlines from ROI_1 to ROI_2 as opposed to method (a) in which the structural connectivity represent track counts from both direction. In other posts, you have stated that method (a) would provide track count to be more biological meaningful than method (b), is the term biological meaningful in this context suggesting the true number of streamlines corresponding to a bundle between two regions? If so what are the benefits of method (b) in which number of streamlines corresponding to a bundle can be larger compared to method (a).

Thomas

rsmith · January 19, 2020, 9:27am

a). Selection of pathway of interest

The fact that you’re not combining targeted tracking with a whole-brain tractogram here does make this process entirely standard. I’m however not sure therefore why it’s actually being presented here? What purpose does this “selection of pathway of interest” serve? Obviously for part b1 you need to have at least some knowledge of where your pathway is and what parcels those streamlines are assigned to, but it’s not clear the mapping from this list of commands to “selection of pathway of interest”.

It’s also not clear why:

I can directly extract the structural connectivity between 2 ROIs in the connectome matrix

needs to be stated if your principal interest is part b?

The option -unidirectional in method (b) would quantify only streamlines from ROI_1 and terminate ROI_2, hence the structural connectivity is computed by all streamlines from ROI_1 to ROI_2 as opposed to method (a) in which the structural connectivity represent track counts from both direction.

Firstly, I wouldn’t use the word “quantify” when discussing tckgen -unidirectional, especially if SIFT2 is also a part of the discussion. Given SIFT2 is about obtaining quantitative properties in the context of endpoint-to-endpoint connectivity, using the word “quantify” in less appropriate ways within the same discussion is only likely to lead to confusion.

There’s still a couple of issues with your description that I’m concerned are indicative of misunderstanding, so I’ll be as explicit as I can to be safe.

It’s true that a pathway of interest extracted from a whole-brain tractogram (let’s imagine GM-WM interface seeding for simplicity) will include streamlines that have been generated through traversing in both directions, whereas targeted tracking from ROI 1 to ROI 2 will obviously only include streamlines that traversed in one direction. However the former should not be thought of as “representing track counts from both directions”, nor should the latter be thought of as contrary to this. While it’s possible that streamline trajectories generated from ROI 1 to ROI 2 might not be identical to those generated from ROI 2 to ROI 1 (for that reason quite often both experiments are performed and then the streamlines are combined), this is non-ideal behaviour; similarly, a measure of pathway connection strength should be as independent as possible from the direction of propagation of the underlying streamlines from which the quantity is derived.

This all leads to “the structural connectivity represent track counts from both direction”. It would be accurate to say that structural connectivity is calculated based on streamlines that were generated in both directions. However the two directions should not be treated as having individual track counts. Not only is targeted tracking in isolation non-quantitative, but also the diffusion signal is symmetric, streamlines trajectory reconstruction would ideally also be symmetric, and the process of extraction of the connectivity strength (whether by manually summing streamlines weights or by generating a symmetric connectivity matrix*) is also direction-agnostic.

Now this might have just been clumsy wording, in which case I’ve probably gone overboard, but I wanted to give the full answer anyway.

In other posts, you have stated that method (a) would provide track count to be more biological meaningful than method (b)

I certainly hope I’ve never said that; please provide a link if I’ve given a description somewhere that might give this impression.

Methods a and b are based on precisely the same model and target quantity. The differences are:

Method b includes a denser reconstruction of the pathway of interest than method a. This means that the number of streamlines within the pathway of interest will be larger, and hence the issues associated with streamlines being discrete entities may be reduced. E.g. one would have more confidence in drawing conclusions about a difference of 1000 to 1100 streamlines than a difference of 10 to 11 streamlines. It does however introduce complex interactions due to regularisation, but I’ll spare the details of that here.
For method a, the mechanism of selection of streamlines corresponding to the pathway of interest is restricted to the ROIs at the streamlines endpoints. In method b, use of tckedit means that conditions other than the endpoints of the streamlines can be used in determining which streamlines are attributed to the pathway of interest and which are not.

is the term biological meaningful in this context suggesting the true number of streamlines corresponding to a bundle between two regions?

There’s no such thing as “the true number of streamlines”. Streamlines exist only in the domain of digital reconstruction. What we purport to provide is a streamline count (or, for SIFT2, sum of streamline weights) that is proportional to the intra-cellular cross-sectional area of the white matter pathway reconstructed by those streamlines, which we purport is a reasonably useful marker of biological “connectivity”.

If so what are the benefits of method (b) in which number of streamlines corresponding to a bundle can be larger compared to method (a).

The main benefit is for small bundles where selection of streamlines from a whole-brain reconstruction yields a streamline count that is too small to be trustworthy. If your pathway of interest is reasonably large, I probably wouldn’t bother with the added complexity.

Rob

ThomasHMAC · January 28, 2020, 5:35pm

Thank you for your thorough answer, it actually helped me lots and I am sorry for the confusion. I will try my best to clarify both methods that I have done.

First of all, my original purpose was conducting a “ targeted tracking” from ROI_1 to ROI_2 and estimate the structural connectivity. The ROI masks were created from GM regions of the Shen parcellation atlas, which was already co-registered to individual diffusion space, and these ROIs represent the left and right rostral middle frontal regions. The reason I chose these ROIs for a mini “targeted tracking” experiment is to replicate similar findings of a previous study. Hence, I used the steps described in method_b to perform my mini “targeted tracking” experiment.

b)

1.tckgen FOD.mif pathway.tck -algorithm iFOD2 -act 5TT.mif -seed_image ROI_1 -include ROI_2 -seeds 0 -seed_unidirectional -select 50000 -stop -backtrack

2.tckedit WB.tck pathway.tck combined.tck -nthreads 0

3.tcksift2 combined.tck FOD.mif combined_probweight.txt

4.tckedit combined.tck pathway_postsift2.tck -include ROI_1 -include ROI_2 -tck_weights_in combined_probweight.txt -tck_weights_out pathway_prob_weight -ends_only -nthreads 0

5.Structural connectivity of pathway is the sum of all streamlines weight

If i used the option -ends_only in step 4 above, I believe this would only test the ends of each streamline against my included ROIs? If so how does it different compared to radial_search in tck2connectome in method_a because my ROIs’ mask correspond to node 146 and node 11 of the Shen atlas?

At this point, I will try to experiment tckgen without the option unidirectional.

This post was dated back to 2017 and your answer was “ If you want the track counts to be biologically meaningful, approach b) is required. ”. I may have interpreted your explanation in a very wrong way, please correct me if I am wrong.

Since tck2connectome and connectome2tck were recommended here as well as in the BATMAN tutorial to select connections of interest, I have followed similar steps, described in method_a below.

a)

1.tckgen -algorithm iFOD2 -samples 4 -act 5TT.mif -seed_gmwmi gmwmi.mif FOD.mif WB.tck -select 5M

2.tcksift2 WB.tck FOD.mif WB_probweight.txt

3.tck2connectome WB.tck shen_atlas conmat_shen.csv -assignment_radial_search 2 -scale_invnodevol -tck_weights_in WB_probweight.txt -zero_diagonal -symmetric -out_assignments

4.connectome2tck -nodes node146, node11 -exclusive WB.tck assignments.csv pathway.tck -tck_weights_in -prefix_tck_weights_out

With the generation of the connectivity matrix, I can manually extract the structural connectivity between node 146 and node 11. I further used connectome2tck in step 4 to extract the streamlines between node 146 and node 11. As you mentioned here, there is variability in the total number of streamlines between my healthy subjects, ranging from 38 to 661. Although I have no prior ground knowledge of this frontal connection, I believe the number of streamlines was too small in this case.

Again, thank you for your efforts to explain such complicated matter

Thomas

rsmith · March 1, 2020, 4:30am

First of all, my original purpose was conducting a “ targeted tracking” from ROI_1 to ROI_2 and estimate the structural connectivity.

This is immediately an erroneous conflation up front. On one hand, you have “targeted tracking versus whole-brain fibre-tracking”. On the other, you have “I want to estimate a quantity for my pathway of interest related to structural connectivity”. These are two separate things. The former is an experimental design detail, the latter is the derivative outcome of calculations to be used in e.g. hypothesis testing.

Because of this conflation, I can potentially interpret this statement in two different ways, which I need to disentangle because it’s possible the discussion is being made more complex than it needs to be:

“I want to estimate the structural connectivity from ROI_1 to ROI_2, and I want to do so using targeted tracking only”.
This is possible (and the details of such are explained in a manuscript I have mentioned about 100 times on this forum now; @jdtournier ), but comes with consequences, as failing to consider the whole white matter connectivity limits one’s ability to provide robust quantification.
“I want to estimate the structural connectivity from ROI_1 to ROI_2, but I have made an erroneous assumption that targeted tracking must be used in doing so”.
Quantifying the connectivity from ROI_1 to ROI_2 is not predicated on the reconstruction method that is referred to as “targeted tracking”. Ultimately generation of a connectome matrix is just “quantifying the connectivity from ROI_1 to ROI_2 for all possible ROI_1 and ROI_2”, and this can be done using a pre-generated whole-brain tractogram.

If i used the option -ends_only in step 4 above, I believe this would only test the ends of each streamline against my included ROIs?

This statement is correct, but it’s not the difference between tckedit and tck2connectome that I was hinting at (see next point below).

If so how does it different compared to radial_search in tck2connectome in method_a because my ROIs’ mask correspond to node 146 and node 11 of the Shen atlas?

The exact node indices don’t matter. What matters is that there is an explicit process required in determining, for each streamline, what parcels / ROIs it should be assigned to and which it should not. In tckedit, this is based either on intersection of all streamline vertices with the image(s), or intersection of the two streamline endpoints with the image(s) in the case of -ends_only. In tck2connectome, multiple mechanisms are provided, with the default being the radial search. The consequences of this difference can be foreseen based on the nature of your streamlines reconstruction and the nature of your parcellation. We spoke a bit about this issue in this manuscript.
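The difference between the two tckedit modes can be shown on toy data. This sketch assumes a hypothetical text format, not anything MRtrix3 produces: one streamline per line, each field giving the ROI label (0 for none) of the voxel under successive vertices. The function names are illustrative, and the radial search used by tck2connectome is a different mechanism again that is not modelled here.

```shell
# select_any_vertex ROI_A ROI_B LABELS_FILE
# Indices of streamlines with at least one vertex in each ROI (default -include behaviour).
select_any_vertex() {
    awk -v a="$1" -v b="$2" '{
        hit_a = 0; hit_b = 0
        for (i = 1; i <= NF; i++) { if ($i == a) hit_a = 1; if ($i == b) hit_b = 1 }
        if (hit_a && hit_b) print NR
    }' "$3"
}

# select_ends_only ROI_A ROI_B LABELS_FILE
# Indices of streamlines where each ROI contains an endpoint (cf. -ends_only).
select_ends_only() {
    awk -v a="$1" -v b="$2" 'NF >= 2 {
        if (($1 == a || $NF == a) && ($1 == b || $NF == b)) print NR
    }' "$3"
}

# Demo: streamline 1 ends in both ROIs, streamline 2 only passes through them,
# streamline 3 touches ROI 1 alone.
labels=$(mktemp)
printf '%s\n' '1 0 0 2' '0 1 0 2 0' '1 0 0 0' > "$labels"
select_any_vertex 1 2 "$labels"
select_ends_only  1 2 "$labels"
```

The second streamline is the kind that makes the two modes disagree: it traverses both ROIs but terminates in neither.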

I do have a better mechanism for dealing with ROIs that are intended to apply to streamlines terminations at the grey matter that’s been collecting dust. Given how often this issue keeps coming up I’m going to have to try to get that moving again.

This post was dated back to 2017 and your answer was “ If you want the track counts to be biologically meaningful, approach b) is required. ”

There’s a false equivalence here between the two methods you proposed in the post above, and the proposals in the post to which you linked.

The relevant text is:

I tried different ways of extracting the tracks between 2 regions like
a) tckgen with seed as left thalamus and include region as left frontal, 1M tracks
b) whole brain tractography 100M ->SIFT 10M->connectome2tck between the 2 regions
c) tckgen with seed as thalamus 10M ->connectome2tck between the 2 regions

None of those proposals include concatenation of whole-brain and targeted-tracking data as in your proposal b.

The intent of my statement:

“ If you want the track counts to be biologically meaningful, approach b) is required. ”

, is that approaches a) and c) as presented in that post are not quantitative.

My suspicion is that you are being led astray because I’m providing precisely correct & generalised answers, whereas your classifications are not consistent with mine from the outset. So let’s instead try a simplified explanation, with the caveats expressed more appropriately as such.

For a particular bundle of interest:

1. If not targeting quantification, perform targeted tracking.
2. If targeting quantification, perform whole-brain tracking, SIFT2, pathway extraction.

Technically, it’s possible to perform quantification using 1.; but that would require further explicit description of exactly how that works.
Technically, it’s possible to augment 2. using targeted tracking; there may be benefits for some experiments, but it increases complexity.
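The quantification route (whole-brain tracking, SIFT2, pathway extraction) could be sketched roughly as follows, starting from an existing whole-brain tractogram. All file names are placeholders, -ends_only is optional and depends on how the ROIs are meant to apply, and the `run` wrapper with its `DRY_RUN` variable is purely an illustration device so the commands can be inspected before anything is executed.

```shell
# run CMD ARGS...
# With DRY_RUN=1 (the default here) print the command; otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# quantify_pathway FOD WHOLE_BRAIN_TCK ROI_1 ROI_2
quantify_pathway() {
    fod=$1; wb=$2; roi1=$3; roi2=$4
    # 1. Derive one weight per streamline for the whole-brain tractogram.
    run tcksift2 "$wb" "$fod" wb_weights.txt
    # 2. Select the pathway, carrying the matching weights along with it.
    run tckedit "$wb" pathway.tck -include "$roi1" -include "$roi2" -ends_only \
        -tck_weights_in wb_weights.txt -tck_weights_out pathway_weights.txt
}

quantify_pathway FOD.mif WB.tck ROI_1.mif ROI_2.mif
```

The connectivity estimate is then the sum of the values in pathway_weights.txt. Setting DRY_RUN=0 would execute the commands, which requires MRtrix3 and real input files.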

Rob