I used -seed_gmwmi for this instance of tckgen. I also used -seed_image, which did not cause the tcksift error. Any suggestions as to what the problem is?
Running into this problem again. I’m doing the Fixel/AFD walkthrough and just created 20 million tracks with tckgen, but tcksift only seems to find 1 track. I ran tckinfo and it returned "actual count in file: 19999920". I’m working on a cluster and cleared out my home directory in case space was the issue again, but that didn’t help. Here are the exact lines of code I used:
[as695@blade04 template5]$ tcksift tracks_20_million.tck FOD_template.mif tracks_2_mill_sift.tck -term_number 2000000
tcksift: [100%] Creating homogeneous processing mask
tcksift: [100%] segmenting FODs
tcksift: [100%] mapping tracks to image
tcksift: [WARNING] Only 1 tracks read from input track file; expected 20000000
tcksift: [ERROR] Filtering failed; desired number of filtered streamlines is greater than or equal to the size of the input dataset
I’m not too familiar with this part of the code (@rsmith might want to help me out here…), but this does look a bit odd… So to try to get to the bottom of this:
what is the size of the tracks_20_million.tck:
ls -l tracks_20_million.tck
what does this report:
tckstats tracks_20_million.tck
are you sure about the space issue:
df -h tracks_20_million.tck
has anything happened to FOD_template.mif between tckgen and tcksift? Anything that would cause the streamlines to no longer overlap with the image…?
OK, nothing suspicious in there at all… I’m out of ideas. Can you try running the tcksift command with the -debug flag, see if that shows up anything useful?
Actually, there is something a bit odd in your tckstats output: the max length is over 5000 - you have a 5m long streamline in there…?!? And this despite your explicit -maxlen 250 option to tckgen… Something doesn’t sound right here, but I’m not sure I know what could possibly cause this…
Ah jeez, that’s no good. Good catch, I didn’t even notice that. I’m currently running tcksift with -debug and while it’s not finished, there seem to be issues with finding SH peaks…
tcksift: [ 0%] segmenting FODs...
tcksift: [DEBUG] launching thread "sink"...
tcksift: [DEBUG] waiting for completion of thread "source"...
tcksift: [ 33%] segmenting FODs...
tcksift: [DEBUG] failed to find SH peak!
tcksift: [ 34%] segmenting FODs...
tcksift: [DEBUG] failed to find SH peak!
tcksift: [DEBUG] failed to find SH peak!
tcksift: [DEBUG] failed to find SH peak!
tcksift: [DEBUG] failed to find SH peak!
tcksift: [DEBUG] failed to find SH peak!
It continues like that until FOD segmentation reaches 100%. It’s currently 57% of the way through mapping tracks to the image.
Also, not sure if this is relevant, but it can’t seem to find a config file?:
$ tcksift tracks_20_million.tck FOD_template.mif tracks_2_mill_sift.tck -term_number 2000000 -debug
tcksift: [INFO] reading config file "/etc/mrtrix.conf"...
tcksift: [DEBUG] reading key/value file "/etc/mrtrix.conf"...
tcksift: [DEBUG] No config file found at "/home/linux/.mrtrix.conf"
No need to worry about the ‘failed to find SH peak’ messages, they don’t necessarily indicate an outright failure - just that a particular starting point failed, but there are typically multiple restarts (note to self: check that one). Also, no point in retrying with the full option names: that’s a feature of MRtrix3 - it would have failed if there had been any ambiguity. I don’t see anything suspicious in what you’re showing…
Also, no point in retrying with the full option names: that’s a feature of MRtrix3 - it would have failed if there had been any ambiguity.
I see! That’s clever & useful. Good to know.
Yeah, I don’t really know what’s going on. I will say that I ran into something similar yesterday with a subject, on a much smaller tckgen threshold. tcksift was giving me the same error of only finding one streamline. I reran tckgen on the subject and tcksift worked the second time around. The only thing I changed was the order of the flags in tckgen. Hopefully this second run of tckgen on my FOD_template will lead to the same result of tcksift deciding to work! Thanks for your help, I’ll keep looking into my data and post if I see anything out of the ordinary.
tcksift: [WARNING] Only 1 tracks read from input track file; expected 20000000
tcksift: [INFO] Proportionality coefficient after streamline mapping is 3.7425955080267309e-05
OK, that’s weird: It claims only 1 track was read, yet the proportionality coefficient is of the order of magnitude I would expect had a 20M whole-brain tractogram been read successfully. That would suggest there’s something wrong with the delimiters in the track file, and the whole tractogram is being read as a single streamline. But more fundamentally, tckstats and tcksift use the same back-end code to read streamlines files, so there shouldn’t be such a drastic difference in outcomes…
If you can’t find a solid way to reproduce the behaviour, we’ll probably need access to some example data.
Forgive me for the delayed response! I was able to generate a SIFT file with my second run of tckgen, where I altered -max/minlen to -max/minlength. I’d be more than happy to share the original tckgen file that SIFT failed on, and also to generate a new tckgen file the same way I did the original one to see if the error replicates. Let me know if you’re still interested and I’ll send over the files via Dropbox.
Any news on how you solved the issue? I’m facing the same problem after running tckgen (20 M tracks) and tcksift (reducing to 2M tracks) on the cluster:
tcksift: [WARNING] Only 1 tracks read from input track file; expected 20000000
However, according to tckstats, I only have 5409747 tracks.
tckstats: [100%] Reading track file
tckstats: [WARNING] expected 20000000 tracks according to header; read 5409747
mean      median    std. dev.   min       max       count
60.6027   54.7897   37.128      9.97026   250       5409747
This is weird since in the process of tckgen, I see that at least 18M tracks are created (please see below):
I have no recollection of having received exemplar data from @aszymanski (correct me if I’m wrong), so I don’t know if the source of the problem was isolated or if some mitigating strategy was found.
According to tckinfo, I have 20000000 tracks:
It’s important to know here that tckinfo is only reporting the contents of the .tck file header, which is just a set of keys and values similar to that used by the .mif / .mih format. So “count: 20000000” is just a pair of text strings near the start of the file, which is echoed without ever actually reading the streamlines data. That number is updated by tckgen dynamically as the .tck file is generated; so we can at least say that tckgen thinks that it has written 20M streamlines.
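To make the header-only behaviour concrete, here is a minimal Python sketch (not part of MRtrix3) that reads just the text header of a .tck file and never touches the streamline data. The function name read_tck_header and the max_header_bytes guard are hypothetical, and the sketch assumes the header opens with the line "mrtrix tracks" and is closed by a line reading "END".

```python
def read_tck_header(path, max_header_bytes=1_000_000):
    """Return the header key/value pairs of a .tck file as a dict of strings.

    Assumption (not taken from the MRtrix3 source): the header is plain text,
    begins with "mrtrix tracks" and is closed by a line reading "END".
    """
    header = {}
    with open(path, "rb") as f:
        first = f.readline().decode("ascii", errors="replace").strip()
        if first != "mrtrix tracks":
            raise ValueError(f"unexpected first line: {first!r}")
        # Stop at END so the binary streamline data is never read.
        while f.tell() < max_header_bytes:
            raw = f.readline()
            if not raw:
                break  # end of file reached before END
            line = raw.decode("ascii", errors="replace").strip()
            if line == "END":
                return header
            key, sep, value = line.partition(":")
            if sep:
                header[key.strip()] = value.strip()
    raise ValueError("no END line found in header")
```

Calling read_tck_header("tracks_20_million.tck").get("count") would then return whatever text is stored under "count", regardless of how many streamlines the file really contains.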
If you were to run tckinfo -count, which explicitly reads through all streamline data in order to determine the streamline count, I would expect it to report the same number of streamlines as does tckstats. I still do not know why the number of streamlines reported by tcksift is different again, but (combined also with the seemingly intermittent nature of the issue) it nevertheless points to some form of data corruption.
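For readers who want to see what "reading through all streamline data" involves, here is a rough Python sketch of a counter that compares the header count with the number of delimiters found in the data. It is not the MRtrix3 implementation. The function name count_streamlines is hypothetical, and the sketch assumes (from my reading of the format description) that the data are float32 triplets starting at the byte offset given in the "file:" header entry, that a NaN triplet closes each streamline, and that an Inf triplet marks the end of the file.

```python
import math
import struct

def count_streamlines(path, chunk_triplets=65536):
    """Return (header_count, counted, saw_end_marker) for a .tck file.

    Assumptions about the format (my reading, not the MRtrix3 source):
    float32 triplets start at the offset given in the "file:" header entry,
    a NaN triplet closes each streamline, an Inf triplet closes the file.
    """
    header_count, offset, datatype = None, None, "Float32LE"
    with open(path, "rb") as f:
        # Parse only the header entries needed to locate and decode the data.
        while True:
            raw = f.readline()
            if not raw:
                raise ValueError("no END line found in header")
            line = raw.decode("ascii", errors="replace").strip()
            if line == "END":
                break
            key, sep, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key == "count":
                header_count = int(value)
            elif key == "file":
                offset = int(value.split()[-1])
            elif key == "datatype":
                datatype = value
        if offset is None:
            raise ValueError("header has no 'file:' entry")
        if datatype not in ("Float32LE", "Float32BE"):
            raise ValueError(f"unsupported datatype: {datatype}")
        triplet = struct.Struct("<3f" if datatype == "Float32LE" else ">3f")

        f.seek(offset)
        counted = 0
        while True:
            chunk = f.read(triplet.size * chunk_triplets)
            usable = len(chunk) - len(chunk) % triplet.size
            if usable == 0:
                # Data ran out without an Inf triplet: possibly truncated.
                return header_count, counted, False
            for x, _y, _z in triplet.iter_unpack(chunk[:usable]):
                if math.isnan(x):
                    counted += 1  # delimiter: one streamline finished
                elif math.isinf(x):
                    return header_count, counted, True
```

A header count larger than the counted value, together with a missing end marker, would be consistent with a file whose writing was cut short. Pure Python will be slow on a 20M-streamline file, so this is only an illustration; tckinfo -count and tckstats remain the practical tools.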
I’d need to have access to some raw data in order to properly investigate. However, given I’m currently constrained by satellite internet, if you are indeed able and willing to share such data, it would be very much appreciated if you could first reproduce the fault with a smaller streamline count before uploading.
Indeed, it reported the same number of streamlines.
I think that the corruption of data was due to the fact that I was exceeding my quota limit in the cluster I’m using (hence it could not continue writing the file), so it has nothing to do with MRtrix. I’m testing this at the moment.
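For anyone wanting to rule out a full disk before a long tckgen run, here is a minimal Python sketch that reports the free space on the filesystem holding the output directory. The helper name free_space_gib is hypothetical. Note the assumption: this only reflects the filesystem as a whole, and a per-user quota on a cluster may well not show up here, so the cluster's own quota tool is still the authoritative check.

```python
import shutil

def free_space_gib(path="."):
    """Free space, in GiB, on the filesystem that holds `path`.

    Caveat: per-user quotas are not necessarily reflected in this number.
    """
    return shutil.disk_usage(path).free / 2**30

print(f"{free_space_gib('.'):.1f} GiB free")
```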
The same issue is being encountered here. I may have come up with a reason and a possible solution.
I am running the HCP’s analysis pipeline on a set of 80 HCP-YA subjects, with the same pipeline applied to all. However, tcksift stops with the same warning for some of the subjects (1 tract is read instead of 25M). Here is the tckstats output for one of those subjects (Subject ID for a probable replication: #206222):
tckstats: [100%] Reading track file
tckstats: [WARNING] expected 25000000 tracks according to header; read 24788701
mean      median    std. dev.   min       max       count
52.8172   38.7908   45.7504     2.5       249.239   24788701
Here is the tckstats output for one of the successfully SIFTed subjects (ID: 102513):
tckstats: [100%] Reading track file
mean      median    std. dev.   min       max       count
56.5981   41.8272   48.3274     2.5       249.137   25000000
According to this comparison and previous posts, I guess the problem lies in the mismatch between the expected and the actual number of tracts. So, I think editing the header with the actual number of tracts may be an option to overcome this problem but I don’t know how to do it. Any ideas?
Just repeating the reminder at the head of that page that it’s intended to act as a historical reference, not as a “recommended pipeline” that is kept up-to-date and maintained.
tcksift stops with the same warning for some of the subjects (1 tract is read instead of 25M).
With the tckstats example you show, it’s only fractionally less than the intended 25M streamlines that are read, which is quite drastically different from 1. Can you confirm that this is just a matter of different subjects having been used as exemplars for the tcksift vs. tckstats warning messages? Or are tcksift and tckstats reading a different number of streamlines from the same track file?
I think editing the header with the actual number of tracts may be an option to overcome this problem but I don’t know how to do it. Any ideas?
The only problem that this manipulation would overcome would be the issuing of the warning itself regarding the mismatch (this is actually what tckgen & other commands do internally: as more streamlines are written to an output track file in batches, the “count” field in the header of the file is updated accordingly). It would not alter the number of streamlines that can actually be read by any MRtrix3 command, just their expectation of how many will be read. So if the problem is that tcksift can only read one streamline from the file, then modifying the header so that tcksift knows immediately upon reading the track file header that it can only expect to read one streamline from that file doesn’t actually fix your fundamental problem, which is the fact that only one streamline can be read, and SIFT doesn’t work too well in that scenario.
Nevertheless, if you’re interested in this kind of data “hacking”, the kind of software you’re looking for is colloquially referred to as a “hex editor”. This is the kind of software that a developer such as myself may well employ in trying to diagnose the origin of such a fault.