5ttgen waiting for creation of new file

isAarya · July 20, 2017, 11:00pm

Hi All,

I have a diffusion image with 126 directions and b-values as b=0,500,1500,2500,3500

Since I have more than one b-value, I decided to obtain multi-tissue response function using

 dwi2response msmt_5tt subj.mif 5tt.mif wm.txt gm.txt csf.txt

But in order to use this, I am required to generate a 5tt.mif file from my T1 image, which I do using

5ttgen fsl T1.mif 5tt.mif

But for some reason, this is taking extremely long to run (more than 1.5 days)…it does not complete.

All I get is this on my command prompt:

5ttgen fsl T1image.mif T1_5tt.mif -nocrop -mask bet_preproc_subj_mask.nii.gz
5ttgen: 
5ttgen: Note that this script makes use of commands / algorithms that have relevant articles for citation; INCLUDING FROM EXTERNAL SOFTWARE PACKAGES. Please consult the help page (-help option) for more information.
5ttgen: 
5ttgen: Generated temporary directory: /data/home/5ttgen-tmp-NGXSSH/
Command:  mrconvert /data/home/T1image.mif /data/home/5ttgen-tmp-NGXSSH/input.mif
Command:  mrconvert /data/home/bet_preproc_subj_mask.nii.gz /data/home/5ttgen-tmp-NGXSSH/mask.mif -datatype bit -stride -1,+2,+3
5ttgen: Changing to temporary directory (/data/home/5ttgen-tmp-NGXSSH/)
Command:  mrconvert input.mif T1.nii -stride -1,+2,+3
Command:  mrcalc T1.nii mask.mif -mult T1_masked.nii.gz
Command:  fast T1_masked.nii.gz
Command:  run_first_all -s L_Accu,R_Accu,L_Caud,R_Caud,L_Pall,R_Pall,L_Puta,R_Puta,L_Thal,R_Thal -i T1.nii -o first
5ttgen: Waiting for creation of new file "first-L_Accu_first.vtk"

This seems to be running but doesn’t complete. I am not sure whether it has run into an error or not.

Am I missing something here?

Kind regards

Antonin_Skoch · July 20, 2017, 11:18pm

Dear @isAarya,

You may find it useful to look at the following threads discussing the same issue:

Antonin

rsmith · July 21, 2017, 1:54am

OK, full disclosure needed since people are still hitting this, in the hope that the next victim finds these details.

It is my most up-to-date hypothesis (thanks to help from @aszymanski) that the vast majority of users encountering this issue do so due to the following combination of events / details:

Registration of the T1 image to the DWIs using FSL flirt. In its default usage, flirt resamples the input image to match the target image. This is unnecessary for ACT. It is better to have flirt provide the calculated transformation matrix, then apply that linear transformation to the T1 image (either using flirt again, or a combination of transformconvert and mrtransform), such that the image header transformation is altered but the image grid / intensities are not.
The nucleus accumbens is very small. If the image resolution is very coarse, FSL first is unable to find any voxels that lie entirely within the surface representation of this structure (reported in the FIRST log files as “no interior voxels to estimate mode”), and subsequently fails. Later aspects of the run_first_all script also fail because image / surface files that should have been created are instead absent.
If SGE is available, run_first_all will delegate sub-cortical segmentation processes to the local job queue, and terminate itself with a successful return code before waiting for completion of those jobs. This makes it difficult to tell whether FIRST has run locally but failed, has been submitted to SGE and will later succeed, or has been submitted to SGE but will later fail; hence the “Waiting for creation of file” message.
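For the first point, the header-only registration could look something like the sketch below. This is a rough illustration under stated assumptions, not the script's own method: the function name register_t1_header_only and the intermediate file names are hypothetical, the inputs are assumed to be NIfTI (flirt does not read .mif), and a 6-DOF rigid registration is assumed to be adequate. It is worth overlaying the result on the DWIs to confirm the transform has been applied in the intended direction.

```shell
# Sketch only: function and file names are hypothetical.
# Estimates the T1 -> DWI rigid transform with flirt, then applies it to the
# T1 header with mrtransform so that the voxel grid and intensities are untouched.
register_t1_header_only() {
    t1_nii=$1    # T1 image in NIfTI format
    b0_nii=$2    # b=0 (or mean b=0) image in NIfTI format
    out_img=$3   # registered T1, still on its original voxel grid

    # -omat saves the transformation matrix; no -out, so no resampled image is written
    flirt -in "$t1_nii" -ref "$b0_nii" -dof 6 -omat t1_to_dwi_flirt.mat || return 1

    # Convert the FSL matrix convention into an MRtrix3 linear transform
    transformconvert t1_to_dwi_flirt.mat "$t1_nii" "$b0_nii" flirt_import t1_to_dwi_mrtrix.txt || return 1

    # Without -template, mrtransform modifies only the header transformation
    mrtransform "$t1_nii" -linear t1_to_dwi_mrtrix.txt "$out_img" || return 1
}
```

Usage would be along the lines of `register_t1_header_only T1.nii mean_b0.nii T1_coreg.mif`, with T1_coreg.mif then passed to 5ttgen fsl.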

With fully up-to-date MRtrix3 code, the messages / warnings / errors reported by 5ttgen fsl have changed a little from what you have shown here:

If your system does not have SGE configured, it will terminate with an error if FIRST has failed to segment all sub-cortical structures.
If your system does have SGE installed, it will provide a message stating that it is going to wait for FIRST to produce the output files expected from it, but that if FIRST fails these files will never be created and the script will be left hanging indefinitely.

While I can continue to try to provide more meaningful warnings / errors in this script given the complexities of the run_first_all implementation (which is out of my control), I would suggest that the fundamental cause of the issue is that your T1 image has been undesirably down-sampled.

Cheers
Rob

isAarya · July 24, 2017, 12:04am

Hi,

So I have investigated the cause for 5ttgen error with my own data which has resolution of 0.75 x 0.75 x 0.75 mm.

I am definitely not having issues due to SGE.

Hence I had a go on implementing run_first_all on the original T1 image (no registration to diffusion space).

I hit the same error; i.e. on the command prompt FSL exited with a numerical code but did not generate the ***_all_fast_firstseg.nii.gz and ***_all_fast_origsegs.nii.gz files, which it does produce when the FIRST segmentation completes successfully.

On closer inspection, the log file complained as follows:

create shapeModel 
done creating shapeModel 
-0.0365444 0.100175 0.0195898 
0.00522365 0.0199885 -0.139243 
-0.161422 -0.0248745 -0.00742343 
NEw done imodes transform
Error: cannot find image /data/home/first-nativeT1-L_Accu_first

This stems from FSL failing to segment the left accumbens (most likely because the structure is so small), as also mentioned by @rsmith.

The same happens after registering the T1 image to diffusion space using FSL FLIRT, as can be seen from the first post above.

I had other subjects that also failed FSL segmentation, for the left pallidum and brainstem.
i.e. running FSL on the command prompt exits with a numerical code. It is up to the user to check whether the necessary output files have been generated and whether they are correct, because for one subject FSL completed without error but the segmentation result was totally incorrect.

I believe the developers of 5ttgen can’t debug this.

But as feedback, I would like to propose a timeout error if the 5ttgen script is taking unusually long time for its completion. Or terminating the 5ttgen script if it comes across similar message “cannot find image ***_first” in their FSL log files. And also a better error message telling the end user if the script has terminated either due to FSL segmentation or some other step in the 5ttgen script. Because in my instance the script neither terminated nor gave any error message.

Sorry for the extremely long post, but I very much wanted to use the 5ttgen script with my data (I will test 5ttgen freesurfer now).

It is very hard for users to acquire data with multiple b-values and gradient directions, and being prevented by third-party software from using some extremely good MRtrix functions is kind of a letdown.

I am just wondering whether there will be a workaround for this issue in future, so that if FSL segmentation fails users can follow another approach and still be able to use the 5ttgen script.

Kind regards.

IA

rsmith · July 24, 2017, 7:15am

Error: cannot find image /data/home/first-nativeT1-L_Accu_first

@isAarya: Could you possibly try re-gridding your T1 image to 1mm isotropic, and then running run_first_all? It’s possible that there’s a more general failure of nucleus accumbens segmentation, which is not specific to very low-resolution data but more generally due to deviations from data that the method was tuned on.

But as feedback, I would like to propose a timeout error if the 5ttgen script is taking unusually long time for its completion.

With fully up-to-date MRtrix3 code, if SGE is not present on the system, the 5ttgen fsl script should no longer wait for these files; it should immediately exit as soon as run_first_all completes and it detects that this file is not present. I could add a timeout, but this will only have an effect on systems where SGE is enabled.
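In the meantime, a user-side guard is possible on systems where the script does end up waiting. The following is only a minimal sketch, assuming a POSIX shell; wait_for_file is a hypothetical helper and is not part of MRtrix3 or FSL. It polls for a file and gives up after a fixed number of seconds, which is the behaviour a timeout would provide.

```shell
# wait_for_file PATH TIMEOUT_SECONDS [POLL_SECONDS]
# Hypothetical helper: returns 0 as soon as PATH exists,
# or 1 if TIMEOUT_SECONDS elapse without it appearing.
wait_for_file() {
    path=$1
    timeout_s=$2
    poll_s=${3:-10}    # polling interval, 10 seconds unless specified
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout_s" ]; then
            echo "Timed out after ${waited}s waiting for $path" >&2
            return 1
        fi
        sleep "$poll_s"
        waited=$((waited + poll_s))
    done
    return 0
}
```

For example, `wait_for_file first-L_Accu_first.vtk 7200 60` run inside the script's temporary directory would report a failure after two hours rather than leaving the question open indefinitely. The two-hour figure is arbitrary and would need adjusting to the queue in question.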

Or terminating the 5ttgen script if it comes across similar message “cannot find image ***_first” in their FSL log files.

I’ve considered something like this; but it ends up being a lot of work on my end trying to implement and manage it. Because there may be a delay in the creation of those files; or the files may be created, but the underlying error has not yet occurred; or FIRST’s text file outputs may change in a newer FSL version.

And also a better error message telling the end user if the script has terminated either due to FSL segmentation or some other step in the 5ttgen script. Because in my instance the script neither terminated nor gave any error message.

This should be better in the most up-to-date code; upon failure (in the absence of SGE) an error is provided, along with the location of the FIRST error text files. If SGE is enabled, a clearer warning is provided regarding why the script is waiting for that file, and the fact that the script is unable to detect a failure of FIRST.

I am just wondering whether there will be a workaround for this issue in future, so that if FSL segmentation fails users can follow another approach and still be able to use the 5ttgen script.

There are always workarounds and alternatives.

If you are able to manipulate the run_first_all settings in order to get it to complete, you can then use the -continue option that’s available in all MRtrix3 Python scripts to instruct the script to skip those steps that have already been completed (admittedly it’s a little hacky, but it can be useful).
5ttgen fsl was never intended to be the one and only mechanism for generating 5TT files for ACT. I went to the effort of creating the algorithm module in the Python library, which allows for the addition of new mechanisms for performing any given task, partly for this reason (it’s also how dwi2response provides so many different algorithms for response function estimation). So if anyone ever feels inclined to experiment with different software packages for deriving the 5TT image, or workarounds for getting successful tissue segmentations on certain types of data, you can create a new script file in lib/mrtrix3/_5ttgen/, and the 5ttgen script will make it available to you at the command-line.

isAarya · July 30, 2017, 11:39pm

@rsmith

Did you mean re-gridding, i.e. resizing the image file to 1mm isotropic using MRtrix3 mrresize?

rsmith · July 31, 2017, 1:59am

Yep; just mrresize -voxel 1.0.

There’s definitely been a pattern of reports of this error being with T1s of low resolution, and in my own testing I’ve found that upsampling the image to 1mm resolution results in a successful FIRST run. A future update to 5ttgen fsl will do this automatically, while also warning the user that their data are of a low resolution. What I would like to know is whether having T1 data of a higher resolution also tends to result in FIRST failure.

isAarya · August 1, 2017, 12:51am

@rsmith

I did resize the data to 1mm isotropic using mrresize imagedata.nii outputdata.nii -voxel 1.0

FSL command line

run_first_all -i outputdata.nii -o /data/home/first-output/

On the command line I used the following to check the logs for errors (or any error output):

cat *.logs/*.e*

It did not display any error/information.

But I also checked the output files: my_output_name_all_fast_firstseg.nii.gz and my_output_name_all_fast_origsegs.nii.gz.

The output is not correct (snapshot of “_all_fast_firstseg.nii.gz”).

Resizing only overcame the previous error of not being able to find L_Accumbens.

Kind regards.

rsmith · August 1, 2017, 2:17am

Yeah, something’s gone well and truly wrong there. You’d need to have a look at the intermediate steps of run_first_all, both at 0.7mm and 1.0mm, to properly understand what’s happening in each case. Unfortunately I can’t help a whole lot in those details.

vasiliki · September 14, 2017, 2:58pm

Hi Rob,

I have the problem in this thread and I have a question. I work on a cluster. When I submit the command on the terminal individually for each subject it works but when I submit it as part of a script it collapses. It gets stuck on the “Waiting for creation of new file “first_all_none_firstseg.nii.gz””. Do you think this may have to do with the fact that I am working on the cluster? It doesn’t really give me an error. It is just stuck. Any ideas about how to troubleshoot this?

Cheers,
Vasiliki

jdtournier · September 14, 2017, 4:19pm

Hi @vasiliki, have you seen this thread? I have a strong suspicion this might fix it…

rsmith · September 15, 2017, 4:34am

No, the fact that it reaches the “waiting for creation of new file” message means it can’t be an error at the SGE job submission step.

I do need to make changes to how the success / failure of FIRST is detected; but even though there’s a chance that with those changes the script would still be able to proceed, it doesn’t explain why the FIRST job is (semi-)failing when run on SGE. It’s quite unusual for the same subject to succeed when run locally, yet fail via SGE (assuming the FSL version is the same); I’d be curious to find out exactly why that is the case…

If you can send to me two script temporary directories, from running the same script on the same subject but once successful from running locally and once failed via SGE (preferably having been left running for a good couple of hours to make sure the job has in fact executed), that could prove informative. Unfortunately I can’t provide a “simple” solution to fix processing the data via SGE; but if you really need to be able to process the data in this way in the near future, let me know and I can send you a modified script that should get you by until I implement a more robust and user-friendly fix.

Rob

isAarya · September 15, 2017, 5:45am

Hi,

The “Waiting for creation of new file” message indicates that your 5ttgen script has failed, most likely due to a failure in the FSL FIRST function that the 5ttgen script calls for you.

I can’t say much about the cluster, as I have not used one.

FSL FIRST fails when it is unable to segment the supported structures (Putamen, Caudate, Nucleus Accumbens, Globus Pallidus, Hippocampus, Amygdala, Thalamus and Brainstem).

I would suggest running your dataset using

run_first_all -i t1_image -o output_name

When run from the command prompt, the function will terminate with a numerical code. This does not imply that the segmentation has completed successfully.

You need to carry out

cat *.logs/*.e*

to check for any errors in your log files. Only if there are no errors has your segmentation completed.
Have a look at this FSL guide: http://web.mit.edu/fsl_v5.0.8/fsl/doc/wiki/FIRST(2f)UserGuide.html

I would still check the output of the file to ensure it is indeed correct.
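The checks described above can be bundled into one step. This is a minimal sketch, assuming the log layout referred to earlier (error files matching *.logs/*.e*); check_first_run is a hypothetical helper, not an FSL command, and the name of the expected output file must be supplied by the user since it depends on the -o prefix and the FIRST options used.

```shell
# check_first_run EXPECTED_OUTPUT [WORKDIR]
# Hypothetical helper: returns 0 only if EXPECTED_OUTPUT exists and is non-empty
# AND every error log under WORKDIR/*.logs/ is empty.
check_first_run() {
    expected=$1        # e.g. output_name_all_fast_firstseg.nii.gz
    workdir=${2:-.}    # directory holding the *.logs folder
    rc=0
    if [ ! -s "$expected" ]; then
        echo "Missing or empty output: $expected" >&2
        rc=1
    fi
    for f in "$workdir"/*.logs/*.e*; do
        [ -e "$f" ] || continue    # the glob matched nothing
        if [ -s "$f" ]; then
            echo "Errors reported in $f:" >&2
            cat "$f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

A zero return here only means that the files exist and the logs are clean; it says nothing about whether the segmentation is anatomically sensible, so a visual check is still needed.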

To resolve the error, you may then need to change your resolution to 1.0 mm or more.

The other reason why FSL FIRST would fail can be found in these links
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=fsl;c4072a90.1411

https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=fsl;f6eb1454.1412

Note: in my study FSL failed because the orientation of the images was not in MNI standard and because of the high resolution of 0.75mm (shown in the figure above). Once I fixed the orientation and the resolution, the error was resolved.

Also, you mentioned that you have tested on individual subjects. Was this carried out for all subjects, or just some random subjects?

If it was carried out for just some random subjects, it is still likely that 5ttgen has failed for one of your datasets/subjects, hence the outcome.

rsmith · September 15, 2017, 8:15am

Note: in my study FSL failed because the orientation of the images was not in MNI standard and because of the high resolution of 0.75mm (shown in the figure above). Once I fixed the orientation and the resolution, the error was resolved.

OK, the orientation might be a point of interest; the strides of the image are modified to -1,+2,+3 (i.e. LAS) before FIRST is run. This has certainly solved alignment issues for me in the past, but maybe it’s not an adequate solution…

@vasiliki: In addition to sending the temporary directories, are you able to test different MRtrix3 versions within the SGE environment? If so, there’s a new code branch fsl_checkfirst_function that contains a proposed fix; it’s quite difficult for me to test such changes since I don’t have access to a SGE-enabled system, so any feedback would be very much appreciated.

vasiliki · September 25, 2017, 10:41am

Hi Rob,
Thank you very much for your reply. So, here is the progress to date. A Bash script in the SGE-enabled environment runs fine, but an SGE script in the SGE-enabled environment fails. I did not try disabling the SGE environment. If I understand correctly, the reason is that in the case of the SGE script, each subject/file is allocated as one job to one server according to the predefined threads. However, each subject needs to undergo a number of processes that are then themselves submitted as SGE jobs. So it is like a “nested” SGE submission (if I can say that), with the first-level outer loop referring to the subject level and the second-level inner loop to the individual command level. I think that my system cannot handle this hierarchy and therefore it collapses. I will email you the files you requested later this week so you can have a go.
Cheers,
Vasiliki