Population Template Non-Linear Registration Error

soph · August 6, 2020, 3:44pm

Hi everyone,

I’ve been running the population_template command on a set of 46 FODs and corresponding masks for a week now. Linear registration completed successfully, but the run suddenly failed on an mrregister command during stage 5 of 16 of non-linear registration. This is the error message I got:

Also, the error.txt file in the template folder contains this:

mrregister /gs/gsfs0/users/mrrc-all/PROJECTS/HOLTZER/MSAGING/MS_PROCESSING/GROUP_LEVEL_PROCESSING/wmfod_input/2019-10049-047-TH_wmfod.mif nl_template3.mif -type nonlinear -nl_niter 5 -nl_warp_full warps_4/2019-10049-047-TH.mif -transformed inputs_transformed/2019-10049-047-TH.mif -nl_update_smooth 2.0 -nl_disp_smooth 1.0 -nl_grad_step 0.5 -force -nl_init warps_3/2019-10049-047-TH.mif -mask1 /gs/gsfs0/users/mrrc-all/PROJECTS/HOLTZER/MSAGING/MS_PROCESSING/GROUP_LEVEL_PROCESSING/mask_input/2019-10049-047-TH_mask.mif -mask2 nl_template_mask3.mif -datatype float32 -nl_lmax 2

If anyone could help shed some light on this, I would really appreciate it! I may try to rerun the command with only 20 FODs to try to speed things up, but I want to make sure this doesn’t happen again.

Thanks,
Soph

maxpietsch · August 7, 2020, 1:51pm

The lack of output of the failed command is a bit surprising and unfortunately makes debugging this guesswork. @rsmith @jdtournier Any idea what scenarios cause no error message being shown?

You should be able to continue the script with the -continue option and a unique file identifier of the last successfully processed file (for instance warps_<X>/<filename>.mif if the last successful command was a call of mrregister). You can find the command log in the scratch directory. I’d make a backup of the temporary scratch directory before rerunning with -continue and first try to find out what went wrong.

A week of processing time for 46 images does sound fairly long. Also, 20 subjects will be about twice as fast, but the slowest and most memory-intensive steps of population_template are the last nonlinear registration stages, which you haven’t even reached yet. Could your computer be running out of RAM? If so, mrregister might have been killed by the operating system once it also ran out of swap space. You could test that by manually repeating the failed command from the scratch directory while monitoring your memory usage (via htop, for instance, or via egrep 'Mem|Cache|Swap' /proc/meminfo). If you do run out of RAM, check the size of your transformed input image and template (mrinfo).
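For the memory check suggested above, a small Python sketch could parse the same /proc/meminfo fields that the egrep command prints while the failed command is re-run in another terminal. This is Linux-only, and the names parse_meminfo, summarise and watch are made up for illustration; none of them are part of MRtrix3.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field name: integer value} dict.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarise(fields):
    """Convert the two fields of interest from kB to GiB."""
    kb_per_gib = 1024 ** 2
    return {
        "ram_available_gib": fields.get("MemAvailable", 0) / kb_per_gib,
        "swap_free_gib": fields.get("SwapFree", 0) / kb_per_gib,
    }


def watch(interval_s=5.0, samples=10, path="/proc/meminfo"):
    """Print available RAM and free swap a fixed number of times."""
    for _ in range(samples):
        with open(path) as handle:
            summary = summarise(parse_meminfo(handle.read()))
        print("RAM available: %.2f GiB, swap free: %.2f GiB"
              % (summary["ram_available_gib"], summary["swap_free_gib"]))
        time.sleep(interval_s)
```

Calling watch() while mrregister is running shows whether available RAM and free swap both head towards zero before the command dies, which is the pattern you would expect if the operating system killed it.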

soph · August 7, 2020, 6:43pm

Thanks for the reply! I tried rerunning the mrregister command and I believe it was a RAM issue. Would cutting down the number of subjects to 20 resolve this problem?

maxpietsch · August 10, 2020, 7:44am

Unlikely. Registration is performed iteratively on image and template pairs, so the runtime depends roughly linearly on the number of subjects while memory requirements stay about constant. On any recent computer with “normal” image resolution and extent you shouldn’t run out of RAM. If this is run on an HPC node, you might need to allocate more RAM.

How large (in voxels) are your input images (mrinfo), and did you request a custom template resolution (-voxel_size)? Did you upsample your input data? Make sure to use masks for all input images, as this speeds up registration and helps reduce the template size. The template size is determined after initial alignment and cropped to the dilated template mask.
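To get a feel for how image size feeds into memory use, the size of a single FOD image can be estimated from the dimensions that mrinfo reports. The sketch below assumes 4 bytes per value (float32) and an even-order spherical harmonic basis, which has (lmax+1)(lmax+2)/2 coefficients per voxel. The function names are hypothetical, and the estimate ignores warp fields and any other buffers mrregister allocates, so treat it as a lower bound only.

```python
def n_sh_coefficients(lmax):
    """Number of coefficients in an even-order SH basis up to order lmax."""
    if lmax < 0 or lmax % 2:
        raise ValueError("lmax must be a non-negative even integer")
    return (lmax + 1) * (lmax + 2) // 2


def fod_image_bytes(dims, lmax, bytes_per_value=4):
    """Approximate size in bytes of one FOD image.

    dims: (nx, ny, nz) spatial dimensions in voxels, as reported by mrinfo.
    bytes_per_value: 4 for float32 (an assumption; adjust if needed).
    """
    nx, ny, nz = dims
    return nx * ny * nz * n_sh_coefficients(lmax) * bytes_per_value


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 1024 ** 3
```

For example, to_gib(fod_image_bytes((nx, ny, nz), 8)) gives the approximate footprint of one lmax = 8 image on an nx × ny × nz grid. Halving the voxel size in every direction multiplies the voxel count, and therefore this figure, by eight, which is why upsampled inputs or a small -voxel_size can push memory use up quickly.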

rsmith · August 23, 2020, 12:15pm

maxpietsch wrote: “@rsmith @jdtournier Any idea what scenarios cause no error message being shown?”

In case it’s relevant, this must be a version of MRtrix3 prior to 3.0.0. The output error string has since been changed to be more explicit in cases where the failed command does not produce any information on stdout or stderr, precisely because the script’s output message is slightly confusing in cases like this.

It’s mostly there for invoked external commands though. If mrregister terminated without producing any terminal output, then not even the signal handler completed, which would suggest that the operating system SIGKILLed it.
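One way to confirm a SIGKILL when re-running the failed command by hand is to launch it from Python and look at the return code: on POSIX systems, subprocess reports a child terminated by signal N as return code -N. A minimal sketch follows; describe_exit and run_and_describe are hypothetical helpers, not MRtrix3 functions, and DEMO_COMMAND is just a stand-in child process that sends itself SIGKILL.

```python
import signal
import subprocess
import sys

# Stand-in for a command that gets killed: a child that SIGKILLs itself.
DEMO_COMMAND = [
    sys.executable, "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
]


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # Negative return codes mean the child was terminated by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "terminated by " + name
    if returncode == 0:
        return "exited normally"
    return "exited with error code %d" % returncode


def run_and_describe(command):
    """Run a command (list of arguments) and describe how it ended."""
    result = subprocess.run(command)
    return describe_exit(result.returncode)
```

Replacing DEMO_COMMAND with the mrregister argument list from error.txt (run from inside the scratch directory) would report "terminated by SIGKILL" if the process was killed from outside, for instance by the kernel's out-of-memory killer or by a cluster scheduler enforcing a memory limit.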