Hello experts, I am looking for feedback on a method I came up with for combining data in a multi-site study. I have data from 8 scanners (the same sequence was used at each site: b=1000, 32 diffusion directions). Originally I had hoped to do a brute-force combination of the data in the fixel pipeline, ultimately controlling for site variance in the fixelcfestats design. That approach did not agree with the pipeline: I was unable to create an acceptable FOD template image using a subset of participants randomly selected from each scanner. Instead, I have done the following "2-step population template creation/registration", which seems to be working:
1. For each scanner individually, ran all preprocessing and created 8 scanner-specific population templates (this included using scanner-specific response functions).
2. Created a "meta" population template using the 8 scanner-specific templates as inputs.
3. Performed a 2-step registration, wherein each subject's FOD image was first transformed to its respective scanner-specific template and then to meta-template space.
4. Performed fixelcfestats with the 8 scanners entered as dummy covariates in the design file.
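For concreteness, the design and contrast for step 4 could be laid out along these lines; this is only a sketch, and the cell-means coding of the 8 sites, the column order and the file names are illustrative assumptions rather than the actual files used:

# design.txt: one row per subject; columns 1-8 are scanner indicator (dummy)
# variables acting as per-site intercepts, column 9 is the covariate of
# interest (here, age).
cat > design.txt << 'EOF'
1 0 0 0 0 0 0 0 34.2
1 0 0 0 0 0 0 0 29.8
0 1 0 0 0 0 0 0 41.5
0 0 0 0 0 0 0 1 27.0
EOF
# contrast.txt: tests the age effect while the scanner columns absorb
# site-specific offsets.
cat > contrast.txt << 'EOF'
0 0 0 0 0 0 0 0 1
EOF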
I tested this approach using age as a continuous predictor. The scanner-specific fixelcfestats results (I analyzed each scanner separately as well) appear comparable to the results when I combine all the subjects and control for scanner (following the 2-step FOD registration/transformation).
So, at this point, what I’m looking for is any feedback/advice/criticism of whether this is a defensible way to combine data from multiple scanners. I would generally appreciate any advice on how to combine data across sites to perform fixel-based analyses.
As always, I appreciate the hard work of the developers!
Do you have the same number of subjects from each site? Depending on the number of subjects you might end up with more or less smooth site-specific templates which might bias the registration. Make sure that the images are intensity normalised across the whole population (and bias field corrected).
This interpolates the images twice. Instead, I would register each subject directly to the final template with mrregister, using the composed warps (transformcompose) as initialisation.
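A rough sketch of this, assuming deformation fields for subject-to-scanner-template and scanner-template-to-meta-template already exist; the file names are placeholders, and the transformcompose input order and the warp format expected by -nl_init should be checked against the documentation:

# Compose the two warps into a single subject -> meta-template transformation.
transformcompose sub_to_site_warp.mif site_to_meta_warp.mif sub_to_meta_warp.mif
# Either apply the composed warp in one go, so the FOD image is interpolated
# only once on its way to meta-template space...
mrtransform sub_wmfod.mif -warp sub_to_meta_warp.mif sub_wmfod_in_meta.mif
# ...or use it only to initialise a direct registration of the subject to the
# meta-template, as suggested above.
mrregister sub_wmfod.mif meta_template_fod.mif -nl_init sub_to_meta_warp.mif -nl_warp sub2meta_warp.mif meta2sub_warp.mif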
What do you mean by "acceptable FOD template"? What did you try? The latest version of population_template uses three stages (rigid, affine and then nonlinear). You can check the intermediate templates in the temporary directory that population_template creates (use -nocleanup to keep it).
If the linear registration fails: Do images from different scanners have large rotations between their scanner coordinate systems? If so, I would register them to a single subject or template using a rigid transformation model. Then apply those transformations to the image header (mrtransform -linear without regridding to a template image) to roughly align the images without interpolating the intensity values. This should give you a good starting position for population_template.
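A hedged sketch of that header-only alignment, with placeholder file names: mrregister estimates a rigid transform to a chosen reference, and mrtransform -linear without a -template argument only updates the image header, so the intensity values are never interpolated.

mrregister subject_fod.mif reference_fod.mif -type rigid -rigid sub2ref_rigid.txt
mrtransform subject_fod.mif -linear sub2ref_rigid.txt subject_fod_aligned.mif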
No, the number of subjects varies considerably by site. I was running into trouble during the coregistration phase of the population template creation (if I remember correctly, there were errors referencing scaling). However, now that I think about it, I have since identified some problematic scans; it’s possible it would work now that I’ve removed the last of the “iffy” data.
Yes, the registration was failing, but I haven't tried the new version yet, so I'm having that installed on our cluster now. I will also throw in the -nocleanup flag. I'm not sure how to check for rotations between scanner coordinate systems, but that could certainly be part of the problem; if the new version of the software still fails, I'll try the rigid transformation method you suggested. Thanks again!
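(One way to check, sketched with placeholder image names, is to print each image's voxel-to-scanner header transform with mrinfo and compare the rotation parts across sites:)

for img in site1_example.mif site2_example.mif site3_example.mif; do
    echo "$img"; mrinfo "$img" -transform
done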
Just throwing my 2 cents in here: if registrations keep failing, perhaps you could try to create an FA template (from all data) instead and use the FA-based registrations to get the FODs into template space, although I'm not sure whether the warps you get that way will be useful for the later computation of fibre cross-section.
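A minimal per-subject sketch of that idea (paths are placeholders; the usual FOD-to-template registration is simply swapped for an FA-based one):

dwi2tensor sub_dwi.mif - -mask sub_mask.mif | tensor2metric - -fa fa_dir/sub_fa.mif
population_template fa_dir/ fa_template.mif
mrregister fa_dir/sub_fa.mif fa_template.mif -nl_warp sub2template_warp.mif template2sub_warp.mif
mrtransform sub_wmfod.mif -warp sub2template_warp.mif sub_wmfod_in_template.mif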
Were there substantial changes to the new version of dwiintensitynorm? I am intensity-normalising ~900 scans in total, and it has been running for 4 days across 20 compute cores. I have performed this step before on over 200 subjects (using version 0.3.15) and it didn't take anywhere near this long (an hour at most). I have confirmed it is not stuck or hung, just proceeding extremely slowly. Here's the command I ran in the newest release (allocated 20 cores):
>dwiintensitynorm -force $DIR/preproc/dwiintensitynorm/dwi_input/ $DIR/preproc/dwiintensitynorm/mask_input/ $DIR/preproc/dwiintensitynorm/dwi_output/ $DIR/preproc/dwiintensitynorm/fa_template.mif $DIR/preproc/dwiintensitynorm/fa_template_wm_mask.mif -debug -nthreads 20
Any ideas about what could be causing this to run so slowly?
Oh, one key difference that might be important is that I'm using symbolic links to the DWI and mask data this time. Not sure if that could have anything to do with it, but just in case…
One key difference that I can think of is that we changed the temporary folder for such scripts to be created in the folder you run the script from, rather than in your system's temp folder. You could test whether explicitly providing the -tempdir option with the path to your system's temp folder has a significant impact on the processing speed.
Without more information I can only add some random guesses:
dwiintensitynorm uses population_template, which as of this commit has double the number of linear registration stages. I have never run it on such a huge population, but it should not increase the computation time by two orders of magnitude.
The dirty pages issue could cause such a decrease in performance, but I can not think of any recent changes to the code that might have made this problem worse. Can you monitor the CPU and machine load with top? If there is a swapping issue, the load average will be high while the actual CPU utilisation is low.
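(As a rough illustration of that check on a typical Linux node, nothing MRtrix3-specific:)

top -b -n 1 | head -n 5    # load average plus the us/sy/wa CPU breakdown
vmstat 5 3                 # si/so columns show pages swapped in/out per second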
What is the old version you are referring to? We changed the behaviour of dwiintensitynorm regarding symbolic links in November.
population_template’s temporary directory is created inside dwiintensitynorm’s. So if this used to be in RAM and now is on disk, this might impact performance. Also, if it is now in RAM and you run out of RAM then swapping will cause this decrease in performance.
@Thijs, at first I thought that might have been the solution, but it seems there is a second component to what's going on: the -debug flag is also slowing performance to a crawl. In other words, this:
-tempdir /tmp/ -debug
speeds things up a little, but removing the -debug flag:
-tempdir /tmp/
actually puts things back at the speed I would expect. For example, the version without the -debug flag flew through to a place where it crashed in half an hour (crashed because of another issue; see below), but the identical command with the -debug flag is still crawling along 10 hours later. I won't post the entire job output on the forum due to the sheer amount of text (and, potentially, privacy issues), but I'm happy to email it to you or any other developers who are interested.
@maxpietsch The old version was 0.3.15. There wasn't really any swapping happening according to the system readouts (I had allocated more than enough resources). It seems to be a combination of needing to set -tempdir and turning off -debug (see above). These issues might be worth noting in the new instructions for anyone else who might encounter similar problems on HPC systems?
Unfortunately the speedier runtime just led me to the next problem, which is that the population_template step of dwiintensitynorm is throwing warnings:
population_template: e[00;31m[WARNING] large scaling ([1.742296302208011, 11.68378226098585, 2.287510441403197]) in linear_transforms_4/03_S0008_DTI_11.txte[0m
population_template: e[00;31m[WARNING] large shear ([-0.8094410813514853, 0.08274868310296028, -1.115916020834314]) in linear_transforms_4/03_S0008_DTI_11.txte[0m
My only guess as to what's going on is that the data for which this scale/shear warning pops up have 37 volumes (128 x 128 x 60 x 37) instead of 36 (128 x 128 x 60 x 36)… I don't necessarily understand why that would cause the registration to crash, though.
This means that the y-axis for that image is scaled by a factor of 11.7 in relation to the template. Clearly, this warrants a warning…
I would suggest having a look at the input data, masks and resulting FA images in dwiintensitynorm's temporary directory for 03_S0008_DTI_11; I suspect your data is corrupted. If it looks normal, view it in relation to the template: mrview /tmp/dwiintensitynorm-tmp-JHW1CW/fa/03_S0008_DTI_11.mif -overlay.load /tmp/dwiintensitynorm-tmp-JHW1CW/linear_template3.mif. Any large rotations or coordinate flips?
I would suggest looking at the output of dwi2tensor for those images to make sure the gradient directions are kosher.
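For example, something along these lines (placeholder file names); a mangled gradient table usually shows up immediately in the directionally-encoded colour map:

dwi2tensor sub_dwi.mif tensor.mif -mask sub_mask.mif
tensor2metric tensor.mif -fa fa.mif -vector dec.mif
mrview dec.mif    # view with an RGB colour map; e.g. the corpus callosum should run left-right (red)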
Registration should not crash but if it diverges from a reasonable transformation then, in my experience, it is usually due to corrupted data or poor masks. If the data is ok, feel free to send me the command just after "replacing the transformation obtained with" and the corresponding fa image, linear transformations (linear_transforms_*/03_S0008_DTI_11.txt), masks and template image.
What is the actual error message?
-debug in the scripts adds -info to the mrregister calls, which should not impact performance noticeably. However, there is a difference in how the script handles subprocesses in -debug mode. This is quite a complex piece of code to debug… @rsmith might have an idea?
The -debug flag is intended for just that: debugging. It’s not intended for use when running 900 subjects. The way that the Python script library interacts with executed commands when the -debug option is provided is quite different, and it’s inevitable that there will be some decrease in performance; but this should be relatively small. Even if you could robustly prove that the -debug flag is the cause of the slowdown, I’m not sure if there’s anything I could do about it; unless there’s a definite condition that is causing that piece of code to ‘hang’ as such.
Though I am curious to know whether or not the relative slowdown between the presence and absence of -debug holds with a smaller number of subjects.
I’m also slightly concerned about the prevalence of these:
population_template: e[01;31m[ERROR] Command failed:
Specifically the ‘e[01;31m’ bit. Assuming this is what’s appearing on your terminal and hasn’t somehow come into being in the process of copy&pasting the terminal text into the forum, this suggests that the MRtrix3 back-end is failing to properly identify the nature of standard error / output. This may be due to the differences in how the Python run.command() function invokes commands when the -debug flag is used…
How exactly are you executing the script / what terminal emulator / what Python version / are you piping the terminal output to file?
In my large sample, the problem does seem to have been partly due to the masks I was using. I'm not entirely sure why, but I think it's because the masks weren't tight enough: I had (probably foolishly) been using BET to make masks because I had encountered issues with dwi2mask in the past. After I swapped out the masks, dwiintensitynorm was at least able to run without crashing. You are ALSO right that the data is corrupted… I haven't gotten to the bottom of what's causing it, but sure enough, this is what the output of dwi2tensor looks like for several subjects:
@rsmith, lesson learned about the debugging flag. For whatever reason, that turned out to be the culprit behind the incredible processing slowdown; once I removed the debug flag, dwiintensitynorm took less than 5 hours, even though I kept the temporary folders in the directory I was running the script from.
population_template: e[01;31m[ERROR] Command failed:
That seemed to simply be an artifact of the way the HPC system sent the output via email. (If I tell the script to send the output to a specific folder, those timestamps don’t show up.) Sorry for the confusion!
For whatever reason, that turned out to be the culprit behind the incredible processing slowdown; once I removed the debug flag, dwiintensitynorm took less than 5 hours, even though I kept the temporary folders in the directory I was running the script from.
I’ll still have a look into this when I get a chance. Even though that flag is not intended for “general” use, I wouldn’t expect the slowdown to be that severe. So thanks for bringing it to our attention.
That seemed to simply be an artifact of the way the HPC system sent the output via email. (If I tell the script to send the output to a specific folder, those timestamps don’t show up.) Sorry for the confusion!
Those aren't timestamps: they're ANSI escape codes, which are used to add colour to the terminal output (e.g. normally an [ERROR] would appear as bold red text). The MRtrix3 backend does its best to detect where the terminal output is being sent, and enables or disables both this feature and the line-refreshing progress bar (replacing it with a plainer progress bar if deemed necessary). So the fact that you're seeing these codes suggests that this detection mechanism isn't working as well as it could, which is why knowing exactly how you are invoking the script may be helpful to us.
I also raised this question because these two points may be inter-related. If the MRtrix3 commands invoked by the script are generating larger amounts of text than they should be (to display the line-refreshing progress bars), and the Python library is running in -debug mode (where it literally has to capture and display one character at a time and write all of that text to local storage), it may be a case of compounding effects leading to the slowdown.
I see, okay. Here's my code (from when I ran the script with the -debug flag; filesystem names deidentified). It's a simple bash script that I submit to our LSF queue (bsub < myscript.sh).
This looks like you're trying to set the path to a file where the standard output should be written. However, it looks as though the option should be "#BSUB -o" rather than "#BSUB -oo". I'm not sure whether the system will correctly parse what you have provided there; but the fact that you said you received the output via email would suggest that it did not. So the slowdown / crash you were experiencing with -debug may be specific to whatever mechanism LSF uses to cache standard output when the -o option is not provided.
MRtrix3 writes almost all of its terminal output to standard error, not standard output. We reserve standard output only for data that are intended to be passed between commands via piping, or numerical results that are intended to be caught in a script. So you want to use the #BSUB -e option as well.
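For illustration, a hypothetical job header along those lines (queue, resource requests and paths are placeholders, not the script that was actually used):

#!/bin/bash
#BSUB -n 20
#BSUB -o /path/to/logs/dwiintensitynorm.%J.out    # standard output
#BSUB -e /path/to/logs/dwiintensitynorm.%J.err    # standard error, where MRtrix3 writes its messages

dwiintensitynorm dwi_input/ mask_input/ dwi_output/ fa_template.mif fa_template_wm_mask.mif -nthreads 20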
I’d be curious to know whether or not one or both of those suggestions alleviates the -debug slowdown issue…
Edit: A quick dwi2response test gave identical run times with and without -debug. So I reckon there's a good chance it's an LSF thing.
In reply to your edit: I haven't had a chance to test this systematically, but I think you are correct; I have noticed slowdowns on several other processes that are usually extremely fast. Thanks again for your ideas and help! For what it's worth, with the help of this thread (and others… haha), my analyses not only ran all the way through, but also produced some incredibly cool and replicable results.