mrconvert and dwicat

I’ve just updated to the latest version of MRtrix 3. It looks great so thanks for all your hard work putting it together.

I have an existing pipeline for multi-shell DWI data (acquired as separate series). As the first step I had been combining the acquisitions together and converting from DICOM to mif in a single step using mrconvert.

I’d now like to take advantage of the new dwicat command to account for differential intensity scaling between the acquisitions.

Is there a recommended way of achieving this? Should I just run mrconvert on each series separately and then combine with dwicat or is there a clever way of combining the commands?

Many thanks and all the best

Stephen

I’m pretty sure the only limitation with dwicat is that it’s not possible to use the -grad or -fslgrad options to supply the DW gradient table, since it would be difficult to work out which input image each DW gradient table should belong to. So that means the command will accept any image format capable of storing the DW gradient table in the header. That includes DICOM and .mif/.mih, and means you should be able to pass your DICOM folders directly as input on the command-line, e.g.:

dwicat DICOM_series1/ DICOM_series2/ DICOM_series3/ dwi.mif

Note that this assumes each of the series you’re trying to concatenate resides in a distinct folder, so that the DICOM imports happen without user interaction. It will also work if they’re all mixed up in the same folder, but that won’t be so easy to automate – in which case you might indeed be better off using mrconvert for the initial import before calling dwicat.
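For completeness, the mrconvert-first route would look something like this (folder and file names are hypothetical):

```shell
# Import each series separately, then let dwicat handle the
# intensity matching and concatenation in a second step
mrconvert DICOM_series1/ dwi1.mif
mrconvert DICOM_series2/ dwi2.mif
mrconvert DICOM_series3/ dwi3.mif
dwicat dwi1.mif dwi2.mif dwi3.mif dwi.mif
```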

Thanks so much Donald, that works perfectly for me.

All the best

Stephen

What’s maybe missing from this discussion currently is what is to be done with the data after dwicat. For instance, if you intend to run dwidenoise, the question is: is the magnitude of the noise equivalent between series before the scaling within dwicat, or after (i.e. whatever caused the image intensities to go up or down between series also caused the noise level to go up and down)? This could break dwidenoise's assumption of fixed noise level within a spatial window if not handled correctly.

I should also note more generally that dwicat has been added to MRtrix3 in the hope that it will prove useful, but it is not something that has gone through a barrage of tests to assess e.g. its influence on the robustness of FBA pipelines or the like. Indeed I personally work with Siemens data and so don’t suffer from the specific issue that this script is intended to address. Also note that the precision of the results does depend a bit on the presence of a brain mask, which of course is difficult to generate robustly even on processed DWI data, let alone unprocessed DWI data. So please do give it a try and let us know if it helps, but don’t consider it a perfectly-robust, MRtrix3-team-designated-requisite™ tool.

Hi MRtrix community,

I am jumping into this thread as it is relevant to my work as well.
I am starting to develop a pipeline for a DWI dataset acquired at 7T. This is a multi-shell protocol, however it was collected as separate single-shell acquisitions. Specifically, the sequence includes: one b=1000 shell (12 encoding directions + 1 b=0 volume), one b=2000 shell (27 volumes, of which 3 are b=0) and one b=3000 shell (53 volumes, of which 5 are b=0). Additionally, 6 b=0 volumes were acquired with the opposite phase encoding direction.

If I understand correctly, multiple shells should be combined together and fed into the relevant commands as one single dwi.mif file (correct?). Given the considerations of @rsmith, I am wondering what the best approach to follow is. I can think of three alternatives:

  • Concatenate the DICOMs as @jdtournier suggested

  • Concatenate the already converted .mif files

  • Do some pre-processing (e.g. dwidenoise) on the separate shells and then combine (?)

Regardless of the specific method, is there a specific order in which they should be concatenated? (e.g. all b=0s stacked at the beginning of each shell?)

Based also on what I read in another post (Troubles with tractography on 7T diffusion data), I checked whether the intensity fluctuates between the different shells. Turns out it does a bit (mean b=0 intensity within same brain mask = 624.135, 619.602, 617.853 for b=1000,2000 and 3000 respectively).
Is this considered problematic? I tried to scale it by fixing the mean b=0 intensity to 1000 as suggested in the mentioned post, but this did not solve the problem…

Sorry if the answers to these questions seem very obvious to you, but I have never handled either multi-shell or 7T data before, so I am trying to define the most appropriate way of proceeding.

Thanks in advance!

Hi Samantha,

I think there’s been a fair bit of residual uncertainty about exactly what to do here. I took on responsibility for producing dwicat even though I don’t have to deal with such data myself, and that means it hasn’t been thoroughly experimented with. But we should probably try to reach something resembling a consensus.

By my logic, since dwidenoise is looking for a noise level, that level should scale with the image intensities (assuming that digitisation is not the principal source of such noise). This means that if the image intensities varied drastically between series, and one wanted to run dwidenoise on the concatenated data, one would want to rescale those intensities using dwicat before dwidenoise. However, there are others more familiar with this technique that I’d want to hear from before advising as much.
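Under that reasoning, the ordering would be rescale first, then denoise; as a sketch with hypothetical file names:

```shell
# Match intensities across series first, so that dwidenoise then sees a
# (roughly) consistent noise level across all volumes of the 4D series
dwicat series_b1000.mif series_b2000.mif series_b3000.mif dwi_all.mif
dwidenoise dwi_all.mif dwi_all_denoised.mif
```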

But if we accept that, there are still a couple of other things to consider:

  • I know that mrhistmatch scale (which performs the grunt work of dwicat) works much better if a decent brain mask is used. But this means firstly deriving a brain mask on raw, unprocessed DWI data, which may be problematic, and secondly deciding how to derive such a mask. For example, if you want to use all of your DWI data in the derivation of that mask, that requires concatenating your data, generating a mask, discarding the concatenated DWI data, and then re-doing the concatenation using dwicat; slightly circular, but hey, if it works…

  • There will be some intrinsic variance within the estimation process of dwicat due to image noise / re-gridding / methodological details. If this variance were to exceed the variance in image intensities between acquisitions, then it may in fact be detrimental to apply dwicat. Part of my hesitation in advocating for its use is that I don’t actually know where that threshold is.

  • If your analysis treats each unique b-value separately, and does not explicitly infer any relationship between intensities across b-values, then the outcome should be just about invariant to what dwicat does. The problem arises principally if, e.g., you’re calculating kurtosis tensors, looking for non-Gaussianity of the signal decay curve, but for one of your b-values the intensities are artificially far smaller or larger than they should be: that scaling effect would completely corrupt the resulting tensor fits. For multi-shell SD, however, it doesn’t actually matter.

  • As discussed elsewhere recently (but currently evading my search attempts), you also want to check the parameters of your sequences. Sometimes when different b-values are acquired using separate executions of the diffusion-weighted sequence, the TE and TR might be shortened for the lower b-values, and this breaks the fundamental assumption of dwicat, which is that the b=0 volumes in each series should look identical.
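The “circular” mask derivation described in the first point above could be sketched as follows (file names are hypothetical; mrcat here performs a naive concatenation used only for mask generation):

```shell
# 1. Naive concatenation along the volume axis, purely for mask generation
mrcat dwi1.mif dwi2.mif dwi3.mif -axis 3 dwi_tmp.mif
dwi2mask dwi_tmp.mif mask.mif
# 2. Discard the naive concatenation; redo it with intensity matching,
#    providing the mask to improve the precision of the scaling estimates
dwicat dwi1.mif dwi2.mif dwi3.mif dwi.mif -mask mask.mif
rm dwi_tmp.mif
```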

  • Concatenate the DICOMs as @jdtournier suggested
  • Concatenate the already converted .mif files

There should be no difference between these two; there’s no information in the DICOMs that would be present in those existing .mifs but wouldn’t make it through dwicat if executing directly on the DICOMs, or vice-versa.

Do some pre-processing (e.g. dwidenoise) on the separate shells and then combine (?)

In addition to the discussion above, here you would have the issue that the 13 volumes present in the b=1000 series really aren’t adequate for dwidenoise to operate on. I don’t know that there’s a hard threshold, but I reckon having more than 27 volumes in order to justify a 5×5×5 neighbourhood is probably a decent rule of thumb?

Turns out it does a bit (mean b=0 intensity within same brain mask = 624.135, 619.602, 617.853 for b=1000,2000 and 3000 respectively).
Is this considered problematic? I tried to scale it by fixing the mean b=0 intensity to 1000 as suggested in the mentioned post, but this did not solve the problem…

That’s a pretty small deviation (<1%), so I wouldn’t consider it problematic. Given that “fixing the mean b=0 intensity to 1000” would by definition make the mean b=0 intensity exactly 1000 in all series and hence solve the problem, there may have been some error in the attempt. But this is nevertheless essentially what dwicat is doing: it works using histogram matching rather than just a mean intensity, and it currently uses the b=0 intensities of the first input series as the reference rather than 1000, but the concept is basically the same.

Rob

Hi Rob,

Thanks for your reply and your comments! I will address some of these and provide some more details on my intentions for the subsequent analyses.

I realise now that I was probably too ambitious in expecting the scaled b=0 images to match at the voxel level, rather than at the mean/median intensity level. In the latter case, and if an equal overall mean intensity suffices, then the approach of scaling by fixing the mean b=0 intensity to 1000 obviously works.

I also came across mrhistmatch scale after I posted my question, using the b=0 from the b2000 and b3000 acquisitions as input images and the b=0 from b1000 as target. I used the same brain mask as mask_input and mask_target, simply obtained with dwi2mask on the b1000 acquisition. However, the mean b=0 intensity of the scaled images was not equal across series. From what you mention, I can assume that this is due to an improper use of the brain mask?

I then wonder: does dwicat also rely on the use of a good brain mask to achieve a good result (if it is basically doing the same as mrhistmatch)? Since this might not be trivial, I think I might be better off scaling the mean intensity “by hand” (with the above-mentioned approach) and then concatenating the series with mrcat instead, no? What do you think?

TR and TE are the same across acquisitions, so that’s good.

My analysis plan includes performing MSMT CSD to then perform tractography (although I am not sure yet of which specific tractography approach) to track projections from 2 deep brain nuclei to the cortex. Would then this be a non-fundamental issue in my case?

Finally, I still have a doubt about this one:

Thanks again!
Sam

I also came across mrhistmatch scale after I posted my question, using the b=0 from the b2000 and b3000 acquisitions as input images and the b=0 from b1000 as target. I used the same brain mask as mask_input and mask_target, simply obtained with dwi2mask on the b1000 acquisition. However, the mean b=0 intensity of the scaled images was not equal across series. From what you mention, I can assume that this is due to an improper use of the brain mask?

Well, if dwicat were to not use mrhistmatch scale, and instead just set the mean b=0 intensity (optionally within a mask) to some fixed value, then one would expect that following dwicat, the mean b=0 intensity (within the mask if a mask was used) would be that fixed value and hence identical across series. So assuming that you either used a mask for both intensity matching and testing of the intensity matching, or you didn’t use a mask in either case, the masking is a red herring.

The reason why the mean b=0 intensities don’t match exactly in this case is because that’s not what mrhistmatch scale does. If it just took the mean intensities and made them equivalent, I’m not sure if that would classify as “histogram matching”; “intensity matching” maybe.
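For reference, the kind of invocation described in the quote would look something like this (file names are hypothetical):

```shell
# Scale the mean b=0 image of the b=2000 series to match that of the
# b=1000 series, restricting the fit to a brain mask on both sides
mrhistmatch scale b0_mean_b2000.mif b0_mean_b1000.mif b0_mean_b2000_scaled.mif \
    -mask_input mask.mif -mask_target mask.mif
```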

Now here’s where things get weird (and this is possibly another source of internal doubt about more widely advocating the use of dwicat). And hopefully I get this right and don’t make a fool of myself.

What’s actually happening is that the image intensities are all being sorted and concatenated into a large linear equation, the solution to which is the scaling factor that makes those histograms most similar to one another. If using mrhistmatch linear, this linear system formulation is necessary in order to simultaneously estimate both a slope and an intercept for modulating the intensities of one image to match another. But in the case of mrhistmatch scale, the linear system Ax = b has A as an N × 1 vector of input image intensities, x as a 1 × 1 “matrix” which is just the scaling factor, and b as an N × 1 vector of target image intensities (there’s some internal trickery in here to deal with the case where the target image may not have the same number of voxels as the input image, but I’ll spare the headache as much as possible).

This gets solved using the Normal Equation (as of 3.0.2):

AᵀA x = Aᵀb

Both AᵀA and Aᵀb are just scalars. So really, what mrhistmatch scale is doing is finding the right multiplicative factor:

x = Aᵀb / AᵀA

that will make x × (input_image · input_image) equivalent to (input_image · target_image), where “·” is a kind of multiplication of “corresponding” intensities between the two images.

This contrasts against just making the mean b=0 image intensities equivalent, which in this notation (and limited typesetting) would look something like:

x = mean(b) / mean(A)

So the fact that mrhistmatch scale is doing something different to just making the means equivalent means (:upside_down_face:) that the means won’t necessarily be equivalent afterwards. Your homework question is: Is the former approach better? :grimacing:
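Writing both estimators out explicitly (with aᵢ the input-image intensities and bᵢ the corresponding target intensities), the two scale factors being compared are:

```latex
% Least-squares scale factor, as solved by mrhistmatch scale via the normal equation:
x_{\mathrm{LS}} = \frac{A^{\mathsf{T}} b}{A^{\mathsf{T}} A} = \frac{\sum_i a_i b_i}{\sum_i a_i^2}
% versus simple matching of the means:
x_{\mathrm{mean}} = \frac{\mathrm{mean}(b)}{\mathrm{mean}(a)} = \frac{\sum_i b_i}{\sum_i a_i}
```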

does dwicat also rely on the use of a good brain mask to achieve a good result (if it is basically doing the same as mrhistmatch)?

I’d say that the precision of dwicat is better if it can be provided with a reasonable mask. It works without one, but the fact that you then have a lot of pure-noise voxels contributing to the intensity matching means that the estimates are not quite as good. I did some testing at the time where I extracted different b=0 volumes, artificially scaled them, and then checked the scaling factors estimated within dwicat, and from memory the errors relative to the actual induced scaling were less if a mask was used. But this was not robust testing across a range of datasets.

Since this might not be trivial, I think I might be better off scaling the mean intensity “by hand” (with the above-mentioned approach) and then concatenating the series with mrcat instead, no?

Better would be to modify dwicat to have a command-line option to use the mean b=0 intensity of each series for determining the appropriate scaling factors instead of mrhistmatch. That would take far less effort overall if you have a lot of data to process. But I would not expect use of the mean intensity to be any less dependent on masking than the mrhistmatch approach.

Would then this be a non-fundamental issue in my case?

If it’s purely single-subject analysis, then it’s likely the case that the whole pipeline is in fact invariant to this process. However if you’re doing a group analysis, and hence using group-averaged response functions, then fluctuations of image intensities within specific shells would bias the tissue decomposition in one direction or another.

Regardless of the specific method, is there a specific order in which they should be concatenated? (e.g. all b=0s stacked at the beginning of each shell?)

Not for any MRtrix3 commands. Anything that is b-value dependent will explicitly group image volumes into “shells”, and interpret them based on the diffusion sensitisation direction (or equivalently to one another in the case of b=0 volumes), and so is order-independent both within and between shells.

Rob

Hi Rob,

After testing different options, I will try to summarise the findings hoping to make some sense.

The battle was between dwicat and

# mean b=0 intensity across the series
sig=$(dwiextract in.mif -bzero - | mrmath - mean -axis 3 - | mrstats - -output mean)
# rescale so that the mean b=0 intensity becomes 1000
mrcalc in.mif 1000 $sig -div -mult out.mif

first using a brain mask, then using no mask. I then plotted the histograms of intensities (mrhistogram) of the mean b=0 volume for each re-scaled shell. I looked at the overlap visually, and then calculated the differences of the AUC. I did the same plotting and calculations for the untouched, unscaled images as a reference.

–> Comparison of the two approaches using a mask:
I first tried to obtain a mask that would be as close to good as possible, by doing 3 iterations of dwi2mask -> dwibiascorrect -> dwi2mask (and some final manual editing to fill remaining holes). The same mask was used across all commands and all shells.
These are the histogram plots of intensities (within mask) of the mean b=0 image for the untouched, mean_scaled and histogram_scaled shells.


No big differences are visible to the naked eye, but it turns out that the differences in AUC are actually smaller for the untouched images than when applying either of the two intensity matching methods. If the criterion is difference magnitude, histogram matching outperforms mean matching.

–> Comparison of the two approaches without using a mask:
Well, here it is interesting that the histograms basically already match when left untouched (funny). The two matching approaches then introduce some variability, in the sense that you can observe some difference in the AUC of the scaled mean b=0. Again, histogram matching outperforms mean matching.

I hope I did not completely misunderstand things or choose a totally useless and irrelevant way of judging which of the approaches works better.
Anyway, given that my histograms already match after simple conversion from DICOM, all this fuss is probably not relevant to my case :sweat_smile:?

I do indeed focus on the single-subjects and I am not really aiming at group-level analysis. So I guess that I can just “not bother” with this and concatenate the different shells with mrcat?

Hi Sam,

Cool to see your investment of effort here. It also highlights how the absence of a mask influences the histogram matching process (and indeed may have influenced your histogram generation process here: they look too coarse along the x-axis). What exactly are you quantifying when you refer to AUC? Is that literally the area under the histogram curve? Because that’s just the sum of all image voxel intensities, which can be quantified without necessitating histogram binning.

Anyway, given that my histograms are already matching after simple conversion from DICOM, all this fuss is probably not relevant to my case :sweat_smile:?

Entirely possible: as mentioned previously, it’s a question of whether or not the imprecision of the histogram-matching process exceeds the actual variance in your input data. Though including this process can also act as a safety net, in case your exemplar data do not contain rescaling between shells but other data do.

Hi Rob,

Yes, the AUC was simply the area under the histogram curve. But I did also subtract the individual histogram bins (all generated with the command: mrhistogram -bins 100), and the pattern of results was the same (i.e. the smallest difference between histogram bins corresponded to not applying any intensity scaling at all).

All in all, it’s good to be aware of this potential caveat and do this extra check as a very first step in the analysis. I still need to start data acquisition, and I am currently evaluating and testing the DWI sequence. I will eventually check this at the group level and judge whether some intensity scaling is advisable… Thanks a lot for your input :slight_smile: greatly appreciated!