If the images are not ODFs, I’d recommend removing any large bias fields and rescaling the images to an average intensity of roughly 0.5. Most importantly, match intensities across the input images. This might help.
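As a rough illustration of the intensity matching step only (not the bias field removal), here is a minimal NumPy sketch. It assumes the images are already loaded as arrays and that a brain mask is available; the function name `rescale_to_mean` and the toy data are made up for this example.

```python
import numpy as np

def rescale_to_mean(image, mask, target_mean=0.5):
    """Scale an image so its mean intensity inside the mask equals target_mean."""
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    current_mean = image[mask].mean()
    if current_mean <= 0:
        raise ValueError("mean intensity inside the mask must be positive")
    return image * (target_mean / current_mean)

# Toy data: the same underlying image stored with two different global scalings.
rng = np.random.default_rng(0)
base = rng.uniform(0.5, 1.5, size=(8, 8, 8))
images = [base * 100.0, base * 3.0]
mask = np.ones(base.shape, dtype=bool)

# After rescaling, both images share the same mean (0.5) inside the mask.
normalised = [rescale_to_mean(img, mask) for img in images]
```

A single global scale factor per image is the simplest way to match intensities; it will not correct spatially varying differences, which is why the bias field should be dealt with first.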
You can also use a different cost function, for instance in ANTs, and then convert the resulting warps to MRtrix format.
Make sure the affine registration and masks are sensible.
You can convert the non-diffeomorphic warp to a Jacobian determinant image and check where the Jacobian is far from 1. You could then exclude these areas from the mask, forcing the registration to ignore them.
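To make the Jacobian check concrete, here is a small NumPy sketch. It assumes the warp is a deformation field of shape (X, Y, Z, 3) in which each voxel stores the position it maps to, in the same units as the voxel spacing. The function names and the thresholds 0.5 and 2.0 are arbitrary choices for illustration, not recommended values. In practice MRtrix3's `warp2metric` command can output a Jacobian determinant image directly, so you may only need the thresholding part.

```python
import numpy as np

def jacobian_determinant(deformation, spacing=(1.0, 1.0, 1.0)):
    """Voxel-wise Jacobian determinant of a deformation field of shape (X, Y, Z, 3)."""
    deformation = np.asarray(deformation, dtype=float)
    jac = np.empty(deformation.shape[:3] + (3, 3))
    for i in range(3):
        # Finite-difference derivatives of output coordinate i along each grid axis.
        grads = np.gradient(deformation[..., i], *spacing)
        for j in range(3):
            jac[..., i, j] = grads[j]
    return np.linalg.det(jac)

def restrict_mask(mask, jac_det, low=0.5, high=2.0):
    """Keep only mask voxels whose Jacobian determinant lies in (low, high)."""
    return np.asarray(mask, dtype=bool) & (jac_det > low) & (jac_det < high)

# Toy example: an identity warp has determinant 1 everywhere, so nothing is excluded.
grid = np.meshgrid(np.arange(6), np.arange(6), np.arange(6), indexing="ij")
identity = np.stack(grid, axis=-1).astype(float)
mask = np.ones(identity.shape[:3], dtype=bool)
det_identity = jacobian_determinant(identity)
mask_identity = restrict_mask(mask, det_identity)

# A warp that doubles every coordinate has determinant 8, so every voxel is excluded.
det_scaled = jacobian_determinant(2.0 * identity)
mask_scaled = restrict_mask(mask, det_scaled)
```

Values at or below zero indicate folding, so you may want to treat those as a separate, stricter criterion than simply being far from 1.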