I would recommend you experiment with it and find what works best for your data and specific application. This is one of those things where there is no right or wrong answer. If we had really high SNR, then there would be very few noisy peaks to worry about, so we could lower that threshold. It really depends on your data and how strongly the tracts you’re trying to delineate contribute to the signal, relative to the noise. I think a cutoff of 0.1 is a good starting point - this had been the default for a good decade. But there is nothing preventing you from using a different value.
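If it helps with the experimenting, here is a minimal sketch of how you could set up a sweep over a few cutoff values. It only builds the `tckgen` command lines; it does not run them. The file names (`wmfod.mif`, `mask.mif`, the output names), the list of cutoffs and the streamline count are all placeholders I made up, and I'm assuming a recent MRtrix3 where `tckgen` accepts `-cutoff`, `-seed_image` and `-select`. Adjust to whatever your installation and data actually use.

```python
import shlex


def cutoff_sweep_commands(fod, seed, cutoffs, n_streamlines=10000):
    """Return one tckgen command (as an argument list) per cutoff value.

    fod, seed and the output file names are hypothetical placeholders.
    """
    commands = []
    for cutoff in cutoffs:
        out = f"tracks_cutoff_{cutoff:.2f}.tck"
        commands.append([
            "tckgen", fod, out,
            "-seed_image", seed,            # seed streamlines from a mask image
            "-cutoff", f"{cutoff:.2f}",     # FOD amplitude threshold under test
            "-select", str(n_streamlines),  # same streamline count for every run
        ])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be inspected or pasted into a terminal.
    for cmd in cutoff_sweep_commands("wmfod.mif", "mask.mif", [0.05, 0.10, 0.15, 0.20]):
        print(shlex.join(cmd))
```

Keeping everything except the cutoff fixed makes it easier to judge by eye which value gives the best trade-off between spurious streamlines and missing tracts in your data.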
No. In fact, there’s a good argument to be made for having much higher sampling on the higher shells, since they have a lot less signal, but also a lot more useful contrast.
That would certainly be analysable by MSMT-CSD, so it would suffice in that sense. But ideally, if you have control over this, I’d recommend dropping the b=4000 shell and adding a lot more directions to the b=2500 shell - and even reducing the number of b=1000 volumes to boost the b=2500 shell further.
Also, bear in mind that you can use MSMT-CSD even with a ‘single-shell’ acquisition, since the b=0 volumes constitute a ‘shell’ in their own right – this means you can do a 2-tissue MSMT-CSD. Take a look at this thread for example.
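To make the 'b=0 counts as a shell' point concrete, here is a rough sketch that groups a list of b-values into shells and counts them. The function name, the tolerance of 100 and the example b-values are my own assumptions, not anything from MRtrix3 itself. The rule of thumb it illustrates is the one above: the number of distinct shells, b=0 included, is roughly the number of tissue types you can hope to fit, so b=0 plus one shell gives you 2 tissues.

```python
def shell_summary(bvals, tolerance=100):
    """Group b-values into shells and count the volumes in each.

    Returns {shell_b: volume_count}, keyed by the lowest b-value in each
    group. The tolerance is an assumed value to absorb scanner jitter
    (e.g. b=995 vs b=1005).
    """
    shells = {}
    for b in sorted(bvals):
        for centre in shells:
            if abs(b - centre) <= tolerance:
                shells[centre] += 1
                break
        else:
            # No existing shell is close enough: start a new one.
            shells[b] = 1
    return shells


if __name__ == "__main__":
    # Hypothetical 'single-shell' acquisition: a few b=0 volumes plus b=1000.
    bvals = [0, 0, 5, 1000, 995, 1005, 1000]
    shells = shell_summary(bvals)
    for b, count in shells.items():
        print(f"b~{b}: {count} volumes")
    print(f"{len(shells)} shells, so up to {len(shells)} tissue types")
```

The same per-shell counts are also a quick way to see how your directions are distributed across shells when weighing up the sampling suggestions above.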