Aha, good to hear, that clarifies a lot! To clarify from our end too: b=0 is never assumed, but is regarded as just another “shell”. The easiest way to get into the right mindset when working with MRtrix is to think of “shell” as “b-value”. So you’re selecting b-values with the -shell option. The b=0 value/“shell” doesn’t have a special status (but it may be of particular value to you of course).
What @bjeurissen mentioned is that there’s currently a bug in dwi2fod msmt_csd though: it essentially ignores the -shell option altogether. So the msmt_csd algorithm was confronted with a set of response functions with only 3 b-values, whereas the data had 5 b-values (and the -shell option didn’t do its job of reducing those to 3 as well). Hence the mismatch, and hence the error that you got. This is definitely a bug. In the meantime, you can work around it by making sure you only feed in a dataset with the b-values that you need (and not the ones that you don’t need, i.e. b=500 in your scenario).
There are two ways for you to proceed now (both achieve exactly the same end result; they only differ in how you execute them). The easiest is to run dwiextract before any other step (including dwi2response). If you do that, you don’t have to worry about a -shell option any more: that selection will have already been made “once and for all”. It would work like this:
dwiextract dwi.mif dwi_without_b500.mif -shell 0,1500,2500,3500
dwi2response dhollander dwi_without_b500.mif wm.txt gm.txt csf.txt -mask mask.mif
dwi2fod msmt_csd dwi_without_b500.mif wm.txt wm.mif gm.txt gm.mif csf.txt csf.mif -mask mask.mif
Things to note:
- See how I included b=0 at the start in -shell 0,1500,2500,3500 (it has to be listed explicitly, like any other b-value).
- See how both subsequent command lines become much “cleaner”. The b=500 is already safely dismissed by then, and I just use dwi_without_b500.mif as the input there. You can of course give it a shorter name for convenience; my example just aims to be quite explicit with file names.
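If you want to double-check that the extraction did what you expect, something along these lines may help. It is only a sketch: check_shells is a hypothetical helper name, and it assumes a recent MRtrix3 release in which mrinfo provides the -shell_bvalues and -shell_sizes options (older releases may name these differently, so check mrinfo -help on your install).

```shell
# Sketch of a sanity check: list the b-values ("shells") found in an image.
# Assumption: mrinfo supports -shell_bvalues and -shell_sizes on your version.
check_shells() {
    img="$1"
    if ! command -v mrinfo >/dev/null 2>&1; then
        echo "mrinfo not found on PATH" >&2
        return 1
    fi
    mrinfo "$img" -shell_bvalues   # b-values present; there should be none near 500
    mrinfo "$img" -shell_sizes     # number of volumes per b-value
}
```

You would call it as check_shells dwi_without_b500.mif; the number of b-values it reports should match the number of rows in each response function file.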
The other way would be closer to what you were trying to do, but would involve a -shell option to dwi2response and then still a separate dwiextract before dwi2fod. It would work, but is unnecessarily complicated.
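For completeness, that second route could be sketched roughly as below. This is an illustration under assumptions, not a tested recipe: the run wrapper and the DRY_RUN and SHELLS variables are made up for this example, file names are the same as above, and the option is spelled -shell as in this thread (it may be spelled differently on other MRtrix3 versions, so check the help page of each command). With DRY_RUN left at 1 the commands are only printed; set DRY_RUN=0 to actually execute them.

```shell
# Second route (sketch): -shell on dwi2response, then dwiextract before dwi2fod.
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands, 0 = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

SHELLS="0,1500,2500,3500"        # b=0 listed explicitly, b=500 left out

# Response functions estimated from the selected b-values only.
run dwi2response dhollander dwi.mif wm.txt gm.txt csf.txt -mask mask.mif -shell "$SHELLS"
# dwi2fod msmt_csd currently ignores -shell, so drop b=500 from the data first.
run dwiextract dwi.mif dwi_without_b500.mif -shell "$SHELLS"
run dwi2fod msmt_csd dwi_without_b500.mif wm.txt wm.mif gm.txt gm.mif csf.txt csf.mif -mask mask.mif
```

Keeping the b-value list in a single variable at least guarantees that dwi2response and dwiextract see the same selection, which is exactly the mismatch that caused the original error.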
Finally: dwi2response doesn’t even need the -mask if you’re working on typical human data; it can automatically run dwi2mask itself if you don’t give it the -mask. If you do supply a mask to dwi2response, make sure that it covers the whole brain though, and not for instance just the white matter. dwi2response needs regions of GM and CSF within the mask to be able to extract those response functions, of course.
Finally-finally: unless you’re looking at very exotic data (from the point of view of what was scanned), the dhollander algorithm should at this stage perform at least as well as, and most of the time better than, the msmt_5tt algorithm to extract the responses. In cases of several “common” pathologies, it even performs much better, as the default 5tt segmentations that we offer out of the box can make significant mistakes there. If you’ve got questions specifically about your data/subject(s) at hand and you don’t mind disclosing, definitely just describe them or show a few screenshots and I can reassure you (if applicable) about this.
EDIT: wow, that response took so long to type with other things happening over here in between… that I got beaten by 20 minutes with other responses. 