How do compute-intensive algorithms scale with the number of CPUs?

Hi all,

I’ve been using Amazon EC2, which offers virtual machines with up to 192 cores, for some compute-intensive algorithms lately (population_template, mrregister, tckgen and the like). Compared to my local desktop, things run a lot faster, and htop shows all CPUs at 100% load. But I’m wondering: does computation time scale roughly linearly with the number of CPUs? Or is there some internal overhead I’m not aware of that grows with the CPU count? And if so, is there an “optimal” number of CPUs you would recommend, either from empirical testing or just from your own experience?
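To make the “internal overhead” part of my question concrete, here’s a quick Amdahl’s-law sketch of the kind of scaling limit I have in mind (the serial fractions below are made-up illustrations, not anything measured from these tools):

```python
def amdahl_speedup(s: float, n: int) -> float:
    """Best-case speedup on n cores when a fraction s of the runtime
    is inherently serial (Amdahl's law)."""
    return 1.0 / (s + (1.0 - s) / n)

if __name__ == "__main__":
    # Hypothetical serial fractions: even 5% serial work caps the
    # speedup on 192 cores at roughly 18x rather than 192x.
    for s in (0.0, 0.01, 0.05):
        for n in (8, 48, 192):
            print(f"s={s:.2f}  n={n:3d}  speedup={amdahl_speedup(s, n):6.1f}")
```

So part of my question is really: how large is the serial/synchronisation fraction of these algorithms in practice?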