What I have heard is that we want to inline almost everything, not to avoid function call overhead, but to make the callee's body visible to the optimizer at each call site. And that the ideal super compiler would aggressively inline everything it could and then perform *outlining*, essentially compression, to shrink the code again for better I-cache behavior.
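To make the "visible to the optimizer" part concrete, here is a minimal sketch in C. The names `scale`, `twice`, and `twice_by_hand` are made up for illustration, and whether a given compiler actually performs the simplification depends on the compiler and flags, so treat it as an assumption to check in the disassembly rather than a guarantee.

```c
#include <assert.h>

/* Hypothetical general-purpose helper: multiplies, then optionally
 * clamps the result to [-limit, limit]. limit == 0 means "no clamp". */
static inline int scale(int x, int factor, int limit)
{
    assert(limit >= 0);
    long long r = (long long)x * factor;
    if (limit > 0) {
        if (r > limit)  r = limit;
        if (r < -limit) r = -limit;
    }
    return (int)r;
}

/* Call site with constant arguments. Once scale() is inlined here, the
 * optimizer can see factor == 2 and limit == 0, so the assert and the
 * clamp branch are dead and can be removed. An out-of-line call would
 * have to keep both, since the callee cannot know its arguments. */
int twice(int x)
{
    return scale(x, 2, 0);
}

/* Roughly what one would hope twice() reduces to after inlining. */
int twice_by_hand(int x)
{
    return x * 2;
}
```

Comparing the generated code for `twice` and `twice_by_hand` at something like `-O2` is one way to see whether the inlined version collapsed. The saving is not the call itself but the branch and the check that disappear once the constants are known.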
But this is a very confusing topic. I've thought about it a lot and gotten nowhere. 100% caller-saved vs. 100% callee-saved would seem to be isomorphic. But maybe 50/50 is somehow better? I feel like we will only find the truth by testing different calling conventions and then figuring out afterwards why some are better. Or maybe they all behave about the same.