The more established configuration management (CM) systems, like Puppet and Chef, use a pull-based approach: clients periodically poll a centralized master for updates. Some of them also offer a masterless (and thus push-based) mode, but state that it is 'not for production' (Saltstack) or 'less scalable' (Puppet). The only system I know of that is push-based from the start is the runner-up, Ansible.
What is the specific scalability advantage of a pull-based system? Why is it supposedly easier to add more pull-masters than push-agents?
For example, agiletesting.blogspot.nl writes:
in a 'pull' system, clients contact the server independently of each other, so the system as a whole is more scalable than a 'push' system
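If I understand the claim, the pull model amounts to an agent loop like the following minimal Python sketch (the URL, the interval, and apply_manifest are all made up for illustration): each client polls on its own randomized schedule, so the master only ever answers independent, self-paced requests and load spreads out over time.

import random
import time
import urllib.request

# Hypothetical endpoint serving the desired configuration (the "gold server").
MASTER_URL = "https://cm-master.example.com/manifest"
BASE_INTERVAL = 1800   # poll every 30 minutes...
SPLAY = 300            # ...plus up to 5 minutes of random jitter

def apply_manifest(manifest: bytes) -> None:
    """Placeholder: converge local state to the fetched manifest."""
    ...

while True:
    try:
        with urllib.request.urlopen(MASTER_URL, timeout=30) as resp:
            apply_manifest(resp.read())
    except OSError:
        pass  # master unreachable: keep the current config, retry next cycle
    # The random splay de-synchronizes clients so they don't all hit the
    # master at once; this seems to be the core of the scalability argument.
    time.sleep(BASE_INTERVAL + random.uniform(0, SPLAY))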
On the other hand, Rackspace demonstrates that they can handle 15,000 systems with a push-based model.
infrastructures.org writes:
We swear by a pull methodology for maintaining infrastructures, using a tool like SUP, CVSup, an rsync server, or cfengine. Rather than push changes out to clients, each individual client machine needs to be responsible for polling the gold server at boot, and periodically afterwards, to maintain its own rev level.

Before adopting this viewpoint, we developed extensive push-based scripts based on ssh, rsh, rcp, and rdist. The problem we found with the r-commands (or ssh) was this: When you run an r-command based script to push a change out to your target machines, odds are that if you have more than 30 target hosts one of them will be down at any given time. Maintaining the list of commissioned machines becomes a nightmare. In the course of writing code to correct for this, you will end up with elaborate wrapper code to deal with: timeouts from dead hosts; logging and retrying dead hosts; forking and running parallel jobs to try to hit many hosts in a reasonable amount of time; and finally detecting and preventing the case of using up all available TCP sockets on the source machine with all of the outbound rsh sessions.

Then you still have the problem of getting whatever you just did into the install images for all new hosts to be installed in the future, as well as repeating it for any hosts that die and have to be rebuilt tomorrow. After the trouble we went through to implement r-command based replication, we found it's just not worth it. We don't plan on managing an infrastructure with r-commands again, or with any other push mechanism for that matter. They don't scale as well as pull-based methods.
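To put that quoted complexity in perspective: with modern tooling, the 'elaborate wrapper code' they describe (per-host timeouts, retry bookkeeping, parallel fan-out, a cap on simultaneous sockets) seems to fit in a short script. Here is a minimal Python sketch; the inventory and the remote command are hypothetical:

import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

HOSTS = ["web01", "web02", "db01"]             # hypothetical inventory
COMMAND = "sudo /usr/local/bin/apply-config"   # hypothetical remote command
MAX_PARALLEL = 50  # bounds concurrent ssh sessions (and thus TCP sockets)

def push(host: str) -> tuple[str, bool]:
    """Run the config command on one host over ssh, with hard timeouts."""
    try:
        subprocess.run(
            ["ssh", "-o", "ConnectTimeout=10", host, COMMAND],
            check=True, timeout=120, capture_output=True,
        )
        return host, True
    except (subprocess.SubprocessError, OSError):
        return host, False

failed = []
with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
    futures = {pool.submit(push, h): h for h in HOSTS}
    for fut in as_completed(futures):
        host, ok = fut.result()
        if not ok:
            failed.append(host)  # record dead hosts for a later retry pass

print("unreachable, retry later:", failed)

The list of dead hosts still has to be stored and retried somewhere, but the concurrency plumbing itself is a page of code, which brings me to my actual question.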
Isn't that an implementation problem instead of an architectural one? Why is it harder to write a threaded push client than a threaded pull server?
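For comparison, the threaded pull server in question can be almost nothing; in the simplest case it is a static file server for the manifests, as in this sketch (assuming the configs live under a hypothetical /srv/config):

from functools import partial
from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler

# A threaded pull "master" can be little more than a static file server:
# each agent's request is handled on its own thread, independently of the rest.
handler = partial(SimpleHTTPRequestHandler, directory="/srv/config")
ThreadingHTTPServer(("", 8080), handler).serve_forever()

If both sides are this small, where does the claimed order-of-magnitude scalability difference actually come from?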