The controller being a performance bottleneck is quite true, and it can represent a single point of failure as well in some architectures. This has been known for quite some time. For a while there were vendor-specific techniques for working around it, but since then the industry as a whole has converged upon something called MPIO, or Multi-Path I/O.
With MPIO you can present the same LUN over multiple paths through a storage fabric. If the server's HBA and the storage array's HBA each have two connections to the storage fabric, the server can have four separate paths to the LUN. It can go beyond this if the storage supports it; it is quite common for the larger disk-array systems to have dual-controller setups with each controller presenting an active connection to the LUN. Add in a server with two separate HBA cards, plus two physically separate paths connecting the controller/HBA pairs, and you can have a storage path with no single point of failure.
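To make the path arithmetic concrete, here is a minimal Python sketch (the port names are invented purely for illustration) that treats each initiator/target port pair the fabric allows as a distinct path:

```python
from itertools import product

# Hypothetical topology: port names are made up for illustration.
server_hba_ports = ["hba0", "hba1"]               # two server-side connections to the fabric
array_target_ports = ["ctrlA_p0", "ctrlB_p0"]     # two array-side connections to the fabric

# Every (initiator port, target port) pairing is a separate path to the LUN.
paths = list(product(server_hba_ports, array_target_ports))
print(f"{len(paths)} paths to the LUN:")
for initiator, target in paths:
    print(f"  {initiator} -> {target}")
# 2 x 2 = 4 paths; add more HBA ports or controller ports and the count multiplies again.
```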
The fancier controllers will indeed be a full Active/Active pair, with both controllers actually talking to the storage (generally there is some form of shared cache between the controllers to help with coordination). Middle-tier devices may pretend to be active/active, but only a single controller is actually performing work at any given time; the standby controller can pick up immediately should the first go silent, without dropping any I/O operations. Lower-tier devices are simple active/standby, where all I/O goes along one path and only moves to the other paths when the active path dies.
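To illustrate how those tiers differ from the host's point of view, here is a toy Python path selector. The path names, mode strings, and round-robin policy are simplified assumptions, not any vendor's actual MPIO implementation:

```python
import itertools

class MultipathDevice:
    """Toy model of host-side path selection (not a real MPIO stack).
    mode='active-active' spreads I/O across all live paths;
    mode='active-standby' sends everything down one path until it fails."""

    def __init__(self, paths, mode="active-active"):
        self.paths = {p: True for p in paths}   # path name -> is the path alive?
        self.mode = mode
        self._rr = itertools.cycle(paths)

    def fail_path(self, path):
        self.paths[path] = False

    def select_path(self):
        live = [p for p, ok in self.paths.items() if ok]
        if not live:
            raise IOError("all paths to the LUN are down")
        if self.mode == "active-standby":
            return live[0]                      # everything rides the first surviving path
        while True:                             # active-active: round-robin, skipping dead paths
            p = next(self._rr)
            if self.paths[p]:
                return p

dev = MultipathDevice(["hba0->ctrlA", "hba0->ctrlB", "hba1->ctrlA", "hba1->ctrlB"])
print([dev.select_path() for _ in range(4)])    # I/O spread across all four paths
dev.fail_path("hba0->ctrlA")
print([dev.select_path() for _ in range(3)])    # surviving paths keep serving I/O
```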
Having multiple active controllers can indeed provide better performance than a single active controller. And yes, add enough systems hitting the storage and enough fast storage behind the controllers, and you can indeed saturate the controllers badly enough that all attached servers will notice. A good way to simulate this is to force a parity RAID volume to rebuild.
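A rough back-of-the-envelope sketch of that saturation effect (every figure below is assumed for illustration; substitute numbers from your own array's spec sheet):

```python
# Assumed figures, purely for illustration.
controller_bw_mb_s = 2 * 1600        # two active controllers at ~1600 MB/s each
hosts = 24
per_host_demand_mb_s = 150           # average demand per attached server

front_end_demand = hosts * per_host_demand_mb_s
print(f"Demand: {front_end_demand} MB/s vs {controller_bw_mb_s} MB/s of controller bandwidth")

# A parity RAID rebuild steals back-end bandwidth from those same controllers,
# which is why every attached host feels it.
rebuild_overhead = 0.35              # assume a rebuild eats ~35% of controller capacity
effective_bw = controller_bw_mb_s * (1 - rebuild_overhead)
print(f"During rebuild: {front_end_demand} MB/s vs {effective_bw:.0f} MB/s available")
```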
Not all systems are able to leverage MPIO to use multiple active paths; that's still somewhat new. Also, one of the problems the controllers collectively have to solve is ensuring that all I/O operations are committed in order, regardless of which path the I/O arrived on and which controller received the operation. That problem gets harder the more controllers you add. Storage I/O is a fundamentally serialized operation, and it doesn't work well with massive parallelization.
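A toy sketch of that ordering problem: writes can arrive via any controller, but must commit strictly in sequence, so early arrivals have to be held back. The sequence-number scheme below is purely illustrative; real dual-controller arrays coordinate through the shared cache mentioned earlier.

```python
import heapq

class InOrderCommitter:
    """Illustration only: commit writes in sequence-number order even though
    they may arrive out of order via different paths/controllers."""

    def __init__(self):
        self.next_seq = 0
        self.pending = []            # min-heap of writes waiting for their turn

    def receive(self, seq, payload, via):
        heapq.heappush(self.pending, (seq, payload, via))
        # Commit everything now contiguous with what has already been committed.
        while self.pending and self.pending[0][0] == self.next_seq:
            s, data, path = heapq.heappop(self.pending)
            print(f"commit #{s} ({data!r}) received via {path}")
            self.next_seq += 1

c = InOrderCommitter()
c.receive(1, "write B", via="controller-B")   # arrives early: held back
c.receive(0, "write A", via="controller-A")   # releases both, in order
c.receive(2, "write C", via="controller-B")
```

The coordination cost is the point: every controller you add is one more party that has to agree on what "in order" means before anything is acknowledged.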
You can get some gains by adding controllers, but the gains rapidly fade in the light of the added complexity required to make it work at all.