This used to be true, but it is no longer always true.
What they are referring to is Strict Co-Scheduling.
Most important of all, while in the strict co-scheduling algorithm, the existence of a lagging vCPU causes the
entire virtual machine to be co-stopped. In the relaxed co-scheduling algorithm, a leading vCPU decides whether
it should co-stop itself based on the skew against the slowest sibling vCPU.
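
To make the difference in that quote concrete, here is a toy model of the two co-stop decisions. It is only a sketch of the idea as quoted, not the real ESXi scheduler: the function names, the `progress` values and the `threshold` are all made up for illustration.

```python
# Toy model only: function names, progress units and the threshold are
# hypothetical and are not taken from any VMware implementation.

def strict_co_stop(progress, threshold):
    """Strict: one lagging vCPU beyond the threshold stops every vCPU."""
    skew = max(progress) - min(progress)
    stop_all = skew > threshold
    return [stop_all] * len(progress)


def relaxed_co_stop(progress, threshold):
    """Relaxed: each vCPU stops itself only if it leads the slowest sibling too far."""
    slowest = min(progress)
    return [p - slowest > threshold for p in progress]


# Four vCPUs; vCPU 0 has run far ahead of its siblings.
progress = [100, 70, 65, 68]
print(strict_co_stop(progress, threshold=10))   # [True, True, True, True]
print(relaxed_co_stop(progress, threshold=10))  # [True, False, False, False]
```

In the strict case the whole VM waits; in the relaxed case only the vCPU that ran ahead waits, so the other three keep doing work.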
Now, if the host only has 4 threads, then you'd be silly to allocate all of them to one guest. If it has two processors with 4 threads each, you still might not want to allocate every thread of a single processor: your hypervisor tries to keep a VM's vCPUs on the same NUMA node to make memory access faster, and you make that job harder by giving a whole socket to a single VM (see page 12 of that PDF above).
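
If you want that reasoning as a quick sanity check, something like the following works. It is a rough sketch under my own assumptions: `sizing_warnings` is a hypothetical helper, and it assumes one NUMA node per socket, which is not true of every platform.

```python
# Hypothetical helper for illustration; assumes one NUMA node per socket.

def sizing_warnings(vcpus, sockets, threads_per_socket):
    """Return warnings for a proposed vCPU count on a given host layout."""
    warnings = []
    total_threads = sockets * threads_per_socket
    if vcpus >= total_threads:
        # The guest would compete with the hypervisor for every thread.
        warnings.append("guest would use every hardware thread on the host")
    elif vcpus >= threads_per_socket:
        # The scheduler has no slack left to keep the VM inside one node.
        warnings.append("guest would fill or span a whole socket (NUMA node)")
    return warnings


print(sizing_warnings(vcpus=4, sockets=1, threads_per_socket=4))
print(sizing_warnings(vcpus=4, sockets=2, threads_per_socket=4))
print(sizing_warnings(vcpus=2, sockets=2, threads_per_socket=4))
```

The first two calls each return one warning (the two cases described above), and the last returns an empty list.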
So there are scenarios where fewer vCPUs can perform better than more, but it's not true 100% of the time.
All that said and done, I very rarely allocate more than 3 vCPUs per guest. Everyone gets 2 by default, 3 if it's a heavy workload, and 4 for things like SQL Servers or really heavy batch processing VMs, or a terminal server with a lot of users.