If your model suggests doing so without the username as well:
You can't do this, because not all 2FA tokens are the same. Each token has its own unique secret key, so you must identify the individual attempting to log in before you know which token to check the code against.
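To illustrate why the server has to know the user first, here is a minimal sketch of time-based code verification in the style of RFC 6238 (TOTP). It assumes HMAC-SHA1, a 30-second step and 6-digit codes; the `secrets_by_user` mapping and the `verify` helper are hypothetical names for this example, not any particular site's implementation.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute a TOTP code (RFC 6238 style, HMAC-SHA1) for one secret."""
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte selects 4 bytes.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify(secrets_by_user: dict, username: str, code: str, unix_time: int) -> bool:
    """Check a code against ONE user's secret (hypothetical helper).

    Without the username there is no way to pick the secret, so the only
    alternative would be to try the code against every user's secret.
    """
    secret = secrets_by_user.get(username)
    if secret is None:
        return False
    return hmac.compare_digest(totp(secret, unix_time), code)
```

The point of the sketch is that `totp` cannot be evaluated without a specific secret, and the secret is only found by looking up the user.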
With the approach you suggest, any submitted code would immediately validate against whichever unknown user's token happened to produce that code at that moment, or against no one. On a large website with many hundreds of thousands of users, this greatly increases the likelihood of hitting a correct code for at least one of those users. It gets worse because modern 2FA usually employs some kind of window that lets codes remain valid before and after the current iteration (to account for clock skew and other sources of time difference). Each user is therefore likely to have several valid codes at any given time, further increasing the chances of success.
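As a rough back-of-the-envelope check of that claim, the sketch below estimates the chance that a single random guess matches at least one user. It assumes codes are uniformly distributed and independent across users, and that each user has a fixed number of valid codes at a time; the function name and the example figures (500,000 users, 3 valid codes each, 6 digits) are illustrative assumptions, not measurements.

```python
def p_guess_hits_someone(users: int, valid_codes_per_user: int, digits: int = 6) -> float:
    """Probability that one random code is currently valid for at least one user.

    Assumes uniform, independent codes (a simplification).
    """
    p_single_user = valid_codes_per_user / 10 ** digits
    return 1.0 - (1.0 - p_single_user) ** users


# Hypothetical large site: 500,000 users, window of 3 valid codes per user.
p = p_guess_hits_someone(users=500_000, valid_codes_per_user=3)
print(f"{p:.2f}")
```

Under these assumptions a single guess succeeds against somebody about 78% of the time, whereas the same guess aimed at one known user succeeds with probability 3 in 1,000,000.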
I'm also not sure you would be able to withhold that information from the user if they typed the wrong password. I agree you don't send a code (in the case of SMS or email codes), but if the user does not understand that the site had a problem, and what that problem is, it creates a frustrating experience for them (as you pointed out in your cons of part 2). Companies hate frustrated customers a lot more than they hate lower security.
Lastly, in the case of SMS and email codes, this would give attackers a way to send spam that comes from a legitimate business. Someone repeatedly clicking the "send my code" button, even with a cool-down in place, will get annoying for the recipient if a message keeps arriving every time the cool-down expires.
If, however, your model does require the username:
If the username and then the code are required, and a cool-down period is applied to prevent attacks, I would still advise against it. In that case, knowing only the username, I can run a DoS against that user, forever, with little to no effort: simply specify the username and keep submitting codes. The shorter you make the cool-down, the better it is for the user, and also for an attacker who is essentially brute-forcing a 6- or 8-digit, all-numeric password. The longer the cool-down, the worse it is for the user, and the better for an attacker running a DoS. In both cases the user loses and the attacker wins.
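The trade-off can be put into rough numbers. The sketch below assumes the attacker gets exactly one guess per cool-down period and that every guess independently succeeds with probability (valid codes) / 10^digits; the function name and the sample cool-down values are hypothetical.

```python
def expected_bruteforce_seconds(cooldown_s: float, digits: int = 6,
                                valid_codes: int = 1) -> float:
    """Expected time until a random guess succeeds, one guess per cool-down.

    Each guess is an independent trial, so the expected number of guesses
    is 1 / p = 10**digits / valid_codes.
    """
    expected_guesses = 10 ** digits / valid_codes
    return expected_guesses * cooldown_s


for cooldown in (1, 30, 300):  # seconds, illustrative values only
    days = expected_bruteforce_seconds(cooldown) / 86_400
    print(f"cool-down {cooldown:>3}s: about {days:,.1f} days to guess a 6-digit code")
```

A short cool-down keeps the expected brute-force time low, while a long one means an attacker who submits a single wrong code per period can keep the legitimate user locked out almost continuously.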
To sum it all up, the reasons I think the internet gravitated toward this system are as follows:
- Username/password systems were already in place at the time, and hardware 2FA tokens were the first ones to arrive, meaning it was absolutely required to know who the user was first.
- 2FA implementations, even ones in software (the kind that send codes by email and SMS), often use the same method as the hardware keyfobs, and thus have the same identification requirement.
- Servers are good at dealing with a flood of requests in the case of attacks; end users are not.
- It is simpler to verify someone's identity with their predefined credentials directly with the server first rather than rely on a multitude of systems which you are not certain are currently up (server01 > sms service > GSM > air > Tower > iPhone > User > internet > ... > server01).