Static password policies are chosen for two major reasons: usability and the body of research demonstrating acceptable effectiveness. Most of my answer comes from the excellent research paper on an advanced password-strength meter, Telepathwords.
First, to summarize some of the research used to back up current password policies:
Password-composition rules date back at least to 1979, when Morris and Thompson reported on the predictability of the passwords used by users on their Unix systems; they proposed that passwords longer than four characters, or purely alphabetic passwords longer than five characters, will be “very safe indeed” [19]. [However,] Bonneau analyzed nearly 70 million passwords in 2012, 33 years later, to measure the impact of a six-character minimum requirement compared with no requirement [2]. He found that it made almost no difference in security...
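Morris and Thompson's thresholds are easiest to read as search-space arithmetic. The sketch below assumes every character is chosen uniformly at random, either from the 95 printable ASCII characters or from the 26 lowercase letters; both alphabet sizes are my illustrative assumptions. Because real users do not choose characters uniformly, these counts are only upper bounds on what an attacker actually has to search.

```python
import math

def search_space(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of exactly `length` symbols (assumes uniform choice)."""
    return alphabet_size ** length

def bits(candidates: int) -> float:
    """Express a search-space size in bits (log base 2)."""
    return math.log2(candidates)

# Illustrative alphabets: 95 printable ASCII characters vs. 26 lowercase letters.
cases = {
    "5 chars, printable ASCII": search_space(95, 5),
    "6 chars, lowercase only": search_space(26, 6),
}
for label, n in cases.items():
    print(f"{label}: {n:,} candidates, about {bits(n):.1f} bits")
```

The gap between these idealized figures and what Bonneau measured on real passwords is exactly the gap between uniform random choice and human choice.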
This includes the work of Komanduri et al. [13] and Kelley et al. [12], who used similar study designs to perform comparative analyses of password composition rules. These prior studies found that increasing length requirements in passwords generally led to more usable passwords that were also less likely to be identified as weak by their guessing algorithm [13, 12]. Most recently, Shay et al. studied password-composition policies requiring longer passwords, finding the best performance came from mixing a 12-character minimum with a requirement of three character sets [25].
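A composition rule like "12-character minimum plus three character sets" is simple to check mechanically. This is a minimal sketch, not the study's code: the four classes (lowercase, uppercase, digits, and everything else counted as a symbol) and the function names are hypothetical choices of mine.

```python
import string

# Assumed class definitions: lowercase, uppercase, digits, and "anything else" as symbols.
LOWER = set(string.ascii_lowercase)
UPPER = set(string.ascii_uppercase)
DIGITS = set(string.digits)
ALNUM = LOWER | UPPER | DIGITS

def count_character_classes(password: str) -> int:
    """Count how many of the four assumed character classes appear in the password."""
    has_lower = any(c in LOWER for c in password)
    has_upper = any(c in UPPER for c in password)
    has_digit = any(c in DIGITS for c in password)
    has_symbol = any(c not in ALNUM for c in password)  # spaces count as symbols here
    return sum([has_lower, has_upper, has_digit, has_symbol])

def meets_policy(password: str, min_length: int = 12, min_classes: int = 3) -> bool:
    """Hypothetical check for a '12-character minimum, three character sets' rule."""
    return len(password) >= min_length and count_character_classes(password) >= min_classes

for candidate in ["Tr0ub4dor&3", "correct horse battery staple", "CorrectHorse42"]:
    print(candidate, meets_policy(candidate))
```

Under these assumptions "Tr0ub4dor&3" fails on length (11 characters), while the 28-character passphrase fails only because it uses two classes, a reminder that length and composition requirements can pull in different directions.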
Usability is a huge reason why more complex criteria like password entropy aren't used more frequently:
In a study of the distribution of password policies, Florencio and Herley found that usability imperatives appeared to play at least as large a role as security among the 75 websites examined [8]. ...
Ur et al. also studied the effect of password strength meters on password-creation. They found that when users became frustrated and lost confidence in the meter, more weak passwords appeared [28]. ...
While [Dropbox's] zxcvbn provides a much-needed improvement in the credibility of its strength estimates when compared to approaches relying solely on composition rules, this credibility is unlikely to be observed by users. In fact, its perceived credibility may suffer if users, who have been told that adding characters increases password strength, see scores decrease when certain characters are added. For example, when typing iatemylunch, the strength estimate decreases from the second-best score (3) to the worst score (1) when the final character is added. Even if users find zxcvbn’s strength estimates credible, they are unlikely to understand the underlying entropy-estimation mechanism and thus be unsure how to improve their scores. [30]
Finally, for the sake of completeness, we have to realize that defining password entropy is very difficult (but far from impossible). There are lots of different assumptions we can make about the sophistication of a password cracker's guessing algorithm or dictionary, and these all lead to different answers for the entropy of passwords like "Tr0ub4dor&3" or "correct horse battery staple". The most sophisticated password entropy measures are based on dictionaries of millions of passwords and advanced study of password patterns, and this level of sophistication is difficult to achieve for many administrators (and hackers).
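To make the "different assumptions, different answers" point concrete, here is a minimal sketch that scores the same passphrase under two simple attacker models. Both models assume uniform random choice; the 27-symbol alphabet (lowercase plus space) and the 2048-word list are illustrative assumptions of mine, and neither model resembles the dictionary-and-pattern approach that real crackers or zxcvbn use.

```python
import math

def charset_bits(length: int, alphabet_size: int) -> float:
    """Entropy if every character were drawn uniformly from an alphabet of the given size."""
    return length * math.log2(alphabet_size)

def wordlist_bits(num_words: int, wordlist_size: int) -> float:
    """Entropy if each word were drawn uniformly from a word list the attacker knows."""
    return num_words * math.log2(wordlist_size)

passphrase = "correct horse battery staple"

# Model 1: attacker brute-forces characters from 26 lowercase letters plus space.
naive = charset_bits(len(passphrase), 27)

# Model 2: attacker knows the passphrase is 4 words from a hypothetical 2048-word list.
informed = wordlist_bits(len(passphrase.split()), 2048)

print(f"character model: {naive:.1f} bits")
print(f"word-list model: {informed:.1f} bits")
```

The same string comes out at roughly 133 bits under the first model and 44 bits under the second; a model that knew the phrase itself was popular would rate it far lower still.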