Yes, you are correct, and yes, you are missing something.
Sure, you could easily increase the entropy many times over by using a larger word list; you could also achieve that by using 8-word passphrases, or by using raw random bytes directly, without any words at all.
The entire point of that xkcd is balance.
Balance between "enough entropy" and "easy enough to remember".
You could argue about whether 44 bits is enough entropy, or whether you need more. But if you do, you must take into account the non-negligible cost of reduced memorability. It is always a tradeoff.
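To make the arithmetic concrete, here is a minimal sketch, assuming every word is chosen uniformly and independently at random from the list (the entropy figure does not hold for words a person picks themselves). The function name is illustrative, and the 100,000-word "full language" list size is a hypothetical number chosen for comparison, not a measured one.

```python
import math

def passphrase_entropy_bits(list_size: int, num_words: int) -> float:
    """Entropy in bits of num_words words, each drawn uniformly and
    independently from a list of list_size words."""
    return num_words * math.log2(list_size)

# xkcd 936: 4 words from a list of 2048 (2**11) common words.
print(passphrase_entropy_bits(2048, 4))  # 44.0
# More words: 8 words from the same list doubles the entropy.
print(passphrase_entropy_bits(2048, 8))  # 88.0
# Larger list: a hypothetical 100,000-word "full language" list.
print(round(passphrase_entropy_bits(100_000, 4), 1))  # 66.4
```

Both levers (more words, bigger list) raise the number, but neither says anything about how easy the result is to remember, which is the other half of the tradeoff.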
As I stated in my answer to the canonical XKCD 936 question:
AviD's Rule of Usability:
Security at the expense of usability comes at the expense of security.
So yes, go ahead and use the full language as your dictionary, but you are paying a price that many would consider a bad tradeoff.
As Randall (xkcd's author) explains here (and in agreement with many studies on the subject), he based it not on all possible words in a dictionary, but on words that are EASY for a typical person to remember (and, I would add, to type).
Another option, more aggressive than xkcd but not as ridiculously difficult as the full language, is something like Diceware's word list: larger than 11 bits per word, but not by much (7,776 words, just under 13 bits each).
So 4 of those words give ~51.7 bits. Or add one more simple word and get almost 65 bits of entropy.
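A quick sketch of the Diceware numbers, assuming the standard list of 6^5 = 7,776 words (one word per roll of five dice). The `generate_passphrase` helper is a hypothetical illustration that uses Python's `secrets` module as the random source, and the placeholder word list stands in for a real Diceware list, which is not reproduced here.

```python
import math
import secrets

DICEWARE_LIST_SIZE = 6 ** 5  # 7,776 words: five dice rolls select one word

bits_per_word = math.log2(DICEWARE_LIST_SIZE)
print(round(bits_per_word, 2))      # 12.92
print(round(4 * bits_per_word, 1))  # 51.7
print(round(5 * bits_per_word, 1))  # 64.6

def generate_passphrase(words: list[str], num_words: int) -> str:
    """Hypothetical helper: pick num_words words uniformly at random
    using a cryptographically secure RNG (secrets.choice)."""
    return " ".join(secrets.choice(words) for _ in range(num_words))

# Placeholder list for illustration only; substitute a real Diceware list.
placeholder_words = [f"word{i}" for i in range(DICEWARE_LIST_SIZE)]
print(generate_passphrase(placeholder_words, 5))
```

Using `secrets` rather than `random` matters here: the entropy estimate assumes an unpredictable, uniform choice, which a non-cryptographic RNG does not guarantee.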
Yes, that improves things a bit without costing much usability, since the list still sticks to short, common words. (Personally, there are still a few "filler" entries, like numbers, that I would prefer to do without.)
As always, it is about balance.