Since others have explained the password entropy issue in detail, I'll address your other concern:
But I also remember a windows program that used to crack windows passwords by finding dictionary words within the larger password.
It's true that, assuming an attacker has access to a computer you've used, some programs can scan its persistent storage for good password candidates. This applies not only to passwords that were written down. A password that was once present in the computer's memory (just after being typed, for instance) may also end up in virtual memory, swap files, core dumps, etc.
The only program of this type I know of, AccessData's Forensic Toolkit, scans the hard drive for "every printable character string", as described in this Bruce Schneier post. In principle, this does not make passwords containing dictionary words weaker, since every password consists of printable characters. However, the way the program sorts this set of strings before feeding it to a password guesser might affect how likely a given password is to be found. Quoting the post:
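As a rough illustration of what "every printable character string" means, here is a minimal sketch in the spirit of the Unix `strings` utility. It is an assumption for illustration only, not how FTK actually does it: the function name `printable_strings` and the 4-character minimum are hypothetical choices, and it only looks at single-byte ASCII (strings stored as UTF-16, as Windows often does in memory, would need a separate pass).

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of printable ASCII bytes (space through '~')
    that is at least min_len characters long.

    Hypothetical sketch: a real forensic tool would also handle other
    encodings (e.g. UTF-16LE) and stream through large disk images.
    """
    # Build a bytes regex such as rb"[\x20-\x7e]{4,}".
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Simulated chunk of raw storage: binary noise around two readable strings.
chunk = b"\x00\x01hunter2\xff\xfeab\x00correcthorse\x00"
print(printable_strings(chunk))
```

On a real drive, `data` would be the raw bytes of the disk, swap file or core dump, which is exactly why the resulting candidate list is huge.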
When attacking programs with deliberately slow ramp-ups, it's important to make every guess count. A simple six-character lowercase exhaustive character attack, "aaaaaa" through "zzzzzz," has more than 308 million combinations. And it's generally unproductive, because the program spends most of its time testing improbable passwords like "pqzrwj."
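For reference, the quoted figure is just the size of the keyspace of six lowercase letters (26 choices per position), which is also only about 28 bits:

$$26^6 = 308\,915\,776 \approx 3.09 \times 10^8, \qquad \log_2\left(26^6\right) = 6\log_2 26 \approx 28.2 \text{ bits}$$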
In other words, having a huge set of strings won't necessarily help the attacker, even if one of them is the password they're looking for, since there is no way to recognize it as a password before testing it (against a hash or an online service). But if the set is sorted by some heuristic estimating "how likely is this string to be a password", then having dictionary words in your password might move it closer to the top of the priority list, increasing the likelihood that it will be found in a timely manner.
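To make the prioritization idea concrete, here is a hypothetical ranking heuristic: score each candidate by the fraction of its characters covered by dictionary words, and test the highest-scoring ones first. Neither the post nor FTK's documentation (as far as I know) describes the actual sorting rule, so treat `dictionary_coverage`, `rank_candidates` and the toy `WORDS` set as illustrative assumptions.

```python
# Toy dictionary for illustration; a real attacker would use a large word list.
WORDS = {"correct", "horse", "battery", "staple", "password"}


def dictionary_coverage(s: str, words=WORDS) -> float:
    """Fraction of characters in s that fall inside some dictionary word
    (case-insensitive). Hypothetical heuristic, not FTK's actual rule."""
    s_low = s.lower()
    if not s_low:
        return 0.0
    covered = [False] * len(s_low)
    for w in words:
        start = s_low.find(w)
        while start != -1:  # mark every occurrence, overlapping ones included
            for i in range(start, start + len(w)):
                covered[i] = True
            start = s_low.find(w, start + 1)
    return sum(covered) / len(s_low)


def rank_candidates(candidates, words=WORDS):
    """Order candidate strings so the most 'password-like' ones come first."""
    return sorted(candidates, key=lambda s: dictionary_coverage(s, words), reverse=True)


print(rank_candidates(["pqzrwj", "Xq9#kL2m", "correcthorsebatterystaple"]))
```

Under a rule like this, a string made entirely of dictionary words jumps to the front of the queue, while random-looking strings stay buried among millions of other low-scoring candidates.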
That's why I advocate using the first letters of each word in a phrase, instead of using the words themselves. Sure, you'd need a longer phrase to achieve a similar level of entropy, but you'll end up producing a password that "looks like" garbage, instead of one that looks promising. But YMMV.
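The first-letters technique is trivial to express; the sketch below (the function name `first_letters` and the example phrase are made up for illustration) keeps case and digits as typed. The entropy trade-off follows from simple bounds: each letter contributes at most log2(26) ≈ 4.7 bits (less in practice, since initial letters are not uniformly distributed), whereas a word drawn at random from a list of N words contributes log2(N) bits, e.g. about 12.9 bits for a 7776-word Diceware list. Hence the need for a longer phrase.

```python
def first_letters(phrase: str) -> str:
    """Build a password from the first character of each word in a phrase.

    Illustrative sketch: case and digits are preserved as typed, so mixing
    them into the phrase carries over into the password.
    """
    return "".join(word[0] for word in phrase.split())


# Example phrase invented for illustration; don't reuse a well-known quote.
print(first_letters("my cat Tom eats 3 sardines every Sunday at noon"))
```

The result, `mcTe3seSan`, contains no dictionary word, so a coverage-style heuristic has nothing to latch onto.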