Well, I'm stumped. Several months back, we launched a totally new website, replacing a legacy system that was pretty messy. Part of the mess was that many, many pages had been created that didn't need to exist or be crawled by Google; a lot of duplicate and shell data resulted in extra URLs being crawled and indexed. With the site transition, we of course broke some of these URLs, but that didn't seem to be too much of a concern. I blocked the ones I knew should be blocked in robots.txt, 301 redirected as much duplicate data as I could (still an ongoing process), and simply returned 404 for anything that should never have been there in the first place.
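For reference, this is roughly how I've been spot-checking that the old URLs respond the way I intend. It's just a quick sketch using the Python `requests` library, and the URLs here are placeholders standing in for real legacy paths:

```python
import requests

# Placeholder legacy URLs -- substitute real paths from the old site.
legacy_urls = [
    "https://example.com/old-duplicate-page",
    "https://example.com/old-shell-page",
]

for url in legacy_urls:
    # Don't follow redirects, so a 301 shows up as a 301
    # rather than as the status code of its target.
    resp = requests.get(url, allow_redirects=False, timeout=10)
    target = resp.headers.get("Location", "")
    print(f"{url} -> {resp.status_code} {target}")
```

Running that against a sample of the old URLs confirms they return the 301s and 404s I expect.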
For the last three months, I've been monitoring the 404s Google reports in Webmaster Tools, and while we had a few thousand due to the gradual removal of shell and duplicate data, I wasn't too concerned. I've been generating updated sitemaps for Google multiple times a week with any changed URLs. Then, about a week ago, Webmaster Tools started reporting a massive increase in 404s: somewhere around 30,000 new 404s a day, making it impossible for me to keep up. My updated sitemaps don't even contain 30,000 URLs. The 404s are indeed for incorrect URLs, and for URLs that haven't existed for months and haven't been in a sitemap for just as long. It's as if Google decided to use a sitemap from many months ago; I have no other idea why it would suddenly crawl URLs for data that hasn't existed for many months and is definitely not linked anywhere (although Webmaster Tools claims they're linked from the sitemap... which they are not, as far as I can tell from the check below).
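To double-check the "linked from the sitemap" claim, I've been diffing the reported 404s against the live sitemap with something like the following sketch. The sitemap URL and the input filename are placeholders; `reported_404s.txt` stands in for the URL list exported from the Webmaster Tools crawl errors report, one URL per line:

```python
import xml.etree.ElementTree as ET
import requests

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# URLs exported from the crawl errors report (placeholder filename).
with open("reported_404s.txt") as f:
    reported = {line.strip() for line in f if line.strip()}

# Pull the current sitemap and collect every <loc> entry.
root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
in_sitemap = {loc.text.strip() for loc in root.iterfind(".//sm:loc", NS)}

# Any reported 404 that actually appears in the current sitemap.
overlap = reported & in_sitemap
print(f"{len(overlap)} of {len(reported)} reported 404s are in the sitemap")
```

Every time I run this, the overlap is zero: none of the URLs Google is flagging appear in the sitemap it claims to have found them in.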
Does anyone have an explanation for this? I even got an automated message from Webmaster Tools this morning reporting that it has seen a significant increase in 404s from my site. I'm not quite sure how concerned I should really be about this...