
I have a website that I took over as webmaster. It was built on WordPress, was hacked, and had thousands of spam pages injected into it. These pages were indexed by Google, and eventually the site carried the "This site may be hacked" warning next to its search results.

I have migrated the site to a different CMS, made sure it is clean, and added it to my Webmaster Tools account, and the new pages have been indexed. The problem is that Google has simply added the new pages alongside the old spam pages. The website is small (no more than 100 pages), but searching site:example.org returns "About 368,000 results".

Google Webmaster Tools sends the message: "Googlebot identified a significant increase in the number of URLs on http://example.org/ that return a 404 (not found) error. This can be a sign of an outage or misconfiguration, which would be a bad user experience. This will result in Google dropping those URLs from the search results. If these URLs don't exist at all, no action is necessary."

It has been over a month, but these thousands of 404 errors are still being reported by Google Webmaster Tools.

I have searched the forums, and so far my only option seems to be removing the site completely from Google's index and then adding it afresh. I don't want that blackout, because we rely heavily on search traffic to the site.

Any ideas on how to remove these 404 pages from Google's index, all 368,000 of them?

2 Answers

Did you try sending a sitemap to Google?

Ask Google to recrawl your URLs: "If you’ve recently added or made changes to a page on your site, you can ask Google to (re)index it using the Fetch as Google tool."

The "Request indexing" feature on Fetch as Google is a convenience method for easily requesting indexing for a few URLs; if you have a large number of URLs to submit, it is easier to submit a sitemap. instead. Both methods are about the same in terms of response times.

From: https://support.google.com/webmasters/answer/6065812?hl=en
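
For reference, a minimal sitemap file is plain XML and might look something like the sketch below. The example.org domain comes from the question, but the paths and the lastmod date are placeholders, so substitute your real URLs (or let your CMS generate the file) before submitting it in Webmaster Tools.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- List only the ~100 real pages; the spam URLs are simply not included -->
  <url>
    <loc>http://example.org/</loc>
    <lastmod>2017-10-09</lastmod>
  </url>
  <url>
    <loc>http://example.org/about/</loc>
  </url>
</urlset>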

If that does not work and those URLs share a similar path, try adding that path to robots.txt with a disallow rule:

User-agent: *
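# /common_path_indexed/ is a placeholder; replace it with the actual path shared by the spam URLs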
Disallow: /common_path_indexed/
yagmoth555
  • Hi @yagmoth555. I have tried adding another sitemap and will give my feedback in a couple of days. – Denver Chiwakira Oct 09 '17 at 05:15
  • Hi @yagmoth555 - After adding the site map, we are down from 368,000 to 5,170 results (mostly the spam results). We seem to be getting somewhere. – Denver Chiwakira Oct 10 '17 at 19:36
  • @DenverChiwakira Cool, good news. For the remaining ones, if they share a path you can use my second tip to disallow them (or allow only your sitemap URLs and disallow everything else). – yagmoth555 Oct 10 '17 at 19:44

You can try adding 301 redirects for those pages so that they point to your front page. This might make it faster for Google to expire the hacked pages.
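
As a rough sketch: if the site runs on Apache and you can use mod_rewrite in an .htaccess file at the site root, pattern-based rules can 301 whole groups of leftover spam URLs to the front page instead of one redirect per URL. The /common_path_indexed/ path and the .xhtml pattern below are placeholders based on the other answer and the example URLs in the comments; adjust them to whatever the injected pages actually look like.

RewriteEngine On
# Send everything under the shared spam path to the front page (301 = permanent)
RewriteRule ^common_path_indexed/ http://example.org/ [R=301,L]
# Catch the injected .xhtml spam pages by pattern instead of one rule per URL
RewriteRule \.xhtml$ http://example.org/ [R=301,L]

Other web servers have equivalents (nginx rewrite/return, for instance); the point is to match the spam URLs by pattern rather than listing them one by one.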

Tero Kilkanen
  • Thanks @tero-kilkanen - however, just to give you an example of the URLs that are indexed, here are 3 of the 368,000: __.../announcer-tsunematsu/profile/__, __.../vqgawgrkhq_319-htqq_27297.xhtml__ and __.../bco-32interpret167218/9bwx/lv9/38524/__. I thought it would be tedious to go through all the URL structures and create `301` redirects for them. – Denver Chiwakira Oct 09 '17 at 05:09