I have:

RewriteRule ^Article/([^/]*)$ /article.php?newsid=$1 [L]

This means the URL should be //example.com/Article/855563, but Google crawls //example.com/article.php?newsid=855563. Is there anything I can do to prevent this, or to 301 redirect to example.com/Article/855563?

MrWhite
  • You might want to format your question properly. Additionally, I doubt you are the owner of domain.com; please use a domain reserved for examples (e.g. example.com), see https://www.iana.org/domains/reserved. Your question also lacks a lot of information about your system; read https://serverfault.com/help for details. As for the question itself: test it yourself. curl, wget or similar tools that can access your website are available and can be used to check whether a rewrite rule is working properly. – Dennis Nolte Mar 14 '19 at 14:13
  • From your question I understand that Googlebot is crawling //domain.com/article.php?newsid=855563 instead of //domain.com/Article/855563. Try: RewriteRule ^Article/([^/]*)$ /Article/$1 [NC,L] – mightyteja Mar 14 '19 at 18:14

1 Answer

You first need to identify why Google is crawling the wrong URLs.

  • Did you change an existing URL structure (one that had already been indexed by search engines and linked to externally)? If so, you need a redirect from the old URL to the new one in order to preserve SEO and get search engines to replace the old URLs in the SERPs.

  • Are you inadvertently linking to the wrong URL(s) internally and exposing the "wrong" URLs to search engines? If so, these must be fixed before implementing the redirect.

Otherwise, Google should not have been able to discover the "wrong" URLs.

You should also be implementing a rel="canonical" tag in the head of your pages to indicate the correct canonical URL to search engines.

In order to externally redirect a URL of the form /article.php?newsid=<newsid> to /Article/<newsid> (the canonical URL) - the reverse of the existing internal rewrite - then you can do something like the following near the top of your .htaccess file (before the existing rewrite):

RewriteEngine On

RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^newsid=([^&]*)
RewriteRule ^article\.php$ /Article/%1 [QSD,R=302,L]

The first condition, which checks the REDIRECT_STATUS environment variable, is necessary to prevent a redirect loop. The variable is empty on the initial request and is set (to 200) after the first successful rewrite, so the rule is only processed on direct requests from the client and not on rewritten requests (your existing directive).

The second condition captures the value of the newsid URL parameter (which must occur at the start of the query string). This is saved in the %1 backreference (used later in the RewriteRule substitution). Note that this captures anything (as in your rewrite); however, if the newsid value is always numeric then this should really be made more restrictive, e.g. ^newsid=(\d+) (1 or more digits only).

The QSD flag (Apache 2.4+) discards the original query string from the redirect target; without it, the request would be redirected to /Article/855563?newsid=855563. On Apache 2.2, append a ? to the substitution instead (i.e. /Article/%1?).

Note that this is currently a 302 (temporary) redirect. Only change it to a 301 (permanent) redirect once you have confirmed that it works OK - in order to avoid any caching issues.
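Putting the pieces together, a minimal sketch of how the complete .htaccess file might look is shown below. It assumes Apache 2.4+ (for the QSD flag), that newsid is always numeric, and that it is the first query parameter. The trailing (?:&|$) on the pattern is an extra assumption added here so that values such as 123abc are not partially matched; it is not required by the approach above.

```apache
RewriteEngine On

# External redirect: /article.php?newsid=123 -> /Article/123
# Only applies to direct client requests (REDIRECT_STATUS is empty),
# not to requests already rewritten by the rule further down.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# Assumption: newsid is digits only and is the first query parameter.
RewriteCond %{QUERY_STRING} ^newsid=(\d+)(?:&|$)
# QSD drops the original query string from the redirect target.
# Still a 302 while testing; switch to 301 once confirmed.
RewriteRule ^article\.php$ /Article/%1 [QSD,R=302,L]

# Existing internal rewrite: /Article/123 -> /article.php?newsid=123
RewriteRule ^Article/([^/]*)$ /article.php?newsid=$1 [L]
```

The order matters: the external redirect has to come before the internal rewrite so that the canonical URL is settled first and the rewrite then serves it.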

MrWhite