Our site is built on Angular, which makes it almost entirely JavaScript-based, so we need to serve static HTML snapshots to Googlebot in order for it to crawl us. At the moment, we have this implementation in place:

location / {
    # Rewrite rules for the Google and Bing bots and other crawlers.
    # Serves static HTML from /snapshots when a URL is requested
    if ($http_user_agent ~ (Googlebot|bingbot|Screaming)) {
        rewrite ^/(.*)$ /snapshots/$1.html break;
    }
}

This works in most cases. However, if Google requests a URL with a trailing slash, such as http://site.com/support/contact/, it is rewritten to http://site.com/support/contact/.html, which returns a 404. I need to change the config to strip the trailing slash from the URL so that this is returned instead: http://site.com/support/contact.html

How can this be achieved from within the nginx configuration? We're seeing hundreds of 404 errors in Webmaster Tools because of this.

Thanks!

Tyler Alex

1 Answer

If you change your rewrite to

rewrite ^/(.*)/$ /snapshots/$1.html break;
rewrite ^/(.*)$ /snapshots/$1.html break;

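then the first line will only match URIs ending in a slash, and $1 will contain the full path minus the leading and trailing slashes. Because of the break flag, a request matched by the first rule is not processed by the second. The second rule catches the rest of the cases (the ones that are working now).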
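
For context, here is a sketch of how the two rules might sit inside the location block from the question. It assumes the snapshots are stored under /snapshots with the same path layout as the site, as the original config implies; the comments show the expected mapping for the example URL.

```nginx
location / {
    # Serve static HTML snapshots to crawlers only.
    if ($http_user_agent ~ (Googlebot|bingbot|Screaming)) {
        # Trailing slash: /support/contact/ -> /snapshots/support/contact.html
        rewrite ^/(.*)/$ /snapshots/$1.html break;
        # No trailing slash: /support/contact -> /snapshots/support/contact.html
        rewrite ^/(.*)$ /snapshots/$1.html break;
    }
}
```

The order matters: the more specific trailing-slash rule has to come first, otherwise the general rule would match every request. A request for / itself still maps to /snapshots/.html under these rules, so the home page may need separate handling depending on how its snapshot is named.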

Flup