Details
- Bug
- Status: Closed
- Normal
- Resolution: Fixed
- robotstxt-1.09.00, CMS-10.0-FCS
- None
- Tiger Sprint 111
Description
The Robots.txt plugin automatically excludes sitemap items that show faceted navigation. However, the excluded URLs are not suffixed with a slash. As a result, every sitemap item whose name merely starts with the same characters is also excluded, since crawlers match disallowed URLs by simple prefix comparison [1].
Steps to reproduce:
1. Add the Robots.txt plugin to an archetype project
2. Create a root-level sitemap item called "products" whose relativecontentpath property points to a faceted navigation node
3. Request http://localhost:8080/site/robots.txt
Expected: the output is:
User-agent: *
Disallow: /site/products/
Actual: the output is:
User-agent: *
Disallow: /site/products
(note the missing trailing slash).
URLs like "/products-archive" and "/productstorage" are now disallowed as well, since they too start with "/products".
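The effect can be checked outside the CMS with a minimal sketch. It uses Python's standard-library robots.txt parser as a stand-in for a crawler, which is an assumption: real crawlers may differ in detail. The helper name is_blocked and the sample paths under /site are made up for illustration and are not part of the plugin.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(disallow_rule: str, path: str) -> bool:
    """Return True if a robots.txt with one Disallow rule blocks the path.

    Hypothetical helper for illustration only; not part of the plugin.
    """
    parser = RobotFileParser()
    # Build a two-line robots.txt that applies to every user agent.
    parser.parse(["User-agent: *", f"Disallow: {disallow_rule}"])
    return not parser.can_fetch("*", path)


# Sample paths (assumed): one real child of "products", two unrelated siblings.
paths = [
    "/site/products/shoes",
    "/site/products-archive",
    "/site/productstorage",
]

for rule in ("/site/products", "/site/products/"):
    blocked = [p for p in paths if is_blocked(rule, p)]
    print(f"Disallow: {rule} blocks {blocked}")
```

Under this parser, the rule without the trailing slash blocks all three sample paths, while the rule with the trailing slash blocks only the path below /site/products/.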
[1] http://en.wikipedia.org/wiki/Robots_exclusion_standard#Disadvantages