Type: Bug
Status: Closed
Priority: Normal
Resolution: Fixed
Affects Version/s: robotstxt-1.09.00, CMS-10.0-FCS
Fix Version/s: robotstxt-2.1.0, Unreleased - EOL
Component/s: None
Sprint: Tiger Sprint 111

The Robots.txt plugin automatically excludes sitemap items that show faceted navigation. However, the excluded URLs are not suffixed with a slash. As a result, every sitemap item whose name merely starts with the same characters is excluded as well, because crawlers treat each Disallow value as a plain prefix of the URL path [1].
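The effect of the missing slash can be illustrated with a minimal Python sketch. It assumes the simplified matching model of the original robots exclusion standard (plain prefix comparison, no wildcards, no Allow lines). The `is_disallowed` helper and the sample paths are hypothetical illustrations, not part of the plugin.

```python
def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """Return True if `path` starts with any non-empty Disallow value.

    Simplified model (assumption): each Disallow value is compared as a
    plain prefix of the URL path; an empty value disallows nothing.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


rules_without_slash = ["/site/products"]   # what the plugin currently emits
rules_with_slash = ["/site/products/"]     # what the plugin should emit

# Hypothetical paths: one real child of the faceted item, two unrelated items.
for path in ["/site/products/shoes", "/site/products-archive", "/site/productstorage"]:
    print(path,
          is_disallowed(path, rules_without_slash),
          is_disallowed(path, rules_with_slash))
```

Without the trailing slash all three paths are blocked; with it, only the path below "/site/products/" is.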
Steps to reproduce:
1. Add the Robots.txt plugin to an archetype project
2. Create a root-level sitemap item called "products" whose relativecontentpath property points to a faceted navigation node
3. Request http://localhost:8080/site/robots.txt
Expected: the output is:
User-agent: *
Disallow: /site/products/
Actual: the output is:
User-agent: *
Disallow: /site/products
(note the missing trailing slash).
URLs such as "/site/products-archive" and "/site/productstorage" are therefore also disallowed, because they too start with "/site/products".
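The expected output can be produced by always appending exactly one slash when an excluded sitemap path is written out. Below is a minimal Python sketch of that normalization. The `disallow_line` helper and its `mount_path`/`sitemap_item` parameters are hypothetical stand-ins, not the plugin's actual API.

```python
def disallow_line(mount_path: str, sitemap_item: str) -> str:
    """Build a Disallow line for an excluded sitemap item, ending in "/".

    Hypothetical helper: `mount_path` (e.g. "/site") and `sitemap_item`
    (e.g. "products") stand in for whatever the plugin really uses.
    Leading/trailing slashes on either part are normalized away first,
    so the result never contains "//" and always ends with a single "/".
    """
    parts = [p.strip("/") for p in (mount_path, sitemap_item) if p.strip("/")]
    path = "/".join(parts)
    return f"Disallow: /{path}/" if path else "Disallow: /"


print("User-agent: *")
print(disallow_line("/site", "products"))
```

Normalizing both parts before joining means callers can pass "/site", "/site/" or "products/" without producing a doubled or missing slash.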
[1] http://en.wikipedia.org/wiki/Robots_exclusion_standard#Disadvantages