Hippo Plugins / HIPPLUG-1101

Robots.txt plugin does not add slash after excluded facets path

    Details

    • Sprint:
      Tiger Sprint 111

      Description

      The Robots.txt plugin automatically excludes sitemap items that show faceted navigation. However, the excluded URLs are not suffixed with a slash. As a result, every sitemap item whose name merely starts with the same characters is also excluded, since crawlers match disallowed URLs by simple prefix comparison [1].
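
As a rough illustration of why the missing slash matters, the sketch below mimics the plain prefix comparison a crawler applies to a Disallow rule. The class and method names are hypothetical and not part of the plugin, and real crawlers may additionally handle wildcards and Allow rules.

```java
public class RobotsPrefixDemo {

    // Hypothetical helper: a path counts as blocked when it starts with the Disallow value.
    static boolean isDisallowed(String disallowRule, String requestPath) {
        return !disallowRule.isEmpty() && requestPath.startsWith(disallowRule);
    }

    public static void main(String[] args) {
        String[] paths = {
            "/site/products/shoes",
            "/site/products-archive",
            "/site/productstorage"
        };
        // Compare the rule without and with the trailing slash.
        for (String rule : new String[] {"/site/products", "/site/products/"}) {
            for (String path : paths) {
                System.out.println(rule + " blocks " + path + ": " + isDisallowed(rule, path));
            }
        }
    }
}
```

Under this simplified model, the rule without a slash blocks all three paths, while the rule with a slash blocks only the path below "/site/products/".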

      Steps to reproduce:
      1. add the Robots.txt plugin to an archetype project
      2. create a root-level sitemap item called "products" whose relativecontentpath property points to a faceted navigation node
      3. request http://localhost:8080/site/robots.txt

      Expected: the output is:

      User-agent: *
      Disallow: /site/products/

      Actual: the output is:

      User-agent: *
      Disallow: /site/products

      (note the missing trailing slash).

      URLs like "/products-archive" and "/productstorage" would now also be disallowed since they also start with "/products".
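
One possible shape for a fix, offered only as a sketch: normalize each excluded path so that it ends in a slash before writing the Disallow line. The class and method below are hypothetical helpers for illustration and do not reflect the plugin's actual code.

```java
public class DisallowPathNormalizer {

    // Hypothetical helper: ensure the path ends with exactly the slash it needs.
    static String withTrailingSlash(String path) {
        if (path == null || path.isEmpty()) {
            return "/";
        }
        return path.endsWith("/") ? path : path + "/";
    }

    public static void main(String[] args) {
        // The path from the reproduction steps above.
        System.out.println("User-agent: *");
        System.out.println("Disallow: " + withTrailingSlash("/site/products"));
    }
}
```

Paths that already end in a slash are left unchanged, so applying the helper twice does not produce a double slash.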

      [1] http://en.wikipedia.org/wiki/Robots_exclusion_standard#Disadvantages

            People

            • Assignee:
              Unassigned
              Reporter:
              mdenburger Mathijs den Burger
            • Votes:
              0
              Watchers:
              7
