  1. [Read Only] - Hippo Plugins
  2. HIPPLUG-1101

Robots.txt plugin does not add slash after excluded facets path

Details

    Description

      The Robots.txt plugin automatically excludes sitemap items that show faceted navigation. However, the excluded URLs are not suffixed with a slash. As a result, every sitemap item whose name merely starts with the same prefix is excluded as well, because crawlers match disallowed URLs by simple prefix comparison [1].

      Steps to reproduce:
      1. add the Robots.txt plugin to an archetype project
      2. create a root-level sitemap item called "products" whose relativecontentpath property points to a faceted navigation node
      3. request http://localhost:8080/site/robots.txt

      Expected: the output is:

      User-agent: *
      Disallow: /site/products/

      Actual: the output is:

      User-agent: *
      Disallow: /site/products

      (note the missing trailing slash).
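
      The expected output implies a small normalization step when the Disallow line is written. A minimal sketch in Python, not the plugin's actual code; the helper name disallow_line is hypothetical:

```python
def disallow_line(path: str) -> str:
    """Build a Disallow line whose path ends with exactly one slash.

    Hypothetical helper for illustration; not part of the Robots.txt plugin.
    """
    return f"Disallow: {path.rstrip('/')}/"


print(disallow_line("/site/products"))
print(disallow_line("/site/products/"))
```

      Both calls produce the same line, so a path that already ends with a slash is not given a second one.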

      URLs like "/products-archive" and "/productstorage" would now also be disallowed since they also start with "/products".
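
      The effect can be checked with any parser that follows the prefix-matching rule. The sketch below uses Python's standard urllib.robotparser as a stand-in for a crawler; the helper is_allowed and the path /site/products/shoes are assumptions added for illustration, and real crawlers may differ in details:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(disallow_rule: str, path: str) -> bool:
    """Return True if a crawler obeying the single rule may fetch the path."""
    parser = RobotFileParser()
    # Feed the parser a two-line robots.txt, as in the report above.
    parser.parse(["User-agent: *", f"Disallow: {disallow_rule}"])
    return parser.can_fetch("*", f"http://localhost:8080{path}")


paths = ["/site/products/shoes", "/site/products-archive", "/site/productstorage"]
for rule in ("/site/products", "/site/products/"):
    for path in paths:
        print(f"Disallow: {rule}  {path}  allowed={is_allowed(rule, path)}")
```

      Without the trailing slash all three paths are blocked; with it, only the path below /site/products/ is.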

      [1] http://en.wikipedia.org/wiki/Robots_exclusion_standard#Disadvantages

          People

            Assignee: Unassigned
            Reporter: Mathijs den Burger (mdenburger, inactive)