Hippo CMS / CMS-4951

Indexing the JCR node name breaks the ability to generate valid excerpts with highlighting from the content.

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Low
    • Resolution: Fixed
    • Affects Version/s: 2.18.01
    • Fix Version/s: 2.16.06, 2.18.03, 2.19.00
    • None
    • None

    Description

      For my customer I've been investigating the possibility of generating a relevant excerpt for a search query. By default this is supported by Jackrabbit (JR) and works quite nicely. However, because the Hippo repository indexes node names, the offsets of terms found in the Lucene fulltext field are incorrect.
      This is caused by adding the node name as a fulltext field without storing it in the index. For excerpts to work correctly, the content of such a field needs to be stored, because the information inside the field is retrieved when the excerpt is generated.
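
      To illustrate the offset problem, here is a deliberately simplified Java sketch. It is not Jackrabbit's or Hippo's actual code: the class and method names are hypothetical, and it assumes the node name ends up in front of the content in the indexed text while only the content is stored. Term offsets are computed against the indexed text but applied to the stored text, so the highlight lands in the wrong place as soon as the unstored node name is part of the index.

```java
import java.util.Locale;

// Hypothetical model of excerpt highlighting; not the Jackrabbit implementation.
public class ExcerptOffsetSketch {

    /** Start offset of the term in the text that was indexed, or -1 if absent. */
    public static int indexedOffset(String indexedText, String term) {
        return indexedText.toLowerCase(Locale.ROOT).indexOf(term.toLowerCase(Locale.ROOT));
    }

    /** Wraps `length` characters of the stored text, starting at `offset`, in <strong> tags. */
    public static String highlight(String storedText, int offset, int length) {
        if (offset < 0 || offset + length > storedText.length()) {
            return storedText; // offset does not fit the stored text: no highlight
        }
        return storedText.substring(0, offset)
                + "<strong>" + storedText.substring(offset, offset + length) + "</strong>"
                + storedText.substring(offset + length);
    }

    /**
     * Simulates excerpt generation. Assumption: when indexNodeName is true the
     * node name is indexed in front of the content but is never stored, so only
     * the content is available when the excerpt is built.
     */
    public static String excerpt(String nodeName, String content, String term, boolean indexNodeName) {
        String indexed = indexNodeName ? nodeName + " " + content : content;
        int offset = indexedOffset(indexed, term);
        return highlight(content, offset, term.length());
    }

    public static void main(String[] args) {
        String nodeName = "news-item";
        String content = "Hippo releases a new version";

        // Node name indexed but not stored: offsets are shifted by the node name.
        System.out.println(excerpt(nodeName, content, "releases", true));
        // prints: Hippo releases a<strong> new ver</strong>sion

        // Node name not indexed: offsets match the stored content.
        System.out.println(excerpt(nodeName, content, "releases", false));
        // prints: Hippo <strong>releases</strong> a new version
    }
}
```

      In this model the shift equals the length of the unstored node name plus the separator, which is why either storing the field or leaving it out of the fulltext index brings the offsets back in line.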

      To fix this issue, I've created a small patch that allows developers to disable this Hippo-added behavior. All that is needed is a new piece of configuration inside the indexing_configuration.xml file. I've left the default at true for backwards compatibility. I can imagine that this is also something that might only be enabled for the site.

      <!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
      <configuration>

        <excludefromnodescope>
          <nodetype>hippo:paths</nodetype>
          <nodetype>hippo:docbase</nodetype>
        </excludefromnodescope>

        <!-- Set to false to stop indexing JCR node names; defaults to true -->
        <indexnodename>false</indexnodename>

      </configuration>

            People

              jsheriff Junaidh Kadhar Sheriff
              jreijn Jeroen Reijn (Inactive)