[Read Only] - Hippo Repository / REPO-1630

Configurability for stop words of StandardHippoAnalyzer

Details

    • Improvement
    • Status: Closed
    • Normal
    • Resolution: Fixed
    • None
    • 5.0.0
    • None
    • None

    Description

      Sometimes we want to remove certain stop words. For instance, 'co' means 'Colombia' in some content domains, but it is currently hard-coded in StandardHippoAnalyzer#CZECH_STOP_WORDS. The same applies to many other two-character abbreviations (e.g., 'ar', 'cl', 'mx').
      We also want to avoid future upgrade issues that would come from implementing an entirely new analyzer ourselves just for these small changes. Instead, we want to be able to easily configure which stop words are included or excluded.
      The problem is that StandardHippoAnalyzer is a final class, so it cannot be extended to make these small changes.
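
The requested behavior can be sketched as a small set computation: start from the built-in stop words, add the configured inclusions, then drop the configured exclusions. The following is only an illustrative sketch in plain Java. The class `StopWordConfig` and the method `effectiveStopWords` are hypothetical names, not part of Hippo Repository or Lucene, and the sample default words are made up for the example.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not an actual Hippo or Lucene API.
final class StopWordConfig {

    private StopWordConfig() {
    }

    // Returns (defaults + included) - excluded, compared case-insensitively.
    // Exclusions are applied last, so they win over both defaults and inclusions.
    static Set<String> effectiveStopWords(Set<String> defaults,
                                          Set<String> included,
                                          Set<String> excluded) {
        Set<String> result = new LinkedHashSet<>();
        for (String word : defaults) {
            result.add(normalize(word));
        }
        for (String word : included) {
            result.add(normalize(word));
        }
        for (String word : excluded) {
            result.remove(normalize(word));
        }
        return Collections.unmodifiableSet(result);
    }

    // Trim and lower-case so that "CO" and "co" are treated as the same word.
    private static String normalize(String word) {
        return word.trim().toLowerCase(Locale.ROOT);
    }
}
```

For example, calling `StopWordConfig.effectiveStopWords(Set.of("a", "co", "the"), Set.of(), Set.of("co", "ar", "cl", "mx"))` would yield a set without 'co', so 'co' would stay searchable. How such a set would then be handed to the analyzer depends on the actual fix and is not shown here.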

          People

            Assignee: Unassigned
            Reporter: Woonsan Ko (Inactive)
            Votes: 0
            Watchers: 4
