  1. Hippo CMS
  2. CMS-6882

Remove diacritics (ISO Latin-1 characters) from the input query in CMS search


Details

    • Bug
    • Status: Closed
    • Normal
    • Resolution: Fixed
    • None
    • 2.22.15, 2.24.01
    • None
    • None

    Description

      Combining wildcard prefixing/postfixing with stemming does not always work correctly in Lucene. Currently, when you index, for example, the word 'Plattenbandförderer' (mind the ö) and then query for it in the CMS, you get:

      'Plattenbandförd' DOES have a hit (the CMS automatically appends a postfix wildcard)
      'Plattenbandfö' DOES NOT have a hit (wildcard postfix fails with accents)
      'Plattenbandförde' DOES NOT have a hit (wildcard postfix fails with accents)
      'Plattenbandförderer' DOES have a hit

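      This happens because the analyzer we use by default removes diacritics, so the indexed terms contain no accents, while a wildcard query term is typically not run through the analyzer and keeps its accents. Hence, it is better to remove diacritics from the input query before searching with it.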
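
      A minimal sketch of stripping diacritics from the query text before it is handed to the search, using only the JDK. The class and method names (QueryDiacritics, removeDiacritics) are hypothetical and are not taken from the Hippo CMS code base; the actual fix may fold characters differently.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not the actual Hippo CMS implementation.
final class QueryDiacritics {

    private QueryDiacritics() {
    }

    // Returns the input with combining accents removed, e.g. 'ö' becomes 'o'.
    static String removeDiacritics(String input) {
        if (input == null) {
            return null;
        }
        // NFD decomposition splits an accented letter into its base letter
        // followed by one or more combining marks ('ö' -> 'o' + U+0308).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop every combining mark (Unicode category M), keeping base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // The failing query from the description, cleaned before searching.
        System.out.println(removeDiacritics("Plattenbandfö")); // prints "Plattenbandfo"
    }
}
```

      Note that this only handles characters that decompose into a base letter plus combining marks. Letters such as 'ß' or 'ø' have no such decomposition and would need an explicit mapping that matches whatever the index-time analyzer does.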

      Also see https://issues.apache.org/jira/browse/JCR-3511

            People

              svoortman Simon Voortman (Inactive)
              aschrijvers Ard Schrijvers
              Votes: 0
              Watchers: 2
