[CMS-6882] Remove diacritics (ISO Latin-1 chars) from input query in CMS search

Details

Type: Bug
Status: Closed
Priority: Normal
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.22.15, 2.24.01
Component/s: None
Labels: None

Description

Wildcard prefixing/postfixing combined with stemming cannot always work correctly in Lucene. The current problem is that when you index, for example, the word 'Plattenbandförderer' (mind the ö) and then query in the CMS, you get:

'Plattenbandförd' DOES have a hit (the CMS automatically postfixes wildcards)
'Plattenbandfö' DOES NOT have a hit (wildcard postfix fails with accents)
'Plattenbandförde' DOES NOT have a hit (wildcard postfix fails with accents)
'Plattenbandförderer' DOES have a hit

This is because the analyzer we use by default removes diacritics. Hence, it is better to remove diacritics from the input query before searching with it.
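
As an illustration of stripping diacritics from a query string before it is handed to the search, here is a minimal sketch. It assumes plain Java with java.text.Normalizer; the class and method names (QueryDiacritics, removeDiacritics) are hypothetical and are not the actual code committed for this issue or the code in SearchInputParsingUtils.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not the actual CMS implementation.
final class QueryDiacritics {

    private QueryDiacritics() {
    }

    /**
     * Decomposes the input into base characters plus combining marks (NFD)
     * and then drops the combining marks, so "\u00f6" (o with diaeresis) becomes "o".
     */
    static String removeDiacritics(String query) {
        if (query == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(query, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks left behind by the decomposition.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "Plattenbandf\u00f6" is the failing query term from the description.
        System.out.println(removeDiacritics("Plattenbandf\u00f6")); // prints "Plattenbandfo"
    }
}
```

Note that characters without a canonical decomposition, such as 'ß' or 'ø', pass through this sketch unchanged. The analyzer used at index time may fold them differently, so the query-side normalization should be checked against the analyzer's behaviour.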

Also see https://issues.apache.org/jira/browse/JCR-3511

Issue Links

relates to

HSTTWO-2456 remove diacritics in SearchInputParsingUtils as well (Closed)

People

Assignee: Simon Voortman (Inactive)

Reporter: Ard Schrijvers

Votes: 0

Watchers: 2

Dates

Created: 05/Feb/13 6:44 PM

Updated: 11/Feb/13 12:36 PM

Resolved: 08/Feb/13 11:11 AM