Details
Type: Improvement
Status: Closed
Priority: Normal
Resolution: Fixed
Description
Sometimes we want to remove certain stop words. For instance, 'co' means 'Colombia' in some content domains, but it is currently hard-coded in StandardHippoAnalyzer#CZECH_STOP_WORDS. The same applies to many other two-character abbreviations (e.g., 'ar', 'cl', 'mx').
We also want to avoid future upgrade issues that would come from implementing an entirely new analyzer ourselves just for these small changes. Instead, we want to be able to easily configure which stop words are included or excluded.
The problem is that StandardHippoAnalyzer is a final class, so it cannot be extended to make these small changes.
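To illustrate the kind of configuration being requested, here is a minimal sketch of how an effective stop word set could be derived from built-in defaults plus configured inclusions and exclusions. The class ConfigurableStopWords and its resolve method are hypothetical names used only for illustration. They are not part of StandardHippoAnalyzer or Lucene, and this is not necessarily how the issue was resolved.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Hippo or Lucene. */
final class ConfigurableStopWords {

    private ConfigurableStopWords() {
    }

    /**
     * Builds the effective stop word set: start from the defaults,
     * add the configured inclusions, then remove the configured exclusions.
     * Exclusions are applied last, so they win over defaults and inclusions.
     */
    static Set<String> resolve(Set<String> defaults,
                               Set<String> included,
                               Set<String> excluded) {
        Set<String> effective = new LinkedHashSet<>();
        for (String word : defaults) {
            effective.add(normalize(word));
        }
        for (String word : included) {
            effective.add(normalize(word));
        }
        for (String word : excluded) {
            effective.remove(normalize(word));
        }
        return Collections.unmodifiableSet(effective);
    }

    /** Trims and lower-cases a word so that matching is case-insensitive. */
    private static String normalize(String word) {
        return word.trim().toLowerCase(Locale.ROOT);
    }
}
```

With a sketch like this, a project could pass 'co', 'ar', 'cl' and 'mx' as exclusions and hand the resulting set to whatever analyzer it uses, without subclassing the analyzer or copying it.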