Details
Type: Bug
Status: Closed
Priority: Normal
Resolution: Fixed
Description
ServicingNodeIndexer is responsible for reading hippo:text to index a document.
However, it simply uses
IOUtils.toString(hippoTextBinaryValue.internalValue.getStream());
which does not take the encoding of the stored bytes into account. As a result, the bytes are decoded with the platform default encoding (for example ISO-8859-1), even though the extracted PDF text data is usually saved as UTF-8.
So, please use
IOUtils.toString(hippoTextBinaryValue.internalValue.getStream(), "UTF-8");
instead.
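
To illustrate the effect, here is a minimal standalone sketch using only the Java standard library (Java 9+) rather than Commons IO. The class name EncodingDemo and the helper readText are hypothetical and are not part of ServicingNodeIndexer; the sample string stands in for extracted PDF text. It shows that bytes written as UTF-8 come back garbled when decoded as ISO-8859-1 and intact when decoded as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Hypothetical helper: reads the whole stream and decodes it with an explicit charset.
    static String readText(InputStream in, Charset charset) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for extracted text that was stored as UTF-8 bytes.
        String original = "caf\u00e9";
        byte[] stored = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the wrong charset turns the two-byte UTF-8 sequence into two characters.
        String wrong = readText(new ByteArrayInputStream(stored), StandardCharsets.ISO_8859_1);
        // Decoding with the charset used for writing restores the original text.
        String right = readText(new ByteArrayInputStream(stored), StandardCharsets.UTF_8);

        System.out.println(wrong.equals(original)); // false
        System.out.println(right.equals(original)); // true
    }
}
```

Passing the charset explicitly, as in the proposed IOUtils.toString(..., "UTF-8") call, makes the result independent of the JVM's default encoding.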