  1. [Read Only] - Hippo Repository
  2. REPO-717

Inconsistent encoding usage when saving/reading hippo:text


Details

    Description

      ServicingNodeIndexer is responsible for reading hippo:text to index a document.
      However, it simply calls

      IOUtils.toString(hippoTextBinaryValue.internalValue.getStream());

      which ignores the encoding of the stored bytes. As a result, the text is decoded with a default encoding (ISO-8859-1 in this case) rather than the one it was written with, even though UTF-8 is what is mostly used when saving the extracted PDF text data. Any non-ASCII character in the extracted text is therefore indexed incorrectly.

      So, please use

      IOUtils.toString(hippoTextBinaryValue.internalValue.getStream(), "UTF-8");

      instead.
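
      To illustrate the mismatch, here is a minimal sketch that uses only the JDK, not the actual ServicingNodeIndexer code. The class name HippoTextDecodingSketch and the helper readText are hypothetical, and the sample string stands in for extracted PDF text. It stores a string as UTF-8 bytes, then decodes the same bytes once as ISO-8859-1 and once as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it is not part of the Hippo Repository code base.
final class HippoTextDecodingSketch {

    // Reads the whole stream and decodes it with an explicitly chosen charset.
    static String readText(InputStream in, Charset charset) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for extracted text containing one non-ASCII character (e with acute accent).
        String extracted = "caf\u00e9";
        // Assumption taken from the report: the text is saved as UTF-8 bytes.
        byte[] stored = extracted.getBytes(StandardCharsets.UTF_8);

        // Decoding with the wrong charset turns the 2-byte UTF-8 sequence into two characters.
        String wrong = readText(new ByteArrayInputStream(stored), StandardCharsets.ISO_8859_1);
        // Decoding with the charset used when saving restores the original text.
        String right = readText(new ByteArrayInputStream(stored), StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 round trip intact: " + wrong.equals(extracted));
        System.out.println("UTF-8 round trip intact: " + right.equals(extracted));
    }
}
```

      Passing the charset explicitly on the reading side, as the proposed two-argument IOUtils.toString call does, keeps the decoding independent of whatever default the runtime happens to use.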

          People

            jsheriff Junaidh Kadhar Sheriff
            wko Woonsan Ko (Inactive)