HSTTWO-4502 ([Read Only] - Hippo Site Toolkit 2)

SimpleHtmlExtractor uses substring() based on an index from an uppercased string

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 13.1.0
    • Component/s: None
    • Flagged: Flagged

      Description

      Calls to String.toUpperCase() are sensitive to typographic ligatures [1] and similar characters: uppercasing can change the length of a string, so String.toUpperCase().length() may differ from String.length(). For example, the single character ß is uppercased to the two characters SS, and the single ligature character ﬀ (U+FB00) is uppercased to FF.
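A minimal sketch using only the JDK (no HST code) that shows the length change for ß and two ligatures. The class name UpperCaseLengthDemo and the helper upperLength are illustrative, and Locale.ROOT is chosen here to keep the example locale-independent:

```java
import java.util.Locale;

class UpperCaseLengthDemo {
    /** Length of s after uppercasing with locale-neutral rules. */
    static int upperLength(String s) {
        return s.toUpperCase(Locale.ROOT).length();
    }

    public static void main(String[] args) {
        // U+00DF sharp s, U+FB00 ff ligature, U+FB02 fl ligature: one char each.
        String[] samples = {"\u00DF", "\uFB00", "\uFB02"};
        for (String s : samples) {
            String upper = s.toUpperCase(Locale.ROOT);
            // Prints the code point, the original length and the uppercased form.
            System.out.printf("U+%04X: length %d -> %s (length %d)%n",
                    (int) s.charAt(0), s.length(), upper, upper.length());
        }
    }
}
```

Each sample is one char long before uppercasing and two chars long after it (SS, FF and FL respectively).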

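      SimpleHtmlExtractor#getInnerHtmlSimply calls line.substring(0, offset) with an offset obtained from line.toUpperCase().indexOf(endTag). This is not correct: the offset is an index into the uppercased string, not into line. A slightly shifted offset returns the wrong substring, and an offset that points past the end of line throws a StringIndexOutOfBoundsException.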
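One way to avoid the mismatch, sketched here as an assumption and not as the fix that actually shipped in 13.1.0, is to search case-insensitively in the original string with String.regionMatches(true, ...), which compares char by char and never builds a longer uppercased copy. The names indexOfIgnoreCase and beforeEndTag are hypothetical, and returning the whole line when the tag is missing is an assumed fallback:

```java
final class CaseInsensitiveSearch {
    private CaseInsensitiveSearch() {}

    /**
     * Hypothetical helper: index of needle in haystack ignoring case, or -1.
     * The result indexes haystack itself, since no uppercased copy is created.
     */
    static int indexOfIgnoreCase(String haystack, String needle) {
        int last = haystack.length() - needle.length();
        for (int i = 0; i <= last; i++) {
            // regionMatches compares char by char, so lengths never change.
            if (haystack.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /** Hypothetical counterpart of the substring call: text before the end tag. */
    static String beforeEndTag(String line, String endTag) {
        int offset = indexOfIgnoreCase(line, endTag);
        // Assumed fallback: keep the whole line when the end tag is absent.
        return offset < 0 ? line : line.substring(0, offset);
    }

    public static void main(String[] args) {
        // Sharp s (U+00DF) and ff ligature (U+FB00) before the end tag.
        String line = "<p>\u00DF\uFB00</p></body>";
        System.out.println(indexOfIgnoreCase(line, "</BODY>"));
        System.out.println(beforeEndTag(line, "</BODY>").length());
    }
}
```

Here both printed values are 9, the position of </body> in the original line, even though the uppercased line would be two characters longer.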

      Reproduction: use the following as the hippostd:content property of an HTML field and render it with the <hst:html /> tag:

      <html><body>
      <p>8 ligature characters: ﬀ ß ﬀ ﬂ ß ﬀ ﬂ ß</p></body>
      </html>

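      This throws a StringIndexOutOfBoundsException because the index of </BODY> in the uppercased second line is applied to the original second line. Each of the eight ligature characters becomes two characters when uppercased, so the index is shifted by 8. That is more than the 7 characters of </body>, so the index points past the end of the original line.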
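A minimal reproduction of the reported pattern in plain Java, outside HST, using the second line of the example content with the ligatures written as Unicode escapes. The class LigatureOffsetRepro and the method offsetFromUpperCase are hypothetical stand-ins for the code in SimpleHtmlExtractor#getInnerHtmlSimply, not its actual source:

```java
import java.util.Locale;

final class LigatureOffsetRepro {
    // Second line of the example: U+FB00 (ff), U+00DF (sharp s), U+FB02 (fl).
    static final String LINE =
        "<p>8 ligature characters: \uFB00 \u00DF \uFB00 \uFB02 \u00DF \uFB00 \uFB02 \u00DF</p></body>";

    /** Mimics the reported pattern: the index comes from an uppercased copy. */
    static int offsetFromUpperCase(String line, String endTag) {
        return line.toUpperCase(Locale.ROOT).indexOf(endTag);
    }

    public static void main(String[] args) {
        int offset = offsetFromUpperCase(LINE, "</BODY>");
        System.out.println("line length: " + LINE.length() + ", offset: " + offset);
        try {
            // The offset is valid for the uppercased copy, not for LINE.
            System.out.println(LINE.substring(0, offset));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("StringIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```

The original line is 52 characters long with </body> at index 45; in the uppercased copy </BODY> starts at index 53, so substring(0, 53) runs past the end of the 52-character line and throws.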

      This is a deep corner case, since an <html><body> wrapper only occurs in old content.

      [1] https://en.wikipedia.org/wiki/Typographic_ligature

    People

    • Assignee: Unassigned
    • Reporter: Jasper Floor (jfloor)
    • Votes: 0
    • Watchers: 5