  1. [Read Only] - Hippo Site Toolkit 2
  2. HSTTWO-4502

SimpleHtmlExtractor uses substring() based on an index from an uppercased string

Details

    • Bug
    • Status: Closed
    • Normal
    • Resolution: Fixed
    • None
    • 13.1.0
    • None
    • Flagged

    Description

      String.toUpperCase() can change the length of a string that contains typographic ligatures [1], so line.length() and line.toUpperCase().length() may differ. For example, the uppercase of the single character ß is the two characters SS.
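      The length change is easy to observe in isolation. The following is a minimal sketch; the class name and the growth helper are made up for illustration, and Locale.ROOT is passed only to keep the result independent of the default locale (the call described in this issue passes no locale).

```java
import java.util.Locale;

class UpperCaseLengthDemo {

    // Number of extra chars that uppercasing adds to the given string.
    // (Hypothetical helper, not part of SimpleHtmlExtractor.)
    static int growth(String s) {
        return s.toUpperCase(Locale.ROOT).length() - s.length();
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u00DF", // sharp s, uppercases to "SS"
            "\uFB00", // single-character ligature ff, uppercases to "FF"
            "\uFB02", // single-character ligature fl, uppercases to "FL"
            "abc"     // plain ASCII for comparison, length unchanged
        };
        for (String s : samples) {
            System.out.println(s + " -> " + s.toUpperCase(Locale.ROOT)
                    + " (grows by " + growth(s) + ")");
        }
    }
}
```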

      SimpleHtmlExtractor#getInnerHtmlSimply calls line.substring(0, offset) with an offset obtained from line.toUpperCase().indexOf(endTag). That offset is an index into the uppercased copy, not into line itself, so the result can be wrong and the call can throw StringIndexOutOfBoundsException.
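      One way to avoid the mismatch is to search the original string case-insensitively, so that the index always refers to the string that substring() is applied to. This is only a sketch under assumptions: it is not necessarily the fix that was applied for this issue, the names indexOfIgnoreCase and textBefore are hypothetical, and returning the whole line when the tag is absent is an illustrative choice rather than the extractor's documented behaviour.

```java
class CaseInsensitiveSearch {

    // Index of the first case-insensitive match of needle in haystack, or -1.
    // The index refers to haystack itself, so it is safe to use with substring().
    static int indexOfIgnoreCase(String haystack, String needle) {
        int last = haystack.length() - needle.length();
        for (int i = 0; i <= last; i++) {
            // regionMatches compares char by char and never changes lengths.
            if (haystack.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    // Text before the end tag, or the whole line if the tag is absent
    // (an assumption made for this sketch).
    static String textBefore(String line, String endTag) {
        int offset = indexOfIgnoreCase(line, endTag);
        return offset < 0 ? line : line.substring(0, offset);
    }

    public static void main(String[] args) {
        String line = "<p>\u00DF \uFB00 \uFB02</p></body>";
        System.out.println(textBefore(line, "</BODY>"));
    }
}
```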

      Reproduction: use the following as hippostd:content property of an HTML field and have it rendered by the <hst:html /> tag.

      <html><body>
      <p>8 ligature characters: ﬀ ß ﬀ ﬂ ß ﬀ ﬂ ß</p></body>
      </html>

      This results in a StringIndexOutOfBoundsException, because the index of </BODY> in the uppercased second line is applied to the original second line, which is shorter.
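      The failure can also be reproduced outside the CMS with plain string operations. The sketch below assumes that the ff and fl in the sample are the single-character ligatures U+FB00 and U+FB02, and it mirrors only the toUpperCase().indexOf() and substring() pattern described above, not the full extractor. The class and method names are hypothetical, and Locale.ROOT is used only to make the run deterministic.

```java
import java.util.Locale;

class UppercasedOffsetRepro {

    // Second line of the sample content, with ligatures written as escapes.
    static final String LINE =
        "<p>8 ligature characters: \uFB00 \u00DF \uFB00 \uFB02 \u00DF \uFB00 \uFB02 \u00DF</p></body>";

    // Offset of the end tag measured in the uppercased copy, as in the issue.
    static int offsetInUppercased(String line, String endTag) {
        return line.toUpperCase(Locale.ROOT).indexOf(endTag);
    }

    // True if applying that offset to the original line throws.
    static boolean substringFails(String line, String endTag) {
        try {
            line.substring(0, offsetInUppercased(line, endTag));
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("length of original line: " + LINE.length());
        System.out.println("offset from uppercased:  " + offsetInUppercased(LINE, "</BODY>"));
        System.out.println("substring fails:         " + substringFails(LINE, "</BODY>"));
    }
}
```

      The end tag </body> is 7 characters long, so the shifted offset only exceeds the length of the original line when uppercasing adds more than 7 characters in front of the tag. That is why the sample needs 8 expanding characters; with fewer, substring() would silently return too much text instead of throwing.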

      This is a deep corner case, since <html><body> is no longer used except in old content.

      [1] https://en.wikipedia.org/wiki/Typographic_ligature

          People

            Assignee: Unassigned
            Reporter: Jasper Floor (jfloor)