[Read Only] - Hippo Repository / REPO-661

Transitive dependency boilerpipe breaks nekohtml

Details

    Description

      The boilerpipe code includes a few forked nekohtml classes. These forks break the HTML diff functionality, which depends on the original nekohtml implementation.
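      One way to check whether a forked class is shadowing the original is to ask the class loader for every classpath entry that provides a given class name. If a forked copy shares the original's fully qualified name, more than one jar shows up and whichever comes first on the classpath wins. The Java sketch below uses the standard `ClassLoader.getResources` call; the class `DuplicateClassCheck` is hypothetical, and the default class name `org.cyberneko.html.HTMLConfiguration` is only an assumed example, since this issue does not say which nekohtml classes boilerpipe forks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/**
 * Diagnostic sketch (hypothetical helper): lists every classpath entry that
 * provides a given class, so duplicate copies of the same class name in
 * different jars become visible.
 */
public class DuplicateClassCheck {

    /** Returns every URL at which the loader can see the .class file for className. */
    public static List<URL> findClassLocations(String className, ClassLoader loader) {
        // "org.example.Foo" -> "org/example/Foo.class"
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed example class name; pass other fully qualified names as arguments instead.
        String[] names = args.length > 0 ? args : new String[] {"org.cyberneko.html.HTMLConfiguration"};
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : names) {
            List<URL> locations = findClassLocations(name, loader);
            if (locations.size() > 1) {
                // Same class name in several jars: the first one on the classpath shadows the rest.
                System.out.println("CONFLICT: " + name + " is provided by " + locations.size() + " entries:");
                for (URL url : locations) {
                    System.out.println("  " + url);
                }
            } else if (locations.isEmpty()) {
                System.out.println("not found: " + name);
            } else {
                System.out.println("ok: " + name + " -> " + locations.get(0));
            }
        }
    }
}
```

      Run it with the application's full classpath (for example, the same classpath used by the HTML diff code); a "CONFLICT" line for a nekohtml class would point at the jar that bundles the fork.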

      The boilerpipe code is only used for indexing (via tika-parsers), whereas nekohtml is also needed for HTML cleaning and diffing. It therefore seems safer to exclude boilerpipe from the dependencies.

      Testing showed that excluding boilerpipe does not degrade the index: HTML tags are still not indexed.

    People

      jsheriff Junaidh Kadhar Sheriff
      fvlankvelt Frank van Lankvelt (Inactive)
      Votes: 0
      Watchers: 3

    Dates

      Created:
      Updated:
      Resolved: