  1. Hippo CMS
  2. CMS-15623

Tika 2.4.1 throws an exception during the initialization of Jackrabbit

Details

    • Bug
    • Status: Closed
    • Normal
    • Resolution: Fixed
    • 13.4.27, 14.7.18, 15.4.0, saas
    • 13.4.28, 14.7.19, 15.5.0, 16.0.0, saas
    • jackrabbit
    • None
    • Pulsar
    • Undetermined

    Description

      In Tika 1.x, the default error-handling level is IGNORE: when an exception is handled, it is swallowed unless the code specifies otherwise.

      Tika 2.x changed the default level to THROW, so exceptions that were previously swallowed are now thrown.

      This change was made in TIKA-3304, but unfortunately it is documented neither in the release notes nor in the migration guide of Tika 2.0.0.

      The Tika setup in the hippo-repository-tika module (/community/repository/tika/src/main/resources/org/onehippo/repository/tika/tika-config.xml) uses an explicit set of Parser classes and excludes the rest. For this reason, many transitive Tika parser dependencies that hippo-repository-tika does not use are excluded in the brx configuration (/brxm/community/project/pom.xml).

      Excluding the dependencies can cause an error if one of the excluded parsers is used (e.g. MatParser for MATLAB operations). This never happens in hippo-repository-tika, because it never tries to initialize the excluded classes.

      The hippo-jackrabbit project has its own Tika setup. Instead of using a custom set of parsers, it initializes the DefaultParser, which in turn initializes all implemented parser classes (including MatParser). Some of these parsers cannot be initialized because their libraries are excluded from brx. Tika 1.x handles this by NOT throwing an exception when the initialization of a parser fails. Tika 2.x changed this behavior and throws an exception during the initialization.
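
      The difference between the two levels can be sketched with a small stand-alone model. This is not Tika code: the LoadPolicyDemo class, its Policy enum and the com.example.MissingParser class name are hypothetical and only illustrate how a loader that instantiates every listed class behaves when one of them is missing from the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of a parser loader; none of these types are Tika classes.
class LoadPolicyDemo {

    /** Hypothetical stand-in for the IGNORE and THROW error-handling levels. */
    enum Policy { IGNORE, THROW }

    /** Instantiates each named class, applying the policy when one cannot be loaded. */
    static List<Object> load(List<String> classNames, Policy policy) {
        List<Object> loaded = new ArrayList<>();
        for (String name : classNames) {
            try {
                loaded.add(Class.forName(name).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | LinkageError e) {
                if (policy == Policy.THROW) {
                    throw new IllegalStateException("Unable to load " + name, e);
                }
                // IGNORE: skip the unavailable class and continue with the rest.
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        // "com.example.MissingParser" is a made-up name standing in for an excluded parser.
        List<String> names = Arrays.asList("java.lang.StringBuilder", "com.example.MissingParser");
        System.out.println(load(names, Policy.IGNORE).size()); // prints 1
        try {
            load(names, Policy.THROW);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Unable to load com.example.MissingParser"
        }
    }
}
```

      With IGNORE the loader returns whatever it could instantiate; with THROW the first missing class aborts the whole load, which corresponds to the behavior described for Tika 2.x.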

      When initialization fails, Tika stops initializing, but Jackrabbit handles this gracefully by falling back to the default Tika config. As a result, the exception does not break any feature; it only logs an error when the brx server is started.
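
      The fallback described above follows a common pattern, sketched here in isolation. The ConfigFallbackDemo class and its loadOrFallback method are hypothetical names, not Jackrabbit or Tika APIs, and the strings stand in for real configuration objects.

```java
import java.util.function.Supplier;

// Simplified model of a config fallback; these are not Jackrabbit or Tika types.
class ConfigFallbackDemo {

    /** Returns the primary value, or the fallback if building the primary one fails. */
    static <T> T loadOrFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            // Log the failure and carry on; the log line is the only visible symptom.
            System.err.println("ERROR: " + e.getMessage() + "; using default config");
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        String config = loadOrFallback(
                () -> { throw new IllegalStateException("Unable to load parser"); },
                () -> "default-config");
        System.out.println(config); // prints "default-config"
    }
}
```

      Because the failure is caught and replaced by a working default, the only trace left is the logged error, which matches the symptom reported at server startup.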

      To fix this problem, error handling should be set back to IGNORE in hippo-jackrabbit.

            People

              firat Fırat Küçük (Inactive)
              ekarakus Erdem Karakus