Description
In Tika 1.x the default error handling level is IGNORE: when an exception is caught, it is ignored unless the code specifies otherwise.
In Tika 2.x the default has been changed to THROW, so the previously swallowed exceptions are now thrown.
This change was made in TIKA-3304, but unfortunately it is documented neither in the release notes nor in the migration guide for Tika 2.0.0.
The Tika setup in the hippo-repository-tika module (/community/repository/tika/src/main/resources/org/onehippo/repository/tika/tika-config.xml) uses a fixed set of Parser classes and excludes the rest. For this reason, many transitive Tika parser dependencies that hippo-repository-tika does not use are excluded in the brx configuration (/brxm/community/project/pom.xml).
Excluding these dependencies can cause an error if one of the excluded parsers is used (e.g. MatParser for MATLAB operations). This never happens in hippo-repository-tika, since it never tries to initialize the excluded classes.
The hippo-jackrabbit project has its own Tika setup. Instead of using a custom set of parsers, it initializes the DefaultParser, which in turn initializes all implemented parser classes (including MatParser). Some of these parsers cannot be initialized because their libraries are excluded from brx. Tika 1.x tolerates this by not throwing an exception when a parser fails to initialize, but Tika 2.x throws an exception during initialization.
When initialization fails, Tika stops initializing, but Jackrabbit recovers by falling back to the default Tika config. As a result, the exception does not break any feature; it only logs an error when the brx server starts.
To fix this problem, error handling should be set back to IGNORE in hippo-jackrabbit.
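The difference between the two levels can be illustrated with a small self-contained Java sketch. It is a toy model and does not use Tika's classes: `LoadPolicySketch`, `Policy`, `loadAll`, and `sampleFactories` are hypothetical names introduced here, and the missing MatParser dependency is simulated with a `NoClassDefFoundError`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Toy model only: none of these types are Tika classes.
class LoadPolicySketch {

    // Hypothetical stand-in for the IGNORE and THROW error handling levels.
    enum Policy { IGNORE, THROW }

    // Instantiates every parser in order and returns the names that loaded.
    static List<String> loadAll(Map<String, Supplier<Object>> factories, Policy policy) {
        List<String> loaded = new ArrayList<>();
        for (Map.Entry<String, Supplier<Object>> entry : factories.entrySet()) {
            try {
                entry.getValue().get();
                loaded.add(entry.getKey());
            } catch (LinkageError e) {
                if (policy == Policy.THROW) {
                    // 2.x-style default: the first failure aborts the whole initialization.
                    throw new IllegalStateException("Unable to load " + entry.getKey(), e);
                }
                // 1.x-style default: swallow the failure and continue with the next parser.
            }
        }
        return loaded;
    }

    // Illustrative parser list; MatParser simulates a parser whose library was excluded.
    static Map<String, Supplier<Object>> sampleFactories() {
        Map<String, Supplier<Object>> factories = new LinkedHashMap<>();
        factories.put("TXTParser", Object::new);
        factories.put("MatParser", () -> {
            throw new NoClassDefFoundError("simulated missing dependency");
        });
        factories.put("PDFParser", Object::new);
        return factories;
    }

    public static void main(String[] args) {
        System.out.println(loadAll(sampleFactories(), Policy.IGNORE));
        try {
            loadAll(sampleFactories(), Policy.THROW);
        } catch (IllegalStateException e) {
            System.out.println("THROW aborted: " + e.getMessage());
        }
    }
}
```

Under IGNORE the loader skips the parser that cannot be constructed and keeps the rest, which matches the 1.x behavior described above. Under THROW the same missing dependency aborts the whole load, which is what hippo-jackrabbit runs into with DefaultParser on Tika 2.x.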
Issue Links
- is a result of CMS-15592 Upgrade to jackrabbit 2.21.19 (Closed)