Description
Changing a binary from PDF to a different type (such as ZIP) does not change or remove the originally extracted hippo:text value.
When you replace a PDF with another PDF, the hippo:text property is replaced. Even when an error occurs while parsing the new PDF, the hippo:text property is emptied (see also CMS-6495).
However, when you replace the PDF with another file type such as a ZIP, the hippo:text property is not removed. As a result, the hippo:text property holds a stale value based on the original document. (When you create a ZIP document from scratch, no hippo:text property is created at all.)
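The expected behaviour described above can be sketched as follows. This is a minimal model, not the CMS implementation: the resource node is represented as a plain dict, and `update_resource` and `fake_extract_text` are hypothetical names introduced only for illustration.

```python
PDF_MIME_TYPE = "application/pdf"


def fake_extract_text(data: bytes) -> bytes:
    """Hypothetical stand-in for PDF text extraction.

    Returns empty bytes when the data does not look like a PDF,
    mimicking the "emptied on parse error" behaviour described above.
    """
    if not data.startswith(b"%PDF"):
        return b""
    return data[len(b"%PDF"):].strip()


def update_resource(node: dict, data: bytes, mime_type: str) -> dict:
    """Replace the binary on a resource node modelled as a plain dict."""
    node["jcr:data"] = data
    node["jcr:mimeType"] = mime_type
    if mime_type == PDF_MIME_TYPE:
        # PDF: always overwrite hippo:text, even if extraction yields nothing.
        node["hippo:text"] = fake_extract_text(data)
    else:
        # Non-PDF: drop any text left over from a previous PDF upload.
        node.pop("hippo:text", None)
    return node


# Upload a PDF, then replace it with a ZIP.
asset = update_resource({}, b"%PDF hello world", "application/pdf")
print(sorted(asset))
update_resource(asset, b"PK\x03\x04", "application/zip")
print(sorted(asset))
```

The reported bug corresponds to the `else` branch being absent: the new data and MIME type are stored, but the old hippo:text is left in place.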
This can also be reproduced on the online demo:
- Create new PDF
  - go to Assets
  - add file
  - upload a PDF
The result from the console will be something like:
hippo:text Binary download (12 bytes) | Upload binary
jcr:data Binary download (6 KB) | Upload binary
jcr:lastModified Date 2015-09-30T17:08:35.995+02:00
jcr:mimeType String application/pdf
- Edit PDF and upload ZIP
  - edit the file
  - upload a ZIP
  - save and close
The result from the console will be something like:
hippo:filename String testpdf.pdf.zip
hippo:text Binary download (12 bytes) | Upload binary
jcr:data Binary download (6 KB) | Upload binary
jcr:lastModified Date 2015-09-30T17:13:03.845+02:00
jcr:mimeType String application/zip
Notice that the hippo:text property is unchanged (still 12 bytes), even though jcr:mimeType is now application/zip.
- Create ZIP (as comparison)
  - go to Assets
  - add file
  - upload a ZIP
The result from the console will be something like:
jcr:data Binary download (6 KB) | Upload binary
jcr:lastModified Date 2015-09-30T17:15:24.494+02:00
jcr:mimeType String application/zip
Notice there is no hippo:text property.
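The three console results can be compared with a small consistency check. This is only a sketch: it assumes that extracted text is expected for application/pdf only, as the steps above suggest, and `has_stale_text` is a hypothetical helper. Binary values are shown as placeholder strings.

```python
def has_stale_text(properties: dict) -> bool:
    """True when hippo:text exists although the binary is not a PDF."""
    return (
        "hippo:text" in properties
        and properties.get("jcr:mimeType") != "application/pdf"
    )


# Property sets as observed in the console (binary values are placeholders).
new_pdf = {
    "hippo:text": "<12 bytes>",
    "jcr:data": "<6 KB>",
    "jcr:mimeType": "application/pdf",
}
pdf_replaced_by_zip = {
    "hippo:filename": "testpdf.pdf.zip",
    "hippo:text": "<12 bytes>",
    "jcr:data": "<6 KB>",
    "jcr:mimeType": "application/zip",
}
new_zip = {
    "jcr:data": "<6 KB>",
    "jcr:mimeType": "application/zip",
}

for label, props in [
    ("new PDF", new_pdf),
    ("PDF replaced by ZIP", pdf_replaced_by_zip),
    ("new ZIP", new_zip),
]:
    print(label, has_stale_text(props))
```

Only the second case, a PDF replaced by a ZIP, is flagged; that is the state this issue describes.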
Issue Links
- is backported by
  - CMS-9624 [10.0] The original extracted hippo:text remains when changing a binary (PDF) to another type (like ZIP) (Closed)