Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
None
None
0.25
Quasar
Sprint: Puma Sprint 228-SaaS BrX v0.2, Puma Sprint 229, Puma Sprint 230, Puma Sprint 231, Puma Sprint 233
Description
We've noticed an issue where documents appear twice in the Lucene index.
Info so far:
- seems unrelated to the Lucene export
- appeared on a cluster node that created a fresh index
A query through the /repository server returns the document twice. Browsing to the parent node shows the document only once, so the problem appears to be confined to the Lucene index.
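To confirm the symptom from the query side, one option is to group the hits by node identifier and flag any identifier that occurs more than once. This is a hypothetical sketch: `DuplicateHitCheck` and `findDuplicateIds` are illustrative names, and query hits are modeled as plain identifier strings rather than real query-result objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: given the node identifiers returned by a query
 * (one entry per index hit), report the identifiers that occur more than once.
 */
public class DuplicateHitCheck {

    static List<String> findDuplicateIds(List<String> hitIds) {
        // Count hits per identifier, keeping first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String id : hitIds) {
            counts.merge(id, 1, Integer::sum);
        }
        // Collect identifiers that were returned more than once.
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                duplicates.add(e.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("uuid-1", "uuid-2", "uuid-1");
        System.out.println(findDuplicateIds(hits)); // prints [uuid-1]
    }
}
```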
Looking at the logs, the following happened:
- Node A starts creating a new index at time X.
- At time X + 1, an editor on Node B creates a new document and publishes it.
- At time X + 2, Node A, still indexing, indexes the document.
- At time X + 3, Node A encounters the document while still in the first phase of indexing (crawling the repository).
- Node A finishes the first phase and checks the repository_journal for all recently changed or added records.
- Node A indexes the document 'again', creating a duplicate.
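The timeline can be reduced to a toy model that shows why two independent writes of the same node produce two index entries. This is a simplified, hypothetical sketch (`DuplicateIndexRace`, `runTimeline` and the document IDs are made up): the index is an append-only list, and neither the crawl nor the journal replay checks whether the node is already indexed. It does not reflect how the real indexer is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of the timeline above. The "index" is an
 * append-only list of node identifiers; neither phase checks whether an
 * identifier is already present.
 */
public class DuplicateIndexRace {

    static List<String> runTimeline() {
        List<String> index = new ArrayList<>();
        List<String> journal = new ArrayList<>();

        // X: Node A starts building a fresh index (phase 1: crawl).
        index.add("doc-existing");

        // X + 1: on Node B an editor creates and publishes "doc-new";
        // the change is recorded in the repository journal.
        journal.add("doc-new");

        // X + 2 / X + 3: Node A's crawl is still running and indexes "doc-new".
        index.add("doc-new");

        // Phase 2: Node A replays the journal entries recorded during the crawl
        // and blindly appends them, so "doc-new" is indexed a second time.
        for (String changedId : journal) {
            index.add(changedId);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(runTimeline()); // prints [doc-existing, doc-new, doc-new]
    }
}
```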
According to Ard and Ate, there is logic in place that should prevent this, so we are probably hitting a corner case or race condition here.
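We do not know how the existing guard mentioned above is implemented. One common way to make a journal replay idempotent is to key each index write on the node identifier, so that a second write replaces the first instead of adding to it; Lucene's `IndexWriter.updateDocument(Term, ...)` provides this delete-then-add behavior. The sketch below models that idea with a map instead of a real Lucene index; `IdempotentReindex` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: an index keyed by node identifier, so that writing the
 * same node twice replaces the earlier entry instead of adding a second one.
 * This models delete-then-add (upsert) semantics with a plain map.
 */
public class IdempotentReindex {

    private final Map<String, String> entriesById = new LinkedHashMap<>();

    // Upsert: any previous entry for nodeId is replaced, never duplicated.
    void upsert(String nodeId, String content) {
        entriesById.put(nodeId, content);
    }

    int size() {
        return entriesById.size();
    }

    List<String> ids() {
        return new ArrayList<>(entriesById.keySet());
    }

    public static void main(String[] args) {
        IdempotentReindex index = new IdempotentReindex();
        // The crawl phase indexes the new document once...
        index.upsert("doc-existing", "v1");
        index.upsert("doc-new", "v1");
        // ...and the journal replay writes it again, replacing the entry.
        index.upsert("doc-new", "v1");
        System.out.println(index.ids()); // prints [doc-existing, doc-new]
    }
}
```

If every write path were keyed like this, the crawl and the replay could both touch the document and still leave a single entry; a duplicate would then suggest that at least one of the two writes is not keyed on the identifier.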