  1. Hippo CMS
  2. CMS-12771

Sometimes a document appears twice in the search results

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • None
    • 13.4.3, 14.2.0
    • repository
    • None
    • 0.25
    • Quasar
    • Puma Sprint 228-SaaS BrX v0.2, Puma Sprint 229, Puma Sprint 230, Puma Sprint 231, Puma Sprint 233

    Description

      We've noticed an issue where a document appears twice in the Lucene index.
      Info so far:

      • It seems unrelated to the Lucene export.
      • It appeared on a cluster node that had created a fresh index.

      In the /repository server, if you run a query, the document shows up twice. If you go to the parent node, the document is not listed twice, so the issue seems to be limited to the Lucene index.

      Looking at the logs, the following happened:
      Node A started creating a new index at time X.
      On Node B, an editor created a new document at time X + 1 and published it.
      Node A was still indexing and indexed the document at time X + 2.
      Node A encountered the document at time X + 3 while still in the first part of indexing (crawling the repository).
      Node A finished the first part of indexing and checked the repository_journal for all recently changed or added records.
      Node A indexed the document again, which created the duplicate.

      According to Ard and Ate there is logic in place that should prevent this from happening, but we are probably hitting a corner case / race condition here.
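
To make the suspected sequence concrete, here is a minimal, self-contained sketch of how a crawl followed by a journal replay could index the same node twice. It is a toy model, not the repository's indexing code and not the fix that was applied: `ToyIndex`, `replayWithAdd` and `replayWithUpdate` are hypothetical names, and the "index" is just an in-memory list of node ids. It assumes that indexing is an append-only add, so a second pass over the same node adds a second entry, and contrasts that with a delete-then-add keyed on the node id.

```java
import java.util.ArrayList;
import java.util.List;

class DuplicateIndexSketch {

    /** Toy stand-in for a search index: one list entry per indexed document. */
    static final class ToyIndex {
        private final List<String> entries = new ArrayList<>();

        /** Append-only add: indexing the same node twice yields two entries. */
        void add(String nodeId) {
            entries.add(nodeId);
        }

        /** Delete-then-add keyed on the node id: re-indexing is idempotent. */
        void update(String nodeId) {
            entries.removeIf(nodeId::equals);
            entries.add(nodeId);
        }

        /** Number of index entries for the given node id. */
        long hits(String nodeId) {
            return entries.stream().filter(nodeId::equals).count();
        }
    }

    /** Crawl indexes the node, then the journal replay adds it again. */
    static long replayWithAdd(String nodeId) {
        ToyIndex index = new ToyIndex();
        index.add(nodeId); // first part of indexing: repository crawl
        index.add(nodeId); // second part: journal replay of the same change
        return index.hits(nodeId);
    }

    /** Same sequence, but the journal replay replaces any existing entry. */
    static long replayWithUpdate(String nodeId) {
        ToyIndex index = new ToyIndex();
        index.add(nodeId);    // first part of indexing: repository crawl
        index.update(nodeId); // second part: journal replay, keyed on node id
        return index.hits(nodeId);
    }

    public static void main(String[] args) {
        System.out.println("add then add:    " + replayWithAdd("doc-1"));    // prints 2
        System.out.println("add then update: " + replayWithUpdate("doc-1")); // prints 1
    }
}
```

In this model the duplicate disappears as soon as the replay step is idempotent per node id. Whether the real race is in the replay step, in the journal revision that the new index starts from, or elsewhere is not established by the logs quoted above.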

            People

              Unassigned
              nout Niels Out
              Votes: 0
              Watchers: 8
