  [Read Only] - Hippo Repository / REPO-1529

Inefficient and error-prone (with respect to OOM) startup of cluster nodes that have a revision id but no Lucene index yet


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Top
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Story Points:
      2
    • Processed by team:
      Pulsar
    • Sprint:
      Platform Sprint 135, Platform Sprint 136

      Description

      The problem is largely described at

      https://issues.apache.org/jira/browse/JCR-3905

      That issue, in turn, is partly caused by https://issues.apache.org/jira/browse/JCR-3162

      This issue is also related to REPO-1353. REPO-1353 implemented a fix to avoid an OOM when a lot of revisions need to be processed; however, it is still very easy to get an OOM when starting a cluster node that has no index (yet) but already has a data revision id. Note that if the cluster node is completely new (that is, the REPOSITORY_LOCAL_REVISIONS table does not yet contain the JOURNAL_ID of the new cluster node), the problem does not occur, because the value from REPOSITORY_GLOBAL_REVISION is used instead.
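      A minimal sketch of the start-revision selection described above, to make the two cases concrete. The class and method names (StartupRevision, startRevision, pendingRevisions) are hypothetical, and in-memory values stand in for the REPOSITORY_LOCAL_REVISIONS and REPOSITORY_GLOBAL_REVISION tables; this is not the Jackrabbit DatabaseJournal code. Treating the difference between two revision ids as the number of pending records is also a simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the two revision tables; not Jackrabbit code.
final class StartupRevision {

    // Stand-in for REPOSITORY_LOCAL_REVISIONS: JOURNAL_ID -> last revision processed by that node.
    private final Map<String, Long> localRevisions = new HashMap<>();
    // Stand-in for REPOSITORY_GLOBAL_REVISION: the newest revision written by any node.
    private final long globalRevision;

    StartupRevision(long globalRevision) {
        this.globalRevision = globalRevision;
    }

    void recordLocalRevision(String journalId, long revision) {
        localRevisions.put(journalId, revision);
    }

    // Revision from which the node replays journal records on startup:
    // the stored local revision if a row for the JOURNAL_ID exists, otherwise the global one.
    long startRevision(String journalId) {
        Long local = localRevisions.get(journalId);
        return local != null ? local : globalRevision;
    }

    // Simplification: treats the revision gap as the number of records still to process.
    long pendingRevisions(String journalId) {
        return globalRevision - startRevision(journalId);
    }

    public static void main(String[] args) {
        StartupRevision tables = new StartupRevision(1_000_000L);
        tables.recordLocalRevision("node-2", 10L); // existing node whose index was removed
        System.out.println(tables.pendingRevisions("node-2")); // 999990
        System.out.println(tables.pendingRevisions("node-3")); // 0: completely new node
    }
}
```

      In this model a completely new node starts at the global revision and has nothing to replay, while a node with an old local revision and no index has to process everything in between.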

      On startup, a cluster node without an index goes through the following steps:

      1) org.apache.jackrabbit.core.RepositoryImpl#initStartupWorkspaces leads to org.apache.jackrabbit.core.query.lucene.MultiIndex#createInitialIndex, which creates the initial index by traversing all child nodes below the root node.
      2) Right after createInitialIndex, SearchIndex calls checkPendingJournalChanges(context):

      index.createInitialIndex(context.getItemStateManager(),
              context.getRootId(), rootPath);
      checkPendingJournalChanges(context);

      checkPendingJournalChanges fetches the List<ChangeLogRecord> of all journal records newer than the cluster node's local revision. That revision can lag far behind, so the list can become very large and cause an OOM. If it does not cause an OOM, the method goes on to remove from the newly created index every document referenced by the ChangeLogRecords.
      3) After this, the RepositoryImpl constructor (org.apache.jackrabbit.core.RepositoryImpl#RepositoryImpl) starts the cluster node (org.apache.jackrabbit.core.cluster.ClusterNode#start). Starting the cluster node triggers org.apache.jackrabbit.core.journal.AbstractJournal#sync with startup = true, which processes the same pending journal changes again; this time the documents that were removed from the index in step 2 are added back.

      Sigh.....
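      The following hypothetical simulation models steps 1-3 as reported in this issue; it does not use or reproduce Jackrabbit's actual classes. A set of node ids stands in for the Lucene index and a simplified ChangeRecord stands in for ChangeLogRecord. It shows that all pending records are held in one list, and that every document removed in step 2 is added back in step 3, so the index ends up exactly where createInitialIndex left it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of steps 1-3 as reported in this issue;
// it does not use or reproduce Jackrabbit's actual classes.
final class StartupSimulation {

    // Simplified stand-in for a ChangeLogRecord: one journal revision touching one node.
    static final class ChangeRecord {
        final long revision;
        final String nodeId;

        ChangeRecord(long revision, String nodeId) {
            this.revision = revision;
            this.nodeId = nodeId;
        }
    }

    int peakRecordsInMemory;
    int documentsRemoved;
    int documentsReAdded;

    Set<String> start(Set<String> repositoryNodes, List<ChangeRecord> journal, long localRevision) {
        // Step 1: createInitialIndex traverses the repository, so the new index is already current.
        Set<String> index = new HashSet<>(repositoryNodes);

        // Step 2: checkPendingJournalChanges collects every record newer than the
        // (possibly stale) local revision into one list; this is the list that can cause the OOM.
        List<ChangeRecord> pending = new ArrayList<>();
        for (ChangeRecord r : journal) {
            if (r.revision > localRevision) {
                pending.add(r);
            }
        }
        peakRecordsInMemory = pending.size();
        for (ChangeRecord r : pending) {
            if (index.remove(r.nodeId)) {
                documentsRemoved++;
            }
        }

        // Step 3: ClusterNode#start -> AbstractJournal#sync(startup = true) replays the
        // same records and adds the documents back.
        for (ChangeRecord r : pending) {
            if (repositoryNodes.contains(r.nodeId) && index.add(r.nodeId)) {
                documentsReAdded++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Set<String> nodes = Set.of("a", "b", "c");
        List<ChangeRecord> journal = List.of(
                new ChangeRecord(11, "a"),
                new ChangeRecord(12, "b"),
                new ChangeRecord(13, "a"));
        StartupSimulation sim = new StartupSimulation();
        Set<String> index = sim.start(nodes, journal, 10);
        System.out.println(index.equals(nodes)); // true
        System.out.println(sim.peakRecordsInMemory + " " + sim.documentsRemoved + " " + sim.documentsReAdded); // 3 2 2
    }
}
```

      With three pending records touching nodes a and b, the simulation holds 3 records in memory, removes 2 documents and re-adds the same 2. In this model both the list size and the wasted remove/re-add work grow with how far the local revision lags behind.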

      Obviously, the fix should be fairly simple: when a new index is created, start from the REPOSITORY_GLOBAL_REVISION!
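      A minimal sketch of the proposed fix, assuming the search index can report whether it was just built by createInitialIndex. The class name StartRevisionFix, the method startRevision and its indexNewlyCreated flag are hypothetical and do not correspond to an existing Jackrabbit API.

```java
// Hypothetical sketch of the proposed fix; names are illustrative, not Jackrabbit API.
final class StartRevisionFix {

    /**
     * @param indexNewlyCreated true if createInitialIndex has just built the index from the repository
     * @param localRevision     stored local revision for this JOURNAL_ID, or null if there is no row
     * @param globalRevision    value of REPOSITORY_GLOBAL_REVISION
     */
    static long startRevision(boolean indexNewlyCreated, Long localRevision, long globalRevision) {
        if (indexNewlyCreated || localRevision == null) {
            // A freshly built index was created from the current repository content,
            // so the older journal records are not replayed: start at the global revision.
            return globalRevision;
        }
        // Existing index: catch up from where this node left off.
        return localRevision;
    }

    public static void main(String[] args) {
        System.out.println(startRevision(true, 10L, 1_000_000L));   // 1000000
        System.out.println(startRevision(false, 10L, 1_000_000L));  // 10
        System.out.println(startRevision(false, null, 1_000_000L)); // 1000000
    }
}
```

      Skipping the replay for a newly created index avoids both the large List<ChangeLogRecord> in step 2 and the remove/re-add round trip in step 3. A real implementation may also need to read the global revision no later than the start of the traversal, so that changes committed while the index is being built are still replayed.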

              People

              Assignee:
              Unassigned
              Reporter:
              aschrijvers Ard Schrijvers
              Votes:
              0
              Watchers:
              2
