Details
-
Bug
-
Status: Closed
-
Top
-
Resolution: Fixed
-
2
-
Platform Sprint 135, Platform Sprint 136
Description
The problem is largely described at https://issues.apache.org/jira/browse/JCR-3905
That issue is in turn partly a result of https://issues.apache.org/jira/browse/JCR-3162
This issue is also related to REPO-1353. REPO-1353 implemented a fix to avoid an OOM when a lot of revisions need to be processed. However, it is still very easy to get an OOM when starting a cluster node that has no index (yet) but does already have a data revision id! Note that if the cluster node is completely new (that is, the REPOSITORY_LOCAL_REVISIONS table does not have the JOURNAL_ID for the new cluster node), the problem is not present, because the value from REPOSITORY_GLOBAL_REVISION is then used.
What happens on a startup of a cluster node without index is the following:
1) org.apache.jackrabbit.core.RepositoryImpl#initStartupWorkspaces leads to org.apache.jackrabbit.core.query.lucene.MultiIndex#createInitialIndex, which creates the initial index by traversing all child nodes below the 'root node'.
2) After createInitialIndex, checkPendingJournalChanges(context) is processed in SearchIndex:
index.createInitialIndex(context.getItemStateManager(), context.getRootId(), rootPath);
checkPendingJournalChanges(context);
checkPendingJournalChanges fetches the List<ChangeLogRecord> starting from the revision of the cluster node. This revision can lag far behind, so the List<ChangeLogRecord> can be very large and cause an OOM. If it does not cause an OOM, it goes on to remove from the newly created index every doc for each ChangeLogRecord.
3) After this, the cluster node start (org.apache.jackrabbit.core.cluster.ClusterNode#start) is invoked in org.apache.jackrabbit.core.RepositoryImpl#RepositoryImpl. This start results in an org.apache.jackrabbit.core.journal.AbstractJournal#sync with startup = true, which processes the same pending journal changes again: the docs that were removed from the index in (2) are now added again.
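The three steps above can be illustrated with a toy model. This is only a sketch under stated assumptions: StartupReplaySketch, Change, pendingChanges and simulateStartup are hypothetical names, not Jackrabbit classes or methods, and the index is modelled as a plain set of node ids. It shows that the backlog size depends on how far the local revision lags, and that the backlog is walked twice for no net change to a freshly built index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model of the startup sequence; none of these types are Jackrabbit classes. */
class StartupReplaySketch {

    /** Hypothetical stand-in for one journal record: a revision and the node it touched. */
    static final class Change {
        final long revision;
        final String nodeId;

        Change(long revision, String nodeId) {
            this.revision = revision;
            this.nodeId = nodeId;
        }
    }

    /** Collects every journal record newer than localRevision into one in-memory list. */
    static List<Change> pendingChanges(List<Change> journal, long localRevision) {
        List<Change> pending = new ArrayList<>();
        for (Change c : journal) {
            if (c.revision > localRevision) {
                pending.add(c);
            }
        }
        return pending;
    }

    /**
     * Simulates steps (2) and (3) on an index that step (1) has already built.
     * Returns {number of docs removed in step 2, number of docs re-added in step 3}.
     */
    static int[] simulateStartup(List<Change> journal, Set<String> index, long localRevision) {
        // Step 2: the whole backlog is loaded at once, then each touched doc is removed.
        int removed = 0;
        for (Change c : pendingChanges(journal, localRevision)) {
            if (index.remove(c.nodeId)) {
                removed++;
            }
        }
        // Step 3: the startup sync walks the same backlog again and adds the docs back.
        int added = 0;
        for (Change c : pendingChanges(journal, localRevision)) {
            if (index.add(c.nodeId)) {
                added++;
            }
        }
        return new int[] {removed, added};
    }
}
```

In this model the index ends up exactly as the initial traversal left it, so the memory and time spent on the backlog buy nothing.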
Sigh.....
Obviously, the fix should be fairly simple: When a new index is created, use the REPOSITORY_GLOBAL_REVISION to start with!
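A minimal sketch of that idea, assuming the start revision can be chosen in one place. StartRevisionSketch and startRevision are hypothetical names and not part of Jackrabbit; the parameters stand in for "does an index exist", the cluster node's entry in REPOSITORY_LOCAL_REVISIONS (absent for a completely new node) and the value from REPOSITORY_GLOBAL_REVISION. The actual change made for this issue may look different.

```java
import java.util.OptionalLong;

/** Hypothetical helper; it only illustrates the proposed selection rule. */
class StartRevisionSketch {

    /**
     * Picks the revision from which journal changes are processed at startup.
     *
     * @param indexExists    false when the index is about to be created from scratch
     * @param localRevision  the node's stored local revision, empty for a brand-new node
     * @param globalRevision the current global revision
     */
    static long startRevision(boolean indexExists, OptionalLong localRevision, long globalRevision) {
        if (!indexExists) {
            // Proposed rule: a freshly built index already reflects current content,
            // so there is no backlog to replay, however far the local revision lags.
            return globalRevision;
        }
        // Behaviour described above: use the local revision when there is one,
        // otherwise fall back to the global revision.
        return localRevision.orElse(globalRevision);
    }
}
```

For example, startRevision(false, OptionalLong.of(10), 5000) yields 5000, so a node with a stale local revision and no index would have nothing to fetch.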
Issue Links
- causes
  - REPO-1536 Drop patch for REPO-1408 which is obsolete since REPO-1529 (Closed)
- relates to
  - REPO-1353 Out of memory error when adding a new cluster node due to large number of revisions to process (Closed)
  - REPO-1407 Don't read full journal_table when starting with clean index (Closed)
  - REPO-1408 Register new cluster node as up to date with the current global revision (Closed)
  - REPO-1541 Release and deploy hippo jackrabbit patched version 2.10.1-h13 (Closed)