
Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 14.3.0, 14.3.1, 14.3.2
Fix Version/s: 14.2.3, 14.3.3
Component/s: None
Labels:
- kpi-xm-new-feature

Processed by team: Pulsar
Sprint: Pulsar 245 - Eng OKRs

Description

Release v14.3.0+ introduces a new hippo:identifiable mixin and a hippo:identifier property with a dynamic default value, see: ~~CMS-13587~~.
Supporting this custom extension on JCR requires an updated jackrabbit version, (hippo-)jackrabbit 2.18.5-h3, which is bundled with release v14.3.0.

However, during an upgrade of a running cluster, other cluster instances that are still on an older version may fail to process the related node type changes.

This overlooked problem is expected to be temporary (it should be resolved once all cluster instances have been upgraded), but in the meantime the sites running on the not-yet-upgraded cluster instances may be unable to serve requests.

We are looking into a possible remedy, but for now we strongly advise not to upgrade to v14.3.2 (or an earlier 14.3.x release) until this has been resolved.

2020-10-13: Root cause analysis and resolution

The problem turned out to be a bug (~~CMS-14076~~) which caused a running old instance to persist its failed-to-update (~~CMS-13707~~) node type definitions back into the database.
This effectively 'rolled back' (some of) the intended node type changes needed for v14.3.x, which then caused site rendering to fail, even after the deployment had completed and the cluster had been restarted!

This problem is only triggered during a rolling upgrade deployment, when old instances that are still running receive the cluster synchronization event with the node type changes and then process it incorrectly, as described above.

With a stop/start cluster deployment, all instances will already have the node type changes when they start, so there is no problem processing them.
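The following Java sketch makes the failure mode concrete. It models a single rolling-upgrade step in which one upgraded instance and one old instance share a database. This is a hypothetical, simplified model. The `Instance` class, the `example:type` node type and the string "definitions" are illustrative assumptions, and only the method name `externalRegistered` is borrowed from CMS-14076. It is not the actual Jackrabbit or Hippo implementation. The `suppressExternalPersist` flag stands in for the CMS-14076 fix, and the model does not cover the CMS-13707 part.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the CMS-14076 rollback; not the real Hippo/Jackrabbit API. */
public class NodeTypeSyncSketch {

    /** One cluster instance with its own in-memory view of the node type definitions. */
    static final class Instance {
        private final Map<String, String> sharedDb;    // database shared by the whole cluster
        private final Map<String, String> local = new HashMap<>();
        private final boolean upgraded;                // can this instance apply the new definitions?
        private final boolean suppressExternalPersist; // stands in for the CMS-14076 fix

        Instance(Map<String, String> sharedDb, boolean upgraded, boolean suppressExternalPersist) {
            this.sharedDb = sharedDb;
            this.upgraded = upgraded;
            this.suppressExternalPersist = suppressExternalPersist;
            local.putAll(sharedDb); // start from what is currently persisted
        }

        /** A change made on this instance itself: apply it locally and persist it. */
        void register(String type, String definition) {
            local.put(type, definition);
            sharedDb.put(type, definition);
        }

        /** A change received from another instance through a cluster event. */
        void externalRegistered(String type, String definition) {
            if (upgraded) {
                local.put(type, definition);
            } // an old instance fails to apply it and keeps its stale definition
            if (!suppressExternalPersist) {
                // Buggy behaviour: the (possibly stale) local view is written back,
                // overwriting the change the upgraded instance just persisted.
                sharedDb.put(type, local.get(type));
            }
        }
    }

    /** Upgraded instance registers a change; a still-running old instance receives the event. */
    static String simulate(boolean oldInstanceHasFix) {
        Map<String, String> sharedDb = new HashMap<>();
        sharedDb.put("example:type", "old-definition");
        Instance upgradedInstance = new Instance(sharedDb, true, true);
        Instance oldInstance = new Instance(sharedDb, false, oldInstanceHasFix);
        upgradedInstance.register("example:type", "new-definition");
        oldInstance.externalRegistered("example:type", "new-definition");
        return sharedDb.get("example:type");
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + simulate(false)); // old-definition (rolled back)
        System.out.println("with fix:    " + simulate(true));  // new-definition
    }
}
```

Without the suppression, the old instance writes its stale definition back and the upgraded instance's change is lost. This matches the 'roll back' described above. With the suppression, the persisted definition stays intact. In a stop/start deployment no old instance is running to receive the event, so `externalRegistered` is never reached on a stale instance.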

With only the ~~CMS-14076~~ bug fixed, a rolling upgrade would still cause intermittent errors on the old instances that are still running, but these would resolve automatically, and the instances would be stable again, as soon as those instances are upgraded as well.
However, because we already needed to backport the ~~CMS-14076~~ fix, we also backported the changes for ~~CMS-13707~~, including the newer jackrabbit 2.18.5-h3, which resolves even those intermittent errors.

To support a rolling upgrade to v14.3.x (or later), we will therefore provide an intermediate patch release v14.2.3 containing these two fixes.

The ~~CMS-14076~~ fix, together with some other important fixes and improvements, will also be bundled in a new v14.3.3 release.

The recommended upgrade path from v14.x to v14.3.x or later therefore is:

Upgrade to the latest v14.3.3 or later
Either:
- use a cluster stop/start deploy, possibly combined with a blue/green deployment: this does not (and did not) have this issue at all
or:
1. first do an intermediate rolling deploy to v14.2.3
2. wait until all instances have been upgraded
3. immediately thereafter do a rolling deploy to v14.3.3

Note: since v14.2.x releases are no longer maintained/updated, the intermediate v14.2.3 patch release is only provided to enable a rolling upgrade to v14.3.x or later, and is not supported for actual production usage.
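The recommended path can also be expressed as a small planning helper, for example as a pre-flight check in deployment tooling. This is a hypothetical Java sketch. The `Strategy` enum, the step wording and the `readyForRollingDeployTo143` check are illustrative assumptions. A real pipeline would obtain the running instance versions from its own monitoring rather than from a hard-coded list.

```java
import java.util.List;

/** Hypothetical sketch of the recommended v14.x -> v14.3.3 upgrade path; names are illustrative. */
public class UpgradePathSketch {

    /** How the cluster is redeployed. */
    enum Strategy { STOP_START, ROLLING }

    /** Intermediate patch release required before a rolling upgrade to v14.3.x. */
    static final String INTERMEDIATE = "14.2.3";

    /** Ordered steps for upgrading a v14.x cluster to v14.3.3 (or later). */
    static List<String> upgradePath(Strategy strategy) {
        if (strategy == Strategy.STOP_START) {
            // All instances stop together, so no old instance is still running
            // to receive the node type cluster event: one deploy suffices.
            return List.of("stop/start deploy to 14.3.3");
        }
        // Rolling: every instance must first run the intermediate release,
        // which contains the CMS-14076 and CMS-13707 fixes.
        return List.of(
                "rolling deploy to " + INTERMEDIATE,
                "wait until all instances run " + INTERMEDIATE,
                "rolling deploy to 14.3.3");
    }

    /** Gate for step 3 of the rolling path: true only when every instance reports the intermediate version. */
    static boolean readyForRollingDeployTo143(List<String> instanceVersions) {
        return !instanceVersions.isEmpty()
                && instanceVersions.stream().allMatch(INTERMEDIATE::equals);
    }

    public static void main(String[] args) {
        System.out.println(upgradePath(Strategy.ROLLING));
        // One instance (hypothetical version) has not reached the intermediate release yet.
        System.out.println(readyForRollingDeployTo143(List.of("14.2.3", "14.2.2")));
    }
}
```

The gate deliberately requires an exact match on v14.2.3 rather than "at least v14.2.3". The affected 14.3.0 to 14.3.2 releases would pass a simple version comparison but do not contain the fixes.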


Issue Links

waits for

CMS-14076 HippoNodeTypeRegistry.externalRegistered method doesn't properly suppress persisting node type changes from a cluster event

Closed

CMS-13707 hippo:identifiable mixin is not "jackrabbit trivial", so cannot be applied during upgrade

Closed


People

Assignee: Unassigned

Reporter: Ate Douma (Inactive)

Votes: 0

Watchers: 8

Dates

Created: 12/Oct/20 2:07 PM

Updated: 15/Oct/20 1:20 PM

Resolved: 15/Oct/20 1:20 PM