Details
- Type: Bug
- Status: Closed
- Priority: Normal
- Resolution: Fixed
- Flagged
- Quasar
- Sprints: Team Quasar sprint 278, Quasar sprint 279
Description
Client Services Investigation
A client ran into performance issues on their site. After a thorough investigation they found that, since 14.3, getContentBean is always called by HstDelegateeFilterBean. They came up with the following suggestion for HstDelegateeFilterBean#doFilter:
Instead of always calling requestContext#getContentBean:
- first check whether the request is a channel manager preview request,
- then check for a possible BR_VERSION_UUID_REQUEST_PARAMETER,
- only then retrieve the content bean to use in HstRequestUtils.getRenderFrozenNodeId
I.e., replace:
final HippoBean primaryContentBean = requestContext.getContentBean();
if (primaryContentBean != null) {
    final String renderFrozenNodeId = HstRequestUtils.getRenderFrozenNodeId(requestContext,
            primaryContentBean.getNode(), HstRequestUtils.getBranchIdFromContext(requestContext));
    if (renderFrozenNodeId != null) {
        // the request is for rendering a specific history version of the primary document
        requestContext.setRenderingHistory(true);
    }
}
with
if (requestContext.isChannelManagerPreviewRequest()) {
    final String br_version_uuid = requestContext.getServletRequest().getParameter(BR_VERSION_UUID_REQUEST_PARAMETER);
    if (br_version_uuid != null) {
        final HippoBean primaryContentBean = requestContext.getContentBean();
        if (primaryContentBean != null) {
            final String renderFrozenNodeId = HstRequestUtils.getRenderFrozenNodeId(requestContext,
                    primaryContentBean.getNode(), HstRequestUtils.getBranchIdFromContext(requestContext));
            if (renderFrozenNodeId != null) {
                // the request is for rendering a specific history version of the primary document
                requestContext.setRenderingHistory(true);
            }
        }
    }
}
Using the above adjusted code improved the client's site performance considerably. The current product snippet was introduced in CMS-13203 (14.3.0).
Product Engineering Investigation
Good findings and really interesting! Although the code added in CMS-13203 can indeed trigger the observed behavior, it isn't the real problem. Understanding the real problem is a bit more complex, but relevant, since it might also mean that the customer should validate whether this can happen in their production setup as well. Note that it is likely to only happen in the artificial, local Apache Benchmark runs with high concurrency. I can also reproduce this and have implemented a fix with which requestContext.getContentBean() can still be used in the HstDelegateeFilterBean without performance degradation. However, because for PaaS it can't harm to only fetch this content bean when we are in channel manager preview, I also included the suggestion to only fetch the content bean in channel manager preview.
Background
The code
requestContext.getContentBean()
is close to instant, certainly when the jcr nodes are already in memory, and does not lead to any performance degradation. The call requestContext.getContentBean() is done many times for a single request, as can be seen by looking at the call hierarchy of that method: many other classes invoke it. This also means that invoking requestContext.getContentBean() for the RequestInfoCacheKeyFragmentCreator is close to harmless and does not have to be optimized. Removing it there really shouldn't be needed and, if anything, can only be measured with an Apache Benchmark concurrency level above the number of concurrent requests to which we advise configuring Tomcat, see later.
So why did avoiding the requestContext.getContentBean() invocation in the HstDelegateeFilterBean fix the symptom of performance degradation? This actually has to do with session pooling, triggered also by the fairly simple, not very organic-traffic-like way in which Apache Benchmark works. To understand this better, one needs to realize that the following happens:
In the HstDelegateeFilterBean, when requestContext.getContentBean() is invoked, a jcr session is borrowed from the session pool to fetch the content bean. Mind you, we have not done any processing yet, so whether the visitor should get a user-specific jcr session or a specific pooled session is not yet known at that time: only whether it is a preview or live request, hence a jcr session from either the preview or the live session pool is fetched. From then on, invoking requestContext.getSession() returns this pooled session. However, later on during the HST request processing, the session on the request context is set to null again: this is to support that on the next getSession invocation either a user-specific session or a session from a specialized session pool is used, or, if nothing specific, just a session from the previously used pool. The nullifying of the session on the context happens in the InitializationValve.
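To make this lifecycle concrete, below is a minimal, self-contained sketch. All class and method names (Pool, RequestContext, nullifySession, ...) are illustrative, not the real HST classes: it only shows that a lazily borrowed pooled session is "forgotten" by the valve but returned to the pool no earlier than at the end of request handling, so a single request ends up borrowing two sessions.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch (invented names, not HST code) of the session lifecycle
// described above: the valve only forgets the reference to the pooled session,
// the session itself goes back to the pool only at the end of the request.
public class SessionLifecycleSketch {

    static final class Pool {
        private final Deque<String> free = new ArrayDeque<>();
        Pool(int size) { for (int i = 0; i < size; i++) { free.push("session-" + i); } }
        String borrow() { return free.pop(); }
        void giveBack(String session) { free.push(session); }
        int borrowedCount(int size) { return size - free.size(); }
    }

    static final class RequestContext {
        private final Pool pool;
        private final List<String> borrowedDuringRequest = new ArrayList<>();
        private String current;                      // session currently set on the context

        RequestContext(Pool pool) { this.pool = pool; }

        String getSession() {                        // lazily borrows from the pool
            if (current == null) {
                current = pool.borrow();
                borrowedDuringRequest.add(current);
            }
            return current;
        }

        void nullifySession() {                      // what the InitializationValve does:
            current = null;                          // the pooled session is NOT returned here
        }

        void disposeAllResources() {                 // end of request handling
            borrowedDuringRequest.forEach(pool::giveBack);
            borrowedDuringRequest.clear();
            current = null;
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool(100);
        RequestContext ctx = new RequestContext(pool);
        ctx.getSession();           // borrow #1: e.g. getContentBean() in HstDelegateeFilterBean
        ctx.nullifySession();       // InitializationValve forgets the session
        ctx.getSession();           // borrow #2: during further request processing
        System.out.println("borrowed: " + pool.borrowedCount(100));  // prints 2
        ctx.disposeAllResources();  // both sessions only now go back to the pool
        System.out.println("borrowed: " + pool.borrowedCount(100));  // prints 0
    }
}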
With the above, the exact behavior observed with Apache Benchmark can now be explained. First of all, we need to realize that invoking requestContext.getContentBean() in the HstDelegateeFilterBean and then later using a jcr session during request processing results, in general (with a standard setup), in 2 sessions being taken from the session pool: first during the HstDelegateeFilterBean and later again after the previous session has been set to null.
By default, the session pools have a limit of 100 max sessions. Even when concurrency levels go above 100, this is normally not a problem: when the concurrency level is at, say, 200, some requests simply wait until a session becomes available from the pool and are processed once it is. However, this is where things can now break down: if the concurrency level is above the max session pool size, it can happen (and at high enough concurrency does happen) that all concurrently processed requests have taken a jcr session from the pool during the HstDelegateeFilterBean requestContext.getContentBean(), at which point the session pool is exhausted; during the actual further request processing none of the requests can proceed, since they all want the second session and the pool has nothing left to give. This is the exact issue that causes the behavior observed with Apache Benchmark. The problem is not in requestContext.getContentBean() but in the session pool being exhausted.
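A small, self-contained simulation of this failure mode (illustrative only, not HST code): a Semaphore with 100 permits stands in for the session pool, and 200 concurrent "requests" each hold a first permit while trying to obtain a second one. In the real scenario the blocked requests would wait indefinitely; the simulation uses a timeout so it terminates and merely counts how many requests got stuck waiting for a second session.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative simulation (not HST code): each "request" takes a first session from
// the pool, keeps it until the end of the request, and needs a second session from
// the same pool to make progress. With concurrency above the pool size, requests
// block each other from obtaining that second session.
public class PoolExhaustionSimulation {

    public static void main(String[] args) throws InterruptedException {
        final int poolSize = 100;     // default max sessions per pool
        final int concurrency = 200;  // Apache Benchmark style concurrency level
        final Semaphore sessionPool = new Semaphore(poolSize);
        final AtomicInteger starved = new AtomicInteger();
        final CountDownLatch start = new CountDownLatch(1);
        final ExecutorService executor = Executors.newFixedThreadPool(concurrency);

        for (int i = 0; i < concurrency; i++) {
            executor.submit(() -> {
                try {
                    start.await();
                    sessionPool.acquire();                 // first session: HstDelegateeFilterBean
                    try {
                        // second session needed during further request processing; in the real
                        // scenario this blocks indefinitely once the pool is exhausted,
                        // here we time out so the simulation terminates
                        if (sessionPool.tryAcquire(2, TimeUnit.SECONDS)) {
                            sessionPool.release();         // second session returned
                        } else {
                            starved.incrementAndGet();     // request stuck waiting for session #2
                        }
                    } finally {
                        sessionPool.release();             // first session only returned at request end
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("requests stuck waiting for a second session: " + starved.get());
    }
}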
Harmless in production setup
Although I am certainly fixing this issue by making sure the session pool exhaustion is avoided (and also adding the extra, harmless check to only fetch the content bean in the HstDelegateeFilterBean when in preview), it is unlikely that this problem actually occurs in production. The reason is quite simple: the problem only occurs when quite a bit more concurrent requests are being handled than the size of the session pool, such that the second session can no longer be obtained for all requests, making them all unable to proceed. However, our advised and found best value for maxThreads in Tomcat (the maximum allowed number of concurrently handled requests) is 25: this turned out to be the rule-of-thumb sweet spot for maximum throughput, where parallel processing of requests was balanced best against cpu thread switching. With maxThreads at 25, at most 25 requests are in flight, so at most 2 x 25 = 50 sessions can be out of a 100-session pool at the same time. Hence, for our advised setups the behavior observed with Apache Benchmark cannot happen. Unless NS still has the default Tomcat maxThreads of 200: that could be problematic, but even then the concurrency of real user requests on a single cluster node would have to be extremely high to trigger the actual issue (higher than 100, since you really need to hit a state where pretty much most requests are blocked from proceeding, i.e. many requests having passed the HstDelegateeFilterBean but not yet having obtained the second jcr session).
The fix for the cause instead of the symptom
When requestContext.setSession(session) is invoked, which sets a new jcr session on the requestContext, make sure to log out any previously set jcr session on the requestContext. As a result, the previously set jcr session is, during logout, directly returned to the jcr session pool instead of only at the end of the request handling during
org.hippoecm.hst.core.jcr.pool.PooledSessionResourceManagement#disposeAllResources
This avoids a request taking 2 sessions from the same pool and, as a result, avoids the situation in which the session pool is potentially exhausted by many requests that have only taken the first jcr session and are all blocked from getting the second one: this blocking won't happen any more, since the first session is returned to the pool much earlier. When the InitializationValve sets the JCR Session to null, a potentially set jcr session instance variable on the request context will also be logged out.
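As a sketch of the idea (simplified, with an illustrative class and field, not the actual product change): when a new jcr session is set on the request context, the previously set one is logged out, which for a pooled session means it is handed back to its pool right away.

import javax.jcr.Session;

// Sketch of the idea behind the fix; the class and field are illustrative, not the
// real HST request context implementation. Logging out the previously set session
// returns a pooled session to its pool immediately instead of only at the end of
// request handling in PooledSessionResourceManagement#disposeAllResources.
public class RequestContextSessionSketch {

    private Session session;

    public void setSession(final Session newSession) {
        final Session previous = this.session;
        if (previous != null && previous != newSession && previous.isLive()) {
            previous.logout();   // a pooled session's logout hands it back to its pool
        }
        this.session = newSession;
    }

    public Session getSession() {
        return session;
    }
}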
As a harmless extra improvement, I'll also include the check to only fetch the content bean in the HstDelegateeFilterBean for the channel manager preview.
Validation
With the above fix for the session pool exhaustion in place (and without changes to the HstDelegateeFilterBean), the problem, which was easy to reproduce before, did not surface any more: the session pool was no longer exhausted.