Type: New Feature
Affects Version/s: 1.0.0
Fix Version/s: 1.0.1
Processed by team: Platform
When a huge (tens of MB) content definition YAML file needs to be parsed, the current implementation using the snakeyaml Composer() can take a very long time.
Testing shows that on a fast laptop, a 22 MB hcm-content YAML file can take up to ~3.5 seconds to 'just' load.
During startup/bootstrap of the repository this isn't necessarily a big problem, but for AutoExport, which reloads autoexport modules after writing changes, it can become a blocking problem, especially if a project has several such huge YAML files.
Testing on such a project shows AutoExport taking ~11 seconds to complete the writing (and then reloading) of minimal changes.
Note that the writing part itself is pretty fast; the real problem is the reloading of such huge content definition files.
The snakeyaml Composer() class is similar to an XML DOM parser, in that it only returns a result after the whole YAML file has been loaded and converted into snakeyaml Node definitions.
Underlying the Composer() implementation, however, is an event-producing parser, comparable to an XML SAX event handler implementation.
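To illustrate that event stream: the repository code uses snakeyaml in Java, but PyYAML exposes an analogous low-level event API, so the sketch below (with a hypothetical minimal content definition) shows the kind of event sequence the parser emits before any node graph is built:

```python
import yaml  # PyYAML; its event API mirrors snakeyaml's parser/event model

# A minimal, hypothetical content definition head.
doc = "/content/documents:\n  .meta:order-before: assets\n"

# yaml.parse() streams events one by one, SAX-style, instead of
# building the full node graph the way Composer() does.
for event in yaml.parse(doc):
    print(type(event).__name__)
```

This prints StreamStartEvent, DocumentStartEvent, MappingStartEvent, then the scalar events carrying the root path and the .meta:order-before key/value, followed by the closing events. Note that the root path and .meta:order-before already appear within the first handful of events, no matter how large the rest of the file is.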
For the specific use case of reloading content definition files, we technically only need the following:
- the content definition root path (the first scalar node)
- a .meta:order-before property for that root path

We can therefore vastly improve performance by only 'scanning' the first few events produced by the snakeyaml parser, instead of using the Composer() class to load the whole file.
The only requirement/limitation of this solution is that the .meta:order-before property must be defined before any child node (or possibly even before any non-meta property) to be 'seen' by the 'content-head' parser.
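A minimal sketch of such a 'content-head' scan, again using PyYAML's event API as a stand-in for snakeyaml's (the actual ContentSourceHeadParser is Java; the function and variable names here are illustrative only):

```python
import yaml  # PyYAML; event API analogous to snakeyaml's parser

def parse_head(stream):
    """Scan only the leading parser events and return
    (root_path, order_before) without loading the whole file."""
    root_path = None
    order_before = None
    depth = 0           # mapping nesting level
    seq_depth = 0       # sequence nesting level (multi-value properties)
    pending_key = None  # property key awaiting its value at depth 2
    for event in yaml.parse(stream):
        if isinstance(event, yaml.MappingStartEvent):
            depth += 1
            if depth > 2:
                break  # reached a child node: stop, per the limitation above
        elif isinstance(event, yaml.MappingEndEvent):
            depth -= 1
        elif isinstance(event, yaml.SequenceStartEvent):
            seq_depth += 1
        elif isinstance(event, yaml.SequenceEndEvent):
            seq_depth -= 1
            if seq_depth == 0:
                pending_key = None  # multi-value property fully consumed
        elif isinstance(event, yaml.ScalarEvent):
            if seq_depth:
                continue  # item inside a multi-value property
            if depth == 1 and root_path is None:
                root_path = event.value  # first scalar: the root path
            elif depth == 2:
                if pending_key is None:
                    pending_key = event.value
                elif pending_key == '.meta:order-before':
                    order_before = event.value
                    break  # we have everything we need
                else:
                    pending_key = None
    return root_path, order_before
```

For example, parsing a file starting with "/content/documents:" followed by ".meta:order-before: assets" returns ('/content/documents', 'assets') after consuming only a handful of events; a .meta:order-before defined after a child node is not seen, which is exactly the stated limitation.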
I've implemented an initial ContentSourceHeadParser (extending ContentSourceParser) which demonstrates that parsing just the head of the same huge content YAML file takes ~50 ms, compared to ~3.5 s when using the ContentSourceParser.