  1. [Read Only] - Hippo Configuration Management
  2. HCM-205

Provide a 'head only' parser to speed up loading huge hcm-content yaml files

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • 1.0.0
    • 1.0.1
    • None

    Description

      When a huge (tens of MB) contentdefinition yaml file needs to be parsed, the current implementation using the snakeyaml Composer() can take a very long time.

      Testing shows that on a fast laptop, a 22 MB hcm-content yaml file can take up to ~3.5 seconds to 'just' load.

      During startup/bootstrap of the repository this isn't necessarily a big problem, but for AutoExport, which reloads autoexport modules after writing changes, it can become a blocking problem, especially if a project has several such huge yaml files.
      Testing on such a project shows AutoExport taking ~11 seconds to complete the writing (and then reloading) of minimal changes.

      Note that the writing part itself is pretty fast; the problem really is the reloading of such huge contentdefinition files.

      The snakeyaml Composer() class is similar to an XML DOM parser, as it only returns the result after the whole yaml file has been loaded and converted into snakeyaml Node definitions.

      Underlying the Composer() implementation, however, is an event-producing parser, comparable to an XML SAX parser.

      So for the specific use-case of reloading contentdefinition files, we technically only need the following:

      1. the contentdefinition root path (first scalar node)
      2. a .meta:order-before property for that root path

      we can vastly improve this by only 'scanning' the first few events produced by the snakeyaml parser instead of using the Composer() class to load the whole file.

      The only requirement/limitation of this solution is that the .meta:order-before property must be defined before any child node (or possibly even before any non-meta property) to be 'seen' by the 'content-head' parser.
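
      As a rough illustration of the scanning idea (this is not the actual ContentSourceHeadParser, and it deliberately does not use the snakeyaml classes), the sketch below models the event stream as a plain Iterator. The Event, Kind, Head and CountingIterator types are hypothetical stand-ins, the sample paths and property values are made up, and it assumes that child node keys start with '/' and that property values in the head are scalars.

```java
import java.util.Arrays;
import java.util.Iterator;

// Sketch only: Event, Kind and Head are hypothetical stand-ins, not snakeyaml classes.
class ContentHeadScanSketch {

    enum Kind { MAPPING_START, MAPPING_END, SCALAR }

    // Hypothetical stand-in for one parser event; value is only set for SCALAR.
    static final class Event {
        final Kind kind;
        final String value;
        Event(Kind kind, String value) { this.kind = kind; this.value = value; }
    }

    // Result of the head scan; orderBefore stays null if it was not seen in the head.
    static final class Head {
        final String rootPath;
        final String orderBefore;
        Head(String rootPath, String orderBefore) {
            this.rootPath = rootPath;
            this.orderBefore = orderBefore;
        }
    }

    // Counts pulled events so the early exit can be observed.
    static final class CountingIterator implements Iterator<Event> {
        private final Iterator<Event> delegate;
        int pulled = 0;
        CountingIterator(Iterator<Event> delegate) { this.delegate = delegate; }
        @Override public boolean hasNext() { return delegate.hasNext(); }
        @Override public Event next() { pulled++; return delegate.next(); }
    }

    static Head scanHead(Iterator<Event> events) {
        require(events.next(), Kind.MAPPING_START);   // top-level mapping of the file
        Event root = events.next();                   // first scalar: the root path
        require(root, Kind.SCALAR);
        if (!events.hasNext() || events.next().kind != Kind.MAPPING_START) {
            return new Head(root.value, null);        // root node has no mapping body
        }
        while (events.hasNext()) {
            Event key = events.next();
            // Assumption: child node keys start with '/'. Stop at the first one,
            // or at the end of the root mapping.
            if (key.kind != Kind.SCALAR || key.value.startsWith("/")) {
                break;
            }
            if (!events.hasNext()) {
                break;
            }
            Event value = events.next();
            if (value.kind != Kind.SCALAR) {
                break; // simplification: only scalar property values are handled
            }
            if (".meta:order-before".equals(key.value)) {
                return new Head(root.value, value.value); // everything needed is known
            }
        }
        return new Head(root.value, null);
    }

    private static void require(Event event, Kind expected) {
        if (event.kind != expected) {
            throw new IllegalStateException("expected " + expected + " but got " + event.kind);
        }
    }

    static Event start() { return new Event(Kind.MAPPING_START, null); }
    static Event end() { return new Event(Kind.MAPPING_END, null); }
    static Event scalar(String v) { return new Event(Kind.SCALAR, v); }

    public static void main(String[] args) {
        // Made-up events for a root node with an order-before, one property and one child.
        CountingIterator events = new CountingIterator(Arrays.asList(
                start(), scalar("/content/documents"), start(),
                scalar(".meta:order-before"), scalar("gallery"),
                scalar("jcr:primaryType"), scalar("hippostd:folder"),
                scalar("/child"), start(), end(),
                end(), end()).iterator());
        Head head = scanHead(events);
        System.out.println(head.rootPath + " order-before=" + head.orderBefore
                + " after " + events.pulled + " events");
    }
}
```

      The scan returns as soon as both values are known and gives up at the first child node key, so a .meta:order-before declared after a child node is never seen. That is the limitation described above, and it is also why the cost no longer depends on the size of the file.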

      I've implemented an initial ContentSourceHeadParser (extending ContentSourceParser) which demonstrates that just parsing the head of the same huge content yaml file takes ~50 ms, compared to ~3.5 s when using the ContentSourceParser.

            People

              Assignee: Unassigned
              Reporter: Ate Douma (Inactive)