In this pull request parent-child relationship for elements generated
with v2 parser is based on actual element IDs instead of IDs baked
somewhere in the HTML script.
With some extra bug fixing it allowed for significantly simplifying json
-> HTML script
Script to render HTML from unstructured elements.
NOTE: This script is not intended to be used as a module.
NOTE: This script is only intended to be used with outputs with
non-empty `metadata.text_as_html`.
TODO: It was noted that unstructured_elements_to_ontology func always
returns a single page
This script is using helper functions to handle multiple pages. I am not
sure if this was intended, or it is a bug - if it is a bug it would
require bit longer debugging - to make it usable fast I used
workarounds.
Usage: test with any outputs with non-empty `metadata.text_as_html`.
Example files attached.
`[Example-Bill-of-Lading-Waste.docx.pdf.json](https://github.com/user-attachments/files/17922898/Example-Bill-of-Lading-Waste.docx.pdf.json)`
[Breast_Cancer1-5.pdf.json](https://github.com/user-attachments/files/17922899/Breast_Cancer1-5.pdf.json)