Marianna 4140f625d0
add script to render html from unstructured elements (#3799)
Script to render HTML from unstructured elements.

NOTE: This script is not intended to be used as a module.
NOTE: This script is only intended for outputs that have a non-empty
`metadata.text_as_html`.

TODO: `unstructured_elements_to_ontology` appears to always return a
single page, so this script uses helper functions to handle multiple
pages. I am not sure whether this behavior is intended or a bug. If it
is a bug, it would take longer to debug, so I used workarounds to make
the script usable quickly.
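
A minimal sketch of the kind of multi-page handling described above, not the script's actual code. It assumes each element is a dict from the JSON output with `metadata.text_as_html` and `metadata.page_number`; the helper names `group_html_by_page` and `render_document` are hypothetical, and it deliberately does not call `unstructured_elements_to_ontology`.

```python
import html
from collections import defaultdict


def group_html_by_page(elements):
    """Group `metadata.text_as_html` fragments by `metadata.page_number`."""
    pages = defaultdict(list)
    for element in elements:
        metadata = element.get("metadata", {})
        fragment = metadata.get("text_as_html")
        if not fragment:
            continue  # skip elements without an HTML rendering
        # Assumption: elements missing a page number are placed on page 1.
        pages[metadata.get("page_number") or 1].append(fragment)
    return dict(sorted(pages.items()))


def render_document(elements, title="Rendered elements"):
    """Wrap each page's fragments in a <section> and return one HTML string."""
    sections = []
    for page_number, fragments in group_html_by_page(elements).items():
        body = "\n".join(fragments)
        sections.append(f'<section data-page="{page_number}">\n{body}\n</section>')
    return (
        '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n<body>\n"
        + "\n".join(sections)
        + "\n</body>\n</html>"
    )
```

To try it, load one of the example JSON files with `json.load` and pass the resulting list to `render_document`.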

Usage: test with any output that has a non-empty `metadata.text_as_html`.
Example files are attached.

[Example-Bill-of-Lading-Waste.docx.pdf.json](https://github.com/user-attachments/files/17922898/Example-Bill-of-Lading-Waste.docx.pdf.json)


[Breast_Cancer1-5.pdf.json](https://github.com/user-attachments/files/17922899/Breast_Cancer1-5.pdf.json)
2024-12-04 19:46:51 -08:00