2 Commits

Author SHA1 Message Date
Pluto
ec209c6b5f
Remove IDs from HTML code (#4012)
In this pull request parent-child relationship for elements generated
with v2 parser is based on actual element IDs instead of IDs baked
somewhere in the HTML script.
With some extra bug fixing it allowed for significantly simplifying json
-> HTML script
2025-06-11 11:55:02 +00:00
Marianna
4140f625d0
add script to render html from unstructured elements (#3799)
Script to render HTML from unstructured elements.

NOTE: This script is not intended to be used as a module.
NOTE: This script is only intended to be used with outputs with
non-empty `metadata.text_as_html`.

TODO: It was noted that unstructured_elements_to_ontology func always
returns a single page
This script is using helper functions to handle multiple pages. I am not
sure if this was intended, or it is a bug - if it is a bug it would
require bit longer debugging - to make it usable fast I used
workarounds.

Usage: test with any outputs with non-empty `metadata.text_as_html`.
Example files attached.

`[Example-Bill-of-Lading-Waste.docx.pdf.json](https://github.com/user-attachments/files/17922898/Example-Bill-of-Lading-Waste.docx.pdf.json)`


[Breast_Cancer1-5.pdf.json](https://github.com/user-attachments/files/17922899/Breast_Cancer1-5.pdf.json)
2024-12-04 19:46:51 -08:00