mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-06-27 02:30:08 +00:00

Script to render HTML from unstructured elements.

NOTE: This script is not intended to be used as a module.

NOTE: This script is only intended to be used with outputs that have a non-empty `metadata.text_as_html`.

TODO: The `unstructured_elements_to_ontology` function was observed to always return a single page, so this script uses helper functions to handle multiple pages. I am not sure whether this behavior is intended or a bug. If it is a bug, fixing it would require somewhat longer debugging, so I used workarounds to make the script usable quickly.

Usage: test with any output that has a non-empty `metadata.text_as_html`. Example files are attached: [Example-Bill-of-Lading-Waste.docx.pdf.json](https://github.com/user-attachments/files/17922898/Example-Bill-of-Lading-Waste.docx.pdf.json) and [Breast_Cancer1-5.pdf.json](https://github.com/user-attachments/files/17922899/Breast_Cancer1-5.pdf.json).
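To illustrate the multi-page handling described above, here is a minimal sketch in plain Python. It is not the script from this change and does not call `unstructured_elements_to_ontology`. It assumes the input is a JSON list of element dicts, each with a `metadata` dict that may carry `page_number` and `text_as_html`. The function names `group_html_by_page`, `render_document`, and `render_file` are hypothetical, as is the `<section data-page="...">` wrapper.

```python
import json
from collections import defaultdict
from pathlib import Path


def group_html_by_page(elements):
    """Collect non-empty `metadata.text_as_html` fragments, keyed by page number.

    Assumption: each element is a dict with an optional `metadata` dict.
    Elements without a page number are placed on page 1.
    """
    pages = defaultdict(list)
    for element in elements:
        metadata = element.get("metadata") or {}
        fragment = metadata.get("text_as_html")
        if not fragment:
            continue  # skip elements that carry no HTML
        page_number = metadata.get("page_number") or 1
        pages[page_number].append(fragment)
    # Return the pages in ascending page order.
    return dict(sorted(pages.items()))


def render_document(pages):
    """Wrap each page's fragments in a <section> and join them into one HTML document."""
    sections = []
    for page_number, fragments in pages.items():
        body = "\n".join(fragments)
        sections.append(f'<section data-page="{page_number}">\n{body}\n</section>')
    joined = "\n".join(sections)
    return f"<!DOCTYPE html>\n<html>\n<body>\n{joined}\n</body>\n</html>"


def render_file(json_path, html_path):
    """Read an elements JSON file and write the rendered HTML next to it."""
    elements = json.loads(Path(json_path).read_text(encoding="utf-8"))
    html = render_document(group_html_by_page(elements))
    Path(html_path).write_text(html, encoding="utf-8")
```

With one of the attached example files saved locally, a call such as `render_file("Example-Bill-of-Lading-Waste.docx.pdf.json", "out.html")` would write one `<section>` per page; the output file name here is only an example.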