mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-08 09:33:43 +00:00

**Summary**

Remove `unstructured.partition.html.convert_and_partition_html()`. Move file-type conversion (to HTML) responsibility to each brokering partitioner that uses that strategy and let them call `partition_html()` for themselves with the result.

**Additional Context**

Rationale:

- `partition_html()` does not want or need to know which partitioners might broker partitioning to it.
- Different brokering partitioners have their own methods to convert their format to HTML and quirks that may be involved for their format. Avoid coupling them so they can evolve independently.
- The core of the conversion work is already encapsulated in `unstructured.partition.common.convert_file_to_html_text_using_pandoc()`.
- `convert_and_partition_html()` represents an additional brokering layer with the entailed complexities of an additional site for default parameter values to be (mis-)applied and/or dropped and is an additional location for new parameters to be added.
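The arrangement described in the summary can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `partition_html_stub`, `partition_rst_sketch`, and `fake_converter` are hypothetical names, and the converter is injected as a plain callable so the example runs without pandoc installed.

```python
from __future__ import annotations

import re
from typing import Callable


def partition_html_stub(text: str) -> list[str]:
    """Hypothetical stand-in for partition_html(): one element per <p> block."""
    return [m.strip() for m in re.findall(r"<p>(.*?)</p>", text, flags=re.DOTALL)]


def partition_rst_sketch(
    filename: str,
    convert_to_html: Callable[[str, str], str],
) -> list[str]:
    """Hypothetical brokering partitioner for a single source format.

    It owns the conversion step (and any format-specific quirks), then calls
    the HTML partitioner itself. No shared convert-and-partition layer sits
    in between.
    """
    html_text = convert_to_html(filename, "rst")
    return partition_html_stub(html_text)


def fake_converter(filename: str, source_format: str) -> str:
    """Test double standing in for a pandoc-backed converter (assumption)."""
    return f"<p>{source_format}:{filename}</p><p>second</p>"


elements = partition_rst_sketch("doc.rst", fake_converter)
print(elements)
```

Because each brokering partitioner calls the HTML partitioner directly, its defaults and parameters pass through one call site only, which is the coupling concern the rationale raises.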
10 lines
278 B
Python
from __future__ import annotations

import pathlib

def convert_file(
    source_file: str, to: str, format: str | None, outputfile: str | pathlib.Path | None = None
) -> str: ...
def get_pandoc_formats() -> tuple[list[str], list[str]]: ...
def get_pandoc_version() -> str: ...