unstructured/typings/lxml/etree/_module_func.pyi
Steve Canny 6fe1c9980e
rfctr(html): prepare for new html parser (#3257)
**Summary**
Extract as much mechanical refactoring from the HTML parser change-over
into this PR as possible. This leaves the next PR focused on installing
the new parser and the ingest-test impact.

**Reviewers:** Commits are well groomed and reviewing commit-by-commit
is probably easier.

**Additional Context**
This PR introduces the rewritten HTML parser. Its general design is
recursive, consistent with the recursive structure of HTML (tree of
elements). It also adds the unit tests for that parser but it does not
_install_ the parser. So the behavior of `partition_html()` is unchanged
by this PR. The next PR in this series will do that and handle the
ingest and other unit test changes required to reflect the dozen or so
bug-fixes the new parser provides.
2024-06-21 20:59:48 +00:00

# pyright: reportPrivateUsage=false
from __future__ import annotations
from .._types import _ElementOrTree
from ..etree import HTMLParser, XMLParser
from ._element import _Element
def fromstring(text: str | bytes, parser: XMLParser | HTMLParser) -> _Element: ...
# In XML Canonicalization (C14N) mode, most arguments are ignored;
# some even raise an exception outright if specified.
def tostring(
    element_or_tree: _ElementOrTree,
    *,
    encoding: str | type[str] | None = None,
    pretty_print: bool = False,
    with_tail: bool = True,
) -> str: ...