unstructured/example-docs/fake-html-with-duplicate-elements.html
Michał Martyniak 2d1923ac7e
Better element IDs - deterministic and document-unique hashes (#2673)
Part two of: https://github.com/Unstructured-IO/unstructured/pull/2842

Main changes compared to part one:
* the hash computation includes the element's sequence number on the page,
the page number, the document filename, and the element's text
* there are more tests for the deterministic behavior of IDs returned by
partitioning functions and for their uniqueness (guaranteed at the document
level, and highly probable across multiple documents)
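
To illustrate the first point, here is a minimal sketch of how such an ID could be derived. It is not the library's actual implementation: the function name `sketch_element_id`, the JSON encoding of the inputs, the use of SHA-256, and the 32-character truncation are all assumptions made for this example. Only the four inputs (sequence number on page, page number, filename, text) come from the description above.

```python
import hashlib
import json
from typing import Optional


def sketch_element_id(
    text: str,
    sequence_number: int,
    page_number: Optional[int],
    filename: Optional[str],
) -> str:
    """Hypothetical helper: derive a deterministic ID from the four inputs.

    Encoding the inputs as a JSON list keeps the field boundaries
    unambiguous, so ("ab", 1) and ("a", "b1") cannot produce the same payload.
    """
    payload = json.dumps(
        [filename, page_number, sequence_number, text], ensure_ascii=False
    )
    # Truncating the hex digest is an assumption of this sketch.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]


# Two elements with identical text, as in the duplicate-elements fixture,
# differ only in their sequence number on the page.
fixture = "fake-html-with-duplicate-elements.html"
first_heading = sketch_element_id("Example heading.", 0, 1, fixture)
second_heading = sketch_element_id("Example heading.", 3, 1, fixture)
```

Because the sequence number is part of the hashed payload, the two identical headings receive different IDs, while repeating the call with the same inputs always returns the same ID.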

This PR addresses the following issue:
https://github.com/Unstructured-IO/unstructured/issues/2461
2024-04-24 00:05:20 -07:00

24 lines
380 B
HTML

<!DOCTYPE html>
<html>
<head>
<title>Simple Nested HTML</title>
</head>
<body>
<h1>Example heading.</h1>
<div>
<span>This is a span.</span>
<span>This is another span.</span>
</div>
<br>
<h1>Example heading.</h1>
<div>
<span>This is a span.</span>
<span>This is another span.</span>
</div>
</body>
</html>