mirror of https://github.com/Unstructured-IO/unstructured.git, synced 2025-11-04 03:53:45 +00:00
			
		
		
		
Part two of: https://github.com/Unstructured-IO/unstructured/pull/2842

Main changes compared to part one:
* hash computation includes the element's sequence number on the page, the page number, the document filename, and the element's text
* there are more tests for the deterministic behavior of IDs returned by partitioning functions, and for their uniqueness (guaranteed at the document level, and highly probable across multiple documents)

This PR addresses the following issue: https://github.com/Unstructured-IO/unstructured/issues/2461
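The hash inputs named in the description above can be illustrated with a minimal sketch. This is not the library's implementation: the helper name `deterministic_element_id`, the choice of SHA-256, the 32-character truncation, the field separator, and the filename `example.html` are all assumptions made for illustration. The element texts mirror a page that contains the same block twice, which is the case where hashing the text alone would produce colliding IDs.

```python
import hashlib


def deterministic_element_id(
    filename: str, page_number: int, sequence_number: int, text: str
) -> str:
    """Hypothetical helper: derive an ID from the four inputs in the PR description."""
    # A separator keeps field boundaries unambiguous (assumption, not from the PR).
    payload = "\x1f".join([filename, str(page_number), str(sequence_number), text])
    # SHA-256 and the 32-character truncation are illustrative choices.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]


# Element texts in document order; the second half repeats the first.
texts = [
    "Example heading.",
    "This is a span.",
    "This is another span.",
    "Example heading.",
    "This is a span.",
    "This is another span.",
]

# "example.html" and page number 1 are placeholder values.
ids = [
    deterministic_element_id("example.html", 1, seq, text)
    for seq, text in enumerate(texts)
]
```

Because the sequence number is part of the payload, the two identical headings receive different IDs, while rerunning the same inputs reproduces the same list.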
		
			
				
	
	
		
24 lines | 380 B | HTML
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
<!DOCTYPE html>
<html>

<head>
    <title>Simple Nested HTML</title>
</head>

<body>
    <h1>Example heading.</h1>
    <div>
        <span>This is a span.</span>
        <span>This is another span.</span>
    </div>
    <br>
    <h1>Example heading.</h1>
    <div>
        <span>This is a span.</span>
        <span>This is another span.</span>
    </div>

</body>

</html>