Mirror of https://github.com/Unstructured-IO/unstructured.git, synced 2025-10-31 18:14:51 +00:00
### Summary

Some `OCR` elements with only spaces in the text have full-page width in the bounding box, which causes the `xycut` sorting to not work as expected. Now the logic to parse OCR results removes any elements with only spaces (more than one space).

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
Custom Layout Sorting
This directory contains examples of how element sorting works.
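As a rough illustration of what "sorting elements" into reading order means, the sketch below orders bounding boxes on a two-column page: left column first, then right, each from top to bottom. It is a toy stand-in written for this README, not the library's `xycut` implementation; the `Box` type, the `sort_two_columns` helper, and the midline split are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Box:
    """A text element with its bounding box (hypothetical type for this sketch)."""

    text: str
    x1: float
    y1: float
    x2: float
    y2: float


def sort_two_columns(boxes: list[Box], page_width: float) -> list[Box]:
    """Order boxes left column first, then right column, each top to bottom.

    Assumes a simple two-column page split at the vertical midline. This is
    NOT the xy-cut algorithm, only a simplified illustration of the idea.
    """
    mid = page_width / 2
    # Assign each box to a column by the x-coordinate of its center.
    left = [b for b in boxes if (b.x1 + b.x2) / 2 < mid]
    right = [b for b in boxes if (b.x1 + b.x2) / 2 >= mid]

    def by_top(b: Box) -> tuple[float, float]:
        # Sort by top edge, breaking ties by left edge.
        return (b.y1, b.x1)

    return sorted(left, key=by_top) + sorted(right, key=by_top)


boxes = [
    Box("right, top", 320, 50, 580, 100),
    Box("left, bottom", 20, 200, 280, 250),
    Box("right, bottom", 320, 200, 580, 250),
    Box("left, top", 20, 50, 280, 100),
]
ordered = [b.text for b in sort_two_columns(boxes, page_width=600)]
print(ordered)
```

A box that spans the full page width would land in a single column under this scheme regardless of where it belongs, which hints at why oversized bounding boxes can disturb layout-based sorting.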
Running the example
Running the script (.py)
export PYTHONPATH=.:$PYTHONPATH && python examples/custom-layout-order/evaluate_natural_reading_order.py <file_path> <strategy>
Run the command from the project root directory; `<file_path>` must point to a file under the project root. For example:
export PYTHONPATH=.:$PYTHONPATH && python examples/custom-layout-order/evaluate_natural_reading_order.py example-docs/multi-column-2p.pdf fast
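The same invocation can be scripted from Python, for instance to loop over several files. The sketch below only builds the argument list and environment that mirror the shell command above; `build_invocation` is a hypothetical helper, and the script path, file path, and strategy are taken from the commands shown in this README.

```python
import os
import sys

# Script path as used in the shell commands above.
SCRIPT = "examples/custom-layout-order/evaluate_natural_reading_order.py"


def build_invocation(file_path: str, strategy: str, project_root: str = "."):
    """Return (argv, env) mirroring `export PYTHONPATH=.:$PYTHONPATH && python ...`.

    Hypothetical helper for illustration; it does not run anything itself.
    """
    env = dict(os.environ)
    existing = env.get("PYTHONPATH", "")
    # Prepend the project root, keeping any existing PYTHONPATH entries.
    env["PYTHONPATH"] = project_root + (os.pathsep + existing if existing else "")
    argv = [sys.executable, SCRIPT, file_path, strategy]
    return argv, env


argv, env = build_invocation("example-docs/multi-column-2p.pdf", "fast")
print(argv[1:])

# To actually run the script from the project root:
#   import subprocess
#   subprocess.run(argv, env=env, cwd=".", check=True)
```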
Running the Jupyter notebook
The Google Colab version of the notebook can be found here: https://colab.research.google.com/drive/1HgBvHNPnY-dXO043DftvvMeynlLPgQ_p