Mirror of https://github.com/Unstructured-IO/unstructured.git, synced 2025-10-31 10:03:07 +00:00
			
		
		
		
	 f8c180a59e
			
		
	
	
		
			
		
	
	
	
	
		
			
Closes #2027

Tables or pages that contain only numbers are returned as floats in a `pandas.DataFrame` when the image or page is converted with `.image_to_data()`. An `AttributeError` was raised downstream when trying to `.strip()` the floats. This update converts those floats to strings if needed and otherwise strips the text.

Testing

(Note: the document used for testing is new, so you will have to copy it to the main branch in order to see that this snippet raises an `AttributeError` on the main branch but works on this branch.)

```
from unstructured.partition.pdf import partition_pdf

filename = "example-docs/all-number-table.pdf"
partition_pdf(filename, strategy="ocr_only")
```

---------

Co-authored-by: cragwolfe <crag@unstructured.io>
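The failure mode and the fix described in the commit message can be illustrated with a minimal sketch. This is not the code from the commit: `normalize_ocr_text` is a hypothetical helper name, and `raw_cells` is made-up sample data standing in for the OCR text column of an all-number table. It only shows the general idea of converting non-string values and stripping the rest.

```python
from typing import Any


def normalize_ocr_text(value: Any) -> str:
    """Hypothetical helper (an assumption, not the library's actual code)."""
    if isinstance(value, str):
        # Ordinary OCR text: strip surrounding whitespace.
        return value.strip()
    # Non-string cells (e.g. floats from an all-number column) have no
    # .strip() method, so convert them to text instead.
    return str(value)


# Assumed sample data: a mix of strings and floats, as an all-number
# table might produce once the OCR output is loaded into a DataFrame.
raw_cells = [" 42 ", 3.5, 12.0, "total"]

# Calling .strip() directly fails on the first float it meets.
try:
    [cell.strip() for cell in raw_cells]
except AttributeError as exc:
    print(f"without the guard: {exc}")

# With the guard, every cell ends up as a clean string.
cleaned = [normalize_ocr_text(cell) for cell in raw_cells]
print(cleaned)
```

Checking the type before stripping leaves ordinary text untouched, so a guard like this changes behavior only for the numeric cells that previously raised.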
		
			
				
	
	
	
		
			16 KiB
		
	
	
	
	
	
	
	
			
		
		
	
	