mirror of https://github.com/Unstructured-IO/unstructured.git (synced 2025-10-31 10:03:07 +00:00)
			
		
		
		
	 e34396b2c9
			
		
	
	
		
			
		
	
	
	
	
		
			
## Summary

**Improve title detection in pptx documents**

The default title text boxes on a pptx slide are now categorized as titles.

**Improve hierarchy detection in pptx documents**

List items and other slide text are now properly nested under the slide title, which enables better chunking of pptx documents.

Hierarchy detection is improved by determining category depth as follows:

- Check whether the paragraph item has a level parameter via the python-pptx paragraph. If so, use the paragraph level as the category_depth level.
- If the shape being checked is a title shape and the item is not a bullet or email, the element is set as a Title with a depth corresponding to the enumerated paragraph increment (e.g. the first line of the title shape is depth 0, the second is depth 1, etc.).
- If the shape is not a title shape but the paragraph is a title, the depth is the level + 1, so that all paragraph titles are at depth 1 or greater, which places them below the slide title element.
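
The depth rules above can be sketched as a small pure function. This is an illustrative sketch only, not the code from this commit: the function name `sketch_category_depth` and all of its parameters are hypothetical, and the order in which the rules are checked is an assumption, since the summary does not state which rule wins when several apply. The `level` argument stands in for the paragraph level that python-pptx exposes on a paragraph.

```python
def sketch_category_depth(
    level: int,
    paragraph_index: int,
    is_title_shape: bool,
    is_title_text: bool,
    is_bullet_or_email: bool = False,
) -> int:
    """Return a category depth for one paragraph (hypothetical helper).

    level: the paragraph's indentation level (0 for top-level text).
    paragraph_index: 0-based position of the paragraph within its shape.
    is_title_shape: True if the paragraph lives in the slide's title shape.
    is_title_text: True if the paragraph text was classified as a title.
    is_bullet_or_email: True if the text is a bullet or an email address.
    """
    # Title shape: depth follows the line's position within the shape,
    # so the first line is depth 0, the second is depth 1, and so on.
    if is_title_shape and not is_bullet_or_email:
        return paragraph_index

    # Title-like paragraph outside the title shape: shift by one so it
    # always sits below the slide title (depth is at least 1).
    if is_title_text:
        return level + 1

    # Everything else (list items, narrative text): use the level as-is.
    return level


# First and second lines of the slide's title shape.
slide_title = sketch_category_depth(0, 0, is_title_shape=True, is_title_text=True)
subtitle = sketch_category_depth(0, 1, is_title_shape=True, is_title_text=True)

# A heading-like paragraph in a body text box.
body_heading = sketch_category_depth(0, 0, is_title_shape=False, is_title_text=True)

# A bullet indented two levels in a body text box.
nested_bullet = sketch_category_depth(2, 3, is_title_shape=False, is_title_text=False)
```

Under these assumptions, a heading in a body text box never shares depth 0 with the slide title, which is what lets a chunker treat the slide title as the parent of everything else on the slide.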
		
			
				
	
	
	
		
			41 KiB
		
	
	
	
	
	
	
	
			
		
		
	
	