mirror of https://github.com/Unstructured-IO/unstructured.git (synced 2025-10-26 15:42:15 +00:00)
	 00181b88df
			
		
	
	
**Summary**

Adds logic to combine broken numbered list items for the PDF fast strategy.

**Details**

Previously, the numbered list items in `layout-parser-paper-fast.pdf` were read as:

```
'1. An off-the-shelf toolkit for applying DL models for layout detection, character'
'recognition, and other DIA tasks (Section 3)'
'2. A rich repository of pre-trained neural network models (Model Zoo) that'
'underlies the off-the-shelf usage'
'3. Comprehensive tools for efficient document image data annotation and model'
'tuning to support different levels of customization'
'4. A DL model hub and community platform for the easy sharing, distribu- tion, and discussion of DIA models and pipelines, to promote reusability, reproducibility, and extensibility (Section 4)'
```

Now they are read as:

```
'1. An off-the-shelf toolkit for applying DL models for layout detection, character recognition, and other DIA tasks (Section 3)'
'2. A rich repository of pre-trained neural network models (Model Zoo) that underlies the off-the-shelf usage'
'3. Comprehensive tools for efficient document image data annotation and model tuning to support different levels of customization'
'4. A DL model hub and community platform for the easy sharing, distribu- tion, and discussion of DIA models and pipelines, to promote reusability, reproducibility, and extensibility (Section 4)'
```

The added logic uses `ElementType` and `coordinates` to determine whether a following line is part of the previously detected `ListItem`.

**Test**

Adds a test that checks that the number of elements is smaller than in the original version with the broken numbered list. The test also checks that the first detected numbered list item ends with the previously broken line.

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: Klaijan <Klaijan@users.noreply.github.com>
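
For illustration only, the idea described under **Details** could be sketched roughly as follows. This is not the code from this commit: the `Element` dataclass, the `combine_broken_list_items` function, the `(x1, y1, x2, y2)` bounding-box layout, and the `max_gap` threshold are all hypothetical stand-ins, chosen to show how an element type plus coordinates can decide whether a line continues the previous list item.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Element:
    """Hypothetical stand-in for a parsed PDF element (not the library's class)."""
    category: str                             # e.g. "ListItem" or "NarrativeText"
    text: str
    bbox: Tuple[float, float, float, float]   # assumed (x1, y1, x2, y2), y grows downward


# A line that starts its own numbered item, e.g. "2. A rich repository ..."
NUMBERED = re.compile(r"^\s*\d+\.\s")


def combine_broken_list_items(elements: List[Element], max_gap: float = 5.0) -> List[Element]:
    """Merge a line into the preceding ListItem when it looks like its continuation."""
    merged: List[Element] = []
    for el in elements:
        prev = merged[-1] if merged else None
        is_continuation = (
            prev is not None
            and prev.category == "ListItem"
            and not NUMBERED.match(el.text)            # does not start a new item
            and el.bbox[0] >= prev.bbox[0]             # not left of the item's start
            and el.bbox[1] >= prev.bbox[1]             # starts below the item's top
            and el.bbox[1] - prev.bbox[3] <= max_gap   # small vertical gap
        )
        if is_continuation:
            # Join the text and grow the bounding box to cover both lines.
            merged[-1] = Element(
                "ListItem",
                f"{prev.text} {el.text}",
                (
                    min(prev.bbox[0], el.bbox[0]),
                    prev.bbox[1],
                    max(prev.bbox[2], el.bbox[2]),
                    el.bbox[3],
                ),
            )
        else:
            merged.append(el)
    return merged


# Made-up coordinates, loosely modeled on the example in this commit message.
sample = [
    Element("ListItem", "1. An off-the-shelf toolkit for applying DL models for layout detection, character", (100, 100, 500, 112)),
    Element("NarrativeText", "recognition, and other DIA tasks (Section 3)", (112, 114, 400, 126)),
    Element("ListItem", "2. A rich repository of pre-trained neural network models (Model Zoo) that", (100, 130, 500, 142)),
    Element("NarrativeText", "underlies the off-the-shelf usage", (112, 144, 350, 156)),
    Element("NarrativeText", "A separate paragraph", (80, 200, 500, 212)),
]
result = combine_broken_list_items(sample)
```

In this sketch the last element stays separate because it starts to the left of the list item, while the two continuation lines are folded into the items above them. A real implementation would likely need tolerances tuned to the page layout.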