Mirror of https://github.com/Unstructured-IO/unstructured.git, synced 2025-10-31 01:54:25 +00:00
5376bc510f
* add python-magic
* first pass on filetype detection
* tests for filetype detection
* more tests for file detection
* added tests for error conditions
* install libmagic dev in github
* libmagic install instructions
* pattern for checking email files
* support reading .eml in rb mode
* add auto partition function
* auto tests for email
* auto tests for docx
* added tests for html
* add pdf and html tests
* linting, linting, linting
* added docs for auto partitioning
* update readme with generic partition brick
* bumped version
* added test for bad type
* detect .docx files from application/octet-stream
* linting, linting, linting
* identify xlsx from octet stream
* install poppler in ci
* fix mocks; test for unknown type
* install poppler utils
* install in one line
* only poppler-utils
* file extension logic from application/octet-stream
* install local inference for ci
* install detectron2
* removing unused dockerfile
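Several items above describe falling back to the file extension when libmagic reports only the generic `application/octet-stream` type (for example, identifying `.docx` and `.xlsx` files). The following is a minimal sketch of that fallback idea, not the library's actual detection code: the `detect_filetype` function, both lookup tables, and the string labels are hypothetical. In practice the MIME type would come from python-magic, e.g. `magic.from_file(path, mime=True)`; here it is passed in directly so the sketch runs without libmagic installed.

```python
from pathlib import Path

# Hypothetical labels for illustration only; not unstructured's own file-type enum.
MIME_TYPES = {
    "application/pdf": "PDF",
    "text/html": "HTML",
    "message/rfc822": "EML",
}

# Hypothetical extension table used only when the MIME type is uninformative.
EXTENSION_TYPES = {
    ".docx": "DOCX",
    ".xlsx": "XLSX",
    ".pdf": "PDF",
    ".html": "HTML",
    ".eml": "EML",
}


def detect_filetype(mime_type: str, filename: str) -> str:
    """Map a MIME type to a type label, falling back to the file extension
    when the MIME type is the generic application/octet-stream."""
    if mime_type in MIME_TYPES:
        return MIME_TYPES[mime_type]
    if mime_type == "application/octet-stream":
        # Office formats are zip containers, so libmagic may not name them;
        # the (case-insensitive) extension is the remaining hint.
        extension = Path(filename).suffix.lower()
        return EXTENSION_TYPES.get(extension, "UNKNOWN")
    return "UNKNOWN"
```

For example, `detect_filetype("application/octet-stream", "report.docx")` returns `"DOCX"`, while an unrecognized MIME type such as `image/png` yields `"UNKNOWN"`, which loosely mirrors the "test for unknown type" item above.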
10 lines, 101 B, HTML
<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>
<p>My first paragraph.</p>

</body>
</html>
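This fixture holds exactly two text blocks, a heading and a paragraph, which is what makes it convenient for HTML partitioning tests. As a rough illustration of what a parser can recover from it, the sketch below uses only Python's standard-library `html.parser`. It is not unstructured's HTML partitioner; the `TextBlockParser` class and `extract_blocks` helper are hypothetical names introduced here, and the choice to treat only `h1` through `h6` and `p` as text blocks is an assumption.

```python
from __future__ import annotations

from html.parser import HTMLParser

# The fixture above, inlined so the sketch is self-contained.
FIXTURE = """<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>
<p>My first paragraph.</p>

</body>
</html>
"""


class TextBlockParser(HTMLParser):
    """Hypothetical parser that collects (tag, text) pairs for heading
    and paragraph elements, ignoring all other markup."""

    BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p"}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[tuple[str, str]] = []
        self._current: str | None = None
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Start buffering text when a block-level tag we care about opens.
        if tag in self.BLOCK_TAGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that sits inside a tracked block.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Emit the block when its matching close tag arrives.
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if text:
                self.blocks.append((tag, text))
            self._current = None


def extract_blocks(html: str) -> list[tuple[str, str]]:
    parser = TextBlockParser()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Running `extract_blocks(FIXTURE)` on this file gives `[("h1", "My First Heading"), ("p", "My first paragraph.")]`. A real partitioner does considerably more (nested markup, lists, tables), so this is only a way to see what the fixture contains.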