mirror of https://github.com/Unstructured-IO/unstructured.git
synced 2025-10-22 05:25:29 +00:00

19f00b9fa4

Fixes #1958. `<style>` is invalid where it appears in the HTML of the WSJ page mentioned in that issue, but "invalid" has little meaning in the HTML world if Chrome accepts it. In any case, we have no use for the contents of a `<style>` tag wherever it appears, so it is safe enough for us to strip all of those tags. Note that we do not want to also strip the *tail text*, which can contain text we're interested in.
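
To illustrate the tail-text point, here is a minimal sketch rather than the code from this change. It uses the standard-library `xml.etree.ElementTree` (which shares the `.text`/`.tail` model with lxml) on a small well-formed snippet. The helper name `strip_elements_keep_tail` and the sample markup are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def strip_elements_keep_tail(root, tag):
    """Remove every `tag` element under `root`, keeping each one's tail text.

    Hypothetical helper for illustration; not part of any library.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != tag:
                continue
            tail = child.tail or ""
            idx = list(parent).index(child)
            if idx == 0:
                # No preceding sibling: the tail joins the parent's own text.
                parent.text = (parent.text or "") + tail
            else:
                # Otherwise the tail joins the preceding sibling's tail.
                prev = parent[idx - 1]
                prev.tail = (prev.tail or "") + tail
            parent.remove(child)


# A <style> element sitting mid-paragraph, followed by text we want to keep.
html = "<div><p>Before<style>p { color: red; }</style> after the style tag.</p></div>"
root = ET.fromstring(html)
strip_elements_keep_tail(root, "style")
result = ET.tostring(root, encoding="unicode")
print(result)
```

The CSS inside `<style>` is dropped, while " after the style tag." (the element's tail) survives in the paragraph. With lxml, `etree.strip_elements(tree, "style", with_tail=False)` should give the same effect in one call; its `with_tail` argument defaults to `True`, which would also discard the tail text.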