mirror of
				https://github.com/Unstructured-IO/unstructured.git
				synced 2025-10-31 01:54:25 +00:00 
			
		
		
		
	 ec59abfabc
			
		
	
	
		ec59abfabc
		
			
		
	
	
	
	
		
			
			### Summary
Currently, the email partitioner removes only `=\n` characters during
the clearing process. However, email content sometimes contains `=\r\n`
characters, especially when read from file-like objects such as
`SpooledTemporaryFile` (the file type used in our API). This PR updates
the email partitioner to remove both `=\n` and `=\r\n` characters during
the clearing process.
### Testing
```
filename = "example-docs/eml/family-day.eml"
elements = partition_email(
    filename=filename,
)
print(f"From filename: {elements[3].text}")
with open(filename, "rb") as test_file:
    spooled_temp_file = tempfile.SpooledTemporaryFile()
    spooled_temp_file.write(test_file.read())
    spooled_temp_file.seek(0)
    elements = partition_email(file=spooled_temp_file)
    print(f"From spooled_temp_file: {elements[3].text}")
```
**Results:**
- on `main`
```
From filename: Make sure to RSVP!
From spooled_temp_file: Make sure to = RSVP!
```
- on `PR`
```
From filename: Make sure to RSVP!
From spooled_temp_file: Make sure to RSVP!
```