mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-06-27 02:30:08 +00:00

### Summary

Currently, the email partitioner removes only `=\n` characters during the clearing process. However, email content sometimes contains `=\r\n` characters, especially when read from file-like objects such as `SpooledTemporaryFile` (the file type used in our API). This PR updates the email partitioner to remove both `=\n` and `=\r\n` characters during the clearing process.

### Testing

```
import tempfile

from unstructured.partition.email import partition_email

filename = "example-docs/eml/family-day.eml"

# Partition directly from the file path.
elements = partition_email(
    filename=filename,
)
print(f"From filename: {elements[3].text}")

# Partition from a SpooledTemporaryFile holding the same bytes.
with open(filename, "rb") as test_file:
    spooled_temp_file = tempfile.SpooledTemporaryFile()
    spooled_temp_file.write(test_file.read())
    spooled_temp_file.seek(0)
    elements = partition_email(file=spooled_temp_file)
    print(f"From spooled_temp_file: {elements[3].text}")
```

**Results:**

- on `main`

```
From filename: Make sure to RSVP!
From spooled_temp_file: Make sure to = RSVP!
```

- on `PR`

```
From filename: Make sure to RSVP!
From spooled_temp_file: Make sure to RSVP!
```
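
For illustration, the clearing step described in the Summary can be sketched as a single regular-expression substitution. This is a minimal sketch, not the partitioner's actual code: the helper name `strip_soft_line_breaks` and the pattern constant are hypothetical, and the sketch assumes the message body is already available as a `str`.

```python
import re

# Hypothetical pattern (not taken from the library): matches a
# quoted-printable soft line break, i.e. "=" followed by either
# "\n" or "\r\n".
_SOFT_LINE_BREAK = re.compile(r"=\r?\n")


def strip_soft_line_breaks(text: str) -> str:
    """Remove "=\\n" and "=\\r\\n" sequences from the given text."""
    return _SOFT_LINE_BREAK.sub("", text)


# Both line-ending styles collapse to the same cleaned text.
unix_style = "Make sure to =\nRSVP!"
crlf_style = "Make sure to =\r\nRSVP!"
print(strip_soft_line_breaks(unix_style))
print(strip_soft_line_breaks(crlf_style))
```

Making the carriage return optional (`\r?`) handles both line-ending styles in one pass, so the result does not depend on whether the content came from a file path or a file-like object.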