### Summary
Currently, the email partitioner removes only `=\n` characters during
the clearing process. However, email content sometimes contains `=\r\n`
characters, especially when read from file-like objects such as
`SpooledTemporaryFile` (the file type used in our API). This PR updates
the email partitioner to remove both `=\n` and `=\r\n` characters during
the clearing process.
### Testing
```
filename = "example-docs/eml/family-day.eml"
elements = partition_email(
filename=filename,
)
print(f"From filename: {elements[3].text}")
with open(filename, "rb") as test_file:
spooled_temp_file = tempfile.SpooledTemporaryFile()
spooled_temp_file.write(test_file.read())
spooled_temp_file.seek(0)
elements = partition_email(file=spooled_temp_file)
print(f"From spooled_temp_file: {elements[3].text}")
```
**Results:**
- on `main`
```
From filename: Make sure to RSVP!
From spooled_temp_file: Make sure to = RSVP!
```
- on `PR`
```
From filename: Make sure to RSVP!
From spooled_temp_file: Make sure to RSVP!
```
### Summary
Closes#2489, which reported an inability to process `.p7s` files. PR
implements two changes:
- If the user selected content type for the email is not available and
there is another valid content type available, fall back to the other
valid content type.
- For signed message, extract the signature and add it to the metadata
### Testing
```python
from unstructured.partition.auto import partition
filename = "example-docs/eml/signed-doc.p7s"
elements = partition(filename=filename) # should get a message about fall back logic
print(elements[0]) # "This is a test"
elements[0].metadata.to_dict() # Will see the signature
```
closes#816
## Description
Added functionality for `partition_email` to automatically decode base64
text before passing it to `partition_text` or `partition_html`.
Also adds base64 encoded email text test cases.
### Summary
Closes#1018. Enables `partition_email` and `partition_msg` to detect if
an email has PGP encrypted content. Based on the specification in [RFC
2015](https://www.ietf.org/rfc/rfc2015.txt). The test emails are based
on the example email in the spec. If PGP detected content is detected, a
warning is emitted and an empty set of lists is returned.
### Testing
```python
from unstructured.partition_email import partition_email
filename = "example-docs/eml/fake-encrypted.eml"
partition_email(filename=filename)
```
```python
from unstructured.partition_msg import partition_msg
filename = "example-docs/fake-encrypted.msg"
partition_msgl(filename=filename)
```
Handle Content-Disposition: inline and attachment without filename
* Add new email test example and test with Content-Disposition: inline.
* Move attachment_info above for loop so it is always defined
* Check if item is inline as well as attachment as these both lack an = character to split on
* Create filename if filename is not specified and write file.
* Update list_attachments with new filename
Fix attachments with = in filename
* Limit split to first match of = to prevent creating a list of more than two parts
* Add example email with attachment name and test for issue
* Adds functionality to extract charset info from eml files
* Adds missed file-like object handling in detect_file_encoding
* Adds functionality to replace the MIME encodings for eml files with one of the
common encodings if a unicode error occurs
* Organize the eml example files in the example-docs/eml directory