unstructured/example-docs/eml/simple-rfc-822.eml

rfctr(email): eml partitioner rewrite (#3694)

**Summary**
Initial attempts to incrementally refactor `partition_email()` into shape to allow pluggable partitioning quickly became too complex for ready code review. Prepare a separate rewritten module and tests and swap them in whole.

**Additional Context**
- Use the modern stdlib `email` module to reliably accomplish several manual decoding steps in the legacy code.
- Remove obsolete email-specific element-types, which were replaced 18 months or so ago with email-specific metadata fields for things like Cc: addresses, subject, etc.
- Stop accepting an email as `text: str`, because MIME email is inherently a binary format which can, and often does, contain multiple and contradictory character encodings.
- Remove the `encoding` parameters, as they are now unused. An email file is not a text file and as such does not have a single overall encoding. Character encoding is specified individually for each MIME part within the message and often varies from one part to another in the same message.
- Remove the need for a caller to specify `attachment_partitioner`. There is only one reasonable choice for this, which is `auto.partition()`, consistent with the same interface and operation in `partition_msg()`.
- Fix #3671 along the way by silently skipping attachments with a file type for which there is no partitioner.
- Substantially extend the test suite to cover multiple transport-encoding/charset combinations.

---------
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: scanny <scanny@users.noreply.github.com>

2024-10-15 19:02:33 -07:00
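The commit note's point that an email file has no single overall encoding can be illustrated with Python's standard-library `email` package. This is a minimal sketch, not the `partition_email()` implementation: the two-part message, its boundary, and the helper `decode_text_parts()` are hypothetical. Parsing the raw bytes with `email.message_from_bytes()` under `policy.default` lets each leaf part's `get_content()` undo that part's own transfer-encoding and then apply that part's own charset.

```python
import base64
import email
from email import policy

# Hypothetical multipart message for illustration only. Each MIME part declares
# its own charset and Content-Transfer-Encoding, so the message as a whole has
# no single "file encoding" and must be handled as bytes.
HTML_B64 = base64.b64encode("<p>Café au lait</p>".encode("utf-8"))

RAW = (
    b"From: sender@example.com\r\n"
    b"To: recipient@example.com\r\n"
    b"Subject: Mixed charsets\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b'Content-Type: text/plain; charset="iso-8859-1"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Caf=E9 au lait\r\n"  # 0xE9 is "é" in Latin-1
    b"--BOUNDARY\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + HTML_B64
    + b"\r\n--BOUNDARY--\r\n"
)


def decode_text_parts(raw: bytes) -> list[tuple[str, str, str]]:
    """Return (content_type, charset, text) for each leaf (non-multipart) part."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    decoded = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart/* containers hold other parts, not content
        # get_content() reverses the transfer-encoding (QP, base64) and then
        # decodes the resulting bytes with this part's own charset parameter.
        decoded.append(
            (part.get_content_type(), part.get_content_charset(), part.get_content())
        )
    return decoded


for content_type, charset, text in decode_text_parts(RAW):
    print(f"{content_type} [{charset}]: {text.strip()}")
```

Running it prints `text/plain [iso-8859-1]: Café au lait` and then `text/html [utf-8]: <p>Café au lait</p>`: the same characters come back from a Latin-1 quoted-printable part and a UTF-8 base64 part, which no single whole-file decode could do.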
From: sender@example.com
To: recipient@example.com
Date: Tue, 01 Oct 2024 12:34:56 -0500
Subject: Example RFC 822 Email

This is an RFC 822 email message.
An RFC 822 message is characterized by its simple, text-based format, which includes a header and a body. The header contains structured fields such as "From", "To", "Date", and "Subject", each followed by a colon and the corresponding information. The body follows the header, separated by a blank line, and contains the main content of the email.
The structure ensures compatibility and readability across different email systems and clients, adhering to Internet standards that are now maintained by the Internet Engineering Task Force (IETF).
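The header/body split described in the message above can be seen by parsing a trimmed copy of it (the four header fields plus the first body line) with Python's standard `email` package. This is a sketch that assumes CRLF line endings and a blank line separating the header block from the body; it is not a description of how any particular partitioner processes this file.

```python
import email
from email import policy

# Trimmed copy of the sample message (headers + first body line).
# Assumption: CRLF line endings, with a blank line ending the header block.
RAW = (
    b"From: sender@example.com\r\n"
    b"To: recipient@example.com\r\n"
    b"Date: Tue, 01 Oct 2024 12:34:56 -0500\r\n"
    b"Subject: Example RFC 822 Email\r\n"
    b"\r\n"
    b"This is an RFC 822 email message.\r\n"
)

msg = email.message_from_bytes(RAW, policy=policy.default)

# Header fields are the "Name: value" lines before the first blank line.
print(msg["From"])     # sender@example.com
print(msg["Subject"])  # Example RFC 822 Email
# Under policy.default the Date header exposes a timezone-aware datetime.
print(msg["Date"].datetime.isoformat())  # 2024-10-01T12:34:56-05:00

# There are no MIME headers, so the body defaults to text/plain (us-ascii).
print(msg.get_content_type())     # text/plain
print(msg.get_content().strip())  # This is an RFC 822 email message.
```

Using `policy.default` rather than the legacy `compat32` policy is what gives the structured header objects, such as the parsed `datetime` on the `Date` header.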