unstructured/example-docs/eml/mime-different-plain-html.eml
Steve Canny 1eceac26c8
rfctr(email): eml partitioner rewrite (#3694)
**Summary**
Initial attempts to incrementally refactor `partition_email()` into
shape to allow pluggable partitioning quickly became too complex for
ready code-review. Prepare separate rewritten module and tests and swap
them out whole.

**Additional Context**
- Use the modern stdlib `email` module to reliably perform several
decoding steps that the legacy code did manually.
- Remove obsolete email-specific element-types which were replaced 18
months or so ago with email-specific metadata fields for things like Cc:
addresses, subject, etc.
- Remove support for passing an email as `text: str`, because MIME email
is inherently a binary format which can, and often does, contain multiple
contradictory character encodings.
- Remove the `encoding` parameter, as it is now unused. An email file is
not a text file and as such does not have a single overall encoding.
Character encoding is specified individually for each MIME-part within
the message and often varies from one part to another in the same
message.
- Remove the need for a caller to specify `attachment_partitioner`.
There is only one reasonable choice for this which is
`auto.partition()`, consistent with the same interface and operation in
`partition_msg()`.
- Fixes #3671 along the way by silently skipping attachments with a
file-type for which there is no partitioner.
- Substantially extend the test-suite to cover multiple
transport-encoding/charset combinations.
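
The per-part encoding point above can be illustrated with a minimal sketch that uses only the stdlib `email` package. It is not the code from this commit: `build_sample()` and `decode_text_parts()` are hypothetical helper names chosen for illustration, and the sample message is built in memory rather than read from a fixture file.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample() -> bytes:
    """Build a multipart/alternative message whose parts use different charsets.

    Hypothetical helper for illustration only.
    """
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Mixed charsets"
    # text/plain part: ISO-8859-1 bytes, quoted-printable transfer encoding.
    msg.set_content("café au lait", charset="iso-8859-1", cte="quoted-printable")
    # text/html part: UTF-8 bytes, base64 transfer encoding.
    msg.add_alternative(
        "<p>naïve résumé</p>", subtype="html", charset="utf-8", cte="base64"
    )
    return msg.as_bytes()


def decode_text_parts(raw: bytes) -> dict:
    """Map each text part's content type to its decoded text.

    The message is parsed from bytes; `get_content()` undoes the transfer
    encoding and decodes with the charset declared on that specific part.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    texts = {}
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            texts[part.get_content_type()] = part.get_content()
    return texts


if __name__ == "__main__":
    for content_type, text in decode_text_parts(build_sample()).items():
        print(content_type, repr(text))
```

No single `encoding` argument could decode both parts correctly here, which is the reason the message is handled as bytes and each part is decoded on its own terms.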

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: scanny <scanny@users.noreply.github.com>
2024-10-16 02:02:33 +00:00


From: sender@example.com
To: recipient@example.com
Date: Tue, 01 Oct 2024 12:34:56 -0500
Subject: Example MIME Email
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="boundary123"

--boundary123
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit

This is the text/plain part.
Did you know that the first email was sent by Ray Tomlinson in 1971? He used the "@" symbol to separate the user's name from the computer name, a practice that is still in use today.
Another interesting fact is that the first known instance of email spam occurred in 1978. A marketing message was sent to 393 recipients on ARPANET, marking the beginning of what we now know as email spam.
--boundary123
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: 7bit

<!DOCTYPE html>
<html>
<head>
<title>Example MIME Email</title>
</head>
<body>
<p>This is the <code>text/html</code> part.</p>
<p>Did you know that the first <b>networked email</b> was sent by Ray Tomlinson in 1971? He used the "@" symbol to separate the user's name from the computer name, a practice that is still in use today.</p>
<p>Another interesting fact is that the first known instance of <i>email spam</i> occurred in 1978. A marketing message was sent to 393 recipients on ARPANET, marking the beginning of what we now know as email spam.</p>
</body>
</html>
--boundary123--