**Summary**

`partition_msg()` previously used the `msg_parser` library to parse Outlook MSG email files (`.msg`). The `msg_parser` library is unmaintained and has several major shortcomings, such as being unable to parse MSG files with 8-bit encoded strings and not reliably extracting attachments. This PR uses the new, permissively licensed `python-oxmsg` library instead.

**Additional Context**

For reviewability, this PR temporarily places the new `partition_msg()` implementation in `new_msg.py` and references that implementation from `msg.py`. `new_msg.py` will be renamed to `msg.py` in a closely following PR. This avoids a very messy interleaving of hunks in the diff between the old and rewritten `partition_msg()` implementations.

Fixes #2481
Fixes #3006
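
**Usage sketch**

For reviewers who want to spot-check the change against a sample file, the following is a minimal sketch, not part of this PR. It assumes the public entry point is still importable as `unstructured.partition.msg.partition_msg` and accepts a `filename` argument. The helper names `partition_msg_file` and `count_element_types` are hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Iterable


def partition_msg_file(path: str) -> list:
    """Partition an Outlook .msg file into a list of document elements."""
    # Imported lazily so this module still loads where `unstructured`
    # is not installed (assumption: the import path is unchanged).
    from unstructured.partition.msg import partition_msg

    return partition_msg(filename=path)


def count_element_types(elements: Iterable[object]) -> Counter:
    """Tally elements by class name to compare output across parser versions."""
    return Counter(type(element).__name__ for element in elements)
```

Running `count_element_types(partition_msg_file("example.msg"))` before and after the change gives a quick view of whether the element mix differs for a given message. The file name `example.msg` is a placeholder.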