mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-11-13 17:07:29 +00:00
## Summary

Fixes a path traversal vulnerability in email and MSG attachment filename handling (GHSA-gm8q-m8mv-jj5m).

## Changes

### Security Fix

- Sanitizes attachment filenames in `_AttachmentPartitioner` for both `email.py` and `msg.py`
- Uses `os.path.basename()` to strip path components from filenames
- Normalizes backslashes to forward slashes to handle Windows paths on Unix systems
- Removes null bytes and other control characters
- Handles edge cases (empty strings, `.`, `..`)
- Defaults to `unknown` for invalid or dangerous filenames

### Test Coverage

Added 17 tests covering:

- Path traversal attempts (`../../../etc/passwd`)
- Absolute Unix paths (`/etc/passwd`)
- Absolute Windows paths (`C:\Windows\System32\config\sam`)
- Null byte injection (`file\x00.txt`)
- Dot and dotdot filenames (`.` and `..`)
- Missing/empty filenames
- Complex mixed path separators
- Valid filenames (ensuring they pass through unchanged)

## Test Results

- ✅ All 17 new security tests pass
- ✅ All 129 existing tests pass
- ✅ No regressions

## Security Impact

Prevents attackers from using malicious attachment filenames to write files outside the intended directory, which could lead to arbitrary file write vulnerabilities.

Changes include comprehensive test coverage for various attack vectors and a version bump to 0.18.18.

---------

Co-authored-by: Claude <noreply@anthropic.com>
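For readers who want to see the shape of the sanitization described above, here is a minimal sketch. It is not the code from `email.py` or `msg.py`: the function name `sanitize_attachment_filename`, the `default` parameter, the exact control-character range, and the order of the steps are all assumptions made for illustration. Only the individual steps (control-character removal, backslash normalization, `os.path.basename()`, and the `unknown` fallback) come from the description.

```python
import os
import re

# Hypothetical helper for illustration only; the name, the control-character
# range, and the step order are assumptions, not the library's actual code.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_attachment_filename(filename, default="unknown"):
    """Reduce an untrusted attachment filename to a bare, safe file name."""
    if not filename:
        # Missing or empty filename.
        return default
    # Drop null bytes and other ASCII control characters.
    cleaned = _CONTROL_CHARS.sub("", filename)
    # Treat Windows separators as path separators, even on Unix.
    cleaned = cleaned.replace("\\", "/")
    # Keep only the final path component.
    cleaned = os.path.basename(cleaned).strip()
    # Reject names that are empty or refer to a directory.
    if cleaned in ("", ".", ".."):
        return default
    return cleaned


if __name__ == "__main__":
    print(sanitize_attachment_filename("../../../etc/passwd"))  # passwd
    print(sanitize_attachment_filename("report.pdf"))           # report.pdf
    print(sanitize_attachment_filename(".."))                   # unknown
```

A caller would then join the returned name onto its own output directory, so that the attachment name alone can no longer move the write location outside that directory.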