implement additional mime types (#8446)

* implement additional mime types

* correct typo

* reduce complexity

* add optional

* add missing release note

* yamllint

* yamllint

* Update file-router-additional-mime-types-47fe57e6816b83da.yaml

minor reno change for consistency

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
jlonge4 2024-10-16 06:38:49 -04:00 committed by GitHub
parent 8613bb7653
commit 78f378b34d
GPG Key ID: B5690EEEBB952194
2 changed files with 31 additions and 1 deletions

@@ -54,16 +54,24 @@ class FileTypeRouter:
     :param mime_types: A list of MIME types or regex patterns to classify the input files or byte streams.
     """
-    def __init__(self, mime_types: List[str]):
+    def __init__(self, mime_types: List[str], additional_mimetypes: Optional[Dict[str, str]] = None):
         """
         Initialize the FileTypeRouter component.
         :param mime_types: A list of MIME types or regex patterns to classify the input files or byte streams.
             (for example: `["text/plain", "audio/x-wav", "image/jpeg"]`).
+        :param additional_mimetypes: A dictionary containing the MIME type to add to the mimetypes package to prevent
+            unsupported or non native packages from being unclassified.
+            (for example: `{"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"}`).
         """
         if not mime_types:
             raise ValueError("The list of mime types cannot be empty.")
+        if additional_mimetypes:
+            for mime, ext in additional_mimetypes.items():
+                mimetypes.add_type(mime, ext)
         self.mime_type_patterns = []
         for mime_type in mime_types:
             if not self._is_valid_mime_type_format(mime_type):

@@ -0,0 +1,22 @@
+---
+features:
+  - |
+    Added a new parameter `additional_mimetypes` to the FileTypeRouter
+    component.
+    This allows users to specify additional MIME type mappings, ensuring
+    correct
+    file classification across different runtime environments and Python
+    versions.
+enhancements:
+  - |
+    Improved file type detection in FileTypeRouter, particularly for Microsoft
+    Office file formats like .docx and .pptx. This enhancement ensures more
+    consistent behavior across different environments, including AWS Lambda
+    functions and systems without pre-installed office suites.
+fixes:
+  - |
+    Addressed an issue where certain file types (e.g., .docx, .pptx) were
+    incorrectly classified as 'unclassified' in environments with limited
+    MIME type definitions, such as AWS Lambda functions.
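
The registration step this change adds to `__init__` can be tried on its own with only the standard library. The following is a minimal sketch, not the Haystack component itself: `register_additional_mimetypes` and `DOCX_MIME` are hypothetical names introduced for illustration, and the loop mirrors the one in the diff above. It shows that once a mapping is passed to `mimetypes.add_type`, `mimetypes.guess_type` resolves the extension even if the host's MIME database does not define it.

```python
import mimetypes
from typing import Dict, Optional

# MIME type taken from the docstring example in the diff.
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def register_additional_mimetypes(additional_mimetypes: Optional[Dict[str, str]] = None) -> None:
    """Hypothetical helper: register extra MIME type -> extension mappings.

    Mirrors the loop added to FileTypeRouter.__init__ in this commit.
    """
    if additional_mimetypes:
        for mime, ext in additional_mimetypes.items():
            # add_type updates the process-wide mimetypes tables.
            mimetypes.add_type(mime, ext)


register_additional_mimetypes({DOCX_MIME: ".docx"})

# guess_type returns a (type, encoding) tuple; only the type matters here.
guessed, _ = mimetypes.guess_type("report.docx")
print(guessed)
```

Note that `mimetypes.add_type` modifies module-level state, so a mapping registered this way is visible to every caller in the same process, not just to the router instance that received it.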