mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-24 09:26:08 +00:00

The `@apply_metadata` decorator already contains logic to detect the language of element text, at either the document level or the element level. Update PDFs, and later images, to use this decorator so that each element reports an accurate language in its metadata.

Test

```
from unstructured.partition.auto import partition


def test_partition_pdf():
    pdf_path = "example-docs/language-docs/fr_olap.pdf"
    # Optionally pass `detect_language_per_element=True`.
    elements = partition(pdf_path)
    print(f"Number of elements partitioned: {len(elements)}")

    # Check that elements are returned.
    assert len(elements) > 0, "No elements were partitioned from the PDF."

    # Print the detected language(s) for each element.
    for element in elements:
        print(element)
        print(element.metadata.languages)
        print("-------------------------------")


test_partition_pdf()
```

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: shreyanid <shreyanid@users.noreply.github.com>