unstructured/partition at 32c79caee30ac2aacc35b964697e68b2d47321db

yujunjun/unstructured

mirror of https://github.com/Unstructured-IO/unstructured.git synced 2025-11-01 10:33:09 +00:00

cragwolfe 32c79caee3

chore: use only regex for contains_english_word. (#382)

Updates the characters used to split text when creating candidate English words. Now uses a regex to strip non-alphabetic characters from each word.

Note: This was originally an attempt to speed up contains_english_word(), but there is no measurable change in performance.
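
The commit message above describes stripping non-alphabetic characters from each candidate word with a regex. A minimal sketch of that idea follows. It is not the repository's implementation: the function name `contains_english_word_sketch`, the tiny `ENGLISH_WORDS` set, the whitespace splitting, and the minimum-length check are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in vocabulary; the real word list is not shown in this listing.
ENGLISH_WORDS = {"the", "parrot", "daily", "updates"}

# Matches every character that is not an ASCII letter.
NON_ALPHA = re.compile(r"[^a-zA-Z]")


def contains_english_word_sketch(text: str) -> bool:
    """Return True if any whitespace-separated token, once stripped of
    non-alphabetic characters, appears in ENGLISH_WORDS."""
    for token in text.lower().split():
        candidate = NON_ALPHA.sub("", token)
        # Skip empty and single-letter candidates (an assumption of this sketch).
        if len(candidate) > 1 and candidate in ENGLISH_WORDS:
            return True
    return False
```

With this sketch, a token such as `(updates)` is reduced to `updates` before the lookup, while a token made only of digits or punctuation is reduced to an empty string and ignored.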

2023-03-30 16:57:43 +00:00

test_auto.py

feat: add partition_msg for MSFT Outlook files (#412)

2023-03-28 20:15:22 +00:00

test_common.py

fix: track narrative text and figure captions in HTML documents (#309)

2023-02-28 15:36:08 +00:00

test_doc.py

Resolve various style issues to improve overall code quality (#282)

2023-02-27 11:30:54 -05:00

test_docx.py

Resolve various style issues to improve overall code quality (#282)

2023-02-27 11:30:54 -05:00

test_email.py

fix: text kwargs no longer fail with empty string (#413)

2023-03-28 21:03:51 +00:00

test_epub.py

chore: add tests for docker (#373)

2023-03-21 13:46:09 -07:00

test_html_partition.py

fix: text kwargs no longer fail with empty string (#413)

2023-03-28 21:03:51 +00:00

test_image.py

Resolve various style issues to improve overall code quality (#282)

2023-02-27 11:30:54 -05:00

test_json.py

fix: text kwargs no longer fail with empty string (#413)

2023-03-28 21:03:51 +00:00

test_md.py

feat: Add GitHub data connector; add Markdown partitioner (#284)

2023-02-27 14:36:44 -08:00

test_msg.py

feat: add partition_msg for MSFT Outlook files (#412)

2023-03-28 20:15:22 +00:00

test_pdf.py

feat: add "fast" strategy for PDF parsing; fallback to "fast" if detectron2 is not available (#357)

2023-03-11 03:16:05 +00:00

test_ppt.py

Resolve various style issues to improve overall code quality (#282)

2023-02-27 11:30:54 -05:00

test_pptx.py

Resolve various style issues to improve overall code quality (#282)

2023-02-27 11:30:54 -05:00

test_text_type.py

chore: use only regex for contains_english_word. (#382)

2023-03-30 16:57:43 +00:00

test_text.py

fix: text kwargs no longer fail with empty string (#413)

2023-03-28 21:03:51 +00:00