mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-06-27 02:30:08 +00:00

Courtesy @phowat, created a branch in the repo to make some changes and merge quickly.

Closes #1486.

* **Fixes issue where tables from markdown documents were being treated as text**

Problem: Tables from markdown documents were being treated as text, and not being extracted as tables.

Solution: Enable the `tables` extension when instantiating the `python-markdown` object.

Importance: This will allow users to extract structured data from tables in markdown documents.

#### Testing:

On `main` run the following (run `git checkout fix/md-tables -- example-docs/simple-table.md` first to grab the example table from this branch)

```python
from unstructured.partition.md import partition_md

elements = partition_md("example-docs/simple-table.md")
print(elements[0].category)
```

Output should be `UncategorizedText`. Then run the same code on this branch and observe the output is `Table`.

---------

Co-authored-by: cragwolfe <crag@unstructured.io>
82 B
Item | Price | # In stock |
---|---|---|
Juicy Apples | 1.99 | 739 |
Bananas | 1.89 | 6 |
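
The effect of the `tables` extension described in the commit message can be seen by rendering the table above with `python-markdown` directly. This is only a minimal sketch, not the actual `unstructured` code path: it assumes the `markdown` package is installed, and the helper name `render` and the constant `TABLE_MD` are hypothetical names introduced here for illustration.

```python
import markdown  # the python-markdown package (assumed installed: pip install markdown)

# The example table from this file, copied as shown above.
TABLE_MD = """\
Item | Price | # In stock |
---|---|---|
Juicy Apples | 1.99 | 739 |
Bananas | 1.89 | 6 |
"""


def render(text: str, with_tables: bool) -> str:
    """Convert markdown to HTML, optionally enabling the built-in `tables` extension."""
    extensions = ["tables"] if with_tables else []
    return markdown.markdown(text, extensions=extensions)


# Without the extension the pipe-delimited rows are rendered as ordinary paragraph text.
plain_html = render(TABLE_MD, with_tables=False)

# With the extension the same input is rendered as an HTML table.
table_html = render(TABLE_MD, with_tables=True)

print("<table>" in plain_html)
print("<table>" in table_html)
```

A downstream HTML partitioner can only classify the content as a table if a `<table>` element is present in the rendered output, which is presumably why enabling the extension changes the reported category from `UncategorizedText` to `Table`.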