Yao You 43b682ad3f
feat: allow extraction of camel cased element type names (#3938)
This PR allows element types with CamelCase names to be extractable
using `extract_image_block_types` variable.

Before: specify `extract_image_block_types=["NarrativeText"]` (or any
casing for `NarrativeText`) would raise a warning that it doesn't match
any available types and not image would be extracted for this element
type

Now: specify `extract_image_block_types=["NarrativeText"]` would extract
images for this element type

## testing

```python
from unstructured.partition.auto import partition
f = "example-docs/pdf/embedded-images-tables.pdf"
elements = partition(f, strategy="hi_res", extract_image_block_types=["narrativetext"])
```

Without this PR no figures would be extracted. With this PR a local
folder would be created to contain images of the narrative text elements
in path like `./figures/figure-1-1.jpg`

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
2025-03-04 01:33:05 +00:00
..