unstructured/example-docs/pdf-bad-color-space.pdf
Austin Walker 2931cb38e8
fix: handle KeyError: 'N' for certain pdfs (#2072)
Closes #2059.

We've found some pdfs that throw an error in pdfminer. These files use a
ICCBased color profile but do not include an expected value `N`. As a
workaround, we can wrap pdfminer and drop any colorspace info, since we
don't need to render the document.

To verify, try to partition the document in the linked issue.

```
elements = partition(filename="google-2023-environmental-report_condensed.pdf", strategy="fast")
```

---------

Co-authored-by: cragwolfe <crag@unstructured.io>
2023-11-15 01:59:05 +00:00

185 KiB