This PR aims to improve the organization and readability of our example
documents used in unit tests, specifically focusing on PDF and image
files.
### Summary
- Created two new subdirectories in the `example-docs` folder:
- `pdf/`: for all PDF example files
- `img/`: for all image example files
- Moved relevant PDF files from `example-docs/` to `example-docs/pdf/`
- Moved relevant image files from `example-docs/` to `example-docs/img/`
- Updated file paths in affected unit & ingest tests to reflect the new
directory structure
### Testing
All unit & ingest tests should be updated and verified to work with the
new file structure.
## Notes
Other file types (e.g., office documents, HTML files) remain in the root
of `example-docs/` for now.
## Next Steps
Consider similar reorganization for other file types if this structure
proves to be beneficial.
---------
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
Update: The cli shell script works when sending documents to the free
api, but the paid api is down, so waiting to test against it.
- The first commit adds docstrings and fixes type hints.
- The second commit reorganizes `test_unstructured_ingest` so it matches
the structure of `unstructured/ingest`.
- The third commit contains the primary changes for this PR.
- The `.chunk()` method responsible for sending elements to the correct
method is moved from `ChunkingConfig` to `Chunker` so that
`ChunkingConfig` acts as a config object instead of containing
implementation logic. `Chunker.chunk()` also now takes a json file
instead of a list of elements. This is done to avoid redundant
serialization if the file is to be sent to the api for chunking.
---------
Co-authored-by: Ahmet Melek <39141206+ahmetmeleq@users.noreply.github.com>