* Initial commit of discord connector
based off of initial work by @tnachen with modifications
https://github.com/tnachen/unstructured/tree/tnachen/discord_connector
* Add test file
change format of imports
* working version of the connector
More work to be done to tidy it up and add any additional options
* add to test fixtures update
* fix spacing
* tests working, switching to bot testing channel
* add additional channel
add reprocess to tests
* add try clause to allow for exit on error
Update changelog and bump version
* add updated expected output filtes
* add logic to check if —discord-period is an integer
Add more to option description
* fix lint error
* Update discord reqs
* PR feedback
* add newline
* another newline
---------
Co-authored-by: Justin Bossert <packerbacker21@hotmail.com>
* Update test fixtures that should have been updated in prior commit
* Disable biomed ingest tests for now, the fail more often than not
* Bonus: echo `tesseract --version` in the update script, since that is a key thing that influences fixture outputs.
- Updates CI to install tesseract version 5.3.0 (better than 4.x in various ways incl. perf.).
- Adds azure expected output fixtures for more useful reference points and as a repro for Some PDF's with scanned images return empty elements #346 .
- Adds a script to regenerate ingest test fixtures that is run in an ubuntu docker container (like CI), with the same version of tesseract. See the comments in scripts/ingest-test-fixtures-update.sh for details.
- Updates expected outputs with above script.
- Updates individual test-ingest scripts to update expected .json output if OVERWRITE_FIXTURES=true.