3 Commits

Author SHA1 Message Date
cragwolfe
87fd0d01dc
feat: Ingest refactors, doc updates (#243)
- Creates ABC's for ingest connectors
- Updates the s3_connector classes to inherit from ABC's
- Moves s3 test script to it's own file to establish pattern for additional connectors
- Rewrites the Ingest.md doc, including instructions how how to add a connector
- Updates the example s3 ingest script to use the new location for main.py

Note that there were no logic changes, this is essentially a refactoring PR.

Test instructions:

Run ./test_unstructured_ingest/test-ingest.sh and ./examples/ingest/s3-small-batch/ingest.sh.
2023-02-21 10:15:33 -08:00
cragwolfe
3c1b089071
feat: Ingest CLI flags and test fixture updates (#227)
* Many command line options added. The sample ingest project is now an easy to use CLI (no code editing
   necessary), capable of processing large numbers of files from S3 in a re-entrant manner. See Ingest.md.
* Fixes issue where text fixtures had been truncated
  * Adds a check to make sure this doesn't happen again
* Moves fixture outputs for the existing connector one subdir lower, 
  to make room for future connector outputs.
2023-02-16 16:45:50 +00:00
cragwolfe
ab542ca3c6
feat: Sample ingest project with S3 connector (#218) 2023-02-14 12:27:45 -08:00