Roman Isecke 901704b6c0
update sphinx docs with ingest content (#1969)
### Description
Create a new structure for ingest content in the docs, update with all
configs
2023-11-02 20:40:35 +00:00


Read Configuration
=========================
The read configuration is a set of parameters shared by all source connectors. It provides a standardized
way to retrieve documents from any source. It includes settings such as the download directory, which specifies
where retrieved documents are stored. Because these parameters are common to every connector, users can
set up data extraction once and then manage and organize the downloaded documents in the same way,
irrespective of the source connector in use. This promotes consistency, ease of maintenance, and a more straightforward
integration process when working with multiple source connectors within a system.

Configs
---------------------

* ``download_dir``: The location to download files to. When run via the CLI, a default
  location is used if one is not provided.
* ``re_download`` (default ``False``): By default, the process skips a download if the file already exists in the download directory.
  Setting this to ``True`` forces files to be re-downloaded even if they already exist.
* ``preserve_downloads`` (default ``False``): By default, the process deletes the downloaded content at the end if everything finished without error.
  Setting this to ``True`` preserves those files.
* ``download_only`` (default ``False``): If set to ``True``, the process will exit right after all the files are downloaded and skip any later
  steps such as partitioning and uploading to a destination.
* ``max_docs``: An optional integer that caps how many documents are pulled in by a single process.
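
The options above can be pictured as a small dataclass. The following is only a sketch: ``ReadConfigSketch``, ``should_download`` and ``cap_docs`` are hypothetical names made up for illustration, not the library's actual API. The field names and defaults come from the list above, and the empty-string default for ``download_dir`` is an assumption. The two helpers show how ``re_download`` and ``max_docs`` might be interpreted.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class ReadConfigSketch:
    """Hypothetical stand-in for the shared read options (not the library's class)."""

    download_dir: str = ""  # assumed default; the CLI supplies its own location
    re_download: bool = False
    preserve_downloads: bool = False
    download_only: bool = False
    max_docs: Optional[int] = None


def should_download(config: ReadConfigSketch, filename: str) -> bool:
    """Download when forced, or when the file is not yet in the download directory."""
    target = Path(config.download_dir) / filename
    return config.re_download or not target.exists()


def cap_docs(config: ReadConfigSketch, doc_ids: List[str]) -> List[str]:
    """Apply the optional max_docs cap to a list of document identifiers."""
    if config.max_docs is None:
        return list(doc_ids)
    return list(doc_ids[: config.max_docs])
```

With ``re_download`` left at ``False``, a file that is already present in ``download_dir`` is skipped. Setting it to ``True`` makes ``should_download`` return ``True`` for every file.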