The default configuration mode may be configured with a `config.json` or `config.yml` file in the data project root. If a `.env` file is present alongside this config file, it will be loaded, and the environment variables it defines will be available for token replacement in your configuration document using `${ENV_VAR}` syntax.
For example:
```
# .env
API_KEY=some_api_key

# config.json
{
  "llm": {
    "api_key": "${API_KEY}"
  }
}
```
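
Since `config.yml` is also supported, the same settings can be expressed in YAML. A minimal sketch of the equivalent document, mirroring the JSON example above:

```
# config.yml
llm:
  api_key: ${API_KEY}
```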
# Config Sections
## input
### Fields
- `type` **file|blob** - The input type to use. Default is `file`.
- `file_type` **text|csv** - The type of input data to load. Either `text` or `csv`. Default is `text`.
- `file_encoding` **str** - The encoding of the input file. Default is `utf-8`.
- `file_pattern` **str** - A regex to match input files. Default is `.*\.csv$` if in csv mode and `.*\.txt$` if in text mode.
- `source_column` **str** - (CSV Mode Only) The source column name.
- `timestamp_column` **str** - (CSV Mode Only) The timestamp column name.
- `timestamp_format` **str** - (CSV Mode Only) The timestamp format.
- `text_column` **str** - (CSV Mode Only) The text column name.
- `title_column` **str** - (CSV Mode Only) The title column name.
- `document_attribute_columns` **list[str]** - (CSV Mode Only) The additional document attributes to include.
- `connection_string` **str** - (blob only) The Azure Storage connection string.
- `container_name` **str** - (blob only) The Azure Storage container name.
- `base_dir` **str** - The base directory to read input from, relative to the root.
- `storage_account_blob_url` **str** - The storage account blob URL to use.
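
Putting these fields together, an `input` section for CSV mode might look like the sketch below. The column names and `base_dir` value are illustrative placeholders, not defaults:

```
# config.json (illustrative values)
{
  "input": {
    "type": "file",
    "file_type": "csv",
    "base_dir": "input",
    "source_column": "source",
    "text_column": "text",
    "title_column": "title"
  }
}
```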
## llm
This is the base LLM configuration section. Other steps may override this configuration with their own LLM configuration.
### Fields
- `api_key` **str** - The OpenAI API key to use.
- `type` **openai_chat|azure_openai_chat|openai_embedding|azure_openai_embedding** - The type of LLM to use.
- `model` **str** - The model name.
- `max_tokens` **int** - The maximum number of output tokens.
- `request_timeout` **float** - The per-request timeout.
- `api_base` **str** - The API base URL to use.
- `api_version` **str** - The API version.
- `organization` **str** - The client organization.
- `proxy` **str** - The proxy URL to use.
- `cognitive_services_endpoint` **str** - The URL endpoint for cognitive services.
- `deployment_name` **str** - The deployment name to use (Azure).
- `model_supports_json` **bool** - Whether the model supports JSON-mode output.
- `tokens_per_minute` **int** - Set a leaky-bucket throttle on tokens per minute.
- `requests_per_minute` **int** - Set a leaky-bucket throttle on requests per minute.
- `max_retries` **int** - The maximum number of retries to use.
- `max_retry_wait` **float** - The maximum backoff time.
- `sleep_on_rate_limit_recommendation` **bool** - Whether to adhere to sleep recommendations (Azure).
- `concurrent_requests` **int** - The number of open requests to allow at once.
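
As a sketch, a base `llm` section targeting an Azure OpenAI chat deployment might look like the following. The model name, deployment name, endpoint, and API version are illustrative assumptions; substitute your own resource details:

```
# config.json (illustrative values)
{
  "llm": {
    "api_key": "${API_KEY}",
    "type": "azure_openai_chat",
    "model": "gpt-4",
    "deployment_name": "gpt-4",
    "api_base": "https://<your-instance>.openai.azure.com",
    "api_version": "2024-02-15-preview",
    "model_supports_json": true,
    "max_retries": 10,
    "concurrent_requests": 25
  }
}
```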