mirror of
https://github.com/infiniflow/ragflow.git
synced 2025-06-26 22:19:57 +00:00

### Description This PR introduces two new environment variables, `DOC_BULK_SIZE` and `EMBEDDING_BATCH_SIZE`, to allow flexible tuning of batch sizes for document parsing and embedding vectorization in RAGFlow. By making these parameters configurable, users can optimize performance and resource usage according to their hardware capabilities and workload requirements. ### What problem does this PR solve? Previously, the batch sizes for document parsing and embedding were hardcoded, limiting the ability to adjust throughput and memory consumption. This PR enables users to set these values via environment variables (in `.env`, Helm chart, or directly in the deployment environment), improving flexibility and scalability for both small and large deployments. - `DOC_BULK_SIZE`: Controls how many document chunks are processed in a single batch during document parsing (default: 4). - `EMBEDDING_BATCH_SIZE`: Controls how many text chunks are processed in a single batch during embedding vectorization (default: 16). This change updates the codebase, documentation, and configuration files to reflect the new options. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update - [ ] Refactoring - [x] Performance Improvement - [ ] Other (please describe): ### Additional context - Updated `.env`, `helm/values.yaml`, and documentation to describe the new variables. - Modified relevant code paths to use the environment variables instead of hardcoded values. - Users can now tune these parameters to achieve better throughput or reduce memory usage as needed. Before: Default value: <img width="643" alt="image" src="https://github.com/user-attachments/assets/086e1173-18f3-419d-a0f5-68394f63866a" /> After: 10x: <img width="777" alt="image" src="https://github.com/user-attachments/assets/5722bbc0-0bcb-4536-b928-077031e550f1" />
95 lines
4.8 KiB
Plaintext
95 lines
4.8 KiB
Plaintext
---
|
|
sidebar_position: 1
|
|
slug: /begin_component
|
|
---
|
|
|
|
# Begin component
|
|
|
|
The starting component in a workflow.
|
|
|
|
---
|
|
|
|
The **Begin** component sets an opening greeting or accepts inputs from the user. It is automatically populated onto the canvas when you create an agent, whether from a template or from scratch (from a blank template). There should be only one **Begin** component in the workflow.
|
|
|
|
## Scenarios
|
|
|
|
A **Begin** component is essential in all cases. Every agent includes a **Begin** component, which cannot be deleted.
|
|
|
|
## Configurations
|
|
|
|
Click the component to display its **Configuration** window. Here, you can set an opening greeting and the input parameters (global variables) for the agent.
|
|
|
|
### ID
|
|
|
|
The ID is the unique identifier for the component within the workflow. Unlike the IDs of other components, the ID of the **Begin** component _cannot_ be changed.
|
|
|
|
### Opening greeting
|
|
|
|
An opening greeting is the agent's first message to the user. It can be a welcoming remark or an instruction to guide the user forward.
|
|
|
|
### Global variables
|
|
|
|
You can set global variables within the **Begin** component, which can be either required or optional. Once established, users will need to provide values for these variables when interacting or chatting with the agent. Click **+ Add variable** to add a global variable, each with the following attributes:
|
|
|
|
- **Key**: _Required_
|
|
The unique variable name.
|
|
- **Name**: _Required_
|
|
A descriptive name providing additional details about the variable.
|
|
For example, if **Key** is set to `lang`, you can set its **Name** to `Target language`.
|
|
- **Type**: _Required_
|
|
The type of the variable:
|
|
- **line**: Accepts a single line of text without line breaks.
|
|
- **paragraph**: Accepts multiple lines of text, including line breaks.
|
|
- **options**: Requires the user to select a value for this variable from a dropdown menu. And you are required to set _at least_ one option for the dropdown menu.
|
|
- **file**: Requires the user to upload one or multiple files.
|
|
- **integer**: Accepts an integer as input.
|
|
- **boolean**: Requires the user to toggle between on and off.
|
|
- **Optional**: A toggle indicating whether the variable is optional.
|
|
|
|
:::tip NOTE
|
|
To pass in parameters from a client, call:
|
|
|
|
- HTTP method [Converse with agent](../../../references/http_api_reference.md#converse-with-agent), or
|
|
- Python method [Converse with agent](../../../references/python_api_reference.md#converse-with-agent).
|
|
:::
|
|
|
|
:::danger IMPORTANT
|
|
|
|
- If you set the key type as **file**, ensure the token count of the uploaded file does not exceed your model provider's maximum token limit; otherwise, the plain text in your file will be truncated and incomplete.
|
|
- If your agent's **Begin** component takes a variable, you _cannot_ embed it into a webpage.
|
|
:::
|
|
|
|
:::note
|
|
You can tune document parsing and embedding efficiency by setting the environment variables `DOC_BULK_SIZE` and `EMBEDDING_BATCH_SIZE`.
|
|
:::
|
|
|
|
## Examples
|
|
|
|
As mentioned earlier, the **Begin** component is indispensable for an agent. Still, you can take a look at our three-step interpreter agent template, where the **Begin** component takes two global variables:
|
|
|
|
1. Click the **Agent** tab at the top center of the page to access the **Agent** page.
|
|
2. Click **+ Create agent** on the top right of the page to open the **agent template** page.
|
|
3. On the **agent template** page, hover over the **Interpreter** card and click **Use this template**.
|
|
4. Name your new agent and click **OK** to enter the workflow editor.
|
|
5. Click on the **Begin** component to display its **Configuration** window.
|
|
|
|
## Frequently asked questions
|
|
|
|
### Is the uploaded file in a knowledge base?
|
|
|
|
No. Files uploaded to an agent as input are not stored in a knowledge base and hence will not be processed using RAGFlow's built-in OCR, DLR or TSR models, or chunked using RAGFlow's built-in chunking methods.
|
|
|
|
### How to upload a webpage or file from a URL?
|
|
|
|
If you set the type of a variable as **file**, your users will be able to upload a file either from their local device or from an accessible URL. For example:
|
|
|
|

|
|
|
|
### File size limit for an uploaded file
|
|
|
|
There is no _specific_ file size limit for a file uploaded to an agent. However, note that model providers typically have a default or explicit maximum token setting, which can range from 8196 to 128k: The plain text part of the uploaded file will be passed in as the key value, but if the file's token count exceeds this limit, the string will be truncated and incomplete.
|
|
|
|
:::tip NOTE
|
|
The variables `MAX_CONTENT_LENGTH` in `/docker/.env` and `client_max_body_size` in `/docker/nginx/nginx.conf` set the file size limit for each upload to a knowledge base or **File Management**. These settings DO NOT apply in this scenario.
|
|
:::
|