182 lines
6.8 KiB
Bash
Raw Normal View History

# The type of doc engine to use.
# Available options:
# - `elasticsearch` (default)
# - `infinity` (https://github.com/infiniflow/infinity)
Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140) ### What problem does this PR solve? This PR adds the support for latest OpenSearch2.19.1 as the store engine & search engine option for RAGFlow. ### Main Benefit 1. OpenSearch2.19.1 is licensed under the [Apache v2.0 License] which is much better than Elasticsearch 2. For search, OpenSearch2.19.1 supports full-text search、vector_search、hybrid_search those are similar with Elasticsearch on schema 3. For store, OpenSearch2.19.1 stores text、vector those are quite simliar with Elasticsearch on schema ### Changes - Support opensearch_python_connetor. I make a lot of adaptions since the schema and api/method between ES and Opensearch differs in many ways(especially the knn_search has a significant gap) : rag/utils/opensearch_coon.py - Support static config adaptions by changing: conf/service_conf.yaml、api/settings.py、rag/settings.py - Supprt some store&search schema changes between OpenSearch and ES: conf/os_mapping.json - Support OpenSearch python sdk : pyproject.toml - Support docker config for OpenSearch2.19.1 : docker/.env、docker/docker-compose-base.yml、docker/service_conf.yaml.template ### How to use - I didn't change the priority that ES as the default doc/search engine. Only if in docker/.env , we set DOC_ENGINE=${DOC_ENGINE:-opensearch}, it will work. ### Others Our team tested a lot of docs in our environment by using OpenSearch as the vector database ,it works very well. All the conifg for OpenSearch is necessary. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Yongteng Lei <yongtengrey@outlook.com> Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com> Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2025-04-24 16:03:31 +08:00
# - `opensearch` (https://github.com/opensearch-project/OpenSearch)
DOC_ENGINE=${DOC_ENGINE:-elasticsearch}
# ------------------------------
# docker env var for specifying vector db type at startup
# (based on the vector db type, the corresponding docker
# compose profile will be used)
# ------------------------------
COMPOSE_PROFILES=${DOC_ENGINE}
# The version of Elasticsearch.
STACK_VERSION=8.11.3
2024-02-28 15:02:04 +08:00
# The hostname where the Elasticsearch service is exposed
ES_HOST=es01
# The port used to expose the Elasticsearch service to the host machine,
# allowing EXTERNAL access to the service running inside the Docker container.
2024-02-28 15:02:04 +08:00
ES_PORT=1200
# The password for Elasticsearch.
ELASTIC_PASSWORD=infini_rag_flow
Feat: Adds OpenSearch2.19.1 as the vector_database support (#7140) ### What problem does this PR solve? This PR adds the support for latest OpenSearch2.19.1 as the store engine & search engine option for RAGFlow. ### Main Benefit 1. OpenSearch2.19.1 is licensed under the [Apache v2.0 License] which is much better than Elasticsearch 2. For search, OpenSearch2.19.1 supports full-text search、vector_search、hybrid_search those are similar with Elasticsearch on schema 3. For store, OpenSearch2.19.1 stores text、vector those are quite simliar with Elasticsearch on schema ### Changes - Support opensearch_python_connetor. I make a lot of adaptions since the schema and api/method between ES and Opensearch differs in many ways(especially the knn_search has a significant gap) : rag/utils/opensearch_coon.py - Support static config adaptions by changing: conf/service_conf.yaml、api/settings.py、rag/settings.py - Supprt some store&search schema changes between OpenSearch and ES: conf/os_mapping.json - Support OpenSearch python sdk : pyproject.toml - Support docker config for OpenSearch2.19.1 : docker/.env、docker/docker-compose-base.yml、docker/service_conf.yaml.template ### How to use - I didn't change the priority that ES as the default doc/search engine. Only if in docker/.env , we set DOC_ENGINE=${DOC_ENGINE:-opensearch}, it will work. ### Others Our team tested a lot of docs in our environment by using OpenSearch as the vector database ,it works very well. All the conifg for OpenSearch is necessary. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Yongteng Lei <yongtengrey@outlook.com> Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com> Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2025-04-24 16:03:31 +08:00
# the hostname where OpenSearch service is exposed, set it not the same as elasticsearch
OS_PORT=1201
# The hostname where the OpenSearch service is exposed
OS_HOST=opensearch01
# The password for OpenSearch.
# At least one uppercase letter, one lowercase letter, one digit, and one special character
OPENSEARCH_PASSWORD=infini_rag_flow_OS_01
# The port used to expose the Kibana service to the host machine,
# allowing EXTERNAL access to the service running inside the Docker container.
2024-02-28 15:02:04 +08:00
KIBANA_PORT=6601
KIBANA_USER=rag_flow
KIBANA_PASSWORD=infini_rag_flow
2024-02-28 15:02:04 +08:00
# The maximum amount of the memory, in bytes, that a specific Docker container can use while running.
# Update it according to the available memory in the host machine.
MEM_LIMIT=8073741824
# The hostname where the Infinity service is exposed
INFINITY_HOST=infinity
# Port to expose Infinity API to the host
INFINITY_THRIFT_PORT=23817
INFINITY_HTTP_PORT=23820
INFINITY_PSQL_PORT=5432
# The password for MySQL.
2024-02-28 15:02:04 +08:00
MYSQL_PASSWORD=infini_rag_flow
# The hostname where the MySQL service is exposed
MYSQL_HOST=mysql
# The database of the MySQL service to use
MYSQL_DBNAME=rag_flow
# The port used to expose the MySQL service to the host machine,
# allowing EXTERNAL access to the MySQL database running inside the Docker container.
2024-02-28 15:02:04 +08:00
MYSQL_PORT=5455
# The hostname where the MinIO service is exposed
MINIO_HOST=minio
# The port used to expose the MinIO console interface to the host machine,
# allowing EXTERNAL access to the web-based console running inside the Docker container.
MINIO_CONSOLE_PORT=9001
# The port used to expose the MinIO API service to the host machine,
# allowing EXTERNAL access to the MinIO object storage service running inside the Docker container.
MINIO_PORT=9000
# The username for MinIO.
# When updated, you must revise the `minio.user` entry in service_conf.yaml accordingly.
2024-03-05 16:33:47 +08:00
MINIO_USER=rag_flow
# The password for MinIO.
# When updated, you must revise the `minio.password` entry in service_conf.yaml accordingly.
2024-02-28 15:02:04 +08:00
MINIO_PASSWORD=infini_rag_flow
# The hostname where the Redis service is exposed
REDIS_HOST=redis
# The port used to expose the Redis service to the host machine,
# allowing EXTERNAL access to the Redis service running inside the Docker container.
REDIS_PORT=6379
# The password for Redis.
REDIS_PASSWORD=infini_rag_flow
# The port used to expose RAGFlow's HTTP API service to the host machine,
# allowing EXTERNAL access to the service running inside the Docker container.
2024-02-28 15:02:04 +08:00
SVR_HTTP_PORT=9380
# The RAGFlow Docker image to download.
# Defaults to the v0.18.0-slim edition, which is the RAGFlow Docker image without embedding models.
RAGFLOW_IMAGE=infiniflow/ragflow:v0.18.0-slim
#
# To download the RAGFlow Docker image with embedding models, uncomment the following line instead:
# RAGFLOW_IMAGE=infiniflow/ragflow:v0.18.0
#
# The Docker image of the v0.18.0 edition includes built-in embedding models:
# - BAAI/bge-large-zh-v1.5
# - maidalun1020/bce-embedding-base_v1
#
# If you cannot download the RAGFlow Docker image:
#
# - For the `nightly-slim` edition, uncomment either of the following:
# RAGFLOW_IMAGE=swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:nightly-slim
# RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:nightly-slim
#
# - For the `nightly` edition, uncomment either of the following:
# RAGFLOW_IMAGE=swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:nightly
# RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:nightly
# The local time zone.
TIMEZONE='Asia/Shanghai'
# Uncomment the following line if you have limited access to huggingface.co:
# HF_ENDPOINT=https://hf-mirror.com
# Optimizations for MacOS
# Uncomment the following line if your operating system is MacOS:
# MACOS=1
# The maximum file size limit (in bytes) for each upload to your knowledge base or File Management.
# To change the 1GB file size limit, uncomment the line below and update as needed.
# MAX_CONTENT_LENGTH=1073741824
# After updating, ensure `client_max_body_size` in nginx/nginx.conf is updated accordingly.
# Note that neither `MAX_CONTENT_LENGTH` nor `client_max_body_size` sets the maximum size for files uploaded to an agent.
# See https://ragflow.io/docs/dev/begin_component for details.
# Log level for the RAGFlow's own and imported packages.
# Available levels:
# - `DEBUG`
# - `INFO` (default)
# - `WARNING`
# - `ERROR`
# For example, the following line changes the log level of `ragflow.es_conn` to `DEBUG`:
# LOG_LEVELS=ragflow.es_conn=DEBUG
# aliyun OSS configuration
# STORAGE_IMPL=OSS
# ACCESS_KEY=xxx
# SECRET_KEY=eee
# ENDPOINT=http://oss-cn-hangzhou.aliyuncs.com
# REGION=cn-hangzhou
# BUCKET=ragflow65536
# A user registration switch:
# - Enable registration: 1
# - Disable registration: 0
REGISTER_ENABLED=1
# Sandbox settings
# Important: To enable sandbox, you must re-declare the compose profiles. See hints at the end of file.
# Double check if you add `sandbox-executor-manager` to your `/etc/hosts`
# Pull the required base images before running:
# docker pull infiniflow/sandbox-base-nodejs:latest
# docker pull infiniflow/sandbox-base-python:latest
# Our default sandbox environments include:
# - Node.js base image: includes axios
# - Python base image: includes requests, numpy, and pandas
# Specify custom executor images below if you're using non-default environments.
# SANDBOX_ENABLED=1
# SANDBOX_HOST=sandbox-executor-manager
# SANDBOX_EXECUTOR_MANAGER_IMAGE=infiniflow/sandbox-executor-manager:latest
# SANDBOX_EXECUTOR_MANAGER_POOL_SIZE=3
# SANDBOX_BASE_PYTHON_IMAGE=infiniflow/sandbox-base-python:latest
# SANDBOX_BASE_NODEJS_IMAGE=infiniflow/sandbox-base-nodejs:latest
# SANDBOX_EXECUTOR_MANAGER_PORT=9385
# SANDBOX_ENABLE_SECCOMP=false
# Important: To enable sandbox, you must re-declare the compose profiles.
# 1. Comment out the COMPOSE_PROFILES line above.
# 2. Uncomment one of the following based on your chosen document engine:
# - For Elasticsearch:
# COMPOSE_PROFILES=elasticsearch,sandbox
# - For Infinity:
# COMPOSE_PROFILES=infinity,sandbox
# - For OpenSearch:
# COMPOSE_PROFILES=opensearch,sandbox