114 Commits

Author SHA1 Message Date
Yongteng Lei
7ebc1f0943
Feat: add model provider DeepInfra (#9003)
### What problem does this PR solve?

Add model provider DeepInfra. This model list comes from our community. 

NOTE: most endpoints haven't been tested, but they should work as OpenAI
does.
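
Since DeepInfra exposes an OpenAI-compatible API, a provider wrapper can reuse the standard OpenAI client. A minimal sketch for illustration only (the base URL and model name are assumptions, not the exact code added in this PR):

```python
from openai import OpenAI

# Hypothetical example: DeepInfra's OpenAI-compatible endpoint.
# The exact model list registered by this PR may differ.
client = OpenAI(
    api_key="YOUR_DEEPINFRA_API_KEY",
    base_url="https://api.deepinfra.com/v1/openai",
)

resp = client.embeddings.create(
    model="BAAI/bge-m3",  # assumed model name for the example
    input=["What is RAGFlow?"],
)
print(len(resp.data[0].embedding))
```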

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-23 18:10:35 +08:00
Stephen Hu
ec21d9a98f
Refactor: remove useless convert for FastEmbed (#8984)
### What problem does this PR solve?

Remove useless convert for FastEmbed

### Type of change

- [x] Refactoring
2025-07-23 10:51:48 +08:00
Stephen Hu
5fa6f2f151
Update embedding_model.py (#8836)
### What problem does this PR solve?

Remove useless convert for bge encode_queries

### Type of change

- [x] Performance Improvement
2025-07-15 14:04:58 +08:00
Stephen Hu
5383e254c4
Perf: Remove Useless Convert When BGE Embedding (#8816)
### What problem does this PR solve?

FlagModel internally supports returning results as numpy arrays

### Type of change
- [x] Performance Improvement
2025-07-14 14:02:48 +08:00
Stephen Hu
8d027813f5
Refactor: Improve How To Handle QWenEmbed (#8765)
### What problem does this PR solve?

Based on https://github.com/infiniflow/ragflow/issues/8740 
1. Better handling of the "'NoneType' object is not subscriptable" error
2. Add some logs to surface the internal message

### Type of change

- [x] Refactoring
2025-07-10 10:30:18 +08:00
Stephen Hu
19419281c3
Fix: Change Ollama Embedding Keep Alive (#8734)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8733

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-09 12:17:26 +08:00
Stephen Hu
e60ec0a31b
Fix: disallowed special token while embedding (#8692)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8567

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-07 14:13:37 +08:00
6607changchun
9580e99650
fix: retry embedding with Qwen family models when limits temporarily reached. (#8690)
fix: retry embedding with Qwen family models when limits temporarily
reached.

APIs of the Qwen family models are rate limited. When the limit is reached,
the "output" attribute of the "resp" will be None, which in turn causes a
TypeError when trying to retrieve "embeddings". Since these limits are
almost always temporary, I have added a simple retry mechanism to avoid this.
Besides, once retry_max is reached, the error is raised early instead of
being hidden behind a "TypeError".

### What problem does this PR solve?

Sometimes Qwen blocks calls due to rate limits, which stops the whole
parsing procedure when creating a knowledge base. In this situation,
resp["output"] will be None, and accessing resp["output"]["embeddings"]
will raise a TypeError. Since the limits are temporary, I apply a simple
retry mechanism to solve it.
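
A minimal sketch of the retry idea described above (function and parameter names are illustrative, not the exact code in this PR):

```python
import time

import dashscope  # Qwen SDK; assumed to be the client used here


def embed_with_retry(texts, model="text-embedding-v2", retry_max=5):
    """Retry when rate limiting leaves resp.output empty, instead of
    letting resp.output["embeddings"] raise a TypeError."""
    for attempt in range(retry_max):
        resp = dashscope.TextEmbedding.call(model=model, input=texts)
        if resp.output is not None:
            return [item["embedding"] for item in resp.output["embeddings"]]
        time.sleep(2 ** attempt)  # simple backoff before the next try
    raise RuntimeError(f"Qwen embedding still rate limited after {retry_max} retries")
```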

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-07-07 12:15:52 +08:00
Yongteng Lei
f8a6987f1e
Refa: automatic LLMs registration (#8651)
### What problem does this PR solve?

Support automatic LLMs registration.

### Type of change

- [x] Refactoring
2025-07-03 19:05:31 +08:00
Kevin Hu
d46c24045f
Feat: add GiteeAI as a llm provider. (#8572)
### What problem does this PR solve?

#1853

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-30 11:22:11 +08:00
Kevin Hu
aafeffa292
Feat: add gitee as LLM provider. (#8545)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-30 09:22:31 +08:00
Rainman
49d67cbcb7
fix a bug when using huggingface embedding api (#8432)
### What problem does this PR solve?

image_version: v0.19.1
This PR fixes a bug in the HuggingFace embedding API method that was
causing `AssertionError: assert len(vects) == len(docs)` during the
document embedding process.

#### Problem
The HuggingFaceEmbed.encode() method had an early return statement
inside the for loop, causing it to return after processing only the
first text input instead of processing all texts in the input list.

**Error Message**
```python
AssertionError: assert len(vects) == len(docs) # input chunks  != embedded  vectors from embedding api
File "/ragflow/rag/svr/task_executor.py", line 442, in embedding
```



**Buggy code (/ragflow/rag/llm/embedding_model.py)**
```python
class HuggingFaceEmbed(Base):
    def __init__(self, key, model_name, base_url=None):
        if not model_name:
            raise ValueError("Model name cannot be None")
        self.key = key
        self.model_name = model_name.split("___")[0]
        self.base_url = base_url or "http://127.0.0.1:8080"

    def encode(self, texts: list):
        embeddings = []
        for text in texts:
            response = requests.post(...)
            if response.status_code == 200:
                try:
                    embedding = response.json()
                    embeddings.append(embedding[0])
                    # Early return
                    return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts])
                except Exception as _e:
                    log_exception(_e, response)
            else:
                raise Exception(...)
```
**Fixed code (I just rolled this function back to the v0.19.0 version)**
```python
class HuggingFaceEmbed(Base):
    def __init__(self, key, model_name, base_url=None):
        if not model_name:
            raise ValueError("Model name cannot be None")
        self.key = key
        self.model_name = model_name.split("___")[0]
        self.base_url = base_url or "http://127.0.0.1:8080"

    def encode(self, texts: list):
        embeddings = []
        for text in texts:
            response = requests.post(...)
            if response.status_code == 200:
                embedding = response.json()
                embeddings.append(embedding[0])  # Only append, no return
            else:
                raise Exception(...)
        return np.array(embeddings), sum([num_tokens_from_string(text) for text in texts])  # Return after processing all texts
```
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-24 09:35:02 +08:00
Stephen Hu
ef5e7d8c44
Fix: embedding_model class SILICONFLOWEmbed(Base) function reusing json (#8378)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8360

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-20 11:13:00 +08:00
Kevin Hu
65d5268439
Feat: implement novitaAI embedding and reranking. (#8250)
### What problem does this PR solve?

Close #8227

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-13 15:42:17 +08:00
Kevin Hu
d36c8d18b1
Refa: make exception more clear. (#8224)
### What problem does this PR solve?

#8156

### Type of change
- [x] Refactoring
2025-06-12 17:53:59 +08:00
Liu An
a43adafc6b
Refa: Add error handling for JSON decode in embedding models (#8162)
### What problem does this PR solve?

Improve robustness of Jina, Nvidia, and SILICONFLOW embedding models by:
1. Adding try-catch blocks for JSON decode errors
2. Logging error details including response content
3. Raising exceptions with meaningful error messages
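
As a rough sketch of the pattern described above (not the exact code in this PR), each provider's response parsing can be wrapped like this:

```python
import logging

import requests


def parse_embedding_response(response: requests.Response):
    """Decode an embedding API response, logging the raw body on failure."""
    try:
        return response.json()
    except ValueError as e:  # JSON decode errors subclass ValueError
        # Log the offending payload so the provider's real error is visible.
        logging.exception("Invalid JSON from embedding API: %s", response.text[:500])
        raise RuntimeError(f"Embedding API returned a non-JSON response: {e}") from e
```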

### Type of change

- [x] Refactoring
2025-06-10 19:04:17 +08:00
Kevin Hu
156290f8d0
Fix: url path join issue. (#8013)
### What problem does this PR solve?

Close #7980

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-03 14:18:40 +08:00
Stephen Hu
65537b8200
Fix:Set CUDA_VISIBLE_DEVICES In DefaultEmbedding (#7465)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/7420

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-06 14:38:36 +08:00
Alex Chen
46b5e32cd7
Feat: support vision llm for gpustack (#6636)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/6138

This PR adds vision LLM support for GPUStack and modifies the URL path
from `/v1-openai` to `/v1`.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-31 15:33:52 +08:00
Kevin Hu
b77ce4e846
Feat: support api-key for Ollama. (#6448)
### What problem does this PR solve?

#6189

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-24 14:53:17 +08:00
zhou
85480f6292
Fix: the error of Ollama embeddings interface returning "500 Internal Server Error" (#6350)
### What problem does this PR solve?

Fix the error where the Ollama embeddings interface returns a “500
Internal Server Error” when using models such as xiaobu-embedding-v2 for
embedding.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-21 15:25:48 +08:00
Omar Leonardo Sanchez Granados
4f2816c01c
Add support to boto3 default connection (#5246)
### What problem does this PR solve?
 
This pull request includes changes to the initialization logic of the
`ChatModel` and `EmbeddingModel` classes to enhance the handling of AWS
credentials.

Use cases:
- Use env variables for credentials instead of managing them in the DB
- Easy connection when deploying on an AWS machine
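
A minimal sketch of the fallback idea, assuming the provider passes empty credentials when it wants the default chain (names are illustrative, not the exact code in this PR):

```python
import boto3


def make_bedrock_client(access_key: str = "", secret_key: str = "", region: str = ""):
    """Use explicit credentials when provided; otherwise fall back to
    boto3's default chain (env vars, ~/.aws/credentials, instance role)."""
    if access_key and secret_key:
        return boto3.client(
            "bedrock-runtime",
            aws_access_key_id=access_key,
            aws_secret_access_key=secret_key,
            region_name=region or None,
        )
    # No keys stored in the DB: let boto3 resolve credentials itself.
    return boto3.client("bedrock-runtime", region_name=region or None)
```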

### Type of change

- [X] New Feature (non-breaking change which adds functionality)
2025-02-24 11:01:14 +08:00
Kevin Hu
4776fa5e4e
Refactor for total_tokens. (#4652)
### What problem does this PR solve?

#4567
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-26 13:54:26 +08:00
Kevin Hu
f1d9f4290e
Fix TogetherAIEmbed. (#4623)
### What problem does this PR solve?

#4567

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-24 10:29:30 +08:00
Kevin Hu
be5f830878
Truncate text for zhipu embedding. (#4490)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-15 14:36:27 +08:00
Alex Chen
7944aacafa
Feat: add gpustack model provider (#4469)
### What problem does this PR solve?

Add GPUStack as a new model provider.
[GPUStack](https://github.com/gpustack/gpustack) is an open-source GPU
cluster manager for running LLMs. Currently, locally deployed models in
GPUStack cannot integrate well with RAGFlow. GPUStack provides both
OpenAI compatible APIs (Models / Chat Completions / Embeddings /
Speech2Text / TTS) and other APIs like Rerank. We would like to use
GPUStack as a model provider in ragflow.

[GPUStack Docs](https://docs.gpustack.ai/latest/quickstart/)

Related issue: https://github.com/infiniflow/ragflow/issues/4064.
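
Because GPUStack exposes OpenAI-compatible endpoints, the embedding call can go through the standard OpenAI client. A rough sketch only; the host, path prefix, and model name are assumptions (the model comes from the testing instructions below):

```python
from openai import OpenAI

# Hypothetical local GPUStack deployment; the exact path prefix of the
# OpenAI-compatible endpoint depends on the GPUStack/RAGFlow version.
client = OpenAI(
    api_key="YOUR_GPUSTACK_API_KEY",
    base_url="http://your-gpustack-host/v1-openai",
)

resp = client.embeddings.create(
    model="bge-m3",  # the text embedding model deployed for testing
    input=["hello ragflow"],
)
print(len(resp.data[0].embedding))
```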

### Type of change

- [x] New Feature (non-breaking change which adds functionality)



### Testing Instructions
1. Install GPUStack and deploy the `llama-3.2-1b-instruct` llm, `bge-m3`
text embedding model, `bge-reranker-v2-m3` rerank model,
`faster-whisper-medium` Speech-to-Text model, `cosyvoice-300m-sft` in
GPUStack.
2. Add provider in ragflow settings.
3. Testing in ragflow.
2025-01-15 14:15:58 +08:00
Kevin Hu
b93c136797
Fix gemini embedding error. (#4356)
### What problem does this PR solve?

#4314

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-01-06 14:41:29 +08:00
Jin Hai
4abc144d3d
Fix error of changing embedding model (#4184)
### What problem does this PR solve?

1. Changing the embedding model of a knowledge base won't change the default
embedding model.
2. Retrieval test bug

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-12-23 16:23:54 +08:00
Kevin Hu
d8fca43017
Make fast embed and default embed mutually exclusive. (#4121)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2024-12-19 17:27:09 +08:00
Kevin Hu
7474348394
Fix fastembed reloading issue. (#4117)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-19 16:18:18 +08:00
Kevin Hu
593ffc4067
Fix HuggingFace model error. (#3870)
### What problem does this PR solve?

#3865

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-05 13:28:42 +08:00
Zhichang Yu
92ab7ef659
Refactor embedding batch_size (#3825)
### What problem does this PR solve?

Refactor embedding batch_size. Close #3657
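
The batching idea, roughly (a sketch only; the batch size and helper names are illustrative, not from this PR):

```python
import numpy as np


def encode_in_batches(encode_fn, texts, batch_size=16):
    """Call the provider's encode function in fixed-size batches so a long
    document list never exceeds the API's per-request limit."""
    vectors, total_tokens = [], 0
    for i in range(0, len(texts), batch_size):
        vects, tokens = encode_fn(texts[i : i + batch_size])
        vectors.append(vects)
        total_tokens += tokens
    return np.concatenate(vectors, axis=0), total_tokens
```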

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2024-12-03 16:22:39 +08:00
Kevin Hu
6a0583f5ad
Fix voyage embedding. (#3818)
### What problem does this PR solve?

#3816 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-03 09:33:54 +08:00
Zhichang Yu
d19f059f34
Detect invalid response from api.siliconflow.cn (#3792)
### What problem does this PR solve?

Detect invalid response from api.siliconflow.cn. Close #2643

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-12-02 12:55:05 +08:00
devMls
59a5813f1b
add new Jina models in the Jina connector (#3770)
### What problem does this PR solve?

Add new models in the Jina connector, to allow using models with
multilingual support.

### Type of change

- [X] Other (please describe): new connectors no breaking change
2024-12-02 10:06:39 +08:00
Kevin Hu
57208d8e53
Fix batch size issue. (#3675)
### What problem does this PR solve?

#3657

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2024-11-27 18:06:43 +08:00
liuhua
8b35776916
Fix a bug in VolcEngine (#3658)
### What problem does this PR solve?

Fix a bug in VolcEngine #3553

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
2024-11-27 09:30:49 +08:00
Kevin Hu
e5af18d5ea
Update docs for v0.14.0 (#3625)
### What problem does this PR solve?


### Type of change

- [x] Documentation Update
2024-11-25 11:37:56 +08:00
liuhua
d42362deb6
Add api for sessions and add max_tokens for tenant_llm (#3472)
### What problem does this PR solve?

Add api for sessions and add max_tokens for tenant_llm

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: liuhua <10215101452@stu.ecun.edu.cn>
2024-11-19 14:51:33 +08:00
Zhichang Yu
4413683898
Introduced beartype (#3460)
### What problem does this PR solve?

Introduced [beartype](https://github.com/beartype/beartype) for runtime
type-checking.
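
For illustration, beartype decorates a function and validates its arguments against the type hints at call time. A minimal, self-contained example (not code from this PR):

```python
from beartype import beartype


@beartype
def embed(texts: list[str], batch_size: int = 16) -> list[list[float]]:
    # Dummy body; beartype only checks the annotated signature at runtime.
    return [[0.0] * 4 for _ in texts]


embed(["ok"])     # passes the runtime type check
# embed("oops")   # would raise beartype.roar.BeartypeCallHintParamViolation
```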

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2024-11-18 17:38:17 +08:00
Jin Hai
1e90a1bf36
Move settings initialization after module init phase (#3438)
### What problem does this PR solve?

1. Module init won't connect to the database any more.
2. Config values in settings need to be accessed as settings.CONFIG_NAME

### Type of change

- [x] Refactoring

Signed-off-by: jinhai <haijin.chn@gmail.com>
2024-11-15 17:30:56 +08:00
Zhichang Yu
30f6421760
Use consistent log file names, introduced initLogger (#3403)
### What problem does this PR solve?

Use consistent log file names, introduced initLogger

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2024-11-14 17:13:48 +08:00
roc king
fa54cd5f5c
extract model dir from model's full name (#3368)
### What problem does this PR solve?

When the model's group name contains 0-9, we can't find the downloaded
model, because we do not correctly extract the model dir's name from the
model's full name.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: 王志鹏 <zhipeng3.wang@midea.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-11-13 14:10:16 +08:00
Zhichang Yu
a2a5631da4
Rework logging (#3358)
Unified all log files into one.

### What problem does this PR solve?

Unified all log files into one.

### Type of change

- [x] Refactoring
2024-11-12 17:35:13 +08:00
ksztone-huanggonghao
0dff64f6ad
fix: TypeError: only length-1 arrays can be converted to Python scalars (#3211)
### What problem does this PR solve?
fix "TypeError: only length-1 arrays can be converted to Python scalars"
while using cohere embedding model.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)


![image](https://github.com/user-attachments/assets/2c21a69f-cd76-4d25-b320-058964812db8)
2024-11-06 11:15:00 +08:00
0000sir
4991107822
Fix keys of Xinference-deployed models, especially when they have the same model name as publicly hosted models. (#2832)
### What problem does this PR solve?

Fix keys of Xinference-deployed models, especially when they have the same
model name as publicly hosted models.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: 0000sir <0000sir@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-10-16 10:21:08 +08:00
JobSmithManipulation
18f80743eb
support api-version and change the default model when adding azure-openai and openai (#2799)
### What problem does this PR solve?
#2701 #2712 #2749

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-10-11 11:26:42 +08:00
Kevin Hu
7f44cf543a
move import positions (#2753)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
2024-10-09 10:34:58 +08:00
Omar Leonardo Sanchez Granados
34761fa4ca
Fix/bedrock issues (#2718)
### What problem does this PR solve?

Adding a Bedrock API key for Claude Sonnet was broken. I found the issue
came up when trying to test the LLM configuration: `system` is a required
parameter in boto3.

There were also problems in the Bedrock implementation for embeddings
when trying to encode queries.

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
2024-10-05 16:44:50 +08:00
JobSmithManipulation
96f56a3c43
add huggingface model (#2624)
### What problem does this PR solve?

#2469

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2024-09-27 19:15:38 +08:00