899 Commits

Author SHA1 Message Date
Can Wang
779932dcb0
Fix: graphrag, raptor can be null for api created kb issue (#8743)
### What problem does this PR solve?

When knowledgebase/dataset created by API, graphrag and raptor can be
null, and will trigger NoneType error when reach to this code, causing
chunking task not able to finish.

![image](https://github.com/user-attachments/assets/998a63e9-611b-4301-8808-24839a05be8a)

Proposed solution will result in None and pass the condition check
without error.

![image](https://github.com/user-attachments/assets/184374fb-e06a-46e6-b8ac-d66a3fd93b59)


### Type of change

-   Bug Fix (non-breaking change which fixes an issue)
2025-07-09 17:12:42 +08:00
Yongteng Lei
c1f6e6f00e
Feat: add advanced document filter (#8723)
### What problem does this PR solve?

Add advanced document filter

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-09 09:33:11 +08:00
Liu An
addda5ccbe
Fix: Add validation for dialog name (#8722)
### What problem does this PR solve?

- Validate dialog name in `dialog_app.py` to ensure it is a non-empty
string and does not exceed 255 bytes in UTF-8 encoding.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-08 19:20:29 +08:00
Yongteng Lei
4d7bfd2ba3
Fix: typo process_duration (#8696)
### What problem does this PR solve?

Fix typo process_duration.

### Type of change

- [x] Documentation Update
- [x] Refactoring
2025-07-07 14:11:47 +08:00
Fee He
ae3683c346
fix task_service.py (#8687)
Fix the case where pages variable might be None

### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-07 09:48:51 +08:00
Yongteng Lei
1ac61c0f0f
Fix: secure canvas (#8670)
### What problem does this PR solve?

Secure canvas access.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-04 19:40:39 +08:00
Yongteng Lei
4243330d5c
Feat: add MCP server test endpoint (#8632)
### What problem does this PR solve?

Add MCP server test endpoint.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-02 18:52:24 +08:00
Can Wang
83c8af1b59
Fix: page_size can be None error (#8603)
### What problem does this PR solve?

Issue #8602

`parser_config.task_page_size` can be defaults to `None` when dataset is
created by API. This was not handled by the `task_executor.py` code thus
`page_size` could sometimes be `None` which will cause issue in line
351.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-02 18:38:48 +08:00
Kevin Hu
fffb7c0bba
Fix: anthropic llm issue. (#8633)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-02 18:37:34 +08:00
Liu An
0b40eb3e90
Test: Add tests for chunk API endpoints (#8616)
### What problem does this PR solve?

- Add comprehensive test suite for chunk operations including:
  - Test files for create, list, retrieve, update, and delete chunks
  - Authorization tests
  - Batch operations tests
- Update test configurations and common utilities
- Validate `important_kwd` and `question_kwd` fields are lists in
chunk_app.py
- Reorganize imports and clean up duplicate code

### Type of change

- [x] Add test cases
2025-07-02 09:49:08 +08:00
Kevin Hu
e3edcc3064
Trivals. (#8597)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-01 14:05:18 +08:00
天海蒼灆
d4da6dce6e
Feat: Add file management HTTP_API (#8395)
### What problem does this PR solve?

Add file management HTTP_API for operating files

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-01 09:51:53 +08:00
Yongteng Lei
8801de2772
Refa: change mcp_client module to rag/utils/conn (#8578)
### What problem does this PR solve?

Change mcp_client module to rag/utils/conn.

### Type of change

- [x] Refactoring
2025-07-01 09:29:19 +08:00
Yongteng Lei
0478f36e36
Feat: allow users to choose which MCP tools are enabled (#8519)
### What problem does this PR solve?

Allow users to choose which MCP tools are enabled.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-27 10:23:34 +08:00
Stephen Hu
7dbe06f7d8
Refactor: remove useless initialize logic in list_doc (#8523)
### What problem does this PR solve?

Remove useless logic in a loop for list_doc

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-06-27 10:23:08 +08:00
Stephen Hu
938d8dd878
Fix: user_default_llm configuration doesn't work for OpenAI API compatible LLM factory (#8502)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8467
when add llm the llm_name will like "llm1___OpenAI-API"
f09ca8e795/api/apps/llm_app.py (L173)
so we should not use llm1 to query


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-27 09:41:12 +08:00
zhanglei
daf6c82066
fix: list index out of range (#8518)
### What problem does this PR solve?

stack:

```
2025-06-26 17:22:24,739 ERROR    1609 list index out of range
Traceback (most recent call last):
  File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/ragflow/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/ragflow/api/utils/api_utils.py", line 298, in decorated_function
    return func(*args, **kwargs)
  File "/ragflow/api/apps/sdk/session.py", line 472, in list_session
    print(conv["reference"][message_num])
IndexError: list index out of range

```


![图片](https://github.com/user-attachments/assets/93fe90a8-0434-4842-ba9f-bb5a995b498a)


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-06-27 09:38:33 +08:00
Yongteng Lei
d768130204
Fix: chunk number error after re-parsing (#8513)
### What problem does this PR solve?

Fix chunk number error after re-parsing. #8503.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-26 17:46:53 +08:00
Liu An
d11cfd4e45
Fix: Add input validation to chunk creation endpoint (#8516)
### What problem does this PR solve?

- Include optional `tag_feas` field if present in request
- Add input validation for `important_kwd` and `question_kwd` to ensure
they are lists
- #8462

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-26 17:46:00 +08:00
Yongteng Lei
0eb90e73a5
Feat: add MCP dashboard functionalities list_tools and test_tool (#8505)
### What problem does this PR solve?

Add MCP dashboard functionalities list_tools and test_tool.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-26 13:52:01 +08:00
Liu An
dac5bcdf17
Fix: Enforce default embedding model in create_dataset / update_dataset (#8486)
### What problem does this PR solve?

Previous:
- Defaulted to hardcoded model 'BAAI/bge-large-zh-v1.5@BAAI'
- Did not respect user-configured default embedding_model

Now:
- Correctly prioritizes user-configured default embedding_model

Other:
- Make embedding_model optional in CreateDatasetReq with proper None
handling
- Add default embedding model fallback in dataset update when empty
- Enhance validation utils to handle None values and string
normalization
- Update SDK default embedding model to None to match API changes
- Adjust related test cases to reflect new validation rules

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-25 16:41:32 +08:00
Stephen Hu
8d9d2cc0a9
Fix: some cases Task return but not set progress (#8469)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8466
I go through the codes, current logic:
When do_handle_task raises an exception, handle_task will set the
progress, but for some cases do_handle_task internal will just return
but not set the right progress, at this cases the redis stream will been
acked but the task is running.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-25 09:58:55 +08:00
Yongteng Lei
af6850c8d8
Feat: add MCP dashboard operations (#8460)
### What problem does this PR solve?

Add MCP server dashboard operations.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-25 09:26:04 +08:00
Stephen Hu
382d2d0373
Refactor:Improve insert file logic (#8445)
### What problem does this PR solve?

before refactor
1. create file record
2. Add to blob

if have some execption at 2 the system db will have a file record but
not have related blob, which will introduce some bug.

after refactor
1. add to blob
2. create file record.

if 1 success but 2 failed just have a dirty blob in blob system, user
will not feel that



### Type of change


- [x] Refactoring
2025-06-24 13:17:22 +08:00
Song Fuchang
fd7ac17605
Feat: Scratch MCP tool calling support. (#8263)
### What problem does this PR solve?

This is a cherry-pick from #7781 as requested.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-23 17:45:35 +08:00
Stephen Hu
794a4102c2
Fix: Document parse via API will alot problen (#8407)
### What problem does this PR solve?
#8391
#8404

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-23 13:08:11 +08:00
Yongteng Lei
936a91c5fe
Fix: code debug may corrupt by history answer (#8385)
### What problem does this PR solve?

Fix code debug may corrupt by history answer.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-20 14:23:02 +08:00
Liu An
9077ee8d15
Fix: desc parameter parsing (#8362)
### What problem does this PR solve?

- Correct boolean parsing for 'desc' parameter in document_app.py to
properly handle string values

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-19 14:22:56 +08:00
RyanFernandes23
c8b1790c92
Fix typo in dataset name length error message (#8351)
### What problem does this PR solve?

Fixes a minor grammar issue in a user-facing error message. The original
message said "large than" instead of the correct comparative form
"larger than". Just a quick fix I noticed while reading the code.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-19 09:54:30 +08:00
Yongteng Lei
1b022116d5
Feat: wrap search app (#8320)
### What problem does this PR solve?

Wrap search app

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-18 16:45:42 +08:00
Jin Hai
4a2ff633e0
Fix typo in code (#8327)
### What problem does this PR solve?

Fix typo in code

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-06-18 09:41:09 +08:00
Liu An
0a13d79b94
Refa: Implement centralized file name length limit using FILE_NAME_LEN_LIMIT constant (#8318)
### What problem does this PR solve?

- Replace hardcoded 255-byte file name length checks with
FILE_NAME_LEN_LIMIT constant
- Update error messages to show the actual limit value
- #8290

### Type of change

- [x] Refactoring

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-17 18:01:30 +08:00
Liu An
64e281b398
Fix: Add validation for empty filenames in document_app.py (#8321)
### What problem does this PR solve?

- Add validation for empty filenames in document_app.py and trim
whitespace

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-17 15:53:41 +08:00
Liu An
a3bebeb599
Fix: Enforce 255-byte filename limit (#8290)
### What problem does this PR solve?

- Add filename length validation (<=255 bytes) for document
upload/rename in both HTTP and SDK APIs
- Update error messages for consistency
- Fix comparison operator in SDK from '>=' to '>' for filename length
check

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-16 16:39:41 +08:00
Yongteng Lei
0fa1a1469e
Fix: avoid mixing different embedding models in document parsing (#8260)
### What problem does this PR solve?

Fix mixing different embedding models in document parsing.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-16 13:40:12 +08:00
Kevin Hu
f7074037ef
Feat: Let number of task ahead be visible. (#8259)
### What problem does this PR solve?


![image](https://github.com/user-attachments/assets/d4ef0526-343a-426f-a85a-b05eb8b559a1)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-13 17:32:40 +08:00
Yongteng Lei
b2eed8fed1
Fix: incorrect progress updating (#8253)
### What problem does this PR solve?

Progress is only updated if it's valid and not regressive.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-13 17:24:14 +08:00
Liu An
99725444f1
Fix: desc parameter parsing (#8229)
### What problem does this PR solve?

- Fix boolean parsing for 'desc' parameter in kb_app.py to properly
handle string values

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 19:17:47 +08:00
Stephen Hu
1ab0f52832
Fix:The OpenAI-Compatible Agent API returns an incorrect message (#8177)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8175

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 19:17:15 +08:00
Kevin Hu
d36c8d18b1
Refa: make exception more clear. (#8224)
### What problem does this PR solve?

#8156

### Type of change
- [x] Refactoring
2025-06-12 17:53:59 +08:00
Liu An
7fbbc9650d
Fix: Move pagerank field from create to update dataset API (#8217)
### What problem does this PR solve?

- Remove pagerank from CreateDatasetReq and add to UpdateDatasetReq
- Add pagerank update logic in dataset update endpoint
- Update API documentation to reflect changes
- Modify related test cases and SDK references

#8208

This change makes pagerank a mutable property that can only be set after
dataset creation, and only when using elasticsearch as the doc engine.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 15:47:49 +08:00
Liu An
d0c5ff04a6
Fix: Add pagerank validation for non-elasticsearch doc engines (#8215)
### What problem does this PR solve?

Validate that pagerank updates are only allowed when using elasticsearch
as the document engine. Return an error if pagerank is set while using a
different doc engine, preventing potential inconsistencies in document
scoring.

#8208

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 15:47:22 +08:00
Liu An
cef587abc2
Fix: Add validation for dataset name in KB update API (#8194)
### What problem does this PR solve?

Validate dataset name in knowledge base update endpoint to ensure:
- Name is a non-empty string
- Name length doesn't exceed DATASET_NAME_LIMIT
- Whitespace is trimmed before processing

Prevents invalid dataset names from being saved and provides clear error
messages.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 11:37:25 +08:00
Liu An
60c1bf5a19
Fix: duplicate knowledgebase name validation logic (#8199)
### What problem does this PR solve?

Change the condition from checking for >1 to >=1 when validating
duplicate knowledgebase names to properly catch all duplicates. This
ensures no two knowledgebases can have the same name for a tenant.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 09:46:57 +08:00
Liu An
e87ad8126c
Fix: Improve dataset name validation in KB app (#8188)
### What problem does this PR solve?

- Trim whitespace before checking for empty dataset names
- Change length check from >= to > DATASET_NAME_LIMIT for consistency

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-11 16:14:29 +08:00
Stephen Hu
e6f68e1ccf
Fix: When List Kbs some times the total is wrong (#8151)
### What problem does this PR solve?
for kb.app list method when owner_ids the total calculate is wrong (now
will base on the paged result to calculate total)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-10 11:34:30 +08:00
yurhett
9c6c6c51e0
Fix: use jwks_uri from OIDC metadata for JWKS client (#8136)
### What problem does this PR solve?
Issue: #8051

The current implementation assumes JWKS endpoints follow the standard
`/.well-known/jwks.json` convention. This breaks authentication for OIDC
providers that use non-standard JWKS paths, resulting in 404 errors
during token validation.

Root Cause Analysis
- The OpenID Connect specification doesn't mandate a fixed path for JWKS
endpoints
- Some identity providers (like certain Keycloak configurations) use
custom endpoints
- Our previous approach constructed JWKS URLs by convention rather than
discovery

### Solution Approach
Instead of constructing JWKS URLs by appending to the issuer URI, we
now:
1. Properly leverage the `jwks_uri` from the OIDC discovery metadata
2. Honor the identity provider's actual configured endpoint

```python
# Before (fragile approach)
jwks_url = f"{self.issuer}/.well-known/jwks.json"

# After (standards-compliant)
jwks_cli = jwt.PyJWKClient(self.jwks_uri)  # Use discovered endpoint
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-10 10:16:58 +08:00
Liu An
968ffc7ef3
Refa: dataset operations to simplify error handling (#8132)
### What problem does this PR solve?

- Consolidate database operations within single try-except blocks in the
methods

### Type of change

- [x] Refactoring
2025-06-09 13:29:56 +08:00
Liu An
92625e1ca9
Fix: document typo in test (#8091)
### What problem does this PR solve?

fix document typo in test

### Type of change

- [x] Typo
2025-06-05 19:03:46 +08:00
Stephen Hu
6953ae89c4
Fix:when stream=false,new message without sessionid does no (#8078)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/8070

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 15:14:15 +08:00