136 Commits

Author SHA1 Message Date
Stephen Hu
1409bb30df
Refactor:Improve the logic so that it does not decode base 64 for the test image each time (#9264)
### What problem does this PR solve?

Improve the logic so that it does not decode base 64 for the test image
each time

### Type of change

- [x] Refactoring
- [x] Performance Improvement

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-08-06 11:42:25 +08:00
Yongteng Lei
e6bad45c6d
Fix: update broken agent OpenAI-Compatible completion due to v0.20.0 changes (#9241)
### What problem does this PR solve?

Update broken agent OpenAI-Compatible completion due to v0.20.0. #9199 

Usage example:

**Referring the input is important, otherwise, will result in empty
output.**

<img width="1273" height="711" alt="Image"
src="https://github.com/user-attachments/assets/30740be8-f4d6-400d-9fda-d2616f89063f"
/>

<img width="622" height="247" alt="Image"
src="https://github.com/user-attachments/assets/0a2ca57a-9600-4cec-9362-0cafd0ab3aee"
/>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-05 17:47:25 +08:00
Stephen Hu
45bf294117
Refactor: support config strong test (#9198)
### What problem does this PR solve?


https://github.com/infiniflow/ragflow/issues/9189#issuecomment-3148920950

### Type of change
- [x] Refactoring

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-08-04 13:54:18 +08:00
Kevin Hu
30e9212db9
Fix: enlarge the timeout limits. (#9201)
### What problem does this PR solve?

#9189

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-08-04 13:34:34 +08:00
Kevin Hu
d9fe279dde
Feat: Redesign and refactor agent module (#9113)
### What problem does this PR solve?

#9082 #6365

<u> **WARNING: it's not compatible with the older version of `Agent`
module, which means that `Agent` from older versions can not work
anymore.**</u>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 19:41:09 +08:00
Liu An
ffff5c2e8c
Refa: Update base64 test image with new sample data (#9115)
### What problem does this PR solve?

Replace the placeholder test image in base64_image.py with a new sample
image data string.

### Type of change

- [x] Refactoring
2025-07-30 14:34:26 +08:00
kuschzzp
b638d3f773
Image validation of the image2text model without using local paths (#9052)
### What problem does this PR solve?

#9050

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-30 12:57:24 +08:00
Yongteng Lei
39ef2ffba9
Feat: parsing supports jsonl or ldjson format (#9087)
### What problem does this PR solve?

Supports jsonl or ldjson format. Feature request from
[discussion](https://github.com/orgs/infiniflow/discussions/8774).

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 09:48:20 +08:00
Liu An
b5ffca332a
Refa: validation utils to use Pydantic v2 style models (#9037)
### What problem does this PR solve?

- Update BaseModel to use model_config instead of Config class
- Replace StrEnum with Literal types for method fields
- Convert Field declarations to Annotated style

### Type of change

- [x] Refactoring
2025-07-25 12:16:45 +08:00
Gifford Nowland
34c35cf8ae
fix: obfuscate additional server secrets values (#9014)
### What problem does this PR solve?

Obfuscates additional secrets values on ragflow_server startup to
prevent leakage:
* `secret` (azure)
* `client_secret` (oauth)
* `http_secret_key` (authentication)
* `sas_token` (azure)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Gifford R Nowland <gifford.r.nowland@aero.org>
2025-07-24 10:16:23 +08:00
Liu An
b4b6d296ea
Fix: Increase timeouts for document parsing and model checks (#8996)
### What problem does this PR solve?

- Extended embedding model timeout from 3 to 10 seconds in api_utils.py
- Added more time for large file batches and concurrent parsing
operations to prevent test flakiness
- Import from #8940
- https://github.com/infiniflow/ragflow/actions/runs/16422052652

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-23 15:08:36 +08:00
Liu An
0020c50000
Fix: Refactor parser config handling and add GraphRAG defaults (#8778)
### What problem does this PR solve?

- Update `get_parser_config` to merge provided configs with defaults
- Add GraphRAG configuration defaults for all chunk methods
- Make raptor and graphrag fields non-nullable in ParserConfig schema
- Update related test cases to reflect config changes
- Ensure backward compatibility while adding new GraphRAG support
- #8396

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-23 09:29:37 +08:00
Kevin Hu
c783d90ba3
Perf: set timeout for building chunks. (#8940)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2025-07-21 15:56:45 +08:00
Kevin Hu
ab53a73768
Perf: limit embedding in KG. (#8917)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2025-07-18 19:51:14 +08:00
Kevin Hu
9767c26535
Fix: wrong parameters. (#8900)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-17 18:19:13 +08:00
Kevin Hu
ecdb1701df
Perf: test llm before RAPTOR. (#8897)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2025-07-17 16:48:50 +08:00
Kevin Hu
fbd115773b
Perf: set timeout of some steps in KG. (#8873)
### What problem does this PR solve?

### Type of change


- [x] Performance Improvement
2025-07-16 18:06:03 +08:00
Kevin Hu
24c41d2a61
Perf: make do_cancel quicker. (#8846)
### What problem does this PR solve?

### Type of change

- [x] Performance Improvement
2025-07-15 14:35:00 +08:00
Kevin Hu
c642dbefca
Perf: Enhance timeout handling. (#8826)
### What problem does this PR solve?


### Type of change

- [x] Performance Improvement
2025-07-15 09:36:45 +08:00
Yongteng Lei
237e59532b
Feat: refine create and list operations for MCP dashboard (#8823)
### What problem does this PR solve?

Refine MCP dashboard create and list operations.

### Type of change

- [x] Refactoring
2025-07-14 14:36:56 +08:00
Yongteng Lei
72c19b44c3
Refa: better MIME content type (#8801)
### What problem does this PR solve?

Better uniform MIME content type.

### Type of change

- [x] Refactoring
2025-07-11 18:47:19 +08:00
Liu An
f8524462b0 Fix: Increase default chunk_token_num from 128 to 512 in parser config (#8753)
### What problem does this PR solve?

Updated the default `chunk_token_num` value in `api_utils.py` and
`validation_utils.py` to 512 to accommodate larger text chunks. Adjusted
corresponding test cases in HTTP and SDK API tests to reflect this
change.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-10 09:53:20 +08:00
Kevin Hu
e3edcc3064
Trivals. (#8597)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-01 14:05:18 +08:00
Yongteng Lei
0eb90e73a5
Feat: add MCP dashboard functionalities list_tools and test_tool (#8505)
### What problem does this PR solve?

Add MCP dashboard functionalities list_tools and test_tool.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-26 13:52:01 +08:00
Liu An
dac5bcdf17
Fix: Enforce default embedding model in create_dataset / update_dataset (#8486)
### What problem does this PR solve?

Previous:
- Defaulted to hardcoded model 'BAAI/bge-large-zh-v1.5@BAAI'
- Did not respect user-configured default embedding_model

Now:
- Correctly prioritizes user-configured default embedding_model

Other:
- Make embedding_model optional in CreateDatasetReq with proper None
handling
- Add default embedding model fallback in dataset update when empty
- Enhance validation utils to handle None values and string
normalization
- Update SDK default embedding model to None to match API changes
- Adjust related test cases to reflect new validation rules

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-25 16:41:32 +08:00
Yongteng Lei
af6850c8d8
Feat: add MCP dashboard operations (#8460)
### What problem does this PR solve?

Add MCP server dashboard operations.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-25 09:26:04 +08:00
Stephen Hu
794a4102c2
Fix: Document parse via API will alot problen (#8407)
### What problem does this PR solve?
#8391
#8404

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-23 13:08:11 +08:00
Jin Hai
4a2ff633e0
Fix typo in code (#8327)
### What problem does this PR solve?

Fix typo in code

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-06-18 09:41:09 +08:00
Kevin Hu
d36c8d18b1
Refa: make exception more clear. (#8224)
### What problem does this PR solve?

#8156

### Type of change
- [x] Refactoring
2025-06-12 17:53:59 +08:00
Liu An
7fbbc9650d
Fix: Move pagerank field from create to update dataset API (#8217)
### What problem does this PR solve?

- Remove pagerank from CreateDatasetReq and add to UpdateDatasetReq
- Add pagerank update logic in dataset update endpoint
- Update API documentation to reflect changes
- Modify related test cases and SDK references

#8208

This change makes pagerank a mutable property that can only be set after
dataset creation, and only when using elasticsearch as the doc engine.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 15:47:49 +08:00
Liu An
92625e1ca9
Fix: document typo in test (#8091)
### What problem does this PR solve?

fix document typo in test

### Type of change

- [x] Typo
2025-06-05 19:03:46 +08:00
Stephen Hu
f819378fb0
Update api_utils.py (#8069)
### What problem does this PR solve?


https://github.com/infiniflow/ragflow/issues/8059#issuecomment-2942407486
lazy throw exception to better support custom embedding model

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-05 12:05:58 +08:00
liu an
fed1221302
Refa: HTTP API list datasets / test cases / docs (#7720)
### What problem does this PR solve?

This PR introduces Pydantic-based validation for the list datasets HTTP
API, improving code clarity and robustness. Key changes include:

Pydantic Validation
Error Handling
Test Updates
Documentation Updates

### Type of change

- [x] Documentation Update
- [x] Refactoring
2025-05-20 09:58:26 +08:00
Yongteng Lei
0ebf05440e
Feat: repair corrupted PDF files on upload automatically (#7693)
### What problem does this PR solve?

Try the best to repair corrupted PDF files on upload automatically.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-05-19 14:54:06 +08:00
liu an
ae8b628f0a
Refa: HTTP API delete dataset / test cases / docs (#7657)
### What problem does this PR solve?

This PR introduces Pydantic-based validation for the delete dataset HTTP
API, improving code clarity and robustness. Key changes include:

1. Pydantic Validation
2. Error Handling
3. Test Updates
4. Documentation Updates

### Type of change

- [x] Documentation Update
- [x] Refactoring
2025-05-16 10:16:43 +08:00
liu an
f8cc557892
Fix(api): correct default value handling in dataset parser config (#7589)
### What problem does this PR solve?

Fix  HTTP API Create/Update dataset parser config default value error

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-05-12 19:39:18 +08:00
liu an
35e36cb945
Refa: HTTP API update dataset / test cases / docs (#7564)
### What problem does this PR solve?

This PR introduces Pydantic-based validation for the update dataset HTTP
API, improving code clarity and robustness. Key changes include:
1. Pydantic Validation
2. ​​Error Handling
3. Test Updates
4. Documentation Updates
5. fix bug: #5915

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
- [x] Refactoring
2025-05-09 19:17:08 +08:00
liu an
c98933499a
refa: Optimize create dataset validation (#7451)
### What problem does this PR solve?

Optimize dataset validation and add function docs

### Type of change

- [x] Refactoring
2025-05-06 17:38:06 +08:00
liu an
fc379e90d1
Fix: change create dataset htto api delimiter default value to r'\n' (#7434)
### What problem does this PR solve?

change create dataset delimiter default value to r'\n'

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-30 17:43:42 +08:00
liu an
1f82889001
Fix: create dataset remove unnecessary parameter constraints (#7432)
### What problem does this PR solve?

Remove unnecessary parameter restrictions in dataset creation API

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-30 14:50:23 +08:00
liu an
78380fa181
Refa: http API create dataset and test cases (#7393)
### What problem does this PR solve?

This PR introduces Pydantic-based validation for the create dataset HTTP
API, improving code clarity and robustness. Key changes include:
1. Pydantic Validation
2. ​​Error Handling
3. Test Updates
4. Documentation

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
- [x] Refactoring
2025-04-29 16:53:57 +08:00
xiaosl-cell
969c596d4c
Fix: tenant_id spelling error. (#7331)
### What problem does this PR solve?

In the generate_confirmation_token method, a spelling error was found
with 'tenent_id'. The correct spelling should be 'tenant_id'.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: shengliang xiao <shengliangxiao2024@gmail.com>
2025-04-27 17:34:13 +08:00
Yongteng Lei
dc2c74b249
Feat: add primitive support for function calls (#6840)
### What problem does this PR solve?

This PR introduces ​**​primitive support for function calls​**​,
enabling the system to handle basic function call capabilities.
However, this feature is currently experimental and ​**​not yet enabled
for general use​**​, as it is only supported by a subset of models,
namely, Qwen and OpenAI models.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-08 16:09:03 +08:00
so95
cded812b97
Feat: add OpenAI compatible API for agent (#6329)
### What problem does this PR solve?
add openai agent
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-04-03 16:51:37 +08:00
liwenju0
efdfb39a33
Feat: Add Duplicate ID Check and Update Deletion Logic (#6376)
- Introduce the `check_duplicate_ids` function in `dataset.py` and
`doc.py` to check for and handle duplicate IDs.
- Update the deletion operation to ensure that when deleting datasets
and documents, error messages regarding duplicate IDs can be returned.
- Implement the `check_duplicate_ids` function in `api_utils.py` to
return unique IDs and error messages for duplicate IDs.


### What problem does this PR solve?

Close https://github.com/infiniflow/ragflow/issues/6234

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: wenju.li <wenju.li@deepctr.cn>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-03-21 14:05:17 +08:00
liu an
d16033dd2c
Fix: #5719 Added type check for parser_config (#6243)
### What problem does this PR solve?

Fix #5719 
Add data type validation for parser_config

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-18 18:40:06 +08:00
Kevin Hu
37f3486483
Fix: validation of readonly fields. (#6144)
### What problem does this PR solve?

#6104

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-17 12:22:49 +08:00
Kevin Hu
da3f279495
Fix: add the validation for parser_config. (#5755)
### What problem does this PR solve?

#5719

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-07 15:34:34 +08:00
Kevin Hu
ff35c140dc
Refa: remove dataset language and validate dataset name length. (#5707)
### What problem does this PR solve?

#5686
#5702

### Type of change

- [x] Refactoring
2025-03-06 17:08:28 +08:00
Zhichang Yu
c813c1ff4c
Made task_executor async to speedup parsing (#5530)
### What problem does this PR solve?

Made task_executor async to speedup parsing

### Type of change

- [x] Performance Improvement
2025-03-03 18:59:49 +08:00