3285 Commits

Author SHA1 Message Date
Stephen Hu
ce65ea1fc1
Fix: Change allocate_container_blocking Calculate Time by async time (#8206)
### What problem does this PR solve?

Change allocate_container_blocking Calculate Time by async time

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-13 14:05:11 +08:00
writinwaters
2341939376
Docs: Miscellaneous editorial updates (#8237)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2025-06-13 09:46:24 +08:00
balibabu
a9d9215547
Feat: Connect conditional operators to other operators #3221 (#8231)
### What problem does this PR solve?

Feat: Connect conditional operators to other operators #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-13 09:30:34 +08:00
Liu An
99725444f1
Fix: desc parameter parsing (#8229)
### What problem does this PR solve?

- Fix boolean parsing for 'desc' parameter in kb_app.py to properly
handle string values

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 19:17:47 +08:00
Stephen Hu
1ab0f52832
Fix:The OpenAI-Compatible Agent API returns an incorrect message (#8177)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8175

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 19:17:15 +08:00
Yongteng Lei
24ca4cc6b7
Refa: GraphRAG and explaining GraphRAG stalling behavior on large files (#8223)
### What problem does this PR solve?

This PR investigates the cause of #7957.

TL;DR: Incorrect similarity calculations lead to too many candidates.
Since candidate selection involves interaction with the LLM, this causes
significant delays in the program.

What this PR does:

1. **Fix similarity calculation**:
When processing a 64 pages government document, the corrected similarity
calculation reduces the number of candidates from over 100,000 to around
16,000. With a default batch size of 100 pairs per LLM call, this fix
reduces unnecessary LLM interactions from over 1,000 calls to around
160, a roughly 10x improvement.
2. **Add concurrency and timeout limits**: 
Up to 5 entity types are processed in "parallel", each with a 180-second
timeout. These limits may be configurable in future updates.
3. **Improve logging**:
The candidate resolution process now reports progress in real time.
4. **Mitigates potential concurrency risks**


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-06-12 19:09:50 +08:00
Kevin Hu
d36c8d18b1
Refa: make exception more clear. (#8224)
### What problem does this PR solve?

#8156

### Type of change
- [x] Refactoring
2025-06-12 17:53:59 +08:00
Liu An
86a1411b07
Refa: Test configs (#8220)
### What problem does this PR solve?

- Move common constants (HOST_ADDRESS, INVALID_API_TOKEN, etc.) to
configs.py
- Update test imports to use centralized configs
- Clean up duplicate constant definitions across test files

This improves maintainability by centralizing configuration.

### Type of change

- [x] Refactoring test case
2025-06-12 17:42:00 +08:00
Liu An
54a465f9e8
Test: fix chunk deletion test assertions (#8222)
### What problem does this PR solve?

- Fix test assertions in test_delete_chunks.py to expect empty results
after deletion

Action 7619

### Type of change

- [x] Bug Fix test cases
2025-06-12 17:41:46 +08:00
balibabu
bf7f7c7027
Feat: Display the connection lines between multiple conditions of the conditional operator #3221 (#8218)
### What problem does this PR solve?

Feat: Display the connection lines between multiple conditions of the
conditional operator #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-12 17:11:24 +08:00
Liu An
7fbbc9650d
Fix: Move pagerank field from create to update dataset API (#8217)
### What problem does this PR solve?

- Remove pagerank from CreateDatasetReq and add to UpdateDatasetReq
- Add pagerank update logic in dataset update endpoint
- Update API documentation to reflect changes
- Modify related test cases and SDK references

#8208

This change makes pagerank a mutable property that can only be set after
dataset creation, and only when using elasticsearch as the doc engine.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 15:47:49 +08:00
Liu An
d0c5ff04a6
Fix: Add pagerank validation for non-elasticsearch doc engines (#8215)
### What problem does this PR solve?

Validate that pagerank updates are only allowed when using elasticsearch
as the document engine. Return an error if pagerank is set while using a
different doc engine, preventing potential inconsistencies in document
scoring.

#8208

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 15:47:22 +08:00
Kevin Hu
d5236b71f4
Refa: ollama keep alive issue. (#8216)
### What problem does this PR solve?

#8122

### Type of change

- [x] Refactoring
2025-06-12 15:09:40 +08:00
Stephen Hu
e7c85e569b
Fix: Improve TS Warning For http_api_reference.md (#8172)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8157
The current master code should work fine, but hI ave some warnings, so I
added a declare to improve the warning

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 14:20:15 +08:00
balibabu
84b4e32c34
Feat: The value selected in the Select component only displays the icon #3221 (#8209)
### What problem does this PR solve?
Feat: The value selected in the Select component only displays the icon
#3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-12 12:31:57 +08:00
Kevin Hu
56ee69e9d9
Refa: chat with tools. (#8210)
### What problem does this PR solve?


### Type of change
- [x] Refactoring
2025-06-12 12:31:10 +08:00
africa-worker
44287fb05f
Oss support opendal(including mysql) (#8204)
### What problem does this PR solve?

#8074
Oss support opendal(including mysql)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-12 11:37:42 +08:00
Liu An
cef587abc2
Fix: Add validation for dataset name in KB update API (#8194)
### What problem does this PR solve?

Validate dataset name in knowledge base update endpoint to ensure:
- Name is a non-empty string
- Name length doesn't exceed DATASET_NAME_LIMIT
- Whitespace is trimmed before processing

Prevents invalid dataset names from being saved and provides clear error
messages.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 11:37:25 +08:00
Yongteng Lei
1a5f991d86
Fix: auto-keyword and auto-question fail with qwq model (#8190)
### What problem does this PR solve?

Fix auto-keyword and auto-question fail with qwq model. #8189 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 11:37:07 +08:00
balibabu
713b574c9d
Feat: Add SwitchForm component #3221 (#8200)
### What problem does this PR solve?

Feat: Add SwitchForm component #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-12 09:50:25 +08:00
Liu An
60c1bf5a19
Fix: duplicate knowledgebase name validation logic (#8199)
### What problem does this PR solve?

Change the condition from checking for >1 to >=1 when validating
duplicate knowledgebase names to properly catch all duplicates. This
ensures no two knowledgebases can have the same name for a tenant.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 09:46:57 +08:00
writinwaters
d331866a12
Docs: Miscellaneous (#8198)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2025-06-12 09:42:07 +08:00
Kevin Hu
69e1fc496d
Refa: chat models (#8187)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2025-06-11 17:20:12 +08:00
Liu An
e87ad8126c
Fix: Improve dataset name validation in KB app (#8188)
### What problem does this PR solve?

- Trim whitespace before checking for empty dataset names
- Change length check from >= to > DATASET_NAME_LIMIT for consistency

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-11 16:14:29 +08:00
Yongteng Lei
5e30426916
Feat: add Qwen3-Embedding text-embedding-v4 (#8184)
### What problem does this PR solve?

Add Qwen3-Embedding text-embedding-v4.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-11 15:32:05 +08:00
Liu An
6aff3e052a
Test: Refactor test fixtures to use HttpApiAuth naming consistently (#8180)
### What problem does this PR solve?

- Rename `api_key` fixture to `HttpApiAuth` across all test files
- Update all dependent fixtures and test cases to use new naming
- Maintain same functionality while improving naming clarity

The rename better reflects the fixture's purpose as an HTTP API
authentication helper rather than just an API key.

### Type of change

- [x] Refactoring
2025-06-11 14:25:40 +08:00
Liu An
f29d9fa3f9
Test: fix test cases and improve document parsing validation (#8179)
### What problem does this PR solve?

- Update chat assistant tests to use dataset.id directly in payloads
- Enhance document parsing tests with better condition checking
- Add explicit type hints and improve timeout handling

Action_7556

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-11 14:25:30 +08:00
balibabu
31003cd5f6
Feat: Display the agent node running timeline #3221 (#8185)
### What problem does this PR solve?

Feat: Display the agent node running timeline #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-11 14:24:43 +08:00
balibabu
f0a3d91171
Feat: Display agent operator call log #3221 (#8169)
### What problem does this PR solve?

Feat: Display agent operator call log #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-11 09:22:07 +08:00
cwr31
e6d36f3a3a
Improve image rotation logic for text recognition (#8167)
### What problem does this PR solve?

Enhanced the image rotation handling by evaluating the original
orientation, clockwise 90°, and counter-clockwise 90° rotations. The
image with the highest text recognition score is now selected, improving
accuracy for text detection in images with aspect ratios >= 1.5.

#8166

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: wenrui.cao <wenrui.cao@univers.com>
2025-06-11 09:20:30 +08:00
writinwaters
c8269206d7
Docs: UI updates (#8170)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2025-06-11 09:17:30 +08:00
Gifford Nowland
ab67292aa3
fix: silence deprecation in huggingface snapshot_download function (#8150)
### What problem does this PR solve?

fixes the following deprecation emitted from `download_deps.py`: 

```
UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-10 21:00:03 +08:00
writinwaters
4f92af3cd4
Docs: Updated Auto-question Auto-keyword (#8168)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
2025-06-10 19:38:28 +08:00
Liu An
a43adafc6b
Refa: Add error handling for JSON decode in embedding models (#8162)
### What problem does this PR solve?

Improve robustness of Jina, Nvidia, and SILICONFLOW embedding models by:
1. Adding try-catch blocks for JSON decode errors
2. Logging error details including response content
3. Raising exceptions with meaningful error messages

### Type of change

- [x] Refactoring
2025-06-10 19:04:17 +08:00
balibabu
c5e4684b44
Feat: Let system variables appear in operator prompts #3221 (#8154)
### What problem does this PR solve?
Feat: Let system variables appear in operator prompts #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-10 17:06:30 +08:00
Liu An
3a34def55f
Test: Migrate test workflow to use top-level test directory (#8145)
### What problem does this PR solve?

- Replace manual venv activation with `uv run` for pytest commands
- Add dynamic test level (p2/p3) based on GitHub event type
- Simplify test commands by removing redundant directory changes

### Type of change

- [x] Update Action
2025-06-10 13:55:26 +08:00
Stephen Hu
e6f68e1ccf
Fix: When List Kbs some times the total is wrong (#8151)
### What problem does this PR solve?
for kb.app list method when owner_ids the total calculate is wrong (now
will base on the paged result to calculate total)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-10 11:34:30 +08:00
Jacky Wu
60ab7027c0
fix: allow to do role auth for S3 bucket use. (#8149)
### What problem does this PR solve?

Close #8148 .

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-10 10:50:07 +08:00
balibabu
08f2223a6a
Feat: Constructing query parameter options for the Retrieval operator #3221 (#8152)
### What problem does this PR solve?

Feat: Constructing query parameter options for the Retrieval operator
#3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-10 10:49:41 +08:00
yurhett
9c6c6c51e0
Fix: use jwks_uri from OIDC metadata for JWKS client (#8136)
### What problem does this PR solve?
Issue: #8051

The current implementation assumes JWKS endpoints follow the standard
`/.well-known/jwks.json` convention. This breaks authentication for OIDC
providers that use non-standard JWKS paths, resulting in 404 errors
during token validation.

Root Cause Analysis
- The OpenID Connect specification doesn't mandate a fixed path for JWKS
endpoints
- Some identity providers (like certain Keycloak configurations) use
custom endpoints
- Our previous approach constructed JWKS URLs by convention rather than
discovery

### Solution Approach
Instead of constructing JWKS URLs by appending to the issuer URI, we
now:
1. Properly leverage the `jwks_uri` from the OIDC discovery metadata
2. Honor the identity provider's actual configured endpoint

```python
# Before (fragile approach)
jwks_url = f"{self.issuer}/.well-known/jwks.json"

# After (standards-compliant)
jwks_cli = jwt.PyJWKClient(self.jwks_uri)  # Use discovered endpoint
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-10 10:16:58 +08:00
HaiyangP
baf32ee461
Display only the duplicate column names and corresponding original source. (#8138)
### What problem does this PR solve?
This PR aims to slove #8120 which request a better error display of
duplicate column names.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-10 10:16:38 +08:00
balibabu
8fb6b5d945
Feat: Add agent operator node from agent form #3221 (#8144)
### What problem does this PR solve?

Feat: Add agent operator node from agent form #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-09 19:19:48 +08:00
Liu An
5cc2eda362
Test: Refactor test fixtures and add SDK session management tests (#8141)
### What problem does this PR solve?

- Consolidate HTTP API test fixtures using batch operations
(batch_add_chunks, batch_create_chat_assistants)
- Fix fixture initialization order in clear_session_with_chat_assistants
- Add new SDK API test suite for session management
(create/delete/list/update)

### Type of change

- [x] Add test cases
- [x] Refactoring
2025-06-09 18:13:26 +08:00
balibabu
9a69d5f367
Feat: Display chat content on the agent page #3221 (#8140)
### What problem does this PR solve?

Feat: Display chat content on the agent page #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-09 18:13:06 +08:00
balibabu
d9b98cbb18
Feat: Convert the prompt field of the agent operator to an array #3221 (#8137)
### What problem does this PR solve?

Feat: Convert the prompt field of the agent operator to an array #3221

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2025-06-09 16:02:33 +08:00
Kevin Hu
24625e0695
Fix: presentation of PDF using vlm. (#8133)
### What problem does this PR solve?

#8109

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-09 15:01:52 +08:00
Liu An
4649accd54
Test: Add SDK API tests for chat assistant management and improve con… (#8131)
### What problem does this PR solve?

- Implement new SDK API test cases for chat assistant CRUD operations
- Enhance HTTP API concurrent tests to use as_completed for better
reliability

### Type of change

- [x] Add test cases
- [x] Refactoring
2025-06-09 13:30:12 +08:00
Liu An
968ffc7ef3
Refa: dataset operations to simplify error handling (#8132)
### What problem does this PR solve?

- Consolidate database operations within single try-except blocks in the
methods

### Type of change

- [x] Refactoring
2025-06-09 13:29:56 +08:00
Stephen Hu
2337bbf6ca
Perf: pass useless check for tidy graph (#8121)
### What problem does this PR solve?
Support passing the attribute check when the upstream has already made
sure it.

### Type of change
- [X] Performance Improvement
2025-06-09 11:44:13 +08:00
Liu An
ad1f89fea0
Fix: chat module update LLM defaults (#8125)
### What problem does this PR solve?

Previously when LLM.model_name was not configured:
- System incorrectly defaulted to 'deepseek-chat' model
- This caused permission errors for unauthorized tenants

Now:
- Use tenant's default chat_model configuration first

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-09 11:44:02 +08:00