170 Commits

Author SHA1 Message Date
Kevin Hu
d9fe279dde
Feat: Redesign and refactor agent module (#9113)
### What problem does this PR solve?

#9082 #6365

<u> **WARNING: it's not compatible with the older version of `Agent`
module, which means that `Agent` from older versions can not work
anymore.**</u>

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-30 19:41:09 +08:00
謝富祥
021e8b57ae
Fix: fix error 429 api rate limit when building knowledge graph for all chat model and Mistral embedding model (#9106)
### What problem does this PR solve?

fix error 429 api rate limit when building knowledge graph for all chat
model and Mistral embedding model.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-30 11:37:49 +08:00
Viktor Dmitriyev
b47dcc9108
Fix issue with keep_alive=-1 for ollama chat model by allowing a user to set an additional configuration option (#9017)
### What problem does this PR solve?

fix issue with `keep_alive=-1` for ollama chat model by allowing a user
to set an additional configuration option. It is no-breaking change
because it still uses a previous default value such as: `keep_alive=-1`

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [X] Performance Improvement
- [X] Other (please describe):
- Additional configuration option has been added to control behavior of
RAGFlow while working with ollama LLM
2025-07-24 11:20:14 +08:00
Yongteng Lei
7ebc1f0943
Feat: add model provider DeepInfra (#9003)
### What problem does this PR solve?

Add model provider DeepInfra. This model list comes from our community. 

NOTE: most endpoints haven't been tested, but they should work as OpenAI
does.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-23 18:10:35 +08:00
Liu An
9e45fcfdb3 Fix: fix typo in OpenAI error logging message (#8865)
### What problem does this PR solve?

Correct the logging message from "OpenAI cat_with_tools" to "OpenAI
chat_with_tools" in the `_exceptions` method of the `Base` class to
accurately reflect the method name and improve error traceability.

### Type of change

- [x] Typo
2025-07-16 15:31:57 +08:00
Yongteng Lei
1895667573
Feat: add xAI provider (#8781)
### What problem does this PR solve?

Add xAI provider (experimental feature, requires user feedback).

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-07-11 10:35:23 +08:00
Kevin Hu
8281ceb406
Refa: refine retry gap. (#8773)
### What problem does this PR solve?

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2025-07-10 14:28:57 +08:00
Yongteng Lei
f8a6987f1e
Refa: automatic LLMs registration (#8651)
### What problem does this PR solve?

Support automatic LLMs registration.

### Type of change

- [x] Refactoring
2025-07-03 19:05:31 +08:00
Kevin Hu
fffb7c0bba
Fix: anthropic llm issue. (#8633)
### What problem does this PR solve?

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-02 18:37:34 +08:00
Tuan Le
1c77b4ed9b
fix: Correctly format message parts in GoogleChat (#8596)
### What problem does this PR solve?

This PR addresses an incompatibility issue with the Google Chat API by
correcting the message content format in the `GoogleChat` class.
Previously, the content was directly assigned to the "parts" field,
which did not align with the API's expected format. This change ensures
that messages are properly formatted with a "text" key within a
dictionary, as required by the API.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-07-01 14:06:07 +08:00
Kevin Hu
aafeffa292
Feat: add gitee as LLM provider. (#8545)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-06-30 09:22:31 +08:00
Kevin Hu
e441c17c2c
Refa: limit embedding concurrency and fix chat_with_tool (#8543)
### What problem does this PR solve?

#8538

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2025-06-27 19:28:41 +08:00
Kevin Hu
a10f05f4d7
Fix: chat with tools bug. (#8528)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-27 12:10:53 +08:00
Rainman
340354b79c
fix the error 'Unknown field for GenerationConfig: max_tokens' when u… (#8473)
### What problem does this PR solve?
[https://github.com/infiniflow/ragflow/issues/8324](url)

docker image version: v0.19.1

The `_clean_conf` function was not implemented in the `_chat` and
`chat_streamly` methods of the `GeminiChat` class, causing the error
"Unknown field for GenerationConfig: max_tokens" when the default LLM
config includes the "max_tokens" parameter.

**Buggy Code(ragflow/rag/llm/chat_model.py)**
```python
class GeminiChat(Base):
    def __init__(self, key, model_name, base_url=None, **kwargs):
        super().__init__(key, model_name, base_url=base_url, **kwargs)

        from google.generativeai import GenerativeModel, client

        client.configure(api_key=key)
        _client = client.get_default_generative_client()
        self.model_name = "models/" + model_name
        self.model = GenerativeModel(model_name=self.model_name)
        self.model._client = _client

    def _clean_conf(self, gen_conf):
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p"]:
                del gen_conf[k]
        return gen_conf

    def _chat(self, history, gen_conf):
        from google.generativeai.types import content_types

        system = history[0]["content"] if history and history[0]["role"] == "system" else ""
        hist = []
        for item in history:
            if item["role"] == "system":
                continue
            hist.append(deepcopy(item))
            item = hist[-1]
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "role" in item and item["role"] == "system":
                item["role"] = "user"
            if "content" in item:
                item["parts"] = item.pop("content")

        if system:
            self.model._system_instruction = content_types.to_content(system)
        response = self.model.generate_content(hist, generation_config=gen_conf)
        ans = response.text
        return ans, response.usage_metadata.total_token_count

    def chat_streamly(self, system, history, gen_conf):
        from google.generativeai.types import content_types

        if system:
            self.model._system_instruction = content_types.to_content(system)
        #_clean_conf was not implemented 
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p", "max_tokens"]:
                del gen_conf[k]
        for item in history:
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "content" in item:
                item["parts"] = item.pop("content")
        ans = ""
        try:
            response = self.model.generate_content(history, generation_config=gen_conf, stream=True)
            for resp in response:
                ans = resp.text
                yield ans

            yield response._chunks[-1].usage_metadata.total_token_count
        except Exception as e:
            yield ans + "\n**ERROR**: " + str(e)

        yield 0
```
**Implement the _clean_conf function**
```python
class GeminiChat(Base):
    def __init__(self, key, model_name, base_url=None, **kwargs):
        super().__init__(key, model_name, base_url=base_url, **kwargs)

        from google.generativeai import GenerativeModel, client

        client.configure(api_key=key)
        _client = client.get_default_generative_client()
        self.model_name = "models/" + model_name
        self.model = GenerativeModel(model_name=self.model_name)
        self.model._client = _client

    def _clean_conf(self, gen_conf):
        for k in list(gen_conf.keys()):
            if k not in ["temperature", "top_p"]:
                del gen_conf[k]
        return gen_conf

    def _chat(self, history, gen_conf):
        from google.generativeai.types import content_types
        # implement _clean_conf to remove the wrong parameters
        gen_conf = self._clean_conf(gen_conf)

        system = history[0]["content"] if history and history[0]["role"] == "system" else ""
        hist = []
        for item in history:
            if item["role"] == "system":
                continue
            hist.append(deepcopy(item))
            item = hist[-1]
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "role" in item and item["role"] == "system":
                item["role"] = "user"
            if "content" in item:
                item["parts"] = item.pop("content")

        if system:
            self.model._system_instruction = content_types.to_content(system)
        response = self.model.generate_content(hist, generation_config=gen_conf)
        ans = response.text
        return ans, response.usage_metadata.total_token_count

    def chat_streamly(self, system, history, gen_conf):
        from google.generativeai.types import content_types
        # implement _clean_conf to remove the wrong parameters
        gen_conf = self._clean_conf(gen_conf)

        if system:
            self.model._system_instruction = content_types.to_content(system)
        #Removed duplicate parameter filtering logic "for k in list(gen_conf.keys()):"
        for item in history:
            if "role" in item and item["role"] == "assistant":
                item["role"] = "model"
            if "content" in item:
                item["parts"] = item.pop("content")
        ans = ""
        try:
            response = self.model.generate_content(history, generation_config=gen_conf, stream=True)
            for resp in response:
                ans = resp.text
                yield ans

            yield response._chunks[-1].usage_metadata.total_token_count
        except Exception as e:
            yield ans + "\n**ERROR**: " + str(e)

        yield 0
```

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-25 16:23:35 +08:00
Song Fuchang
fd7ac17605
Feat: Scratch MCP tool calling support. (#8263)
### What problem does this PR solve?

This is a cherry-pick from #7781 as requested.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-06-23 17:45:35 +08:00
Liu An
244d8a47b9
Fix: AzureChat model code (#8426)
### What problem does this PR solve?

- Simplify AzureChat constructor by passing base_url directly
- Clean up spacing and formatting in chat_model.py
- Remove redundant parentheses and improve code consistency
- #8423

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-23 15:59:25 +08:00
Stephen Hu
35034fed73
Fix: Raptor: [Bug]: **ERROR**: Unknown field for GenerationConfig: max_tokens (#8331)
### What problem does this PR solve?

https://github.com/infiniflow/ragflow/issues/8324

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-18 16:40:57 +08:00
Kevin Hu
b1117a8717
Fix: base url issue. (#8281)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-16 13:40:25 +08:00
Kevin Hu
d5236b71f4
Refa: ollama keep alive issue. (#8216)
### What problem does this PR solve?

#8122

### Type of change

- [x] Refactoring
2025-06-12 15:09:40 +08:00
Kevin Hu
56ee69e9d9
Refa: chat with tools. (#8210)
### What problem does this PR solve?


### Type of change
- [x] Refactoring
2025-06-12 12:31:10 +08:00
Yongteng Lei
1a5f991d86
Fix: auto-keyword and auto-question fail with qwq model (#8190)
### What problem does this PR solve?

Fix auto-keyword and auto-question fail with qwq model. #8189 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-12 11:37:07 +08:00
Kevin Hu
69e1fc496d
Refa: chat models (#8187)
### What problem does this PR solve?


### Type of change

- [x] Refactoring
2025-06-11 17:20:12 +08:00
Kevin Hu
156290f8d0
Fix: url path join issue. (#8013)
### What problem does this PR solve?

Close #7980

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-06-03 14:18:40 +08:00
Song Fuchang
a1f06a4fdc
Feat: Support tool calling in Generate component (#7572)
### What problem does this PR solve?

Hello, our use case requires LLM agent to invoke some tools, so I made a
simple implementation here.

This PR does two things:

1. A simple plugin mechanism based on `pluginlib`:

This mechanism lives in the `plugin` directory. It will only load
plugins from `plugin/embedded_plugins` for now.

A sample plugin `bad_calculator.py` is placed in
`plugin/embedded_plugins/llm_tools`, it accepts two numbers `a` and `b`,
then give a wrong result `a + b + 100`.

In the future, it can load plugins from external location with little
code change.

Plugins are divided into different types. The only plugin type supported
in this PR is `llm_tools`, which must implement the `LLMToolPlugin`
class in the `plugin/llm_tool_plugin.py`.
More plugin types can be added in the future.

2. A tool selector in the `Generate` component:

Added a tool selector to select one or more tools for LLM:


![image](https://github.com/user-attachments/assets/74a21fdf-9333-4175-991b-43df6524c5dc)

And with the `bad_calculator` tool, it results this with the `qwen-max`
model:


![image](https://github.com/user-attachments/assets/93aff9c4-8550-414a-90a2-1a15a5249d94)


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2025-05-16 16:32:19 +08:00
Kevin Hu
5b626870d0
Refa: remove ollama keep alive. (#7560)
### What problem does this PR solve?

#7518

### Type of change

- [x] Refactoring
2025-05-09 17:51:49 +08:00
Yongteng Lei
97a13ef1ab
Fix: Qwen-vl-plus url error (#7281)
### What problem does this PR solve?

Fix Qwen-vl-* url error. #7277

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-25 09:20:10 +08:00
Yongteng Lei
a008b38cf5
Fix: local variable referenced before assignment (#6909)
### What problem does this PR solve?

Fix: local variable referenced before assignment. #6803 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-04-09 20:29:12 +08:00
Yongteng Lei
dc2c74b249
Feat: add primitive support for function calls (#6840)
### What problem does this PR solve?

This PR introduces ​**​primitive support for function calls​**​,
enabling the system to handle basic function call capabilities.
However, this feature is currently experimental and ​**​not yet enabled
for general use​**​, as it is only supported by a subset of models,
namely, Qwen and OpenAI models.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-04-08 16:09:03 +08:00
Zhichang Yu
e7a2a4b7ff
Log llm response on exception (#6750)
### What problem does this PR solve?

Log llm response on exception

### Type of change

- [x] Refactoring
2025-04-02 17:10:57 +08:00
Alex Chen
46b5e32cd7
Feat: support vision llm for gpustack (#6636)
### What problem does this PR solve?
https://github.com/infiniflow/ragflow/issues/6138

This PR is going to support vision llm for gpustack, modify url path
from `/v1-openai` to `/v1`

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-03-31 15:33:52 +08:00
Marcus Yuan
c61df5dd25
Dynamic Context Window Size for Ollama Chat (#6582)
# Dynamic Context Window Size for Ollama Chat

## Problem Statement
Previously, the Ollama chat implementation used a fixed context window
size of 32768 tokens. This caused two main issues:
1. Performance degradation due to unnecessarily large context windows
for small conversations
2. Potential business logic failures when using smaller fixed sizes
(e.g., 2048 tokens)

## Solution
Implemented a dynamic context window size calculation that:
1. Uses a base context size of 8192 tokens
2. Applies a 1.2x buffer ratio to the total token count
3. Adds multiples of 8192 tokens based on the buffered token count
4. Implements a smart context size update strategy

## Implementation Details

### Token Counting Logic
```python
def count_tokens(text):
    """Calculate token count for text"""
    # Simple calculation: 1 token per ASCII character
    # 2 tokens for non-ASCII characters (Chinese, Japanese, Korean, etc.)
    total = 0
    for char in text:
        if ord(char) < 128:  # ASCII characters
            total += 1
        else:  # Non-ASCII characters
            total += 2
    return total
```

### Dynamic Context Calculation
```python
def _calculate_dynamic_ctx(self, history):
    """Calculate dynamic context window size"""
    # Calculate total tokens for all messages
    total_tokens = 0
    for message in history:
        content = message.get("content", "")
        content_tokens = count_tokens(content)
        role_tokens = 4  # Role marker token overhead
        total_tokens += content_tokens + role_tokens

    # Apply 1.2x buffer ratio
    total_tokens_with_buffer = int(total_tokens * 1.2)
    
    # Calculate context size in multiples of 8192
    if total_tokens_with_buffer <= 8192:
        ctx_size = 8192
    else:
        ctx_multiplier = (total_tokens_with_buffer // 8192) + 1
        ctx_size = ctx_multiplier * 8192
    
    return ctx_size
```

### Integration in Chat Method
```python
def chat(self, system, history, gen_conf):
    if system:
        history.insert(0, {"role": "system", "content": system})
    if "max_tokens" in gen_conf:
        del gen_conf["max_tokens"]
    try:
        # Calculate new context size
        new_ctx_size = self._calculate_dynamic_ctx(history)
        
        # Prepare options with context size
        options = {
            "num_ctx": new_ctx_size
        }
        # Add other generation options
        if "temperature" in gen_conf:
            options["temperature"] = gen_conf["temperature"]
        if "max_tokens" in gen_conf:
            options["num_predict"] = gen_conf["max_tokens"]
        if "top_p" in gen_conf:
            options["top_p"] = gen_conf["top_p"]
        if "presence_penalty" in gen_conf:
            options["presence_penalty"] = gen_conf["presence_penalty"]
        if "frequency_penalty" in gen_conf:
            options["frequency_penalty"] = gen_conf["frequency_penalty"]
            
        # Make API call with dynamic context size
        response = self.client.chat(
            model=self.model_name,
            messages=history,
            options=options,
            keep_alive=60
        )
        return response["message"]["content"].strip(), response.get("eval_count", 0) + response.get("prompt_eval_count", 0)
    except Exception as e:
        return "**ERROR**: " + str(e), 0
```

## Benefits
1. **Improved Performance**: Uses appropriate context windows based on
conversation length
2. **Better Resource Utilization**: Context window size scales with
content
3. **Maintained Compatibility**: Works with existing business logic
4. **Predictable Scaling**: Context growth in 8192-token increments
5. **Smart Updates**: Context size updates are optimized to reduce
unnecessary model reloads

## Future Considerations
1. Fine-tune buffer ratio based on usage patterns
2. Add monitoring for context window utilization
3. Consider language-specific token counting optimizations
4. Implement adaptive threshold based on conversation patterns
5. Add metrics for context size update frequency

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-03-28 12:38:27 +08:00
Kevin Hu
d2043ff9f2
Fix: LmStudioChat issue. (#6591)
### What problem does this PR solve?

#6577

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-27 14:59:15 +08:00
Yongteng Lei
df3890827d
Refa: change LLM chat output from full to delta (incremental) (#6534)
### What problem does this PR solve?

Change LLM chat output from full to delta (incremental)

### Type of change

- [x] Refactoring
2025-03-26 19:33:14 +08:00
Kevin Hu
12ad746ee6
Fix: Bedrock model invocation error. (#6533)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-26 11:27:12 +08:00
Kevin Hu
095fc84cf2
Fix: claude max tokens. (#6484)
### What problem does this PR solve?

#6458

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-25 10:41:55 +08:00
Kevin Hu
85eb3775d6
Refa: update Anthropic models. (#6445)
### What problem does this PR solve?

#6421

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-24 12:34:57 +08:00
fansir
efc4796f01
Fix ratelimit errors during document parsing (#6413)
### What problem does this PR solve?

When using the online large model API knowledge base to extract
knowledge graphs, frequent Rate Limit Errors were triggered,
causing document parsing to fail. This commit fixes the issue by
optimizing API calls in the following way:
Added exponential backoff and jitter to the API call to reduce the
frequency of Rate Limit Errors.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2025-03-22 23:07:03 +08:00
Kevin Hu
a2a4bfe3e3
Fix: change ollama default num_ctx. (#6395)
### What problem does this PR solve?

#6163

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-21 16:22:03 +08:00
Kevin Hu
e9a6675c40
Fix: enable ollama api-key. (#6205)
### What problem does this PR solve?

#6189

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-18 13:37:34 +08:00
Kevin Hu
7e4d693054
Fix: in case response.choices[0].message.content is None. (#6190)
### What problem does this PR solve?

#6164

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-03-18 10:00:27 +08:00
writinwaters
9c8060f619
0.17.1 release notes (#6021)
### What problem does this PR solve?



### Type of change

- [x] Documentation Update
2025-03-13 14:43:24 +08:00
Kevin Hu
3571270191
Refa: refine the context window size warning. (#5993)
### What problem does this PR solve?


### Type of change
- [x] Refactoring
2025-03-12 19:40:54 +08:00
kuro5989
6e13922bdc
Feat: Add qwq model support to Tongyi-Qianwen factory (#5981)
### What problem does this PR solve?

add qwq model support to Tongyi-Qianwen factory
https://github.com/infiniflow/ragflow/issues/5869

### Type of change

- [x] New Feature (non-breaking change which adds functionality)


![image](https://github.com/user-attachments/assets/49f5c6a0-ecaf-41dd-a23a-2009f854d62c)


![image](https://github.com/user-attachments/assets/93ffa303-920e-4942-8188-bcd6b7209204)


![1741774779438](https://github.com/user-attachments/assets/25f2fd1d-8640-4df0-9a08-78ee9daaa8fe)


![image](https://github.com/user-attachments/assets/4763cf6c-1f76-43c4-80ee-74dfd666a184)

Co-authored-by: zhaozhicheng <zhicheng.zhao@fastonetech.com>
2025-03-12 18:54:15 +08:00
Kevin Hu
251ba7f058
Refa: remove max tokens since no one needs it. (#5690)
### What problem does this PR solve?

#5646 #5640

### Type of change

- [x] Refactoring
2025-03-06 11:29:40 +08:00
Kevin Hu
955801db2e
Resolve super class invokation error. (#5337)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-25 17:42:29 +08:00
Kevin Hu
daddfc9e1b
Remove dup gb2312, solve currupt error. (#5326)
### What problem does this PR solve?

#5252 
#5325

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-25 12:22:37 +08:00
Kevin Hu
df3d0f61bd
Fix base url missing for deepseek from Tongyi. (#5294)
### What problem does this PR solve?


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-02-24 15:43:32 +08:00
Kevin Hu
ec96426c00
Tongyi adapts deepseek. (#5285)
### What problem does this PR solve?


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-02-24 14:04:25 +08:00
Omar Leonardo Sanchez Granados
4f2816c01c
Add support to boto3 default connection (#5246)
### What problem does this PR solve?
 
This pull request includes changes to the initialization logic of the
`ChatModel` and `EmbeddingModel` classes to enhance the handling of AWS
credentials.

Use cases:
- Use env variables for credentials instead of managing them on the DB 
- Easy connection when deploying on an AWS machine

### Type of change

- [X] New Feature (non-breaking change which adds functionality)
2025-02-24 11:01:14 +08:00
yrk111222
7ce675030b
Support downloading models from ModelScope Community. (#5073)
This PR supports downloading models from ModelScope. The main
modifications are as follows:
-New Feature (non-breaking change which adds functionality)
-Documentation Update

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-02-24 10:12:20 +08:00