# Customize Model and Provider

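Midscene uses the OpenAI SDK by default to call AI services, and you can customize its configuration through environment variables. The same settings can also be used in the [Chrome extension](./quick-experience).

The main settings are listed below. `OPENAI_API_KEY` is required.

Required:

```bash
# Replace with your own API key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```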

Optional:

```bash
# Optional: use a different base URL
export OPENAI_BASE_URL="https://..."

# Optional: specify the model name
export MIDSCENE_MODEL_NAME='qwen-vl-max-latest'

# Optional: change the initialization parameters of the SDK
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"baseURL":"....","defaultHeaders":{"key": "value"}}'

# Optional: use a proxy. Midscene uses `socks-proxy-agent` as the underlying library.
export MIDSCENE_OPENAI_SOCKS_PROXY="socks5://127.0.0.1:1080"

# Optional: specify max_tokens for the model
export OPENAI_MAX_TOKENS=2048
```
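A missing or misspelled variable tends to surface only as a failed request at run time. The sketch below is one possible way to check the settings up front. `check_midscene_env` is a hypothetical helper, not part of Midscene, and it assumes a POSIX-compatible shell.

```bash
# Hypothetical helper (not part of Midscene): check the variables
# described above before starting a run.
check_midscene_env() {
  # OPENAI_API_KEY is the only required variable.
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi

  # Report the optional variables so that typos are easy to spot.
  for name in OPENAI_BASE_URL MIDSCENE_MODEL_NAME OPENAI_MAX_TOKENS; do
    eval "value=\${$name:-}"   # indirect lookup of the variable called $name
    if [ -n "$value" ]; then
      echo "$name=$value"
    else
      echo "$name is not set (optional)"
    fi
  done
}
```

Calling `check_midscene_env` in the same shell session prints the optional values it finds and returns a non-zero status when the API key is missing.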

## Configuration for the Azure OpenAI service

Use the ADT token provider:

```bash
# Set to 1 when using the Azure OpenAI service
export MIDSCENE_USE_AZURE_OPENAI=1

export MIDSCENE_AZURE_OPENAI_SCOPE="https://cognitiveservices.azure.com/.default"
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```

Use keyless mode:

```bash
export MIDSCENE_USE_AZURE_OPENAI=1
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_KEY="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```

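## Choosing a model other than `gpt-4o`

We have found `gpt-4o` to be the best-performing model so far. Other models known to be supported are `claude-3-opus-20240229`, `gemini-1.5-pro`, `qwen-vl-max-latest` (Qwen), and `doubao-vision-pro-32k` (Doubao).

To use another model, follow these steps:

1. Choose a model that supports vision input (that is, a multimodal model).
2. Find out how to call it in a way that is compatible with the OpenAI SDK. Model providers usually offer such an endpoint; you need to configure `OPENAI_BASE_URL`, `OPENAI_API_KEY`, and `MIDSCENE_MODEL_NAME`.
3. If results are poor after switching to the new model, try shorter and clearer prompts (or roll back to the previous model). See [Prompting Tips](./prompting-tips) for details.
4. Comply with each model's terms of use.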
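The second step above can be wrapped in a small shell function, which makes it easier to switch between providers or roll back. This is only a sketch: `use_openai_compatible_model` is a hypothetical name, and clearing the Azure and Anthropic flags is an assumption that flags left over from an earlier session would otherwise select a different SDK.

```bash
# Hypothetical helper (not part of Midscene): point Midscene at any
# OpenAI-compatible endpoint.
# Usage: use_openai_compatible_model <base-url> <api-key> <model-name>
use_openai_compatible_model() {
  if [ "$#" -ne 3 ]; then
    echo "usage: use_openai_compatible_model <base-url> <api-key> <model-name>" >&2
    return 1
  fi

  # Assumption: flags left over from an earlier session would select the
  # Azure or Anthropic SDK instead, so clear them first.
  unset MIDSCENE_USE_AZURE_OPENAI MIDSCENE_USE_ANTHROPIC_SDK

  export OPENAI_BASE_URL="$1"
  export OPENAI_API_KEY="$2"
  export MIDSCENE_MODEL_NAME="$3"
}
```

For example, `use_openai_compatible_model "https://dashscope.aliyuncs.com/compatible-mode/v1" "sk-..." "qwen-vl-max-latest"` sets the same three variables as the Qwen example in this document.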

## Example: using the `qwen-vl-max-latest` model from Alibaba Cloud

Configure the environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
```

## Example: using the `claude-3-opus-20240229` model from Anthropic

When `MIDSCENE_USE_ANTHROPIC_SDK=1` is set, Midscene uses the Anthropic SDK (`@anthropic-ai/sdk`) to call the model.

Configure the environment variables:

```bash
export MIDSCENE_USE_ANTHROPIC_SDK=1
export ANTHROPIC_API_KEY="....."
export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
```

## Example: using the `gemini-1.5-pro` model from Google

Configure the environment variables:

```bash
export OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai"
export OPENAI_API_KEY="....."
export MIDSCENE_MODEL_NAME="gemini-1.5-pro"
```

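## Example: using the Doubao `doubao-vision-pro-32k` model from Volcano Engine

Before calling the model, you need to configure an inference endpoint: https://console.volcengine.com/ark/region:ark+cn-beijing/endpoint

Configure the environment variables: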

```bash
export OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="ep-202....."
```

## Troubleshooting LLM service connectivity issues

To debug connectivity issues with the LLM service, use the `connectivity-test` folder in the example project: [https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test](https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test)

Put your `.env` file in the `connectivity-test` folder, then run `npm i && npm run test` to see what goes wrong.
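If the variables are already exported in your shell, a `.env` file can be generated from them instead of being written by hand. The following is a minimal sketch: `write_midscene_env` is a hypothetical helper, it assumes the test project reads plain `KEY="value"` lines, and it does not escape values that themselves contain double quotes.

```bash
# Hypothetical helper (not part of Midscene): write the variables that are
# currently set into <target-directory>/.env.
# Usage: write_midscene_env <target-directory>
write_midscene_env() {
  target_dir="${1:-.}"
  env_file="$target_dir/.env"

  : > "$env_file"          # create or truncate the file
  chmod 600 "$env_file"    # the file holds an API key, so keep it private

  for name in OPENAI_API_KEY OPENAI_BASE_URL MIDSCENE_MODEL_NAME; do
    eval "value=\${$name:-}"   # indirect lookup of the variable called $name
    if [ -n "$value" ]; then
      printf '%s="%s"\n' "$name" "$value" >> "$env_file"
    fi
  done
}
```

After cloning the example project, `write_midscene_env ./connectivity-test` creates the file; variables that are unset are left out.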