# Config Model and Provider
Midscene uses the OpenAI SDK to call AI services. Using this SDK limits the input and output format of AI services, but it doesn't mean you can only use OpenAI's models. You can use any model service that supports the same interface (most platforms or tools support this).
In this article, we will show you how to configure the AI service provider and how to choose a different model. You may read [Choose a model](./choose-a-model) to learn more about how to choose a model.
## Configs
These are the most common configs, in which `OPENAI_API_KEY` is required.
| Name | Description |
|------|-------------|
| `OPENAI_API_KEY` | Required. Your OpenAI API key (e.g. "sk-abcdefghijklmnopqrstuvwxyz") |
| `OPENAI_BASE_URL` | Optional. Custom base URL for the API endpoint. Often used to switch to a provider other than OpenAI (e.g. "https://some_service_name.com/v1") |
| `MIDSCENE_MODEL_NAME` | Optional. Specify a different model name (default is gpt-4o). Often used to switch to a different model. |
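
Putting the common configs together, a minimal setup might look like this (the base URL and model name below are illustrative placeholders):

```bash
# Only OPENAI_API_KEY is strictly required; the other two are optional.
# Replace the values with your own provider's details.
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
export OPENAI_BASE_URL="https://some_service_name.com/v1"
export MIDSCENE_MODEL_NAME="gpt-4o"
```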
Config to use the `UI-TARS` model:

`UI-TARS` is a dedicated model for UI automation. See more details in [Choose a model](./choose-a-model).

| Name | Description |
|------|-------------|
| `MIDSCENE_USE_VLM_UI_TARS` | Optional. Set to "1" to use UI-TARS model. |
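
For example, if you have deployed UI-TARS behind an OpenAI-compatible endpoint, the config might look like this (the base URL, API key, and model name are placeholders for your own deployment):

```bash
# Point Midscene at a self-hosted UI-TARS deployment (values are placeholders).
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="ui-tars"
export MIDSCENE_USE_VLM_UI_TARS=1
```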
Some advanced configs are also supported. Usually you don't need to use them.
| Name | Description |
|------|-------------|
| `OPENAI_USE_AZURE` | Optional. Set to "true" to use Azure OpenAI Service. See more details in the following section. |
| `MIDSCENE_OPENAI_INIT_CONFIG_JSON` | Optional. Custom JSON config for OpenAI SDK initialization |
| `MIDSCENE_OPENAI_SOCKS_PROXY` | Optional. Proxy configuration (e.g. "socks5://127.0.0.1:1080") |
| `OPENAI_MAX_TOKENS` | Optional. Maximum tokens for model response |
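
As a sketch, the advanced configs are set the same way. The values here are illustrative only; `timeout` (in milliseconds) is one of the options accepted by the OpenAI SDK constructor:

```bash
# Illustrative values only; adjust to your own environment.
export MIDSCENE_OPENAI_SOCKS_PROXY="socks5://127.0.0.1:1080"
export OPENAI_MAX_TOKENS="2048"
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"timeout":60000}'
```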
## Two ways to configure environment variables

Pick one of the following ways to configure environment variables.

### 1. Set environment variables in your system
```bash
# replace by your own
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```
### 2. Set environment variables using dotenv

This is what we used in our [demo project](https://github.com/web-infra-dev/midscene-example).
[Dotenv](https://www.npmjs.com/package/dotenv) is a zero-dependency module that loads environment variables from a `.env` file into `process.env`.
```bash
# install dotenv
npm install dotenv --save
```
Create a `.env` file in your project root directory, and add the following content. There is no need to add `export` before each line.
```
OPENAI_API_KEY=sk-abcdefghijklmnopqrstuvwxyz
```
Import the dotenv module in your script. It will automatically read the environment variables from the `.env` file.
```typescript
import 'dotenv/config';
```
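
Alternatively, if you prefer not to touch your code, dotenv can be preloaded from the command line (`your-script.js` is a placeholder for your own entry file):

```bash
# Preload dotenv before the script runs; no import needed in the code itself.
node -r dotenv/config your-script.js
```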
## Example: Using `claude-3-opus-20240229` from Anthropic

When `MIDSCENE_USE_ANTHROPIC_SDK=1` is set, Midscene will use the Anthropic SDK (`@anthropic-ai/sdk`) to call the model.

Configure the environment variables:

```bash
export MIDSCENE_USE_ANTHROPIC_SDK=1
export ANTHROPIC_API_KEY="....."
export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
```
## Using Azure OpenAI Service

There are some extra configs when using Azure OpenAI Service.

### Use Azure AD token provider
This mode cannot be used in the Chrome extension.
```bash
# this is always true when using Azure OpenAI Service
export MIDSCENE_USE_AZURE_OPENAI=1
export MIDSCENE_AZURE_OPENAI_SCOPE="https://cognitiveservices.azure.com/.default"
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```

### Use API key authentication
```bash
export MIDSCENE_USE_AZURE_OPENAI=1
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_KEY="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```
## Example: Using `gemini-1.5-pro` from Google
Configure the environment variables:
```bash
export OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai"
export OPENAI_API_KEY="....."
export MIDSCENE_MODEL_NAME="gemini-1.5-pro"
```
## Example: Using `qwen-vl-max-latest` from Aliyun
Configure the environment variables:
```bash
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
```
## Example: Using `doubao-vision-pro-32k` from Volcengine
Create an inference point first: https://console.volcengine.com/ark/region:ark+cn-beijing/endpoint
In the inference point interface, find an ID like `ep-202...` and use it as the model name.
Configure the environment variables:
```bash
export OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="ep-202....."
```
## Example: Configuring request headers (e.g. for OpenRouter)
```bash
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="..."
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"defaultHeaders":{"HTTP-Referer":"...","X-Title":"..."}}'
```
## Troubleshooting LLM Service Connectivity Issues

If you want to troubleshoot connectivity issues, you can use the `connectivity-test` folder in our example project: [https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test](https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test)

Put your `.env` file in the `connectivity-test` folder, and run the test with `npm i && npm run test`.
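
If you'd rather do a quick manual check first, you can hit the chat completions endpoint directly with curl, reusing the same environment variables. This is only a sketch; the model name in the payload is a placeholder for whatever you configured:

```bash
# Manual connectivity check using the same env vars Midscene reads.
curl "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
```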