# Config Model and Provider
Midscene uses the OpenAI SDK to call AI services. Using this SDK limits the input and output format of AI services, but it doesn't mean you can only use OpenAI's models. You can use any model service that supports the same interface (most platforms or tools support this).
In this article, we will show you how to configure the AI service provider and how to choose a different model. You may read [Choose a model](./choose-a-model) to learn more about how to choose a model.
## Configs
These are the most common configs, in which `OPENAI_API_KEY` is required.
| Name | Description |
|------|-------------|
| `OPENAI_API_KEY` | Required. Your OpenAI API key (e.g. "sk-abcdefghijklmnopqrstuvwxyz") |
| `OPENAI_BASE_URL` | Optional. Custom base URL for the API endpoint. Often used to switch to a provider other than OpenAI (e.g. "https://some_service_name.com/v1") |
| `MIDSCENE_MODEL_NAME` | Optional. Specify a different model name (default is gpt-4o). Often used to switch to a different model. |
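
Putting the common configs together, a minimal setup might look like this (the base URL and model name below are illustrative placeholders):

```bash
# Only OPENAI_API_KEY is strictly required; the other two are optional.
# Replace the values with your own provider's details.
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
export OPENAI_BASE_URL="https://some_service_name.com/v1"
export MIDSCENE_MODEL_NAME="gpt-4o"
```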
Config to use the `UI-TARS` model:

`UI-TARS` is a dedicated model for UI automation. See more details in [Choose a model](./choose-a-model).

| Name | Description |
|------|-------------|
| `MIDSCENE_USE_VLM_UI_TARS` | Optional. Set to "1" to use UI-TARS model. |
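
For example, if you have deployed UI-TARS behind an OpenAI-compatible endpoint, the config might look like this (the base URL, API key, and model name are placeholders for your own deployment):

```bash
# Point Midscene at a self-hosted UI-TARS deployment (values are placeholders).
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="ui-tars"
export MIDSCENE_USE_VLM_UI_TARS=1
```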
Some advanced configs are also supported. Usually you don't need to use them.
| Name | Description |
|------|-------------|
| `OPENAI_USE_AZURE` | Optional. Set to "true" to use Azure OpenAI Service. See more details in the following section. |
| `MIDSCENE_OPENAI_INIT_CONFIG_JSON` | Optional. Custom JSON config for OpenAI SDK initialization |
| `MIDSCENE_OPENAI_SOCKS_PROXY` | Optional. Proxy configuration (e.g. "socks5://127.0.0.1:1080") |
| `OPENAI_MAX_TOKENS` | Optional. Maximum tokens for model response |
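
As a sketch, the advanced configs are set the same way. The values here are illustrative only; `timeout` (in milliseconds) is one of the options accepted by the OpenAI SDK constructor:

```bash
# Illustrative values only; adjust to your own environment.
export MIDSCENE_OPENAI_SOCKS_PROXY="socks5://127.0.0.1:1080"
export OPENAI_MAX_TOKENS="2048"
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"timeout":60000}'
```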
## Two ways to configure environment variables

Pick one of the following ways to configure environment variables.

### 1. Set environment variables in your system
```bash
# replace by your own
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```
### 2. Set environment variables using dotenv

This is what we used in our [demo project](https://github.com/web-infra-dev/midscene-example).
[Dotenv](https://www.npmjs.com/package/dotenv) is a zero-dependency module that loads environment variables from a `.env` file into `process.env`.
```bash
# install dotenv
npm install dotenv --save
```
Create a `.env` file in your project root directory, and add the following content. There is no need to add `export` before each line.
```
OPENAI_API_KEY=sk-abcdefghijklmnopqrstuvwxyz
```
Import the dotenv module in your script. It will automatically read the environment variables from the `.env` file.
```typescript
import 'dotenv/config';
```
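
Alternatively, if you prefer not to touch your code, dotenv can be preloaded from the command line (`your-script.js` is a placeholder for your own entry file):

```bash
# Preload dotenv before the script runs; no import needed in the code itself.
node -r dotenv/config your-script.js
```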
## Example: Using `claude-3-opus-20240229` from Anthropic

When `MIDSCENE_USE_ANTHROPIC_SDK=1` is set, Midscene will use the Anthropic SDK (`@anthropic-ai/sdk`) to call the model.

Configure the environment variables:

```bash
export MIDSCENE_USE_ANTHROPIC_SDK=1
export ANTHROPIC_API_KEY="....."
export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
```
## Using Azure OpenAI Service

There are some extra configs when using Azure OpenAI Service.

### Use Azure AD token provider
This mode cannot be used in the Chrome extension.
```bash
# this is always true when using Azure OpenAI Service
export MIDSCENE_USE_AZURE_OPENAI=1
export MIDSCENE_AZURE_OPENAI_SCOPE="https://cognitiveservices.azure.com/.default"
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```

### Use API key authentication
```bash
export MIDSCENE_USE_AZURE_OPENAI=1
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_KEY="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```
## Example: Using `gemini-1.5-pro` from Google
Configure the environment variables:
```bash
export OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai"
export OPENAI_API_KEY="....."
export MIDSCENE_MODEL_NAME="gemini-1.5-pro"
```
## Example: Using `qwen-vl-max-latest` from Aliyun
Configure the environment variables:
```bash
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
```
## Example: Using `doubao-vision-pro-32k` from Volcengine
Create an inference point first: https://console.volcengine.com/ark/region:ark+cn-beijing/endpoint
In the inference point interface, find an ID like `ep-202...` and use it as the model name.
Configure the environment variables:
```bash
export OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="ep-202....."
```
## Example: Configuring request headers (e.g. for OpenRouter)
```bash
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="..."
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"defaultHeaders":{"HTTP-Referer":"...","X-Title":"..."}}'
```
## Troubleshooting LLM Service Connectivity Issues

If you want to troubleshoot connectivity issues, you can use the `connectivity-test` folder in our example project: [https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test](https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test)

Put your `.env` file in the `connectivity-test` folder, and run the test with `npm i && npm run test`.
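
If you'd rather do a quick manual check first, you can hit the chat completions endpoint directly with curl, reusing the same environment variables. This is only a sketch; the model name in the payload is a placeholder for whatever you configured:

```bash
# Manual connectivity check using the same env vars Midscene reads.
curl "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
```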