# Config Model and Provider
Midscene uses the OpenAI SDK to call AI services. Using this SDK constrains the input and output schema of the AI service, but it doesn't mean you can only use OpenAI's services. You can use any model service that supports the same interface (most platforms and tools do).
In this article, we will show you how to configure an AI service provider and how to choose a different model. You may read [Choose a model](./choose-a-model) first to learn more about choosing a model.
## Configs
### Common configs

These are the most common configs, of which `OPENAI_API_KEY` is required.

| Name | Description |
|------|-------------|
| `OPENAI_API_KEY` | Required. Your OpenAI API key (e.g. "sk-abcdefghijklmnopqrstuvwxyz") |
| `OPENAI_BASE_URL` | Optional. Custom base URL for the API endpoint. Use it to switch to a provider other than OpenAI (e.g. "https://some_service_name.com/v1") |
| `MIDSCENE_MODEL_NAME` | Optional. Specify a model name other than the default `gpt-4o` |
Extra configs to use `Qwen 2.5 VL` model:

| Name | Description |
|------|-------------|
| `MIDSCENE_USE_QWEN_VL` | Set to "1" to use the adapter for the Qwen 2.5 VL model |
Extra configs to use `UI-TARS` model:

| Name | Description |
|------|-------------|
| `MIDSCENE_USE_VLM_UI_TARS` | Set to the version of the UI-TARS model; supported values are `1.0`, `1.5`, and `DOUBAO` (the Volcano Engine version) |
Extra configs to use `Gemini 2.5 Pro` model:

| Name | Description |
|------|-------------|
| `MIDSCENE_USE_GEMINI` | Set to "1" to use the adapter for the Gemini 2.5 Pro model |
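
For example, Gemini 2.5 Pro can be reached through Google's OpenAI-compatible endpoint. The base URL and model name below are assumptions based on Google's public API; check their documentation for the current values:

```shell
export OPENAI_API_KEY="..."
export OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
export MIDSCENE_MODEL_NAME="gemini-2.5-pro"
export MIDSCENE_USE_GEMINI=1
```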
For more information about the models, see [Choose a model](./choose-a-model).
### Advanced configs
Some advanced configs are also supported. Usually you don't need to use them.

| Name | Description |
|------|-------------|
| `OPENAI_USE_AZURE` | Optional. Set to "true" to use Azure OpenAI Service. See the following section for more details. |
| `MIDSCENE_OPENAI_INIT_CONFIG_JSON` | Optional. Custom JSON config for OpenAI SDK initialization |
| `MIDSCENE_OPENAI_SOCKS_PROXY` | Optional. Proxy configuration (e.g. "socks5://127.0.0.1:1080") |
| `MIDSCENE_PREFERRED_LANGUAGE` | Optional. The preferred language for the model response. The default is `Chinese` if the current timezone is GMT+8, and `English` otherwise. |
| `OPENAI_MAX_TOKENS` | Optional. Maximum number of tokens in the model response |
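
For example, to route SDK traffic through a local SOCKS proxy and cap the response length (a sketch using values from the table above; the proxy address and token limit are illustrative):

```shell
# route OpenAI SDK requests through a local SOCKS5 proxy
export MIDSCENE_OPENAI_SOCKS_PROXY="socks5://127.0.0.1:1080"

# limit the maximum tokens in each model response
export OPENAI_MAX_TOKENS=2048
```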
### Debug configs
Set the following configs to see more logs for debugging. They will also be written to the `./midscene_run/log` folder.

| Name | Description |
|------|-------------|
| `DEBUG=midscene:ai:profile:stats` | Optional. Print the AI service time cost, token usage, etc. in comma-separated format, which is useful for analysis |
| `DEBUG=midscene:ai:profile:detail` | Optional. Print the AI token usage details |
| `DEBUG=midscene:ai:call` | Optional. Print the AI response details |
| `DEBUG=midscene:android:adb` | Optional. Print the adb command invocation details |
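
For example, to capture both timing stats and full AI call details in one run (a sketch assuming `DEBUG` accepts comma-separated scopes, as is the common convention):

```shell
# enable multiple debug scopes at once (comma-separated)
export DEBUG=midscene:ai:profile:stats,midscene:ai:call
# then run your Midscene script as usual
```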
## Two ways to config environment variables
Pick one of the following ways to configure environment variables.
### 1. Set environment variables in your system

```bash
# replace with your own key
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```
### 2. Set environment variables using dotenv
This is what we used in our [demo project](https://github.com/web-infra-dev/midscene-example).
[Dotenv](https://www.npmjs.com/package/dotenv) is a zero-dependency module that loads environment variables from a `.env` file into `process.env`.

```bash
# install dotenv
npm install dotenv --save
```
Create a `.env` file in your project root directory, and add the following content. There is no need to add `export` before each line.

```
OPENAI_API_KEY=sk-abcdefghijklmnopqrstuvwxyz
```
Import the dotenv module in your script. It will automatically read the environment variables from the `.env` file.

```typescript
import 'dotenv/config';
```
## Using Azure OpenAI Service
There are some extra configs when using Azure OpenAI Service.
### Use ADT token provider
This mode cannot be used in the Chrome extension.

```bash
# this is always true when using Azure OpenAI Service
export MIDSCENE_USE_AZURE_OPENAI=1

export MIDSCENE_AZURE_OPENAI_SCOPE="https://cognitiveservices.azure.com/.default"
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```
### Use API key authentication

```bash
export MIDSCENE_USE_AZURE_OPENAI=1
export AZURE_OPENAI_ENDPOINT="..."
export AZURE_OPENAI_KEY="..."
export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
```
## Set Config by JavaScript
You can also override the config in JavaScript. Remember to call this before running any Midscene code.

```typescript
import { overrideAIConfig } from "@midscene/web/puppeteer";
// or import { overrideAIConfig } from "@midscene/web/playwright";
// or import { overrideAIConfig } from "@midscene/android";

overrideAIConfig({
  MIDSCENE_MODEL_NAME: "...",
  // ...
});
```
## Example: Using `gpt-4o` from OpenAI
Configure the environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://endpoint.some_other_provider.com/v1" # config this if you want to use a different endpoint
export MIDSCENE_MODEL_NAME="gpt-4o-2024-11-20" # optional, the default is "gpt-4o"
```
## Example: Using `qwen-vl-max-latest` from Aliyun
Configure the environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
export MIDSCENE_USE_QWEN_VL=1
```
## Example: Using `Doubao-1.5-thinking-vision-pro` from Volcano Engine
Configure the environment variables:

```bash
export OPENAI_BASE_URL="https://ark-cn-beijing.bytedance.net/api/v3"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME='ep-...'
export MIDSCENE_USE_DOUBAO_VISION=1
```
## Example: Using `ui-tars-72b-sft` hosted by yourself
Configure the environment variables:

```bash
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="http://localhost:1234/v1"
export MIDSCENE_MODEL_NAME="ui-tars-72b-sft"
export MIDSCENE_USE_VLM_UI_TARS=1
```
## Example: Using `claude-3-opus-20240229` from Anthropic
When configuring `MIDSCENE_USE_ANTHROPIC_SDK=1`, Midscene will use the Anthropic SDK (`@anthropic-ai/sdk`) to call the model.
Configure the environment variables:

```bash
export MIDSCENE_USE_ANTHROPIC_SDK=1
export ANTHROPIC_API_KEY="....."
export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
```
## Example: Config request headers (e.g. for OpenRouter)

```bash
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"
export OPENAI_API_KEY="..."
export MIDSCENE_MODEL_NAME="..."
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"defaultHeaders":{"HTTP-Referer":"...","X-Title":"..."}}'
```
## Troubleshooting LLM Service Connectivity Issues
If you want to troubleshoot connectivity issues, you can use the `connectivity-test` folder in our example project: [https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test](https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test)
Put your `.env` file in the `connectivity-test` folder, and run the test with `npm i && npm run test`.
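
If you prefer a quick standalone check without cloning the example project, the sketch below sends one chat completion request to whatever endpoint your environment variables point at. This is an illustration using plain `fetch` against an OpenAI-compatible `/chat/completions` route, not part of Midscene itself:

```typescript
// Minimal connectivity sketch (not part of Midscene): build and send one
// chat completion request using the same env vars Midscene reads.

function buildRequest(baseURL: string, apiKey: string, model: string) {
  return {
    url: `${baseURL.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: "ping" }],
      }),
    },
  };
}

const { url, init } = buildRequest(
  process.env.OPENAI_BASE_URL ?? "https://api.openai.com/v1",
  process.env.OPENAI_API_KEY ?? "",
  process.env.MIDSCENE_MODEL_NAME ?? "gpt-4o",
);

if (process.env.OPENAI_API_KEY) {
  // Node 18+ ships a global fetch
  fetch(url, init)
    .then(async (res) => console.log(res.status, await res.text()))
    .catch((err) => console.error("connectivity error:", err));
} else {
  console.log("OPENAI_API_KEY not set; skipping live call");
}
```

A non-200 status (401, 404, etc.) in the printed response usually points to a wrong key, base URL, or model name.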