# API Reference

## Constructors

Each Agent in Midscene has its own constructor.

* In Puppeteer, use [PuppeteerAgent](./integrate-with-puppeteer)
* In Bridge Mode, use [AgentOverChromeBridge](./bridge-mode-by-chrome-extension#constructor)

These Agents share some common constructor parameters:

* `generateReport: boolean`: If true, a report file will be generated. (Default: true)
* `autoPrintReportMsg: boolean`: If true, report messages will be printed. (Default: true)
* `cacheId: string | undefined`: If provided, this cacheId will be used to save or match the cache. (Default: undefined, which disables the cache feature)

In Puppeteer, there is an additional parameter:

* `forceSameTabNavigation: boolean`: If true, page navigation is restricted to the current tab. (Default: true)
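
As a rough illustration of how these parameters and their documented defaults fit together, the sketch below models them as a plain TypeScript type. `CommonAgentOptions` and `withDefaults` are hypothetical names used only for this example and are not part of Midscene; the resulting object is the kind of value you would pass as the agent's constructor options (see the integration guides linked above for the exact constructor signature).

```typescript
// Hypothetical type mirroring the constructor parameters documented above.
interface CommonAgentOptions {
  generateReport?: boolean;
  autoPrintReportMsg?: boolean;
  cacheId?: string;
  forceSameTabNavigation?: boolean; // Puppeteer only
}

// Fill in the documented defaults; explicitly passed values take precedence.
function withDefaults(options: CommonAgentOptions = {}): CommonAgentOptions {
  return {
    generateReport: true,
    autoPrintReportMsg: true,
    cacheId: undefined, // cache disabled unless an id is given
    forceSameTabNavigation: true,
    ...options,
  };
}

// Enable caching under a fixed id and silence report messages.
const agentOptions = withDefaults({
  cacheId: 'checkout-flow',
  autoPrintReportMsg: false,
});
```

Here `agentOptions.generateReport` keeps its default of `true`, while `cacheId` and `autoPrintReportMsg` carry the explicit values.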

## Methods

Below are the main APIs available for the various Agents in Midscene.

> In the documentation below, you might see function calls prefixed with `agent.`. If you use destructuring in Playwright (e.g., `async ({ ai, aiQuery }) => { /* ... */ }`), you can call these functions without the `agent.` prefix. This is merely a syntactic difference.

### `agent.aiAction()` or `.ai()`

This method allows you to perform a series of UI actions described in natural language. Midscene automatically parses and executes the steps.

* Type

```typescript
function aiAction(steps: string): Promise<void>;
function ai(steps: string): Promise<void>; // shorthand form
```

* Parameters:
  * `steps: string` - A natural language description of the UI steps.

* Return Value:
  * Returns a Promise that resolves to void when all steps are completed; if execution fails, an error is thrown.

* Examples:

```typescript
// Basic usage
await agent.aiAction('Type "JavaScript" into the search box, then click the search button');

// Using the shorthand .ai form
await agent.ai('Click the login button at the top of the page, then enter "test@example.com" in the username field');

// A more complex example. The ui-tars model works well with target-oriented prompts;
// for other models, writing out each execution step (as below) is recommended
await agent.aiAction(`
  1. Scroll to the product list
  2. Locate the "Sauce Labs Backpack" item
  3. Click its "Add to cart" button
  4. Wait for the shopping cart icon to update
`);
```

:::tip
For optimal results, please provide clear and detailed instructions. Avoid vague commands (e.g., "post a tweet"), as they may lead to unstable or failed execution.

Under the hood, Midscene sends the page context and screenshots to the LLM to plan the steps in detail. It then executes these steps sequentially. If Midscene determines that the actions cannot be performed, an error will be thrown.

Your task is decomposed into the following built-in methods, which you can view in the visual report:

1. **Locator**: Locate target elements using natural language descriptions.
2. **Action**: Click, scroll, perform keyboard input, hover.
3. **Others**: Wait (using sleep).

Currently, Midscene does not support planning steps with conditions or loops.

Related Documentation:
* [FAQ: Can Midscene perform intelligent operations based on a single command (e.g., "post a tweet")?](./faq)
* [Tips for Writing Prompts](./prompting-tips)

:::
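
Because planning does not cover conditions or loops, one workable pattern is to keep the control flow in your own TypeScript and make each AI call a single, linear instruction. The sketch below assumes only the `aiAction` and `aiQuery` signatures documented on this page; the `CartAgent` interface and the `addItemsToCart` helper are hypothetical names introduced for illustration.

```typescript
// Minimal shape of the agent used here; a real Midscene agent provides these methods.
interface CartAgent {
  aiAction(steps: string): Promise<void>;
  aiQuery<T>(dataShape: string | object): Promise<T>;
}

// Loop and branch in TypeScript; every prompt stays free of conditions.
// Returns the names that were skipped because they could not be added.
async function addItemsToCart(
  agent: CartAgent,
  itemNames: string[],
): Promise<string[]> {
  // One query up front tells us which products can be added at all.
  const available = await agent.aiQuery<string[]>(
    'string[], names of the products that show an "Add to cart" button',
  );

  const skipped: string[] = [];
  for (const name of itemNames) {
    if (!available.includes(name)) {
      skipped.push(name); // the condition is handled here, not in the prompt
      continue;
    }
    await agent.aiAction(`Click the "Add to cart" button of the "${name}" item`);
  }
  return skipped;
}
```

Keeping the branching outside the prompt also makes the run easier to follow in the visual report, since each AI call maps to one clearly scoped task.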


### `agent.aiQuery()`

This method allows you to extract data directly from the UI using multimodal AI reasoning capabilities. Simply define the expected format (e.g., string, number, JSON, or an array) in `dataShape`, and Midscene will return a result that matches the format.

* Type

```typescript
function aiQuery<T>(dataShape: string | Object): Promise<T>;
```

* Parameters:
  * `dataShape: string | Object`: A description of the expected return format.

* Return Value:
  * Returns a Promise that resolves to any valid basic type, such as string, number, JSON, array, etc.
  * Just describe the format in `dataShape`, and Midscene will return a matching result.

* Examples:

```typescript
const dataA = await agent.aiQuery({
  time: 'The date and time displayed in the top-left corner as a string',
  userInfo: 'User information in the format {name: string}',
  tableFields: 'An array of table field names, string[]',
  tableDataRecord: 'Table records in the format {id: string, [fieldName]: string}[]',
});

// You can also describe the expected return format using a string:

// dataB will be an array of strings
const dataB = await agent.aiQuery('string[], list of task names');

// dataC will be an array of objects
const dataC = await agent.aiQuery('{name: string, age: string}[], table data records');
```
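
The generic parameter `T` only tells the compiler what you expect; the value itself comes from the model. If a wrong shape would be costly, a small runtime check can narrow the result before you use it. The following is a minimal sketch; `isStringArray` and `expectStringArray` are hypothetical helpers, not Midscene APIs.

```typescript
// Type guard: true only for arrays whose elements are all strings.
function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === 'string');
}

// Narrow an untrusted query result, failing loudly on a shape mismatch.
function expectStringArray(value: unknown, label: string): string[] {
  if (!isStringArray(value)) {
    throw new TypeError(
      `${label}: expected string[], got ${JSON.stringify(value)}`,
    );
  }
  return value;
}

// Usage with a real agent (not executed in this sketch):
//   const raw = await agent.aiQuery<unknown>('string[], list of task names');
//   const tasks = expectStringArray(raw, 'task names');
```

The same idea extends to object shapes; a schema validation library could replace the hand-written guard if you already use one.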

### `agent.aiAssert()`

This method lets you specify an assertion in natural language, and the AI determines whether the condition is true. If the assertion fails, the SDK throws an error that includes both the optional `errorMsg` and a detailed reason generated by the AI.

* Type

```typescript
function aiAssert(assertion: string, errorMsg?: string): Promise<void>;
```

* Parameters:
  * `assertion: string` - The assertion described in natural language.
  * `errorMsg?: string` - An optional error message to append if the assertion fails.

* Return Value:
  * Returns a Promise that resolves to void if the assertion passes; if it fails, an error is thrown with `errorMsg` and additional AI-provided information.

* Example:
```typescript
await agent.aiAssert('The price of "Sauce Labs Onesie" is 7.99');
```

:::tip
Assertions are critical in test scripts. To reduce the risk of errors due to AI hallucination (e.g., missing an error), you can also combine `.aiQuery` with standard JavaScript assertions instead of using `.aiAssert`.

For example, you might replace the above code with:

```typescript
const items = await agent.aiQuery(
  '{name: string, price: number}[], return product names and prices'
);
const onesieItem = items.find(item => item.name === 'Sauce Labs Onesie');
expect(onesieItem).toBeTruthy();
expect(onesieItem.price).toBe(7.99);
```
:::

### `agent.aiWaitFor()`

This method allows you to wait until a specified condition, described in natural language, becomes true. Considering the cost of AI calls, the check interval will not exceed the specified `checkIntervalMs`.

* Type

```typescript
function aiWaitFor(
  assertion: string, 
  options?: { 
    timeoutMs?: number;
    checkIntervalMs?: number;
  }
): Promise<void>;
```

* Parameters:
  * `assertion: string` - The condition described in natural language.
  * `options?: object` - An optional configuration object containing:
    * `timeoutMs?: number` - Timeout in milliseconds (default: 15000).
    * `checkIntervalMs?: number` - Interval for checking in milliseconds (default: 3000).

* Return Value:
  * Returns a Promise that resolves to void if the condition is met; if not, an error is thrown when the timeout is reached.

* Examples:

```typescript
// Basic usage
await agent.aiWaitFor("At least one headphone item is displayed on the interface");

// Using custom options
await agent.aiWaitFor("The shopping cart icon shows a quantity of 2", {
  timeoutMs: 30000,    // Wait for 30 seconds
  checkIntervalMs: 5000  // Check every 5 seconds
});
```

:::tip
Because each AI call takes time, `.aiWaitFor` might not be the most efficient method. Sometimes, using a simple sleep function may be a better alternative.
:::
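
For reference, a plain sleep and a polling loop over a cheap, non-AI check can be sketched as follows. This is not Midscene's implementation; `sleep`, `PollOptions`, and `pollUntil` are hypothetical helpers that simply reuse the option names and default values documented above.

```typescript
// Resolve after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

interface PollOptions {
  timeoutMs?: number;
  checkIntervalMs?: number;
}

// Run `check` repeatedly until it returns true, or throw once the next
// attempt would fall past the timeout.
async function pollUntil(
  check: () => Promise<boolean> | boolean,
  { timeoutMs = 15000, checkIntervalMs = 3000 }: PollOptions = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return;
    if (Date.now() + checkIntervalMs > deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await sleep(checkIntervalMs);
  }
}

// Usage (not executed in this sketch): wait a fixed 3 seconds, or poll a
// cheap check of your own instead of an AI call.
//   await sleep(3000);
//   await pollUntil(() => someCheapCheck(), { timeoutMs: 10000, checkIntervalMs: 500 });
```

A fixed sleep is the cheapest option when the delay is predictable; polling a cheap check suits cases where the wait varies but the condition is easy to test without AI.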

### `agent.runYaml()`

This method executes an automation script written in YAML. Only the `tasks` part of the script is executed, and it returns the results of all `.aiQuery` calls within the script.

* Type

```typescript
function runYaml(yamlScriptContent: string): Promise<{ result: any }>;
```

* Parameters:
  * `yamlScriptContent: string` - The YAML-formatted script content.

* Return Value:
  * Returns an object with a `result` property that includes the results of all `.aiQuery` calls.

* Example:

```typescript
const { result } = await agent.runYaml(`
tasks:
  - name: search weather
    flow:
      - ai: input 'weather today' in input box, click search button
      - sleep: 3000

  - name: query weather
    flow:
      - aiQuery: "the result shows the weather info, {description: string}"
`);
console.log(result);
```

:::tip
For more information about YAML scripts, please refer to [Automate with Scripts in YAML](./automate-with-scripts-in-yaml).
:::

## Properties

### `.reportFile`

The path to the report file.

## Additional Configurations

### Setting Environment Variables at Runtime

You can override environment variables at runtime by calling the `overrideAIConfig` method.

```typescript
import { overrideAIConfig } from '@midscene/web/puppeteer'; // or another Agent

overrideAIConfig({
  OPENAI_BASE_URL: "...",
  OPENAI_API_KEY: "...",
  MIDSCENE_MODEL_NAME: "..."
});
```

### Printing Usage Information for Each AI Call

Set the `MIDSCENE_DEBUG_AI_PROFILE` variable to view the execution time and usage for each AI call.

```shell
export MIDSCENE_DEBUG_AI_PROFILE=1
```

### Using LangSmith

LangSmith is a platform for debugging large language models. To integrate LangSmith, follow these steps:

```bash
# Set environment variables

# Enable debug mode
export MIDSCENE_LANGSMITH_DEBUG=1 

# LangSmith configuration
export LANGSMITH_TRACING_V2=true
export LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
export LANGSMITH_API_KEY="your_key_here"
export LANGSMITH_PROJECT="your_project_name_here"
```

After starting Midscene, you should see logs similar to:

```log
DEBUGGING MODE: langsmith wrapper enabled
```