> In the documentation below, you might see function calls prefixed with `agent.`. If you utilize destructuring in Playwright (e.g., `async ({ ai, aiQuery }) => { /* ... */ }`), you can call these functions without the `agent.` prefix. This is merely a syntactical difference.
* `cacheId: string | undefined`: If provided, this cacheId is used to save or match the cache. (Default: undefined, which disables the cache feature)
* `actionContext: string`: Background knowledge sent to the AI model when calling `agent.aiAction()`, such as 'close the cookie consent dialog first if it exists'. (Default: undefined)
* `waitForNetworkIdleTimeout: number`: The timeout for waiting for network idle between each action. (Default: 2000ms, set to 0 to disable the timeout)
* `waitForNavigationTimeout: number`: The timeout for waiting for navigation to finish. (Default: 5000ms, set to 0 to disable the timeout)
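
As a rough illustration of the options listed above, the sketch below mirrors them in a local type and fills in the documented defaults. `AgentOptionsSketch` and `withDocumentedDefaults` are hypothetical names introduced here for illustration; they are not part of Midscene.

```typescript
// Hypothetical local mirror of the options described above; not imported from Midscene.
interface AgentOptionsSketch {
  cacheId?: string;
  actionContext?: string;
  waitForNetworkIdleTimeout?: number; // in ms; 0 disables the wait
  waitForNavigationTimeout?: number; // in ms; 0 disables the wait
}

// Fill in the documented defaults for anything the caller leaves out.
function withDocumentedDefaults(opts: AgentOptionsSketch = {}): AgentOptionsSketch {
  return {
    cacheId: undefined, // cache disabled
    actionContext: undefined,
    waitForNetworkIdleTimeout: 2000,
    waitForNavigationTimeout: 5000,
    ...opts,
  };
}

const options = withDocumentedDefaults({
  cacheId: 'checkout-flow',
  actionContext: 'close the cookie consent dialog first if it exists',
  waitForNetworkIdleTimeout: 0, // skip the network-idle wait
});
```

An object of this shape would be passed wherever the agent accepts these options.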
In Midscene, you can choose between Auto Planning and Instant Action.
* `agent.ai()` is for Auto Planning: Midscene automatically plans the steps and executes them. This is the more flexible, agent-style approach, but it may be slower and relies heavily on the quality of the AI model.
* `agent.aiTap()`, `agent.aiHover()`, `agent.aiInput()`, `agent.aiKeyboardPress()`, `agent.aiScroll()` are for Instant Action: Midscene directly performs the specified action, and the AI model handles only basic tasks such as locating elements. This is faster and more reliable when you are certain about the action you want to perform.
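
The two styles can be contrasted with a minimal sketch. The `AgentSketch` interface is a hypothetical slice of the agent surface, and the parameter order shown for `aiInput`, `aiKeyboardPress`, and `aiTap` is an assumption rather than a confirmed signature.

```typescript
// Hypothetical slice of the agent surface; parameter shapes are assumptions.
interface AgentSketch {
  ai(prompt: string): Promise<void>;
  aiInput(text: string, locate: string): Promise<void>;
  aiKeyboardPress(key: string, locate?: string): Promise<void>;
  aiTap(locate: string): Promise<void>;
}

// Auto Planning: one instruction; the model decides which steps to run.
async function searchWithAutoPlanning(agent: AgentSketch): Promise<void> {
  await agent.ai(
    'type "Headphones" in the search box, press Enter, then click the first result'
  );
}

// Instant Action: the script fixes every step; the model only locates elements.
async function searchWithInstantActions(agent: AgentSketch): Promise<void> {
  await agent.aiInput('Headphones', 'the search box');
  await agent.aiKeyboardPress('Enter');
  await agent.aiTap('the first search result');
}
```

The second function issues more calls, but each one does less work on the model side, which is where the speed and reliability gain comes from.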
Under the hood, Midscene uses an AI model to split the instruction into a series of steps (a.k.a. "Planning") and then executes these steps sequentially. If Midscene determines that the actions cannot be performed, it throws an error.
For optimal results, please provide clear and detailed instructions for `agent.aiAction()`. For guides about writing prompts, you may read this doc: [Tips for Writing Prompts](./prompting-tips).
* `scrollParam: PlanningActionParamScroll` - The scroll parameter.
  * `direction: 'up' | 'down' | 'left' | 'right'` - The direction to scroll.
  * `scrollType: 'once' | 'untilBottom' | 'untilTop' | 'untilRight' | 'untilLeft'` - Optional, the type of scroll to perform.
  * `distance: number` - Optional, the distance to scroll in px.
* `locate?: string` - Optional, a natural language description of the element to scroll on. If not provided, Midscene scrolls at the current mouse position.
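
To make the parameter shape concrete, here is a small sketch that mirrors the fields listed above in a local type. `ScrollParamSketch` is a hypothetical stand-in for `PlanningActionParamScroll`, and the commented-out `aiScroll` calls show an assumed call shape only.

```typescript
// Local, hypothetical mirror of the scroll parameter described above.
type ScrollDirection = 'up' | 'down' | 'left' | 'right';
type ScrollType = 'once' | 'untilBottom' | 'untilTop' | 'untilRight' | 'untilLeft';

interface ScrollParamSketch {
  direction: ScrollDirection;
  scrollType?: ScrollType;
  distance?: number; // in px
}

// Scroll down once by 300px.
const scrollOnce: ScrollParamSketch = {
  direction: 'down',
  scrollType: 'once',
  distance: 300,
};

// Keep scrolling down until the bottom is reached; no distance is given.
const scrollToBottom: ScrollParamSketch = {
  direction: 'down',
  scrollType: 'untilBottom',
};

// With a real agent this might look like (assumed call shape):
// await agent.aiScroll(scrollOnce, 'the product list');
// await agent.aiScroll(scrollToBottom); // no `locate`: uses the current mouse position
```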
The `deepThink` feature allows Midscene to call the AI model twice to locate the element more precisely. It is useful when the AI model finds it hard to distinguish the element from its surroundings.
This method allows you to extract data directly from the UI using multimodal AI reasoning capabilities. Simply define the expected format (e.g., string, number, JSON, or an array) in `dataShape`, and Midscene will return a result that matches the format.
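
A minimal sketch of turning `deepThink` on for a single hard-to-locate element follows. The `TapAgentSketch` interface and the `{ deepThink: true }` options shape are assumptions made for illustration.

```typescript
// Hypothetical slice of the agent surface; the options shape is an assumption.
interface LocateOptionsSketch {
  deepThink?: boolean;
}

interface TapAgentSketch {
  aiTap(locate: string, options?: LocateOptionsSketch): Promise<void>;
}

async function tapElements(agent: TapAgentSketch): Promise<void> {
  // A large, distinct element: the default single locate call is usually enough.
  await agent.aiTap('the "Login" button');

  // A small element in a crowded area: ask for the second, more precise pass.
  await agent.aiTap('the gear icon next to the user avatar', { deepThink: true });
}
```

Since `deepThink` means an extra model call, it is probably best reserved for elements that the default locate step gets wrong.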
* Type
```typescript
function aiQuery<T>(dataShape: string | Object): Promise<T>;
```
* Parameters:
  * `dataShape: string | Object`: A description of the expected return format.
* Return Value:
  * Returns any valid basic type, such as string, number, JSON, array, etc.
  * Just describe the format in `dataShape`, and Midscene will return a matching result.
* Examples:
```typescript
const dataA = await agent.aiQuery({
  time: 'The date and time displayed in the top-left corner as a string',
  userInfo: 'User information in the format {name: string}',
  tableFields: 'An array of table field names, string[]',
  tableDataRecord: 'Table records in the format {id: string, [fieldName]: string}[]',
});
// You can also describe the expected return format using a string:
// dataB will be an array of strings
const dataB = await agent.aiQuery('string[], list of task names');
```

Specify an assertion in natural language, and the AI determines whether the condition is true. If the assertion fails, the SDK throws an error that includes both the optional `errorMsg` and a detailed reason generated by the AI.

* Type

```typescript
function aiAssert(assertion: string, errorMsg?: string): Promise<void>;
```
* Parameters:
  * `assertion: string` - The assertion described in natural language.
  * `errorMsg?: string` - An optional error message to append if the assertion fails.
* Return Value:
  * Returns a Promise that resolves to void if the assertion passes; if it fails, an error is thrown with `errorMsg` and additional AI-provided information.
* Example:
```typescript
await agent.aiAssert('The price of "Sauce Labs Onesie" is 7.99');
```
:::tip
Assertions are critical in test scripts. To reduce the risk of errors due to AI hallucination (e.g., missing an error), you can also combine `.aiQuery` with standard JavaScript assertions instead of using `.aiAssert`.
For example, you might replace the above code with:
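```typescript
const items = await agent.aiQuery(
  '{name: string, price: number}[], return product names and prices'
);
// Check the value with an ordinary assertion instead of asking the AI to judge it.
const onesieItem = items.find((item) => item.name === 'Sauce Labs Onesie');
expect(onesieItem?.price).toBe(7.99); // `expect` comes from your test framework (assumed here)
```
:::

Wait until a specified condition, described in natural language, becomes true. Considering the cost of AI calls, the check interval will not exceed the specified `checkIntervalMs`.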
* Parameters:
  * `assertion: string` - The condition described in natural language.
  * `options?: object` - An optional configuration object containing:
    * `timeoutMs?: number` - Timeout in milliseconds (default: 15000).
    * `checkIntervalMs?: number` - Interval for checking in milliseconds (default: 3000).
* Return Value:
  * Returns a Promise that resolves to void if the condition is met; if not, an error is thrown when the timeout is reached.
* Examples:
```typescript
// Basic usage
await agent.aiWaitFor("There is at least one headphone information displayed on the interface");
// Using custom options
await agent.aiWaitFor("The shopping cart icon shows a quantity of 2", {
  timeoutMs: 30000, // Wait for 30 seconds
  checkIntervalMs: 5000 // Check every 5 seconds
});
```
:::tip
Because AI calls take time, `.aiWaitFor` might not be the most efficient method. Sometimes a simple sleep function is a better alternative.
:::
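
The sleep alternative mentioned in the tip can be as small as the sketch below. `sleep` and `waitBrieflyThenContinue` are hypothetical helper names, not part of Midscene, and the 50ms delay is only a placeholder value.

```typescript
// Resolve after `ms` milliseconds; a plain timer with no AI call involved.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Pause for a fixed delay, then report how long the pause actually took.
async function waitBrieflyThenContinue(): Promise<number> {
  const start = Date.now();
  await sleep(50);
  return Date.now() - start;
}
```

A fixed delay costs nothing in model calls, but it cannot tell whether the page is actually ready, so it suits cases where the wait time is predictable.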
Execute an automation script written in YAML. Only the `tasks` part of the script is executed, and it returns the results of all `.aiQuery` calls within the script.
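
As a sketch of how this might be used, the example below passes a small YAML script to the agent and reads back one query result. The `YamlAgentSketch` interface, the `{ result }` return shape, and the YAML keys (`name`, `flow`, `ai`, `aiQuery`) are assumptions made for illustration and should be checked against the YAML script reference.

```typescript
// Hypothetical slice of the agent surface; the return shape is an assumption.
interface YamlAgentSketch {
  runYaml(yamlScriptContent: string): Promise<{ result: Record<string, unknown> }>;
}

// Only the `tasks` part is executed. The keys below follow an assumed layout.
const yamlScript = `
tasks:
  - name: search
    flow:
      - ai: type "weather today" in the search box, then press Enter
  - name: read result
    flow:
      - aiQuery: "{description: string}, the weather summary shown on the page"
        name: weather
`;

async function runWeatherScript(agent: YamlAgentSketch): Promise<unknown> {
  const { result } = await agent.runYaml(yamlScript);
  // Assumed: each aiQuery result is keyed by the `name` given to that step.
  return result.weather;
}
```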
Set the `MIDSCENE_RUN_DIR` variable to customize the run artifact directory.
```bash
export MIDSCENE_RUN_DIR=midscene_run # Defaults to midscene_run in the current working directory; accepts an absolute or relative path