midscene/apps/site/docs/en/API.md

# API

In the following documentation, you may see functions called with the `mid.` prefix. If you use destructuring in Playwright, like `async ({ ai, aiQuery }) => { /* ... */}`, you can call the functions without this prefix. It's just a matter of syntax.

## `.aiAction(steps: string)` or `.ai(steps: string)` - Interact with the page

You can use `.aiAction` to perform a series of actions. It accepts a `steps: string` as a parameter, which describes the actions. In the prompt, you should clearly describe the steps. Midscene will take care of the rest.

`.ai` is the shortcut for `.aiAction`.

These are some good samples:

```typescript
await mid.aiAction('Enter "Learn JS today" in the task box, then press Enter to create');
await mid.aiAction('Move your mouse over the second item in the task list and click the Delete button to the right of the second task');

// use `.ai` shortcut
await mid.ai('Click the "completed" status button below the task list');
```

Steps should always be clearly and thoroughly described. A very brief prompt like 'Tweet "Hello World"' will result in unstable performance and a high likelihood of failure.

Under the hood, Midscene will plan the detailed steps by sending your page context and a screenshot to the AI. After that, Midscene will execute the steps one by one. If Midscene deems it impossible to execute, an error will be thrown.

The main capabilities of Midscene are as follows, and your task will be split into these types. You can see them in the visualized report:

1. **Locator**: Identify the target element using a natural language description
2. **Action**: Tap, scroll, keyboard input, hover
3. **Others**: Sleep

Currently, Midscene can't plan steps that include conditions and loops.

Related Docs:
* [FAQ: Can Midscene smartly plan the actions according to my one-line goal? Like executing "Tweet 'hello world'](./faq.html)
* [Prompting Tips](./prompting-tips.html)

## `.aiQuery(dataDemand: any)` - extract any data from page

You can extract customized data from the UI. Provided that the multi-modal AI can perform inference, it can return both data directly written on the page and any data based on "understanding". The return value is in JSON format, so it should be valid primitive types, like String, Number, JSON, Array, etc. Just describe it in the `dataDemand`.

For example, to parse detailed information from page:

```typescript
const dataA = await mid.aiQuery({
  time: 'date and time, string',
  userInfo: 'user info, {name: string}',
  tableFields: 'field names of table, string[]',
  tableDataRecord: 'data record of table, {id: string, [fieldName]: string}[]',
});
```

You can also describe the expected return value format as a plain string:

```typescript
// dataB will be a string array
const dataB = await mid.aiQuery('string[], task names in the list');

// dataC will be an array with objects
const dataC = await mid.aiQuery('{name: string, age: string}[], Data Record in the table');
```

## `.aiAssert(assertion: string, errorMsg?: string)` - do an assertion

`.aiAssert` works just like the normal `assert` method, except that the condition is a prompt string written in natural language. Midscene will call AI to determine if the `assertion` is true. If the condition is not met, an error will be thrown containing `errorMsg` and a detailed reason generated by AI.

```typescript
await mid.aiAssert('The price of "Sauce Labs Onesie" is 7.99');
```

:::tip
Assertions are usually a very important part of your script. To prevent the possibility of AI hallucinations ( especially for the false negative situation ), you can also use `.aiQuery` + normal JavaScript assertions to replace the `.aiAssert` calls.

For example, to replace the previous assertion,

```typescript
const items = await mid.aiQuery(
  '"{name: string, price: number}[], return item name and price of each item',
);
const onesieItem = items.find(item => item.name === 'Sauce Labs Onesie');
expect(onesieItem).toBeTruthy();
expect(onesieItem.price).toBe(7.99);
```
:::

## `.aiWaitFor(assertion: string, {timeoutMs?: number, checkIntervalMs?: number })` - wait until the assertion is met

`.aiWaitFor` will help you check if your assertion has been met or a timeout error occurred. Considering the AI service cost, the check interval will not exceed `checkIntervalMs` milliseconds. The default config sets `timeoutMs` to 15 seconds and `checkIntervalMs` to 3 seconds: i.e. check at most 5 times if all assertions fail and the AI service always responds immediately.

When considering the time required for the AI service, `.aiWaitFor` may not be very efficient. Using a simple `sleep` method might be a useful alternative to `waitFor`.

```typescript
await mid.aiWaitFor("there is at least one headphone item on page");
```

## Debug Config (Optional)

### AI profiling

By setting `MIDSCENE_DEBUG_AI_PROFILE`, you can take a look at the time and token consumption of AI calls.

```shell
export MIDSCENE_DEBUG_AI_PROFILE=1
```

### Use LangSmith

LangSmith is a platform designed to debug the LLMs. To integrate LangSmith, please follow these steps:

```shell
# set env variables

# Flag to enable debug
export MIDSCENE_LANGSMITH_DEBUG=1

# LangSmith config
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGCHAIN_API_KEY="your_key_here"
export LANGCHAIN_PROJECT="your_project_name_here"
```

Launch Midscene, you should see logs like this:

```log
DEBUGGING MODE: langsmith wrapper enabled
```