import SetupEnv from './common/setup-env.mdx';
# Automate with Scripts in YAML
In most cases, developers write automation just to perform some smoke tests, like checking the appearance of some content, or verifying that the key user path is accessible. Maintaining a large test project is unnecessary in this situation.
Midscene offers a way to do this kind of automation with `.yaml` files, which helps you to focus on the script itself instead of the test infrastructure. Any team member can write an automation script without learning any API.
Here is an example of a `.yaml` script. You can probably tell how it works just by reading it.
```yaml
web:
  url: https://www.bing.com

tasks:
  - name: search weather
    flow:
      - ai: search for 'weather today'
      - sleep: 3000

  - name: check result
    flow:
      - aiAssert: the result shows the weather info
```
:::info Demo Project
You can find demo projects that use YAML scripts here:
[https://github.com/web-infra-dev/midscene-example/tree/main/yaml-scripts-demo](https://github.com/web-infra-dev/midscene-example/tree/main/yaml-scripts-demo)
- [Web](https://github.com/web-infra-dev/midscene-example/tree/main/yaml-scripts-demo)
- [Android](https://github.com/web-infra-dev/midscene-example/tree/main/android/yaml-scripts-demo)
:::
<SetupEnv />
Alternatively, you can store the configuration in a `.env` file located in the directory where you run the command. The Midscene command line tool loads it automatically.
```env filename=.env
OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
```
## Install Command Line Tool
Install `@midscene/cli` globally:
```bash
npm i -g @midscene/cli
# or if you prefer a project-wide installation
npm i @midscene/cli --save-dev
```
Write a YAML script in `bing-search.yaml` to automate a web browser:
```yaml
web:
  url: https://www.bing.com

tasks:
  - name: search weather
    flow:
      - ai: search for 'weather today'
      - sleep: 3000
      - aiAssert: the result shows the weather info
```
or to automate an Android device connected via adb:
```yaml
android:
  # launch: https://www.bing.com
  deviceId: s4ey59

tasks:
  - name: search weather
    flow:
      - ai: open browser and navigate to bing.com
      - ai: search for 'weather today'
      - sleep: 3000
      - aiAssert: the result shows the weather info
```
Run this script:
```bash
midscene ./bing-search.yaml
# or if you installed midscene inside the project
npx midscene ./bing-search.yaml
```
You should see that the output shows the progress of the running process and the report file.
## Command line usage
### Run single `.yaml` file
```bash
midscene /path/to/yaml
```
### Run all `.yaml` files under a folder
```bash
midscene /dir/of/yaml/
# glob is also supported
midscene /dir/**/yaml/
```
## YAML file schema
A `.yaml` file has two parts: the `web/android` part and the `tasks` part.
The `web/android` part defines the basic settings of a task. Use the `web` parameter (previously named `target`) for web browser automation, and the `android` parameter for Android device automation. They are mutually exclusive.
### The `web` part
```yaml
web:
  # The URL to visit, required. If `serve` is provided, provide the path to the file to visit
  url: <url>

  # Serve the local path as a static server, optional
  serve: <root-directory>

  # The user agent to use, optional
  userAgent: <ua>

  # number, the viewport width, default is 1280, optional
  viewportWidth: <width>

  # number, the viewport height, default is 960, optional
  viewportHeight: <height>

  # number, the device scale factor (dpr), default is 1, optional
  deviceScaleFactor: <scale>

  # string, the path to the json format cookie file, optional
  cookie: <path-to-cookie-file>

  # object, the strategy to wait for network idle, optional
  waitForNetworkIdle:
    # number, the timeout in milliseconds, 2000ms for default, optional
    timeout: <ms>
    # boolean, continue on network idle error, true for default
    continueOnNetworkIdleError: <boolean>

  # string, the path to save the aiQuery result, optional
  output: <path-to-output-file>

  # boolean, if limit the popup to the current page, true for default in yaml script
  forceSameTabNavigation: <boolean>

  # string, the bridge mode to use, optional, default is false, can be 'newTabWithUrl' or 'currentTab'. More details see the following section
  bridgeMode: false | 'newTabWithUrl' | 'currentTab'

  # boolean, if close the new tabs after the bridge is disconnected, optional, default is false
  closeNewTabsAfterDisconnect: <boolean>

  # boolean, if allow insecure https certs, optional, default is false
  acceptInsecureCerts: <boolean>

  # string, the background knowledge to send to the AI model when calling aiAction, optional
  aiActionContext: <string>
```
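As an illustration of how these fields combine, here is a minimal sketch of a script that overrides a few defaults. The viewport size, timeout, output path, and prompts are placeholder values chosen for this example, not recommended settings.

```yaml
web:
  url: https://www.bing.com
  # use a smaller viewport than the 1280x960 default
  viewportWidth: 1024
  viewportHeight: 768
  # wait up to 5 seconds for network idle, and keep going if it is not reached
  waitForNetworkIdle:
    timeout: 5000
    continueOnNetworkIdleError: true
  # aiQuery results are saved to this file (hypothetical path)
  output: ./output/weather.json

tasks:
  - name: search weather
    flow:
      - ai: search for 'weather today'
      - aiQuery: "{ city: string, temperature: string }, the weather info shown in the result"
        name: weather
```

Fields that are left out keep the defaults listed in the schema above.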
### The `android` part
```yaml
android:
  # The device id to use, optional, default is the first connected device
  deviceId: <device-id>

  # The url to launch, optional, default is the current page
  launch: <url>
```
### The `tasks` part
The `tasks` part is an array that lists the tasks to run. Remember to write a `-` before each item to mark it as an array item.
The interfaces of the `flow` part are almost the same as the [API](./API.html), except that some parameters sit at a different nesting level.
```yaml
tasks:
  - name: <name>
    continueOnError: <boolean> # optional, default is false
    flow:
      # Auto Planning (.ai)
      # ----------------

      # perform an action, this is the shortcut for aiAction
      - ai: <prompt>

      # this is the same as ai
      - aiAction: <prompt>

      # Instant Action(.aiTap, .aiHover, .aiInput, .aiKeyboardPress, .aiScroll)
      # ----------------

      # tap an element located by prompt
      - aiTap: <prompt>
        deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element

      # hover an element located by prompt
      - aiHover: <prompt>
        deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element

      # input text into an element located by prompt
      - aiInput: <final text content of the input>
        locate: <prompt>
        deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element

      # press a key (like Enter, Tab, Escape, etc.) on an element located by prompt
      - aiKeyboardPress: <key>
        locate: <prompt>
        deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element

      # scroll globally or on an element located by prompt
      - aiScroll:
        direction: 'up' # or 'down' | 'left' | 'right'
        scrollType: 'once' # or 'untilTop' | 'untilBottom' | 'untilLeft' | 'untilRight'
        distance: <number> # optional, distance to scroll in px
        locate: <prompt> # optional, the element to scroll on
        deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element

      # Data Extraction
      # ----------------

      # perform a query, return a json object
      - aiQuery: <prompt> # remember to describe the format of the result in the prompt
        name: <name> # the name of the result, will be used as the key in the output json

      # More APIs
      # ----------------

      # wait for a condition to be met with a timeout (ms, optional, default 30000)
      - aiWaitFor: <prompt>
        timeout: <ms>

      # perform an assertion
      - aiAssert: <prompt>

      # sleep for a number of milliseconds
      - sleep: <ms>

      # evaluate a javascript expression in web page context
      - javascript: <javascript>
        name: <name> # assign a name to the return value, will be used as the key in the output json, optional

  - name: <name>
    flow:
      # ...
```
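To make the schema more concrete, the following sketch fills in some of the placeholders with sample values. The task names, prompts, and the 10000 ms timeout are assumptions made for illustration; adapt them to the page you are testing.

```yaml
web:
  url: https://www.bing.com

tasks:
  - name: search with instant actions
    flow:
      # type the text into the element described by `locate`
      - aiInput: weather today
        locate: the search input box
      # press Enter on the same element
      - aiKeyboardPress: Enter
        locate: the search input box
      # wait up to 10 seconds instead of the default 30
      - aiWaitFor: the search results are visible
        timeout: 10000

  - name: collect results
    # do not stop the run if this task fails
    continueOnError: true
    flow:
      - aiQuery: "string[], the titles of the search results"
        name: resultTitles
      - aiAssert: the result shows the weather info
```

Note how the parameters of each step (`locate`, `timeout`, `name`) are written as sibling keys of the step itself, which is the nesting difference from the JavaScript API mentioned above.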
## More features
### Use environment variables in `.yaml` file
You can reference environment variables in a `.yaml` file with the `${variable-name}` syntax.
For example, if you have a `.env` file with the following content:
```env filename=.env
topic=weather today
```
You can use the environment variable in the `.yaml` file like this:
```yaml
#...
- ai: type ${topic} in input box
#...
```
### Debug in headed mode
> `web` scenario only
'Headed mode' means the browser window is visible. By default, scripts run in headless mode.
To turn on headed mode, use the `--headed` option. If you also want to keep the browser window open after the script finishes, use the `--keep-window` option. `--keep-window` implies `--headed`.
Headed mode consumes more resources, so we recommend using it only locally and only when needed.
```bash
# run in headed mode
midscene /path/to/yaml --headed
# run in headed mode and keep the browser window open after the script finishes
midscene /path/to/yaml --keep-window
```
### Use bridge mode
> `web` scenario only
By using bridge mode, you can utilize YAML scripts to automate the web browser on your desktop. This is particularly useful if you want to reuse cookies, plugins, and page states, or if you want to manually interact with automation scripts.
To use bridge mode, install the Chrome extension first, then add this configuration to the `web` section:
```diff
web:
  url: https://www.bing.com
+ bridgeMode: newTabWithUrl
```
See [Bridge Mode by Chrome Extension](./bridge-mode-by-chrome-extension) for more details.
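For reference, a complete script using bridge mode might look like the sketch below. It combines `bridgeMode` with the `closeNewTabsAfterDisconnect` option from the schema above; the task content is a placeholder.

```yaml
web:
  url: https://www.bing.com
  # open the URL in a new tab of the desktop browser through the Chrome extension
  bridgeMode: newTabWithUrl
  # close tabs opened by the script once the bridge disconnects
  closeNewTabsAfterDisconnect: true

tasks:
  - name: search weather
    flow:
      - ai: search for 'weather today'
      - aiAssert: the result shows the weather info
```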
### Run yaml script with javascript
You can also run a yaml script with javascript by using the [`runYaml`](./api.html#runyaml) method of the Midscene agent. Only the `tasks` part of the yaml script will be executed.
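Since only the `tasks` part is executed, a script intended for `runYaml` can consist of that part alone. The sketch below assumes the agent has already been created on the page you want to automate; the task name and prompts are placeholders.

```yaml
# no `web` or `android` part: the page comes from the agent that runs this script
tasks:
  - name: check result
    flow:
      - aiWaitFor: the search results are visible
      - aiAssert: the result shows the weather info
```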
## FAQ
**How to get cookies in JSON format from Chrome?**
You can use this [chrome extension](https://chromewebstore.google.com/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc) to export cookies in JSON format.
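Once exported, the file can be referenced through the `cookie` field of the `web` part. In this sketch, `./cookies.json` is a hypothetical path and the task is a placeholder.

```yaml
web:
  url: https://www.bing.com
  # path to the exported JSON cookie file (hypothetical location)
  cookie: ./cookies.json

tasks:
  - name: check login state
    flow:
      - aiAssert: the page shows a signed-in user
```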
## More
You may also be interested in [Prompting Tips](./prompting-tips).