# Support Android Automation

Starting with Midscene v0.15, we are happy to announce support for Android automation. The era of AI-driven Android automation is here!

## Showcases

### Navigate to an attraction

Open Maps, search for a destination, and navigate to it.

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/android-maps.mp4" controls/>

### Auto-like tweets

Open Twitter and auto-like the first tweet by [@midscene_ai](https://x.com/midscene_ai).

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/android-twitter.mp4" controls/>
## Suitable for ALL apps

All you need is an adb connection and a visual-language model (vl model) service, and everything is ready.

Behind the scenes, we utilize the visual grounding capabilities of the vl model to locate target elements on the screen. So whether it's a native app, a [Lynx](https://github.com/lynx-family/lynx) page, or a hybrid app with a webview makes no difference. Developers can write automation scripts without worrying about the technology stack of the app.
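For instance, the very same call can drive a native screen and a web page alike. Here is a minimal sketch using the SDK introduced below (the device id and the instruction are illustrative):

```ts
import { AndroidAgent, AndroidDevice } from '@midscene/android';
import 'dotenv/config'; // vl-model credentials are read from the .env file

(async () => {
  const device = new AndroidDevice('s4ey59'); // replace with your adb device id
  await device.connect();
  const agent = new AndroidAgent(device);

  // the vl model grounds the target from the screenshot alone, so the
  // same instruction works on native views, Lynx pages, and webviews
  await agent.ai('tap the search box at the top of the screen');
})();
```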
## With ALL the power of Midscene

When using Midscene for web automation, our users love tools like the playground and reports. Now, we bring the same power to Android automation!

### Use the playground to run automation without any code

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/android-playground-lark-en.mp4" poster="/blog/android-playground-lark-poster-en.png" controls/>

### Use the report to replay the whole process

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/android-ebay.mp4" controls/>
### Write automation scripts in a YAML file

Connect to the device, open ebay.com, and extract some item information.
```yaml
# search for headphones on ebay, extract the item info into a json file, and assert the Filter button
android:
  deviceId: s4ey59

tasks:
  - name: search headphones
    flow:
      - aiAction: open browser and navigate to ebay.com
      - aiAction: type 'Headphones' in ebay search box, hit Enter
      - sleep: 5000
      - aiAction: scroll down the page for 800px

  - name: extract headphones info
    flow:
      - aiQuery: >
          {name: string, price: number, subTitle: string}[], return item name, price and the subTitle on the lower right corner of each item
        name: headphones

  - name: assert Filter button
    flow:
      - aiAssert: There is a Filter button on the page
```
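To run such a script, you would typically use the Midscene command-line tool, e.g. something like `npx @midscene/cli ./headphones-on-ebay.yaml` (the invocation and file name here are illustrative); see [Automate with scripts in yaml](./automate-with-scripts-in-yaml) for the exact usage.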
### Use the JavaScript SDK

Use the JavaScript SDK to drive the automation from code.
```ts
import { AndroidAgent, AndroidDevice, getConnectedDevices } from '@midscene/android';
import "dotenv/config"; // read environment variables from .env file

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

Promise.resolve(
  (async () => {
    const devices = await getConnectedDevices();
    const page = new AndroidDevice(devices[0].udid);

    // 👀 init Midscene agent
    const agent = new AndroidAgent(page, {
      aiActionContext:
        'If any location, permission, user agreement, etc. popup, click agree. If login page pops up, close it.',
    });
    await page.connect();
    await page.launch('https://www.ebay.com');

    await sleep(5000);

    // 👀 type keywords, perform a search
    await agent.aiAction('type "Headphones" in search box, hit Enter');

    // 👀 wait for the loading
    await agent.aiWaitFor("there is at least one headphone item on page");
    // or you may use a plain sleep:
    // await sleep(5000);

    // 👀 understand the page content, find the items
    const items = await agent.aiQuery(
      "{itemTitle: string, price: Number}[], find item in list and corresponding price"
    );
    console.log("headphones in stock", items);

    // 👀 assert by AI
    await agent.aiAssert("There is a category filter on the left");
  })()
);
```
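Note that the agent takes no model parameters in this example: the vl-model service (API key, base URL, model name) is configured through environment variables, which is why the script loads `dotenv/config` at the top.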
### Two API styles for interaction

The auto-planning style:
```javascript
await agent.ai('input "Headphones" in search box, hit Enter');
```
The instant action style:
```javascript
await agent.aiInput('Headphones', 'search box');
await agent.aiKeyboardPress('Enter');
```
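Both snippets perform the same search. With the auto-planning style, the model decomposes a natural-language instruction into steps by itself, which reads closer to intent; with the instant action style, each call executes exactly one operation, which is usually faster and more predictable when you already know the precise step.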
## Quick start

You can use the playground to experience Android automation without writing any code. Please refer to [Quick experience with Android](./quick-experience-with-android) for more details.

After that, you can integrate with Android devices using JavaScript code. Please refer to [Integrate with Android(adb)](./integrate-with-android) for more details.

If you prefer writing automation scripts in YAML, please refer to [Automate with scripts in yaml](./automate-with-scripts-in-yaml).
### Demo projects

We have prepared a demo project for the JavaScript SDK:

[JavaScript demo project](https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo)

If you want to use the automation for testing purposes, you can use JavaScript with Vitest. We have set up a demo project to show how it works:

[Vitest demo project](https://github.com/web-infra-dev/midscene-example/blob/main/android/vitest-demo)
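For reference, a test in that setup might look roughly like this minimal sketch (the test name, timeout, and assertion are illustrative; the SDK calls are the same ones used above):

```ts
import { describe, it } from 'vitest';
import { AndroidAgent, AndroidDevice, getConnectedDevices } from '@midscene/android';
import 'dotenv/config';

describe('ebay search', () => {
  it('finds headphone listings', async () => {
    const devices = await getConnectedDevices();
    const page = new AndroidDevice(devices[0].udid);
    await page.connect();

    const agent = new AndroidAgent(page);
    await page.launch('https://www.ebay.com');

    await agent.aiAction('type "Headphones" in search box, hit Enter');
    // an AI-backed assertion instead of a hand-written selector check
    await agent.aiAssert('there is at least one headphone item on page');
  }, 240_000); // AI steps are slow, so give the test a generous timeout
});
```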
You can also write automation scripts in a YAML file:

[YAML demo project](https://github.com/web-infra-dev/midscene-example/blob/main/android/yaml-scripts-demo)
## Limitations

1. The element-locator caching feature is not supported. Since no view hierarchy is collected, we cannot cache element identifiers for reuse.
2. LLMs like gpt-4o or deepseek are not supported. Only certain known vl models with visual grounding ability are supported for now. If you want us to support other vl models, please let us know.
3. The performance is not good enough yet. We are still working on it.
4. The vl model may not perform well on `.aiQuery` and `.aiAssert`. We will provide a way to switch models for different kinds of tasks.
5. Due to security restrictions, you may get a blank screenshot on password input screens, and Midscene will not be able to work in that case.
## Credits

We would like to thank the following projects:

- [scrcpy](https://github.com/Genymobile/scrcpy) and [yume-chan](https://github.com/yume-chan) allow us to control Android devices from the browser.
- [appium-adb](https://github.com/appium/appium-adb) for the JavaScript bridge to adb.
- [YADB](https://github.com/ysbing/YADB) for the yadb tool, which improves the performance of text input.