midscene/apps/site/docs/en/integrate-with-android.mdx

264 lines
7.5 KiB
Plaintext
Raw Normal View History

# Integrate with Android (adb)
After connecting the Android device with adb, you can use Midscene to control Android devices with visual-language (VL) models.
import { PackageManagerTabs } from '@theme';
:::info Demo Project
Control Android devices with javascript: [https://github.com/web-infra-dev/midscene-example/blob/main/android-demo](https://github.com/web-infra-dev/midscene-example/blob/main/android-demo)
Control Android devices with javascript and Vitest: [https://github.com/web-infra-dev/midscene-example/tree/main/android-with-vitest-demo](https://github.com/web-infra-dev/midscene-example/tree/main/android-with-vitest-demo)
:::
## Preparation
### Config API Key
Config the API key for VL model. For example, if you use qwen-2.5-vl, you can config the API key like this:
```bash
# replace with your own
OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
OPENAI_API_KEY="......"
MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
MIDSCENE_USE_QWEN_VL=1
```
Android devices can be only be controlled by visual-language (VL) models. For more details, please refer to [choose a model](./choose-a-model).
### Adb environment
`adb` is a command-line tool that allows you to communicate with an Android device.
- way 1: use [Android Studio](https://developer.android.com/studio?hl=zh-cn) to install
- way 2: use [Android command-line tools](https://developer.android.com/studio#command-line-tools-only) to install
Verify adb is installed successfully:
```bash
adb --version
```
When you see the following output, adb is installed successfully:
```log
Android Debug Bridge version 1.0.41
Version 34.0.4-10411341
Installed as /usr/local/bin//adb
Running on Darwin 24.3.0 (arm64)
```
### Connect Android device with adb
In the developer options of the system settings, enable the 'USB debugging' of the Android device, if the 'USB debugging (secure settings)' exists, also enable it, then connect the Android device with a USB cable
<p align="center">
<img src="/android-usb-debug-en.png" alt="android usb debug" width="400"/>
</p>
Verify the connection:
```bash
adb devices -l
```
When you see the following output, the connection is successful:
```log
List of devices attached
s4ey59 device usb:34603008X product:cezanne model:M2006J device:cezan transport_id:3
```
## Step 1. Install dependencies
<PackageManagerTabs command="install @midscene/android --save-dev" />
## Step 2. Write scripts
Let's take a simple example: search for headphones on eBay using the browser in the Android device. Of course, you can also use any other apps on the Android device.
Write the following code, and save it as `./demo.ts`
```typescript title="./demo.ts"
import { AndroidAgent, AndroidDevice, getConnectedDevices } from '@midscene/android';
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
(async () => {
const devices = await getConnectedDevices();
const page = new AndroidDevice(devices[0].udid);
// 👀 init Midscene agent
const agent = new AndroidAgent(page,{
aiActionContext:
'If any location, permission, user agreement, etc. popup, click agree. If login page pops up, close it.',
});
await page.connect();
// 👀 open browser and navigate to ebay.com (Please ensure that the current page has a browser app)
await agent.aiAction('open browser and navigate to ebay.com');
await sleep(5000);
// 👀 type keywords, perform a search
await agent.aiAction('type "Headphones" in search box, hit Enter');
// 👀 wait for loading completed
await agent.aiWaitFor("There is at least one headphone product");
// or you can use a normal sleep:
// await sleep(5000);
// 👀 understand the page content, extract data
const items = await agent.aiQuery(
"{itemTitle: string, price: Number}[], find item in list and corresponding price"
);
console.log("headphones in stock", items);
// 👀 assert by AI
await agent.aiAssert("There is a category filter on the left");
})()
);
```
## Step 3. run
Using `tsx` to run
```bash
# run
npx tsx demo.ts
```
After a while, you will see the following output:
```log
[
{
itemTitle: 'Beats by Dr. Dre Studio Buds Totally Wireless Noise Cancelling In Ear + OPEN BOX',
price: 505.15
},
{
itemTitle: 'Skullcandy Indy Truly Wireless Earbuds-Headphones Green Mint',
price: 186.69
}
]
```
## Step 4: view the report
After the above command executes successfully, the console will output: `Midscene - report file updated: /path/to/report/some_id.html`. You can open this file in a browser to view the report.
## More interfaces in AndroidAgent
Except the common agent interfaces in [API Reference](./API), AndroidAgent also provides some other interfaces:
### `agent.launch()`
Launch a webpage or native page.
* Type
```typescript
function launch(uri: string): Promise<void>;
```
* Parameters:
feat: android playground (#542) * refactor: android api * refactor: enhance Android agent to accept options for device connection * fix: type error * fix: click after clearInput * fix: click before clearInput * feat: android playground * feat: support npx package name * feat: android playground joint * fix: git ignore conflicts * feat: ensure adb server is running before initializing adb client * fix: deps consistency * ci: add android playground * feat: integrate shared constants and improve server configuration in android playground * feat: android playground style * feat: style opt * feat: add @rsbuild/plugin-svgr dependency and improve URI handling in adb * feat: remove unused water flow scripts and update comments to English * feat: download report file * feat: standalone android playground * feat: use dynamic import * feat: migrate CSS to LESS and remove unused styles in chrome extension and report * feat: enhance Android playground with ScrcpyPlayer ref integration and device management improvements * feat: optimize styles and layout in Android playground and visualizer components * chore: add bin back * chore: update build script to exclude documentation generation * feat: add not ready message to PlaygroundResult for improved user guidance * feat: add error handling for screenshot capture in Android page * docs: update readme * feat: add PNG validation for screenshot buffer in Android page * feat: enhance UI components with improved styling and tooltips in ScrcpyPlayer and PromptInput * docs: update uri parameter description in integrate-with-android documentation and improve uri handling in launch function * style: update primary color to #2B83FF across multiple components and adjust margin in App.less * refactor: replace userConfig with globalConfig for environment configuration management and update related functions * feat: integrate server validation logic in App, AdbDevice, and ScrcpyPlayer components for improved connection handling * style: enhance player component layout with overflow handling and margin adjustments * style: refine player component layout with flex adjustments and improved spacing * feat: add midscene model name display and improve layout in EnvConfig component * feat: integrate ShinyText component for enhanced loading progress display in PlaygroundResult * test: add test for isValidPNGImageBuffer * style: remove background color from App.less and adjust AI config override behavior in env.ts --------- Co-authored-by: yutao <yutao.tao@bytedance.com>
2025-04-17 17:44:11 +08:00
* `uri: string` - The uri to open, can be a webpage url or a native app's package name or activity name, if the activity name exists, it should be separated by / (e.g. com.android.settings/.Settings).
* Return Value:
* Returns a Promise that resolves to void when the page is opened.
* Examples:
```typescript
import { AndroidAgent, AndroidDevice } from '@midscene/android';
const page = new AndroidDevice('s4ey59ytbitot4yp');
const agent = new AndroidAgent(page);
await agent.launch('https://www.ebay.com'); // open a webpage
await agent.launch('com.android.settings'); // open a native page
await agent.launch('com.android.settings/.Settings'); // open a native page
```
### `agentFromAdbDevice()`
Create a AndroidAgent from a connected adb device.
* Type
```typescript
function agentFromAdbDevice(deviceId?: string, opts?: PageAgentOpt): Promise<AndroidAgent>;
```
* Parameters:
* `deviceId?: string` - Optional, the adb device id to connect. If not provided, the first connected device will be used.
* `opts?: PageAgentOpt` - Optional, the options for the AndroidAgent, refer to [constructor](./API).
* Return Value:
* `Promise<AndroidAgent>` Returns a Promise that resolves to an AndroidAgent.
* Examples:
```typescript
import { agentFromAdbDevice } from '@midscene/android';
const agent = await agentFromAdbDevice('s4ey59ytbitot4yp'); // create a AndroidAgent from a specific adb device
const agent = await agentFromAdbDevice(); // no deviceId, use the first connected device
```
### `getConnectedDevices()`
Get all connected Android devices.
* Type
```typescript
function getConnectedDevices(): Promise<Device[]>;
interface Device {
/**
* The device udid.
*/
udid: string;
/**
* Current device state, as it is visible in
* _adb devices -l_ output.
*/
state: string;
port?: number;
}
```
* Return Value:
* `Promise<Device[]>` Returns a Promise that resolves to an array of Device.
* Examples:
```typescript
import { agentFromAdbDevice, getConnectedDevices } from '@midscene/android';
const devices = await getConnectedDevices();
console.log(devices);
const agent = await agentFromAdbDevice(devices[0].udid);
```
## More
* For all the APIs on the Agent, please refer to [API Reference](./API).
* For more details about prompting, please refer to [Prompting Tips](./prompting-tips)
## FAQ
### Why can't I control the device even though I've connected it?
Please check if the device is unlocked in the developer options of the system settings.
<p align="center">
<img src="/android-usb-debug-en.png" alt="android usb debug" width="400"/>
</p>
### Why can't I see the device even though I've connected it?
[Connect device](./integrate-with-android#connect-android-device-with-adb) , please ensure that the device is unlocked.