midscene/apps/site/docs/en/index.mdx
Leyang 2589a9c4ca
docs(android): update android docs (#607)
Co-authored-by: yutao <yutao.tao@bytedance.com>
Co-authored-by: yuyutaotao <167746126+yuyutaotao@users.noreply.github.com>
2025-04-21 20:51:17 +08:00

# Midscene.js - Joyful Automation by AI
Your AI Operator for Web, Android, Automation & Testing
<div style={{"width": "100%", "display": "flex", justifyContent: "center"}}>
<iframe
style={{"maxWidth": "100%", "width": "800px", "height": "450px"}}
src="https://www.youtube.com/embed/lrF0lPfrwag?vq=hd1080"
frameBorder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowFullScreen
title="Embedded youtube"
></iframe>
</div>
## Interact, query and assert by natural language
There are three main capabilities: **action**, **query**, **assert**.
* Use **action (`.ai`, `.aiAction`)** to execute a series of actions by describing the steps
* Use **query (`.aiQuery`)** to extract customized data from the UI. Describe the JSON format you want, and AI will give the answer based on its "understanding" of the page
* Use **assert (`.aiAssert`)** to perform assertions on the page.
All these methods accept a natural-language prompt as their parameter. Because scripts describe intent rather than selectors, the cost of maintaining them can drop considerably.
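The three capabilities can be pictured as a small interface. The sketch below is only an illustration: `AiOperator`, `RecordingOperator`, and `loginFlow` are hypothetical names, and the real Midscene method signatures may differ. The stand-in class records prompts instead of calling a model, which shows the shape of a script without needing a browser or an API key.

```typescript
// Hypothetical shape of the three capabilities; this interface is an
// assumption for illustration, not the library's actual type definitions.
interface AiOperator {
  aiAction(prompt: string): Promise<void>;
  aiQuery<T>(prompt: string): Promise<T>;
  aiAssert(prompt: string): Promise<void>;
}

// In-memory stand-in: records each prompt and returns a canned query result.
class RecordingOperator implements AiOperator {
  readonly log: string[] = [];
  readonly cannedQueryResult: unknown;

  constructor(cannedQueryResult: unknown) {
    this.cannedQueryResult = cannedQueryResult;
  }

  async aiAction(prompt: string): Promise<void> {
    this.log.push(`action: ${prompt}`);
  }

  async aiQuery<T>(prompt: string): Promise<T> {
    this.log.push(`query: ${prompt}`);
    // No validation here; the caller decides how much to trust the result.
    return this.cannedQueryResult as T;
  }

  async aiAssert(prompt: string): Promise<void> {
    this.log.push(`assert: ${prompt}`);
  }
}

// A script written only against the interface: act, then query, then assert.
async function loginFlow(op: AiOperator): Promise<string[]> {
  await op.aiAction("click the login button");
  const labels = await op.aiQuery<string[]>("string[], labels of the form fields");
  await op.aiAssert("There is a password field in the form");
  return labels;
}
```

Writing scripts against an interface like this keeps the prompts in one place and lets the flow be exercised without a model.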
## Start with Chrome extension
To quickly experience the main features of Midscene, you can use the Midscene Chrome extension. It allows you to use Midscene on any webpage without writing any code.
Click [here](https://chromewebstore.google.com/detail/midscene/gbldofcpkknbggpkmbdaefngejllnief) to install the Midscene extension from the Chrome Web Store.
For instructions, please refer to [Quick Experience](./quick-experience).
## Multiple ways to integrate
Maintaining automation scripts with Midscene can be a brand-new experience. For example, to search for headphones on a website, you can write:
```typescript
// 👀 type keywords, perform a search
await ai('type "Headphones" in search box, hit Enter');
// 👀 find the items, return in JSON
const items = await aiQuery(
  "{itemTitle: string, price: Number}[], find item in list and corresponding price"
);
console.log("headphones in stock", items);
// 👀 assert by natural language
await aiAssert("There is a category filter on the left");
```
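Since the result of `aiQuery` comes from a model, it can be worth checking that it matches the format described in the prompt before using it. The following is a minimal sketch under that assumption; `Item`, `isItem`, and `parseItems` are hypothetical helpers written for this example and are not part of Midscene.

```typescript
// Shape described in the aiQuery prompt above.
interface Item {
  itemTitle: string;
  price: number;
}

// Type guard: true only if `value` is an object with a string title
// and a finite numeric price.
function isItem(value: unknown): value is Item {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.itemTitle === "string" &&
    typeof v.price === "number" &&
    Number.isFinite(v.price)
  );
}

// Hypothetical helper: narrows an untyped query result to Item[] or throws.
function parseItems(raw: unknown): Item[] {
  if (!Array.isArray(raw)) {
    throw new TypeError("expected an array of items");
  }
  raw.forEach((entry, i) => {
    if (!isItem(entry)) {
      throw new TypeError(`item at index ${i} does not match {itemTitle, price}`);
    }
  });
  return raw as Item[];
}
```

In a script this could wrap the query call, for example `const items = parseItems(await aiQuery(...))`, so that a malformed answer fails fast instead of surfacing later as a confusing error.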
There are several ways to integrate Midscene into your code project:
* [Automate with Scripts in YAML](./automate-with-scripts-in-yaml), use this if you prefer to write YAML files instead of code
* [Bridge Mode by Chrome Extension](./bridge-mode-by-chrome-extension), use this to control desktop Chrome with scripts
* [Integrate with Puppeteer](./integrate-with-puppeteer)
* [Integrate with Playwright](./integrate-with-playwright)
* [Integrate with Android](./integrate-with-android)
## Visualized report
To make automation more stable and easier to debug, Midscene generates a visual report after each run. With this report, you can review the animated replay and inspect the details of each step in the process.
What's more, the report file includes a playground where you can adjust your prompts without re-running all your scripts.
<p align="center">
<img src="/report.gif" alt="visualized report" loading="lazy" />
</p>
## ✨ Model Choices
You can use multimodal LLMs like `gpt-4o`, or visual-language models like `Qwen2.5-VL`, `gemini-2.5-pro` and `UI-TARS`. Among these, `UI-TARS` is an open-source model dedicated to UI automation.
Read more in [Choose a model](https://midscenejs.com/choose-a-model).
## 👀 Comparing to ...
There are so many UI automation tools out there, and each one seems to be all-powerful. What's special about Midscene.js?
* Debugging Experience: You will soon realize that debugging and maintaining automation scripts is the real challenge. No matter how magical the demo looks, ensuring stability over time requires careful debugging. Midscene.js offers a visualized report file, a built-in playground, and a Chrome Extension to simplify the debugging process. These are the tools most developers truly need, and we're continually working to improve the debugging experience.
* Open Source, Free, Deploy as you want: Midscene.js is an open-source project. It's decoupled from any cloud service and model provider, so you can choose either public or private deployment. There is always a suitable plan for your business.
* Integrate with JavaScript: You can always bet on JavaScript 😎
## Just you and model provider, no third-party services
All data gathered from pages will be sent directly to OpenAI or the custom model provider according to your configuration. Therefore, no third-party platform will access the data.
For more details, please refer to [Data Privacy](./data-privacy).
## Follow us
* [GitHub - give us a star if you like it!](https://github.com/web-infra-dev/midscene)
* [Twitter](https://x.com/midscene_ai)
* [Discord](https://discord.gg/2JyBHxszE4)
* [Lark](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=291q2b25-e913-411a-8c51-191e59aab14d)