Midscene.js

English | 简体中文

Your AI Operator for Web, Android, Automation & Testing

npm version | Hugging Face model downloads | License | Discord | Twitter

Midscene.js allows AI to serve as your web and Android operator 🤖. Simply describe what you want to achieve in natural language, and it will assist you in operating the interface, validating content, and extracting data. Whether you seek a quick experience or in-depth development, you'll find it easy to get started.
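
To make this concrete, here is a minimal sketch using the Puppeteer integration (the target site and the instruction text are just examples for illustration; see the docs for the full setup):

```typescript
import puppeteer from 'puppeteer';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

(async () => {
  // Launch a regular Puppeteer page, then hand it to Midscene.
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://www.ebay.com'); // any page you want to automate

  // The agent wraps the page and exposes natural-language operations.
  const agent = new PuppeteerAgent(page);

  // Describe the goal in plain language; Midscene plans and performs the clicks and typing.
  await agent.aiAction('type "headphones" in the search box, then press Enter');

  await browser.close();
})();
```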

Showcases

  • Post a Tweet (by the UI-TARS model)
  • Use JS code to drive task orchestration, collect information about Jay Chou's concert, and write it into Google Docs (by the UI-TARS model)
  • Control the Maps app on Android (by the Qwen-2.5-VL model)

📢 Feb 2025: New open-source model choices - UI-TARS and Qwen2.5-VL

Besides the default model GPT-4o, we have added two recommended open-source models to Midscene.js: UI-TARS and Qwen2.5-VL (yes, open-source models!). They are dedicated to image recognition and UI automation, and are known to perform well in UI automation scenarios. Read more in Choose a model.

💡 Features

  • Natural Language Interaction 👆: Just describe your goals and steps, and Midscene will plan and operate the user interface for you.
  • UI Automation 🤖: Automate both web pages and Android apps with the same natural-language approach.
  • Visual Reports for Debugging 🎞️: Through our test reports and the Playground, you can easily understand, replay, and debug the entire process.
  • Caching Support 🔄: The first time you execute a task through AI, it is cached; subsequent executions of the same task are significantly faster.
  • Completely Open Source 🔥: Enjoy a whole new automation development experience!
  • Understand UI, JSON Format Responses 🔍: Specify the data format you need and receive the response as JSON (see the sketch after this list).
  • Intuitive Assertions 🤔: Express your assertions in natural language, and the AI will understand and evaluate them.
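
As a quick illustration of the JSON extraction and assertion features above, here is a minimal sketch reusing the agent from the earlier snippet (the prompts and the expected data shape are illustrative assumptions, not fixed by Midscene):

```typescript
// Ask for structured data: describe the shape you want and Midscene returns matching JSON.
const items = await agent.aiQuery(
  '{ title: string, price: number }[], the product titles and prices on the result list',
);
console.log('top result:', items[0]);

// Assert in natural language: the AI evaluates the condition against the current UI.
await agent.aiAssert('the search results page shows at least one product with a price');
```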

Model Choices

You can use multimodal LLMs like gpt-4o, or visual-language models like Qwen2.5-VL, gemini-2.5-pro and UI-TARS. Among them, UI-TARS is an open-source model dedicated to UI automation.

Read more in Choose a model
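
Switching models is done through environment variables rather than code changes. Below is a rough sketch, assuming the variable names described in the model docs; the endpoint and model name are placeholders, not recommendations:

```typescript
// Configure the model provider before creating any agent.
// These can also live in a .env file or your shell profile.
process.env.OPENAI_API_KEY = 'sk-...';                            // key for your model provider
process.env.OPENAI_BASE_URL = 'https://your-provider.example/v1'; // optional custom endpoint
process.env.MIDSCENE_MODEL_NAME = 'qwen2.5-vl-72b-instruct';      // the model you deployed
// Some visual-language models need extra switches; see "Choose a model" for details.
```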

👀 Comparing to ...

There are so many UI automation tools out there, and each one seems to be all-powerful. What's special about Midscene.js?

  • Debugging Experience: You will soon realize that debugging and maintaining automation scripts is the real challenge. No matter how magical the demo looks, ensuring stability over time requires careful debugging. Midscene.js offers a visualized report file, a built-in playground, and a Chrome extension to simplify the debugging process. These are the tools most developers truly need, and we're continually working to improve the debugging experience.

  • Open Source, Free, Deploy as You Want: Midscene.js is an open-source project. It is decoupled from any cloud service and model provider, so you can choose either public or private deployment. There is always a suitable plan for your business.

  • Integrate with JavaScript: You can always bet on JavaScript 😎

📄 Resources

🤝 Community

📝 Credits

We would like to thank the following projects:

  • Rsbuild for the build tool.
  • UI-TARS for the open-source agent model UI-TARS.
  • Qwen2.5-VL for the open-source VL model Qwen2.5-VL.
  • scrcpy and yume-chan allow us to control Android devices from the browser.
  • appium-adb for the JavaScript bridge to adb.
  • YADB for the yadb tool, which improves the compatibility of text input.

Citation

If you use Midscene.js in your research or project, please cite:

@software{Midscene.js,
  author = {Xiao Zhou and Tao Yu and YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, Automation & Testing.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}

📝 License

Midscene.js is MIT licensed.


If this project helps you or inspires you, please give us a ⭐️