Midscene.js provides AI caching to improve the stability and speed of the entire AI execution process. The cache mainly stores how the AI recognizes page elements. As long as the page elements have not changed, the cached AI results are reused.
Currently, Midscene's caching strategy in all scenarios is based on the test file: the AI behavior in each test file is cached. The cached content falls into two main categories:
After the AI has planned the user's instructions into tasks, it needs to operate on specific elements, which requires the AI's element recognition capability. For example, the following task:
The `test` above generates caches for `ai todo` and `ai todo2`, and the cache files `todo-mvc.spec.ts-1.json` and `todo-mvc.spec.ts-2.json` are written to the `midscene/midscene_run/cache` directory in the project root.
"prompt": "Enter \"Learn JS today\" in the task box, then press Enter to create",
"response": {
// AI's tasks
"plans": [
{
"thought": "The user wants to input a new task in the todo list input box and then press enter to create it. The input field is identified by its placeholder text 'What needs to be done?'.",
"type": "Locate",
"param": {
"prompt": "The input box with the placeholder text 'What needs to be done?'."
}
},
{
"thought": "Once the input box is located, we need to enter the task description.",
"type": "Input",
"param": {
"value": "Learn JS today"
}
},
{
"thought": "After entering the task, we need to commit it by pressing 'Enter'.",
"prompt": "The input box with the placeholder text 'What needs to be done?'.",
"response": {
// Returned element content
"elements": [
{
// Why AI found this element
"reason": "The element with ID '3530a9c1eb' is an INPUT Node. Its placeholder text is 'What needs to be done?', which matches the user's description.",
// Element text
"text": "What needs to be done?",
// Unique ID of the element (generated from its position and size)
When the `MIDSCENE_CACHE=true` environment variable is set and cache files exist, the AI's results are read from the cache files described above. The conditions for a cache hit are:
1. High AI response latency: a single task takes several seconds, so with dozens or even hundreds of tasks the total latency becomes significant.
2. AI response stability: through training and experiments, we found that GPT-4 achieves over 95% accuracy on page element recognition tasks, but it cannot yet reach 100%. Caching effectively reduces stability issues in production.
AI behaviors that do not hit the cache are re-executed by the AI, and the cache is updated after the entire test group has finished running. You can inspect the cache file to see which tasks were updated.
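The hit, miss, and update flow can be pictured with a small conceptual sketch. This is not Midscene's implementation: `TaskCacheSketch`, `CacheEntry`, and `askAI` are hypothetical names, and matching on the prompt alone is a simplifying assumption (the real hit conditions also depend on the page).

```typescript
// Conceptual sketch only: hypothetical names, not Midscene's API.
type CacheEntry = { prompt: string; response: unknown };

class TaskCacheSketch {
  private entries = new Map<string, CacheEntry>();
  // Prompts whose results were produced by the AI during this run.
  readonly updatedPrompts: string[] = [];

  constructor(initial: CacheEntry[] = []) {
    for (const entry of initial) {
      this.entries.set(entry.prompt, entry);
    }
  }

  // On a hit, return the cached response; on a miss, ask the AI and record the result.
  resolve(prompt: string, askAI: (prompt: string) => unknown): unknown {
    const hit = this.entries.get(prompt);
    if (hit !== undefined) {
      return hit.response;
    }
    const response = askAI(prompt);
    this.entries.set(prompt, { prompt, response });
    this.updatedPrompts.push(prompt);
    return response;
  }

  // Entries that would be written back once the whole test group has finished.
  snapshot(): CacheEntry[] {
    return Array.from(this.entries.values());
  }
}
```

In this sketch, `updatedPrompts` plays the role of "which tasks have been updated": only prompts that missed the cache appear in it.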
* Deleting a cache file automatically invalidates the cache for the entire test group.
* Deleting specific tasks from a cache file automatically invalidates only those tasks. Deleting an earlier task does not affect later tasks. Deleted tasks are cached again after they execute successfully.
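To illustrate the second point, here is a minimal sketch of removing one task from a list of cached entries. The flat list of `{ prompt, response }` objects is an assumption based on the fields shown in the cache excerpts; the actual top-level layout of the cache file may differ, and `dropCachedTask` is a hypothetical helper, not part of Midscene.

```typescript
// Hypothetical shape: the top-level layout of a real cache file may differ.
type CachedTask = { prompt: string; response: unknown };

// Remove every cached task whose prompt matches; all other tasks are kept as-is.
function dropCachedTask(tasks: CachedTask[], prompt: string): CachedTask[] {
  return tasks.filter((task) => task.prompt !== prompt);
}

const planningPrompt =
  'Enter "Learn JS today" in the task box, then press Enter to create';
const locatePrompt =
  "The input box with the placeholder text 'What needs to be done?'.";

const cachedBefore: CachedTask[] = [
  { prompt: planningPrompt, response: { plans: [] } },
  { prompt: locatePrompt, response: { elements: [] } },
];

// Dropping the earlier task leaves the later one available for cache hits.
const cachedAfter = dropCachedTask(cachedBefore, planningPrompt);
```

Only the removed task would be re-executed by the AI on the next run; the remaining entry is unchanged.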