The corresponding code can be found in our example repository: [puppeteer-demo/extract-data.ts](https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo/extract-data.ts)
### Refactor Cache

Use an XPath-based cache instead of coordinates to improve the cache hit rate.

Refactor the cache file format from JSON to YAML to improve readability.

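To illustrate why an XPath-based cache can hit more often than a coordinate-based one, here is a minimal sketch. It is not Midscene's actual cache implementation: the element shape, `findByPoint`, `findByXPath`, and the cache entry fields are all hypothetical, and the "page" is just an in-memory array.

```javascript
// Hypothetical in-memory "page": each element has an XPath and a bounding box.
const makePage = (offsetY) => [
  { xpath: "/html/body/header/button[1]", text: "Login", x: 900, y: 20 + offsetY, w: 80, h: 32 },
  { xpath: "/html/body/main/input[1]", text: "Search", x: 100, y: 200 + offsetY, w: 400, h: 40 },
];

// Coordinate-based replay: find the element whose box contains the point.
const findByPoint = (page, x, y) =>
  page.find((el) => x >= el.x && x <= el.x + el.w && y >= el.y && y <= el.y + el.h);

// XPath-based replay: find the element by its XPath.
const findByXPath = (page, xpath) => page.find((el) => el.xpath === xpath);

// First run: record a cache entry (hypothetical shape) for the prompt.
const firstRun = makePage(0);
const target = findByXPath(firstRun, "/html/body/main/input[1]");
const cacheEntry = {
  prompt: "the search input",
  center: { x: target.x + target.w / 2, y: target.y + target.h / 2 },
  xpath: target.xpath,
};

// Second run: a banner pushes every element down by 120px.
const secondRun = makePage(120);
const byPoint = findByPoint(secondRun, cacheEntry.center.x, cacheEntry.center.y);
const byXPath = findByXPath(secondRun, cacheEntry.xpath);

const coordinateHit = byPoint !== undefined && byPoint.text === "Search";
const xpathHit = byXPath !== undefined && byXPath.text === "Search";
console.log({ coordinateHit, xpathHit }); // { coordinateHit: false, xpathHit: true }
```

In this toy layout the cached coordinates no longer land on the search input after the shift, while the XPath still resolves to it.
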
## v0.16 - Support MCP
### Midscene MCP

🤖 Use Cursor / Trae to help write test cases.

🕹️ Quickly implement browser operations akin to the Manus platform.

🔧 Integrate Midscene capabilities swiftly into your platforms and tools.

* Optimize how DOM tree information is represented, accelerating inference for models like GPT-4o

## v0.10.0 - UI-TARS Model Released
UI-TARS is a native GUI agent model released by the **Seed** team. It is named after the [TARS robot](https://interstellarfilm.fandom.com/wiki/TARS) in the movie [Interstellar](https://en.wikipedia.org/wiki/Interstellar_(film)), which has high intelligence and autonomous thinking capabilities. UI-TARS **takes images and human instructions as input**, correctly perceives the next action to take, and gradually approaches the goal of the instructions, achieving the best performance in various GUI automation benchmarks compared with both open-source and closed-source commercial models.

UI-TARS: Pioneering Automated GUI Interaction with Native - Figure 4

### **✨** Model Advantage

UI-TARS has the following advantages in GUI tasks:

* **Target-driven**
* **Fast inference speed**
* **Native GUI agent model**
* **Private deployment without data security issues**

## v0.9.0 - Bridge Mode Released

With the Midscene browser extension, you can now use scripts to drive your desktop browser for automated operations!

We call it "Bridge Mode".

Compared with debugging in a CI environment, the advantages are:

1. You can reuse the desktop browser, including its cookies, login state, and front-end interface state, and start automation without worrying about environment setup.
2. Manual operation and scripts can work together, making automation tools more flexible.
3. For simple business regression, just run it locally with Bridge Mode.

Documentation: [Use Chrome Extension to Experience Midscene](./bridge-mode-by-chrome-extension.mdx)
## v0.8.0 - Chrome Extension
### **✨ New Chrome Extension, Run Midscene Anywhere**
Through the Midscene browser extension, you can run Midscene on any page, without writing any code.

Experience it now 👉 [Use Chrome Extension to Experience Midscene](./quick-experience.mdx)

Now you don't have to keep re-running scripts to debug prompts!

On the new test report page, you can debug the AI execution results at any time, including page operations, page information extraction, and page assertions.

* Currently, Doubao only offers text-only models, so it cannot "see" the page. In scenarios that rely on pure-text reasoning, it performs well.
* If the use case requires UI analysis, it is completely unusable.

Example:

✅ The price of a multi-meat grape (can be guessed from the order of the text on the interface)

✅ The language switch text button (can be guessed from the text content on the interface: Chinese, English text)

❌ The bottom-left play button (requires image understanding, failed)

### **✨ Support for GPT-4o Structured Output, Cost Reduction**

By using the gpt-4o-2024-08-06 model, Midscene now supports structured output, which improves stability and reduces costs by more than 40%.

Midscene now also takes advantage of GPT-4o prompt caching, and the cost of AI calls will continue to decrease as the company's GPT platform is deployed.

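As a rough illustration of what requesting structured output looks like, the sketch below builds a Chat Completions request body that uses the `json_schema` response format. The schema, the `ui_plan` name, the `Tap` action, the system prompt, and `buildPlanRequest` are assumptions made for this example; Midscene's real prompts and schema may differ, and no request is actually sent here.

```javascript
// Hypothetical schema for a planning result; not Midscene's actual schema.
const planSchema = {
  type: "object",
  properties: {
    actions: {
      type: "array",
      items: {
        type: "object",
        properties: {
          type: { type: "string", enum: ["Tap", "Input", "KeyboardPress"] },
          target: { type: "string" },
        },
        required: ["type", "target"],
        additionalProperties: false,
      },
    },
  },
  required: ["actions"],
  additionalProperties: false,
};

// Build a request body that asks the model for schema-constrained JSON.
function buildPlanRequest(instruction) {
  return {
    model: "gpt-4o-2024-08-06",
    messages: [
      { role: "system", content: "Plan the UI actions needed to follow the instruction." },
      { role: "user", content: instruction },
    ],
    response_format: {
      type: "json_schema",
      json_schema: { name: "ui_plan", strict: true, schema: planSchema },
    },
  };
}

const request = buildPlanRequest('type "weather" in the search box');
console.log(JSON.stringify(request.response_format, null, 2));
```

With `strict: true`, the response is constrained to the supplied schema, so the caller can parse the planning result without defensive handling of malformed JSON.
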
### **✨ Test Report: Support Animation Playback**

Now you can view the animation playback of each step in the test report and quickly debug your running script.

### **✨ Speed Up: Merge Plan and Locate Operations, Response Speed Increased by 30%**

In the new version, we have partially merged the Plan and Locate operations during prompt execution, which increases AI response speed by 30%.

> Before

> After

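The idea of merging the two operations can be sketched as follows. This is only an illustration under stated assumptions, not Midscene's internal code: `callModel` is a mock that counts calls and returns canned answers, it is synchronous only to keep the sketch short, and the task names and result fields are hypothetical.

```javascript
// Mock model: counts calls and returns canned answers.
// A real model call would be an asynchronous network request.
let modelCalls = 0;
function callModel(task) {
  modelCalls += 1;
  if (task === "plan") return { action: "Tap", targetDescription: "search button" };
  if (task === "locate") return { elementId: "btn-search" };
  // "planAndLocate": one response carries both the action and the located element.
  return { action: "Tap", targetDescription: "search button", elementId: "btn-search" };
}

// Before: two sequential round trips per step.
function runSeparately() {
  const plan = callModel("plan");
  const located = callModel("locate");
  return { action: plan.action, elementId: located.elementId };
}

// After: a single round trip returns the action together with its target.
function runMerged() {
  const result = callModel("planAndLocate");
  return { action: result.action, elementId: result.elementId };
}

modelCalls = 0;
const separate = runSeparately();
const separateCalls = modelCalls;

modelCalls = 0;
const merged = runMerged();
const mergedCalls = modelCalls;

console.log({ separateCalls, mergedCalls }); // { separateCalls: 2, mergedCalls: 1 }
```

Both paths produce the same action and target in this sketch; the merged path simply needs one model round trip instead of two.
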
### **✨ Test Report: The Accuracy of Different Models**

* GPT-4o series models: 100% accuracy
* doubao-pro-4k text-only model: approaching a usable state

### **🐞** Problem Fix

* Optimize page information extraction to avoid collecting obscured elements, improving success rate and speed and reducing AI call cost 🚀

> Before

> After

## v0.5.0 - Support GPT-4o Structured Output
### **✨ New Features**

* Support the gpt-4o-2024-08-06 model, which constrains output to 100% valid JSON and reduces hallucination in Midscene task planning

* Support real-time visualization of AI behavior in Playwright, improving troubleshooting efficiency

* Cache generalization: caching is no longer limited to Playwright; pagepass and Puppeteer can also use the cache

```diff
- playwright test --config=playwright.config.ts
# Enable cache
+ MIDSCENE_CACHE=true playwright test --config=playwright.config.ts
```

* Support for Azure OpenAI
* Support for AI to add to, delete, and modify existing input content

### **🐞** Problem Fix

* Optimize page information extraction to avoid collecting obscured elements, improving success rate and speed and reducing AI call cost 🚀
* Trim unnecessary attribute fields during AI interaction to reduce token consumption
* Optimize the AI interaction process to reduce the likelihood of hallucination in KeyboardPress and Input events
* For pagepass, provide a workaround for the flickering that occurs while Midscene is running

```javascript
// pagepass currently depends on an outdated version of Puppeteer, which may cause
// the interface to flicker and the cursor to disappear. The following workaround
// addresses this problem.
const originScreenshot = puppeteerPage.screenshot;
puppeteerPage.screenshot = async (options) => {
  // Capture only the visible viewport instead of the full page
  return await originScreenshot.call(puppeteerPage, {
    ...options,
    captureBeyondViewport: false,
  });
};
```

## v0.4.0 - Support CLI Usage

### **✨ New Features**

* Support CLI usage, lowering the barrier to using Midscene

```bash
# headed mode (visible browser) access baidu.com and search "weather"