Commit aeab83c

docs: add previous changelog (#816)
* docs(site): add previous changelog
* docs(site): add recently changelog
* docs(site): summary for previous changlog
1 parent 7ced918 commit aeab83c

16 files changed

+689
-3
lines changed

apps/site/docs/en/changelog.mdx

Lines changed: 332 additions & 1 deletion
@@ -75,7 +75,7 @@ Upgrade now to experience these powerful new features!
7575
* [API documentation for more Android configuration items](/en/integrate-with-android.mdx#androiddevice-constructor)
7676

7777

78-
## v0.17.4 - Let AI See the DOM of the Page
78+
## v0.17 - Let AI See the DOM of the Page
7979

8080
### Data Query API Enhanced
8181

@@ -148,4 +148,335 @@ Report file: [puppeteer-2025-06-04_20-34-48-zyh4ry4e.html](https://lf3-static.by
148148
The corresponding code can be found in our example repository: [puppeteer-demo/extract-data.ts](https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo/extract-data.ts)
149149

150150

151+
### Refactor Cache
151152

153+
Cache element XPaths instead of coordinates, which improves the cache hit rate.
154+
155+
Change the cache file format from JSON to YAML, which improves readability.
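
To illustrate why an XPath-keyed cache can hit more often than a coordinate-keyed one, here is a minimal sketch. It is not Midscene's implementation: `LocateCache`, its methods, and the `resolve` callback are hypothetical names chosen for illustration only.

```javascript
// Hypothetical sketch: cache element locations by XPath rather than pixel coordinates.
// A coordinate such as (412, 230) goes stale when the viewport or layout shifts,
// while an XPath can still resolve as long as the DOM structure is unchanged.
class LocateCache {
  constructor() {
    this.entries = new Map(); // prompt -> xpath
  }

  // Remember which XPath was chosen for a given natural-language prompt.
  save(prompt, xpath) {
    this.entries.set(prompt, xpath);
  }

  // Return the cached XPath only if it still resolves on the current page.
  // `resolve` is an assumed callback: (xpath) => element or null.
  lookup(prompt, resolve) {
    const xpath = this.entries.get(prompt);
    if (xpath === undefined) return null;
    return resolve(xpath) ? xpath : null;
  }
}

// Usage with a fake page that maps XPaths to elements.
const cache = new LocateCache();
cache.save("the login button", "/html/body/form/button[1]");
const fakePage = { "/html/body/form/button[1]": { tag: "button" } };
const resolveOnFakePage = (xpath) => fakePage[xpath] ?? null;
const hit = cache.lookup("the login button", resolveOnFakePage);
const miss = cache.lookup("the search box", resolveOnFakePage);
```

In this sketch a cached entry is treated as a miss when its XPath no longer resolves, so a changed page falls back to a fresh AI lookup instead of acting on a stale target.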
156+
157+
## v0.16 - Support MCP
158+
159+
### Midscene MCP
160+
161+
🤖 Use Cursor / Trae to help write test cases.
162+
🕹️ Quickly implement browser operations akin to the Manus platform.
163+
🔧 Integrate Midscene capabilities swiftly into your platforms and tools.
164+
165+
<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/ozpmyhn_lm_hymuPild/ljhwZthlaukjlkulzlp/midscene/en-midscene-mcp-Sauce-Demo.mp4" controls/>
166+
167+
Read more: [MCP](./mcp.mdx)
168+
169+
### Support structured API for agent
170+
171+
APIs: `aiBoolean`, `aiNumber`, `aiString`, `aiLocate`
172+
173+
Read more: [Use JavaScript to Optimize the AI Automation Code](./blog-programming-practice-using-structured-api.md)
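
As a rough sketch of how the structured APIs might be combined with plain JavaScript, the example below assumes that `aiBoolean`, `aiNumber`, and `aiString` each take a natural-language prompt and resolve to a value of the corresponding type. The `checkCart` function, the prompts, and the stub agent are hypothetical; `aiLocate` is omitted because its return shape is not described here.

```javascript
// `agent` is assumed to be a Midscene agent exposing the structured APIs named above.
// Each call is assumed to take a natural-language prompt and resolve to a typed value.
async function checkCart(agent) {
  const loggedIn = await agent.aiBoolean("whether the user is logged in");
  const itemCount = await agent.aiNumber("number of items in the shopping cart");
  const title = await agent.aiString("the title of the first item in the cart");

  // Plain JavaScript takes over from here: branch on typed values
  // instead of asking the model to make the decision.
  if (!loggedIn) return { ok: false, reason: "not logged in" };
  if (itemCount === 0) return { ok: false, reason: "cart is empty" };
  return { ok: true, reason: `${itemCount} item(s), first: ${title}` };
}

// A stub agent with canned answers, so the control flow can be tried without a browser.
const stubAgent = {
  aiBoolean: async () => true,
  aiNumber: async () => 2,
  aiString: async () => "Sauce Labs Backpack",
};
```

With a real agent, `checkCart(agent)` would be awaited inside a test; with the stub it only exercises the branching logic.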
174+
175+
## v0.15 - Android automation unlocked!
176+
177+
### Android automation unlocked!
178+
179+
🤖 AI Playground: natural‑language debugging
180+
📱 Supports native, Lynx & WebView apps
181+
🔁 Replayable runs
182+
🛠️ YAML or JS SDK
183+
⚡ Auto‑planning & Instant Actions APIs
184+
185+
Read more: [Android automation](./blog-support-android-automation.mdx)
186+
187+
### More features
188+
189+
* Allow custom midscene_run dir
190+
* Enhance report filename generation with unique identifiers and support split mode
191+
* Enhance timeout configurations and logging for network idle and navigation
192+
* Adapt for gemini-2.5-pro
193+
194+
## v0.14 - Instant Actions
195+
196+
"Instant Actions" introduces new atomic APIs, enhancing the accuracy of AI operations.
197+
198+
Read more: [Instant Actions](./blog-introducing-instant-actions-and-deep-think.md)
199+
200+
## v0.13 - DeepThink Mode
201+
202+
### Atomic AI Interaction Methods
203+
204+
* Supports aiTap, aiInput, aiHover, aiScroll, and aiKeyboardPress for precise AI actions.
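
A minimal sketch of chaining these atomic methods is shown below. The argument order (value first, then the element description) and the prompts are assumptions to verify against the API reference; `searchFor` and `createRecordingAgent` are hypothetical helpers, and the recording agent is only a stand-in for a real Midscene agent.

```javascript
// `agent` is assumed to expose the atomic methods listed above.
// The argument order used here is an assumption, not a confirmed signature.
async function searchFor(agent, keyword) {
  await agent.aiTap("the search box at the top of the page");
  await agent.aiInput(keyword, "the search box at the top of the page");
  await agent.aiKeyboardPress("Enter");
  await agent.aiHover("the first search result");
}

// Stand-in agent that records each call instead of driving a browser.
function createRecordingAgent() {
  const calls = [];
  const record = (name) => async (...args) => {
    calls.push([name, ...args]);
  };
  return {
    calls,
    aiTap: record("aiTap"),
    aiInput: record("aiInput"),
    aiKeyboardPress: record("aiKeyboardPress"),
    aiHover: record("aiHover"),
  };
}
```

Because each step names exactly one action, a failure points at a single call rather than at a long free-form instruction.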
205+
206+
### DeepThink Mode
207+
208+
* Enhances click accuracy with deeper contextual understanding.
209+
210+
![](/blog/0.13.jpeg)
211+
212+
## v0.12 - Integrate Qwen 2.5 VL
213+
214+
### Integrate Qwen 2.5 VL's native capabilities
215+
216+
* Keeps output accuracy.
217+
* Supports more element interactions.
218+
* Cuts operating cost by over 80%.
219+
220+
## v0.11.0 - UI-TARS Model Caching
221+
222+
### **✨ UI-TARS Model Support Caching**
223+
224+
* To enable caching, see the documentation 👉 [Enable Caching](./caching.mdx)
225+
226+
* Effect with caching enabled:
227+
228+
<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/antd-form-cache.mp4" controls/>
229+
230+
![](/blog/0.11.0.png)
231+
232+
### **✨ Optimize DOM Tree Extraction Strategy**
233+
234+
* Optimize the information extracted from the DOM tree, which accelerates inference for models such as GPT-4o
235+
236+
![](/blog/0.11.0-2.png)
237+
238+
239+
## v0.10.0 - UI-TARS Model Released
240+
241+
UI-TARS is a native GUI agent model released by the **Seed** team. It is named after the [TARS robot](https://interstellarfilm.fandom.com/wiki/TARS) in the movie [Interstellar](https://en.wikipedia.org/wiki/Interstellar_(film)), a robot with high intelligence and autonomous thinking capabilities. UI-TARS **takes images and human instructions as input**, can correctly perceive the next action, and gradually approaches the goal of the instructions. It achieves the best performance in various GUI automation benchmarks compared with open-source models and closed-source commercial models.
242+
243+
![](/blog/0.10.0.png)
244+
245+
UI-TARS: Pioneering Automated GUI Interaction with Native Agents - Figure 1
246+
247+
![](/blog/0.10.0-2.png)
248+
249+
UI-TARS: Pioneering Automated GUI Interaction with Native Agents - Figure 4
250+
251+
### **Model Advantages**
252+
253+
UI-TARS has the following advantages in GUI tasks:
254+
255+
* **Target-driven**
256+
257+
* **Fast inference speed**
258+
259+
* **Native GUI agent model**
260+
261+
* **Private deployment without data security issues**
262+
263+
264+
## v0.9.0 - Bridge Mode Released
265+
266+
With the Midscene browser extension, you can now use scripts to link with the desktop browser for automated operations!
267+
268+
We call it "Bridge Mode".
269+
270+
Compared with debugging in a CI environment, Bridge Mode has these advantages:
271+
272+
1. You can reuse the desktop browser, especially its cookies, login state, and front-end interface state, and start automation without worrying about environment setup.
273+
274+
2. Manual operation and scripts can work together, which makes automation tools more flexible.
275+
276+
3. For simple business regression tests, just run them locally with Bridge Mode.
277+
278+
![](/blog/0.9.0.png)
279+
280+
Documentation: [Use Chrome Extension to Experience Midscene](./bridge-mode-by-chrome-extension.mdx)
281+
282+
283+
## v0.8.0 - Chrome Extension
284+
285+
### **✨ New Chrome Extension, Run Midscene Anywhere**
286+
287+
Through the Midscene browser extension, you can run Midscene on any page, without writing any code.
288+
289+
Experience it now 👉 [Use Chrome Extension to Experience Midscene](./quick-experience.mdx)
290+
291+
<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/Midscene_extension.mov" controls/>
292+
293+
294+
295+
## v0.7.0 - Playground Ability
296+
297+
### **✨ Playground Ability, Debug Anytime**
298+
299+
Now you don't have to keep re-running scripts to debug prompts!
300+
301+
On the new test report page, you can debug the AI execution results at any time, including page operations, page information extraction, and page assertions.
302+
303+
<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/midscene-playground.mov" controls/>
304+
305+
306+
## v0.6.0 - Doubao Model Support
307+
308+
### **✨ Doubao Model Support**
309+
310+
* Support calling Doubao models. Set the environment variables below to try it.
311+
312+
```bash
MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"baseURL":"https://xxx.net/api/v3","apiKey":"xxx"}'
MIDSCENE_MODEL_NAME='ep-20240925111815-mpfz8'
MIDSCENE_MODEL_TEXT_ONLY='true'
```
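
Because `MIDSCENE_OPENAI_INIT_CONFIG_JSON` holds JSON inside a shell string, a quoting mistake is easy to make. The following is a hypothetical pre-flight check, not part of Midscene: `readModelConfig` is an illustrative helper that only validates the three variables shown above before a run starts.

```javascript
// Hypothetical helper: validate the Doubao-related environment variables up front.
// `env` is any object shaped like process.env.
function readModelConfig(env) {
  const raw = env.MIDSCENE_OPENAI_INIT_CONFIG_JSON;
  if (!raw) throw new Error("MIDSCENE_OPENAI_INIT_CONFIG_JSON is not set");

  let init;
  try {
    init = JSON.parse(raw);
  } catch (err) {
    throw new Error(`MIDSCENE_OPENAI_INIT_CONFIG_JSON is not valid JSON: ${err.message}`);
  }
  if (init === null || typeof init !== "object") {
    throw new Error("MIDSCENE_OPENAI_INIT_CONFIG_JSON must be a JSON object");
  }

  // Both keys appear in the example above, so treat them as required here.
  for (const key of ["baseURL", "apiKey"]) {
    if (typeof init[key] !== "string" || init[key] === "") {
      throw new Error(`MIDSCENE_OPENAI_INIT_CONFIG_JSON is missing "${key}"`);
    }
  }

  return {
    init,
    modelName: env.MIDSCENE_MODEL_NAME,
    textOnly: env.MIDSCENE_MODEL_TEXT_ONLY === "true",
  };
}
```

Calling `readModelConfig(process.env)` at the top of a script would surface a malformed value immediately rather than at the first AI call.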
317+
318+
Summary of Doubao model availability:
319+
320+
* Currently, Doubao offers only text-only models, so "seeing" the page is not available. It performs well in scenarios where reasoning relies on text alone.
321+
322+
* If the use case requires visual analysis of the UI, it is completely unusable.
323+
324+
325+
Example:
326+
327+
✅ The price of a multi-meat grape (can be guessed from the order of the text on the interface)
328+
329+
✅ The language switch text button (can be guessed from the text content on the interface: Chinese, English text)
330+
331+
❌ The bottom-left play button (requires image understanding, failed)
332+
333+
### **✨ Support for GPT-4o Structured Output, Cost Reduction**
334+
335+
By using the gpt-4o-2024-08-06 model, Midscene now supports structured output (structured-output), which improves stability and reduces costs by 40%+.
336+
337+
Midscene now supports hitting GPT-4o prompt caching features, and the cost of AI calls will continue to decrease as the company's GPT platform is deployed.
338+
339+
### **✨ Test Report: Support Animation Playback**
340+
341+
Now you can view an animated playback of each step in the test report and quickly debug your running script.
342+
343+
<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/midscene-play-all.mp4" controls/>
344+
345+
### **✨ Speed Up: Merge Plan and Locate Operations, Response Speed Increased by 30%**
346+
347+
In the new version, we have merged the Plan and Locate operations in the prompt execution to a certain extent, which increases the response speed of AI by 30%.
348+
349+
> Before
350+
351+
![](/blog/0.6.0.png)
352+
353+
> After
354+
355+
![](/blog/0.6.0-2.png)
356+
357+
### **✨ Test Report: The Accuracy of Different Models**
358+
359+
* GPT-4o series models: 100% accuracy
360+
361+
* doubao-pro-4k text-only model: approaching a usable state
362+
363+
![](/blog/0.6.0-3.png)
364+
365+
![](/blog/0.6.0-4.png)
366+
367+
### **🐞 Bug Fixes**
368+
369+
* Optimize page information extraction to avoid collecting obscured elements, which improves success rate and speed and reduces AI call cost 🚀
370+
371+
> Before
372+
373+
![](/blog/0.6.0-5.png)
374+
375+
> After
376+
377+
![](/blog/0.6.0-6.png)
378+
379+
380+
## v0.5.0 - Support GPT-4o Structured Output
381+
382+
### **✨ New Features**
383+
384+
* Support the gpt-4o-2024-08-06 model, which guarantees 100% JSON-format output and reduces hallucination in Midscene task planning
385+
386+
![](/blog/0.5.0.png)
387+
388+
* Support real-time visualization of Playwright AI behavior, which makes troubleshooting more efficient
389+
390+
![](/blog/0.5.0-2.png)
391+
392+
* Generalize caching: cache capabilities are no longer limited to playwright; pagepass and puppeteer can also use the cache
393+
394+
```diff
- playwright test --config=playwright.config.ts
# Enable cache
+ MIDSCENE_CACHE=true playwright test --config=playwright.config.ts
```
399+
400+
* Support Azure OpenAI
401+
402+
* Support AI adding to, deleting, and modifying existing input
403+
404+
### **🐞 Bug Fixes**
405+
406+
* Optimize page information extraction to avoid collecting obscured elements, which improves success rate and speed and reduces AI call cost 🚀
407+
408+
* During the AI interaction process, unnecessary attribute fields were trimmed, reducing token consumption.
409+
410+
* Optimize the AI interaction process to reduce the likelihood of hallucination in KeyboardPress and Input events
411+
412+
* For pagepass, provide a workaround for the flickering that occurs while Midscene is running
413+
414+
```javascript
// pagepass currently depends on an outdated version of puppeteer, which may cause the interface to flicker and the cursor to disappear. The following workaround avoids the problem.
const originScreenshot = puppeteerPage.screenshot;
puppeteerPage.screenshot = async (options) => {
  return await originScreenshot.call(puppeteerPage, {
    ...options,
    captureBeyondViewport: false
  });
};
```
424+
425+
## v0.4.0 - Support CLI Usage
426+
427+
### **✨ New Features**
428+
429+
* Support CLI usage, which lowers the barrier to using Midscene
430+
431+
```bash
# Headed mode (visible browser): open baidu.com and search for "weather"
npx @midscene/cli --headed --url https://www.baidu.com --action "input 'weather', press enter" --sleep 3000

# Visit the GitHub status page and save the status to ./status.json
npx @midscene/cli --url https://www.githubstatus.com/ \
  --query-output status.json \
  --query '{serviceName: string, status: string}[], github page status, return service name'
```
440+
441+
* Support having the AI wait for a specified time before continuing with subsequent tasks
442+
443+
* Playwright AI task report shows the overall time and aggregates AI tasks by test group
444+
445+
### **🐞 Bug Fixes**
446+
447+
* Optimize the AI interaction process to reduce the likelihood of hallucination in KeyboardPress and Input events
448+
449+
450+
## v0.3.0 - Support AI Report HTML
451+
452+
### **✨ New Features**
453+
454+
* Generate AI reports in HTML format and aggregate AI tasks by test group, which makes test reports easier to share
455+
456+
### **🐞 Bug Fixes**
457+
458+
* Fix the scrolling preview in the AI report
459+
460+
## v0.2.0 - Control puppeteer by natural language
461+
462+
### **✨ New Features**
463+
464+
* Support using natural language to control puppeteer for page automation 🗣️💻
465+
466+
* Provide AI caching for the playwright framework, which improves stability and execution efficiency
467+
468+
* Visualize AI reports and aggregate AI tasks by test group, which makes test reports easier to share
469+
470+
* Support AI assertions on the page: let AI judge whether the page meets certain conditions
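
As a small sketch of the assertion feature, the helper below turns an AI assertion into a boolean. It assumes the agent exposes an `aiAssert` method that takes a natural-language condition and rejects when the AI judges the condition to be false; `pageSatisfies` and the stub agent are hypothetical names for illustration.

```javascript
// `agent.aiAssert` is assumed to reject when the condition does not hold.
// Note: this sketch treats every rejection as "condition not met",
// including network or model errors, which a real test may want to rethrow.
async function pageSatisfies(agent, condition) {
  try {
    await agent.aiAssert(condition);
    return true;
  } catch (err) {
    return false;
  }
}

// Stub agent: "passes" only for conditions that mention the word "visible".
const assertStubAgent = {
  aiAssert: async (condition) => {
    if (!condition.includes("visible")) {
      throw new Error(`Assertion failed: ${condition}`);
    }
  },
};
```

In a test suite it is usually simpler to let `aiAssert` throw directly; a wrapper like this is only useful when the script needs to branch on the outcome.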
471+
472+
## v0.1.0 - Control playwright by natural language
473+
474+
### **✨ New Features**
475+
476+
* Support using natural language to control playwright for page automation 🗣️💻
477+
478+
* Support for using natural language to extract page information 🔍🗂️
479+
480+
* AI report visualization, AI behavior, AI thinking visualization 🛠️👀
481+
482+
* Direct use of GPT-4o model, no training required 🤖🔧

apps/site/docs/public/blog/0.10.0.png
apps/site/docs/public/blog/0.11.0.png
apps/site/docs/public/blog/0.5.0.png
apps/site/docs/public/blog/0.6.0.png
apps/site/docs/public/blog/0.9.0.png