Commit 09c6aaa

feat(web-integration): support user expected xpath option for locate methods (#844)
* feat(web-integration): support user expected xpath option for locate methods
* docs(site): xpath option for api
* fix(ci): nx config
* docs(site): update docs
1 parent bee1c40 commit 09c6aaa

16 files changed: +227 -109 lines changed

apps/report/src/components/detail-side.tsx

Lines changed: 8 additions & 8 deletions
@@ -198,14 +198,14 @@ const DetailSide = (): JSX.Element => {
             },
           ]
         : []),
-      {
-        key: 'cache',
-        content: task?.cache ? (
-          <pre>{JSON.stringify(task?.cache, undefined, 2)}</pre>
-        ) : (
-          'false'
-        ),
-      },
+      ...(task?.hitBy
+        ? [
+            {
+              key: 'hitBy',
+              content: <pre>{JSON.stringify(task?.hitBy, undefined, 2)}</pre>,
+            },
+          ]
+        : []),
       ...(task?.locate
         ? [
             {
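
Note on the hunk above: the always-present `cache` row is replaced by a `hitBy` row that is emitted only when the task carries that field. A minimal sketch of the same conditional-spread pattern in plain TypeScript (no JSX) follows. The `Task` and `Row` shapes and the `buildRows` helper are illustrative assumptions, not names taken from the repository.

```typescript
// Hypothetical shapes for illustration only; the real task type lives in the repository.
interface Task {
  status: string;
  hitBy?: { from: string; context?: Record<string, unknown> };
}

interface Row {
  key: string;
  content: string;
}

// Build the detail rows; the `hitBy` row is included only when the field is set.
function buildRows(task: Task): Row[] {
  return [
    { key: 'status', content: task.status },
    // Spreading an empty array adds nothing, so no placeholder row is rendered.
    ...(task.hitBy
      ? [{ key: 'hitBy', content: JSON.stringify(task.hitBy, undefined, 2) }]
      : []),
  ];
}
```

With this shape, a task without `hitBy` yields a single row, whereas the removed code always rendered a `cache` row containing `'false'`.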

apps/report/src/components/sidebar.tsx

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ const SideItem = (props: {
   const selectedClass = selected ? 'selected' : '';
   let statusText: JSX.Element | string = task.status;

-  const cacheEl = task.cache?.hit ? <span>(cache) </span> : null;
+  const cacheEl = task.hitBy?.from === 'Cache' ? <span>(cache) </span> : null;

   const deepThinkEl = (task as ExecutionTaskInsightLocate)?.log?.dump
     ?.deepThink ? (

apps/site/docs/en/API.mdx

Lines changed: 15 additions & 8 deletions
@@ -108,7 +108,8 @@ function aiTap(locate: string, options?: Object): Promise<void>;

   - `locate: string` - A natural language description of the element to tap.
   - `options?: Object` - Optional, a configuration object containing:
-    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element.
+    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element. False by default.
+    - `xpath?: string` - The xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
     - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.

 - Return Value:
@@ -140,7 +141,8 @@ function aiHover(locate: string, options?: Object): Promise<void>;

   - `locate: string` - A natural language description of the element to hover over.
   - `options?: Object` - Optional, a configuration object containing:
-    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element.
+    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element. False by default.
+    - `xpath?: string` - The xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
     - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.

 - Return Value:
@@ -168,7 +170,8 @@ function aiInput(text: string, locate: string, options?: Object): Promise<void>;
   - `text: string` - The final text content that should be placed in the input element. Use blank string to clear the input.
   - `locate: string` - A natural language description of the element to input text into.
   - `options?: Object` - Optional, a configuration object containing:
-    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element.
+    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element. False by default.
+    - `xpath?: string` - The xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
     - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.
     - `autoDismissKeyboard?: boolean` - If true, the keyboard will be dismissed after input text, only available in Android. (Default: true)

@@ -201,7 +204,8 @@ function aiKeyboardPress(
   - `key: string` - The web key to press, e.g. 'Enter', 'Tab', 'Escape', etc. Key Combination is not supported.
   - `locate?: string` - Optional, a natural language description of the element to press the key on.
   - `options?: Object` - Optional, a configuration object containing:
-    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element.
+    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element. False by default.
+    - `xpath?: string` - The xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
     - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.

 - Return Value:
@@ -236,7 +240,8 @@ function aiScroll(
     - `distance: number` - Optional, the distance to scroll in px.
   - `locate?: string` - Optional, a natural language description of the element to scroll on. If not provided, Midscene will perform scroll on the current mouse position.
   - `options?: Object` - Optional, a configuration object containing:
-    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element.
+    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element. False by default.
+    - `xpath?: string` - The xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
     - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.

 - Return Value:
@@ -266,7 +271,8 @@ function aiRightClick(locate: string, options?: Object): Promise<void>;

   - `locate: string` - A natural language description of the element to right-click on.
   - `options?: Object` - Optional, a configuration object containing:
-    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element.
+    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element. False by default.
+    - `xpath?: string` - The xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
     - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.

 - Return Value:
@@ -286,7 +292,7 @@ await agent.aiRightClick('The file name at the top of the page', {

 :::tip About the `deepThink` feature

-The `deepThink` feature is a powerful feature that allows Midscene to call AI model twice to precisely locate the element. It is useful when the AI model find it hard to distinguish the element from its surroundings.
+The `deepThink` feature is a powerful feature that allows Midscene to call AI model twice to precisely locate the element. False by default. It is useful when the AI model find it hard to distinguish the element from its surroundings.

 :::

@@ -531,7 +537,8 @@ function aiLocate(

   - `locate: string` - A natural language description of the element to locate.
   - `options?: Object` - Optional, a configuration object containing:
-    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element.
+    - `deepThink?: boolean` - If true, Midscene will call AI model twice to precisely locate the element. False by default.
+    - `xpath?: string` - The xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
     - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.

 - Return Value:
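
Note on the `xpath` option documented in this file: the added text describes a lookup order, with the user-supplied xpath tried first, then the cache, then the AI model. A minimal sketch of such a fallback chain follows, assuming each step falls through when it finds nothing. The `Locator` type, the `Strategies` object, and the `locateWithFallback` helper are hypothetical and do not come from Midscene; of the `from` labels, only `'Cache'` appears in this commit's diff.

```typescript
// Result of a locate attempt; `from` records which strategy matched.
// Only 'Cache' is confirmed by the diff; the other labels are assumptions.
interface LocateResult {
  elementId: string;
  from: 'Xpath' | 'Cache' | 'AI';
}

// A strategy resolves to an element id, or undefined when it finds nothing.
type Locator = (prompt: string) => Promise<string | undefined>;

interface Strategies {
  byXpath?: Locator; // set only when the caller passed an `xpath` option
  byCache?: Locator; // set only when caching is enabled for this call
  byAi: Locator; // always available as the last resort
}

// Try each available strategy in order and return the first match.
async function locateWithFallback(
  prompt: string,
  strategies: Strategies,
): Promise<LocateResult> {
  const ordered: Array<[LocateResult['from'], Locator | undefined]> = [
    ['Xpath', strategies.byXpath],
    ['Cache', strategies.byCache],
    ['AI', strategies.byAi],
  ];
  for (const [from, locator] of ordered) {
    if (!locator) continue; // strategy not configured for this call
    const elementId = await locator(prompt);
    if (elementId !== undefined) return { elementId, from };
  }
  throw new Error(`Unable to locate element: ${prompt}`);
}
```

Recording which strategy matched is what lets a report distinguish a cache hit from an xpath hit, which is the role the `hitBy.from` field plays in the report components changed by this commit.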

apps/site/docs/en/automate-with-scripts-in-yaml.mdx

Lines changed: 15 additions & 5 deletions
@@ -206,24 +206,32 @@ tasks:

 # tap an element located by prompt
 - aiTap: <prompt>
-  deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element
+  deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element. False by default.
+  xpath: <xpath> # optional, the xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
+  - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.
   cacheable: <boolean> # optional, whether cacheable when enabling [caching feature](./caching.mdx). True by default.

 # hover an element located by prompt
 - aiHover: <prompt>
-  deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element
+  deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element. False by default.
+  xpath: <xpath> # optional, the xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
+  - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.
   cacheable: <boolean> # optional, whether cacheable when enabling [caching feature](./caching.mdx). True by default.

 # input text into an element located by prompt
 - aiInput: <final text content of the input>
   locate: <prompt>
-  deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element
+  deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element. False by default.
+  xpath: <xpath> # optional, the xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
+  - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.
   cacheable: <boolean> # optional, whether cacheable when enabling [caching feature](./caching.mdx). True by default.

 # press a key (like Enter, Tab, Escape, etc.) on an element located by prompt
 - aiKeyboardPress: <key>
   locate: <prompt>
-  deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element
+  deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element. False by default.
+  xpath: <xpath> # optional, the xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
+  - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.
   cacheable: <boolean> # optional, whether cacheable when enabling [caching feature](./caching.mdx). True by default.

 # scroll globally or on an element located by prompt
@@ -232,7 +240,9 @@ tasks:
   scrollType: 'once' # or 'untilTop' | 'untilBottom' | 'untilLeft' | 'untilRight'
   distance: <number> # optional, distance to scroll in px
   locate: <prompt> # optional, the element to scroll on
-  deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element
+  deepThink: <boolean> # optional, whether to use deepThink to precisely locate the element. False by default.
+  xpath: <xpath> # optional, the xpath of the element to operate. If provided, Midscene will first use this xpath to locate the element before using the cache and the AI model. Empty by default.
+  - `cacheable?: boolean` - Whether cacheable when enabling [caching feature](./caching.mdx). True by default.
   cacheable: <boolean> # optional, whether cacheable when enabling [caching feature](./caching.mdx). True by default.

 # log the current screenshot with a description in the report file

apps/site/docs/zh/API.mdx

Lines changed: 14 additions & 7 deletions
@@ -105,7 +105,8 @@ function aiTap(locate: string, options?: Object): Promise<void>;
 - 参数:
   - `locate: string` - 用自然语言描述的元素定位。
   - `options?: Object` - 可选,一个配置对象,包含:
-    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。
+    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。默认值为 False
+    - `xpath?: string` - 目标元素的 xpath 路径,用于执行当前操作。如果提供了这个 xpath,Midscene 会优先使用该 xpath 来找到元素,然后依次使用缓存和 AI 模型。默认值为空
     - `cacheable?: boolean` - 当启用 [缓存功能](./caching.mdx) 时,是否允许缓存当前 API 调用结果。默认值为 True
 - 返回值:

@@ -133,7 +134,8 @@ function aiHover(locate: string, options?: Object): Promise<void>;
 - 参数:
   - `locate: string` - 用自然语言描述的元素定位。
   - `options?: Object` - 可选,一个配置对象,包含:
-    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。
+    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。默认值为 False
+    - `xpath?: string` - 目标元素的 xpath 路径,用于执行当前操作。如果提供了这个 xpath,Midscene 会优先使用该 xpath 来找到元素,然后依次使用缓存和 AI 模型。默认值为空
     - `cacheable?: boolean` - 当启用 [缓存功能](./caching.mdx) 时,是否允许缓存当前 API 调用结果。默认值为 True
 - 返回值:

@@ -160,7 +162,8 @@ function aiInput(text: string, locate: string, options?: Object): Promise<void>;
   - `text: string` - 要输入的文本内容。使用空字符串可以清空输入框。
   - `locate: string` - 用自然语言描述的元素定位。
   - `options?: Object` - 可选,一个配置对象,包含:
-    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。
+    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。默认值为 False
+    - `xpath?: string` - 目标元素的 xpath 路径,用于执行当前操作。如果提供了这个 xpath,Midscene 会优先使用该 xpath 来找到元素,然后依次使用缓存和 AI 模型。默认值为空
     - `cacheable?: boolean` - 当启用 [缓存功能](./caching.mdx) 时,是否允许缓存当前 API 调用结果。默认值为 True
     - `autoDismissKeyboard?: boolean` - 如果为 true,则键盘会在输入文本后自动关闭,仅在 Android 中有效。默认值为 true。

@@ -193,7 +196,8 @@ function aiKeyboardPress(
   - `key: string` - 要按下的键,如 `Enter``Tab``Escape` 等。不支持组合键。
   - `locate?: string` - 用自然语言描述的元素定位。
   - `options?: Object` - 可选,一个配置对象,包含:
-    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。
+    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。默认值为 False
+    - `xpath?: string` - 目标元素的 xpath 路径,用于执行当前操作。如果提供了这个 xpath,Midscene 会优先使用该 xpath 来找到元素,然后依次使用缓存和 AI 模型。默认值为空
     - `cacheable?: boolean` - 当启用 [缓存功能](./caching.mdx) 时,是否允许缓存当前 API 调用结果。默认值为 True

 - 返回值:
@@ -228,7 +232,8 @@ function aiScroll(
     - `distance: number` - 滚动距离,单位为像素。
   - `locate?: string` - 用自然语言描述的元素定位。如果未传入,Midscene 会在当前鼠标位置滚动。
   - `options?: Object` - 可选,一个配置对象,包含:
-    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。
+    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。默认值为 False
+    - `xpath?: string` - 目标元素的 xpath 路径,用于执行当前操作。如果提供了这个 xpath,Midscene 会优先使用该 xpath 来找到元素,然后依次使用缓存和 AI 模型。默认值为空
     - `cacheable?: boolean` - 当启用 [缓存功能](./caching.mdx) 时,是否允许缓存当前 API 调用结果。默认值为 True

 - 返回值:
@@ -258,7 +263,8 @@ function aiRightClick(locate: string, options?: Object): Promise<void>;

   - `locate: string` - 用自然语言描述的元素定位。
   - `options?: Object` - 可选,一个配置对象,包含:
-    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。
+    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。默认值为 False
+    - `xpath?: string` - 目标元素的 xpath 路径,用于执行当前操作。如果提供了这个 xpath,Midscene 会优先使用该 xpath 来找到元素,然后依次使用缓存和 AI 模型。默认值为空
     - `cacheable?: boolean` - 当启用 [缓存功能](./caching.mdx) 时,是否允许缓存当前 API 调用结果。默认值为 True

 - 返回值:
@@ -529,7 +535,8 @@ function aiLocate(

   - `locate: string` - 用自然语言描述的元素定位。
   - `options?: Object` - 可选,一个配置对象,包含:
-    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。
+    - `deepThink?: boolean` - 是否开启深度思考。如果为 true,Midscene 会调用 AI 模型两次以精确定位元素。默认值为 False
+    - `xpath?: string` - 目标元素的 xpath 路径,用于执行当前操作。如果提供了这个 xpath,Midscene 会优先使用该 xpath 来找到元素,然后依次使用缓存和 AI 模型。默认值为空
     - `cacheable?: boolean` - 当启用 [缓存功能](./caching.mdx) 时,是否允许缓存当前 API 调用结果。默认值为 True

 - 返回值:
