feat(web-integration): support aiAsk for agent #841

Open: wants to merge 2 commits into base: main
32 changes: 31 additions & 1 deletion apps/site/docs/en/API.mdx
@@ -292,9 +292,39 @@ The `deepThink` feature is a powerful feature that allows Midscene to call AI mo

## Data Extraction

### `agent.aiAsk()`

Ask the AI model any question about the current page. It returns the model's answer as a string.

- Type

```typescript
function aiAsk(prompt: string, options?: Object): Promise<string>;
```

- Parameters:

- `prompt: string` - A natural language description of the question.
- `options?: Object` - Optional, a configuration object containing:
- `domIncluded?: boolean | 'visible-only'` - Whether to send simplified DOM information to the model, usually used for extracting invisible attributes like image links. If set to `'visible-only'`, only the visible elements will be sent. Default: `false`.
- `screenshotIncluded?: boolean` - Whether to send a screenshot to the model. Default: `true`.

- Return Value:

- Returns a Promise that resolves to the answer from the AI model as a string.

- Examples:

```typescript
const result = await agent.aiAsk('What should I do to test this page?');
console.log(result); // Output the answer from the AI model
```
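The two options described above can be sketched as follows. This is a minimal illustration and not Midscene's implementation: the `AskOptions` and `AskCapable` interfaces and the `resolveAskOptions` helper are hypothetical names introduced here, and the defaults simply mirror the documented ones (`domIncluded: false`, `screenshotIncluded: true`).

```typescript
// Hypothetical option shape, copied from the parameters documented above.
interface AskOptions {
  domIncluded?: boolean | 'visible-only';
  screenshotIncluded?: boolean;
}

// Hypothetical minimal view of an agent: only the documented aiAsk signature.
interface AskCapable {
  aiAsk(prompt: string, options?: AskOptions): Promise<string>;
}

// Illustrates the documented defaults; an assumption, not the library's code.
function resolveAskOptions(options: AskOptions = {}): Required<AskOptions> {
  return {
    domIncluded: options.domIncluded ?? false,
    screenshotIncluded: options.screenshotIncluded ?? true,
  };
}

// Image links are usually not visible in a screenshot, so the simplified DOM
// of the visible elements is sent along with the question.
async function listImageLinks(agent: AskCapable): Promise<string> {
  return agent.aiAsk('List the URLs of the images on this page', {
    domIncluded: 'visible-only',
  });
}
```

Any object that exposes the documented `aiAsk` signature can be passed to `listImageLinks`, which keeps the sketch independent of a particular page driver.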

Besides `aiAsk`, you can also use `aiQuery` to extract structured data from the UI.

### `agent.aiQuery()`

This method allows you to extract data directly from the UI using multimodal AI reasoning capabilities. Simply define the expected format (e.g., string, number, JSON, or an array) in the `dataDemand`, and Midscene will return a result that matches the format.
This method allows you to extract structured data from the current page. Simply define the expected format (e.g., string, number, JSON, or an array) in the `dataDemand`, and Midscene will return a result that matches the format.

- Type

10 changes: 5 additions & 5 deletions apps/site/docs/en/integrate-with-playwright.mdx
@@ -61,10 +61,11 @@ export const test = base.extend<PlayWrightAiFixtureType>(

### Query

- `aiQuery` - AI Query
- `aiNumber` - Number Query
- `aiString` - String Query
- `aiBoolean` - Boolean Query
- `aiAsk` - Ask the AI model anything about the current page
- `aiQuery` - Extract structured data from the current page
- `aiNumber` - Extract a number from the current page
- `aiString` - Extract a string from the current page
- `aiBoolean` - Extract a boolean from the current page
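As a rough sketch of how the query fixtures listed above might be combined in a test body, the example below uses a simplified, hypothetical `QueryFixtures` type with only a prompt argument, plus stub fixtures so that it runs without Playwright or a model. In a real project the fixtures would come from the extended `test` object instead.

```typescript
// Simplified, hypothetical subset of the fixture type (prompt argument only).
type QueryFixtures = {
  aiAsk: (prompt: string) => Promise<string>;
  aiBoolean: (prompt: string) => Promise<boolean>;
};

// Test body written as a plain function so it can run against real or stub fixtures.
async function checkLoginPage({ aiAsk, aiBoolean }: QueryFixtures): Promise<string> {
  const hasForm = await aiBoolean('Is there a login form on the page?');
  if (!hasForm) {
    throw new Error('expected a login form');
  }
  // aiAsk returns free-form text, so it suits open questions rather than assertions.
  return aiAsk('What should I do to test this page?');
}

// Stub fixtures that record the prompts they receive.
const recordedPrompts: string[] = [];
const stubFixtures: QueryFixtures = {
  aiAsk: async (prompt) => {
    recordedPrompts.push(prompt);
    return 'stub answer';
  },
  aiBoolean: async (prompt) => {
    recordedPrompts.push(prompt);
    return true;
  },
};

void checkLoginPage(stubFixtures);
```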

### More APIs

@@ -152,4 +153,3 @@ After the command executes successfully, it will output: `Midscene - report file

- For all the methods on the Agent, please refer to [API Reference](./API).
- For more details about prompting, please refer to [Prompting Tips](./prompting-tips)
````
32 changes: 31 additions & 1 deletion apps/site/docs/zh/API.mdx
@@ -282,9 +282,39 @@ await agent.aiRightClick('the file name at the top of the page', { deepThink: true });

## Data Extraction

### `agent.aiAsk()`

Use this method to ask the AI model a question about the current page and receive the answer as a string.

- Type

```typescript
function aiAsk(prompt: string, options?: Object): Promise<string>;
```

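- Parameters:

- `prompt: string` - The question, described in natural language.
- `options?: Object` - Optional, a configuration object containing:
- `domIncluded?: boolean | 'visible-only'` - Whether to send simplified DOM information to the model, typically used to extract attributes that are not visible in the UI, such as image links. If set to `'visible-only'`, only the visible elements are sent. Default: `false`.
- `screenshotIncluded?: boolean` - Whether to send a screenshot to the model. Default: `true`.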

- Return Value:

- Returns a Promise that resolves to the answer from the AI model.

- Examples:

```typescript
const result = await agent.aiAsk('How should the current page be tested?');
console.log(result); // Output the answer from the AI model
```

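Besides `aiAsk`, you can also use `aiQuery` to extract structured data directly from the UI.

### `agent.aiQuery()`

Use this method to extract data directly from the UI, with intelligent extraction powered by the reasoning capabilities of multimodal AI. Simply describe the expected data format (e.g., string, number, JSON, or an array) in `dataDemand`, and Midscene will return the matching result.
Use this method to extract structured data directly from the UI. Simply describe the expected data format (e.g., string, number, JSON, or an array) in `dataDemand`, and Midscene will return the matching result.

- Type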

9 changes: 5 additions & 4 deletions apps/site/docs/zh/integrate-with-playwright.mdx
@@ -61,10 +61,11 @@ export const test = base.extend<PlayWrightAiFixtureType>(

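### Query

- `aiQuery` - Data query
- `aiNumber` - Number query
- `aiString` - String query
- `aiBoolean` - Boolean query
- `aiAsk` - Ask the AI model any question
- `aiQuery` - Extract structured data from the current page
- `aiNumber` - Extract a number from the current page
- `aiString` - Extract a string from the current page
- `aiBoolean` - Extract a boolean from the current page

### More APIs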

7 changes: 6 additions & 1 deletion packages/core/src/yaml.ts
@@ -105,11 +105,16 @@ export interface MidsceneYamlFlowItemAINumber extends InsightExtractOption {
name?: string;
}

export interface MidsceneYamlFlowItemAINString extends InsightExtractOption {
export interface MidsceneYamlFlowItemAIString extends InsightExtractOption {
aiString: string;
name?: string;
}

export interface MidsceneYamlFlowItemAIAsk extends InsightExtractOption {
aiAsk: string;
name?: string;
}

export interface MidsceneYamlFlowItemAIBoolean extends InsightExtractOption {
aiBoolean: string;
name?: string;
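To make the new `MidsceneYamlFlowItemAIAsk` shape concrete, here is a small sketch of a flow item that satisfies it. The `InsightExtractOption` fields are assumed from the documented `domIncluded` and `screenshotIncluded` options and may not match the real type exactly. The YAML in the comment is an illustrative guess at the equivalent script entry.

```typescript
// Assumed shape of InsightExtractOption, based on the options documented for aiAsk.
interface InsightExtractOption {
  domIncluded?: boolean | 'visible-only';
  screenshotIncluded?: boolean;
}

// Same fields as the interface added in this hunk.
interface MidsceneYamlFlowItemAIAsk extends InsightExtractOption {
  aiAsk: string;
  name?: string;
}

// Illustrative YAML equivalent (assumption):
//   - aiAsk: What should I do to test this page?
//     name: testPlan
const askItem: MidsceneYamlFlowItemAIAsk = {
  aiAsk: 'What should I do to test this page?',
  name: 'testPlan',
};

// The presence of the aiAsk key is what distinguishes this item from other flow items.
const isAskFlowItem = 'aiAsk' in askItem;
```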
7 changes: 7 additions & 0 deletions packages/web-integration/src/common/agent.ts
@@ -472,6 +472,13 @@ export class PageAgent<PageType extends WebPage = WebPage> {
return output;
}

async aiAsk(
prompt: string,
opt: InsightExtractOption = defaultInsightExtractOption,
) {
return this.aiString(prompt, opt);
}

async describeElementAtPoint(
center: [number, number],
opt?: {
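The hunk above shows `aiAsk` returning `this.aiString(prompt, opt)`, so it behaves as an alias for `aiString`. The delegation can be sketched with a stand-in class. `SketchAgent` is hypothetical and records calls instead of contacting a model.

```typescript
// Hypothetical stand-in for PageAgent that only records which method ran.
class SketchAgent {
  readonly calls: string[] = [];

  async aiString(prompt: string): Promise<string> {
    this.calls.push(`aiString:${prompt}`);
    return 'stub answer';
  }

  // Mirrors the delegation in the diff: aiAsk forwards its prompt to aiString.
  async aiAsk(prompt: string): Promise<string> {
    return this.aiString(prompt);
  }
}

const sketchAgent = new SketchAgent();
void sketchAgent.aiAsk('What should I do to test this page?');
```

One consequence of this design is that `aiAsk` and `aiString` share the same options and return type, and differ mainly in how they are named and documented.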
18 changes: 17 additions & 1 deletion packages/web-integration/src/playwright/ai-fixture.ts
@@ -88,7 +88,8 @@ export const PlaywrightAiFixture = (options?: {
| 'aiLocate'
| 'aiNumber'
| 'aiString'
| 'aiBoolean';
| 'aiBoolean'
| 'aiAsk';
}) {
const { page, testInfo, use, aiActionType } = options;
const agent = createOrReuseAgentForPage(page, testInfo) as PlaywrightAgent;
@@ -339,6 +340,18 @@ export const PlaywrightAiFixture = (options?: {
aiActionType: 'aiBoolean',
});
},
aiAsk: async (
{ page }: { page: OriginPlaywrightPage },
use: any,
testInfo: TestInfo,
) => {
await generateAiFunction({
page,
testInfo,
use,
aiActionType: 'aiAsk',
});
},
};
};

@@ -383,4 +396,7 @@ export type PlayWrightAiFixtureType = {
aiBoolean: (
...args: Parameters<PageAgent['aiBoolean']>
) => ReturnType<PageAgent['aiBoolean']>;
aiAsk: (
...args: Parameters<PageAgent['aiAsk']>
) => ReturnType<PageAgent['aiAsk']>;
};
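The fixture type above reuses the agent's own signature through the built-in `Parameters` and `ReturnType` utility types. The sketch below shows the same pattern on a hypothetical `DemoAgent`, so the fixture type follows the method automatically if its parameters change.

```typescript
// Hypothetical agent with an aiAsk-like method.
class DemoAgent {
  async aiAsk(prompt: string, opt: { domIncluded?: boolean } = {}): Promise<string> {
    return `${prompt} (dom: ${opt.domIncluded ?? false})`;
  }
}

// Same pattern as PlayWrightAiFixtureType: derive argument and return types from the method.
type DemoFixture = {
  aiAsk: (...args: Parameters<DemoAgent['aiAsk']>) => ReturnType<DemoAgent['aiAsk']>;
};

const demoAgent = new DemoAgent();
const demoFixture: DemoFixture = {
  // The wrapper forwards every argument unchanged to the agent.
  aiAsk: (...args) => demoAgent.aiAsk(...args),
};

const demoPending = demoFixture.aiAsk('What is on this page?', { domIncluded: true });
```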
20 changes: 14 additions & 6 deletions packages/web-integration/src/yaml/player.ts
@@ -6,17 +6,18 @@ import type { PageAgent } from '@/common/agent';
import type {
FreeFn,
MidsceneYamlFlowItemAIAction,
MidsceneYamlFlowItemAIAsk,
MidsceneYamlFlowItemAIAssert,
MidsceneYamlFlowItemAIBoolean,
MidsceneYamlFlowItemAIHover,
MidsceneYamlFlowItemAIInput,
MidsceneYamlFlowItemAIKeyboardPress,
MidsceneYamlFlowItemAILocate,
MidsceneYamlFlowItemAINString,
MidsceneYamlFlowItemAINumber,
MidsceneYamlFlowItemAIQuery,
MidsceneYamlFlowItemAIRightClick,
MidsceneYamlFlowItemAIScroll,
MidsceneYamlFlowItemAIString,
MidsceneYamlFlowItemAITap,
MidsceneYamlFlowItemAIWaitFor,
MidsceneYamlFlowItemEvaluateJavaScript,
@@ -206,21 +207,21 @@ export class ScriptPlayer<T extends MidsceneYamlScriptEnv> {
domIncluded: numberTask.domIncluded,
screenshotIncluded: numberTask.screenshotIncluded,
};
assert(prompt, 'missing prompt for number');
assert(prompt, 'missing prompt for aiNumber');
assert(
typeof prompt === 'string',
'prompt for number must be a string',
);
const numberResult = await agent.aiNumber(prompt, options);
this.setResult(numberTask.name, numberResult);
} else if ('aiString' in (flowItem as MidsceneYamlFlowItemAINString)) {
const stringTask = flowItem as MidsceneYamlFlowItemAINString;
} else if ('aiString' in (flowItem as MidsceneYamlFlowItemAIString)) {
const stringTask = flowItem as MidsceneYamlFlowItemAIString;
const prompt = stringTask.aiString;
const options = {
domIncluded: stringTask.domIncluded,
screenshotIncluded: stringTask.screenshotIncluded,
};
assert(prompt, 'missing prompt for string');
assert(prompt, 'missing prompt for aiString');
assert(
typeof prompt === 'string',
'prompt for string must be a string',
@@ -234,13 +235,20 @@ export class ScriptPlayer<T extends MidsceneYamlScriptEnv> {
domIncluded: booleanTask.domIncluded,
screenshotIncluded: booleanTask.screenshotIncluded,
};
assert(prompt, 'missing prompt for boolean');
assert(prompt, 'missing prompt for aiBoolean');
assert(
typeof prompt === 'string',
'prompt for boolean must be a string',
);
const booleanResult = await agent.aiBoolean(prompt, options);
this.setResult(booleanTask.name, booleanResult);
} else if ('aiAsk' in (flowItem as MidsceneYamlFlowItemAIAsk)) {
const askTask = flowItem as MidsceneYamlFlowItemAIAsk;
const prompt = askTask.aiAsk;
assert(prompt, 'missing prompt for aiAsk');
assert(typeof prompt === 'string', 'prompt for aiAsk must be a string');
const askResult = await agent.aiAsk(prompt);
this.setResult(askTask.name, askResult);
} else if ('aiLocate' in (flowItem as MidsceneYamlFlowItemAILocate)) {
const locateTask = flowItem as MidsceneYamlFlowItemAILocate;
const prompt = locateTask.aiLocate;
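In the hunk above, the `aiNumber`, `aiString`, and `aiBoolean` branches build an `options` object from `domIncluded` and `screenshotIncluded`, while the `aiAsk` branch calls `agent.aiAsk(prompt)` without options. The sketch below shows how forwarding could look for `aiAsk`. It is a possible variant under stated assumptions and not part of this PR. `ExtractOptions`, `AskFlowItem`, `isAskItem`, and `toAskCall` are hypothetical names.

```typescript
// Assumed option shape, based on the fields the other branches forward.
type ExtractOptions = {
  domIncluded?: boolean | 'visible-only';
  screenshotIncluded?: boolean;
};

type AskFlowItem = ExtractOptions & { aiAsk: string; name?: string };

// Detects an aiAsk flow item by key, like the `'aiAsk' in flowItem` check in the player.
function isAskItem(item: object): item is AskFlowItem {
  return 'aiAsk' in item && typeof (item as AskFlowItem).aiAsk === 'string';
}

// Validates the prompt and collects the options to forward to the agent.
function toAskCall(item: AskFlowItem): { prompt: string; options: ExtractOptions } {
  if (!item.aiAsk) {
    throw new Error('missing prompt for aiAsk');
  }
  return {
    prompt: item.aiAsk,
    options: {
      domIncluded: item.domIncluded,
      screenshotIncluded: item.screenshotIncluded,
    },
  };
}
```

With a helper like this, the player branch could call `agent.aiAsk(call.prompt, call.options)`, matching how the other extraction branches pass their options.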