Commit bc9997b

Support LLM Functions
1 parent 26a1965 commit bc9997b

File tree

18 files changed

+1529
-0
lines changed

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
1+
---
2+
{
3+
"title": "LLM_CLASSIFY",
4+
"language": "en"
5+
}
6+
---
7+
8+
<!--
9+
Licensed to the Apache Software Foundation (ASF) under one
10+
or more contributor license agreements. See the NOTICE file
11+
distributed with this work for additional information
12+
regarding copyright ownership. The ASF licenses this file
13+
to you under the Apache License, Version 2.0 (the
14+
"License"); you may not use this file except in compliance
15+
with the License. You may obtain a copy of the License at
16+
17+
http://www.apache.org/licenses/LICENSE-2.0
18+
19+
Unless required by applicable law or agreed to in writing,
20+
software distributed under the License is distributed on an
21+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
22+
KIND, either express or implied. See the License for the
23+
specific language governing permissions and limitations
24+
under the License.
25+
-->
26+
27+
## Description
28+
29+
Classifies text into one label chosen from a specified set of labels.
30+
31+
## Syntax
32+
33+
```sql
34+
LLM_CLASSIFY([<resource_name>], <text>, <labels>)
35+
```
36+
37+
## Parameters
38+
39+
| Parameter | Description |
40+
| ----------------- | ------------------------------------------- |
41+
| `<resource_name>` | The specified resource name, optional |
42+
| `<text>` | The text to be classified |
43+
| `<labels>` | Array of classification labels |
44+
45+
## Return Value
46+
47+
Returns the single label that best matches the text.
48+
49+
If any input is NULL, returns NULL.
50+
51+
The result is generated by a large language model, so the output may vary.
52+
53+
## Examples
54+
55+
```sql
56+
SET default_llm_resource = 'resource_name';
57+
SELECT LLM_CLASSIFY('Apache Doris is a database system.', ['usage', 'introduce']) AS Result;
58+
```
59+
```text
60+
+-----------+
61+
| Result |
62+
+-----------+
63+
| introduce |
64+
+-----------+
65+
```
66+
67+
```sql
68+
SELECT LLM_CLASSIFY('resource_name', 'Apache Doris is developing rapidly.', ['science', 'sport']) AS Result;
69+
```
70+
```text
71+
+---------+
72+
| Result |
73+
+---------+
74+
| science |
75+
+---------+
76+
```
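
The examples above classify string literals. As a minimal sketch, the query below applies the function to a column and counts rows per label. The table `support_tickets` and its columns are hypothetical, and the sketch assumes that `<text>` accepts a column reference and that `resource_name` is an existing LLM resource.

```sql
-- Hypothetical table: support_tickets(id BIGINT, body STRING).
SET default_llm_resource = 'resource_name';

SELECT category, COUNT(*) AS ticket_count
FROM (
    SELECT LLM_CLASSIFY(body, ['billing', 'bug', 'feature_request']) AS category
    FROM support_tickets
    WHERE body IS NOT NULL   -- a NULL input would yield a NULL label
    LIMIT 100                -- cap the number of rows sent to the model while testing
) AS labeled
GROUP BY category
ORDER BY ticket_count DESC;
```

Because the function calls an external model, capping the row count keeps a trial run cheap, and the counts may differ between runs.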
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
1+
---
2+
{
3+
"title": "LLM_EXTRACT",
4+
"language": "en"
5+
}
6+
---
7+
8+
<!--
9+
Licensed to the Apache Software Foundation (ASF) under one
10+
or more contributor license agreements. See the NOTICE file
11+
distributed with this work for additional information
12+
regarding copyright ownership. The ASF licenses this file
13+
to you under the Apache License, Version 2.0 (the
14+
"License"); you may not use this file except in compliance
15+
with the License. You may obtain a copy of the License at
16+
17+
http://www.apache.org/licenses/LICENSE-2.0
18+
19+
Unless required by applicable law or agreed to in writing,
20+
software distributed under the License is distributed on an
21+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
22+
KIND, either express or implied. See the License for the
23+
specific language governing permissions and limitations
24+
under the License.
25+
-->
26+
27+
## Description
28+
29+
Used to extract information corresponding to specific labels from text.
30+
31+
## Syntax
32+
33+
```sql
34+
LLM_EXTRACT([<resource_name>], <text>, <labels>)
35+
```
36+
37+
## Parameters
38+
39+
| Parameter | Description |
40+
| ----------------- | -------------------------------------------- |
41+
| `<resource_name>` | The specified resource name, optional |
42+
| `<text>` | The text from which to extract information |
43+
| `<labels>` | Array of labels to extract |
44+
45+
## Return Value
46+
47+
Returns a string containing all extracted labels and their corresponding values.
48+
49+
If any input is NULL, returns NULL.
50+
51+
The result is generated by a large language model, so the output may vary.
52+
53+
## Examples
54+
55+
```sql
56+
SET default_llm_resource = 'resource_name';
57+
SELECT LLM_EXTRACT('Apache Doris is an MPP-based real-time data warehouse known for its high query speed.',
58+
['product_name', 'architecture', 'key_feature']) AS Result;
59+
```
60+
```text
61+
+---------------------------------------------------------------------------------------+
62+
| Result |
63+
+---------------------------------------------------------------------------------------+
64+
| product_name="Apache Doris", architecture="MPP-based", key_feature="high query speed" |
65+
+---------------------------------------------------------------------------------------+
66+
```
67+
68+
```sql
69+
SELECT LLM_EXTRACT('resource_name', 'Apache Doris began in 2008 as an internal project named Palo.',
70+
['original name', 'founding time']) AS Result;
71+
```
72+
```text
73+
+----------------------------------------+
74+
| Result |
75+
+----------------------------------------+
76+
| original name=Palo, founding time=2008 |
77+
+----------------------------------------+
78+
```
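
The same call can be sketched against a table column. The table `product_reviews` and its columns are hypothetical, and the sketch assumes that `<text>` accepts a column reference and that `default_llm_resource` is already set.

```sql
-- Hypothetical table: product_reviews(review_id BIGINT, review_text STRING).
-- Assumes default_llm_resource already points to a valid LLM resource.
SELECT
    review_id,
    LLM_EXTRACT(review_text, ['product_name', 'key_feature']) AS extracted
FROM product_reviews
WHERE review_text IS NOT NULL   -- NULL input returns NULL, so skip those rows
LIMIT 10;
```

As the two examples above show, the format of the returned string (for instance, whether values are quoted) can vary between calls, so downstream parsing should not rely on one exact layout.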
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
1+
---
2+
{
3+
"title": "LLM_FIXGRAMMAR",
4+
"language": "en"
5+
}
6+
---
7+
8+
<!--
9+
Licensed to the Apache Software Foundation (ASF) under one
10+
or more contributor license agreements. See the NOTICE file
11+
distributed with this work for additional information
12+
regarding copyright ownership. The ASF licenses this file
13+
to you under the Apache License, Version 2.0 (the
14+
"License"); you may not use this file except in compliance
15+
with the License. You may obtain a copy of the License at
16+
17+
http://www.apache.org/licenses/LICENSE-2.0
18+
19+
Unless required by applicable law or agreed to in writing,
20+
software distributed under the License is distributed on an
21+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
22+
KIND, either express or implied. See the License for the
23+
specific language governing permissions and limitations
24+
under the License.
25+
-->
26+
27+
## Description
28+
29+
Used to correct grammatical errors in text.
30+
31+
## Syntax
32+
33+
```sql
34+
LLM_FIXGRAMMAR([<resource_name>], <text>)
35+
```
36+
37+
## Parameters
38+
39+
| Parameter | Description |
40+
| ----------------- | ------------------------------------------- |
41+
| `<resource_name>` | The specified resource name, optional |
42+
| `<text>` | The text to be grammar-corrected |
43+
44+
## Return Value
45+
46+
Returns the text string after grammar correction.
47+
48+
If any input is NULL, returns NULL.
49+
50+
The result is generated by a large language model, so the output may vary.
51+
52+
## Examples
53+
54+
```sql
55+
SET default_llm_resource = 'resource_name';
56+
SELECT LLM_FIXGRAMMAR('Apache Doris a great system DB') AS Result;
57+
```
58+
```text
59+
+------------------------------------------+
60+
| Result |
61+
+------------------------------------------+
62+
| Apache Doris is a great database system. |
63+
+------------------------------------------+
64+
```
65+
66+
```sql
67+
SELECT LLM_FIXGRAMMAR('resource_name', 'I am like to using Doris') AS Result;
68+
```
69+
```text
70+
+--------------------+
71+
| Result |
72+
+--------------------+
73+
| I like using Doris |
74+
+--------------------+
75+
```
Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
1+
---
2+
{
3+
"title": "LLM_Function",
4+
"language": "en"
5+
}
6+
---
7+
8+
<!--
9+
Licensed to the Apache Software Foundation (ASF) under one
10+
or more contributor license agreements. See the NOTICE file
11+
distributed with this work for additional information
12+
regarding copyright ownership. The ASF licenses this file
13+
to you under the Apache License, Version 2.0 (the
14+
"License"); you may not use this file except in compliance
15+
with the License. You may obtain a copy of the License at
16+
17+
http://www.apache.org/licenses/LICENSE-2.0
18+
19+
Unless required by applicable law or agreed to in writing,
20+
software distributed under the License is distributed on an
21+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
22+
KIND, either express or implied. See the License for the
23+
specific language governing permissions and limitations
24+
under the License.
25+
-->
26+
27+
## Description
28+
29+
LLM Functions are built-in Doris functions backed by large language models (LLMs). Users can call an LLM directly in SQL queries to perform a range of text tasks. The functions connect to mainstream LLM providers (such as OpenAI, Anthropic, DeepSeek, Gemini, Ollama, and MoonShot) through Doris's resource mechanism.
30+
31+
The LLM must be provided by a service external to Doris and must support text analysis.
32+
33+
---
34+
35+
## Configure LLM Resource
36+
37+
Before using LLM Functions, create a Resource of type `llm` to centrally manage the access information for the LLM API.
38+
39+
### Example: Create LLM Resource
40+
41+
```sql
42+
CREATE RESOURCE "llm_resource_name"
43+
PROPERTIES (
44+
'type' = 'llm',
45+
'llm.provider_type' = 'openai',
46+
'llm.endpoint' = 'https://endpoint_example',
47+
'llm.model_name' = 'model_example',
48+
'llm.api_key' = 'sk-xxx',
49+
'llm.temperature' = '0.7',
50+
'llm.max_token' = '1024',
51+
'llm.max_retries' = '3',
52+
'llm.retry_delay_ms' = '1000',
53+
'llm.timeout_ms' = '30000'
54+
);
55+
```
56+
57+
##### Parameter Description
58+
59+
`type`: Required. Must be `llm`, which identifies the resource as an LLM resource.
60+
61+
`llm.provider_type`: Required. The external LLM provider type. Currently supported providers are OpenAI, Anthropic, Gemini, DeepSeek, Local, MoonShot, MiniMax, Zhipu, QWen, and Baichuan. For a provider that is not listed but whose API format matches [OpenAI](https://platform.openai.com/docs/overview)/[Anthropic](https://docs.anthropic.com/en/api/messages-examples)/[Gemini](https://ai.google.dev/gemini-api/docs/quickstart#rest_1), specify whichever of those three providers it is compatible with.
62+
63+
`llm.endpoint`: Required, LLM API endpoint.
64+
65+
`llm.model_name`: Required, model name.
66+
67+
`llm.api_key`: Required unless `llm.provider_type = local`. The API key.
68+
69+
`llm.temperature`: Optional. Sampling temperature that controls output randomness; a float between 0 and 1.
70+
71+
`llm.max_tokens`: Optional, maximum number of generated tokens.
72+
73+
`llm.max_retries`: Optional, maximum number of retries for a single request.
74+
75+
`llm.retry_delay_ms`: Optional, delay between retries, in milliseconds.
76+
77+
`llm.timeout_ms`: Optional, timeout for a single request, in milliseconds.
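
As a minimal sketch of the API key rule above, a resource for a locally hosted model can omit the key. The resource name, endpoint, and model name below are placeholders rather than values taken from a real deployment.

```sql
-- Sketch only: resource name, endpoint, and model name are placeholders.
-- The API key property is left out because the provider type is local.
CREATE RESOURCE "local_llm_resource"
PROPERTIES (
    'type' = 'llm',
    'llm.provider_type' = 'local',
    'llm.endpoint' = 'http://endpoint_example',
    'llm.model_name' = 'model_example'
);
```

All optional properties are left unset here, so their default values would apply.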
78+
79+
---
80+
81+
## Resource Selection and Session Variables
82+
83+
When users call LLM-related functions, resources can be specified in the following two ways:
84+
85+
- Explicitly specify the resource: directly pass the resource name when calling the function.
86+
- Implicitly specify the resource: set the Session variable in advance, and the function will automatically use the corresponding resource.
87+
88+
Set Session variable format:
89+
```sql
90+
SET default_llm_resource='resource_name';
91+
```
92+
93+
Function call format:
94+
```sql
95+
SELECT LLM_FUNCTION([<resource_name>], <args...>);
96+
```
97+
98+
### Resource Selection Priority
99+
100+
When an LLM function is called, Doris selects the resource to use in the following order:
101+
102+
1. The resource explicitly specified by the user in the call
103+
2. The default resource set through the `default_llm_resource` session variable
104+
105+
Example:
106+
107+
```sql
108+
SET default_llm_resource='global_default_resource';
109+
SELECT LLM_SENTIMENT('this is a test'); -- Uses resource named 'global_default_resource'
110+
SELECT LLM_SENTIMENT('invoke_resource', 'this is a test'); -- Uses resource named 'invoke_resource'
111+
```
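
To see the priority rule in a running session, the sketch below first inspects the current default and then changes it. `SHOW VARIABLES LIKE` is the standard way to read a session variable; the resource names are placeholders.

```sql
-- Inspect the default that implicit calls fall back to.
SHOW VARIABLES LIKE 'default_llm_resource';

-- Switch the default for the rest of the session ('another_resource' is a placeholder).
SET default_llm_resource = 'another_resource';

SELECT LLM_SENTIMENT('this is a test');                     -- would use 'another_resource'
SELECT LLM_SENTIMENT('invoke_resource', 'this is a test');  -- explicit name still takes priority
```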
112+
113+
---
114+
115+
## LLM Functions
116+
117+
Currently supported LLM Functions in Doris include:
118+
119+
- `LLM_CLASSIFY`: Information classification
120+
121+
- `LLM_EXTRACT`: Information extraction
122+
123+
- `LLM_FIXGRAMMAR`: Grammar correction
124+
125+
- `LLM_GENERATE`: Text generation
126+
127+
- `LLM_MASK`: Masking sensitive information
128+
129+
- `LLM_SENTIMENT`: Sentiment analysis
130+
131+
- `LLM_SUMMARIZE`: Text summarization
132+
133+
- `LLM_TRANSLATE`: Translation
134+
135+
### Examples
136+
137+
1. `LLM_TRANSLATE`
138+
```sql
139+
SELECT LLM_TRANSLATE('resource_name', 'this is a test', 'Chinese');
140+
-- 这是一个测试
141+
```
142+
143+
2. `LLM_SENTIMENT`
144+
```sql
145+
SET default_llm_resource = 'resource_name';
146+
SELECT LLM_SENTIMENT('Apache Doris is a great DBMS.');
147+
```
148+
149+
For detailed descriptions and usage, refer to the documentation of each function.
