Commit d8eaf42

Merge branch 'main' into query-cache

2 parents: b598698 + 52f9163
83 files changed (+5789, -1591 lines)

README.md (+41, -26)

@@ -23,25 +23,57 @@
   </a>
 </p>
 
-> Wren Engine is the semantic engine for LLMs, the backbone of the [Wren AI](https://github.com/Canner/WrenAI) project.
+> Wren Engine is the Semantic Engine for MCP Clients and AI Agents.
+> [Wren AI](https://github.com/Canner/WrenAI) GenBI AI Agent is based on Wren Engine.
 
-<img src="./misc/wren_engine_flow.png">
+## 😫 Challenge Today
 
-Useful links
-- [Wren AI Website](https://getwren.ai)
-- [Wren Engine Documentation](https://docs.getwren.ai/oss/engine/get_started/what_is)
+At the enterprise level, the stakes - and the complexity - are much higher. Businesses run on structured data stored in cloud warehouses, relational databases, and secure filesystems. From BI dashboards to CRM updates and compliance workflows, AI must not only execute commands but also **understand and retrieve the right data, with precision and in context**.
+
+While many community and official MCP servers already support connections to major databases like PostgreSQL, MySQL, SQL Server, and more, there's a problem: **raw access to data isn't enough**.
+
+Enterprises need:
+- Accurate semantic understanding of their data models
+- Trusted calculations and aggregations in reporting
+- Clarity on business terms, like "active customer," "net revenue," or "churn rate"
+- User-based permissions and access control
+
+Natural language alone isn't enough to drive complex workflows across enterprise data systems. You need a layer that interprets intent, maps it to the correct data, applies calculations accurately, and ensures security.
 
 ## 🎯 Our Mission
 
-The Wren engine aims to be compatible with composable data systems. It follows two important traits: Embeddable and interoperability. With these two designs in mind, you can reuse the semantic context across your AI agents through our APIs and connect freely with your on-premise and cloud data sources, which nicely fit into your existing data stack.
+Wren Engine is on a mission to power the future of MCP clients and AI agents through the Model Context Protocol (MCP) — a new open standard that connects LLMs with tools, databases, and enterprise systems.
+
+As part of the MCP ecosystem, Wren Engine provides a **semantic engine** powered by a next-generation semantic layer that enables AI agents to access business data with accuracy, context, and governance.
+
+By building the semantic layer directly into MCP clients such as Claude, Cline, and Cursor, Wren Engine empowers AI Agents with precise business context and ensures accurate data interactions across diverse enterprise environments.
+
+We believe the future of enterprise AI lies in **context-aware, composable systems**. That's why Wren Engine is designed to be:
+
+- 🔌 **Embeddable** into any MCP client or AI agentic workflow
+- 🔄 **Interoperable** with modern data stacks (PostgreSQL, MySQL, Snowflake, etc.)
+- 🧠 **Semantic-first**, enabling AI to "understand" your data model and business logic
+- 🔐 **Governance-ready**, respecting roles, access controls, and definitions
+
+With Wren Engine, you can scale AI adoption across teams — not just with better automation, but with better understanding.
+
+<img src="./misc/mcp_wren_engine.webp">
+
+Check out our full article:
+
+🤩 [Our Mission - Fueling the Next Wave of AI Agents: Building the Foundation for Future MCP Clients and Enterprise Data Access](https://getwren.ai/post/fueling-the-next-wave-of-ai-agents-building-the-foundation-for-future-mcp-clients-and-enterprise-data-access)
+
+## 🚀 Get Started with MCP
+[MCP Server README](mcp-server/README.md)
+
+https://github.com/user-attachments/assets/dab9b50f-70d7-4eb3-8fc8-2ab55dc7d2ec
+
 
-<img src="./misc/wrenai_vision.png">
 
-🤩 [About our Vision - The new wave of Composable Data Systems and the Interface to LLM agents](https://getwren.ai/post/the-new-wave-of-composable-data-systems-and-the-interface-to-llm-agents)
 
 ## 🤔 Concepts
 
-- [Introducing Wren Engine](https://docs.getwren.ai/oss/engine/get_started/what_is)
+- [Quick start with Wren Engine](https://docs.getwren.ai/oss/engine/get_started/quickstart)
 - [What is semantics?](https://docs.getwren.ai/oss/engine/concept/what_is_semantics)
 - [What is Modeling Definition Language (MDL)?](https://docs.getwren.ai/oss/engine/concept/what_is_mdl)
 - [Benefits of Wren Engine with LLMs](https://docs.getwren.ai/oss/engine/concept/benefits_llm)

@@ -54,20 +86,3 @@ Wren Engine is currently in the beta version. The project team is actively worki
 - Welcome to our [Discord server](https://discord.gg/5DvshJqG8Z) to give us feedback!
 - If there is any issues, please visit [Github Issues](https://github.com/Canner/wren-engine/issues).
 
-## 🚀 Get Started
-
-Check out our latest documentation to get a [Quick start](https://docs.getwren.ai/oss/engine/get_started/quickstart).
-
-## 🙌 How to build?
-
-### Normal Build
-
-```bash
-mvn clean install -DskipTests
-```
-
-### Build an executable jar
-
-```bash
-mvn clean package -DskipTests -P exec-jar
-```
ibis-server/README.md (+25)

@@ -90,3 +90,28 @@ OpenTelemetry zero-code instrumentation is highly configurable. You can set the
 
 ## Contributing
 Please see [CONTRIBUTING.md](docs/CONTRIBUTING.md) for more information.
+
+### Report the Migration Issue
+Wren Engine is migrating to the v3 API (powered by Rust and DataFusion), but some SQL issues remain for now.
+If you find the migration message in your log, please provide the message and related information to the Wren AI team.
+Just raise an issue on GitHub or contact us in the Discord channel.
+
+The message looks like the following log:
+```
+2025-03-19 22:49:08.788 | [62781772-7120-4482-b7ca-4be65c8fda96] | INFO | __init__.dispatch:14 - POST /v3/connector/postgres/query
+2025-03-19 22:49:08.788 | [62781772-7120-4482-b7ca-4be65c8fda96] | INFO | __init__.dispatch:15 - Request params: {}
+2025-03-19 22:49:08.789 | [62781772-7120-4482-b7ca-4be65c8fda96] | INFO | __init__.dispatch:22 - Request body: {"connectionInfo":"REDACTED","manifestStr":"eyJjYXRhbG9nIjoid3JlbiIsInNjaGVtYSI6InB1YmxpYyIsIm1vZGVscyI6W3sibmFtZSI6Im9yZGVycyIsInRhYmxlUmVmZXJlbmNlIjp7InNjaGVtYSI6InB1YmxpYyIsIm5hbWUiOiJvcmRlcnMifSwiY29sdW1ucyI6W3sibmFtZSI6Im9yZGVya2V5IiwidHlwZSI6InZhcmNoYXIiLCJleHByZXNzaW9uIjoiY2FzdChvX29yZGVya2V5IGFzIHZhcmNoYXIpIn1dfV19","sql":"SELECT orderkey FROM orders LIMIT 1"}
+2025-03-19 22:49:08.804 | [62781772-7120-4482-b7ca-4be65c8fda96] | WARN | connector.query:61 - Failed to execute v3 query, fallback to v2: DataFusion error: ModelAnalyzeRule
+caused by
+Schema error: No field named o_orderkey.
+Wren engine is migrating to Rust version now. Wren AI team are appreciate if you can provide the error messages and related logs for us.
+```
+
+#### Steps to Report an Issue
+1. **Identify the Issue**: Look for the migration message in your log files.
+2. **Gather Information**: Collect the error message and any related logs.
+3. **Report the Issue**:
+   - **GitHub**: Open an issue on our [GitHub repository](https://github.com/Canner/wren-engine/issues) and include the collected information.
+   - **Discord**: Join our [Discord channel](https://discord.gg/5DvshJqG8Z) and share the details with us.
+
+Providing detailed information helps us diagnose and fix issues more efficiently. Thank you for your cooperation!
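The warning in the sample log comes from a try-v3-then-fall-back pattern. A minimal sketch of that flow, assuming hypothetical `query_v3`/`query_v2` stand-ins for the engine's internal entry points (not the engine's actual API):

```python
from loguru import logger

# Hypothetical stand-ins for the v3 (Rust/DataFusion) and v2 (Java) paths.
def query_v3(sql: str) -> list[dict]:
    raise RuntimeError("DataFusion error: ModelAnalyzeRule")

def query_v2(sql: str) -> list[dict]:
    return [{"orderkey": "1"}]

def execute_with_fallback(sql: str) -> list[dict]:
    """Try the v3 engine first; fall back to v2 when it fails."""
    try:
        return query_v3(sql)
    except Exception as e:
        # This warning is the migration message worth reporting upstream.
        logger.warning("Failed to execute v3 query, fallback to v2: {}", e)
        return query_v2(sql)

print(execute_with_fallback("SELECT orderkey FROM orders LIMIT 1"))
```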

ibis-server/app/config.py (+2)

@@ -61,6 +61,8 @@ def update(self, diagnose: bool):
     def get_remote_function_list_path(self, data_source: str) -> str:
         if not self.remote_function_list_path:
             return None
+        if data_source in {"local_file", "s3_file", "minio_file", "gcs_file"}:
+            data_source = "duckdb"
         base_path = os.path.normpath(self.remote_function_list_path)
         path = os.path.normpath(os.path.join(base_path, f"{data_source}.csv"))
         if not path.startswith(base_path):
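The two added lines collapse all file-based sources onto a single function list. A runnable sketch of the lookup, assuming `base_dir` is the configured `remote_function_list_path`; the escape-path handling at the end is illustrative, since the diff truncates the original:

```python
import os

FILE_SOURCES = {"local_file", "s3_file", "minio_file", "gcs_file"}

def resolve_function_list(base_dir: str, data_source: str) -> str | None:
    """Resolve the remote-function CSV for a data source."""
    if not base_dir:
        return None
    # File-based sources are all executed by DuckDB, so they share duckdb.csv.
    if data_source in FILE_SOURCES:
        data_source = "duckdb"
    base_path = os.path.normpath(base_dir)
    path = os.path.normpath(os.path.join(base_path, f"{data_source}.csv"))
    if not path.startswith(base_path):
        return None  # illustrative: reject paths escaping the configured directory
    return path

assert resolve_function_list("/etc/wren/functions", "s3_file") \
    == os.path.normpath("/etc/wren/functions/duckdb.csv")
```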

ibis-server/app/mdl/rewriter.py (+1, -1)

@@ -105,7 +105,7 @@ async def rewrite(self, manifest_str: str, sql: str) -> str:
 
     @staticmethod
     def handle_extract_exception(e: Exception):
-        logger.error("Error when extracting manifest: {}", e)
+        logger.warning("Error when extracting manifest: {}", e)
 
 
 class EmbeddedEngineRewriter:

ibis-server/app/model/data_source.py (+4)

@@ -154,6 +154,10 @@ def get_mssql_connection(cls, info: MSSqlConnectionInfo) -> BaseBackend:
     def get_mysql_connection(cls, info: MySqlConnectionInfo) -> BaseBackend:
         ssl_context = cls._create_ssl_context(info)
         kwargs = {"ssl": ssl_context} if ssl_context else {}
+
+        # utf8mb4 is the actual charset used by MySQL for utf8
+        kwargs.setdefault("charset", "utf8mb4")
+
         if info.kwargs:
             kwargs.update(info.kwargs)
         return ibis.mysql.connect(
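The `setdefault` above makes `utf8mb4` the default without clobbering caller overrides, because user-supplied kwargs are merged afterwards. A small self-contained illustration of that merge order (the helper name is ours, not the engine's):

```python
def build_mysql_kwargs(user_kwargs: dict | None = None, ssl_context=None) -> dict:
    """Mirror the kwargs merge order used in get_mysql_connection."""
    kwargs = {"ssl": ssl_context} if ssl_context else {}
    # utf8mb4 is MySQL's real UTF-8; legacy "utf8" is a 3-byte subset.
    kwargs.setdefault("charset", "utf8mb4")
    if user_kwargs:
        kwargs.update(user_kwargs)  # explicit user settings win over the default
    return kwargs

assert build_mysql_kwargs() == {"charset": "utf8mb4"}
assert build_mysql_kwargs({"charset": "latin1"}) == {"charset": "latin1"}
```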

ibis-server/app/routers/v2/analysis.py (+2, -2)

@@ -6,11 +6,11 @@
 router = APIRouter(prefix="/analysis", tags=["analysis"])
 
 
-@router.get("/sql")
+@router.get("/sql", deprecated=True)
 def analyze_sql(dto: AnalyzeSQLDTO) -> list[dict]:
     return analyze(dto.manifest_str, dto.sql)
 
 
-@router.get("/sqls")
+@router.get("/sqls", deprecated=True)
 def analyze_sql_batch(dto: AnalyzeSQLBatchDTO) -> list[list[dict]]:
     return analyze_batch(dto.manifest_str, dto.sqls)
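Marking a route with FastAPI's `deprecated=True` only annotates the generated OpenAPI schema; the endpoint keeps serving requests as before. A minimal self-contained check:

```python
from fastapi import APIRouter, FastAPI

router = APIRouter(prefix="/analysis", tags=["analysis"])

@router.get("/sql", deprecated=True)
def analyze_sql() -> list[dict]:
    # Still served exactly as before; docs UIs just render it struck through.
    return []

app = FastAPI()
app.include_router(router)

# The generated OpenAPI operation carries the deprecation flag.
assert app.openapi()["paths"]["/analysis/sql"]["get"]["deprecated"] is True
```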

ibis-server/app/routers/v2/connector.py (+54, -38)

@@ -2,6 +2,7 @@
 
 from fastapi import APIRouter, Depends, Header, Query, Request, Response
 from fastapi.responses import ORJSONResponse
+from loguru import logger
 from opentelemetry import trace
 
 from app.dependencies import verify_query_dto

@@ -20,7 +21,7 @@
 from app.model.metadata.factory import MetadataFactory
 from app.model.validator import Validator
 from app.query_cache import QueryCacheManager
-from app.util import build_context, to_json
+from app.util import build_context, pushdown_limit, to_json
 
 router = APIRouter(prefix="/connector")
 tracer = trace.get_tracer(__name__)

@@ -34,7 +35,9 @@ def get_query_cache_manager(request: Request) -> QueryCacheManager:
     return request.state.query_cache_manager
 
 
-@router.post("/{data_source}/query", dependencies=[Depends(verify_query_dto)])
+@router.post(
+    "/{data_source}/query", dependencies=[Depends(verify_query_dto)], deprecated=True
+)
 async def query(
     data_source: DataSource,
     dto: QueryDTO,

@@ -50,6 +53,30 @@ async def query(
     with tracer.start_as_current_span(
         name=span_name, kind=trace.SpanKind.SERVER, context=build_context(headers)
     ):
+        try:
+            sql = pushdown_limit(dto.sql, limit)
+        except Exception as e:
+            logger.warning("Failed to pushdown limit. Using original SQL: {}", e)
+            sql = dto.sql
+
+        rewritten_sql = await Rewriter(
+            dto.manifest_str,
+            data_source=data_source,
+            java_engine_connector=java_engine_connector,
+        ).rewrite(sql)
+        connector = Connector(data_source, dto.connection_info)
+
+        # First check whether the query is a dry run.
+        # If it is,
+        # there is no need to consult the query cache.
+        if dry_run:
+            connector.dry_run(rewritten_sql)
+            dry_response = Response(status_code=204)
+            dry_response.headers["X-Cache-Hit"] = "true"
+            return dry_response
+
+        # Not a dry run,
+        # so check whether the query result is cached.
         cached_result = None
         cache_hit = False
         enable_cache = dto.enable_cache

@@ -60,40 +87,23 @@ async def query(
         )
         cache_hit = cached_result is not None
 
-        # Cache Hit !
-        if cached_result is not None:
+        if cache_hit:
             response = ORJSONResponse(to_json(cached_result))
             response.headers["X-Cache-Hit"] = str(cache_hit).lower()
             return response
-        # Cache Miss
         else:
-            rewritten_sql = await Rewriter(
-                dto.manifest_str,
-                data_source=data_source,
-                java_engine_connector=java_engine_connector,
-            ).rewrite(dto.sql)
-
-            connector = Connector(data_source, dto.connection_info)
-
-            if dry_run:
-                connector.dry_run(rewritten_sql)
-                dry_response = Response(status_code=204)
-                dry_response.headers["X-Cache-Hit"] = str(cache_hit).lower()
-                return dry_response
-            else:
-                # missing cache and not dry run
-                # so we need to query the datasource and cache the result
-                result = connector.query(rewritten_sql, limit=limit)
-                if enable_cache:
-                    query_cache_manager.set(
-                        data_source, dto.sql, result, dto.connection_info
-                    )
-                response = ORJSONResponse(to_json(result))
-                response.headers["X-Cache-Hit"] = str(cache_hit).lower()
-                return response
-
-
-@router.post("/{data_source}/validate/{rule_name}")
+            result = connector.query(rewritten_sql, limit=limit)
+            if enable_cache:
+                query_cache_manager.set(
+                    data_source, dto.sql, result, dto.connection_info
+                )
+
+            response = ORJSONResponse(to_json(result))
+            response.headers["X-Cache-Hit"] = str(cache_hit).lower()
+            return response
+
+
+@router.post("/{data_source}/validate/{rule_name}", deprecated=True)
 async def validate(
     data_source: DataSource,
     rule_name: str,

@@ -117,7 +127,9 @@ async def validate(
     return Response(status_code=204)
 
 
-@router.post("/{data_source}/metadata/tables", response_model=list[Table])
+@router.post(
+    "/{data_source}/metadata/tables", response_model=list[Table], deprecated=True
+)
 def get_table_list(
     data_source: DataSource,
     dto: MetadataDTO,

@@ -132,7 +144,11 @@ def get_table_list(
     ).get_table_list()
 
 
-@router.post("/{data_source}/metadata/constraints", response_model=list[Constraint])
+@router.post(
+    "/{data_source}/metadata/constraints",
+    response_model=list[Constraint],
+    deprecated=True,
+)
 def get_constraints(
     data_source: DataSource,
     dto: MetadataDTO,

@@ -147,12 +163,12 @@ def get_constraints(
     ).get_constraints()
 
 
-@router.post("/{data_source}/metadata/version")
+@router.post("/{data_source}/metadata/version", deprecated=True)
 def get_db_version(data_source: DataSource, dto: MetadataDTO) -> str:
     return MetadataFactory.get_metadata(data_source, dto.connection_info).get_version()
 
 
-@router.post("/dry-plan")
+@router.post("/dry-plan", deprecated=True)
 async def dry_plan(
     dto: DryPlanDTO,
     java_engine_connector: JavaEngineConnector = Depends(get_java_engine_connector),

@@ -166,7 +182,7 @@ async def dry_plan(
     ).rewrite(dto.sql)
 
 
-@router.post("/{data_source}/dry-plan")
+@router.post("/{data_source}/dry-plan", deprecated=True)
 async def dry_plan_for_data_source(
     data_source: DataSource,
     dto: DryPlanDTO,

@@ -184,7 +200,7 @@ async def dry_plan_for_data_source(
     ).rewrite(dto.sql)
 
 
-@router.post("/{data_source}/model-substitute")
+@router.post("/{data_source}/model-substitute", deprecated=True)
 async def model_substitute(
     data_source: DataSource,
     dto: TranspileDTO,
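The refactor above moves SQL rewriting and the dry-run check ahead of the cache lookup, so a dry run never touches the cache and a cache hit never re-runs the rewriter. A condensed, runnable sketch of that control flow with stand-ins for the rewriter, connector, and cache (all names here are illustrative, not the engine's API):

```python
class FakeCache:
    """Tiny in-memory stand-in for QueryCacheManager."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value):
        self.store[key] = value

def pushdown_limit(sql: str, limit: int) -> str:
    # Stand-in for app.util.pushdown_limit, which folds the limit into the SQL.
    return f"{sql} LIMIT {limit}"

def handle_query(sql, *, limit, dry_run, enable_cache, cache):
    # 1. Push the limit into the SQL; on failure, the real router logs a
    #    warning and keeps the original SQL.
    try:
        effective_sql = pushdown_limit(sql, limit)
    except Exception:
        effective_sql = sql

    # 2. Rewrite against the semantic model (identity stand-in here).
    rewritten_sql = effective_sql

    # 3. Dry runs short-circuit before any cache interaction.
    if dry_run:
        return {"status": 204, "X-Cache-Hit": "true"}

    # 4. The cache is keyed on the *original* SQL, as in the router.
    if enable_cache and (cached := cache.get(sql)) is not None:
        return {"status": 200, "X-Cache-Hit": "true", "result": cached}

    # 5. Cache miss: execute, then populate the cache for next time.
    result = f"rows for: {rewritten_sql}"
    if enable_cache:
        cache.set(sql, result)
    return {"status": 200, "X-Cache-Hit": "false", "result": result}

cache = FakeCache()
print(handle_query("SELECT 1", limit=10, dry_run=False, enable_cache=True, cache=cache))
print(handle_query("SELECT 1", limit=10, dry_run=False, enable_cache=True, cache=cache))
```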
