Commit d8eaf42

Merge branch 'main' into query-cache

2 parents: b598698 + 52f9163
83 files changed (+5789, -1591 lines)

README.md (+41, -26)

@@ -23,25 +23,57 @@
   </a>
 </p>
 
-> Wren Engine is the semantic engine for LLMs, the backbone of the [Wren AI](https://github.com/Canner/WrenAI) project.
+> Wren Engine is the Semantic Engine for MCP Clients and AI Agents.
+> [Wren AI](https://github.com/Canner/WrenAI) GenBI AI Agent is based on Wren Engine.
 
-<img src="./misc/wren_engine_flow.png">
+## 😫 Challenge Today
 
-Useful links
-- [Wren AI Website](https://getwren.ai)
-- [Wren Engine Documentation](https://docs.getwren.ai/oss/engine/get_started/what_is)
+At the enterprise level, the stakes - and the complexity - are much higher. Businesses run on structured data stored in cloud warehouses, relational databases, and secure filesystems. From BI dashboards to CRM updates and compliance workflows, AI must not only execute commands but also **understand and retrieve the right data, with precision and in context**.
+
+While many community and official MCP servers already support connections to major databases like PostgreSQL, MySQL, SQL Server, and more, there's a problem: **raw access to data isn't enough**.
+
+Enterprises need:
+- Accurate semantic understanding of their data models
+- Trusted calculations and aggregations in reporting
+- Clarity on business terms, like "active customer," "net revenue," or "churn rate"
+- User-based permissions and access control
+
+Natural language alone isn't enough to drive complex workflows across enterprise data systems. You need a layer that interprets intent, maps it to the correct data, applies calculations accurately, and ensures security.
 
 ## 🎯 Our Mission
 
-The Wren engine aims to be compatible with composable data systems. It follows two important traits: Embeddable and interoperability. With these two designs in mind, you can reuse the semantic context across your AI agents through our APIs and connect freely with your on-premise and cloud data sources, which nicely fit into your existing data stack.
+Wren Engine is on a mission to power the future of MCP clients and AI agents through the Model Context Protocol (MCP) — a new open standard that connects LLMs with tools, databases, and enterprise systems.
+
+As part of the MCP ecosystem, Wren Engine provides a **semantic engine** powered by a next-generation semantic layer that enables AI agents to access business data with accuracy, context, and governance.
+
+By building the semantic layer directly into MCP clients such as Claude, Cline, and Cursor, Wren Engine empowers AI Agents with precise business context and ensures accurate data interactions across diverse enterprise environments.
+
+We believe the future of enterprise AI lies in **context-aware, composable systems**. That's why Wren Engine is designed to be:
+
+- 🔌 **Embeddable** into any MCP client or AI agentic workflow
+- 🔄 **Interoperable** with modern data stacks (PostgreSQL, MySQL, Snowflake, etc.)
+- 🧠 **Semantic-first**, enabling AI to "understand" your data model and business logic
+- 🔐 **Governance-ready**, respecting roles, access controls, and definitions
+
+With Wren Engine, you can scale AI adoption across teams — not just with better automation, but with better understanding.
+
+<img src="./misc/mcp_wren_engine.webp">
+
+Check out our full article:
+
+🤩 [Our Mission - Fueling the Next Wave of AI Agents: Building the Foundation for Future MCP Clients and Enterprise Data Access](https://getwren.ai/post/fueling-the-next-wave-of-ai-agents-building-the-foundation-for-future-mcp-clients-and-enterprise-data-access)
+
+## 🚀 Get Started with MCP
+[MCP Server README](mcp-server/README.md)
+
+https://github.com/user-attachments/assets/dab9b50f-70d7-4eb3-8fc8-2ab55dc7d2ec
+
 
-<img src="./misc/wrenai_vision.png">
 
-🤩 [About our Vision - The new wave of Composable Data Systems and the Interface to LLM agents](https://getwren.ai/post/the-new-wave-of-composable-data-systems-and-the-interface-to-llm-agents)
 
 ## 🤔 Concepts
 
-- [Introducing Wren Engine](https://docs.getwren.ai/oss/engine/get_started/what_is)
+- [Quick start with Wren Engine](https://docs.getwren.ai/oss/engine/get_started/quickstart)
 - [What is semantics?](https://docs.getwren.ai/oss/engine/concept/what_is_semantics)
 - [What is Modeling Definition Language (MDL)?](https://docs.getwren.ai/oss/engine/concept/what_is_mdl)
 - [Benefits of Wren Engine with LLMs](https://docs.getwren.ai/oss/engine/concept/benefits_llm)

@@ -54,20 +86,3 @@ Wren Engine is currently in the beta version. The project team is actively worki
 - Welcome to our [Discord server](https://discord.gg/5DvshJqG8Z) to give us feedback!
 - If there is any issues, please visit [Github Issues](https://github.com/Canner/wren-engine/issues).
 
-## 🚀 Get Started
-
-Check out our latest documentation to get a [Quick start](https://docs.getwren.ai/oss/engine/get_started/quickstart).
-
-## 🙌 How to build?
-
-### Normal Build
-
-```bash
-mvn clean install -DskipTests
-```
-
-### Build an executable jar
-
-```bash
-mvn clean package -DskipTests -P exec-jar
-```
ibis-server/README.md (+25)

@@ -90,3 +90,28 @@ OpenTelemetry zero-code instrumentation is highly configurable. You can set the
 
 ## Contributing
 Please see [CONTRIBUTING.md](docs/CONTRIBUTING.md) for more information.
+
+### Report the Migration Issue
+Wren Engine is migrating to the v3 API (powered by Rust and DataFusion), but some SQL issues remain for now.
+If you find the migration message in your log, please provide the message and related information to the Wren AI team.
+Just raise an issue on GitHub or contact us in the Discord channel.
+
+The message looks like the following log:
+```
+2025-03-19 22:49:08.788 | [62781772-7120-4482-b7ca-4be65c8fda96] | INFO | __init__.dispatch:14 - POST /v3/connector/postgres/query
+2025-03-19 22:49:08.788 | [62781772-7120-4482-b7ca-4be65c8fda96] | INFO | __init__.dispatch:15 - Request params: {}
+2025-03-19 22:49:08.789 | [62781772-7120-4482-b7ca-4be65c8fda96] | INFO | __init__.dispatch:22 - Request body: {"connectionInfo":"REDACTED","manifestStr":"eyJjYXRhbG9nIjoid3JlbiIsInNjaGVtYSI6InB1YmxpYyIsIm1vZGVscyI6W3sibmFtZSI6Im9yZGVycyIsInRhYmxlUmVmZXJlbmNlIjp7InNjaGVtYSI6InB1YmxpYyIsIm5hbWUiOiJvcmRlcnMifSwiY29sdW1ucyI6W3sibmFtZSI6Im9yZGVya2V5IiwidHlwZSI6InZhcmNoYXIiLCJleHByZXNzaW9uIjoiY2FzdChvX29yZGVya2V5IGFzIHZhcmNoYXIpIn1dfV19","sql":"SELECT orderkey FROM orders LIMIT 1"}
+2025-03-19 22:49:08.804 | [62781772-7120-4482-b7ca-4be65c8fda96] | WARN | connector.query:61 - Failed to execute v3 query, fallback to v2: DataFusion error: ModelAnalyzeRule
+caused by
+Schema error: No field named o_orderkey.
+Wren engine is migrating to Rust version now. Wren AI team are appreciate if you can provide the error messages and related logs for us.
+```
+
+#### Steps to Report an Issue
+1. **Identify the Issue**: Look for the migration message in your log files.
+2. **Gather Information**: Collect the error message and any related logs.
+3. **Report the Issue**:
+   - **GitHub**: Open an issue on our [GitHub repository](https://github.com/Canner/wren-engine/issues) and include the collected information.
+   - **Discord**: Join our [Discord channel](https://discord.gg/5DvshJqG8Z) and share the details with us.
+
+Providing detailed information helps us diagnose and fix issues more efficiently. Thank you for your cooperation!
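The warning in the sample log comes from a try-v3-then-fall-back pattern. A minimal sketch of that flow, assuming hypothetical `query_v3`/`query_v2` stand-ins for the engine's internal entry points (not the engine's actual API):

```python
from loguru import logger

# Hypothetical stand-ins for the v3 (Rust/DataFusion) and v2 (Java) paths.
def query_v3(sql: str) -> list[dict]:
    raise RuntimeError("DataFusion error: ModelAnalyzeRule")

def query_v2(sql: str) -> list[dict]:
    return [{"orderkey": "1"}]

def execute_with_fallback(sql: str) -> list[dict]:
    """Try the v3 engine first; fall back to v2 when it fails."""
    try:
        return query_v3(sql)
    except Exception as e:
        # This warning is the migration message worth reporting upstream.
        logger.warning("Failed to execute v3 query, fallback to v2: {}", e)
        return query_v2(sql)

print(execute_with_fallback("SELECT orderkey FROM orders LIMIT 1"))
```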

ibis-server/app/config.py (+2)

@@ -61,6 +61,8 @@ def update(self, diagnose: bool):
     def get_remote_function_list_path(self, data_source: str) -> str:
         if not self.remote_function_list_path:
             return None
+        if data_source in {"local_file", "s3_file", "minio_file", "gcs_file"}:
+            data_source = "duckdb"
         base_path = os.path.normpath(self.remote_function_list_path)
         path = os.path.normpath(os.path.join(base_path, f"{data_source}.csv"))
         if not path.startswith(base_path):
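The two added lines collapse all file-based sources onto a single function list. A runnable sketch of the lookup, assuming `base_dir` is the configured `remote_function_list_path`; the escape-path handling at the end is illustrative, since the diff truncates the original:

```python
import os

FILE_SOURCES = {"local_file", "s3_file", "minio_file", "gcs_file"}

def resolve_function_list(base_dir: str, data_source: str) -> str | None:
    """Resolve the remote-function CSV for a data source."""
    if not base_dir:
        return None
    # File-based sources are all executed by DuckDB, so they share duckdb.csv.
    if data_source in FILE_SOURCES:
        data_source = "duckdb"
    base_path = os.path.normpath(base_dir)
    path = os.path.normpath(os.path.join(base_path, f"{data_source}.csv"))
    if not path.startswith(base_path):
        return None  # illustrative: reject paths escaping the configured directory
    return path

assert resolve_function_list("/etc/wren/functions", "s3_file") \
    == os.path.normpath("/etc/wren/functions/duckdb.csv")
```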

ibis-server/app/mdl/rewriter.py (+1, -1)

@@ -105,7 +105,7 @@ async def rewrite(self, manifest_str: str, sql: str) -> str:
 
     @staticmethod
     def handle_extract_exception(e: Exception):
-        logger.error("Error when extracting manifest: {}", e)
+        logger.warning("Error when extracting manifest: {}", e)
 
 
 class EmbeddedEngineRewriter:

ibis-server/app/model/data_source.py (+4)

@@ -154,6 +154,10 @@ def get_mssql_connection(cls, info: MSSqlConnectionInfo) -> BaseBackend:
     def get_mysql_connection(cls, info: MySqlConnectionInfo) -> BaseBackend:
         ssl_context = cls._create_ssl_context(info)
         kwargs = {"ssl": ssl_context} if ssl_context else {}
+
+        # utf8mb4 is the actual charset used by MySQL for utf8
+        kwargs.setdefault("charset", "utf8mb4")
+
         if info.kwargs:
             kwargs.update(info.kwargs)
         return ibis.mysql.connect(
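The `setdefault` above makes `utf8mb4` the default without clobbering caller overrides, because user-supplied kwargs are merged afterwards. A small self-contained illustration of that merge order (the helper name is ours, not the engine's):

```python
def build_mysql_kwargs(user_kwargs: dict | None = None, ssl_context=None) -> dict:
    """Mirror the kwargs merge order used in get_mysql_connection."""
    kwargs = {"ssl": ssl_context} if ssl_context else {}
    # utf8mb4 is MySQL's real UTF-8; legacy "utf8" is a 3-byte subset.
    kwargs.setdefault("charset", "utf8mb4")
    if user_kwargs:
        kwargs.update(user_kwargs)  # explicit user settings win over the default
    return kwargs

assert build_mysql_kwargs() == {"charset": "utf8mb4"}
assert build_mysql_kwargs({"charset": "latin1"}) == {"charset": "latin1"}
```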

ibis-server/app/routers/v2/analysis.py (+2, -2)

@@ -6,11 +6,11 @@
 router = APIRouter(prefix="/analysis", tags=["analysis"])
 
 
-@router.get("/sql")
+@router.get("/sql", deprecated=True)
 def analyze_sql(dto: AnalyzeSQLDTO) -> list[dict]:
     return analyze(dto.manifest_str, dto.sql)
 
 
-@router.get("/sqls")
+@router.get("/sqls", deprecated=True)
 def analyze_sql_batch(dto: AnalyzeSQLBatchDTO) -> list[list[dict]]:
     return analyze_batch(dto.manifest_str, dto.sqls)
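Marking a route with FastAPI's `deprecated=True` only annotates the generated OpenAPI schema; the endpoint keeps serving requests as before. A minimal self-contained check:

```python
from fastapi import APIRouter, FastAPI

router = APIRouter(prefix="/analysis", tags=["analysis"])

@router.get("/sql", deprecated=True)
def analyze_sql() -> list[dict]:
    # Still served exactly as before; docs UIs just render it struck through.
    return []

app = FastAPI()
app.include_router(router)

# The generated OpenAPI operation carries the deprecation flag.
assert app.openapi()["paths"]["/analysis/sql"]["get"]["deprecated"] is True
```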

ibis-server/app/routers/v2/connector.py (+54, -38)

@@ -2,6 +2,7 @@
 
 from fastapi import APIRouter, Depends, Header, Query, Request, Response
 from fastapi.responses import ORJSONResponse
+from loguru import logger
 from opentelemetry import trace
 
 from app.dependencies import verify_query_dto

@@ -20,7 +21,7 @@
 from app.model.metadata.factory import MetadataFactory
 from app.model.validator import Validator
 from app.query_cache import QueryCacheManager
-from app.util import build_context, to_json
+from app.util import build_context, pushdown_limit, to_json
 
 router = APIRouter(prefix="/connector")
 tracer = trace.get_tracer(__name__)

@@ -34,7 +35,9 @@ def get_query_cache_manager(request: Request) -> QueryCacheManager:
     return request.state.query_cache_manager
 
 
-@router.post("/{data_source}/query", dependencies=[Depends(verify_query_dto)])
+@router.post(
+    "/{data_source}/query", dependencies=[Depends(verify_query_dto)], deprecated=True
+)
 async def query(
     data_source: DataSource,
     dto: QueryDTO,

@@ -50,6 +53,30 @@ async def query(
     with tracer.start_as_current_span(
         name=span_name, kind=trace.SpanKind.SERVER, context=build_context(headers)
     ):
+        try:
+            sql = pushdown_limit(dto.sql, limit)
+        except Exception as e:
+            logger.warning("Failed to pushdown limit. Using original SQL: {}", e)
+            sql = dto.sql
+
+        rewritten_sql = await Rewriter(
+            dto.manifest_str,
+            data_source=data_source,
+            java_engine_connector=java_engine_connector,
+        ).rewrite(sql)
+        connector = Connector(data_source, dto.connection_info)
+
+        # First check whether the query is a dry run.
+        # If it is,
+        # there is no need to consult the query cache.
+        if dry_run:
+            connector.dry_run(rewritten_sql)
+            dry_response = Response(status_code=204)
+            dry_response.headers["X-Cache-Hit"] = "true"
+            return dry_response
+
+        # Not a dry run,
+        # so check whether the query result is cached.
         cached_result = None
         cache_hit = False
         enable_cache = dto.enable_cache

@@ -60,40 +87,23 @@ async def query(
         )
         cache_hit = cached_result is not None
 
-        # Cache Hit !
-        if cached_result is not None:
+        if cache_hit:
             response = ORJSONResponse(to_json(cached_result))
             response.headers["X-Cache-Hit"] = str(cache_hit).lower()
             return response
-        # Cache Miss
         else:
-            rewritten_sql = await Rewriter(
-                dto.manifest_str,
-                data_source=data_source,
-                java_engine_connector=java_engine_connector,
-            ).rewrite(dto.sql)
-
-            connector = Connector(data_source, dto.connection_info)
-
-            if dry_run:
-                connector.dry_run(rewritten_sql)
-                dry_response = Response(status_code=204)
-                dry_response.headers["X-Cache-Hit"] = str(cache_hit).lower()
-                return dry_response
-            else:
-                # missing cache and not dry run
-                # so we need to query the datasource and cache the result
-                result = connector.query(rewritten_sql, limit=limit)
-                if enable_cache:
-                    query_cache_manager.set(
-                        data_source, dto.sql, result, dto.connection_info
-                    )
-                response = ORJSONResponse(to_json(result))
-                response.headers["X-Cache-Hit"] = str(cache_hit).lower()
-                return response
-
-
-@router.post("/{data_source}/validate/{rule_name}")
+            result = connector.query(rewritten_sql, limit=limit)
+            if enable_cache:
+                query_cache_manager.set(
+                    data_source, dto.sql, result, dto.connection_info
+                )
+
+            response = ORJSONResponse(to_json(result))
+            response.headers["X-Cache-Hit"] = str(cache_hit).lower()
+            return response
+
+
+@router.post("/{data_source}/validate/{rule_name}", deprecated=True)
 async def validate(
     data_source: DataSource,
     rule_name: str,

@@ -117,7 +127,9 @@ async def validate(
     return Response(status_code=204)
 
 
-@router.post("/{data_source}/metadata/tables", response_model=list[Table])
+@router.post(
+    "/{data_source}/metadata/tables", response_model=list[Table], deprecated=True
+)
 def get_table_list(
     data_source: DataSource,
     dto: MetadataDTO,

@@ -132,7 +144,11 @@ def get_table_list(
     ).get_table_list()
 
 
-@router.post("/{data_source}/metadata/constraints", response_model=list[Constraint])
+@router.post(
+    "/{data_source}/metadata/constraints",
+    response_model=list[Constraint],
+    deprecated=True,
+)
 def get_constraints(
     data_source: DataSource,
     dto: MetadataDTO,

@@ -147,12 +163,12 @@ def get_constraints(
     ).get_constraints()
 
 
-@router.post("/{data_source}/metadata/version")
+@router.post("/{data_source}/metadata/version", deprecated=True)
 def get_db_version(data_source: DataSource, dto: MetadataDTO) -> str:
     return MetadataFactory.get_metadata(data_source, dto.connection_info).get_version()
 
 
-@router.post("/dry-plan")
+@router.post("/dry-plan", deprecated=True)
 async def dry_plan(
     dto: DryPlanDTO,
     java_engine_connector: JavaEngineConnector = Depends(get_java_engine_connector),

@@ -166,7 +182,7 @@ async def dry_plan(
     ).rewrite(dto.sql)
 
 
-@router.post("/{data_source}/dry-plan")
+@router.post("/{data_source}/dry-plan", deprecated=True)
 async def dry_plan_for_data_source(
     data_source: DataSource,
     dto: DryPlanDTO,

@@ -184,7 +200,7 @@ async def dry_plan_for_data_source(
     ).rewrite(dto.sql)
 
 
-@router.post("/{data_source}/model-substitute")
+@router.post("/{data_source}/model-substitute", deprecated=True)
 async def model_substitute(
     data_source: DataSource,
     dto: TranspileDTO,
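The refactor above moves SQL rewriting and the dry-run check ahead of the cache lookup, so a dry run never touches the cache and a cache hit never re-runs the rewriter. A condensed, runnable sketch of that control flow with stand-ins for the rewriter, connector, and cache (all names here are illustrative, not the engine's API):

```python
class FakeCache:
    """Tiny in-memory stand-in for QueryCacheManager."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value):
        self.store[key] = value

def pushdown_limit(sql: str, limit: int) -> str:
    # Stand-in for app.util.pushdown_limit, which folds the limit into the SQL.
    return f"{sql} LIMIT {limit}"

def handle_query(sql, *, limit, dry_run, enable_cache, cache):
    # 1. Push the limit into the SQL; on failure, the real router logs a
    #    warning and keeps the original SQL.
    try:
        effective_sql = pushdown_limit(sql, limit)
    except Exception:
        effective_sql = sql

    # 2. Rewrite against the semantic model (identity stand-in here).
    rewritten_sql = effective_sql

    # 3. Dry runs short-circuit before any cache interaction.
    if dry_run:
        return {"status": 204, "X-Cache-Hit": "true"}

    # 4. The cache is keyed on the *original* SQL, as in the router.
    if enable_cache and (cached := cache.get(sql)) is not None:
        return {"status": 200, "X-Cache-Hit": "true", "result": cached}

    # 5. Cache miss: execute, then populate the cache for next time.
    result = f"rows for: {rewritten_sql}"
    if enable_cache:
        cache.set(sql, result)
    return {"status": 200, "X-Cache-Hit": "false", "result": result}

cache = FakeCache()
print(handle_query("SELECT 1", limit=10, dry_run=False, enable_cache=True, cache=cache))
print(handle_query("SELECT 1", limit=10, dry_run=False, enable_cache=True, cache=cache))
```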
