-
Notifications
You must be signed in to change notification settings - Fork 81
feat(ibis): introduce minio connector #1048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes introduce support for Minio as a file connection type across the application. Modifications include updates to the rewriter to recognize Minio file sources, new DTO and connection info classes for Minio, enhancements in the connector and metadata logic to handle Minio-specific operations, and a new utility function for initializing DuckDB with Minio settings. Additionally, dependency and testing configurations were updated to include Minio, along with a comprehensive suite of tests validating the new functionality. Changes
Suggested labels
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (6)
ibis-server/app/model/utils.py (1)
33-33
: Consider making the region configurable.The region is hardcoded to 'ap-northeast-1'. While this might work for Minio, consider making it configurable through
MinioFileConnectionInfo
for better flexibility.- REGION 'ap-northeast-1' + REGION '{connection_info.region.get_secret_value() if hasattr(connection_info, "region") else "ap-northeast-1"}'ibis-server/app/model/__init__.py (1)
63-64
: LGTM! Well-structured Minio connection classes.The implementation follows the established patterns and properly handles Minio-specific requirements.
Consider adding validation for the endpoint format to ensure it's a valid host:port combination. Example:
from pydantic import validator import re class MinioFileConnectionInfo(BaseModel): # ... existing fields ... @validator('endpoint') def validate_endpoint(cls, v): if not re.match(r'^[a-zA-Z0-9.-]+(?::\d+)?$', v.get_secret_value()): raise ValueError('endpoint must be in format host[:port]') return vAlso applies to: 169-181, 194-194
ibis-server/app/model/connector.py (1)
174-175
: Consider consolidating error handling for IOException.The same error handling for IOException is duplicated in both query and dry_run methods.
Consider extracting the error handling into a helper method:
+def _handle_io_exception(e: IOException, is_dry_run: bool) -> None: + error_class = QueryDryRunError if is_dry_run else UnprocessableEntityError + raise error_class(f"Failed to execute query: {e!s}") + def query(self, sql: str, limit: int) -> pd.DataFrame: try: return self.connection.execute(sql).fetch_df().head(limit) except IOException as e: - raise UnprocessableEntityError(f"Failed to execute query: {e!s}") + self._handle_io_exception(e, is_dry_run=False) except HTTPException as e: raise UnprocessableEntityError(f"Failed to execute query: {e!s}") def dry_run(self, sql: str) -> None: try: self.connection.execute(sql) except IOException as e: - raise QueryDryRunError(f"Failed to execute query: {e!s}") + self._handle_io_exception(e, is_dry_run=True) except HTTPException as e: raise QueryDryRunError(f"Failed to execute query: {e!s}")Also applies to: 182-183
ibis-server/app/model/metadata/object_storage.py (1)
223-227
: Consider using URL construction utility.Manual string concatenation for URLs can be error-prone.
Consider using urllib.parse:
+from urllib.parse import urlunparse + def _get_dal_operator(self): info: MinioFileConnectionInfo = self.connection_info - if info.ssl_enabled: - endpoint = f"https://{info.endpoint.get_secret_value()}" - else: - endpoint = f"http://{info.endpoint.get_secret_value()}" + scheme = "https" if info.ssl_enabled else "http" + endpoint = urlunparse((scheme, info.endpoint.get_secret_value(), "", "", "", ""))ibis-server/tests/routers/v2/connector/test_minio_file.py (2)
89-89
: Consider using a variable for Minio version.The Minio version is hardcoded in the image tag.
Consider extracting it to a constant:
+MINIO_VERSION = "RELEASE.2024-12-18T13-15-44Z" + @pytest.fixture(scope="module") def minio(request) -> MinioContainer: logger.info("Starting Minio container") - minio = MinioContainer(image="minio/minio:RELEASE.2024-12-18T13-15-44Z") + minio = MinioContainer(image=f"minio/minio:{MINIO_VERSION}")
93-149
: Consider using a helper function for file uploads.The file upload logic is repetitive.
Consider extracting it to a helper function:
+def _upload_test_files(client, bucket: str) -> None: + test_files = [ + ("tpch/data/orders.parquet", "tests/resource/tpch/data/orders.parquet"), + ("tpch/data/customer.parquet", "tests/resource/tpch/data/customer.parquet"), + # ... add other files + ] + for bucket_path, local_path in test_files: + client.fput_object(bucket, bucket_path, local_path) + @pytest.fixture(scope="module") def minio(request) -> MinioContainer: logger.info("Starting Minio container") minio = MinioContainer(image="minio/minio:RELEASE.2024-12-18T13-15-44Z") minio.start() client = minio.get_client() client.make_bucket(bucket) - client.fput_object(...) # Remove all individual upload calls + _upload_test_files(client, bucket)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
ibis-server/poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (9)
ibis-server/app/mdl/rewriter.py
(1 hunks)ibis-server/app/model/__init__.py
(3 hunks)ibis-server/app/model/connector.py
(3 hunks)ibis-server/app/model/data_source.py
(3 hunks)ibis-server/app/model/metadata/factory.py
(2 hunks)ibis-server/app/model/metadata/object_storage.py
(3 hunks)ibis-server/app/model/utils.py
(2 hunks)ibis-server/pyproject.toml
(2 hunks)ibis-server/tests/routers/v2/connector/test_minio_file.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci
🔇 Additional comments (10)
ibis-server/app/model/metadata/factory.py (1)
8-12
: LGTM! Clean integration of MinioFileMetadata.The changes follow the established pattern for object storage metadata integration.
Also applies to: 28-28
ibis-server/app/model/utils.py (1)
25-46
: LGTM! Well-structured Minio initialization.The implementation follows the established pattern of the S3 initialization while properly handling Minio-specific configurations:
- Secret creation with proper error handling
- Minio-specific endpoint configuration
- Path-style URL configuration
- SSL configuration based on user preference
ibis-server/app/mdl/rewriter.py (1)
75-79
: LGTM! Consistent dialect handling for Minio.The change appropriately groups Minio with other file-based data sources that use the DuckDB dialect.
ibis-server/app/model/connector.py (2)
41-45
: LGTM! Clean grouping of file-based data sources.The grouping of local_file, s3_file, and minio_file data sources is logical and maintainable.
168-169
: LGTM! Proper initialization for Minio connections.The Minio initialization follows the same pattern as S3, maintaining consistency in the codebase.
ibis-server/app/model/data_source.py (1)
50-50
: LGTM! Clean integration of Minio as a data source.The Minio file source is properly integrated into both DataSource and DataSourceExtension enums.
Also applies to: 76-76
ibis-server/app/model/metadata/object_storage.py (1)
207-213
: LGTM! Clean implementation of MinioFileMetadata.The class properly extends ObjectStorageMetadata and implements the required methods.
ibis-server/tests/routers/v2/connector/test_minio_file.py (1)
548-558
: LGTM! Clean helper function for connection info.The
_to_connection_info
helper function properly encapsulates the connection info creation logic.ibis-server/pyproject.toml (2)
45-45
: MinIO Dependency Inclusion in Dev Dependencies
The addition of"minio"
to thetestcontainers
extras is aligned with the PR objective of introducing MinIO support. This dependency will aid in setting up an appropriate test environment for MinIO-related features. Ensure that the version choice (implicit here) remains compatible with your testcontainers setup.
69-69
: Addition of MinIO Marker for Pytest
The new marker"minio_file: mark a test as a minio file test"
in the[tool.pytest.ini_options]
section is a clear and useful addition. It allows you to easily filter and manage tests that are specific to MinIO functionality, which is valuable for continuous integration and selective test runs.
Description
URL
Connection Info
url
: The root path of the dataset. (It doesn't include the bucket name)format
: The specific file format.bucket
: The bucket name.endpoint
: The endpoint of minio.access_key
: The account of miniosecret_kety
: The password of minioSummary by CodeRabbit
Summary by CodeRabbit
New Features
Tests
Chores