Skip to content

Add airbyte-ci command: migrate-to-manifest-only #42576

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 43 commits into from
Aug 7, 2024
Merged
Show file tree
Hide file tree
Changes from 38 commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
38d9f63
feat: pipeline validates connector and moves manifest
ChristoGrab Jul 26, 2024
ffa5e55
feat: first draft of full pipeline
ChristoGrab Jul 26, 2024
f7d588a
pipeline validates and moves manifest
ChristoGrab Jul 26, 2024
8bec567
update metadata tags
ChristoGrab Jul 26, 2024
6d26096
chore: auto-fix lint and format issues
octavia-squidington-iii Jul 27, 2024
7b1cd10
Merge branch 'master' into christo/airbyte-ci-strip
natikgadzhi Jul 27, 2024
a1e0385
task: delete source folder and add metadata language tag
ChristoGrab Jul 29, 2024
5b23bd9
chore: import cleanup
ChristoGrab Jul 29, 2024
eb9e6db
task: regenerate README file
ChristoGrab Jul 29, 2024
b409814
task: update deletion to logic to exclude unit_tests/integration folder
ChristoGrab Jul 29, 2024
4b6a345
task: update documentation page
ChristoGrab Jul 29, 2024
16fdf02
Merge branch 'master' of https://github.com/airbytehq/airbyte into ch…
ChristoGrab Jul 30, 2024
b5cb21c
task: update baseImage in metadata
ChristoGrab Jul 30, 2024
6fef1fa
bug: fix stream table generation in docs
ChristoGrab Jul 30, 2024
51dc4ba
task: remove docs update step and update readme template
ChristoGrab Jul 30, 2024
1003fa6
feat: completed pipeline with reset on failure
ChristoGrab Jul 30, 2024
3819277
refactor: update command name and break up steps
ChristoGrab Jul 30, 2024
ea5f1da
chore: format
ChristoGrab Jul 30, 2024
8ec36c5
refactor: cleanup for readability
ChristoGrab Jul 31, 2024
4d5327d
chore: add docstrings
ChristoGrab Jul 31, 2024
fa9fe37
task: add README entry and version bump
ChristoGrab Jul 31, 2024
e387e3f
task: detect and migrate non-inline specs to manifest
ChristoGrab Jul 31, 2024
8cbc115
chore: merge master
ChristoGrab Jul 31, 2024
3645b50
chore: add parameter type to util function
ChristoGrab Jul 31, 2024
91504ea
fix: remove existing metadata language tag
ChristoGrab Jul 31, 2024
930226f
chore: auto-fix lint and format issues
octavia-squidington-iii Jul 31, 2024
db74674
update readme template
ChristoGrab Aug 2, 2024
51c161e
fix: include documentation_url when fetching spec
ChristoGrab Aug 2, 2024
84c0ea5
disable pypi in metadata
ChristoGrab Aug 6, 2024
61e2c0d
task: update logic to fetch latest valid tag in dockerhub and disable…
ChristoGrab Aug 6, 2024
995f33b
chore: format
ChristoGrab Aug 6, 2024
31cd78b
task: resolve parameter refs in manifest
ChristoGrab Aug 6, 2024
84c4e1c
chore: resolve type warnings
ChristoGrab Aug 6, 2024
4f9a71d
apparently i had a seizure
ChristoGrab Aug 6, 2024
bad2693
light cleanup
natikgadzhi Aug 7, 2024
75612d5
Merge branch 'master' into christo/airbyte-ci-strip
natikgadzhi Aug 7, 2024
2889b26
Update airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/connector…
natikgadzhi Aug 7, 2024
b998b07
Update airbyte-ci/connectors/pipelines/README.md
natikgadzhi Aug 7, 2024
8fdd004
Update airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/connector…
natikgadzhi Aug 7, 2024
ccc30a7
chore: auto-fix lint and format issues
octavia-squidington-iii Aug 7, 2024
ce98657
typing
natikgadzhi Aug 7, 2024
441031b
fix merge conflicts
ChristoGrab Aug 7, 2024
96e8a5d
Merge branch 'master' into christo/airbyte-ci-strip
natikgadzhi Aug 7, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 24 additions & 7 deletions airbyte-ci/connectors/pipelines/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -174,21 +174,21 @@ At this point you can run `airbyte-ci` commands.
- [Examples](#examples-5)
- [Arguments](#arguments-1)
- [`connectors migrate-to-base-image` command](#connectors-migrate-to-base-image-command)
- [Examples](#examples-7)
- [Examples](#examples-6)
- [`connectors migrate-to-poetry` command](#connectors-migrate-to-poetry-command)
- [Examples](#examples-8)
- [Examples](#examples-7)
- [`connectors migrate-to-inline-schemas` command](#connectors-migrate-to-inline-schemas-command)
- [Examples](#examples-9)
- [Examples](#examples-8)
- [`connectors pull-request` command](#connectors-pull-request-command)
- [Examples](#examples-10)
- [Examples](#examples-9)
- [`format` command subgroup](#format-command-subgroup)
- [Options](#options-6)
- [Examples](#examples-11)
- [Examples](#examples-10)
- [`format check all` command](#format-check-all-command)
- [`format fix all` command](#format-fix-all-command)
- [`poetry` command subgroup](#poetry-command-subgroup)
- [Options](#options-7)
- [Examples](#examples-12)
- [Examples](#examples-11)
- [`publish` command](#publish-command)
- [Options](#options-8)
- [`metadata` command subgroup](#metadata-command-subgroup)
Expand All @@ -197,6 +197,8 @@ At this point you can run `airbyte-ci` commands.
- [What it runs](#what-it-runs-3)
- [`tests` command](#tests-command)
- [Options](#options-9)
- [Examples](#examples-12)
- [`migrate-to-manifest-only` command](#migrate-to-manifest-only-command)
- [Examples](#examples-13)
- [Changelog](#changelog)
- [More info](#more-info)
Expand Down Expand Up @@ -769,17 +771,32 @@ You can pass multiple `--poetry-package-path` options to run poe tasks.
E.G.: running Poe tasks on the modified internal packages of the current branch:
`airbyte-ci test --modified`

### <a id="migrate-to-manifest-only-command"></a>`migrate-to-manifest-only` command

This command migrates valid connectors to the `manifest-only` format. It contains two steps:

1. Check: Validates whether a connector is a candidate for the migration. If not, the operation will be skipped.
2. Migrate: Strips out all unneccessary files/folders, leaving only the root-level manifest, metadata, icon, and acceptance/integration test files. Unwraps the manifest (references and `$parameters`) so it's compatible with Connector Builder.

#### Examples

```bash
airbyte-ci connectors --name=source-pokeapi migrate-to-manifest-only
airbyte-ci connectors --language=low-code migrate-to-manifest-only
```

## Changelog

| Version | PR | Description |
| ------- | ---------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| 4.29.0 | [#42576](https://github.com/airbytehq/airbyte/pull/42576) | New command: `migrate-to-manifest-only` |
| 4.28.2 | [#43297](https://github.com/airbytehq/airbyte/pull/43297) | `migrate-to-inline_schemas` removes unused schema files and empty schema dirs. |
| 4.28.1 | [#42972](https://github.com/airbytehq/airbyte/pull/42972) | Add airbyte-enterprise support for format commandi |
| 4.28.0 | [#42849](https://github.com/airbytehq/airbyte/pull/42849) | Couple selection of strict-encrypt variants (e vice versa) |
| 4.27.0 | [#42574](https://github.com/airbytehq/airbyte/pull/42574) | Live tests: run from connectors test pipeline for connectors with sandbox connections |
| 4.26.1 | [#42905](https://github.com/airbytehq/airbyte/pull/42905) | Rename the docker cache volume to avoid using the corrupted previous volume. |
| 4.26.0 | [#42849](https://github.com/airbytehq/airbyte/pull/42849) | Send publish failures messages to `#connector-publish-failures` |
| 4.25.4 | [#42463](https://github.com/airbytehq/airbyte/pull/42463) | Add validation before live test runs |
| 4.25.4 | [#42463](https://github.com/airbytehq/airbyte/pull/42463) | Add validation before live test runs |
| 4.25.3 | [#42437](https://github.com/airbytehq/airbyte/pull/42437) | Ugrade-cdk: Update to work with Python connectors using poetry |
| 4.25.2 | [#42077](https://github.com/airbytehq/airbyte/pull/42077) | Live/regression tests: add status check for regression test runs |
| 4.25.1 | [#42410](https://github.com/airbytehq/airbyte/pull/42410) | Live/regression tests: disable approval requirement on forks |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,7 @@ def validate_environment(is_local: bool) -> None:
"publish": "pipelines.airbyte_ci.connectors.publish.commands.publish",
"bump-version": "pipelines.airbyte_ci.connectors.bump_version.commands.bump_version",
"migrate-to-base-image": "pipelines.airbyte_ci.connectors.migrate_to_base_image.commands.migrate_to_base_image",
"migrate-to-manifest-only": "pipelines.airbyte_ci.connectors.migrate_to_manifest_only.commands.migrate_to_manifest_only",
"migrate-to-poetry": "pipelines.airbyte_ci.connectors.migrate_to_poetry.commands.migrate_to_poetry",
"migrate-to-inline_schemas": "pipelines.airbyte_ci.connectors.migrate_to_inline_schemas.commands.migrate_to_inline_schemas",
"migrate-to-logging-logger": "pipelines.airbyte_ci.connectors.migrate_to_logging_logger.commands.migrate_to_logging_logger",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,9 @@ class CONNECTOR_TEST_STEP_ID(str, Enum):
AIRBYTE_LOGGER_MIGRATION = "migration_to_logging_logger.migration"
PULL_REQUEST_CREATE = "pull_request.create"
PULL_REQUEST_UPDATE = "pull_request.update"
MANIFEST_ONLY_CHECK = "migrate_to_manifest_only.check"
MANIFEST_ONLY_STRIP = "migrate_to_manifest_only.strip"
MANIFEST_ONLY_UPDATE = "migrate_to_manifest_only.update"

def __str__(self) -> str:
return self.value
Original file line number Diff line number Diff line change
Expand Up @@ -91,12 +91,6 @@ async def _run(self) -> StepResult:
)


def copy_directory(src: Path, dest: Path) -> None:
if dest.exists():
shutil.rmtree(dest)
shutil.copytree(src, dest)


class RestoreInlineState(Step):
context: ConnectorContext

Expand Down Expand Up @@ -254,6 +248,11 @@ class JsonLoaderNode:
file_path: str


def copy_directory(src: Path, dest: Path) -> None:
if dest.exists():
shutil.rmtree(dest)
shutil.copytree(src, dest)

def _has_subdirectory(directory: Path) -> bool:
# Iterate through all items in the directory
for entry in directory.iterdir():
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
#
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
#

import asyncclick as click
from pipelines.airbyte_ci.connectors.context import ConnectorContext
from pipelines.airbyte_ci.connectors.migrate_to_manifest_only.pipeline import run_connectors_manifest_only_pipeline
from pipelines.airbyte_ci.connectors.pipeline import run_connectors_pipelines
from pipelines.cli.dagger_pipeline_command import DaggerPipelineCommand


@click.command(cls=DaggerPipelineCommand, short_help="Migrate a low-code connector to manifest-only")
@click.pass_context
async def migrate_to_manifest_only(ctx: click.Context) -> bool:

connectors_contexts = [
ConnectorContext(
pipeline_name=f"Migrate connector {connector.technical_name} to manifest-only",
connector=connector,
is_local=ctx.obj["is_local"],
git_branch=ctx.obj["git_branch"],
git_revision=ctx.obj["git_revision"],
diffed_branch=ctx.obj["diffed_branch"],
git_repo_url=ctx.obj["git_repo_url"],
report_output_prefix=ctx.obj["report_output_prefix"],
pipeline_start_timestamp=ctx.obj.get("pipeline_start_timestamp"),
)
for connector in ctx.obj["selected_connectors_with_modified_files"]
]

await run_connectors_pipelines(
connectors_contexts,
run_connectors_manifest_only_pipeline,
"Migrate connector to manifest-only pipeline",
ctx.obj["concurrency"],
ctx.obj["dagger_logs_path"],
ctx.obj["execute_timeout"],
ctx.obj["git_branch"],
ctx.obj["git_revision"],
ctx.obj["diffed_branch"],
ctx.obj["is_local"],
ctx.obj["ci_context"],
ctx.obj["git_repo_url"],
)

return True
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

import copy
import typing
from typing import Any, Mapping

PARAMETERS_STR = "$parameters"


DEFAULT_MODEL_TYPES: Mapping[str, str] = {
# CompositeErrorHandler
"CompositeErrorHandler.error_handlers": "DefaultErrorHandler",
# CursorPagination
"CursorPagination.decoder": "JsonDecoder",
# DatetimeBasedCursor
"DatetimeBasedCursor.end_datetime": "MinMaxDatetime",
"DatetimeBasedCursor.end_time_option": "RequestOption",
"DatetimeBasedCursor.start_datetime": "MinMaxDatetime",
"DatetimeBasedCursor.start_time_option": "RequestOption",
# CustomIncrementalSync
"CustomIncrementalSync.end_datetime": "MinMaxDatetime",
"CustomIncrementalSync.end_time_option": "RequestOption",
"CustomIncrementalSync.start_datetime": "MinMaxDatetime",
"CustomIncrementalSync.start_time_option": "RequestOption",
# DeclarativeSource
"DeclarativeSource.check": "CheckStream",
"DeclarativeSource.spec": "Spec",
"DeclarativeSource.streams": "DeclarativeStream",
# DeclarativeStream
"DeclarativeStream.retriever": "SimpleRetriever",
"DeclarativeStream.schema_loader": "JsonFileSchemaLoader",
# DefaultErrorHandler
"DefaultErrorHandler.response_filters": "HttpResponseFilter",
# DefaultPaginator
"DefaultPaginator.decoder": "JsonDecoder",
"DefaultPaginator.page_size_option": "RequestOption",
# DpathExtractor
"DpathExtractor.decoder": "JsonDecoder",
# HttpRequester
"HttpRequester.error_handler": "DefaultErrorHandler",
# ListPartitionRouter
"ListPartitionRouter.request_option": "RequestOption",
# ParentStreamConfig
"ParentStreamConfig.request_option": "RequestOption",
"ParentStreamConfig.stream": "DeclarativeStream",
# RecordSelector
"RecordSelector.extractor": "DpathExtractor",
"RecordSelector.record_filter": "RecordFilter",
# SimpleRetriever
"SimpleRetriever.paginator": "NoPagination",
"SimpleRetriever.record_selector": "RecordSelector",
"SimpleRetriever.requester": "HttpRequester",
# SubstreamPartitionRouter
"SubstreamPartitionRouter.parent_stream_configs": "ParentStreamConfig",
# AddFields
"AddFields.fields": "AddedFieldDefinition",
# CustomPartitionRouter
"CustomPartitionRouter.parent_stream_configs": "ParentStreamConfig",
}

# We retain a separate registry for custom components to automatically insert the type if it is missing. This is intended to
# be a short term fix because once we have migrated, then type and class_name should be requirements for all custom components.
CUSTOM_COMPONENTS_MAPPING: Mapping[str, str] = {
"CompositeErrorHandler.backoff_strategies": "CustomBackoffStrategy",
"DeclarativeStream.retriever": "CustomRetriever",
"DeclarativeStream.transformations": "CustomTransformation",
"DefaultErrorHandler.backoff_strategies": "CustomBackoffStrategy",
"DefaultPaginator.pagination_strategy": "CustomPaginationStrategy",
"HttpRequester.authenticator": "CustomAuthenticator",
"HttpRequester.error_handler": "CustomErrorHandler",
"RecordSelector.extractor": "CustomRecordExtractor",
"SimpleRetriever.partition_router": "CustomPartitionRouter",
}


class ManifestComponentTransformer:
def propagate_types_and_parameters(
self,
parent_field_identifier: str,
declarative_component: Mapping[str, Any],
parent_parameters: Mapping[str, Any],
) -> Mapping[str, Any]:
"""
Recursively transforms the specified declarative component and subcomponents to propagate parameters and insert the
default component type if it was not already present. The resulting transformed components are a deep copy of the input
components, not an in-place transformation.
:param declarative_component: The current component that is having type and parameters added
:param parent_field_identifier: The name of the field of the current component coming from the parent component
:param parent_parameters: The parameters set on parent components defined before the current component
:return: A deep copy of the transformed component with types and parameters persisted to it
"""
propagated_component = dict(copy.deepcopy(declarative_component))
if "type" not in propagated_component:
# If the component has class_name we assume that this is a reference to a custom component. This is a slight change to
# existing behavior because we originally allowed for either class or type to be specified. After the pydantic migration,
# class_name will only be a valid field on custom components and this change reflects that. I checked, and we currently
# have no low-code connectors that use class_name except for custom components.
if "class_name" in propagated_component:
found_type = CUSTOM_COMPONENTS_MAPPING.get(parent_field_identifier)
else:
found_type = DEFAULT_MODEL_TYPES.get(parent_field_identifier)
if found_type:
propagated_component["type"] = found_type

# When there is no resolved type, we're not processing a component (likely a regular object) and don't need to propagate parameters
# When the type refers to a json schema, we're not processing a component as well. This check is currently imperfect as there could
# be json_schema are not objects but we believe this is not likely in our case because:
# * records are Mapping so objects hence SchemaLoader root should be an object
# * connection_specification is a Mapping
if "type" not in propagated_component or self._is_json_schema_object(propagated_component):
return propagated_component

# Combines parameters defined at the current level with parameters from parent components. Parameters at the current
# level take precedence
current_parameters = dict(copy.deepcopy(parent_parameters))
component_parameters = propagated_component.pop(PARAMETERS_STR, {})
current_parameters = {**current_parameters, **component_parameters}

# Parameters should be applied to the current component fields with the existing field taking precedence over parameters if
# both exist
for parameter_key, parameter_value in current_parameters.items():
propagated_component[parameter_key] = propagated_component.get(parameter_key) or parameter_value

for field_name, field_value in propagated_component.items():
if isinstance(field_value, dict):
# We exclude propagating a parameter that matches the current field name because that would result in an infinite cycle
excluded_parameter = current_parameters.pop(field_name, None)
parent_type_field_identifier = f"{propagated_component.get('type')}.{field_name}"
propagated_component[field_name] = self.propagate_types_and_parameters(
parent_type_field_identifier, field_value, current_parameters
)
if excluded_parameter:
current_parameters[field_name] = excluded_parameter
elif isinstance(field_value, typing.List):
# We exclude propagating a parameter that matches the current field name because that would result in an infinite cycle
excluded_parameter = current_parameters.pop(field_name, None)
for i, element in enumerate(field_value):
if isinstance(element, dict):
parent_type_field_identifier = f"{propagated_component.get('type')}.{field_name}"
field_value[i] = self.propagate_types_and_parameters(parent_type_field_identifier, element, current_parameters)
if excluded_parameter:
current_parameters[field_name] = excluded_parameter

return propagated_component

@staticmethod
def _is_json_schema_object(propagated_component: Mapping[str, Any]) -> bool:
return propagated_component.get("type") == "object"
Loading
Loading